Claude Opus 4: World's Best Coding Model, and Shopify's AI Shopping Agents
Anthropic releases Claude Opus 4 with a 72.5% SWE-bench Verified score, a state-of-the-art result at launch, while Claude Sonnet 4 comes close to it at roughly one-fifth the price. Shopify launches its AI-powered product Catalog and voice Sidekick, signalling that agentic commerce is no longer theoretical. May 2025 was the month AI became the best programmer in the room and started selling things.

Giovanni van Dam
IT & Business Development Consultant
Claude Opus 4: 72.5% on SWE-bench — A New Standard for AI Coding
On 22 May 2025, Anthropic released Claude Opus 4 and reported 72.5% on SWE-bench Verified, a state-of-the-art score at the time of release. SWE-bench evaluates an AI model's ability to resolve real-world software engineering issues drawn from open-source repositories: not toy problems, but actual bugs, feature requests, and refactoring tasks from production codebases, each checked against the repository's own tests.
Opus 4 was more than an incremental improvement. It demonstrated sustained autonomous coding capability: working through complex multi-file changes, maintaining context across large codebases, and producing code that passed existing test suites without human intervention. Anthropic positioned it as the world's best coding model, and the published SWE-bench results supported that positioning at launch.
For software teams, the implications were immediate. Code review, bug triage, test generation, and refactoring, tasks that can consume an estimated 40–60% of senior developer time, could now be meaningfully delegated to an AI agent. The agent would act not as a suggestion engine but as an autonomous contributor that writes, tests, and submits code.
Claude Sonnet 4: The Same Capability at One-Fifth the Price
Perhaps more significant than Opus 4 itself was the simultaneous release of Claude Sonnet 4. Sonnet 4 matched or closely approached Opus 4's coding performance at approximately one-fifth the cost. On many practical benchmarks, the difference between the two models was within the margin of error.
This pricing structure reflected Anthropic's strategic bet that the market was bifurcating into a premium tier for the most demanding, highest-stakes workloads and a high-performance tier that would capture the vast majority of production usage. For most engineering teams, Sonnet 4 would be the right choice, delivering roughly 95% of Opus 4's capability at about 20% of the cost.
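The trade-off described above can be made concrete with a little arithmetic. The sketch below is illustrative only: it takes the article's approximate figures (95% of the capability at 20% of the cost) as inputs, and the `ModelTier` class, the `monthly_cost` helper, and the example spend are hypothetical names and values introduced here, not Anthropic pricing data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    """A model tier expressed relative to the premium tier (hypothetical helper)."""
    name: str
    relative_capability: float  # fraction of the premium tier's capability
    relative_cost: float        # fraction of the premium tier's cost

    def capability_per_cost(self) -> float:
        # Capability obtained per unit of spend, relative to the premium tier.
        return self.relative_capability / self.relative_cost


# Figures taken from the article's approximate claim, not from a price list.
opus = ModelTier("Opus 4", relative_capability=1.00, relative_cost=1.00)
sonnet = ModelTier("Sonnet 4", relative_capability=0.95, relative_cost=0.20)


def monthly_cost(tier: ModelTier, premium_monthly_spend: float) -> float:
    """Estimate spend on `tier` for a workload that would cost
    `premium_monthly_spend` on the premium tier (assumed linear scaling)."""
    return premium_monthly_spend * tier.relative_cost


if __name__ == "__main__":
    for tier in (opus, sonnet):
        print(f"{tier.name}: {tier.capability_per_cost():.2f}x capability per unit cost")
    # Hypothetical workload that would cost 10,000 per month on the premium tier.
    print(f"Sonnet 4 estimate: {monthly_cost(sonnet, 10_000):,.0f}")
```

Under these assumptions the cheaper tier delivers about 4.75 times the capability per unit of spend, which is the arithmetic behind treating it as the default and reserving the premium tier for workloads where the last few points of capability matter.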
The competitive pressure this placed on the rest of the market was severe. OpenAI's models, Google's Gemini, and open-weight alternatives all had to contend with a model that was simultaneously the best coder available and aggressively priced for production deployment.
Shopify Catalog and Sidekick: AI Agents Enter Commerce
While Anthropic was transforming software engineering, Shopify was transforming commerce. At its Spring 2025 announcements, Shopify introduced two significant AI capabilities:
- Shopify Catalog: An AI-powered product enrichment system that automatically generates descriptions, categorises products, optimises search metadata, and creates structured data for millions of SKUs. For merchants with large catalogues, this eliminated hundreds of hours of manual content creation.
- Sidekick (Voice): A voice-enabled AI assistant for Shopify merchants that could answer questions about store performance, suggest marketing strategies, and execute administrative tasks through natural conversation. "What were my top-selling products last week?" became a voice query rather than a dashboard drill-down.
These were not experimental features. Shopify serves over 2 million merchants globally, and embedding AI directly into the merchant experience normalised agentic commerce for a massive user base. The message was clear: AI is not a feature you add to commerce — it is becoming the commerce platform itself.
AI Coding in the Enterprise: From Copilot to Colleague
Opus 4's SWE-bench performance accelerated a transition that had been building throughout 2024: the shift from AI as a coding assistant (suggesting completions, answering questions) to AI as a coding colleague (autonomously completing tasks, submitting pull requests, resolving issues).
The practical adoption pattern emerging in enterprise software teams followed a clear progression:
- Level 1 — Autocomplete: AI suggests code as you type. Productivity gain: 10–20%. This is table stakes by mid-2025.
- Level 2 — Chat-based assistance: AI answers questions, explains code, generates tests. Productivity gain: 20–40%.
- Level 3 — Autonomous task completion: AI receives an issue, writes the code, runs the tests, opens a pull request. Productivity gain: 40–70% for routine tasks.
- Level 4 — Sustained autonomous work: AI works on complex, multi-step engineering tasks for hours with minimal human oversight. This is where Opus 4 operates.
Most enterprise teams in May 2025 were at Level 2, with early adopters pushing into Level 3. The gap between where most teams were and where the technology allowed them to be represented an enormous productivity opportunity.
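For teams that want to track where they sit on this progression, the four levels can be captured in a small data structure. The following is a minimal sketch: the `AdoptionLevel` enum, the `GAIN_RANGE_PCT` table, and the `headroom` helper are hypothetical names, the ranges are the article's own estimates, and using the midpoint of each range is a simplifying assumption. Level 4 has no quantified gain in the list above, so it is left as `None`.

```python
from enum import IntEnum
from typing import Dict, Optional, Tuple


class AdoptionLevel(IntEnum):
    """The four adoption levels described in the list above."""
    AUTOCOMPLETE = 1
    CHAT_ASSISTANCE = 2
    AUTONOMOUS_TASKS = 3
    SUSTAINED_AUTONOMY = 4


# Productivity gain ranges in percent, as estimated in the article.
# Level 3 applies to routine tasks only; Level 4 is not quantified.
GAIN_RANGE_PCT: Dict[AdoptionLevel, Optional[Tuple[int, int]]] = {
    AdoptionLevel.AUTOCOMPLETE: (10, 20),
    AdoptionLevel.CHAT_ASSISTANCE: (20, 40),
    AdoptionLevel.AUTONOMOUS_TASKS: (40, 70),
    AdoptionLevel.SUSTAINED_AUTONOMY: None,
}


def midpoint_gain(level: AdoptionLevel) -> Optional[float]:
    """Midpoint of the estimated gain range, or None if unquantified."""
    gain_range = GAIN_RANGE_PCT[level]
    if gain_range is None:
        return None
    low, high = gain_range
    return (low + high) / 2


def headroom(current: AdoptionLevel, target: AdoptionLevel) -> Optional[float]:
    """Difference between midpoint gains in percentage points (assumption:
    midpoints are a fair summary of each range)."""
    current_gain = midpoint_gain(current)
    target_gain = midpoint_gain(target)
    if current_gain is None or target_gain is None:
        return None
    return target_gain - current_gain
```

With these estimates, `headroom(AdoptionLevel.CHAT_ASSISTANCE, AdoptionLevel.AUTONOMOUS_TASKS)` returns 25.0 percentage points, a rough indication of the gap between the typical team at Level 2 and the early adopters at Level 3.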
Agentic Commerce Is No Longer Theoretical
The convergence of Opus 4's coding capability and Shopify's commerce AI pointed to a broader truth about May 2025: agentic AI had moved from research demos to production deployments. AI agents were writing code, managing product catalogues, answering merchant questions, and automating customer interactions — not in laboratory conditions, but at the scale of millions of users.
For businesses in e-commerce, retail, and technology, the competitive clock was now ticking. Merchants using Shopify's AI tools would produce better product content faster. Engineering teams using Opus 4 or Sonnet 4 would ship features more quickly. The productivity gap between AI-adopting and non-adopting businesses would widen with each quarter.
The strategic imperative was no longer to evaluate AI but to deploy it. Start with the highest-leverage workflows, measure the impact, and expand systematically.
