AI News Daily Briefing — February 13, 2026
Google's Gemini 3 Deep Think Hits 84.6% on ARC-AGI-2, Resetting the Reasoning Ceiling
Google dropped the most consequential model update of the week: Gemini 3 Deep Think, an upgraded reasoning mode that scored 84.6% on the ARC-AGI-2 benchmark — a test François Chollet himself certified as a legitimate breakthrough. It also set new state-of-the-art marks on Humanity's Last Exam and frontier math and science benchmarks. As Ethan Mollick pointed out, ARC-AGI-2 is approaching saturation less than a year after it was introduced, which says more about the pace of progress than any single score. The model is available now to Gemini Ultra subscribers and select researchers via API, with Vertex AI early access for developers.
What makes this release strategically interesting isn't just the benchmark numbers — it's the application layer. Google is explicitly positioning Deep Think as a scientific research tool. Demos include a Rutgers mathematician using it to explore connections between general relativity and quantum mechanics, Duke researchers optimizing semiconductor crystal growth recipes, and the model converting hand-drawn sketches into 3D-printable STL files. This is Google making a bet that reasoning models will differentiate not on chatbot vibes but on domain-specific utility for professionals who can actually validate the output.
For builders, the implication is clear: the reasoning model wars are shifting from 'who scores highest' to 'who delivers the most useful chain-of-thought for real workflows.' If your product depends on LLM reasoning — agents, code generation, scientific tooling — the Deep Think API access via Vertex AI is worth evaluating immediately. The benchmark arms race is nearly over; the application arms race is just starting.
OpenAI Ships GPT-5.3-Codex-Spark at 1000+ Tokens/Second for Pro Users
Sam Altman announced a research preview of GPT-5.3-Codex-Spark, a new model pushing inference past 1,000 tokens per second — a speed that makes real-time code generation feel instantaneous. Available only to Codex Pro subscribers, this is OpenAI doubling down on the premium developer tier as its growth engine.
GLM-5 Emerges as First Truly Competitive Open-Weight Coding Model
GLM-5 is turning heads: multiple developers report Gemini 3.0-level performance on agentic coding tasks from a fully open-weight model. Theo ranked it alongside Opus 4.6 and Codex 5.3 as one of only three models worth using for code generation — a significant milestone for the open-source camp.
Anthropic Donates $20M to Shape Bipartisan AI Policy
Anthropic contributed $20M to Public First Action, a bipartisan AI policy organization. It's a substantial bet that shaping regulation proactively is cheaper than reacting to it — and a signal that Anthropic sees policy as a competitive moat, not just a compliance cost.
Waymo's 6th-Gen System Begins Fully Driverless Testing
Waymo unveiled its sixth-generation autonomous driving hardware, designed for mass production with Hyundai and now testing without safety drivers on public roads. This is the clearest signal yet that robotaxis are entering the manufacturing-scale phase, not just the R&D phase.
Tensol AI Converts AI Models into Autonomous Employees (YC-Backed)
YC-backed Tensol AI is building a platform that turns OpenClaw into 24/7 AI workers for support, engineering, and sales. The pitch is less 'copilot' and more 'headcount replacement' — a framing shift that's becoming harder for the industry to dance around.
Andrew Ng Launches A2A Agent-to-Agent Protocol Course
New free course from DeepLearning.AI, built with Google Cloud and IBM Research, teaches developers how to connect AI agents across frameworks using the A2A protocol. If multi-agent orchestration is your roadmap, this is the fastest on-ramp available.
Databricks CTO Warns of Persistent AI Demo-to-Production Gap
Matei Zaharia called out the industry's dirty secret: demos that work 50% of the time don't become reliable production systems without fundamental engineering effort. A useful reality check as agent hype accelerates.
Karpathy Simplifies Micrograd's Backpropagation with Cleaner Chain Rule
Andrej Karpathy shared an elegant refactor of micrograd's backward pass — having each operation return its local gradients so the chain rule reduces to a single generic multiply-and-accumulate step. A small code change, but a pedagogically important one for anyone learning autograd internals from the most-watched ML educator alive.
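To make the idea concrete, here is a minimal autograd sketch in that spirit — a hypothetical simplification for illustration, not Karpathy's actual micrograd code. Each operation records its local gradients at creation time, so the backward pass is one generic chain-rule loop with no per-op backward closures:

```python
# Minimal autograd sketch (illustrative, not micrograd itself): every op stores
# its LOCAL gradients d(out)/d(input) when the output Value is created, and the
# generic backward pass applies the chain rule by multiply-and-accumulate.

class Value:
    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # upstream Values
        self._local_grads = local_grads  # d(out)/d(parent), one per parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        # local gradients of addition are 1 w.r.t. both inputs
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        # local gradients of multiplication are the opposite operands
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # topological order so each node is visited after all its consumers
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    build(p)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            # chain rule: parent.grad += local_grad * upstream_grad
            for p, g in zip(v._parents, v._local_grads):
                p.grad += g * v.grad

a, b = Value(2.0), Value(3.0)
y = a * b + a          # dy/da = b + 1 = 4, dy/db = a = 2
y.backward()
print(a.grad, b.grad)  # 4.0 2.0
```

The design choice being illustrated: because local gradients are plain numbers captured at forward time, the backward pass never needs op-specific logic — the chain rule lives in exactly one place.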
Cloudflare Ships Real-Time Markdown Conversion for AI Agents at the Edge
Cloudflare now automatically converts web content to Markdown for AI agents via content negotiation — no scraping hacks required. This is infrastructure-level acknowledgment that AI agents are a first-class consumer of the web, and it meaningfully simplifies agentic web access.
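In practice, content negotiation just means an agent asks for Markdown via the HTTP Accept header. A minimal sketch — the `text/markdown` media type is per Cloudflare's announcement, and `https://example.com` is a placeholder that may not sit behind the feature:

```python
# Sketch of fetching a page as Markdown via HTTP content negotiation.
# Assumptions: the "Accept: text/markdown" media type follows Cloudflare's
# announcement; the target URL is a placeholder, not a confirmed participant.
from urllib.request import Request, urlopen

def markdown_request(url: str) -> Request:
    # The agent declares its preferred representation via the Accept header.
    return Request(url, headers={"Accept": "text/markdown"})

def fetch_markdown(url: str) -> str:
    with urlopen(markdown_request(url)) as resp:
        # Non-participating origins will still return HTML, so verify the
        # negotiated Content-Type before trusting the body as Markdown.
        ctype = resp.headers.get("Content-Type", "")
        if "text/markdown" not in ctype:
            raise ValueError(f"server did not negotiate Markdown: {ctype}")
        return resp.read().decode("utf-8", errors="replace")
```

The Content-Type check matters: content negotiation is a request, not a guarantee, and an agent that assumes Markdown came back will silently ingest raw HTML from origins outside the feature.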
Docker Sandboxes Use MicroVMs to Isolate AI Coding Agents
Docker's new Sandboxes feature runs coding agents in microVM isolation, preventing system-level access. As autonomous coding agents proliferate, this kind of containment layer shifts from nice-to-have to table stakes.
Karpathy Backs Simile AI's $100M Series A for Digital Twin Technology
Simile AI raised a $100M Series A led by Index Ventures, with an angel investment from Andrej Karpathy, to build digital twins from interview data and real-world information. Karpathy described it as working on an 'under-explored dimension' of LLM capabilities — vague, but his involvement alone will draw attention to the company.
IBM to Triple US Entry-Level Hiring with AI-Adapted Roles
IBM plans to triple entry-level hiring in 2026 by restructuring jobs around AI capabilities — a counter-narrative to the 'AI kills junior roles' doom loop. Worth watching whether this is a genuine workforce model or a PR play.
Amazon's Ring Cancels Flock Safety Partnership After Privacy Backlash
Ring terminated its surveillance data-sharing partnership with Flock Safety following user backlash. A reminder that even in the smart home era, there's a line where consumer tolerance for data sharing snaps — and companies find out after they cross it.
Supabase Hit by US-East-2 Outage, Reverts Networking Config
Supabase experienced a significant outage in its US-East-2 region with complete network traffic loss, traced to an internal networking configuration change. They identified and reverted the issue, but if you're building on Supabase, this is a good prompt to review your multi-region resilience strategy.
Memory Cost Surge Forces Cloud Providers to Rethink GPU Infrastructure
SemiAnalysis reports a threefold increase in memory costs is forcing cloud providers to raise prices or halt GPU server expansion. This supply-side pressure could materially impact inference pricing and availability for startups relying on cloud GPU capacity through 2026.
YouTube Finally Launches Native Apple Vision Pro App
YouTube released a dedicated Vision Pro app nearly two years after the headset launched — a conspicuous delay that says more about Google's strategic calculus around Apple's spatial computing platform than any technical constraint.
Today's news makes one thing unmistakable: the frontier model competition is shifting from benchmark bragging rights to domain-specific utility. Google is positioning Deep Think for scientists, OpenAI is optimizing for raw developer speed, and GLM-5 is proving open-weight models can finally compete on code. Meanwhile, Cloudflare and Docker are building the plumbing that treats AI agents as first-class citizens of the internet. For builders, the strategic question is no longer 'which model is best' — it's 'which model is best for my specific workflow, and what infrastructure do I need around it.'