AI News Daily Briefing — February 13, 2026
Google's Gemini 3 Deep Think Hits 84.6% on ARC-AGI-2, Resetting the Reasoning Ceiling
Google dropped the most consequential model update of the week: Gemini 3 Deep Think, an upgraded reasoning mode that scored 84.6% on the ARC-AGI-2 benchmark — a test François Chollet himself certified as a legitimate breakthrough. It also set new state-of-the-art marks on Humanity's Last Exam and frontier math and science benchmarks. As Ethan Mollick pointed out, ARC-AGI-2 is approaching saturation less than a year after it was introduced, which says more about the pace of progress than any single score. The model is available now to Gemini Ultra subscribers and select researchers via API, with Vertex AI early access for developers.
What makes this release strategically interesting isn't just the benchmark numbers — it's the application layer. Google is explicitly positioning Deep Think as a scientific research tool. Demos include a Rutgers mathematician using it to explore connections between general relativity and quantum mechanics, Duke researchers optimizing semiconductor crystal growth recipes, and the model converting hand-drawn sketches into 3D-printable STL files. This is Google making a bet that reasoning models will differentiate not on chatbot vibes but on domain-specific utility for professionals who can actually validate the output.
For builders, the implication is clear: the reasoning model wars are shifting from 'who scores highest' to 'who delivers the most useful chain-of-thought for real workflows.' If your product depends on LLM reasoning — agents, code generation, scientific tooling — the Deep Think API access via Vertex AI is worth evaluating immediately. The benchmark arms race is nearly over; the application arms race is just starting.
OpenAI Ships GPT-5.3-Codex-Spark at 1000+ Tokens/Second for Pro Users
Sam Altman announced a research preview of GPT-5.3-Codex-Spark, a new model pushing inference past 1,000 tokens per second — a speed that makes real-time code generation feel instantaneous. Available only to Codex Pro subscribers, this is OpenAI doubling down on the premium developer tier as its growth engine.
GLM-5 Emerges as First Truly Competitive Open-Weight Coding Model
GLM-5 is turning heads: multiple developers report Gemini 3.0-level performance on agentic coding tasks from a fully open-weight model. Theo ranked it alongside Opus 4.6 and Codex 5.3 as one of only three models worth using for code generation — a significant milestone for the open-source camp.
Anthropic Donates $20M to Shape Bipartisan AI Policy
Anthropic contributed $20M to Public First Action, a bipartisan AI policy organization. It's a substantial bet that shaping regulation proactively is cheaper than reacting to it — and a signal that Anthropic sees policy as a competitive moat, not just a compliance cost.
Waymo's 6th-Gen System Begins Fully Driverless Testing
Waymo unveiled its sixth-generation autonomous driving hardware, designed for mass production with Hyundai and now testing without safety drivers on public roads. This is the clearest signal yet that robotaxis are entering the manufacturing-scale phase, not just the R&D phase.
Tensol AI Converts AI Models into Autonomous Employees (YC-Backed)
YC-backed Tensol AI is building a platform that turns OpenClaw into 24/7 AI workers for support, engineering, and sales. The pitch is less 'copilot' and more 'headcount replacement' — a framing shift that's becoming harder for the industry to dance around.
Andrew Ng Launches A2A Agent-to-Agent Protocol Course
New free course from DeepLearning.AI, built with Google Cloud and IBM Research, teaches developers how to connect AI agents across frameworks using the A2A protocol. If multi-agent orchestration is your roadmap, this is the fastest on-ramp available.
Databricks CTO Warns of Persistent AI Demo-to-Production Gap
Matei Zaharia called out the industry's dirty secret: demos that work 50% of the time don't become reliable production systems without fundamental engineering effort. A useful reality check as agent hype accelerates.
Karpathy Simplifies Micrograd's Backpropagation with Cleaner Chain Rule
Andrej Karpathy shared an elegant refactor of micrograd's backward pass — having each operation return its local gradients so the chain rule reduces to a single generic multiply-and-accumulate step. A small code change, but a pedagogically important one for anyone learning autograd internals from the most-watched ML educator alive.
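To make the idea concrete, here is a minimal autograd sketch in that spirit — a hypothetical simplification for illustration, not Karpathy's actual micrograd code. Each operation records its local gradients at creation time, so the backward pass is one generic chain-rule loop with no per-op backward closures:

```python
# Minimal autograd sketch (illustrative, not micrograd itself): every op stores
# its LOCAL gradients d(out)/d(input) when the output Value is created, and the
# generic backward pass applies the chain rule by multiply-and-accumulate.

class Value:
    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # upstream Values
        self._local_grads = local_grads  # d(out)/d(parent), one per parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        # local gradients of addition are 1 w.r.t. both inputs
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        # local gradients of multiplication are the opposite operands
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # topological order so each node is visited after all its consumers
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    build(p)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            # chain rule: parent.grad += local_grad * upstream_grad
            for p, g in zip(v._parents, v._local_grads):
                p.grad += g * v.grad

a, b = Value(2.0), Value(3.0)
y = a * b + a          # dy/da = b + 1 = 4, dy/db = a = 2
y.backward()
print(a.grad, b.grad)  # 4.0 2.0
```

The design choice being illustrated: because local gradients are plain numbers captured at forward time, the backward pass never needs op-specific logic — the chain rule lives in exactly one place.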
Cloudflare Ships Real-Time Markdown Conversion for AI Agents at the Edge
Cloudflare now automatically converts web content to Markdown for AI agents via content negotiation — no scraping hacks required. This is infrastructure-level acknowledgment that AI agents are a first-class consumer of the web, and it meaningfully simplifies agentic web access.
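In practice, content negotiation just means an agent asks for Markdown via the HTTP Accept header. A minimal sketch — the `text/markdown` media type is per Cloudflare's announcement, and `https://example.com` is a placeholder that may not sit behind the feature:

```python
# Sketch of fetching a page as Markdown via HTTP content negotiation.
# Assumptions: the "Accept: text/markdown" media type follows Cloudflare's
# announcement; the target URL is a placeholder, not a confirmed participant.
from urllib.request import Request, urlopen

def markdown_request(url: str) -> Request:
    # The agent declares its preferred representation via the Accept header.
    return Request(url, headers={"Accept": "text/markdown"})

def fetch_markdown(url: str) -> str:
    with urlopen(markdown_request(url)) as resp:
        # Non-participating origins will still return HTML, so verify the
        # negotiated Content-Type before trusting the body as Markdown.
        ctype = resp.headers.get("Content-Type", "")
        if "text/markdown" not in ctype:
            raise ValueError(f"server did not negotiate Markdown: {ctype}")
        return resp.read().decode("utf-8", errors="replace")
```

The Content-Type check matters: content negotiation is a request, not a guarantee, and an agent that assumes Markdown came back will silently ingest raw HTML from origins outside the feature.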
Docker Sandboxes Use MicroVMs to Isolate AI Coding Agents
Docker's new Sandboxes feature runs coding agents in microVM isolation, preventing system-level access. As autonomous coding agents proliferate, this kind of containment layer shifts from nice-to-have to table stakes.
Karpathy Backs Simile AI's $100M Series A for Digital Twin Technology
Simile AI raised a $100M Series A led by Index Ventures, with an angel investment from Andrej Karpathy, to build digital twins from interview data and real-world information. Karpathy described it as working on an 'under-explored dimension' of LLM capabilities — vague, but his involvement alone will draw attention to the company.
IBM to Triple US Entry-Level Hiring with AI-Adapted Roles
IBM plans to triple entry-level hiring in 2026 by restructuring jobs around AI capabilities — a counter-narrative to the 'AI kills junior roles' doom loop. Worth watching whether this is a genuine workforce model or a PR play.
Amazon's Ring Cancels Flock Safety Partnership After Privacy Backlash
Ring terminated its surveillance data-sharing partnership with Flock Safety following user backlash. A reminder that even in the smart home era, there's a line where consumer tolerance for data sharing snaps — and companies find out after they cross it.
Supabase Hit by US-East-2 Outage, Reverts Networking Config
Supabase experienced a significant outage in its US-East-2 region with complete network traffic loss, traced to an internal networking configuration change. They identified and reverted the issue, but if you're building on Supabase, this is a good prompt to review your multi-region resilience strategy.
Memory Cost Surge Forces Cloud Providers to Rethink GPU Infrastructure
SemiAnalysis reports a threefold increase in memory costs is forcing cloud providers to raise prices or halt GPU server expansion. This supply-side pressure could materially impact inference pricing and availability for startups relying on cloud GPU capacity through 2026.
YouTube Finally Launches Native Apple Vision Pro App
YouTube released a dedicated Vision Pro app nearly two years after the headset launched — a conspicuous delay that says more about Google's strategic calculus around Apple's spatial computing platform than any technical constraint.
Today's news makes one thing unmistakable: the frontier model competition is shifting from benchmark bragging rights to domain-specific utility. Google is positioning Deep Think for scientists, OpenAI is optimizing for raw developer speed, and GLM-5 is proving open-weight models can finally compete on code. Meanwhile, Cloudflare and Docker are building the plumbing that treats AI agents as first-class citizens of the internet. For builders, the strategic question is no longer 'which model is best' — it's 'which model is best for my specific workflow, and what infrastructure do I need around it.'