News & Releases

News & Releases Dev Tools & SDK Changelogs

Paperclip hires agents, not prompts — and a human board

Paperclip (73.5k stars) runs a coordinated company of AI agents — CEO, marketing, eng, ops — with hard budget limits,

Sungjae Lee

Jul 13, 2026

Google / Gemini News & Releases Dev Tools & SDK Changelogs

30+ models, zero auction control — what Higgsfield actually is

Higgsfield AI: 30+ models for ad creative, no auction or targeting. How it fits upstream of PMax and Advantage+.

Sungjae Lee

Jul 13, 2026

News & Releases Dev Tools & SDK Changelogs

76 malicious skills cleared skills.sh — 8 stayed live

Snyk ToxicSkills audit found 36.8% of 3,984 agent skills flawed, 76 confirmed malicious with reverse shells.

Sungjae Lee

Jul 10, 2026

News & Releases Model & API Releases

LatentSync 1.6 needs 18 GB VRAM — that's the broadcast bar

LatentSync 1.6, MuseTalk 1.5, and MOVA compared on VRAM requirements, fps, resolution ceiling, and which runs locally.

Sungjae Lee

Jul 09, 2026

Claude / Anthropic News & Releases Dev Tools & SDK Changelogs

video-use feeds Claude Code a transcript, not 45M frame-tokens

video-use (browser-use org) edits video via packed transcript and ffmpeg EDL pipeline — no timeline UI required.

Sungjae Lee

Jul 09, 2026

Claude / Anthropic OpenAI / Codex Google / Gemini Cursor News & Releases Dev Tools & SDK Changelogs

53k devs are reading Claude and Cursor's actual system

GitHub repos like asgeirtj/system_prompts_leaks (53.5k stars) and x1xhlol (139k+) archive extracted system prompts for

Sungjae Lee

Jul 08, 2026

News & Releases Funding, Strategy & Policy

Palantir says no token billing — its docs still count tokens

AIP measures LLM usage in tokens — converted to compute-seconds — despite Palantir's no-token sovereign AI positioning.

Sungjae Lee

Jul 05, 2026

Claude / Anthropic News & Releases Dev Tools & SDK Changelogs

Sonnet 5 is default — your token count just jumped ~30%

Claude Code v2.1.197 sets Sonnet 5 as default; new tokenizer inflates token counts ~30%, breaking existing API patterns.

Sungjae Lee

Jul 05, 2026

News & Releases Model & API Releases

OCR 4's 72% win: the study was Mistral's

Mistral OCR 4: spatial layout, 170 languages, $4/1K pages API. The 72% win is vendor-run; OlmOCRBench is the verifiable score.

Sungjae Lee

Jul 04, 2026

Claude / Anthropic News & Releases Dev Tools & SDK Changelogs

Shell failures now explain themselves in Claude Code

Claude Code weeks 24–26 in June 2026: /cd, recursive delegation, live pages, shell auto-response, and Sonnet 5.

Sungjae Lee

Jul 02, 2026

News & Releases Research & Benchmarks

AMIE made it into Nature — the wall peer review can't cross

AMIE's Nature peer review: what the longitudinal chronic disease evaluation measured and what it explicitly excluded.

Sungjae Lee

Jul 02, 2026

Claude / Anthropic News & Releases Research & Benchmarks

A 5× expertise gap separates novice from proficient AI pairing

398k Claude Code sessions: occupation outpredicts programming skill, expertise multiplies return 5×. Caveats apply.

Sungjae Lee

Jul 01, 2026

Claude / Anthropic News & Releases Funding, Strategy & Policy

Korea built on Claude — then the U.S. export order hit

Seoul office open June 17. API unchanged for Korean builders. A U.S. export order cut off top-tier models for 15 days.

Sungjae Lee

Jun 29, 2026

Claude / Anthropic News & Releases Research & Benchmarks

Claude activations, now legible. Fidelity: 0.6–0.8.

Anthropic's NLA: verbalizer/reconstructor loop, 0.6–0.8 FVE, Claude test-awareness, and open checkpoints for Qwen2.5, Gemma-3, Llama-3.3.

Sungjae Lee

Jun 28, 2026

Google / Gemini News & Releases Research & Benchmarks

94% vs 67%: how medical AI fared against physicians in Nature

Two-agent Gemini system vs physicians on longitudinal disease management: Nature results, RxQA benchmark, and the critical gaps in what the trial actually tested.

Sungjae Lee

Jun 27, 2026

Claude / Anthropic News & Releases Model & API Releases

Opus 4.8 and the 11-day Bun port: what the vendor claim hides

Opus 4.8 flags code flaws 4x more, deprecates `budget_tokens`, and adds Dynamic Workflows for long-running pipelines.

Sungjae Lee

Jun 26, 2026

News & Releases Model & API Releases

Quicksilver's 37KB classifier: Suno and Udio, nothing else

UChicago's SAND Lab released Quicksilver: a 37KB extension flagging AI music from Suno and Udio, all on-device.

Sungjae Lee

Jun 26, 2026

News & Releases Dev Tools & SDK Changelogs

Grok joins IBKR's AI suite — the "live" part deserves scrutiny

Grok joins IBKR's MCP broker AI. Covers portfolio scope, order drafting, and the "live market analysis" precision gap.

Sungjae Lee

Jun 26, 2026

OpenAI / Codex News & Releases Dev Tools & SDK Changelogs

Ten Codex alphas, none with notes — what the CI burst means

Ten Codex 0.143.0-alpha tags in June 2026, none with notes. What 65 revisions of engineering work actually covered.

Sungjae Lee

Jun 26, 2026

News & Releases Dev Tools & SDK Changelogs

Hermes Desktop went live; the Computex story is unconfirmed

Nous Research's Hermes Desktop adds a native GUI to the open-source Hermes agent — what's real and what's unconfirmed.

Sungjae Lee

Jun 26, 2026

Claude / Anthropic News & Releases Funding, Strategy & Policy

The world's top Claude user is Korean — Seoul office follows

Anthropic's Seoul office opened June 2026 — NAVER, Samsung, MSIT safety MOU, and Claude for Startups now live in Korea.

Sungjae Lee

Jun 26, 2026

Google / Gemini News & Releases Research & Benchmarks

RL lost in Borg. DeepMind's evolved bin-packer hit 0.7%.

AlphaEvolve evolved a Borg bin-packing heuristic that beat RL and recovers 0.7% of Google's worldwide fleet capacity.

Sungjae Lee

Jun 26, 2026

News & Releases Model & API Releases

Animation-only, 60% pricier: what 1.5 actually delivers

Image-only, $0.08–$0.25/sec, native audio sync — what the 1.5 release delivers and what launch coverage omits.

Sungjae Lee

Jun 26, 2026

News & Releases Research & Benchmarks

DVD-JEPA nails JEPA pedagogy — the pioneer tag, less so

DVD-JEPA: 32-d latent, ~10s CPU training, zero infrastructure. How it compares to JEPA-WMs, I-JEPA, and EB-JEPA.

Sungjae Lee

Jun 26, 2026

MCP News & Releases Protocols & Ecosystem

Ratchet asks twice before an LLM bricks your BIOS

Pre-release Rust toolkit: 30 BIOS/SPI calls to LLMs via MCP stdio. 806-chip database, CH341A/CH347, no build published.

Sungjae Lee

Jun 26, 2026

News & Releases Funding, Strategy & Policy

18 arXiv PDFs carried covert commands — AI graders obeyed

18 arXiv preprints carried hidden commands that raised AI grading scores by +2.8. How the attack works, what was measured, and what to do.

Sungjae Lee

Jun 26, 2026

News & Releases Dev Tools & SDK Changelogs

No login, no deploy: thethings.ai gives AI a direct page

A live write endpoint for AI: POST HTML, get a URL via MCP or REST. SLA, pricing, and moderation policies unconfirmed as of June 2026.

Sungjae Lee

Jun 26, 2026

News & Releases Model & API Releases

ArgusRed refuses nothing. What actually constrains it?

Cosine's pen-test model, a Go enforcement layer, confirmed-only findings, and no published model card or eval.

Sungjae Lee

Jun 25, 2026

News & Releases Dev Tools & SDK Changelogs

Cordium: AI hits the DB with no password in the process

Cordium (Apache-2.0) lets AI workloads hit live databases and APIs — credentials sealed at the ZTNA proxy, never injected into the process.

Sungjae Lee

Jun 25, 2026

News & Releases Model & API Releases

Amazon wants to sell you Trainium racks. Neuron is the catch.

Amazon moves toward merchant Trainium3: rack specs, Neuron porting friction, and what Jassy's $50B TAM actually means.

Sungjae Lee

Jun 25, 2026

News & Releases Funding, Strategy & Policy

Grok has no idea what's in your portfolio. IBKR feeds it.

IBKR's MCP connector links Grok, ChatGPT, and Claude to live accounts. The broker supplies what the models cannot see.

Sungjae Lee

Jun 25, 2026

vLLM / Ollama News & Releases Dev Tools & SDK Changelogs

Rust's borrow checker reaches CUDA — and the cost is zero

NVIDIA & Hugging Face's cuTile Rust: borrow-checked CUDA at 96% cuBLAS; Grout decoder matches vLLM at batch-1 on B200.

Sungjae Lee

Jun 25, 2026

News & Releases Funding, Strategy & Policy

Every Major AI Family Is Now in Databricks. But Which SKUs?

Grok joins Databricks Agent Bricks at DAIS 2026—six AI families in one governed build environment. Pricing and SKUs TBD.

Sungjae Lee

Jun 25, 2026

News & Releases Research & Benchmarks

For Analyst Deliverables, 3% Is Where the Best AI Tops Out

91 analyst tasks, 4 private scenarios. Claude Fable 5 leads at 1586 Elo but fully passes just 3% of criteria sets.

Sungjae Lee

Jun 25, 2026

News & Releases Research & Benchmarks

HiRO-ACE cuts climate emulation to 45 minutes — with a ceiling

Ai2's HiRO-ACE: ACE2S stochastic emulator + HiRO 32× diffusion downscaler. 3 km precipitation in 45 min. Apache-2.0.

Sungjae Lee

Jun 25, 2026

LangChain / LlamaIndex News & Releases Dev Tools & SDK Changelogs

Officially obsolete: the base_url trick for OpenRouter

0.2.4 adds parallel_tool_calls, pins openrouter SDK 0.9.2, and formalizes the OpenRouter adapter with correct provider attribution and telemetry.

Sungjae Lee

Jun 25, 2026

News & Releases Model & API Releases

Mistral OCR 4: the 72% is preference voting, not proof

OCR 4's 72% win rate: blind preference study, 600+ docs, undisclosed competitors, no published methodology.

Sungjae Lee

Jun 25, 2026

Google / Gemini News & Releases Research & Benchmarks

AMIE Lands in Nature: The Talker-Thinker Split Worked

AMIE reaches disease management in Nature 2026. Two-LLM design, RxQA benchmark, and 100-scenario PCP comparison covered.

Sungjae Lee

Jun 25, 2026

News & Releases Research & Benchmarks

DeepSWE's top entry was gaming the grader

Datacurve's DeepSWE v1.1: scoreboard, grading loophole fix, and cost-efficiency breakdown across 9 AI coding agents.

Sungjae Lee

Jun 25, 2026

Meta / Llama LangChain / LlamaIndex News & Releases Dev Tools & SDK Changelogs

SimpleMultiModalQueryEngine is deprecated. Here's the swap.

LlamaIndex 0.14.23 deprecates SimpleMultiModalQueryEngine, unifies rich-media RAG, and fixes workflow state bleed.

Sungjae Lee

Jun 24, 2026

Google / Gemini News & Releases Funding, Strategy & Policy

Apple called it Foundation Models. Reporters say Gemini.

Siri AI debuts at WWDC 2026 with a Foundation Models Swift API and a Gemini-linked cloud tier Apple has not confirmed.

Sungjae Lee

Jun 24, 2026

Claude / Anthropic News & Releases Model & API Releases

Snowflake's CEO declared a tie. The iteration ledger didn't.

Snowflake CEO tested GLM-5.2 vs Claude Opus 4.7: 103 tasks, pass@3 near-tie, 2x token use, 5.7x cheaper output. Here's what the numbers actually mean for builders.

Sungjae Lee

Jun 24, 2026

OpenAI / Codex News & Releases Dev Tools & SDK Changelogs

Compaction has no opt-out in 0.143.0. Here's what to adjust.

Codex 0.143.0-alpha.14 annotated: compaction opt-out removed, 61% fewer filesystem RPCs, new TOML source allowlists.

Sungjae Lee

Jun 24, 2026

News & Releases Model & API Releases

Amazon Folded Apparel Printing Into Alexa. The AI Is Unnamed.

Amazon's Alexa now generates merch from a description. The image model powering it remains unattributed as of June 2026.

Sungjae Lee

Jun 24, 2026

News & Releases Funding, Strategy & Policy

Miasma skipped npm and hijacked your AI agents' startup hooks

73 Microsoft repos down on June 5 via AI coding agent hooks — no npm. Attack vector, payload anatomy, and what to do now.

Sungjae Lee

Jun 24, 2026

News & Releases Model & API Releases

Analysts Called It Basic. Thomson Reuters Lost 16% Anyway.

Anthropic legal automation: TR -16%, RELX -14%, Wolters -13%. Inside the AI capex debate and which SaaS moats survive.

Sungjae Lee

Jun 24, 2026

News & Releases Funding, Strategy & Policy

The AI security EO has a Meta-shaped hole

Trump's June 2 EO sets voluntary 30-day pre-release AI security reviews. OpenAI and Google signed. Meta has not.

Sungjae Lee

Jun 24, 2026

Claude / Anthropic News & Releases Dev Tools & SDK Changelogs

Ten patches blocked git wipeouts. The annotated sprint.

Destructive git blocking, org restrictions, MCP timeouts, Fable 5 suspension: Claude Code's June 12–23 sprint annotated.

Sungjae Lee

Jun 24, 2026

Claude / Anthropic News & Releases Dev Tools & SDK Changelogs

An accountant can outperform a senior dev on Claude Code

Domain expertise, not coding background, predicts Claude Code success. Key data: 398k sessions, Oct 2025–Apr 2026, published June 16, 2026.

Sungjae Lee

Jun 24, 2026

Google / Gemini News & Releases Funding, Strategy & Policy

Why Alphabet Sold Stock Instead of Bonds to Fund AI Compute

Alphabet's ~$85B equity raise: public stock, $10B Berkshire, $40B ATM program to cover $180B+ AI capex in 2026.

Sungjae Lee

Jun 24, 2026

News & Releases Dev Tools & SDK Changelogs

CC-BY-NC is out. Does Cohere's '4-bit lossless' claim hold?

Command A+: Apache 2.0, 218B/25B MoE on 2×H100, 48 languages — W4A4 'lossless' claim needs independent verification.

Sungjae Lee

Jun 23, 2026

News & Releases Model & API Releases

Seedance 2.5: thirty seconds of AI video, zero documentation

ByteDance's FORCE conference on June 23 described Seedance 2.5 with 30-second one-shot video generation and 50 multimodal references. Official documentation still shows Seedance 2.0.

Sungjae Lee

Jun 23, 2026

News & Releases Dev Tools & SDK Changelogs

Ship first, sign in later: what Cloudflare's --temporary does

Cloudflare's --temporary: agents get a live workers.dev URL with no login — 60-min window, then deleted.

Sungjae Lee

Jun 23, 2026

Claude / Anthropic News & Releases Research & Benchmarks

Novices quit. Experts adapt. 400k Claude Code sessions say so.

Anthropic's 398k-session study finds Claude Code widens expertise advantage — 15% novice success vs. 33% for experts.

Sungjae Lee

Jun 23, 2026

OpenAI / Codex News & Releases Research & Benchmarks

An 80-year conjecture fell to AI. What was actually proved?

Unnamed OpenAI model disproved Erdős's unit-distance conjecture. Sawin's n^1.014 is the first exponent gain in 80 years.

Sungjae Lee

Jun 23, 2026

News & Releases Dev Tools & SDK Changelogs

Copilot Cowork asks permission — unless you're the recipient

Copilot Cowork's Skills injection + self-send gap = silent M365 exfiltration, per PromptArmor's May 2026 POC.

Sungjae Lee

Jun 23, 2026

News & Releases Dev Tools & SDK Changelogs

Grok Build's /goal: when 'Complete' appears, what was checked?

xAI's /goal: Grok Build takes an objective, runs until done, with built-in verification and four steering commands.

Sungjae Lee

Jun 23, 2026

News & Releases Funding, Strategy & Policy

How a nursing-home AI ended up scanning Kansas City buses

KCATA's RideKC adds live face recognition: three watch lists, SafeSpace Global, no published policy or audit.

Sungjae Lee

Jun 23, 2026

News & Releases Dev Tools & SDK Changelogs

TVM 0.25's TIRx reveals what Triton deliberately conceals

TVM v0.25.0 adds TIRx: explicit Blackwell orchestration, 18-pass lowering chain, 29 tile primitives, no autoscheduler.

Sungjae Lee

Jun 23, 2026

Claude / Anthropic News & Releases Dev Tools & SDK Changelogs

Fugu beats each component it calls — and the asterisks matter.

Fugu coordinates frontier LLMs via one API. What the benchmark sheet covers, what it omits, and caveats for builders.

Sungjae Lee

Jun 23, 2026

News & Releases Research & Benchmarks

DVD-JEPA in 500 lines — one claim that breaks

500-line MIT-licensed JEPA demo that trains in 10s on CPU. The 'debut' claim breaks on contact with V-JEPA 2.

Sungjae Lee

Jun 22, 2026

News & Releases Dev Tools & SDK Changelogs

AWS Context skips RAG. Continuum validates CVEs itself.

At AWS Summit NY 2026: Context (knowledge graph for agents) and Continuum (CVE lifecycle). Neither is GA.

Sungjae Lee

Jun 22, 2026

News & Releases Dev Tools & SDK Changelogs

A JEPA that learns coordinates it was never given

DVD-JEPA: 10s CPU training, browser demo, 32-dim latents — the smallest JEPA toy; not the first. MIT-licensed, June 2026

Sungjae Lee

Jun 22, 2026

News & Releases Dev Tools & SDK Changelogs

Worktrees aren't enough — 27.67% of AI PRs hit merge conflicts

Worktrees, serialized merge trains, and the 27.67% conflict rate in AI-authored PRs. CAID, STORM, and current tooling.

Sungjae Lee

Jun 22, 2026

OpenAI / Codex News & Releases Dev Tools & SDK Changelogs

11 Codex Rust alphas in 90 hours — and still no stable tag

11 Codex Rust 0.142.0 alphas in 90 hours: Noise relay, per-thread stdio MCP, SQLite WAL, and P-521 TLS annotated.

Sungjae Lee

Jun 22, 2026

Claude / Anthropic Google / Gemini News & Releases Funding, Strategy & Policy

AlphaFold's creator moved to Anthropic. The IP did not.

AlphaFold co-creator John Jumper (Nobel 2024) is leaving DeepMind for Anthropic. His new role remains unconfirmed.

Sungjae Lee

Jun 22, 2026

OpenAI / Codex News & Releases Dev Tools & SDK Changelogs

ChatGPT monitors unattended — it alerts, never acts

Scheduled page, monitoring, and plan quotas land in June 2026. All tasks are notify-only — no writes or transactions.

Sungjae Lee

Jun 22, 2026

News & Releases Dev Tools & SDK Changelogs

α-entmax tile-skips what softmax can't; 354M unconfirmed

AdaSplash-2: α-entmax exact-zeros, Triton tile-skip, BSD-3 license. 354M checkpoint not confirmed public.

Sungjae Lee

Jun 22, 2026

OpenAI / Codex News & Releases Funding, Strategy & Policy

OpenAI filed for IPO while losing $1.22 per dollar earned

Q1 2026: $5.7B revenue, $3.7B cash burn, -122% operating margin, IPO filed. OpenAI's unaudited financials, dissected.

Sungjae Lee

Jun 22, 2026

News & Releases Research & Benchmarks

Memorized by AI or hallucinated — a site lets you check which

Queries frozen AI weights, no live crawl, to surface how confidently each model recalls you. Built by two ex-OpenAI engineers and launched June 2026.

Sungjae Lee

Jun 22, 2026

News & Releases Model & API Releases

Mercury 2 abandons autoregressive decoding and hits 1,009/s

Mercury 2 hits 1,009 tok/s via diffusion decoding. Claim sourcing, API migration, and workload fit analysis.

Sungjae Lee

Jun 22, 2026

News & Releases Model & API Releases

The fastest LLM inference engine takes 28 minutes to start

vLLM, SGLang, TensorRT-LLM, and llama.cpp throughput compared on H100 with TTFT, cold-start, and per-workload guidance.

Sungjae Lee

Jun 22, 2026

News & Releases Dev Tools & SDK Changelogs

Cloak's daemon owns the HTTPS — raw vault values never exit

Cloak v1.1.2 routes API calls through a daemon so MCP agents never receive raw credentials. Architecture, callable surface, and honest limits.

Sungjae Lee

Jun 22, 2026

News & Releases Model & API Releases

Tesla filed 'Megapod' — no AI rack exists to buy

Tesla's USPTO filing for 'Megapod' covers a modular AI rack with servers, cooling, and software. No price, no ship date.

Sungjae Lee

Jun 22, 2026

vLLM / Ollama News & Releases Dev Tools & SDK Changelogs

cuTile Rust at 96% cuBLAS — Grout's engine wins need context

cuTile Rust carries Rust ownership to CUDA kernels. Grout hits 96% of cuBLAS — with vLLM/SGLang prefix caching off.

Sungjae Lee

Jun 22, 2026

News & Releases Model & API Releases

Midjourney's body scanner has no AI — the CEO admitted it

Midjourney Medical's USCT scanner: 500k transducers, no FDA clearance, no live AI, and ~12 scans completed as of June 2026.

Sungjae Lee

Jun 22, 2026

News & Releases Research & Benchmarks

Thirteen chatbots know your biography. F1 reveals how reliably

In the Weights probes 13 chatbots for cold biographical recall — F1 ceiling, LMP2 context, and GDPR implications.

Sungjae Lee

Jun 22, 2026

News & Releases Dev Tools & SDK Changelogs

Your agent reads SKILL.md. SkillsGuard reads it first.

Zero-dep static scanner for AI skill packages — 151 rules, SARIF, MCP server. What SkillsGuard catches, what it misses, and how it compares to Cisco and NVIDIA SkillSpector.

Sungjae Lee

Jun 21, 2026

OpenAI / Codex News & Releases Model & API Releases

The doctor-vs-AI health exam that only one lab graded

GPT-5.5 Instant outscored physicians on HealthBench Professional. OpenAI built the benchmark, supplied the physicians, and ran the evaluation.

Sungjae Lee

Jun 21, 2026

Google / Gemini News & Releases Model & API Releases

Gemini Omni is paywalled. 3.5 Flash is the backend.

3.5 Flash: GA, $1.50/M input, API-callable. Gemini Omni: subscription-only, no endpoint. Decision guide for builders.

Sungjae Lee

Jun 21, 2026

News & Releases Dev Tools & SDK Changelogs

One GitHub PR is how you ship to Grok Build's marketplace

What each Grok Build plugin bundles, who the day-one partners are, and how to submit your own extension via GitHub PR.

Sungjae Lee

Jun 21, 2026

News & Releases Dev Tools & SDK Changelogs

Copilot GA'd as a standalone workspace — not as a GitHub App

GitHub Copilot's standalone app GA in 2026: worktree isolation, org admin gating, and open spending questions explained.

Sungjae Lee

Jun 21, 2026

News & Releases Model & API Releases

Firefly routes to Kling, Veo, Runway. Your IP, not Adobe's.

Firefly AI assistant routes to Kling, Veo, Runway, and 25+ other models. Adobe's indemnity covers native outputs only.

Sungjae Lee

Jun 21, 2026

Claude / Anthropic News & Releases Research & Benchmarks

Blackmail dropped from 96% to 0%. Here's the asterisk.

May 2026 alignment paper: how Anthropic cut Claude's blackmail rate from 96% to 0% and what the limits are.

Sungjae Lee

Jun 21, 2026

News & Releases Model & API Releases

Grok joins Databricks at DAIS — bring your own xAI credential

Grok 4.3 in Databricks Agent Bricks via BYOK. Unity AI Gateway controls, $5/1k tool calls, open partnership terms.

Sungjae Lee

Jun 21, 2026

News & Releases Model & API Releases

CrankGPT: Pi 5 Offline Voice AI — 0.8s TTFB, No Grid, Full Benchmark Breakdown

Pi 5, hand crank, no internet: CrankGPT's full ASR/TTS/LLM stack and llama-bench latency figures explained.

Sungjae Lee

Jun 21, 2026

Claude / Anthropic News & Releases Research & Benchmarks

Activations into English: 4× better at surfacing hidden goals

Anthropic's NLAs map activations to English, exposing hidden goals 4× more than SAEs — and where they confabulate.

Sungjae Lee

Jun 21, 2026

News & Releases Dev Tools & SDK Changelogs

quicktok: 11× on tiktoken, author-reported, no README

quicktok: C++20 SIMD tiktoken replacement, byte-identical, 11× reported. No README. No independent reproduction.

Sungjae Lee

Jun 21, 2026

News & Releases Research & Benchmarks

Chronic management AI vs PCPs: 94% precision, simulated only

AMIE matched PCPs on 15 chronic management axes (94% vs 67% precision) in a Nature 2026 simulation. RxQA, Dialogue+Mx split, and key caveats.

Sungjae Lee

Jun 21, 2026

LangChain / LlamaIndex News & Releases Dev Tools & SDK Changelogs

What v3 event stream was silently losing — LangChain 1.4.8

langchain-core 1.4.8 + 1.3.10 (June 18): v3 metering gaps, BaseTool schema caching, and gpt-5.x routing correction.

Sungjae Lee

Jun 21, 2026

Google / Gemini News & Releases Research & Benchmarks

AlphaEvolve in Borg before the paper: the concrete wins

Evolutionary code optimization from DeepMind — in Borg since 2024, 23% TPU speedup, Strassen improved.

Sungjae Lee

Jun 21, 2026

News & Releases Dev Tools & SDK Changelogs

Firefly executes Photoshop jobs. Not uniformly, though.

Firefly in Photoshop 27.9, Premiere 26.3, Illustrator, InDesign, Frame.io. Capabilities, 30-model backend, no SDK yet.

Sungjae Lee

Jun 21, 2026

Claude / Anthropic News & Releases Model & API Releases

Opus 4.8 is a one-line swap. The xhigh recalibration isn't.

Opus 4.8 vs 4.7: +4.9 pts SWE-bench Pro, xhigh recalibrated, GA subagent fleets. Drop-in API; effort tiers changed.

Sungjae Lee

Jun 21, 2026

OpenAI / Codex News & Releases Dev Tools & SDK Changelogs

Seven alphas later — what Codex CLI 0.141.0 actually delivers

Codex CLI v0.141.0 (stable): Noise relay, SQLite WAL pin, Windows hardening. v0.142 hit seven alphas in 48 hours.

Sungjae Lee

Jun 20, 2026

News & Releases Model & API Releases

SubQ's 56× gain: Appen ran the study. SubQ paid Appen.

SubQ 1.1 Small: 56× FLOP reduction at 1M, 99% NIAH, Appen-measured, unnamed donor base, private API, no public weights.

Sungjae Lee

Jun 20, 2026

Google / Gemini News & Releases Funding, Strategy & Policy

AI summaries appeared on 18% of queries. Clicks nearly halved.

Pew's 68,879-query study: AI summaries cut CTR from 15% to 8%. Semrush: 93% no-click on AI Mode. Dev content strategy.

Sungjae Lee

Jun 20, 2026

Claude / Anthropic News & Releases Funding, Strategy & Policy

Anthropic writes the checks. CodePath signs the W-2s.

$150M fund, 1,000 fellows embedded at nonprofits, W-2 from CodePath — what Claude Corps covers and who qualifies.

Sungjae Lee

Jun 20, 2026

Google / Gemini News & Releases Research & Benchmarks

AMIE's chronic care paper is strong. The fine print is longer.

AMIE extends to longitudinal chronic care: 627 guidelines, 88% care plan quality vs 74% PCPs, drug knowledge ceiling at 73%.

Sungjae Lee

Jun 20, 2026

News & Releases Research & Benchmarks

94 vs 67: AMIE vs PCPs in a blinded prescription OSCE

Google's AMIE scored 94% vs 67% on prescription precision in a blinded OSCE. New RxQA benchmark released. Nature 2026.

Sungjae Lee

Jun 20, 2026

News & Releases Dev Tools & SDK Changelogs

xAI's Office sidebar is in PowerPoint. What goes where?

xAI's Office add-ins for PowerPoint, Word, and Excel. What content leaves your org and which protections apply per plan.

Sungjae Lee

Jun 20, 2026

Google / Gemini News & Releases Dev Tools & SDK Changelogs

The $24,999 display and what Google Beam still can't do

Google I/O 2026 extended Beam to multi-person calls. Here's the AI pipeline, the $24,999 display, and where the gaps are.

Sungjae Lee

May 30, 2026

Google / Gemini News & Releases Research & Benchmarks

The real performance of Gemini ERA, which beat CDC forecasts

Google's I/O 2026 AI research suite: literature triage, hypothesis tournaments, and ERA outperforming CDC forecasts.

Sungjae Lee

May 30, 2026

News & Releases Funding, Strategy & Policy

Shanghai's experiment in hedging LLM costs with futures

Shanghai Futures Exchange is prototyping AI token futures — forward contracts on LLM consumption costs. Here's the technical picture.

Sungjae Lee

May 30, 2026

News & Releases Research & Benchmarks

Why accuracy holds when you train just one block at a time

DiffusionBlocks trains one residual block per step, reducing activation memory B× with competitive or better accuracy.

Sungjae Lee

May 30, 2026

MCP News & Releases Protocols & Ecosystem

How AI agents actually trade directly on Robinhood

Robinhood opened its brokerage and card infrastructure to MCP-compatible AI agents. Here's what the implementation looks like technically.

Sungjae Lee

May 30, 2026

LangChain / LlamaIndex News & Releases Dev Tools & SDK Changelogs

ChatPerplexity 1.3.0 gets automatic routing to real-time search

ChatPerplexity gains use_responses_api in 1.3.0: auto-routes to Perplexity's Agent API for real-time search.

Sungjae Lee

May 30, 2026

Claude / Anthropic News & Releases Dev Tools & SDK Changelogs

Changing constraints mid-conversation no longer breaks the prompt cache

Mid-conversation constraint injection in v0.105.0 preserves prompt cache continuity across long inference runs.

Sungjae Lee

May 30, 2026

News & Releases Dev Tools & SDK Changelogs

A 5-line script silently exfiltrates SharePoint

A 5-line poisoned Skills script silently exfiltrates SharePoint data via Copilot Cowork — no approval gate, no CVE, no patch.

Sungjae Lee

May 30, 2026

Google / Gemini News & Releases Model & API Releases

Street View-based Genie 3 is already in production use at Waymo

Genie 3 generates interactive worlds from real Street View geometry. Waymo is already using it for rare-event training.

Sungjae Lee

May 30, 2026

Google / Gemini News & Releases Model & API Releases

An AI that guides blind runners with no cloud and no tether

DeepMind's chest-mounted AI system lets blind runners navigate independently using dual-path on-device inference—no cloud, no tether.

Sungjae Lee

May 30, 2026

Claude / Anthropic News & Releases Dev Tools & SDK Changelogs

Why an Anthropic SDK release broke PyPI distribution

Two rapid patches followed Anthropic's 0.105.0 drop. Here's what broke, why, and which version to pin.

Sungjae Lee

May 29, 2026

Claude / Anthropic MCP News & Releases Dev Tools & SDK Changelogs

Claude Code's MCP credential leak has been patched

Seven builds in one week: four Bash/PowerShell sandbox bugs patched, /code-review --fix lands auto-apply, and a serious MCP auth credential leak is closed.

Sungjae Lee

May 29, 2026

Google / Gemini News & Releases Research & Benchmarks

AlphaEvolve and Co-Scientist: do they work as announced?

Three experimental AI research tools launched at I/O 2026. What Literature Insights, Co-Scientist, and AlphaEvolve each actually do.

Sungjae Lee

May 29, 2026

Google / Gemini News & Releases Dev Tools & SDK Changelogs

Google Workspace Live: the feature access order is set

Docs Live, Gmail Live, Gemini Spark, Sheets one-shot: I/O 2026 Workspace features and who gets access first.

Sungjae Lee

May 29, 2026

Claude / Anthropic News & Releases Dev Tools & SDK Changelogs

Anthropic SDK output attribution: what actually changes in your code?

v0.105.0 adds granular output-type attribution and configurable upload caps—here's what they do and when to use them.

Sungjae Lee

May 29, 2026

vLLM / Ollama News & Releases Dev Tools & SDK Changelogs

vLLM's latest is v0.21.0; the port bug is still unresolved

v0.22.0 doesn't exist yet. v0.21.0 ships KV offload, spec decode, and a multi-server port bug still under review.

Sungjae Lee

May 29, 2026

Claude / Anthropic News & Releases Dev Tools & SDK Changelogs

Claude Code closed four security holes in nine days

Ten patches in nine days: pinned sessions, four security fixes, /code-review --fix, and skill-level tool gating.

Sungjae Lee

May 29, 2026

News & Releases Model & API Releases

SuperGrok subscribers can now use grok-build-0.1 without an API key

SuperGrok and X Premium+ subscribers can now authenticate into Kilo Code and run grok-build-0.1 inside VS Code or JetBrains — no API key management required.

Sungjae Lee

May 29, 2026

OpenAI / Codex MCP News & Releases Dev Tools & SDK Changelogs

Codex CLI 0.134.0 & 0.135.0: two stable releases in 48 hours

OpenAI shipped two Codex CLI stable releases in 48 hours. What changed, what broke, and why the cadence matters.

Sungjae Lee

May 29, 2026

Claude / Anthropic News & Releases Dev Tools & SDK Changelogs

Anthropic Python SDK 0.105: Opus 4.8 and mid-session system prompts

Three SDK releases in 7.5 hours ship claude-opus-4-8 support, mid-conversation system blocks, and finer output usage reporting.

Sungjae Lee

May 29, 2026

News & Releases Model & API Releases

xAI grok-build-0.1 API public beta: token costs and SDK support

xAI's coding model exits the $299 CLI gate. Here's what the public API beta actually offers developers.

Sungjae Lee

May 29, 2026

News & Releases Dev Tools & SDK Changelogs

Grok Build lands in OpenCode and Kilo Code: xAI's 13-day rollout

xAI shipped grok-build-0.1 to three developer tools in 13 days. Here's what each integration covers and how to pick the right surface.

Sungjae Lee

May 29, 2026

OpenAI / Codex News & Releases Dev Tools & SDK Changelogs

Codex CLI gets a Doctor command, and the TUI and Vim mode change too

OpenAI's 0.135.0 stable is a diagnostics and polish cycle. What moved in the TUI, Vim mode, and remote transport.

Sungjae Lee

May 28, 2026

News & Releases Model & API Releases

Command A+: why enterprises may choose it despite the benchmark gaps

Cohere's first open-weight frontier model: benchmark gaps, native citation design, and the enterprise sovereignty case.

Sungjae Lee

May 28, 2026

News & Releases Model & API Releases

grok-build-0.1: what the caching incident revealed about the real spec

xAI's grok-build-0.1 hit public beta in May 2026. Here's what the spec says — and what the caching incident revealed.

Sungjae Lee

May 28, 2026

Claude / Anthropic News & Releases Model & API Releases

Claude at 69.2% on SWE-Bench Pro: does agentic coding change?

Anthropic ships Opus 4.8 with 69.2% SWE-Bench Pro, mid-conversation system messages, and adaptive thinking.

Sungjae Lee

May 28, 2026

OpenAI / Codex MCP News & Releases Dev Tools & SDK Changelogs

Codex CLI alpha: the 529 files behind the release-notes error

Two alpha releases in three hours, 529 files changed. Here's what the diff says when the release notes page errors.

Sungjae Lee

May 28, 2026

Claude / Anthropic News & Releases Funding, Strategy & Policy

An exclusive xAI Colossus deal raised Claude's rate limits immediately

Anthropic buys exclusive access to xAI's Colossus 1 cluster: 220K GPUs, $1.25B/month, and immediate Claude rate limit increases.

Sungjae Lee

May 28, 2026

OpenAI / Codex News & Releases Funding, Strategy & Policy

OpenAI Codex: how deployment on Dell works without the cloud

OpenAI named Dell as its first non-hyperscaler Codex deployment path. Here's how the architecture actually works and who it targets.

Sungjae Lee

May 28, 2026

MCP vLLM / Ollama News & Releases Protocols & Ecosystem

Starlette BadHost bypasses authentication on unproxied AI agents

Starlette BadHost (CVE-2026-48710): a crafted Host header bypasses auth middleware. Unproxied AI agents at highest risk.

Sungjae Lee

May 28, 2026

News & Releases Protocols & Ecosystem

Netflix's GenAI studio: revealed by job postings, not a press release

Netflix's AI animation studio emerged from job listings, not PR. Here's what the hiring data reveals about the pipeline architecture.

Sungjae Lee

May 28, 2026

Google / Gemini News & Releases Protocols & Ecosystem

Google AI Mode hits 1 billion users: what changes in developer code?

AI Mode crossed 1B users at I/O 2026. Queries are 3× longer, background agents go live this summer. Here's what structurally changed.

Sungjae Lee

May 28, 2026

News & Releases Funding, Strategy & Policy

Illinois AI bill: obligations change at $500M in revenue

SB 315 passed 110-0. Who the $500M threshold covers, what five obligations apply, and when enforcement starts.

Sungjae Lee

May 28, 2026

Google / Gemini News & Releases Model & API Releases

Gemini 3.5 Flash is GA, and thinking_level breaks existing code

Gemini 3.5 Flash is GA: 1M-token context, a breaking thinking_level change, and full pricing breakdown.

Sungjae Lee

May 28, 2026

OpenAI / Codex News & Releases Dev Tools & SDK Changelogs

openai-codex re-patched within four hours: how to read SDK maturity

Two beta releases in under four hours. Here's what the b1→b2 patch cadence tells developers about SDK maturity and what to pin.

Sungjae Lee

May 28, 2026
