Research & Benchmarks

Notable AI research papers and benchmark results, distilled for practitioners.

Gemini's ERA Model Is Now Outrunning CDC Disease Forecasts

Google's I/O 2026 AI research suite: literature triage, hypothesis tournaments, and ERA outperforming CDC forecasts.

DiffusionBlocks Cuts Training Memory B× Without Accuracy Loss

DiffusionBlocks trains one residual block per step, reducing activation memory by a factor of B (the number of blocks) with competitive or better accuracy.

What Gemini's Three I/O 2026 Research Tools Actually Do

Three experimental AI research tools launched at I/O 2026. What Literature Insights, Co-Scientist, and AlphaEvolve each actually do.

Do Grok Build's SWE-Bench Claims Actually Hold Up?

xAI shipped its terminal coding agent on May 14, 2026. Here's what the CLI actually does, where the benchmark numbers hold, and what $299/month buys.

A Reasoning Model Just Broke an 80-Year-Old Conjecture

OpenAI's reasoning model disproved an 80-year-old geometry conjecture — verified by a nine-mathematician team including a Fields Medalist.

The Actual Performance of Gemini ERA, Which Outperformed CDC Forecasts

Google's I/O 2026 AI research suite: literature triage, hypothesis tournaments, and ERA outperforming CDC forecasts.

Why Accuracy Holds When Training Just One Block at a Time

DiffusionBlocks trains one residual block per step, reducing activation memory by a factor of B (the number of blocks) with competitive or better accuracy.

AlphaEvolve and Co-Scientist: Do They Work as Announced?

Three experimental AI research tools launched at I/O 2026. What Literature Insights, Co-Scientist, and AlphaEvolve each actually do.