vLLM / Ollama

vLLM / Ollama News & Releases Dev Tools & SDK Changelogs

Rust's borrow checker reaches CUDA — and the cost is zero

NVIDIA & Hugging Face's cuTile Rust: borrow-checked CUDA at 96% cuBLAS; Grout decoder matches vLLM at batch-1 on B200.

Sungjae Lee

Jun 25, 2026

vLLM / Ollama News & Releases Dev Tools & SDK Changelogs

cuTile Rust at 96% cuBLAS — Grout's engine wins need context

cuTile Rust carries Rust ownership to CUDA kernels. Grout hits 96% of cuBLAS — with vLLM/SGLang prefix caching off.

Sungjae Lee

Jun 22, 2026

Meta / Llama vLLM / Ollama News & Releases Dev Tools & SDK Changelogs

The parity fix that quietly resets your profiling baseline

llama.cpp b9437: -fa auto added to llama-bench, -ngl default flips to -1. What changes and who's affected.

Sungjae Lee

Jun 12, 2026

vLLM / Ollama News & Releases Dev Tools & SDK Changelogs

v0.22.0: the chips sat idle while the front end was choking

vLLM v0.22.0 ships --api-server-count, a DP Supervisor, and three LB topology modes. Annotated explainer for operators.

Sungjae Lee

Jun 10, 2026

Meta / Llama vLLM / Ollama News & Releases Dev Tools & SDK Changelogs

4 GitHub stars, voice interviews with Ollama: that's GrillKit

Apache 2.0 interview trainer with Whisper voice input, Ollama or cloud LLM support, and local session history. No SaaS, no registration required.

Sungjae Lee

Jun 02, 2026

Meta / Llama vLLM / Ollama News & Releases Dev Tools & SDK Changelogs

RDNA3 cuts llama.cpp KV VRAM 47% — and CUDA has no equivalent

RDNA3 bit-packing cuts llama.cpp KV VRAM 47% on RX 7900. Flags, VRAM math, and TurboQuant for 4.9× compression.

Sungjae Lee

Jun 01, 2026

Meta / Llama vLLM / Ollama Build & Learn Daily How-To

llama-bench skipped FA on capable GPUs — b9437 corrects it

llama.cpp b9437 (May 30): -fa goes auto, -ngl to -1 in llama-bench. Your pre-b9437 comparisons need a flag audit.

Sungjae Lee

May 31, 2026

vLLM / Ollama News & Releases Dev Tools & SDK Changelogs

459 Commits Into vLLM 0.22.0 — What Moves the Needle

459 commits, a dedicated DeepSeek V4 package, Rust frontend, and an rc0 that's one CI fix. What matters and what doesn't.

Sungjae Lee

May 30, 2026

vLLM / Ollama News & Releases Dev Tools & SDK Changelogs

vLLM v0.21.0 Production Update: KV Offload and Multi-Server Port Bug

v0.22.0 doesn't exist yet. v0.21.0 ships KV offload, spec decode, and a multi-server port bug still under review.

Sungjae Lee

May 29, 2026

MCP vLLM / Ollama News & Releases Protocols & Ecosystem

A Crafted Host Header Bypasses Auth in Your AI Agent Stack

Starlette BadHost (CVE-2026-48710): a crafted Host header bypasses auth middleware. Unproxied AI agents at highest risk.

Sungjae Lee

May 28, 2026

vLLM / Ollama News & Releases Dev Tools & SDK Changelogs

vLLM RC3 Fixes a Hard-Coded 60s Timeout — What to Configure

RC3 patches a hard-coded 60s startup timeout in vLLM's multi-API-server subsystem — here's what changed and what operators must configure.

Sungjae Lee

May 28, 2026

MCP vLLM / Ollama News & Releases Protocols & Ecosystem

BadHost's CVSS 6.5 Understates the Real Risk for MCP Servers

CVSS 6.5 misses the mark. Why MCP servers and proxy-less AI agent stacks face disproportionate exposure from BadHost.

Sungjae Lee

May 28, 2026

vLLM / Ollama News & Releases Dev Tools & SDK Changelogs

vLLM's latest release is v0.21.0; the port bug is still unresolved

v0.22.0 doesn't exist yet. v0.21.0 ships KV offload, spec decode, and a multi-server port bug still under review.

Sungjae Lee

May 29, 2026

MCP vLLM / Ollama News & Releases Protocols & Ecosystem

Starlette BadHost bypasses authentication on proxy-less AI agents

Starlette BadHost (CVE-2026-48710): a crafted Host header bypasses auth middleware. Unproxied AI agents at highest risk.

Sungjae Lee

May 28, 2026

Featured posts

Ghostty beats iTerm2 3× — speed isn't the agent bottleneck

World Monitor hit 67k stars — here's what the MCP endpoint

Ghostty is 3× faster than iTerm2, but the bottleneck is not the agent

World Monitor at 67k stars: what about the MCP endpoint?

Codex CLI drops chat-wire, and OpenCodex takes over routing

OpenAI puts Codex in Claude Code: two commands or four?

LongCat-Video-Avatar 1.5 cuts inference to 8 steps: the key points
