8 posts 2 posts

vLLM / Ollama

Self-hosting and inference with vLLM, Ollama, and local model runners.

4 GitHub stars, voice interviews with Ollama: that's GrillKit

Apache 2.0 interview trainer with Whisper voice input, Ollama or cloud LLM support, and local session history. No SaaS, no registration required.

RDNA3 cuts llama.cpp KV VRAM 47% — and CUDA has no equivalent

RDNA3 bit-packing cuts llama.cpp KV VRAM 47% on RX 7900. Flags, VRAM math, and TurboQuant for 4.9× compression.

llama-bench skipped FA on capable GPUs — b9437 corrects it

llama.cpp b9437 (May 30): -fa goes auto, -ngl to -1 in llama-bench. Your pre-b9437 comparisons need a flag audit.

459 Commits Into vLLM 0.22.0 — What Moves the Needle

459 commits, a dedicated DeepSeek V4 package, Rust frontend, and an rc0 that's one CI fix. What matters and what doesn't.

vLLM v0.21.0 Production Update: KV Offload and Multi-Server Port Bug

v0.22.0 doesn't exist yet. v0.21.0 ships KV offload, spec decode, and a multi-server port bug still under review.

A Crafted Host Header Bypasses Auth in Your AI Agent Stack

Starlette BadHost (CVE-2026-48710): a crafted Host header bypasses auth middleware. Unproxied AI agents at highest risk.

vLLM RC3 Fixes a Hard-Coded 60s Timeout — What to Configure

RC3 patches a hard-coded 60s startup timeout in vLLM's multi-API-server subsystem — here's what changed and what operators must configure.

BadHost's CVSS 6.5 Understates the Real Risk for MCP Servers

CVSS 6.5 misses the mark. Why MCP servers and proxy-less AI agent stacks face disproportionate exposure from BadHost.

vLLM's Latest Is v0.21.0, and the Port Bug Is Still Unresolved

v0.22.0 doesn't exist yet. v0.21.0 ships KV offload, spec decode, and a multi-server port bug still under review.

Starlette BadHost Bypasses Authentication on Proxy-less AI Agents

Starlette BadHost (CVE-2026-48710): a crafted Host header bypasses auth middleware. Unproxied AI agents at highest risk.