AI Developer Digest
Model Releases
Mistral Medium 3.5 — 128B Dense, 256k Context, Open Weights
Source: Mistral AI | Date: April 29, 2026 (HN thread April 30) | Link: https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5
TL;DR: Mistral releases a 128B dense open-weight model with a 256k context window, 77.6% on SWE-Bench Verified, configurable per-request reasoning effort, and an EAGLE speculative decoding variant.
Dev signal: API pricing $1.50/1M input, $7.50/1M output. Open weights on HuggingFace (huggingface.co/mistralai/Mistral-Medium-3.5-128B) under modified MIT — self-hostable on 4× GPUs. A separate Mistral-Medium-3.5-128B-EAGLE variant enables speculative inference for lower latency. Reasoning effort is configurable per request (quick chat vs. full agentic runs from the same weights). Competes with Gemini 3.1 Pro Preview (78.8% SWE-Bench) at a fraction of the API cost.
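To make the pricing concrete, here is a minimal cost sketch in Python. It uses only the two per-token prices quoted above; the function name and the example token counts are hypothetical, and it ignores any caching, batching, or tiered discounts the API may or may not offer.

```python
from decimal import Decimal

# Prices quoted in this digest entry, in USD per 1M tokens.
INPUT_USD_PER_MTOK = Decimal("1.50")
OUTPUT_USD_PER_MTOK = Decimal("7.50")


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the API cost in USD for a given token count (hypothetical helper)."""
    million = Decimal(1_000_000)
    total = (Decimal(input_tokens) * INPUT_USD_PER_MTOK
             + Decimal(output_tokens) * OUTPUT_USD_PER_MTOK)
    return total / million


# Illustrative workload (made-up numbers): 200k tokens in, 20k tokens out.
print(estimate_cost_usd(200_000, 20_000))
```

For that illustrative workload the input side costs $0.30 and the output side $0.15, so output-heavy agentic runs are where the 5x output price matters most.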
Primary source: https://huggingface.co/mistralai/Mistral-Medium-3.5-128B | https://news.ycombinator.com/item?id=47949642
API & SDK Changes
ACTION REQUIRED: gemini-robotics-er-1.5-preview Shut Down Today
Source: Google AI for Developers | Date: April 30, 2026, 9AM PST | Link: https://ai.google.dev/gemini-api/docs/deprecations
TL;DR: The gemini-robotics-er-1.5-preview model endpoint was decommissioned on April 30 at 9AM PST; all calls to it now return errors.
Dev signal: Migrate all code referencing gemini-robotics-er-1.5-preview to gemini-robotics-er-1.6-preview, which adds instrument reading and improves spatial and physical reasoning. Grep your configs and environment variables now.
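If a plain grep is awkward in your setup, the same check can be sketched in Python. This is a rough helper, not an official migration tool: the function names and the list of file suffixes are assumptions, and only the two model IDs come from the deprecation notice.

```python
import os
from pathlib import Path

OLD_MODEL = "gemini-robotics-er-1.5-preview"
NEW_MODEL = "gemini-robotics-er-1.6-preview"

# Assumed set of file types worth scanning; adjust for your repo.
CONFIG_SUFFIXES = (".py", ".env", ".json", ".yaml", ".yml", ".toml")


def find_in_files(root, needle=OLD_MODEL):
    """Return (path, line_number) pairs for every line containing the needle."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Match on the file name so dotfiles such as ".env" are included.
        if not path.is_file() or not path.name.endswith(CONFIG_SUFFIXES):
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip rather than abort the scan
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), lineno))
    return hits


def find_in_env(environ=None, needle=OLD_MODEL):
    """Return the names of environment variables whose value contains the needle."""
    environ = os.environ if environ is None else environ
    return sorted(name for name, value in environ.items() if needle in value)


def report(root="."):
    """Build human-readable lines listing every place that still needs migrating."""
    lines = [f"{path}:{lineno}: replace {OLD_MODEL} with {NEW_MODEL}"
             for path, lineno in find_in_files(root)]
    lines += [f"env {name}: replace {OLD_MODEL} with {NEW_MODEL}"
              for name in find_in_env()]
    return lines
```

Calling `print("\n".join(report(".")))` from the repository root lists each file line and environment variable that still names the old model. It only finds literal occurrences, so IDs assembled at runtime from fragments will be missed.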
Primary source: https://ai.google.dev/gemini-api/docs/deprecations
Research Papers
Nothing today that cleared the quality bar (no May 1 submissions from recognized labs with code + benchmarks confirmed).
Tooling Updates
llama.cpp — 32-bit WASM >2GB Models + Vulkan Mixed-Quant Flash Attention + Fast Int MatMul
Source: ggml-org/llama.cpp | Date: April 30–May 1, 2026 (b8984, b8992, b8995) | Link: https://github.com/ggml-org/llama.cpp/releases
TL;DR: Three back-to-back releases shipped: 32-bit WebAssembly now supports models larger than 2GB; Vulkan coopmat2 path gains asymmetric flash attention with mixed quantization types (incl. Q1_0); fast matrix multiplication landed for integer quantization types.
Breakdown:
- b8992 (Apr 30, 23:33 UTC): Switches memory mapping to ftello/fseeko, removing the 2GB model ceiling in 32-bit WASM. Previously, 32-bit WebAssembly environments couldn't address model files >2GB due to 32-bit file offset limits. PR #22497.
- b8995 (May 1, 16:10 UTC): Vulkan coopmat2 path now supports asymmetric flash attention with mixed quantization types including Q1_0. Completes the originally intended design of the cm2 FA shader.
- b8984 (Apr 30, 10:35 UTC): Fast matrix multiplication for integer quantization types — no benchmark numbers in release notes, but targets throughput for quantized model inference.
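The 2GB ceiling in the b8992 item is easiest to see as arithmetic. The Python sketch below assumes the limit came from a signed 32-bit file offset, which is what a 2GB cap usually indicates; it illustrates the number range only and is not llama.cpp's actual code. The helper name is hypothetical.

```python
# Largest value a signed 32-bit file offset can represent: 2 GiB minus one byte.
INT32_MAX = 2**31 - 1


def fits_in_32bit_offset(size_bytes: int) -> bool:
    """True if every byte position in a file of this size is addressable
    with a signed 32-bit offset (hypothetical helper for illustration)."""
    return 0 <= size_bytes <= INT32_MAX


GIB = 1024**3
for label, size in [("1 GiB model", 1 * GIB), ("4 GiB model", 4 * GIB)]:
    verdict = "fits" if fits_in_32bit_offset(size) else "needs 64-bit offsets"
    print(f"{label}: {verdict}")
```

Any model file at or above 2 GiB falls outside that range, which matches the ceiling the release describes removing.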
Dev signal: If you serve llama.cpp in a 32-bit WASM environment (e.g., browser or embedded runtime) and were hard-capped at <2GB models, upgrade to b8992+. Vulkan users running coopmat2 with mixed quantization can now use Q1_0 in flash attention; check VRAM headroom. The fast int matmul in b8984 may benefit quantized model throughput but needs your own benchmarking to confirm.
Primary source: https://github.com/ggml-org/llama.cpp/releases/tag/b8992 | https://github.com/ggml-org/llama.cpp/releases/tag/b8995 | https://github.com/ggml-org/llama.cpp/releases/tag/b8984
Technical Discussions
Nothing today that cleared the quality bar.
Quality gate excluded ~32 items. Several significant releases from the past week fell outside the 24-48h window: GPT-5.5 (Apr 24, API pricing $5/$30/1M, Terminal-Bench 82.7%), Claude Opus 4.7 (Apr 16, SWE-bench 87.6%), Gemma 4 (Apr 2, 31B Apache 2.0, AIME 2026 89.2%), Ollama v0.22.0 (Apr 28, Nemotron 3 Omni + Poolside Laguna XS.2). Check prior digests for those.