← All digests
📡

AI Developer Digest

Tue, May 5, 20267 items · 20 scanned · 13 excluded

Model Releases

Nothing today — no new model GA announcements within the last 24 hours. (GPT-5.5: April 24; Claude Opus 4.7: April 16; Mistral Medium 3.5: May 2 — all outside window.)


API & SDK Changes

Gemini API File Search is Now Multimodal

Source: Google Developers Blog | Date: 2026-05-05 | Link: https://blog.google/innovation-and-ai/technology/developers-tools/expanded-gemini-api-file-search-multimodal-rag/ TL;DR: The Gemini API's File Search tool now indexes and retrieves images alongside text, powered by Gemini Embedding 2, with custom metadata filters and page-level citations for RAG pipelines. Dev signal: If you're building RAG on Gemini, you can now store charts, product images, and diagrams in the same File Search corpus as your text docs and query across all of them in a single API call. Page citations are returned for grounding. Update your integration and check the new metadata_filters param. Primary source: https://ai.google.dev/gemini-api/docs/changelog


Anthropic Python SDK v0.98.0 + v0.99.0

Source: anthropics/anthropic-sdk-python GitHub | Date: 2026-05-04 (v0.98.0), 2026-05-05 (v0.99.0) | Link: https://github.com/anthropics/anthropic-sdk-python/releases TL;DR: v0.98.0 adds Workload Identity Federation, interactive OAuth, and named auth profiles to the client; v0.99.0 adds OIDC federation token exchange scoped to a specific workspace. Dev signal: Enterprise deployments using service accounts or SSO can now authenticate without static API keys — pass auth_profile="my-profile" or use the new workload_identity_federation client option. The streaming bug where stop_details was missing from accumulated Message objects (issue #1725) is also fixed. Run pip install anthropic==0.99.0. Primary source: https://github.com/anthropics/anthropic-sdk-python/compare/v0.97.0...v0.99.0


Claude Agent SDK Python v0.1.73 — Eager Session Store Flushing

Source: anthropics/claude-agent-sdk-python GitHub | Date: 2026-05-04 | Link: https://github.com/anthropics/claude-agent-sdk-python/releases/tag/v0.1.73 TL;DR: New session_store_flush option ("batched" | "eager") on ClaudeAgentOptions delivers session frames to SessionStore.append() in near-real-time instead of only at end-of-turn. Dev signal: eager mode enables three previously difficult patterns: (1) live-tailing UIs that stream agent progress, (2) cross-process agent resume from another worker, (3) crash-durability — agent state is already persisted before a crash. Set session_store_flush="eager" if you're building any of these. Default remains "batched". Run pip install claude-agent-sdk==0.1.73. Primary source: https://github.com/anthropics/claude-agent-sdk-python/releases/tag/v0.1.73


LiteLLM v1.83.14-stable — GPT-5.5, Memory API, LLM-as-Judge Guardrail

Source: BerriAI/litellm GitHub | Date: 2026-05-04 | Link: https://github.com/BerriAI/litellm/releases/tag/v1.83.14-stable TL;DR: Adds GPT-5.5 and GPT-5.5 Pro support, GLM-5 and Minimax M2.5 on Bedrock, new /v1/memory CRUD endpoints, an LLM-as-a-Judge guardrail option, and route_all_chat_openai_to_responses global flag. Dev signal: If you're using LiteLLM proxy for unified access: (1) gpt-5.5 and gpt-5.5-pro model strings are now valid; (2) the new /v1/memory endpoints let you persist and retrieve structured memory across sessions through the proxy; (3) the LLM-as-a-Judge guardrail lets you define a judge prompt that runs on every completion — useful for policy enforcement without a custom layer. Run pip install litellm==1.83.14. Primary source: https://github.com/BerriAI/litellm/releases/tag/v1.83.14-stable


Research Papers

Nothing today — no papers with verified May 5 submission dates, strong benchmark numbers, and code repos surfaced from cs.CL/cs.AI/cs.LG scans.


Tooling Updates

vLLM v0.20.1 — DeepSeek V4 Stability Patch

Source: vllm-project/vllm GitHub | Date: 2026-05-04 | Link: https://github.com/vllm-project/vllm/releases/tag/v0.20.1 TL;DR: Patch release fixing CUDA graph memory capture, BailingMoE linear layer bugs, and adding multi-stream pre-attention GEMM plus BF16/MXFP8 all-to-all support for DeepSeek V4. Dev signal: If you're serving DeepSeek V4 on vLLM, upgrade — the CUDA graph memory capture bug caused silent OOM failures in multi-GPU setups. The new tile kernels for "optimized head computation" also improve throughput on H100 configurations. Run pip install vllm==0.20.1. Primary source: https://github.com/vllm-project/vllm/releases/tag/v0.20.1


Ollama v0.23.1-rc0 — Gemma 4 MTP Speculative Decoding (>2x speed on coding)

Source: ollama/ollama GitHub | Date: 2026-05-05 | Link: https://github.com/ollama/ollama/releases/tag/v0.23.1-rc0 TL;DR: Release candidate adds Gemma 4 MTP (multi-token prediction) speculative decoding claiming over 2x throughput increase for Gemma 4 31B on coding tasks, plus MLX threading fixes and Go 1.26 bump. Dev signal: If you're running Gemma 4 31B locally for code tasks, this RC is worth testing — 2x tokens/second is a significant gain for interactive use. Note it's a release candidate; wait for v0.23.1 stable for production. MLX threading fix also resolves an Apple Silicon hang under concurrent requests. Primary source: https://github.com/ollama/ollama/releases/tag/v0.23.1-rc0


llama.cpp b9028–b9033 — KV Rotation Fast Path, Memory Savings, Lazy Backend Loading

Source: ggml-org/llama.cpp GitHub | Date: 2026-05-05 | Link: https://github.com/ggml-org/llama.cpp/releases TL;DR: Five builds today: fast Walsh-Hadamard transform for KV rotation (perf), device buffer memory saving option, backend lazy-loading (only initializes GPU/CPU backends on demand), and a new /models?reload=1 server API endpoint. Dev signal: Three things worth acting on: (1) --memory-save-device-buffers flag reduces VRAM overhead for multi-model servers; (2) /models?reload=1 lets you hot-reload model configs without restarting the server process; (3) the Walsh-Hadamard KV rotation is a drop-in perf improvement for models that use RoPE variants — no config change needed, just pull latest build. Primary source: https://github.com/ggml-org/llama.cpp/releases


Technical Discussions

Nothing today passing the quality gate. The Anthropic CEO cybersecurity remarks (Dario Amodei warning of "tens of thousands of vulnerabilities" found by Opus 4.7) are newsworthy but scored below 3 — no code, no API link, no reproducible data.


Quality gate excluded 13+ items: 4 strong releases outside the 24h window (Claude Opus 4.7, Managed Agents, GPT-5.5, Mistral Medium 3.5), Devstral 2 (Dec 2025), Ollama v0.23.0 (May 3), arXiv papers without confirmed today-dates and benchmark links, plus several opinion/roundup pieces. Light day on papers — honest digest beats a padded one.

← All digestspersonal/digests/ai-2026-05-05.md