LLM Status: From Side Project to Open-Source Infrastructure & 2026 Roadmap
LLM Status exists because we were tired of guessing whether a slow AI feature was a bug in our code or a provider outage. It started with a cron job. It's now an open-source platform tracking 40+ providers. This post tells that story and lays out where we're going.
The Origin
In 2024, the SoxAI team was running dozens of API calls to various AI providers every day while developing our gateway. When a call failed or took unusually long, the debugging process was frustrating:
- Check our own logs — looks fine on our end
- Check the provider's status page — shows "All Systems Operational" (it almost always does)
- Check Twitter / X — find a thread from 45 minutes ago saying "anyone else seeing Anthropic issues?"
- Wait another 30 minutes to confirm it's a provider issue, not ours
We needed actual data, not a status page that was manually updated (when updated at all). The first version of LLM Status was literally:
```bash
#!/bin/bash
# check_anthropic.sh — ran via cron every 5 minutes
START=$(date +%s%N)
RESPONSE=$(curl -s -o /dev/null -w "%{http_code}" \
  -H "x-api-key: $ANTHROPIC_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-3-haiku-20240307","max_tokens":5,"messages":[{"role":"user","content":"hi"}]}' \
  https://api.anthropic.com/v1/messages)
END=$(date +%s%N)
DURATION=$(( (END - START) / 1000000 ))
echo "$(date -u) anthropic $RESPONSE ${DURATION}ms" >> /var/log/llm-probe.log
```
That log file grew for two weeks before we realised we'd quietly built something useful.
Phase 1 — Making It Visible
We added a simple Next.js frontend that read from the log file and showed a coloured grid of provider health. Shipped it internally at http://10.0.0.5:3000. Within a day the entire engineering team had it bookmarked.
A few weeks later we open-sourced it. A Hacker News comment from someone at a fintech company read: "This is the thing I've been wanting to build for months, just open-sourced it."
That response made us take it seriously as a project rather than a tool.
Phase 2 — Real Architecture
The shell script worked fine for 2 providers. At 10 providers it was becoming unmanageable. At 40 providers we needed proper infrastructure.
The main architectural decisions, made over Q2-Q3 2024:
TimescaleDB over plain Postgres. We initially used Postgres with a probe_results table. As data grew, queries for "last 30 days of hourly uptime" became slow. TimescaleDB's continuous aggregates solved this without requiring us to run a different database system — it's a Postgres extension.
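For illustration, here is roughly what that rollup looks like; a minimal sketch assuming a hypothetical probe_results hypertable with provider, success, and probed_at columns (not our actual schema), executed from Go:

```go
// Minimal sketch: create an hourly uptime continuous aggregate over a
// hypothetical probe_results hypertable. View and column names are
// illustrative, not the production schema.
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // Postgres driver; TimescaleDB is a Postgres extension
)

func main() {
	db, err := sql.Open("postgres", "postgres://localhost/llmstatus?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// The continuous aggregate maintains the hourly rollup incrementally,
	// so "last 30 days of hourly uptime" becomes a cheap read instead of a
	// full scan over raw probe rows.
	_, err = db.Exec(`
		CREATE MATERIALIZED VIEW hourly_uptime
		WITH (timescaledb.continuous) AS
		SELECT
			time_bucket('1 hour', probed_at) AS bucket,
			provider,
			avg(CASE WHEN success THEN 1.0 ELSE 0.0 END) AS uptime
		FROM probe_results
		GROUP BY bucket, provider`)
	if err != nil {
		log.Fatal(err)
	}
}
```

Once the aggregate exists, the dashboard's uptime queries read from hourly_uptime rather than touching raw probe rows.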
Go for probers. Our API server is Rust but the prober workers are Go. Go's goroutine model made it trivial to run 50 concurrent probes with independent timeouts. We tried implementing the prober in Rust but async timeout handling across provider SDK calls was more complex than justified.
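A minimal sketch of that fan-out pattern, not taken from the actual prober (the provider list and probe body are placeholders):

```go
// Sketch of the fan-out: each provider gets its own goroutine and its own
// timeout, so one slow or hung provider never delays the others.
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

func probe(ctx context.Context, provider string) error {
	// Placeholder for the real HTTP probe against the provider's API.
	select {
	case <-time.After(500 * time.Millisecond):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	providers := []string{"anthropic", "openai", "cohere"} // illustrative subset

	var wg sync.WaitGroup
	for _, p := range providers {
		wg.Add(1)
		go func(p string) {
			defer wg.Done()
			// Independent deadline per probe: a stalled connection to one
			// provider only cancels that provider's request.
			ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
			defer cancel()
			if err := probe(ctx, p); err != nil {
				fmt.Printf("%s: probe failed: %v\n", p, err)
				return
			}
			fmt.Printf("%s: ok\n", p)
		}(p)
	}
	wg.Wait()
}
```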
Jittered probe scheduling. We learned early that if all probers fire exactly on the minute, Anthropic (and others) see 20 simultaneous requests from 20 different IPs and flag us as suspicious. Adding ±15s random jitter to each probe removed this problem entirely.
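The scheduling change itself is tiny; a sketch of the idea (the interval and jitter values mirror the ones above, the rest is illustrative):

```go
// Sketch: instead of firing exactly on the minute, each prober sleeps its
// base interval plus a random offset in [-15s, +15s) between probes, so the
// fleet's requests don't all land on a provider at the same instant.
package main

import (
	"fmt"
	"math/rand"
	"time"
)

func main() {
	base := time.Minute
	jitter := 15 * time.Second

	for {
		// Random offset in [-jitter, +jitter).
		offset := time.Duration(rand.Int63n(int64(2*jitter))) - jitter
		time.Sleep(base + offset)
		fmt.Println("probing at", time.Now().UTC().Format(time.RFC3339))
		// ... fire this prober's probes here ...
	}
}
```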
The False Alert Problem
Early on, we had frequent false alerts — probes would fail for one or two cycles, trigger an incident notification, then immediately recover. Users were annoyed.
The fix was requiring sustained evidence before changing state. A provider is only marked "Degraded" if ≥30% of probes fail across a rolling 2-minute window. "Outage" requires ≥70% failure across 5 minutes. Single bad probes are recorded but never trigger notifications.
We also added a regional check: if only one of our 20 probe regions sees failures, we surface it as a "Partial Degradation" rather than a full outage — because it might be a routing issue affecting only one geography.
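Here is one way those two rules might compose; a simplified sketch with the rolling-window bookkeeping left out (the thresholds are the ones above, everything else is illustrative):

```go
// Sketch of the state rules: sustained failure rates over rolling windows
// decide Degraded/Outage, and failures confined to a single probe region are
// reported as Partial Degradation instead.
package main

import "fmt"

type Window struct {
	Failed, Total  int // probes in the rolling window
	FailingRegions int // distinct regions reporting failures
}

func classify(w2min, w5min Window) string {
	rate := func(w Window) float64 {
		if w.Total == 0 {
			return 0
		}
		return float64(w.Failed) / float64(w.Total)
	}

	switch {
	case rate(w5min) >= 0.70 && w5min.FailingRegions > 1:
		return "Outage"
	case rate(w2min) >= 0.30 && w2min.FailingRegions > 1:
		return "Degraded"
	case rate(w2min) >= 0.30 && w2min.FailingRegions == 1:
		// Only one probe region sees failures: likely a routing issue for
		// that geography, not a provider-wide outage.
		return "Partial Degradation"
	default:
		return "Operational"
	}
}

func main() {
	fmt.Println(classify(
		Window{Failed: 8, Total: 20, FailingRegions: 1},
		Window{Failed: 12, Total: 50, FailingRegions: 1},
	)) // Partial Degradation
}
```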
Phase 3 — Community and Contributions
By early 2025, the GitHub repo had 800+ stars, with contributors from Anthropic, OpenAI, and several AI startups submitting probe definitions and fixing provider-specific parsing.
The most valuable contributions were provider-specific edge cases that we hadn't hit internally:
- Google Vertex behaves differently when the project quota is exhausted vs. when the model is rate-limited — different error codes, different retry strategies
- Cohere's streaming API sometimes sends SSE events out of order under load
- Some AWS Bedrock models return a 200 with an error JSON body rather than a proper HTTP error code
Each of these became a test case in the probe validation suite.
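The Bedrock case is a good example of why probes can't trust the status code alone; a sketch of the kind of check those contributed test cases exercise (the error field name is illustrative, not a documented contract):

```go
// Sketch: some Bedrock models return HTTP 200 with an error object in the
// body, so a probe has to inspect the payload as well as the status code.
package main

import (
	"encoding/json"
	"fmt"
)

type probeResult struct {
	Success bool
	Reason  string
}

func evaluate(statusCode int, body []byte) probeResult {
	if statusCode != 200 {
		return probeResult{false, fmt.Sprintf("http %d", statusCode)}
	}
	var payload struct {
		Error *struct {
			Message string `json:"message"`
		} `json:"error"`
	}
	if err := json.Unmarshal(body, &payload); err != nil {
		return probeResult{false, "unparseable body"}
	}
	if payload.Error != nil {
		// A 200 that carries an error body still counts as a failed probe.
		return probeResult{false, payload.Error.Message}
	}
	return probeResult{true, "ok"}
}

func main() {
	fmt.Println(evaluate(200, []byte(`{"error":{"message":"model timed out"}}`)))
	fmt.Println(evaluate(200, []byte(`{"output":"hi"}`)))
}
```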
Governance
With outside contributors came the need for clear governance. We adopted the CNCF contributor ladder model (contributor → committer → maintainer) and added a CODEOWNERS file. Major architectural decisions go through a lightweight RFC process (GitHub issue with a 2-week comment period before merging).
2026 Roadmap
Q1 2026: Latency Benchmarks Dashboard (Shipped)
We shipped this in March 2026. The new benchmarks view shows p50/p95/p99 latency for every model over the last 30 days, normalized by output token count. This makes it possible to compare providers fairly: a model that generates twice as many tokens will naturally take longer end to end, even though its per-token latency (and its cost per 1,000 tokens) might still be better.
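A simplified sketch of the normalisation step, assuming each probe records total latency and output token count (the percentile method here is plain nearest-rank, not necessarily what the dashboard uses):

```go
// Sketch: normalise each probe's latency by its output token count so a
// verbose model isn't penalised for generating more text, then take
// percentiles over the normalised values.
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

type sample struct {
	Latency      time.Duration
	OutputTokens int
}

// perTokenLatencies converts raw samples into milliseconds per output token.
func perTokenLatencies(samples []sample) []float64 {
	out := make([]float64, 0, len(samples))
	for _, s := range samples {
		if s.OutputTokens == 0 {
			continue
		}
		out = append(out, float64(s.Latency.Milliseconds())/float64(s.OutputTokens))
	}
	return out
}

// percentile returns the p-th percentile (0-100) using nearest-rank.
func percentile(values []float64, p float64) float64 {
	sort.Float64s(values)
	idx := int(math.Ceil(p/100*float64(len(values)))) - 1
	if idx < 0 {
		idx = 0
	}
	return values[idx]
}

func main() {
	samples := []sample{
		{900 * time.Millisecond, 120},
		{2100 * time.Millisecond, 480},
		{1500 * time.Millisecond, 200},
	}
	v := perTokenLatencies(samples)
	fmt.Printf("p50: %.2f ms/token\n", percentile(v, 50))
}
```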
The benchmark data feeds into SoxAI's channel health scoring, creating a direct integration between the two projects.
Q2 2026: Provider-Announced Maintenance Windows
Providers occasionally post maintenance windows in advance. Right now we treat those windows the same as outages — probes fail, incidents open. We're building a maintenance calendar that:
- Ingests provider status page RSS feeds and official status APIs (where they exist)
- Automatically suppresses incident notifications during announced windows
- Shows historical maintenance frequency in the dashboard
This is a meaningful quality-of-life improvement for teams that subscribe to our alert webhooks.
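A rough sketch of the suppression check, assuming maintenance windows have already been parsed out of a provider's status feed (the types here are illustrative):

```go
// Sketch: before opening an incident, check whether the failing probe falls
// inside an announced maintenance window for that provider. Window data is
// assumed to have been ingested from the provider's status feed already.
package main

import (
	"fmt"
	"time"
)

type maintenanceWindow struct {
	Provider   string
	Start, End time.Time
}

func inMaintenance(provider string, at time.Time, windows []maintenanceWindow) bool {
	for _, w := range windows {
		if w.Provider == provider && !at.Before(w.Start) && at.Before(w.End) {
			return true
		}
	}
	return false
}

func main() {
	windows := []maintenanceWindow{
		{"exampleai", time.Now().Add(-30 * time.Minute), time.Now().Add(30 * time.Minute)},
	}
	if inMaintenance("exampleai", time.Now(), windows) {
		fmt.Println("probe failure recorded, incident notification suppressed")
	} else {
		fmt.Println("opening incident")
	}
}
```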
Q2 2026: Custom Probe API (Beta)
We want organisations to be able to run custom probes against their own models or fine-tunes, using the same infrastructure we use for public providers. The API will accept a probe definition (YAML, same format as our built-in probes), run it from all 20 probe regions, and return historical data via the standard REST API.
Use cases:
- Monitor a fine-tuned model hosted on Azure OpenAI
- Track latency of a self-hosted Ollama deployment from multiple regions
- Run domain-specific semantic checks ("does my customer-service model still follow the refusal policy?")
Private beta in Q2, GA in Q3.
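The definition format isn't final, so purely as an illustration, a custom probe might deserialise into something like this (every field name here is hypothetical, not the published schema):

```go
// Hypothetical shape of a custom probe definition; the real YAML schema for
// the Custom Probe API hasn't been published, so every field here is an
// illustration rather than the actual format.
package main

import (
	"fmt"

	"gopkg.in/yaml.v3"
)

type customProbe struct {
	Name     string   `yaml:"name"`
	Endpoint string   `yaml:"endpoint"`
	Interval string   `yaml:"interval"` // e.g. "5m", parsed with time.ParseDuration later
	Timeout  string   `yaml:"timeout"`
	Regions  []string `yaml:"regions"` // subset of the 20 probe regions
}

func main() {
	doc := []byte(`
name: my-finetune-healthcheck
endpoint: https://example.openai.azure.com/openai/deployments/my-finetune
interval: 5m
timeout: 30s
regions: [us-east, eu-west]
`)
	var p customProbe
	if err := yaml.Unmarshal(doc, &p); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", p)
}
```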
Q3 2026: Alert Routing & Integrations
Currently LLM Status sends alerts via webhook and email. We're building native integrations for:
- PagerDuty — route alerts to on-call rotation based on which model is affected
- Slack — post rich incident notifications with latency graphs inline
- GitHub Actions — a llmstatus-check action that gates CI deployments on provider health
- Prometheus exporter — pull metrics directly into your existing Grafana stack
The GitHub Actions integration is the one we're most excited about. If your CI deploys a feature that depends on a specific AI model, and that model is currently in an outage, the workflow can automatically delay the deployment rather than deploying to a broken environment.
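The action itself isn't released yet, but the underlying check is simple; a sketch of what such a gate could do, using a hypothetical status endpoint and response shape (the real API may differ):

```go
// Sketch of a CI gate: query provider health and exit non-zero if the
// provider is in an outage, so the workflow step fails and the deploy is held.
// The endpoint path and response shape here are hypothetical.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

func main() {
	provider := os.Getenv("LLMSTATUS_PROVIDER") // e.g. "anthropic"

	resp, err := http.Get("https://llmstatus.io/api/v1/providers/" + provider + "/status")
	if err != nil {
		fmt.Fprintln(os.Stderr, "status check failed:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	var body struct {
		State string `json:"state"` // e.g. "operational", "degraded", "outage"
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		fmt.Fprintln(os.Stderr, "unexpected response:", err)
		os.Exit(1)
	}

	if body.State == "outage" {
		fmt.Fprintf(os.Stderr, "%s is in an outage; holding deployment\n", provider)
		os.Exit(1)
	}
	fmt.Printf("%s state: %s, proceeding\n", provider, body.State)
}
```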
Q4 2026: Federated Monitoring
LLM Status currently runs a centralised probe fleet. Some users — particularly those in data-sensitive industries — want probes to originate from inside their own network, using their own credentials, with results that never leave their environment.
We're building a federated prober: a lightweight Docker container that runs the standard probe suite against any provider, but reports results only to a self-hosted LLM Status instance. The public llmstatus.io dashboard will optionally aggregate anonymised data from federated probers to improve global coverage.
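In rough terms, the federated prober runs the same probe suite but points its result sink at an address you control; a sketch of the reporting path (the environment variable and endpoint are illustrative, not the shipped interface):

```go
// Sketch of federated reporting: probe results are posted only to a
// self-hosted LLM Status instance inside your own network.
package main

import (
	"bytes"
	"encoding/json"
	"log"
	"net/http"
	"os"
	"time"
)

type result struct {
	Provider  string    `json:"provider"`
	Success   bool      `json:"success"`
	LatencyMS int64     `json:"latency_ms"`
	ProbedAt  time.Time `json:"probed_at"`
}

func report(r result) error {
	// Results never leave the environment: the sink is whatever internal
	// address the self-hosted instance is reachable at.
	sink := os.Getenv("LLMSTATUS_REPORT_URL") // e.g. http://llmstatus.internal:8080/api/v1/results
	payload, err := json.Marshal(r)
	if err != nil {
		return err
	}
	resp, err := http.Post(sink, "application/json", bytes.NewReader(payload))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return nil
}

func main() {
	r := result{Provider: "anthropic", Success: true, LatencyMS: 840, ProbedAt: time.Now().UTC()}
	if err := report(r); err != nil {
		log.Fatal(err)
	}
}
```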
What Won't Change
LLM Status will remain:
- Open source, MIT licensed
- Free to use at llmstatus.io for public provider monitoring
- Self-hostable with first-class documentation and Helm charts
- Privacy-respecting: we don't log probe request contents, only timing and success/failure
The commercial version (custom probes, team management, advanced alerting) will always be optional and will never be required for the core functionality.
Read the architecture deep dive: LLM Status Architecture: Monitoring 40+ AI Providers in Real Time
Source code: github.com/llmstatus/llmstatus