The FDA publishes warning letters. The FDA also publishes drug shortages. These two datasets do not reference each other. Nobody at the FDA has stitched them together publicly. When you do — entity-resolving the inspected facility back to the parent corporation and forward to its product portfolio — a leading indicator falls out: 41% of facility-level warning letters precede a Drug Shortage Database entry from that same parent within 12 months.
This is the kind of signal that exists only in cross-product space. Locus identifies the facility geographically. Codex normalizes and resolves entities. Overwatch tracks the maritime imports that compensate when domestic production stalls. None of the three produces this number on its own.
The cascade, step by step
| Step | Layer | What it does |
|---|---|---|
| 1 | Codex ingest | FDA warning letters → APRS records with company string |
| 2 | Codex entity graph | Resolve string → DUNS → parent corporation |
| 3 | Locus facility join | Match facility address to H3 cell + industrial classification |
| 4 | Codex product map | Parent corp → FDA NDC product portfolio |
| 5 | Cross-reference | Drug Shortage DB entries for those NDCs in t+0 → t+12mo |
| 6 | Overwatch overlay | Inbound bulk pharma imports to substitute facilities |
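Step 5 of the cascade can be sketched in a few lines once entity resolution has already attached a parent ID to every record. This is a minimal illustration, not the production join: the record layout, the `parent-*` identifiers, the dates, and the helper names (`add_months`, `precedes_shortage`, `hit_rate`) are all hypothetical, and the sample rows are invented rather than real FDA data.

```python
from datetime import date

# Hypothetical, already-resolved records as (parent_id, date). Not real FDA data.
warning_letters = [
    ("parent-a", date(2021, 3, 1)),
    ("parent-b", date(2021, 6, 15)),
    ("parent-c", date(2022, 1, 10)),
]
shortage_entries = [
    ("parent-a", date(2021, 11, 20)),  # inside parent-a's 12-month window
    ("parent-b", date(2023, 2, 1)),    # outside parent-b's 12-month window
]


def add_months(d, months):
    """Shift a date forward by whole months; the day is clamped to 28 for simplicity."""
    year_offset, month_index = divmod(d.month - 1 + months, 12)
    return date(d.year + year_offset, month_index + 1, min(d.day, 28))


def precedes_shortage(letter, shortages, window_months=12):
    """True if the same parent has a shortage entry within the window after the letter."""
    parent, issued = letter
    deadline = add_months(issued, window_months)
    return any(p == parent and issued <= s <= deadline for p, s in shortages)


def hit_rate(letters, shortages, window_months=12):
    """Share of warning letters followed by a same-parent shortage entry."""
    if not letters:
        return 0.0
    hits = sum(precedes_shortage(letter, shortages, window_months) for letter in letters)
    return hits / len(letters)


rate = hit_rate(warning_letters, shortage_entries)
print(f"{rate:.0%} of sample letters precede a shortage within 12 months")
```

The join key is the resolved parent, not the company string on the letter, which is why the resolution steps have to run before this one.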
Why string resolution is the hard part
FDA warning letters identify a company by whatever name was on the inspection: 'Acme Sterile Pharma, Inc.' on one letter, 'Acme Pharma Manufacturing LLC' on the next, 'Acme Corp.' on the third. Drug Shortage Database entries use yet another spelling. Without entity resolution, most of these joins are missed and the measured rate falls from 41% to roughly 12%.
Codex's resolver chains four passes: deterministic ID match (FEI, DUNS, EIN where available), normalized string match with corporate suffix stripping, address co-occurrence join, and finally an LLM tiebreak running on the local Ollama box for cases the deterministic passes can't resolve. Cost per resolved entity is fractions of a cent; accuracy on labeled holdouts is 94.3%.
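The pass ordering described above can be sketched as a simple fall-through chain. This is an assumption-heavy illustration rather than Codex's actual resolver: the `SUFFIXES` list, the `KNOWN` entity table, the field names, and the `resolve` and `normalize_name` functions are hypothetical, only FEI is checked in the ID pass, and the LLM tiebreak is represented by an optional callable instead of a real Ollama call.

```python
import re

# Assumed list of corporate suffixes to strip; a real list would be longer.
SUFFIXES = {"inc", "llc", "corp", "corporation", "co", "ltd", "company"}

# Hypothetical table of already-known entities.
KNOWN = [
    {
        "parent": "acme-holdings",
        "fei": "1000001",
        "name": "Acme Sterile Pharma, Inc.",
        "address": "1 Plant Rd, Springfield",
    },
]


def normalize_name(name):
    """Lowercase, drop punctuation, and strip trailing corporate suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def resolve(record, known, tiebreak=None):
    """Return (parent, pass_name); the pass name is a minimal form of provenance."""
    # Pass 1: deterministic ID match (only FEI in this sketch).
    for entity in known:
        if record.get("fei") and record["fei"] == entity["fei"]:
            return entity["parent"], "id_match"
    # Pass 2: normalized string match with suffix stripping.
    target = normalize_name(record["name"])
    for entity in known:
        if normalize_name(entity["name"]) == target:
            return entity["parent"], "name_match"
    # Pass 3: address co-occurrence, reduced here to a case-insensitive equality check.
    address = record.get("address", "").strip().lower()
    for entity in known:
        if address and address == entity["address"].strip().lower():
            return entity["parent"], "address_match"
    # Pass 4: optional tiebreak callable standing in for the LLM step.
    if tiebreak is not None:
        parent = tiebreak(record, known)
        if parent:
            return parent, "llm_tiebreak"
    return None, "unresolved"


print(resolve({"name": "ACME STERILE PHARMA INC"}, KNOWN))
print(resolve({"name": "Acme Corp.", "address": "1 plant rd, springfield"}, KNOWN))
```

Ordering the passes from cheapest and most certain to most expensive means the tiebreak only runs on the residue the deterministic passes leave behind.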
Every resolved entity carries provenance back to the source documents that proved the join. When the model is wrong, you can see exactly why — which is more than you can say for most enrichment products.
What the data shows
The escalation gradient is itself meaningful. A Form 483 (inspection observations) on its own is often resolved without further enforcement. Once a warning letter is issued, the probability of a downstream shortage jumps. Once an injunction or consent decree appears, you should expect supply disruption.
The Overwatch tie-in: shortage → substitution → import
When a domestic facility goes down, FDA grants temporary import authority for foreign-manufactured equivalents. Overwatch sees the ships arrive. The cross-product chain is: warning letter (t+0) → shortage entry (t+8 months avg) → emergency import authorization (t+9 months avg) → bulk pharma ingredient vessel arrival at a US port (t+10 months avg). The full chain takes about 10 months on average, and every leg is publicly observable if you've joined the data.
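As a worked example of that timeline, the sketch below projects the expected date of each leg from a warning letter date, using the average offsets quoted above. The function names and the sample date are hypothetical, and because the offsets are averages the projected dates are rough expectations rather than forecasts.

```python
from datetime import date

# Average offsets in months from the warning letter, as quoted in the text.
AVG_OFFSETS_MONTHS = {
    "shortage_entry": 8,
    "import_authorization": 9,
    "vessel_arrival": 10,
}


def add_months(d, months):
    """Shift a date forward by whole months; the day is clamped to 28 for simplicity."""
    year_offset, month_index = divmod(d.month - 1 + months, 12)
    return date(d.year + year_offset, month_index + 1, min(d.day, 28))


def project_chain(letter_date):
    """Map each downstream stage to its expected date under the average offsets."""
    return {
        stage: add_months(letter_date, months)
        for stage, months in AVG_OFFSETS_MONTHS.items()
    }


# Hypothetical warning letter date, chosen only for illustration.
for stage, expected in project_chain(date(2024, 1, 15)).items():
    print(stage, expected.isoformat())
```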
Who benefits from knowing this up to 12 months early
Three buyer types: (1) institutional purchasers (hospital systems, distributors) deciding which contracts to renew; (2) generics competitors who can plan capacity reallocation toward likely-shortage NDCs; (3) life-sciences investors taking positions in alternate-source manufacturers. The signal is data; the alpha is in being among the small number of people who joined the records.