Generated by Codex with GPT-5
What changed
Uber Engineering’s official blog published “Modernizing Artifact Storage at Uber,” a May 28, 2026 account of replacing a fragile on-premises artifact repository without moving the operational burden into every build.
Artifact storage is easy to underestimate because it often looks like a passive dependency. At Uber it sits on the critical path for builds across large monorepos and thousands of smaller repositories. Builds resolve hundreds or thousands of dependencies, and the platform stores the outputs that downstream systems consume. At that scale, an artifact repository is developer infrastructure with production-service requirements: it must remain available during failures, serve immutable bytes correctly, keep latency low, and avoid turning growth into a sequence of risky storage interventions.
The legacy system ran in two Uber data centers. Each data center hosted a five-node cluster with a replication factor of three, and each node stored artifacts on local disks. GeoDNS sent clients to the nearest cluster. If that cluster could not find an artifact locally, it queried its peer data center and stored the fetched copy. Writes triggered asynchronous replication, while a cron job detected and repaired gaps.
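The legacy read path amounts to a read-through lookup with a peer fallback. The following is only a minimal sketch of that idea, assuming in-memory dictionaries in place of the real clusters. The class name, method names, and artifact path are hypothetical and do not come from Uber's post, and replication and the repair cron job are left out.

```python
from typing import Dict, Optional


class LegacyCluster:
    """Toy stand-in for one data center's artifact cluster (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, bytes] = {}
        self.peer: Optional["LegacyCluster"] = None

    def get(self, path: str) -> Optional[bytes]:
        # 1. Serve from the local cluster when the artifact is present.
        if path in self.store:
            return self.store[path]
        # 2. Otherwise check the peer data center's local store only, so two
        #    clusters never bounce the same miss back and forth.
        if self.peer is not None and path in self.peer.store:
            data = self.peer.store[path]
            # 3. Keep the fetched copy so the next read is local.
            self.store[path] = data
            return data
        return None


# Two clusters that point at each other, with one artifact present only in dc2.
dc1, dc2 = LegacyCluster("dc1"), LegacyCluster("dc2")
dc1.peer, dc2.peer = dc2, dc1
dc2.store["libs/foo-1.0.jar"] = b"jar-bytes"
```

In this sketch a read through `dc1` finds the artifact in `dc2` and leaves a local copy behind, which is also why each cluster's local disks kept growing with read traffic as well as writes.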
As the platform grew, the architecture’s operating model fell increasingly out of step with its scale. Disk exhaustion could stop writes even when capacity existed elsewhere in the cluster. Replication could fail silently and leave a supposedly replicated artifact on one node. Hardware replacement required carefully throttled evacuation of hundreds of terabytes from local disks. Software upgrades involved multi-terabyte schema migrations, coordinated configuration changes, traffic failover, and limited rollback options. The system was highly available on paper, but much of that availability depended on engineers performing slow, high-risk recovery work correctly.
The new architecture
Uber moved the authoritative artifact repository to a managed SaaS platform backed by cloud blob storage and deployed across multiple cloud regions. That removes local-disk capacity management from Uber’s critical path and transfers upgrades, security patches, and platform maintenance to the managed service. Cross-region replication provides isolation from regional failures.
The cloud move solved one class of problems but exposed another. Uber downloads more than five petabytes of artifacts each month. Fetching those bytes repeatedly from a cloud-hosted SaaS origin would create significant egress cost and could add latency to builds. The answer was not to keep operating a second authoritative artifact system on premises. Uber inserted a purpose-built validation proxy between clients and the SaaS origin.
The proxy keeps a local artifact cache and a MySQL metadata store containing artifact URLs, checksums, and last-modified timestamps. Every client request still reaches the SaaS origin. When the proxy has seen an artifact before, it sends conditional HTTP headers such as If-None-Match and If-Modified-Since. A 304 Not Modified response lets the proxy serve the cached bytes. A 200 OK response means the object is new or changed, so the proxy streams the origin response to the client and updates its cache asynchronously.
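The request flow can be illustrated with a small sketch. It is not Uber's implementation. It assumes the origin can be modelled as a plain function returning a status, headers, and a body, that the validator is an `ETag` header, and that a single in-memory dictionary stands in for both the artifact cache and the MySQL metadata store. `ValidationProxy`, `CacheEntry`, and `make_fake_origin` are hypothetical names, and the cache update is done inline rather than asynchronously.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Assumed origin shape: (path, request headers) -> (status, headers, body).
Response = Tuple[int, Dict[str, str], bytes]
Origin = Callable[[str, Dict[str, str]], Response]


@dataclass
class CacheEntry:
    etag: str
    last_modified: str
    body: bytes


class ValidationProxy:
    def __init__(self, origin: Origin) -> None:
        self.origin = origin
        self.cache: Dict[str, CacheEntry] = {}  # stands in for cache + metadata

    def get(self, path: str) -> bytes:
        headers: Dict[str, str] = {}
        entry = self.cache.get(path)
        if entry is not None:
            # Ask the origin to validate the copy we already hold.
            headers["If-None-Match"] = entry.etag
            if entry.last_modified:
                headers["If-Modified-Since"] = entry.last_modified
        status, resp_headers, body = self.origin(path, headers)
        if status == 304 and entry is not None:
            return entry.body  # origin confirmed the cached bytes are current
        if status == 200:
            # New or changed object: serve origin bytes and refresh the cache.
            etag = resp_headers.get("ETag")
            if etag:
                self.cache[path] = CacheEntry(
                    etag, resp_headers.get("Last-Modified", ""), body
                )
            return body
        raise RuntimeError(f"unexpected origin status {status} for {path}")


def make_fake_origin(objects: Dict[str, Tuple[str, bytes]]):
    """Build an in-memory origin plus a log of the headers it received."""
    calls: List[Dict[str, str]] = []

    def origin(path: str, headers: Dict[str, str]) -> Response:
        etag, body = objects[path]
        calls.append(dict(headers))
        if headers.get("If-None-Match") == etag:
            return 304, {"ETag": etag}, b""  # metadata-only reply
        return 200, {"ETag": etag}, body

    return origin, calls
```

Every call to `get` reaches the origin, but only the first call for an unchanged artifact transfers the payload. Later calls carry the validator and receive a bodiless 304.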
That distinction matters. Uber did not build a conventional time-to-live cache and hope that stale entries expire quickly enough. The SaaS repository remains the source of truth, and the proxy asks it to validate every cached response before serving bytes. The expensive payload transfer is avoided when the artifact is unchanged, but correctness is still decided by the authoritative system on every request.
The proxy runs active-active across two Uber data centers, each pinned to the nearest SaaS region. GeoDNS can fail traffic over if an Uber data center or cloud region becomes unavailable. This preserves low-latency artifact delivery while letting the cloud service own durable storage and regional replication.
Failure becomes cheaper than inconsistency
The most useful design principle in the post is that proxy failure should degrade economics or performance, not correctness. If the proxy’s cache or MySQL dependency is unavailable, it can fall back to passthrough mode and send requests directly to the SaaS origin. Builds continue working, although latency and egress cost increase. If cached state is stale or inconsistent, the origin validation prevents it from being served. If a streamed download is partial or corrupt, the proxy discards it rather than committing it to cache.
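Two of these failure paths are easy to sketch: passthrough when the metadata dependency is down, and refusing to cache a download that fails its checksum. The code below is an illustration under stated assumptions, not the proxy's real code. SHA-256 is assumed as the checksum algorithm (the post does not name one), and `MetadataUnavailable`, `commit_if_intact`, and `fetch` are hypothetical names.

```python
import hashlib
from typing import Any, Callable, Dict, Tuple


class MetadataUnavailable(Exception):
    """Raised by the (hypothetical) metadata store when it cannot be reached."""


def commit_if_intact(cache: Dict[str, bytes], path: str, body: bytes,
                     expected_sha256: str) -> bool:
    """Store a download only when its checksum matches; otherwise discard it."""
    if hashlib.sha256(body).hexdigest() != expected_sha256:
        return False  # partial or corrupt download never enters the cache
    cache[path] = body
    return True


def fetch(path: str,
          lookup_metadata: Callable[[str], Any],
          cached_fetch: Callable[[str, Any], bytes],
          origin_fetch: Callable[[str], bytes]) -> Tuple[bytes, str]:
    """Use the validated cache path when metadata is reachable, else pass through."""
    try:
        meta = lookup_metadata(path)
    except MetadataUnavailable:
        # Passthrough mode: slower and costlier, but the bytes still come
        # from the authoritative origin, so correctness is unaffected.
        return origin_fetch(path), "passthrough"
    return cached_fetch(path, meta), "validated"
```

The second element of the returned tuple is there only so the sketch can show which path was taken. In a real service that signal would more likely feed metrics than be returned to the caller.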
This is a cleaner failure model than the legacy system’s local-disk replication. In the old design, an incomplete replication event could quietly reduce durability, while disk pressure and node evacuation created operational risk inside the storage system itself. In the new design, the local layer is useful but disposable. Losing it hurts efficiency, not the integrity of the artifact repository.
The choice to build a dedicated proxy rather than configure a generic HTTP cache follows from the same reasoning. Traditional proxies optimize around time-to-live expiration and best-effort eviction. Uber needed per-request validation, graceful passthrough, checksum-aware handling, and observability across the serving path. The result is a deliberately narrow component: it does not replace the origin, and it does not make independent freshness decisions.
The measured impact is substantial. Uber reports that the proxy reduces egress from the SaaS platform by more than 99%, cuts overall artifact-related egress costs by nearly 90%, and achieves 99.99% reliability at the proxy layer. End-to-end latency remains comparable to the prior on-premises architecture because unchanged artifact bytes are served internally after a small metadata-only origin request.
The next bottlenecks
The post is also useful because it identifies where a successful cache changes the shape of the remaining work. Some build tools use partial downloads extensively, so uncached HTTP Range responses can force unnecessary full-object fetches. A small number of artifacts larger than 8 GB account for disproportionate egress when repeatedly accessed. Hot objects can create burst traffic and thundering-herd behavior. Uber is therefore considering end-to-end range support, specialized large-object caching, node-local hot caches, request coalescing, and stronger backpressure.
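One piece of range support is answering a partial request from a full object that is already cached. The sketch below shows only that step. It is an assumption about how such support might look, not a description of Uber's plans. It handles a single `bytes=` range, and for brevity it returns 416 for anything it cannot parse, whereas a full HTTP implementation would ignore a malformed header and return the whole object. `parse_single_range` and `serve_range` are hypothetical names.

```python
from typing import Optional, Tuple


def _is_number(text: str) -> bool:
    return text.isascii() and text.isdigit()


def parse_single_range(header: str, size: int) -> Optional[Tuple[int, int]]:
    """Return inclusive (start, end) offsets for one 'bytes=' range, else None."""
    unit, _, spec = header.partition("=")
    spec = spec.strip()
    if unit.strip() != "bytes" or "," in spec or "-" not in spec:
        return None  # wrong unit, multi-range, or not a range at all
    first, _, last = spec.partition("-")
    if first == "":
        # Suffix form "bytes=-N": the final N bytes of the object.
        if not _is_number(last) or int(last) == 0 or size == 0:
            return None
        n = min(int(last), size)
        return size - n, size - 1
    if not _is_number(first) or (last != "" and not _is_number(last)):
        return None
    start = int(first)
    # An open-ended or oversized end is clamped to the last byte.
    end = size - 1 if last == "" else min(int(last), size - 1)
    if start >= size or start > end:
        return None
    return start, end


def serve_range(cached_body: bytes, range_header: str) -> Tuple[int, bytes]:
    """Answer a Range request from cached bytes: 206 with the slice, or 416."""
    span = parse_single_range(range_header, len(cached_body))
    if span is None:
        return 416, b""
    start, end = span
    return 206, cached_body[start:end + 1]
```

For example, `serve_range(b"0123456789", "bytes=-4")` yields status 206 with the last four bytes. Under the validation model described earlier, this slice would still be served only after the origin has confirmed the cached object is current.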
Those follow-on problems are evidence that the main boundary is working. Once the origin and validation proxy divide responsibility cleanly, optimization can proceed incrementally. A node-local cache can sit in front of the shared cache without changing the source of truth. Request coalescing can reduce duplicate fetches without weakening validation. Large-object policies can change cost behavior without reintroducing artifact ownership into the proxy.
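Request coalescing is commonly implemented with a "single-flight" pattern: the first caller for a key performs the fetch, and concurrent callers for the same key wait for that result. The sketch below is a generic thread-based version with hypothetical names (`SingleFlight`, `_Call`), shown only to make the idea concrete. The post does not say how Uber intends to implement it.

```python
import threading
from typing import Any, Callable, Dict, Optional


class _Call:
    """State shared by everyone waiting on one in-flight fetch."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: Any = None
        self.error: Optional[BaseException] = None
        self.waiters = 0  # followers sharing this call (handy for metrics)


class SingleFlight:
    """Collapse concurrent calls for the same key into one upstream call."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._inflight: Dict[str, _Call] = {}

    def do(self, key: str, fn: Callable[[], Any]) -> Any:
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._inflight[key] = call
            else:
                call.waiters += 1
        if leader:
            try:
                call.result = fn()
            except BaseException as exc:  # propagate failures to followers too
                call.error = exc
            finally:
                # Deregister before waking followers so later callers start
                # a fresh fetch instead of reusing a finished one.
                with self._lock:
                    del self._inflight[key]
                call.done.set()
        else:
            call.done.wait()
        if call.error is not None:
            raise call.error
        return call.result
```

If `fn` is the validated fetch itself (conditional request plus cache lookup), coalescing reduces duplicate origin traffic for a hot artifact while the origin still decides freshness. Results are shared only among callers that overlap in time; nothing is retained once the call completes.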
Takeaway
Uber’s migration is a practical example of separating durable authority from local acceleration. The managed service owns storage durability, regional replication, and upgrades. The internal proxy owns efficient delivery, observability, and graceful degradation. Conditional HTTP requests connect the two layers with a simple rule: cached bytes may be served only after the authoritative origin confirms that they are still current.
That pattern applies beyond build artifacts. Large engineering systems often accumulate operational risk because a component combines source-of-truth responsibilities with low-latency serving and local capacity management. Moving the durable layer to a managed service can reduce that burden, but only if cost and latency are addressed without creating a second source of truth. Uber’s validation proxy shows one disciplined way to do that: keep the local layer disposable, make correctness checks explicit, and ensure that failures cost money or time before they cost reliability.