Uber 20260514 Beyond Prediction Solving the Multiple Knapsack Problem at Scale How Uber Optimizes Incentives Summary

Generated by Codex with GPT-5

What happened

Uber’s official engineering blog published Beyond Prediction: Solving the Multiple Knapsack Problem at Scale: How Uber Optimizes Incentives, a May 14, 2026 post about Tarot, Uber’s internal targeting platform for allocating incentives under large-scale marketplace, budget, and user-experience constraints.

The post is interesting because it treats incentive targeting as an optimization system rather than a ranking model. A simpler growth stack might ask which offer has the highest predicted effect for each user. Uber’s problem is harder: millions of users, many possible incentives, multiple lines of business, separate quarterly budgets, concurrent campaigns, and a hard limit on how many offers a person should see. At that scale, a locally strong prediction can be globally wrong if it consumes the wrong budget, blocks a better incentive, or improves one marketplace objective while harming another.

Uber frames the system as a production instance of the multiple knapsack problem. The knapsacks are team or business-unit budgets. The items are incentive programs. The weight is the expected monetary cost. The value is predicted incremental impact, such as more trips, better driver utilization, or higher earnings. The goal is to choose assignments that maximize total value while respecting hard budget and eligibility constraints. Before Tarot, Uber says contention was handled with a first-in-first-out strategy, so earlier incentives could block more relevant later ones, and budget management remained reactive and manually adjusted.

Tarot’s architecture separates the pieces that need to change at different speeds. A central Golang orchestrator coordinates targeting runs, taking triggers from a cron workflow and interacting with a use-case store, segmentation platform, ML gateway, budgeting platform, budget pacer, multi-lever optimizer, and domain execution services. The segmentation layer produces eligible users. The ML layer scores each user-lever opportunity. The budget layer reports how much capacity is still available. The optimizer turns those inputs into assignments, and downstream services apply the selected incentives.

The execution flow is deliberately batch-oriented. A scheduled run defines which incentive groups should be optimized together, fetches eligible user pools, forms the Cartesian product of users and valid levers, queries predictions for each opportunity, asks the pacer for current budget caps, solves the constrained assignment problem, and publishes the result. That makes the core targeting decision a coordinated optimization pass rather than a sequence of isolated campaign decisions.

The ROI layer is a useful design choice. Uber does not reduce the problem to a single learned score baked permanently into a model. The prediction layer estimates uplift across a vector of metrics, such as rides trips, delivery orders, utilization, or earnings. A valuation layer then applies dynamic weights that reflect the current market and strategy. The final score is a weighted combination of predicted metric changes. This lets Uber adjust priorities, such as growth versus efficiency, without retraining the underlying uplift models every time business context changes.

The budget pacer handles the gap between probabilistic predictions and deterministic finance constraints. It continuously reconciles configured budget, observed spend, and predicted liability for incentives that have been assigned but not fully paid out. If actual cost runs above forecast, later runs get tighter caps. If spend is below plan, the system can release more capacity. In effect, Tarot couples a mathematical optimizer with a feedback controller so that local assignment decisions remain consistent with longer-period budget agreements.

The most concrete engineering lesson comes from the two scaling pivots. First, Uber initially modeled incentive costs as time-series curves across the life of each incentive. That matched financial accrual more naturally, but it expanded the optimization problem from a single cost dimension into many per-day constraints. The result was higher latency, larger memory pressure, and artificial choke points where one predicted daily spike could block an otherwise useful assignment. Uber collapsed the predicted total cost to the assignment timestamp, simplifying cost from a vector to a scalar. The post says this improved simulated budget utilization from about 68% to 99.99% while restoring fine-grained geographic control.

Second, Uber changed solvers. The team first modeled the problem as linear programming with HiGHS, which worked for small batches but failed to converge quickly as scale rose. Around 100,000 users with a single incentive lever, finding a feasible solution could take more than 24 hours. The team then recognized the problem as a discrete combinatorial assignment problem and moved to Google OR-Tools CP-SAT. The same scale that stalled the LP approach could be solved in minutes, and Uber reports that CP-SAT stress tests across millions of users and multiple simultaneous levers stayed within strict production SLAs.

Why it matters

The broader point is that prediction is only one part of production ML. A marketplace system can have accurate uplift estimates and still allocate poorly if it lacks a global constraint model. Tarot’s value comes from joining model outputs with budget pacing, eligibility rules, contention limits, and solver-backed optimization. The machine-learning system produces opportunity estimates; the optimization system decides which opportunities can coexist.

The post is also a useful reminder that mathematical purity is not the same as production fit. Modeling costs across daily curves may look more faithful to the real world, but it made the solver slower and created brittle bottlenecks. Collapsing costs to a scalar was an intentional abstraction that made the system more useful. Likewise, linear programming was not wrong in principle, but CP-SAT better matched the discrete assignment structure that mattered operationally. The engineering work was not just selecting a solver; it was shaping the problem so the solver could produce timely decisions at marketplace scale.

The two-layer ROI design is especially transferable. Teams often encode business priorities directly into model targets and then pay the retraining cost when priorities change. Uber’s separation of predicted metric uplift from strategic valuation gives the system a cleaner control surface. Models can estimate what may happen, while product and marketplace strategy can decide what those outcomes are worth in a given context.

There is also a reliability lesson in the pacer. Without feedback, an optimizer can overspend because its costs are forecasts, not facts. With a pacer, the system observes realized cost and adjusts future capacity, turning campaign execution into a closed-loop process. That pattern matters for any automated decision system that makes commitments before the true outcome is known: prediction should be reconciled continuously against observed state.

Takeaway

Uber’s Tarot writeup is a strong example of applied research-engineering in a production marketplace. The system combines uplift modeling, constraint programming, budget feedback, and domain-specific simplification to move from manual heuristics and FIFO contention toward globally coordinated incentive allocation.

The practical takeaway is that high-scale ML products often need an optimization layer after prediction. Once many actors compete for constrained resources, the important question is not simply “what is the best action for this user?” It is “which set of actions is best under all shared constraints?” Tarot’s answer is to make that question explicit, solve it with machinery that matches the discrete structure of the problem, and keep the resulting decisions tied to observed budget reality.