Cloudflare, 2026-05-28: How We Built Cloudflare’s Data Platform and an AI Agent on Top of It (Summary)

Generated by Codex with GPT-5

What happened

On May 28, 2026, Cloudflare’s official engineering blog published “How we built Cloudflare’s data platform and an AI agent on top of it,” a post about Town Lake, its internal unified analytics platform, and Skipper, an AI data agent built on top of that platform.

The post is interesting because it treats “ask the data” as an infrastructure problem before it treats it as a model problem. Cloudflare operates at a scale where data sprawl is not an inconvenience; it is a product and operations bottleneck. Events, account metadata, billing rollups, customer support tickets, security signals, raw logs, and streaming data sit across production databases, ClickHouse clusters, Kafka topics, BigQuery datasets, R2 buckets, and other systems. A useful internal question might require knowing which system owns the authoritative source, whether the data is sampled, how fresh it is, which identifier translates between tables, and which credentials are needed.

Town Lake is Cloudflare’s answer to that problem: a unified SQL layer over heterogeneous data with governance built into the access path. Skipper is the natural-language layer over Town Lake, but Cloudflare is careful not to frame it as just a text-to-SQL demo. The agent works because the underlying platform exposes metadata, lineage, access rules, transformation code, runtime introspection, and curated data-model guidance. The model is not expected to guess the warehouse from a prompt. It is given tools and context that mirror how experienced internal analysts reason about the data.

The architecture

Town Lake is a lakehouse-style platform. Trino provides the query engine, so a single SQL query can join data across systems such as Postgres, ClickHouse, and Iceberg tables stored on R2. R2 Data Catalog provides the managed Apache Iceberg layer for warm and cold data, with schema evolution, time travel, partition evolution, and compaction as data ages. That division lets Cloudflare keep recent or operationally important data queryable while reducing storage cost for older rollups that no longer justify the cost of an OLAP database.
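
To make the federation point concrete, here is a minimal sketch of what one cross-system query could look like. Only the catalog.schema.table addressing convention comes from Trino; every catalog, schema, table, and column name below is hypothetical, and the small helper exists only to show which catalogs the query touches.

```python
import re

# Hypothetical catalog, schema, table, and column names. Only the
# catalog.schema.table addressing convention is Trino's.
FEDERATED_QUERY = """
WITH recent AS (
    SELECT account_id, sum(requests) AS requests_7d
    FROM clickhouse.analytics.http_events
    WHERE event_date >= current_date - INTERVAL '7' DAY
    GROUP BY account_id
),
billed AS (
    SELECT account_id, sum(invoice_total) AS billed_total
    FROM iceberg.billing.monthly_rollups
    GROUP BY account_id
)
SELECT a.account_id, a.plan, r.requests_7d, b.billed_total
FROM postgres.public.accounts AS a
LEFT JOIN recent AS r ON r.account_id = a.account_id
LEFT JOIN billed AS b ON b.account_id = a.account_id
"""


def catalogs_referenced(sql: str) -> list[str]:
    """Return the distinct catalogs named after FROM/JOIN, in order of appearance."""
    names = re.findall(r"\b(?:FROM|JOIN)\s+(\w+)\.\w+\.\w+", sql)
    return list(dict.fromkeys(names))


print(catalogs_referenced(FEDERATED_QUERY))
```

Each source is aggregated in its own subquery before the join, so rows from the event store and the rollup table cannot multiply each other.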

The supporting services are where the design becomes more than a generic lakehouse. DataHub is the catalog of tables, columns, owners, glossary terms, lineage, and descriptions. Lifeguard is the access-control service, storing rules in D1, pulling user and group membership from internal identity systems, and rendering policies consumed by Trino. Skimmer is the PII scanner. It continuously samples table columns, uses Workers AI to classify sensitive fields, and escalates flagged cases into an agentic second pass that can inspect table context and query Trino to verify findings. Transformer is the ELT engine, built on Workflows, where teams define SQL transformation DAGs with YAML frontmatter and get state tracking through Durable Objects, R2, and D1. Ingestion is handled by long-lived orchestration plus short-lived Kubernetes jobs that extract from operational systems and write Parquet into R2-backed Iceberg tables.
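
The transformation-DAG idea can be sketched in a few lines. This is not Transformer’s implementation: the table names are hypothetical, a plain dict stands in for dependencies that would be parsed from YAML frontmatter, and Python’s standard graphlib module is used only to show how a run order falls out of declared dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical transformation names mapped to the upstream tables each one reads.
# In the source these dependencies are declared in YAML frontmatter on each SQL
# file; a plain dict stands in for the parsed frontmatter here.
DEPENDS_ON = {
    "stg_accounts": set(),
    "stg_invoices": set(),
    "account_billing": {"stg_accounts", "stg_invoices"},
    "billing_monthly_rollup": {"account_billing"},
}


def run_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return an order in which every transformation runs after its upstreams."""
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(depends_on).static_order())


print(run_order(DEPENDS_ON))
```

A real engine would also persist per-node state so a failed run can resume, which is the role the source assigns to Durable Objects, R2, and D1.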

This is a deliberately compositional architecture. Trino supplies query federation, Iceberg supplies table semantics over object storage, DataHub supplies metadata, and Cloudflare’s own platform primitives supply state, orchestration, storage, access, and AI inference. The broader point is that natural-language analytics requires a dependable substrate: without lineage, ownership, policies, and transformation definitions, an agent can produce plausible SQL but not dependable answers.

Governance as part of the data path

The most important implementation choice is that Town Lake is default-closed. A newly connected database or newly created table is not queryable until Skimmer classifies it and a reviewer approves the table and columns. That reverses the common “open first, restrict later” pattern that creates quiet sensitive-data exposure in analytics environments.

Cloudflare makes that stricter model usable by automating the expensive parts. Skimmer detects obvious PII such as emails and phone numbers, but it also looks for less obvious sensitive fields such as API-token-like strings and opaque identifiers that can be traced back to users. Reviewers approve, override, or deny findings, and the platform turns a failed query into a self-service request rather than a dead-end permission error. Schema discovery and data access are separated: users can discover that a table exists, while unreviewed columns are hidden from DESCRIBE, SHOW COLUMNS, and broad selects. Sensitive fields are redacted by default, and raw PII requires an explicit session-level opt-in with permission checks and audit logging.
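
The default-closed and redact-by-default rules described above can be sketched as follows. The class, function, and flag names are assumptions made for illustration, not Lifeguard’s or Trino’s actual interfaces, and audit logging is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    approved: bool   # a reviewer has approved this column
    sensitive: bool  # the scanner or a reviewer flagged it as PII


def describe(columns: list[Column]) -> list[str]:
    """Schema discovery: unreviewed columns are simply absent from the listing."""
    return [c.name for c in columns if c.approved]


def project_row(row: dict[str, str], columns: list[Column], *,
                raw_pii_opt_in: bool = False,
                may_read_pii: bool = False) -> dict[str, str]:
    """Return the row a user would see under default-closed, redact-by-default rules."""
    # Raw PII needs both an explicit opt-in and a passing permission check.
    show_raw = raw_pii_opt_in and may_read_pii
    visible: dict[str, str] = {}
    for c in columns:
        if not c.approved:
            continue  # unreviewed columns stay hidden, even from broad selects
        if c.sensitive and not show_raw:
            visible[c.name] = "[REDACTED]"
        else:
            visible[c.name] = row[c.name]
    return visible
```

The key property is that the opt-in alone is never sufficient: a user who sets the flag without holding the permission still sees redacted values.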

That design matters for AI agents because the agent inherits the data model’s security semantics. Skipper runs as the calling user. If the user cannot query a table, Skipper cannot query it on the user’s behalf. Shared dashboards are checked at view time against the viewer’s current permissions, not merely at save time against the creator’s access. This avoids the common failure mode where an agent or dashboard becomes an accidental privilege-escalation layer over a richer backend.
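
A minimal sketch of the view-time check, assuming a dashboard records the tables its queries read and that grants can be looked up per user. The names and the grants structure are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dashboard:
    owner: str
    tables: frozenset[str]  # tables the dashboard's queries read


def can_view(viewer: str, dashboard: Dashboard,
             grants: dict[str, set[str]]) -> bool:
    """Allow viewing only if the viewer can currently query every table used."""
    # Evaluated on each view, so a later revocation takes effect immediately.
    allowed = grants.get(viewer, set())
    return dashboard.tables <= allowed
```

Because the creator’s access at save time is never consulted, sharing a dashboard cannot grant the viewer anything they could not already query.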

Why Skipper works

Skipper is a conversational agent that can search datasets, inspect schemas and lineage, write SQL, run Trino queries, fetch results, render charts, create dashboards, check access, and build transformation graphs. The hard part is not the chat interface; it is grounding. Cloudflare says early experiments showed that giving an LLM a list of tables and a SQL prompt led to hallucinated joins, wrong columns, and confident but incorrect numbers.

The production design layers context. First, DataHub provides schema and usage metadata, including keys and historical join patterns. Second, human annotations and curation tags steer the agent toward validated tables rather than scratch or internal tables. Third, code-derived metadata from Transformer tells the agent how tables are actually produced, including business logic that column descriptions often omit. Fourth, curated data-model pages explain how to think about domains such as billing, accounts, customers, and zones. Fifth, runtime introspection lets the agent use live queries such as DESCRIBE, distinct-value checks, and counts when static context is insufficient.
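
One way to picture the curation layer is as a ranking step over candidate tables before the agent writes SQL. The sketch below is an assumption about how such steering could work, not Skipper’s logic: the tag names are hypothetical, and dropping scratch and internal tables outright is a simplification of “steer toward validated tables.”

```python
from dataclasses import dataclass, field


@dataclass
class TableInfo:
    name: str
    tags: set[str] = field(default_factory=set)
    has_transform_sql: bool = False  # build logic is known from the transformation layer


def rank_candidates(tables: list[TableInfo]) -> list[str]:
    """Order candidate tables so curated, well-understood ones come first."""
    excluded = {"scratch", "internal"}  # hypothetical tag names
    usable = [t for t in tables if not (t.tags & excluded)]
    # Validated tables first, then tables whose build SQL is known, then by name.
    usable.sort(key=lambda t: ("validated" not in t.tags,
                               not t.has_transform_sql,
                               t.name))
    return [t.name for t in usable]
```

Runtime introspection would then apply only to the top candidates, keeping live queries for the cases where static context runs out.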

The most useful lesson is that code-derived context beat generic metadata. A column name can say little about defaults, fallbacks, filtering conventions, or joins. The SQL that builds the table captures those choices. Feeding that implementation context back into the catalog gives the agent a better chance of answering like someone who understands the organization’s data semantics, not just its schemas.

Code Mode and tool design

Cloudflare’s MCP implementation uses Code Mode rather than exposing dozens of separate tools directly. Instead of asking the model to call a long menu of tools one round trip at a time, Skipper exposes search and execute. The model writes a JavaScript snippet that calls the broader Skipper API inside a sandboxed Dynamic Worker isolate. A multi-step workflow can then search for tables, start a query, fetch results, and create a chart in one auditable program rather than several model turns.
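
The two-tool shape can be illustrated with a small sketch. The real system runs model-written JavaScript inside a sandboxed Dynamic Worker isolate; this Python version provides no isolation at all and only shows how one program can chain several API calls while leaving an audit log. The SkipperApi class, its methods, the canned results, and the idea that search looks up API documentation are all assumptions.

```python
from typing import Callable


class SkipperApi:
    """Hypothetical stand-in for the broader API a generated program may call."""

    def __init__(self) -> None:
        self.audit_log: list[str] = []

    def find_tables(self, text: str) -> list[str]:
        self.audit_log.append(f"find_tables({text!r})")
        return ["billing.monthly_rollups"]  # canned result

    def run_query(self, sql: str) -> list[dict]:
        self.audit_log.append(f"run_query({sql!r})")
        return [{"month": "2026-04", "total": 120}]  # canned result

    def make_chart(self, rows: list[dict], kind: str) -> dict:
        self.audit_log.append(f"make_chart(kind={kind!r}, rows={len(rows)})")
        return {"kind": kind, "points": len(rows)}


def search(api_docs: dict[str, str], text: str) -> list[str]:
    """Tool 1: let the model look up which API methods exist."""
    return [name for name, doc in api_docs.items() if text.lower() in doc.lower()]


def execute(program: Callable[[SkipperApi], object]) -> tuple[object, list[str]]:
    """Tool 2: run one model-authored program; return its result and the call log."""
    api = SkipperApi()
    return program(api), api.audit_log


def generated_program(api: SkipperApi) -> dict:
    # What a model-written snippet might do in a single execute call.
    table = api.find_tables("monthly billing")[0]
    rows = api.run_query(f"SELECT month, total FROM {table}")
    return api.make_chart(rows, kind="line")


result, log = execute(generated_program)
```

Three API calls complete in one model turn, and the log records each call in order for later review.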

That is a pragmatic agent-engineering choice. It reduces latency and token churn, but it also changes observability. The agent’s plan is captured as code, so reviewers and logs can inspect what it tried to do. It also narrows the surface area the model has to choose from. Cloudflare reports that tool overlap hurt reliability: multiple variants of search, list, and result-fetching tools confused the model. Consolidating tools and adding explicit modes produced a cleaner action space.

The lesson is not that every agent should execute generated JavaScript. It is that tool APIs for agents need the same design discipline as APIs for humans. Overlapping affordances, verbose prompts, and fragmented state force the model to spend capacity navigating the interface rather than solving the task. A smaller, composable tool surface plus a sandboxed execution environment can make complex workflows both faster and easier to audit.

Why it matters

Town Lake and Skipper show a pattern that is likely to matter across large organizations: production AI agents become useful when they sit on top of well-modeled operational systems. The apparent breakthrough is natural-language access to data, but the engineering work is in access control, lineage, metadata, transformation management, identity, auditing, schema evolution, ingestion idempotency, and memory.

The post also reframes memory in a practical way. Skipper does not need memory as a vague personality feature. It needs memory for recurring analytical corrections: filter this domain this way, avoid that table family, prefer this curated model, join through this identifier. Those are the small pieces of organizational practice that experienced analysts accumulate over time. Capturing them lets the agent improve on the questions a team actually asks rather than relearning the same internal conventions in every conversation.
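
A minimal sketch of this kind of memory, assuming corrections are stored as short notes keyed by data domain and recalled when a question touches that domain. The class and the example notes are hypothetical. The source does not describe Skipper’s storage format.

```python
from collections import defaultdict


class CorrectionMemory:
    """Stores analyst corrections per domain for reuse in later conversations."""

    def __init__(self) -> None:
        self._notes: dict[str, list[str]] = defaultdict(list)

    def remember(self, domain: str, note: str) -> None:
        if note not in self._notes[domain]:  # avoid duplicate entries
            self._notes[domain].append(note)

    def recall(self, domains: list[str]) -> list[str]:
        """Return stored corrections for the domains a question touches."""
        return [n for d in domains for n in self._notes.get(d, [])]


memory = CorrectionMemory()
memory.remember("billing", "Prefer the curated billing model over raw invoice tables.")
memory.remember("zones", "Exclude test zones when counting zones.")
```

Recalled notes would be added to the agent’s context alongside catalog metadata, so a correction made once carries into later conversations.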

The broader takeaway is that data agents should be built as governed interfaces over governed data platforms, not as privileged shortcuts around them. Cloudflare made the security model the data model: user identity, group membership, column review, PII redaction, dashboard sharing, and audit trails are all enforced beneath the agent. That is what lets the company put an LLM interface over sensitive internal data without turning the LLM into a new trust boundary.

For engineering teams, the durable lesson is that AI usability depends on non-AI infrastructure. The model can turn intent into analysis only if the system can answer where data lives, what it means, who owns it, how it was produced, who may see it, and how to verify a suspicious result. Town Lake gives Skipper those answers. Skipper then becomes less a magical analyst and more a high-level interface to a carefully governed data operating system.