Simon AI Context Layer

Simon's agents are built on the same foundation models that power tools like Claude and ChatGPT. What makes them effective for marketing work isn't the model; it's how we shape and surface your data so the model can actually reason about it. The context layer is the bridge between raw warehouse schemas and an agent that can do useful work on top of them.

Getting started

The context layer is always on. It applies automatically whenever you work with any Simon AI agent. To review and configure what the agent has access to, navigate to AI Studio > AI Context.

What the context layer is

At its core, the context layer is a bottom-up enrichment of your warehouse: conceptually a Snowflake Semantic View, extended with profiling, sample values, and semantic typing. It's derived from the tables the agent can see, not authored by hand.

Field-level profiling

For every field the agent can see, we capture and keep fresh:

Category | What we store
Shape | Total count, null count and percentage, unique count, sample unique values, top values and their distribution
Numeric | Min, max, mean, median, standard deviation, 25th and 75th percentiles
Temporal | Earliest and latest values, distribution across time periods

This is the information that lets an agent decide whether a field is actually useful for a question before it writes any SQL — the difference between a status column with three clean values (active, churned, paused) and a status_code column with 47 mostly-null variants.
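
To make the profiling idea concrete, here is a minimal sketch of how "Shape" statistics for a single field could be computed and used to screen a column before any SQL is written. It is an illustration only, not Simon's implementation: the names FieldProfile, profile_field, and looks_useful, the thresholds, and the sample data are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class FieldProfile:
    total_count: int
    null_count: int
    null_pct: float
    unique_count: int
    top_values: List[Tuple[Any, float]]  # (value, share of non-null rows)


def profile_field(values: List[Optional[Any]], top_n: int = 5) -> FieldProfile:
    """Compute 'Shape' statistics for one column of values (hypothetical helper)."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    total = len(values)
    null_count = total - len(non_null)
    null_pct = 100.0 * null_count / total if total else 0.0
    # most_common() returns an empty list for an all-null column,
    # so the division below never runs with a zero denominator.
    top_values = [(v, c / len(non_null)) for v, c in counts.most_common(top_n)]
    return FieldProfile(total, null_count, null_pct, len(counts), top_values)


def looks_useful(p: FieldProfile, max_null_pct: float = 50.0, max_unique: int = 20) -> bool:
    """Assumed screening rule: mostly populated and low-cardinality."""
    return p.null_pct <= max_null_pct and 0 < p.unique_count <= max_unique


# Made-up data mirroring the two columns described above.
status = profile_field(["active"] * 6 + ["churned"] * 3 + ["paused"])
status_code = profile_field([None] * 60 + [f"code_{i}" for i in range(47)])

print(looks_useful(status))       # True: 3 clean values, no nulls
print(looks_useful(status_code))  # False: 47 variants, more than half null
```

The thresholds are arbitrary; the point is that the decision is made from stored profile data rather than from a live query against the warehouse.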

Table-level semantic tags

Tables are tagged with a semantic type so the agent knows the role they play in the customer picture. Today's taxonomy includes:

  • behavioral and behavioral_signal
  • product_catalog
  • weather
  • social_media
  • regional_events

These tags let the agent quickly decide that a weather-enrichment table is relevant to a seasonality question but not to a loyalty-tier question, without having to read every column to figure that out.
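
As a rough illustration of that shortcut, the sketch below narrows a set of tables by semantic tag alone. The table names, the topic-to-tag mapping, and the candidate_tables helper are hypothetical; in practice the agent makes this relevance judgment itself instead of reading it from a fixed lookup.

```python
from typing import Dict, List, Set

# Hypothetical table names mapped to tags from the taxonomy above.
TABLE_TAGS: Dict[str, str] = {
    "web_events": "behavioral",
    "sku_master": "product_catalog",
    "daily_weather": "weather",
    "brand_mentions": "social_media",
    "local_festivals": "regional_events",
}

# Assumed mapping from a question topic to the tags worth exploring.
RELEVANT_TAGS: Dict[str, Set[str]] = {
    "seasonality": {"behavioral", "weather", "regional_events"},
    "loyalty_tier": {"behavioral"},
}


def candidate_tables(topic: str) -> List[str]:
    """Return tables whose tag is relevant to the topic, without inspecting columns."""
    wanted = RELEVANT_TAGS.get(topic, set())
    return sorted(name for name, tag in TABLE_TAGS.items() if tag in wanted)


print(candidate_tables("seasonality"))   # ['daily_weather', 'local_festivals', 'web_events']
print(candidate_tables("loyalty_tier"))  # ['web_events']
```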

The real innovation: topological access

The harder problem isn't gathering this metadata; it's giving the LLM access to it.

A typical context-stuffing approach dumps the whole schema plus descriptions into the context window. That works until the warehouse has more than a few dozen tables, at which point you blow past the usable context budget and the model starts losing relevant details in the middle of the prompt.

We don't dump the context layer into the prompt. We give the agent a map, and let it pan and zoom.

The agent starts with an overview of the semantic layer — what domains of data exist and how they're shaped — and then uses a single search-and-navigate tool to drill into the parts it cares about: list tables matching a semantic type, fetch profiling for a specific field, sample rows from a table. This mirrors how a person approaches an unfamiliar database: get the shape first, zoom into what's interesting, pull details on demand.

This pattern — Ben Shneiderman's "overview first, zoom and filter, then details-on-demand" — turns out to be critical for agentic performance. It:

  • Keeps the context window focused on the current question, not the entire catalog
  • Replaces a sprawl of one-off tools (list_tables, get_schema, describe_field, get_samples...) with a single search interface that LLMs already understand from cartographic and filesystem APIs they were trained on
  • Lets the agent stay cheap and fast on simple questions while still being able to reach deep into the data for harder ones
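
The overview, zoom, and details-on-demand flow can be sketched as a single function with path-like addressing. This is a toy model under stated assumptions, not Simon's actual tool: the navigate name, its parameters, the path scheme, and the in-memory CATALOG with its sample data are all invented for illustration.

```python
from collections import Counter
from typing import Any, Dict, Optional

# Hypothetical in-memory stand-in for the profiled warehouse.
CATALOG: Dict[str, Dict[str, Any]] = {
    "web_events": {
        "tag": "behavioral",
        "fields": {"status": {"unique_count": 3, "null_pct": 0.0}},
        "rows": [{"status": "active"}, {"status": "paused"}],
    },
    "daily_weather": {
        "tag": "weather",
        "fields": {"temp_c": {"min": -4.0, "max": 31.5}},
        "rows": [{"temp_c": 12.0}],
    },
}


def navigate(path: str = "/", semantic_type: Optional[str] = None, sample: int = 0) -> Any:
    """One entry point: '/' is the overview, '/table' zooms in, '/table/field' gives details."""
    parts = [p for p in path.split("/") if p]
    if not parts:
        if semantic_type is None:
            # Overview: how many tables exist per semantic type.
            return dict(Counter(t["tag"] for t in CATALOG.values()))
        # Filter: tables matching one semantic type.
        return sorted(n for n, t in CATALOG.items() if t["tag"] == semantic_type)
    table = CATALOG[parts[0]]
    if len(parts) == 1:
        if sample > 0:
            # Details on demand: a few sample rows.
            return table["rows"][:sample]
        # Zoom: the table's tag and field names only.
        return {"tag": table["tag"], "fields": sorted(table["fields"])}
    # Details on demand: stored profiling for a single field.
    return table["fields"][parts[1]]


print(navigate())                           # {'behavioral': 1, 'weather': 1}
print(navigate(semantic_type="weather"))    # ['daily_weather']
print(navigate("/web_events"))              # {'tag': 'behavioral', 'fields': ['status']}
print(navigate("/web_events/status"))       # {'unique_count': 3, 'null_pct': 0.0}
print(navigate("/web_events", sample=1))    # [{'status': 'active'}]
```

Each call returns only the slice the agent asked for, so the context window grows with the depth of the question, not with the size of the catalog.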

Semantic understanding vs. raw query generation

A question that comes up often: why not just connect a general-purpose model directly to the warehouse?

That approach gives you query generation. The model sees column names, guesses, and writes SQL. It's fine for ad-hoc exploration, but it has no view of which tables are relevant to a marketing question, what a field's cardinality is, what the top values look like, or which tables represent behavioral signals versus catalog structure versus third-party enrichment. Every question starts from zero.

The context layer front-loads that discovery. By the time the Simon agent writes a query, it has already narrowed the search space, checked distributions, and confirmed the field it's about to reference actually contains useful values. The result is fewer hallucinated joins, fewer queries against empty or malformed columns, and faster, cheaper answers.

What the context layer does not (yet) encode

The context layer describes the shape of your data — what exists, how it's distributed, what type of signal it represents. It does not, today, encode your team's business definitions: how you calculate LTV, define churn, structure loyalty tiers, or attribute a campaign response. Those definitions still live in the queries your team writes and in the datasets that feed Simon.

A true business-definition layer — think LookML-style semantics sitting on top of the profiled warehouse — is a direction we're actively investing in. For now, the context layer's job is to make sure the agent is always looking at the right data, with enough profile to use it well.

Related