Snowflake Cortex Cost: Why Most Teams Are Flying Blind
Apr 17, 2026
Abinash E, Snowflake Developer & Data Engineer @ Anavsan

Snowflake Cortex — especially Cortex Code launched in February 2026 — bills on a token-based model that is fundamentally different from warehouse compute. Most Snowflake teams have warehouse cost visibility. Almost none have model-level attribution for their AI services. The result: CORTEX_CODE_SNOWSIGHT or AI_SERVICES can consume the majority of total service spend without a single warehouse alert firing. This post covers how Cortex billing actually works across Code, Agents, and Snowflake Intelligence, the specific patterns that cause unexpected spikes, the native Snowflake queries to find your current exposure, the governance controls available today, and what model-level visibility looks like in practice.
Snowflake Cortex arrived quietly for most data teams. A developer enabled Cortex Code in Snowsight to speed up a query rewrite. A data scientist started using Cortex Agents for document analysis. Snowflake Intelligence got turned on for a business intelligence use case. Each adoption felt small and self-contained.
Then the bill arrived.
The problem is not that Cortex is expensive — in many cases the productivity gains justify the cost entirely. The problem is that most Snowflake environments have robust warehouse cost monitoring and almost no Cortex cost visibility. Teams that can pinpoint a warehouse sizing issue within hours are completely blind to which AI model consumed 80% of their service credits last week, through which Cortex source, and by which team or use case.
This post covers how Cortex billing actually works, the usage patterns that produce unexpected spikes, the native queries to surface your current exposure, the governance controls Snowflake now provides, and what model-level attribution looks like when you have genuine visibility rather than just an aggregate spend number.
How Cortex Billing Works — and Why It Surprises Teams
Cortex costs feel unpredictable because Cortex operates on a fundamentally different billing model from virtual warehouse compute.
Warehouse compute bills by the second of runtime at a fixed credit rate per size tier. An X-Small warehouse costs 1 credit per hour. A Large costs 8. The relationship between configuration and cost is direct and predictable. Engineers who have worked with Snowflake develop an intuition for warehouse costs over time.
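To make the warehouse side of the comparison concrete, here is a minimal sketch of that arithmetic. The X-Small and Large rates come from the paragraph above; the Small and Medium rates assume Snowflake's standard doubling per size tier, and the function name is only illustrative.

```python
# Credits per hour by warehouse size. X-Small and Large are quoted in the text;
# Small and Medium assume the standard doubling between size tiers.
CREDITS_PER_HOUR = {"XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8}


def warehouse_credits(size: str, runtime_seconds: float) -> float:
    """Credits for one warehouse running for runtime_seconds.

    Simplification: ignores the 60-second minimum charged each time a
    warehouse resumes, and assumes a single cluster.
    """
    return CREDITS_PER_HOUR[size] * runtime_seconds / 3600


print(warehouse_credits("XSMALL", 3600))  # 1.0
print(warehouse_credits("LARGE", 1800))   # 4.0
```

The only inputs are size and runtime, which is why warehouse spend is comparatively easy to reason about.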
Cortex bills on tokens. A token is roughly 4 characters of text — approximately three-quarters of an English word. A 200-line Python file is approximately 2,000 tokens. A complex SQL query with full schema context might be 5,000 to 15,000 tokens per interaction. Every prompt sent to a Cortex model consumes input tokens. Every response generates output tokens. Both are billed.
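The 4-characters-per-token rule of thumb is easy to turn into a rough pre-flight estimate. The sketch below is only that heuristic, not a real tokenizer: actual counts vary by model, and the function name is an assumption made for illustration.

```python
import math

CHARS_PER_TOKEN = 4  # rule of thumb from the text; real tokenizers differ by model


def estimate_tokens(text: str) -> int:
    """Rough token count derived from character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


# A stand-in for a 200-line file averaging 40 characters per line
# (39 characters plus the newline).
file_text = ("x" * 39 + "\n") * 200

print(len(file_text))              # 8000
print(estimate_tokens(file_text))  # 2000
```

The result lines up with the 2,000-token figure quoted above for a 200-line Python file, under the assumption of roughly 40 characters per line.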
The critical implication: there is no warehouse size to right-size, no auto-suspend to configure, and no cluster count to optimize. Cost scales with the volume and complexity of AI interactions — which is driven by developer behavior, application design, and adoption spread across the organization. None of these show up in your warehouse monitoring dashboard.
Cortex Code, launched at BUILD London on February 3, 2026, adds another layer of complexity. Each Cortex Code session follows a structured pipeline: the developer sends a prompt, Snowflake packages it with context — conversation history, open files, schema metadata — sends the full package to the underlying model as input tokens, receives output tokens, and loops through verification steps. A single productive engineering session might consume 10,000 to 50,000 tokens across multiple turns. At scale, across an engineering team with broad Cortex Code access, this compounds quickly.
In Snowflake's account usage views and cost management interfaces, this AI service spend appears under two service type labels depending on the account configuration: CORTEX_CODE_SNOWSIGHT for Cortex Code accessed through the web interface, and AI_SERVICES as a broader umbrella that captures various Cortex AI workloads. Both can become the dominant line item in service spend — sometimes representing 95% or more of total service credits in a given period — while all warehouse metrics look completely normal.
The Billing Components Most Teams Miss
Cortex billing is not a single line item. It spans several components that accumulate through different mechanisms:
Token compute is the primary cost driver for Cortex Functions, Cortex Agents, Cortex Code, and Snowflake Intelligence. It is billed per million tokens at rates that vary by model: Claude Sonnet models are priced differently from Claude Opus models, and output tokens cost more than input tokens. The specific rates changed significantly in April 2026, when Snowflake introduced dedicated AI Credits decoupled from edition pricing, reducing costs for Enterprise and Business Critical customers by 33% to 67% depending on region.
Serving compute is the component that surprises teams running Cortex Search. Cortex Search charges a serving layer based on indexed data volume — measured in gigabytes per month — regardless of how many queries are executed against it. A service with 50GB of indexed data plus 20GB of embeddings bills for all 70GB monthly whether it receives one query or ten thousand. Development and staging services left running indefinitely generate this cost continuously without any alert firing.
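The serving charge described above can be sketched in a few lines. The rate below is a placeholder, not a Snowflake price, and the function names are hypothetical; the point is that query volume never appears as an input.

```python
def billable_serving_gb(indexed_gb: float, embedding_gb: float) -> float:
    """Serving is billed on total stored size; query count is not an input."""
    return indexed_gb + embedding_gb


def monthly_serving_credits(indexed_gb: float, embedding_gb: float,
                            credits_per_gb_month: float) -> float:
    """Monthly serving charge for one Cortex Search service."""
    return billable_serving_gb(indexed_gb, embedding_gb) * credits_per_gb_month


ASSUMED_RATE = 0.5  # hypothetical credits per GB-month, for illustration only

print(billable_serving_gb(50, 20))                    # 70
print(monthly_serving_credits(50, 20, ASSUMED_RATE))  # 35.0
```

Substitute the rate from your own contract or Snowflake's current consumption table before using this for budgeting.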
Embedding compute applies when Cortex Search or other vector services process text into embeddings. It is charged per token of text in the indexed columns, at rates that vary by the selected embedding model. Creating a search service on a 10 million row table with 500 tokens per row generates a substantial one-time embedding cost on initialization, plus incremental costs for each subsequent insert or update.
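The scale of that initialization cost is easier to see as arithmetic. This is a sketch under stated assumptions: the 50,000 changed rows per day and the function names are invented for illustration, and the per-million-token rate is left as a parameter because it depends on the embedding model.

```python
def embedding_tokens(rows: int, tokens_per_row: int) -> int:
    """Tokens that must be embedded for a given number of rows."""
    return rows * tokens_per_row


def embedding_credits(tokens: int, credits_per_million_tokens: float) -> float:
    """Credits for embedding, given a model-specific rate per million tokens."""
    return tokens / 1_000_000 * credits_per_million_tokens


# One-time initialization from the example in the text.
initial = embedding_tokens(10_000_000, 500)
print(initial)              # 5000000000
print(initial / 1_000_000)  # 5000.0 (million-token billing units)

# Hypothetical ongoing churn: 50,000 inserted or updated rows per day.
daily = embedding_tokens(50_000, 500)
print(daily)                # 25000000
```

Five billion tokens on day one is why initialization deserves its own line in the estimate, separate from steady-state updates.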
Warehouse compute associated with Cortex is perhaps the least visible component. Cortex Search refresh pipelines, Cortex Analyst queries, and various Cortex orchestration steps consume standard warehouse compute on top of token costs. This compute appears in QUERY_ATTRIBUTION_HISTORY but not in the AI-specific usage views. A team watching only token costs misses the associated warehouse spend, and a team watching only warehouse costs misses the token layer entirely.
Context caching in Cortex Code is a cost reduction mechanism that most teams are not yet using deliberately. Snowflake caches conversation context between turns in the same session, billing subsequent turns at approximately 10% of the normal input token rate — a 90% discount on repeated context. The counterintuitive result: longer sessions are cheaper per turn than short ones. Eight interactions in a single session cost roughly 40% less than eight separate single-turn sessions. At 12 or more turns, savings approach 50%. Teams that habitually close and restart Cortex Code sessions are paying a significant premium they could eliminate by keeping sessions open longer.
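A toy model shows why longer sessions come out cheaper. It assumes a fixed context size per turn, ignores the growth of conversation history, and uses made-up relative rates, so it illustrates the mechanism rather than reproducing Snowflake's exact percentages.

```python
CACHED_INPUT_DISCOUNT = 0.10  # cached context billed at ~10% of the input rate (from the text)


def separate_sessions_cost(turns, context_tokens, output_tokens, in_rate, out_rate):
    """Every turn is a new session, so the full context is billed each time."""
    return turns * (context_tokens * in_rate + output_tokens * out_rate)


def single_session_cost(turns, context_tokens, output_tokens, in_rate, out_rate):
    """First turn pays full price; later turns pay the cached rate on context."""
    first = context_tokens * in_rate + output_tokens * out_rate
    later = context_tokens * in_rate * CACHED_INPUT_DISCOUNT + output_tokens * out_rate
    return first + (turns - 1) * later


def savings_fraction(turns, context_tokens, output_tokens, in_rate, out_rate):
    """Fraction saved by one long session versus many single-turn sessions."""
    args = (turns, context_tokens, output_tokens, in_rate, out_rate)
    return 1 - single_session_cost(*args) / separate_sessions_cost(*args)


# Hypothetical inputs: 10,000 context tokens and 2,000 output tokens per turn,
# with output priced at 5x input in relative units (not real prices).
for n in (1, 8, 12):
    print(n, round(savings_fraction(n, 10_000, 2_000, 1, 5), 2))
# 1 0.0
# 8 0.39
# 12 0.41
```

In this model the savings depend on how much of each turn's cost is repeated context: the larger the context relative to the output, the bigger the benefit of staying in one session.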
The Pattern That Creates Unexpected Spikes
The CORTEX_CODE_SNOWSIGHT service type — and its rollup into AI_SERVICES — is where the most concentrated cost surprises are appearing in 2026. When Cortex Code access is enabled in Snowsight, it becomes immediately accessible to every user with the appropriate role. Unlike a developer tool requiring installation and explicit setup, Cortex Code in Snowsight is a few clicks away from any analyst or engineer who opens the interface.
The typical spike pattern plays out as follows. A team enables Cortex Code access broadly, intending it as a productivity tool. In the first week, several engineers explore it for query help, code generation, and data discovery. Each session generates thousands of tokens. By the end of the week, CORTEX_CODE_SNOWSIGHT or AI_SERVICES appears as the dominant line item in service spend — sometimes representing more than 95% of total service credits — while the warehouse dashboard looks completely normal. No alerts fire. No anomaly is detected. The first signal is the weekly or monthly spend summary.
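Since no warehouse alert covers this, one simple check is the share of total service credits held by each service type. The sketch below assumes you have already exported credits per service type into a dictionary; the numbers are invented, and the function names and threshold are illustrative choices.

```python
def service_share(credits_by_service: dict[str, float]) -> dict[str, float]:
    """Each service type's fraction of total service credits."""
    total = sum(credits_by_service.values())
    if total == 0:
        return {name: 0.0 for name in credits_by_service}
    return {name: credits / total for name, credits in credits_by_service.items()}


def dominant_services(credits_by_service: dict[str, float],
                      threshold: float = 0.5) -> list[str]:
    """Service types whose share meets or exceeds the threshold."""
    shares = service_share(credits_by_service)
    return [name for name, share in shares.items() if share >= threshold]


# Made-up weekly figures, for illustration only.
week = {"CORTEX_CODE_SNOWSIGHT": 96.0, "AUTO_CLUSTERING": 3.0, "SERVERLESS_TASK": 1.0}

print(service_share(week)["CORTEX_CODE_SNOWSIGHT"])  # 0.96
print(dominant_services(week, threshold=0.95))       # ['CORTEX_CODE_SNOWSIGHT']
```

Running a check like this daily rather than waiting for the weekly summary shortens the gap between the spike and the first signal.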
The compound effect accelerates as more users discover the capability. Token consumption is a function of user count, session frequency, and the complexity of interactions. All three tend to increase as familiarity with the tool grows. Without model-level attribution, the spend increase is visible as an aggregate number but not attributable to specific users, sessions, or interaction patterns.
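The three drivers multiply, which is what makes the growth easy to underestimate. A rough projection, using the 10,000 to 50,000 tokens-per-session range quoted earlier and otherwise hypothetical user and session counts:

```python
def weekly_tokens(users: int, sessions_per_user: int, tokens_per_session: int) -> int:
    """Weekly token volume as the product of the three adoption drivers."""
    return users * sessions_per_user * tokens_per_session


# Hypothetical pilot: a few users, short sessions.
pilot = weekly_tokens(users=5, sessions_per_user=5, tokens_per_session=10_000)
# Hypothetical broad rollout: more users, more sessions, heavier sessions.
rollout = weekly_tokens(users=40, sessions_per_user=10, tokens_per_session=50_000)

print(pilot)             # 250000
print(rollout)           # 20000000
print(rollout // pilot)  # 80
```

An 8x increase in users becomes an 80x increase in tokens once session frequency and session size grow alongside it.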
A documented example from 2025: one team processing 1.18 billion records with Cortex Functions paid nearly $5,000 for a single query. The culprit was token costs, not compute costs — and their traditional cost monitoring showed nothing unusual in warehouse spend.
What Native Snowflake Monitoring Gives You
Snowflake has significantly expanded native Cortex monitoring capabilities in 2026, including a March 2026 general availability release of CORTEX_AI_FUNCTIONS_USAGE_HISTORY. These native views give platform teams genuine signal to work with:
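Track total Cortex AI Functions spend by model: query CORTEX_AI_FUNCTIONS_USAGE_HISTORY.
Track Cortex Code consumption by surface and user: query CORTEX_CODE_CLI_USAGE_HISTORY and CORTEX_CODE_SNOWSIGHT_USAGE_HISTORY.
Check Cortex Search idle serving costs: query CORTEX_SEARCH_SERVING_USAGE_HISTORY.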
The limitation of native monitoring is scope. These views are account-level. In multi-account organizations — separate accounts for development, staging, production, or different business units — each account requires separate queries. Aggregating across accounts requires additional infrastructure. And while the views tell you aggregate token consumption and credit spend by model, connecting that to specific teams, pipelines, or business units still requires manual cross-referencing.
Governance Controls Available Today
Snowflake introduced per-user daily credit limits for Cortex Code in 2026. These are the primary governance levers platform teams have available right now:
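Set account-wide daily limits: use ALTER ACCOUNT SET with CORTEX_CODE_CLI_DAILY_EST_CREDIT_LIMIT_PER_USER and CORTEX_CODE_SNOWSIGHT_DAILY_EST_CREDIT_LIMIT_PER_USER.
Override for specific users: configure individual user-level overrides for power users who need a higher limit.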
When a user's rolling 24-hour usage exceeds the configured limit, that surface returns an error until usage drops below the threshold. Limits are per-surface — CLI and Snowsight track separately, which matters for teams where users access Cortex Code through both interfaces simultaneously.
For Cortex Search, the primary governance lever is the TARGET_LAG setting. A search service configured with a 1-minute TARGET_LAG refreshes continuously, billing serving compute and embedding tokens at the highest possible rate. For most use cases, hourly or daily refresh is sufficient. Suspending development and staging services during off-hours eliminates serving charges for those environments entirely.
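To compare TARGET_LAG settings, it helps to see how many refresh cycles each one allows per day. This sketch assumes at most one refresh per lag interval, which is an upper bound chosen for illustration; how Snowflake actually schedules refreshes is not described here.

```python
from datetime import timedelta


def max_refreshes_per_day(target_lag: timedelta) -> int:
    """Upper bound on daily refreshes, assuming one refresh per lag interval."""
    return int(timedelta(days=1) / target_lag)


print(max_refreshes_per_day(timedelta(minutes=1)))  # 1440
print(max_refreshes_per_day(timedelta(hours=1)))    # 24
print(max_refreshes_per_day(timedelta(days=1)))     # 1
```

Moving from a 1-minute lag to an hourly one cuts the ceiling on refresh cycles by a factor of 60.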
AI service budgets, which became generally available in April 2026, allow teams to track shared AI spend by business unit or cost center. This is essential for showback and chargeback in larger organizations where multiple teams share Cortex access and the AI_SERVICES line item needs to be split by owner.
What Model-Level Attribution Looks Like in Practice
The gap between native monitoring and genuine AI cost governance is attribution depth and cross-account aggregation. Native views give you token consumption and credit spend by model within a single account. What is harder to surface without dedicated tooling is which models are active across accounts, through which Cortex source, at what token volume, and in a single comparable view — in either credits or the currency your finance team uses.
The Anavsan Services view makes this concrete. Across a Snowflake account, it surfaces each service type — CORTEX_CODE_SNOWSIGHT, AI_SERVICES, AUTO_CLUSTERING, SEARCH_OPTIMIZATION, and others — with their consumption in both credits and USD depending on the display setting. When AI_SERVICES shows 33.87 credits and every other service shows zero, the attribution is immediate and actionable. When CORTEX_CODE_SNOWSIGHT shows $187.41 in a 7-day window while SERVERLESS_TASK shows $0.54, the cost story tells itself without requiring a SQL query.
The Cortex models view adds the second layer: which specific model drove that consumption. Claude-opus-4-5 via CORTEX_AGENT consumed 11.08M tokens over 6 months. Claude-4-sonnet via SNOWFLAKE_INTELLIGENCE consumed 11.94M tokens over the same period. A platform lead with that view can answer the question finance is actually asking — which AI workload is driving cost growth, and is it the expensive model or the cheaper one — rather than pointing to an aggregate AI_SERVICES line item and offering to investigate.
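Conceptually, that second layer is a group-by over usage rows keyed by Cortex source and model. The sketch below is a generic illustration, not Anavsan's implementation; the row shape, the account names, and the split of tokens between accounts are invented, while the two totals match the figures above.

```python
from collections import defaultdict


def tokens_by_source_and_model(rows):
    """Sum tokens per (source, model) across accounts, largest first."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["source"], row["model"])] += row["tokens"]
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


# Hypothetical rows exported from two accounts.
rows = [
    {"account": "prod", "source": "CORTEX_AGENT", "model": "claude-opus-4-5", "tokens": 7_000_000},
    {"account": "dev", "source": "CORTEX_AGENT", "model": "claude-opus-4-5", "tokens": 4_080_000},
    {"account": "prod", "source": "SNOWFLAKE_INTELLIGENCE", "model": "claude-4-sonnet", "tokens": 11_940_000},
]

for (source, model), tokens in tokens_by_source_and_model(rows).items():
    print(source, model, tokens)
# SNOWFLAKE_INTELLIGENCE claude-4-sonnet 11940000
# CORTEX_AGENT claude-opus-4-5 11080000
```

Keeping the account column in the raw rows means the same data can later be regrouped by account or team without re-exporting.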
That level of attribution is also what makes model selection conversations productive. If claude-opus is being used for tasks that claude-sonnet handles equally well, that is a cost optimization conversation with a specific number attached to it. Without model-level token attribution, the conversation stays abstract.
A Practical Cortex Cost Governance Checklist
Before the next sprint enables broader Cortex access, run through these steps in order:
Establish your baseline first:
Query CORTEX_AI_FUNCTIONS_USAGE_HISTORY for the last 30 days
Query both Cortex Code usage views (CORTEX_CODE_CLI_USAGE_HISTORY and CORTEX_CODE_SNOWSIGHT_USAGE_HISTORY) for the last 7 days
Check CORTEX_SEARCH_SERVING_USAGE_HISTORY for idle services
Identify which models are active and through which Cortex source
Note whether your spend is appearing as CORTEX_CODE_SNOWSIGHT, AI_SERVICES, or both — this tells you which Cortex surfaces are active in each account
Set governance before broad rollout:
Configure per-user daily credit limits before enabling Cortex Code widely — start at 10 to 20 credits per day and adjust based on actual usage data from the first two weeks
Identify power users who need higher limits and configure user-level overrides explicitly
Set dedicated warehouses for Cortex Search refresh pipelines to isolate costs from general warehouse spend
Audit TARGET_LAG settings on all Cortex Search services
Optimize interaction patterns:
Train engineers on context caching — longer sessions are cheaper per turn than short ones. Keeping sessions open for related tasks reduces token costs meaningfully.
Review model selection by use case. Not every task requires the most capable or most expensive model.
Batch Cortex Function calls where possible rather than running them per-row on large tables.
Ongoing review cadence:
Review Cortex Code usage by user weekly during the first 60 days of rollout
Monitor serving compute separately from token compute for all Cortex Search services
Establish a cost baseline before scaling adoption so growth is measurable against a known starting point
Set AI service budgets by team or cost center so monthly AI spend is attributable, not just visible
The Broader Pattern
Cortex represents a new cost category in Snowflake environments that does not behave like any existing category. Warehouse costs respond to configuration. Storage costs respond to lifecycle policies. Cortex costs respond to adoption patterns, interaction complexity, and model selection — all driven by human behavior rather than infrastructure decisions.
The teams that will manage Cortex costs most effectively are not the ones who restrict access most aggressively. They are the ones who establish attribution and governance infrastructure before adoption scales — so that when AI_SERVICES becomes the dominant cost in an account, they already know which model drove it, through which source, and what the trend looks like across all their accounts over time.
Flying blind on warehouse costs is expensive. Flying blind on Cortex costs is more expensive, because the usage patterns are harder to predict and the billing model is less familiar. The visibility layer for Cortex AI needs to be built with the same rigor as the visibility layer for compute — not added reactively after the first surprise bill.
Frequently asked questions
Why is Snowflake Cortex Code so expensive?
Cortex Code bills on tokens, not warehouse runtime. A single productive engineering session with full schema context can consume 10,000 to 50,000 tokens. When access is enabled broadly in Snowsight, token consumption scales with every user who opens the interface — and no warehouse alert fires because the spend lives in a completely separate billing layer that most existing monitoring doesn't cover.
What is CORTEX_CODE_SNOWSIGHT in my Snowflake service spend?
CORTEX_CODE_SNOWSIGHT is the service type for Cortex Code usage through Snowflake's web interface (Snowsight). It is billed by token and tracked separately from CLI-based usage. In environments where Cortex Code was recently enabled broadly, it frequently becomes the dominant line item in total service spend within the first week.
What is AI_SERVICES in my Snowflake service spend?
AI_SERVICES is a broader service type umbrella that captures various Cortex AI workloads in Snowflake's account usage reporting. Depending on your account configuration and which Cortex features are active, AI spend may appear as AI_SERVICES, CORTEX_CODE_SNOWSIGHT, or both. Both represent token-based AI billing rather than warehouse compute.
How do I monitor Snowflake Cortex costs natively?
Use CORTEX_AI_FUNCTIONS_USAGE_HISTORY for AI function spend by model, CORTEX_CODE_CLI_USAGE_HISTORY and CORTEX_CODE_SNOWSIGHT_USAGE_HISTORY for Cortex Code by user and date, and CORTEX_SEARCH_SERVING_USAGE_HISTORY for search serving costs. These views became fully available in Snowflake's March to April 2026 releases.
How do I set limits on Snowflake Cortex Code usage?
Use ALTER ACCOUNT SET CORTEX_CODE_CLI_DAILY_EST_CREDIT_LIMIT_PER_USER and CORTEX_CODE_SNOWSIGHT_DAILY_EST_CREDIT_LIMIT_PER_USER to set rolling 24-hour per-user limits at the account level. Individual user overrides are also available. When a user hits the limit, that surface returns an error until the user's usage over the trailing 24 hours drops back below the limit.
What is context caching in Cortex Code and does it reduce costs?
Context caching is Snowflake's reuse of conversation context between turns in the same session, and yes, it reduces costs significantly. Subsequent turns are billed at approximately 10% of the normal input token rate, so eight turns in one session cost roughly 40% less than eight separate single-turn sessions. Caching is automatic and needs no configuration, but engineers need to know that longer sessions are cheaper in order to take advantage of it deliberately.
Does Cortex Search cost credits even when no searches are running?
Yes. Cortex Search charges serving compute based on indexed data size in GB per month regardless of query volume. A service with 70GB of indexed data and embeddings bills for all 70GB monthly whether it receives one query or one million. Development and staging services should be suspended when not in use.