How Snowflake Cortex Billing Works

Unlike warehouse compute, which bills by the second, Cortex AI functions bill per token processed. A token is a chunk of text, roughly 4 characters or 0.75 words of English content. A single AI_COMPLETE call on a paragraph may process 200-500 tokens, so running that call over a million rows processes 200-500 million tokens. Credit rates also differ from one Cortex function to another.
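To make the arithmetic concrete, here is a minimal Python sketch that estimates token volume from the 4-characters-per-token rule of thumb. The credit rate is a placeholder, and quoting it per million tokens is an assumption of this sketch; take the real rate for your function and model from Snowflake's current pricing. The function names are illustrative, not part of any Snowflake API.

```python
import math

# Rule of thumb from the text above; real tokenizers vary by model and language.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate based on character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_credits(total_tokens: int, credits_per_million_tokens: float) -> float:
    """Convert a token count to credits, assuming a per-million-token rate."""
    return total_tokens / 1_000_000 * credits_per_million_tokens


# A 1,500-character paragraph stands in for one row of input.
paragraph = "word " * 300
tokens_per_call = estimate_tokens(paragraph)

# Hypothetical workload: one call per row over a million rows.
row_count = 1_000_000
total_tokens = tokens_per_call * row_count

# Placeholder rate, NOT a real Snowflake price.
HYPOTHETICAL_CREDITS_PER_MILLION_TOKENS = 1.0
total_credits = estimate_credits(total_tokens, HYPOTHETICAL_CREDITS_PER_MILLION_TOKENS)

print(f"{tokens_per_call} tokens per call, {total_tokens:,} tokens in total")
print(f"about {total_credits:,.1f} credits at the placeholder rate")
```

An estimate like this is only useful for order-of-magnitude planning before a workload runs; billed usage should always be read from the account usage views.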

Cortex AI Functions and Their Cost Implications

Snowflake offers several Cortex functions, each with distinct cost characteristics. AI_COMPLETE / CORTEX.COMPLETE is the most commonly used LLM inference function. Its cost depends on model choice and prompt length. AI_EMBED (and the older CORTEX.EMBED_TEXT_768 and CORTEX.EMBED_TEXT_1024 functions) generates vector embeddings for semantic search. Its costs scale with the number and length of documents embedded. Cortex Search charges per search request and for indexing. Cortex Analyst translates natural language to SQL, consuming one or more LLM calls per query. Document AI extracts structured data from unstructured documents at a high per-document credit cost.

Where Cortex Costs Spike Unexpectedly

The most common sources of unexpected Cortex spend are embedding pipelines that re-process entire datasets on every run, AI_COMPLETE calls issued once per row across a large table, Cortex Search indexes that rebuild too frequently, and experimentation workloads that were never cleaned up after the prototype phase.

How to Track Cortex Credit Consumption

Query ACCOUNT_USAGE.CORTEX_FUNCTIONS_USAGE_HISTORY to see Cortex credit consumption by function type, user, and time period. Join with QUERY_HISTORY to understand which queries triggered the Cortex calls. Schedule a monitoring query that compares each week's Cortex spend with the previous week's and flags large increases.
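The week-over-week comparison can be sketched in Python once the usage rows have been fetched. The sketch below assumes the rows arrive as (date, credits) pairs, a shape chosen for illustration rather than the view's actual schema, and the sample values are made up. It groups spend by ISO week and reports the percentage change between consecutive weeks that have data.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Optional, Tuple

Week = Tuple[int, int]  # (ISO year, ISO week number)


def weekly_totals(rows: Iterable[Tuple[date, float]]) -> Dict[Week, float]:
    """Sum credits per ISO week, returned in chronological order."""
    totals: Dict[Week, float] = defaultdict(float)
    for day, credits in rows:
        iso = day.isocalendar()
        totals[(iso[0], iso[1])] += credits
    return dict(sorted(totals.items()))


def week_over_week(
    totals: Dict[Week, float],
) -> List[Tuple[Week, float, Optional[float]]]:
    """Return (week, credits, percent change vs. the previous week with data)."""
    weeks = list(totals.items())
    changes = []
    for (_, previous), (week, current) in zip(weeks, weeks[1:]):
        # Percent change is undefined when the previous week had no spend.
        pct = None if previous == 0 else (current - previous) / previous * 100
        changes.append((week, current, pct))
    return changes


# Made-up sample rows standing in for a query result.
sample_rows = [
    (date(2025, 1, 6), 10.0),
    (date(2025, 1, 8), 5.0),
    (date(2025, 1, 13), 20.0),
    (date(2025, 1, 15), 10.0),
]

for week, credits, pct in week_over_week(weekly_totals(sample_rows)):
    change = "n/a" if pct is None else f"{pct:+.0f}%"
    print(f"{week[0]}-W{week[1]:02d}: {credits:.1f} credits ({change})")
```

Weeks with no rows at all are skipped rather than treated as zero, so a production version would likely fill those gaps before comparing.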

Governance Patterns for Cortex AI Spend

Establish these controls before scaling Cortex workloads: require team approval before switching to larger models, store embeddings in a table and only re-embed changed records, batch AI calls during off-peak hours instead of calling functions inline with user queries, run AI experiments on dedicated warehouses with resource monitors, and require cost impact analysis before any Cortex workload moves to production.
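The "only re-embed changed records" control is commonly implemented by storing a content hash next to each embedding. The following Python sketch shows one way to do that. The in-memory dict stands in for the embeddings table, and embed_fn is a hypothetical placeholder for whatever embedding call the pipeline uses; neither is a Snowflake API.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Vector = List[float]
# Stand-in for an embeddings table: record id -> (content hash, embedding).
EmbeddingStore = Dict[str, Tuple[str, Vector]]


def content_hash(text: str) -> str:
    """Stable fingerprint of a record's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def refresh_embeddings(
    records: Dict[str, str],
    store: EmbeddingStore,
    embed_fn: Callable[[str], Vector],
) -> List[str]:
    """Embed only new or changed records; return the ids that were embedded."""
    embedded = []
    for record_id, text in records.items():
        digest = content_hash(text)
        cached = store.get(record_id)
        if cached is not None and cached[0] == digest:
            continue  # Text unchanged since the last run, so no tokens are spent.
        store[record_id] = (digest, embed_fn(text))
        embedded.append(record_id)
    return embedded


# Hypothetical embedding function that records how often it is called.
calls: List[str] = []


def fake_embed(text: str) -> Vector:
    calls.append(text)
    return [float(len(text))]


store: EmbeddingStore = {}
first_run = refresh_embeddings({"a": "hello", "b": "world"}, store, fake_embed)
second_run = refresh_embeddings({"a": "hello", "b": "world!"}, store, fake_embed)
print(first_run, second_run, len(calls))
```

On the second run only the changed record is sent for embedding, so token spend tracks the amount of changed text rather than the size of the dataset.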

Monitor Cortex AI spend automatically with Anavsan

APEX detects Cortex credit anomalies, attributes spend by function and team, and alerts before AI experimentation costs spiral out of control.

Frequently Asked Questions

How does Snowflake Cortex billing work?
Snowflake Cortex AI functions bill per token processed, not per second. Token rates vary by function and model size. AI_COMPLETE with larger models costs more per token than embedding generation.

How do I track Cortex credit consumption?
Query ACCOUNT_USAGE.CORTEX_FUNCTIONS_USAGE_HISTORY to see Cortex consumption by function, user, and time window. Join with QUERY_HISTORY to trace which queries triggered the Cortex calls.

How can I reduce Cortex AI costs?
Cache embeddings to avoid re-processing stable data, batch AI calls instead of inline per-row processing, choose smaller models where accuracy allows, and enforce production review gates for new Cortex workloads.