Snowflake Cortex Services: A Practical Guide for Data Teams

TL;DR

A practical guide to every major Snowflake Cortex service (AI SQL functions, Search, Analyst, Document AI, Agents, Guard, Fine-Tuning, Cortex Code, and Snowflake Intelligence), with use cases and best practices.

The Cortex Services Landscape

Snowflake Cortex started as a small set of LLM functions you could call inside a SQL query. In 2026 it is a full AI services platform spanning text generation, semantic search, natural language to SQL, document extraction, agentic orchestration, safety filtering, and AI-assisted development. The innovation pace has been fast — fast enough that many data teams have adopted one or two services without fully understanding the rest of the landscape.

This guide covers every major Cortex service: what it does, who it is built for, when to use it over alternatives, and the practical considerations that matter when you move from experimentation to production. It is written for data engineers, data scientists, and analytics engineers who already work with Snowflake and want a clear, accurate reference — not a marketing overview.

Before diving into individual services, it helps to understand how the Cortex ecosystem is organized. There are three broad layers:

AI SQL Functions — AI capabilities you invoke directly inside SQL or Python. No setup, no infrastructure, no data movement. Call them in a SELECT statement the same way you call any other Snowflake function.
Managed AI Services — Purpose-built services you configure and deploy: Cortex Search for semantic retrieval, Cortex Analyst for natural language analytics, Document AI for document extraction. These run as background services with their own billing and lifecycle.
Agentic and Development Layer — Cortex Agents and Snowflake Intelligence for orchestrated AI workflows, and Cortex Code for AI-assisted development directly inside your Snowflake environment.

All Cortex services share one architectural principle: your data never leaves Snowflake's security perimeter. There are no external API calls, no data export, no API key management. Governance, RBAC, and row-level security apply to AI workloads the same way they apply to standard queries.

1. AI SQL Functions

What it is: The AI SQL function family is a set of LLM-powered functions callable directly in SQL. The core function is AI_COMPLETE(), which sends a prompt to an LLM and returns the response as a string — available inside any SELECT statement. The family has expanded significantly and now includes:

Function	What it does
`AI_COMPLETE()`	Text generation from any supported LLM
`AI_CLASSIFY()`	Multi-class categorization of text
`AI_EXTRACT()`	Entity and structured field extraction
`AI_SENTIMENT()`	Sentiment scoring
`AI_SUMMARIZE()`	Text summarization
`AI_TRANSLATE()`	Language translation
`AI_FILTER()`	Boolean filter on text conditions
`AI_REDACT()`	PII and sensitive data redaction
`AI_AGG()`	AI-powered aggregation across rows
`AI_EMBED()`	Generate vector embeddings

Who it is for: Data engineers building enrichment pipelines. Analytics engineers adding AI-derived columns to dbt models. Data scientists who want to prototype fast without leaving SQL.

Best use cases:

Batch enrichment at scale: classify, tag, or extract structured fields from large tables of unstructured text — support tickets, product reviews, contracts, CRM notes.
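PII redaction in pipelines: use AI_REDACT() to strip personal and sensitive data from text before it reaches downstream tables or consumers.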

Best practices:

Choose the right model for the task. AI_COMPLETE() supports multiple models including Claude Sonnet, Claude Opus, Llama, Mistral, and Snowflake Arctic. Smaller models (Llama 8B) cost significantly less per token and are sufficient for classification and extraction. Reserve larger models for complex reasoning tasks.
Batch your calls. AI SQL functions run per-row by default. On large tables, run them in batches using LIMIT and OFFSET or process incrementally via Dynamic Tables rather than full re-scans. Avoid running AI functions on tables with millions of rows without understanding the token math first.
Use AI_FILTER() to reduce the rows that hit expensive functions.
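The token math mentioned above can be estimated before anything runs. The following is a rough Python sketch, not Snowflake's billing formula: the four-characters-per-token ratio, the 200-character prompt overhead, and the rate of 1.0 credits per million tokens are all illustrative assumptions. Replace them with measured values and current pricing.

```python
def estimate_tokens(row_count, avg_chars_per_row, prompt_chars=200, chars_per_token=4):
    """Rough input-token estimate for a per-row AI function call."""
    tokens_per_row = (avg_chars_per_row + prompt_chars) / chars_per_token
    return int(row_count * tokens_per_row)


def estimate_cost(total_tokens, credits_per_million_tokens):
    """Convert a token estimate into credits at an assumed rate."""
    return total_tokens / 1_000_000 * credits_per_million_tokens


# Hypothetical table: 2 million support tickets averaging 1,000 characters each.
all_rows = estimate_tokens(row_count=2_000_000, avg_chars_per_row=1_000)
# The same table after a cheaper pre-filter keeps 10% of the rows.
filtered = estimate_tokens(row_count=200_000, avg_chars_per_row=1_000)

ASSUMED_RATE = 1.0  # credits per million tokens; placeholder, not a real price
print(all_rows, estimate_cost(all_rows, ASSUMED_RATE))
print(filtered, estimate_cost(filtered, ASSUMED_RATE))
```

Even with placeholder numbers, the comparison shows why filtering rows before an expensive function matters: the estimate scales linearly with the number of rows that reach the model.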

Documentation: sql-reference/functions/ai_complete · sql-reference/functions/ai_classify

2. Cortex Search

What it is: Cortex Search is a fully managed hybrid search service that combines keyword search with semantic vector search over your Snowflake data. You point it at a table, define a search column, and Snowflake handles embedding generation, index creation, storage, and serving — no external vector database required.

When a query comes in, Cortex Search runs both a traditional keyword match and a semantic similarity search, then fuses the results using a reciprocal rank fusion algorithm. This means it handles both exact term lookups ("invoice 2024-INV-0012") and conceptual queries ("recent billing issues with enterprise accounts") without configuration.
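To make the fusion step concrete, here is a minimal Python sketch of generic reciprocal rank fusion. It illustrates the technique in general, not Snowflake's internal implementation. The constant k=60 is a conventional default assumed here, and the document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by document ID for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["doc_invoice", "doc_billing", "doc_pricing"]   # hypothetical keyword ranking
semantic_hits = ["doc_billing", "doc_refunds", "doc_invoice"]  # hypothetical vector ranking
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, which is why a hybrid approach can serve exact lookups and conceptual queries from the same index.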

Who it is for: Data engineers building RAG (retrieval-augmented generation) pipelines. Developers building search-powered Streamlit apps or Cortex Agent workflows. Any team that needs intelligent retrieval over documents, knowledge bases, or unstructured text stored in Snowflake.

Best use cases:

Document retrieval for RAG pipelines: connecting Cortex Search to Cortex Agents to build chatbots or Q&A systems grounded in your Snowflake data.
Semantic search over support tickets, legal documents, HR policies, or product documentation where exact keyword matching is insufficient.

Best practices:

Set TARGET_LAG to match actual business needs. The default is 1 minute, which triggers continuous refresh and generates substantial serving compute cost even with zero queries. For most knowledge bases, 1 hour to 24 hours is appropriate.
Suspend development and staging search services when not in use. Cortex Search charges serving compute by GB per month regardless of query volume — an idle dev service with 20GB indexed continues billing continuously.
Use dedicated warehouses for Cortex Search refresh pipelines to isolate costs from general compute.
Use the ATTRIBUTES parameter to filter results at query time without retrieving and filtering all results in your application layer — this improves performance and reduces serving costs.
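The cost reasoning behind the first two practices can be sketched with simple arithmetic. This Python example is illustrative only. The rate of 6.0 credits per GB per month is a made-up placeholder, and the refresh count is an upper bound that assumes the source data changes continuously.

```python
def monthly_serving_credits(indexed_gb, credits_per_gb_month):
    """Serving cost accrues on indexed size, independent of query volume."""
    return indexed_gb * credits_per_gb_month


def max_refreshes_per_day(target_lag_minutes):
    """Upper bound on refresh cycles per day for a given TARGET_LAG."""
    return (24 * 60) // target_lag_minutes


ASSUMED_RATE = 6.0  # credits per GB per month; hypothetical placeholder

# An idle dev service with 20 GB indexed still accrues serving cost.
idle_dev = monthly_serving_credits(indexed_gb=20, credits_per_gb_month=ASSUMED_RATE)
print(idle_dev)

# Lag of 1 minute, 1 hour, and 24 hours respectively.
print(max_refreshes_per_day(1), max_refreshes_per_day(60), max_refreshes_per_day(24 * 60))
```

Moving from a 1-minute lag to a 1-hour lag cuts the ceiling on refresh cycles from 1,440 per day to 24, which is the main lever for a knowledge base that changes slowly.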

Documentation: user-guide/snowflake-cortex/cortex-search/cortex-search-overview

3. Cortex Analyst

What it is: Cortex Analyst converts natural language questions into accurate SQL queries against your Snowflake data. Unlike generic text-to-SQL tools, it uses a semantic model you define — a YAML file specifying your tables, columns, measures, dimensions, and business terminology — to generate syntactically and semantically correct SQL that reflects your actual data model.

The semantic model layer is what separates Cortex Analyst from generic LLM-based SQL generation. Instead of guessing that "revenue" means SUM(order_total), Cortex Analyst reads your semantic model and knows exactly which column, with which calculation logic, filtered by which conditions.

Who it is for: Data engineers who want to give non-technical stakeholders self-serve data access. Analytics engineers building semantic layers that feed AI-powered interfaces. Teams building internal analytics chatbots or embedded analytics products.

Best use cases:

Executive dashboards where business users can ask questions in plain English instead of requesting analyst support for every data pull.
Embedded Q&A in Streamlit apps — users ask "what was our top selling product last quarter in the Northeast?" and get the correct SQL-generated answer without knowing SQL.
Verified queries for common business questions: When you add a verified query to the semantic model, Cortex Analyst uses your pre-validated SQL for that exact question rather than generating new SQL. This improves accuracy and reduces token costs for frequently-asked questions.

Best practices:

Invest in the semantic model. The quality of Cortex Analyst output is directly proportional to the quality of the semantic model YAML. Define measures precisely, include synonyms for business terms, and document join relationships explicitly. A semantic model that takes a week to build correctly will return significantly better results than one written in an afternoon.
Use verified queries for high-frequency questions. A verified query pairs a question with SQL you have already validated, so Cortex Analyst can rely on that SQL instead of generating it from scratch. This is faster, cheaper, and more reliable for the questions users ask most.
Test with diverse users. Natural language is ambiguous. Questions that seem clear to the data team often have multiple valid interpretations. Test Cortex Analyst with actual business users, not just technical stakeholders, and iterate on the semantic model based on where it misinterprets intent.

Documentation: user-guide/snowflake-cortex/cortex-analyst

4. Document AI (AI_EXTRACT)

What it is: Document AI extracts structured data from unstructured documents — PDFs, images, Word files, contracts, invoices, forms, and scanned documents — using a multimodal AI model that understands both text and visual layout.

As of March 2026, the legacy Document AI UI and the !PREDICT method were decommissioned. The current interface is AI_EXTRACT(), which accepts a document reference and a schema definition specifying what fields to extract.

Who it is for: Data engineers building document processing pipelines. Teams in finance, legal, HR, and healthcare with high volumes of structured documents that currently require manual extraction. Anyone working with scanned documents, PDFs, or image-heavy files.

Best use cases:

Invoice processing: extract vendor name, invoice number, line items, totals, and due dates from PDF invoices at scale.
Contract analysis — extract key clauses, dates, obligations, and party names from legal documents without manual review.
Scanned form digitization — convert paper-based forms, applications, and records into structured database rows.

Best practices:

Validate extraction quality on a representative sample before running at scale. Document AI handles most standard document formats well, but accuracy varies by document quality, layout complexity, and field type. Run on 100–200 representative documents first and measure accuracy against known values.
Use structured output schemas. The more specific your extraction schema, the better the results. Instead of asking for "payment terms" as a free-text string, specify the expected format — "payment terms as an integer number of days" — to get consistent, queryable output.
Handle extraction failures gracefully. On degraded scan quality or unusual document layouts, AI_EXTRACT may return null or partial results. Build null-handling and confidence checking into your pipeline rather than assuming every extraction will succeed.
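The null-handling advice above can be sketched as a small post-processing step. This Python example assumes extraction results have already been loaded as plain dictionaries. The field names and sample rows are hypothetical and do not reflect the actual AI_EXTRACT output shape.

```python
def parse_payment_terms(raw):
    """Return payment terms as an integer number of days, or None if unusable."""
    if raw is None:
        return None
    try:
        days = int(str(raw).strip())
    except ValueError:
        return None
    return days if days > 0 else None


def split_results(extractions, required=("invoice_number", "total")):
    """Separate complete extractions from those that need manual review."""
    ok, review = [], []
    for row in extractions:
        missing = [f for f in required if row.get(f) in (None, "")]
        (review if missing else ok).append(row)
    return ok, review


# Hypothetical extraction output, shaped as plain dicts for illustration.
results = [
    {"invoice_number": "INV-001", "total": "1200.50", "payment_terms": "30"},
    {"invoice_number": None, "total": "88.00", "payment_terms": "net thirty"},
    {"invoice_number": "INV-003", "total": "", "payment_terms": None},
]
ok, review = split_results(results)
print(len(ok), len(review))
print([parse_payment_terms(r["payment_terms"]) for r in results])
```

Routing incomplete rows to a review queue, rather than dropping them or loading them silently, keeps the accuracy measurement from the sampling step honest once the pipeline runs at scale.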

Documentation: user-guide/snowflake-cortex/cortex-ai-features

5. Cortex Agents

What it is: Cortex Agents are Snowflake's orchestration layer for agentic AI workflows. An Agent receives a natural language request, creates a plan, breaks it into subtasks, routes each subtask to the appropriate tool — Cortex Analyst for structured data queries, Cortex Search for document retrieval, AI SQL functions for text processing — evaluates the results, and iterates until it reaches a final answer.

The architecture follows a Plan → Execute → Reflect loop. Agents can handle complex multi-step questions that require combining insights from structured tables and unstructured documents in the same workflow. They went generally available in November 2025.

Who it is for: Data engineers building enterprise AI applications that require more than a single LLM call. Teams building internal assistants, customer-facing AI, or automated analysis workflows that draw from multiple data sources.

Best use cases:

Enterprise Q&A systems that need to answer questions spanning both structured data ("what was Q3 revenue?") and unstructured sources ("what did the CEO say about pricing strategy in the board meeting notes?") in a single conversational response.
Automated reporting — an agent that pulls data from multiple tables, runs the relevant calculations, retrieves supporting context from documents, and produces a formatted summary without human intervention.
Agentic data pipelines — agents that monitor data quality, identify issues, and take corrective actions via MCP integrations with external systems.

Best practices:

Define tool boundaries explicitly. Give each tool in the agent's toolkit a precise description of what it does and what questions it handles. Vague tool descriptions lead to incorrect routing — the agent selecting Cortex Search for a question that should go to Cortex Analyst, or vice versa.
Keep agent scope focused. Agents that try to handle everything tend to perform worse than agents with a well-defined domain. Build multiple focused agents and route between them rather than one general-purpose agent that handles all question types.
Monitor token consumption per agent run. Agentic workflows generate significantly more tokens than single-shot LLM calls because each Plan → Execute → Reflect iteration processes the full conversation context. Set budget alerts and review CORTEX_AI_FUNCTIONS_USAGE_HISTORY regularly during development.
Test with adversarial inputs. Agents that perform well in happy path scenarios often behave unexpectedly with ambiguous questions, questions outside their domain, or questions that require the agent to acknowledge it doesn't know something. Test boundary conditions explicitly.
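The point about token consumption can be illustrated with a back-of-the-envelope model. This Python sketch assumes each iteration re-reads the entire accumulated context and appends a fixed number of tokens. Both are simplifications, and the token counts are hypothetical.

```python
def agent_run_input_tokens(base_context, tokens_added_per_step, steps):
    """Total input tokens when every step re-reads the accumulated context."""
    total = 0
    context = base_context
    for _ in range(steps):
        total += context                   # this step processes the full context so far
        context += tokens_added_per_step   # tool results and reasoning get appended
    return total


single_shot = agent_run_input_tokens(base_context=2_000, tokens_added_per_step=1_500, steps=1)
five_steps = agent_run_input_tokens(base_context=2_000, tokens_added_per_step=1_500, steps=5)
print(single_shot, five_steps)
```

Under these assumptions a five-iteration run consumes far more than five times the input of a single call, because the context grows with every step. That is the reason to set budget alerts early.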

Documentation: user-guide/snowflake-cortex/cortex-agents

6. Cortex Guard

What it is: Cortex Guard is Snowflake's built-in AI safety and content moderation layer. It screens both input prompts and model outputs for harmful content, jailbreak attempts, prompt injection attacks, and policy violations before they reach or leave the LLM. It runs as a filter that wraps around other Cortex AI calls.

Who it is for: Any team deploying Cortex AI capabilities to end users — internal or external. Particularly important for teams in regulated industries, teams building customer-facing AI products, and organizations with strict responsible AI policies.

Best use cases:

Prompt injection defense — when users can submit free-text inputs that get passed to an LLM, Cortex Guard screens those inputs before they reach the model, blocking attempts to override system prompts or extract sensitive information.
Output filtering for customer-facing applications — screen LLM responses before they are displayed to users to ensure they meet content and tone policies.
Compliance-sensitive environments — financial services, healthcare, and legal teams using Cortex AI where regulatory requirements mandate demonstrable content controls.

Best practices:

Enable Cortex Guard by default for any user-facing AI application. The cost is modest relative to the risk of unfiltered LLM output reaching customers or employees.
Layer it with your own application-level validation. Cortex Guard handles broad safety categories. Domain-specific policy enforcement — ensuring an AI assistant does not give medical advice or investment recommendations — still requires application-level prompt engineering and output validation.

Documentation: user-guide/snowflake-cortex/cortex-guard

7. Cortex Fine-Tuning

What it is: Cortex Fine-Tuning lets you adapt supported open-source models — including select Llama variants — on your own Snowflake data without exporting it. You provide a training dataset of prompt-completion pairs stored in a Snowflake table, define the fine-tuning job, and Snowflake manages the training infrastructure, producing a custom model endpoint billed per token at inference time.

Who it is for: Data science teams who need domain-specific model behavior that general-purpose models don't provide out of the box. Teams with specialized vocabulary, classification schemas, or output format requirements that prompt engineering alone cannot achieve reliably.

Best use cases:

Domain-specific classification — training a model on your company's specific product taxonomy, support category schema, or industry terminology where general models make systematic errors.
Consistent output format — when you need model output in a specific structured format every time and few-shot prompting produces inconsistent results.
Specialized language — legal, medical, or technical domains with vocabulary and reasoning patterns underrepresented in general training data.

Best practices:

Start with prompt engineering before fine-tuning. Fine-tuning requires labeled training data, adds operational complexity, and produces a model that requires ongoing maintenance. Many use cases that seem to require fine-tuning can be solved with a well-designed system prompt and a few examples. Only move to fine-tuning when prompt engineering has a clear, measurable ceiling.
Use a held-out evaluation set. Split your training data into train, validation, and test sets before starting. Evaluate the fine-tuned model against the test set on the same metrics you care about in production — not just loss curves.
Plan for retraining. Fine-tuned models can degrade over time as your domain evolves, so build a retraining schedule into your model lifecycle from the start.
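A held-out split of the kind described above can be sketched in a few lines of Python. The 80/10/10 proportions, the fixed seed, and the sample data are illustrative assumptions. The accuracy function stands in for whichever production metric applies.

```python
import random


def split_dataset(pairs, train=0.8, validation=0.1, seed=42):
    """Shuffle prompt-completion pairs and split into train/validation/test."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # the remainder is the held-out test set
    )


def accuracy(predictions, expected):
    """Share of exact matches between predicted and expected labels."""
    matches = sum(p == e for p, e in zip(predictions, expected))
    return matches / len(expected)


# Hypothetical labeled data: (prompt, completion) tuples.
pairs = [(f"ticket text {i}", "billing" if i % 2 else "shipping") for i in range(100)]
train_set, val_set, test_set = split_dataset(pairs)
print(len(train_set), len(val_set), len(test_set))
```

Fixing the seed keeps the test set stable between training runs, so a later retrained model is compared against the same held-out examples.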

Documentation: user-guide/snowflake-cortex/cortex-finetuning

8. Cortex Code

What it is: Cortex Code is Snowflake's AI coding agent for data teams. Launched at BUILD London in February 2026, it is a Snowflake-native agent that understands your data catalog, schema, governance model, semantic layer, and RBAC — not just SQL and Python syntax. You interact with it through Snowsight (the web interface) or the CLI.

Unlike generic coding assistants, Cortex Code has read access to your actual Snowflake environment. It knows which tables exist, what the column names and types are, which roles have access to what, and how your existing pipelines are structured. This context is what makes its suggestions production-relevant rather than generic.

As of 2026, more than 50% of Snowflake customers use Cortex Code in some form.

Who it is for: Data engineers writing and optimizing SQL pipelines. Analytics engineers building dbt models. Data scientists writing Python for Snowpark. Platform engineers managing configurations, permissions, and cost governance. Effectively anyone who writes code or runs queries in Snowflake regularly.

Best use cases:

SQL pipeline development: writing, debugging, and refactoring complex queries with full awareness of your actual table structures. Example prompt: "Write a query that identifies customers who purchased in Q1 but not Q2, and include their most recent order date and total lifetime value." Cortex Code reads your actual schema and generates production-ready SQL, not a generic template.
dbt model generation: describing a transformation in plain English and getting a complete dbt model with tests and documentation. Example prompt: "Create a dbt model that calculates 30-day rolling average revenue by product category, using our existing orders and products tables."
Cost optimization queries: asking Cortex Code to identify expensive warehouses, poorly-performing queries, or governance gaps without manually writing account_usage SQL. Example prompt: "Which 5 service types are using the most credits in the last 30 days? Show me broken down by day."
Legacy code migration: analyzing existing stored procedures, views, or scripts and migrating them to modern Snowflake patterns.

Best practices:

Understand the session token economics. Longer sessions are cheaper per turn because Snowflake caches context between turns — subsequent turns in the same session are billed at approximately 10% of the first turn's input token rate. Keep related tasks in one session rather than opening a new one for each question.
Set per-user daily credit limits before broad rollout. Cortex Code in Snowsight (CORTEX_CODE_SNOWSIGHT) is accessible to every enabled user with a few clicks. Without limits, a team of 20 engineers exploring the tool simultaneously can generate substantial token spend in a single day. The account parameter CORTEX_CODE_SNOWSIGHT_DAILY_EST_CREDIT_LIMIT_PER_USER, set with ALTER ACCOUNT SET, applies the cap.
Review Cortex Code suggestions before running on production. The suggestions are context-aware and usually good — but always review before executing, particularly for DDL operations, DELETE statements, or permission changes.
Use the AGENTS.md framework to provide project context. For teams with established conventions, patterns, or constraints, document them in AGENTS.md so Cortex Code incorporates them into every suggestion without prompting.
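The session economics described in the first practice above can be worked through numerically. This Python sketch takes the roughly 10% cached rate from the text and assumes, as a simplification, that every turn carries about the same input size. The 10,000-token first turn is a hypothetical figure.

```python
def session_input_cost(turns, first_turn_tokens, cached_rate=0.10):
    """Billable input-token equivalent for one session of `turns` turns.

    The first turn is billed in full; each later turn is billed at roughly
    `cached_rate` of the first turn's input.
    """
    if turns < 1:
        return 0.0
    return first_turn_tokens * (1 + cached_rate * (turns - 1))


FIRST_TURN = 10_000  # hypothetical input tokens for the opening turn

# Five related questions in one session versus five separate sessions.
one_session = session_input_cost(turns=5, first_turn_tokens=FIRST_TURN)
five_sessions = 5 * session_input_cost(turns=1, first_turn_tokens=FIRST_TURN)
print(one_session, five_sessions)
```

Under these assumptions, five questions in a single session cost well under a third of what five fresh sessions cost. That is the case for keeping related tasks together.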

Documentation: user-guide/cortex-code/overview

9. Snowflake Intelligence

What it is: Snowflake Intelligence is the conversational AI interface layer built on top of Cortex Agents and the rest of the Cortex stack. It is the UI-level product that makes agentic AI accessible to business users — analysts, operations teams, and executives — without requiring them to interact with APIs or code.

Through Snowflake Intelligence, business users ask questions in plain English, and the underlying agent infrastructure orchestrates the retrieval, analysis, and response. Cortex Code uses SNOWFLAKE_INTELLIGENCE as a source when routing requests through the Intelligence interface.

Who it is for: Business users who need data access without technical assistance. Data teams who want to deploy AI-powered analytics to stakeholders without building a custom application. Organizations embedding AI into their business workflows.

Documentation: user-guide/snowflake-intelligence

Choosing the Right Cortex Service

The services above address different problems. This decision guide helps narrow the choice:

I have a large table of text and need to classify, extract, or enrich it: → AI SQL Functions (AI_CLASSIFY, AI_EXTRACT, AI_COMPLETE)
I need users to search through documents or knowledge bases using natural language: → Cortex Search
I want business users to query structured data using plain English: → Cortex Analyst
I need to extract structured fields from PDFs, images, or scanned documents: → Document AI (AI_EXTRACT with document input)
I need an AI system that combines structured data queries and document retrieval in one workflow: → Cortex Agents
I'm building a user-facing AI application and need content safety controls: → Cortex Guard (layer on top of whichever service you are using)
I need a model that behaves specifically for my domain and prompt engineering isn't sufficient: → Cortex Fine-Tuning
I need help writing, debugging, or optimizing SQL, Python, or pipeline code inside Snowflake: → Cortex Code
I want to give non-technical stakeholders a conversational interface to all of the above: → Snowflake Intelligence

A Note on Cost Visibility Across Services

Each Cortex service has its own billing model and its own set of account usage views. The cost signal for AI SQL Functions lives in CORTEX_AI_FUNCTIONS_USAGE_HISTORY. Cortex Code has separate views for CLI and Snowsight. Cortex Search serving costs appear in CORTEX_SEARCH_SERVING_USAGE_HISTORY. Cortex Agents roll up into AI_SERVICES in your service spend.

The practical implication: a team watching only warehouse costs will miss all of it. A team watching only AI_SERVICES will see the total but not the breakdown by model, source, or service type. Building a complete picture of Cortex cost requires joining across multiple usage views — or having a dedicated layer that surfaces model-level attribution and service breakdowns in one place.

For teams who want that consolidated view across accounts and services, Anavsan's Cortex visibility surfaces this — breaking down AI_SERVICES and CORTEX_CODE_SNOWSIGHT by model name, source (CORTEX_AGENT, SNOWFLAKE_INTELLIGENCE), token count, and cost in either credits or USD across configurable date ranges.

Full breakdown on Cortex billing, spike patterns, and governance controls: Snowflake Cortex cost guide

Frequently Asked Questions

What is Snowflake Cortex?

Snowflake Cortex is a suite of AI and ML services running natively inside the Snowflake Data Cloud. It includes AI SQL functions for text enrichment, Cortex Search for semantic retrieval, Cortex Analyst for natural language to SQL, Document AI for document extraction, Cortex Agents for orchestrated workflows, Cortex Guard for safety filtering, Cortex Fine-Tuning for custom models, Cortex Code for AI-assisted development, and Snowflake Intelligence as the conversational business interface. All services operate within Snowflake's security perimeter, so no data leaves the environment.

What is the difference between Cortex Analyst and Cortex Search?

Cortex Analyst converts natural language into SQL queries against structured data. Cortex Search retrieves relevant content from unstructured text and documents using semantic search. They serve different retrieval patterns and are often used together in Cortex Agent workflows: Analyst for structured data questions, Search for document-based questions.

What is the difference between Cortex Agents and Cortex Code?

Cortex Agents are for building AI applications that orchestrate multiple data retrieval and processing steps in response to a user question. Cortex Code is an AI coding assistant for data engineers and analysts writing SQL, Python, and pipeline code inside Snowflake. They are complementary: you might use Cortex Code to build the pipelines that Cortex Agents query.

When should I fine-tune instead of relying on prompt engineering?

Start with prompt engineering. Fine-tuning requires labeled training data, adds model lifecycle overhead, and is often unnecessary if the task can be solved with a well-designed system prompt and examples. Fine-tune when you have a domain-specific vocabulary or output format requirement that prompt engineering consistently fails to meet, and you have sufficient labeled data to train on.

How do I control Cortex costs?

For Cortex Code, set per-user daily credit limits using ALTER ACCOUNT SET CORTEX_CODE_SNOWSIGHT_DAILY_EST_CREDIT_LIMIT_PER_USER. For Cortex Search, set TARGET_LAG to match actual data freshness needs and suspend dev services when not in use. For AI SQL Functions, use AI_FILTER to reduce the rows that hit expensive functions. Use CORTEX_AI_FUNCTIONS_USAGE_HISTORY, CORTEX_CODE_SNOWSIGHT_USAGE_HISTORY, and CORTEX_SEARCH_SERVING_USAGE_HISTORY to monitor consumption by service and user.

Which Snowflake editions include Cortex services?

Most Cortex services are available on Enterprise Edition and above. Some features have Business Critical requirements. Check Snowflake's current feature availability matrix, as this changes with each release. See Snowflake Cortex availability.
