How Query Spillage Increases Snowflake Costs
Abinash, Snowflake Developer & Data Engineer @ Anavsan

Query spillage happens when a Snowflake query runs out of memory and spills data to local disk or remote storage. This slows execution, keeps warehouses running longer, and increases credit consumption. The fix is to identify recurring spilling queries, reduce unnecessary data processing, optimize joins and aggregations, and use the right warehouse size for the workload.
Snowflake cost governance is often discussed at the account, warehouse, or monthly bill level. But many cost problems begin much deeper: inside individual workloads.
One common example is query spillage.
Query spillage happens when a Snowflake query needs more memory than the virtual warehouse can provide during execution. When this happens, Snowflake temporarily writes intermediate data to local disk. If the query needs even more space, it can spill to remote cloud storage.
That sounds technical, but the cost impact is simple:
When a query spills, it usually runs longer. When it runs longer, the warehouse stays active longer. When the warehouse stays active longer, more credits are consumed.
This is why query spillage is not just a performance issue. It is a Snowflake cost governance issue.
What is query spillage in Snowflake?
A Snowflake virtual warehouse has compute, memory, and local storage resources available while it processes queries. Efficient queries complete most of their work in memory, which is usually the fastest path.
But some queries create large intermediate results. This can happen during joins, sorts, aggregations, window functions, or large scans. When the intermediate data does not fit in memory, Snowflake starts spilling that data to storage.
There are two common types of spillage:
Local disk spillage: The query spills data to local storage attached to the warehouse.
Remote storage spillage: The query spills data to remote cloud storage because local resources are not enough.
Remote spillage is usually more concerning because accessing remote cloud storage is slower than accessing memory or local disk. That means the query may spend more time waiting on data movement instead of doing useful processing.
Why does query spillage increase Snowflake costs?
Snowflake bills warehouse compute by warehouse size and by how long the warehouse runs, metered per second with a 60-second minimum each time the warehouse starts or resumes. This means runtime matters.
A query that finishes in 30 seconds consumes less compute time than the same query running for 10 minutes. If spillage causes the query to run longer, the warehouse remains active for longer. That extra runtime becomes additional credit consumption.
This is the core cost pattern:
Memory pressure → disk or remote spillage → longer execution time → more warehouse runtime → higher Snowflake credits.
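To put rough numbers on this pattern, here is a minimal Python sketch of the arithmetic. It assumes the standard warehouse credit rates (an X-Small uses 1 credit per hour and each size up doubles that) and per-second billing with a 60-second minimum. It also makes a simplifying assumption that the query is the only thing keeping the warehouse running, so it ignores auto-suspend delays and concurrent queries. The function name and the runtimes are illustrative, not measurements.

```python
# Credits per hour for standard Snowflake warehouses; each size doubles the rate.
CREDITS_PER_HOUR = {"XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8, "XLARGE": 16}

MIN_BILLED_SECONDS = 60  # per-second billing, 60-second minimum per start or resume


def credits_for_runtime(size: str, runtime_seconds: float) -> float:
    """Estimate credits for one warehouse session of the given length.

    Simplifying assumption: warehouse uptime equals query runtime.
    """
    billed_seconds = max(runtime_seconds, MIN_BILLED_SECONDS)
    return CREDITS_PER_HOUR[size] * billed_seconds / 3600


# Hypothetical runtimes, matching the 30-second and 10-minute example above.
in_memory = credits_for_runtime("MEDIUM", 30)   # billed as the 60-second minimum
spilling = credits_for_runtime("MEDIUM", 600)   # 10 minutes

print(f"in-memory: {in_memory:.4f} credits, spilling: {spilling:.4f} credits")
print(f"ratio: {spilling / in_memory:.0f}x")
```

Under these assumptions a single run costs a fraction of a credit either way. The difference becomes material when the same spilling query runs every hour or behind every dashboard refresh.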
For data teams, the mistake is treating this as “just a slow query.” A slow query is often a cost signal. It may indicate that the workload is processing too much data, using inefficient joins, sorting large intermediate results, or running on a warehouse that is not suited for that workload type.
Common causes of query spillage
Query spillage can come from several workload patterns. The most common include:
1. Large joins
Joins can create large intermediate datasets, especially when join keys are not selective or when one side of the join contains many duplicate values. If the query produces a much larger result set than expected, memory usage can increase quickly.
2. Heavy sorting
Queries with large ORDER BY operations can spill if Snowflake needs to sort more data than the warehouse can comfortably handle in memory.
3. Expensive aggregations
Large GROUP BY operations can create high memory pressure, especially when the grouping columns have high cardinality or the source table is very large.
4. Wide table scans
Using SELECT * on wide tables can increase the amount of data processed unnecessarily. Even if downstream logic only needs a few columns, the query may still scan and carry extra data through execution.
5. Missing filters
Queries that scan entire tables instead of filtering early create more work for the warehouse. More scanned data can lead to larger intermediate results and higher spill risk.
6. Repeated inefficient workloads
A one-time expensive query is a problem. A repeated expensive query is a governance issue. If the same spilling query runs every hour, every day, or as part of a dashboard refresh, the cost impact compounds.
Why query spillage is a workload governance problem
Many teams manage Snowflake costs by looking at warehouses: which warehouse spent the most credits, which one ran the longest, or which department owns it.
That is useful, but it is not enough.
Warehouse-level visibility tells you where credits were consumed. It does not always explain why. To control costs properly, teams need to connect warehouse consumption back to workload behavior.
For example, a warehouse may look expensive because:
One query spills to remote storage every night.
A dashboard refresh triggers multiple heavy queries at the same time.
A dbt model performs full refreshes instead of incremental updates.
A query scans all partitions because filters are applied too late.
A workload uses a small warehouse for a memory-heavy job, causing long execution times.
In each case, the cost issue is not simply “the warehouse is expensive.” The real issue is the workload pattern running on that warehouse.
That is why modern Snowflake cost governance needs to move from cost visibility to workload accountability.
How to identify query spillage
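The first step is to look for queries that spill to local or remote storage. The Query Profile shows spillage statistics for an individual query, and the QUERY_HISTORY view in the ACCOUNT_USAGE schema records BYTES_SPILLED_TO_LOCAL_STORAGE and BYTES_SPILLED_TO_REMOTE_STORAGE for each query, which makes it possible to search across the whole account.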
Teams should look for signals such as:
Bytes spilled to local storage
Bytes spilled to remote storage
Long execution time
High scanned bytes
Large intermediate result sets
Repeated query patterns
Warehouses that stay active longer than expected
Remote spillage should be prioritized because it is usually a stronger signal of memory pressure and inefficient execution.
It is also important to review frequency. A query that spills once may not deserve immediate action. A query that spills every day, or every dashboard refresh, is a better optimization candidate.
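The prioritization described above can be sketched in a few lines of Python. The sketch assumes query history rows have already been exported as dictionaries whose keys mirror QUERY_HISTORY column names (`query_parameterized_hash` is used here to group runs of the same query pattern, and `total_elapsed_time` is in milliseconds). The sample rows, warehouse names, and the `rank_spilling_queries` helper are hypothetical, and the sort order is one reasonable choice rather than a fixed rule.

```python
from collections import defaultdict

# Hypothetical sample rows; keys mirror QUERY_HISTORY column names.
rows = [
    {"query_parameterized_hash": "a1", "warehouse_name": "ETL_WH",
     "bytes_spilled_to_local_storage": 5_000_000_000,
     "bytes_spilled_to_remote_storage": 2_000_000_000,
     "total_elapsed_time": 540_000},
    {"query_parameterized_hash": "a1", "warehouse_name": "ETL_WH",
     "bytes_spilled_to_local_storage": 4_000_000_000,
     "bytes_spilled_to_remote_storage": 1_000_000_000,
     "total_elapsed_time": 480_000},
    {"query_parameterized_hash": "b2", "warehouse_name": "BI_WH",
     "bytes_spilled_to_local_storage": 300_000_000,
     "bytes_spilled_to_remote_storage": 0,
     "total_elapsed_time": 45_000},
    {"query_parameterized_hash": "c3", "warehouse_name": "BI_WH",
     "bytes_spilled_to_local_storage": 0,
     "bytes_spilled_to_remote_storage": 0,
     "total_elapsed_time": 2_000},
]


def rank_spilling_queries(rows):
    """Group runs by query pattern and warehouse, keeping only runs that spilled."""
    groups = defaultdict(
        lambda: {"runs": 0, "local_bytes": 0, "remote_bytes": 0, "elapsed_ms": 0}
    )
    for row in rows:
        local = row["bytes_spilled_to_local_storage"]
        remote = row["bytes_spilled_to_remote_storage"]
        if local == 0 and remote == 0:
            continue  # this run stayed in memory
        stats = groups[(row["query_parameterized_hash"], row["warehouse_name"])]
        stats["runs"] += 1
        stats["local_bytes"] += local
        stats["remote_bytes"] += remote
        stats["elapsed_ms"] += row["total_elapsed_time"]
    # Remote spillage first, then how often the pattern spills, then total runtime.
    return sorted(
        groups.items(),
        key=lambda item: (item[1]["remote_bytes"], item[1]["runs"], item[1]["elapsed_ms"]),
        reverse=True,
    )


ranked = rank_spilling_queries(rows)
for (query_hash, warehouse), stats in ranked:
    print(query_hash, warehouse, stats)
```

Grouping by query pattern rather than by individual query ID is what surfaces repeat offenders: one entry per recurring workload, with its run count and cumulative runtime.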
How to reduce query spillage
There are two broad ways to reduce spillage: optimize the query or adjust the warehouse.
1. Reduce the amount of data processed
Start by asking whether the query is processing more data than necessary.
Can you apply filters earlier?
Can you avoid SELECT *?
Can you remove unused columns?
Can you pre-aggregate data before joining?
Can you deduplicate a subquery before passing it into the outer query?
Smaller intermediate data usually means less memory pressure.
2. Improve joins and aggregations
Review large joins carefully. If a subquery returns many duplicate keys, the outer query may do unnecessary work. Adding DISTINCT or using a cleaner aggregation pattern can reduce the number of rows passed into later stages.
For aggregations, check whether the grouping level is necessary. Sometimes teams group by too many columns or aggregate at a lower level than the business question requires.
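To see why duplicate keys matter, consider how many rows an inner join produces: for each key, the number of matching rows on the left multiplied by the number on the right. The Python sketch below counts join output rows for a made-up dataset, first with a subquery that returns each key many times and then with the keys deduplicated, which is the effect DISTINCT would have. The data and the `join_output_rows` helper are illustrative only.

```python
from collections import Counter


def join_output_rows(left_keys, right_keys):
    """Row count produced by an inner equi-join on the given key columns."""
    left_counts = Counter(left_keys)
    right_counts = Counter(right_keys)
    # Each key contributes (left matches) x (right matches) output rows.
    return sum(n * right_counts[key] for key, n in left_counts.items())


# Hypothetical data: 1,000 orders spread evenly over 10 customers, joined to a
# customer subquery that accidentally returns each customer 50 times.
orders = [i % 10 for i in range(1000)]
customers_with_duplicates = [c for c in range(10) for _ in range(50)]

before = join_output_rows(orders, customers_with_duplicates)
after = join_output_rows(orders, set(customers_with_duplicates))  # like DISTINCT

print(before, after)  # 50000 1000
```

The deduplicated join carries 50 times fewer rows into later stages, which is the kind of reduction in intermediate data that lowers memory pressure.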
3. Use the right warehouse for the workload
Not every query should run on the same warehouse.
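A small warehouse is cheaper per hour, but if a memory-heavy query spills heavily and runs much longer, the total cost may not actually be lower. In some cases, running the workload on a larger warehouse can complete the job faster and reduce spillage. Because each step up in warehouse size doubles the credits consumed per hour, the larger warehouse costs less overall only when it cuts the runtime by more than half.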
The goal is not always to use the smallest warehouse. The goal is to match the warehouse to the workload.
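A quick comparison makes the trade-off concrete. The Python sketch below uses the standard credit rates (Small 2, Medium 4, Large 8 credits per hour) and entirely hypothetical runtimes for the same memory-heavy job. Real runtimes have to be measured, since the effect of a larger warehouse on spillage depends on the workload.

```python
CREDITS_PER_HOUR = {"SMALL": 2, "MEDIUM": 4, "LARGE": 8}


def run_credits(size: str, runtime_minutes: float) -> float:
    """Credits for one run, assuming the warehouse runs only for this job."""
    return CREDITS_PER_HOUR[size] * runtime_minutes / 60


# Hypothetical measurements for the same job on three warehouse sizes.
scenarios = {
    "SMALL, heavy remote spillage": run_credits("SMALL", 40),
    "MEDIUM, no spillage": run_credits("MEDIUM", 12),
    "LARGE, no spillage": run_credits("LARGE", 10),
}

for name, credits in scenarios.items():
    print(f"{name}: {credits:.2f} credits")

cheapest = min(scenarios, key=scenarios.get)
print("cheapest:", cheapest)
```

With these made-up numbers the Medium warehouse is cheapest: removing spillage cuts the runtime by far more than half. The Large warehouse finishes only slightly faster than the Medium, so it costs as much as the spilling Small run. Sizing up pays off only while the runtime keeps dropping faster than the rate doubles.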
4. Separate workload classes
Interactive analytics, BI dashboards, batch transformations, and heavy modeling jobs have different performance needs. If they all run on the same warehouse, one spilling workload can slow down other queries and keep the warehouse busy longer.
Separating workload classes can make cost attribution and tuning easier.
5. Monitor repeated offenders
Spillage governance should not be a one-time cleanup exercise. Teams should continuously monitor the queries that spill most often, consume the most credits, or appear in recurring workflows.
A good governance process should answer:
Which queries are spilling?
Who owns them?
Which warehouse do they run on?
How often do they run?
How much runtime do they add?
What optimization should be attempted first?
Did the fix actually reduce credits?
What Snowflake teams should do next
Query spillage is one of the clearest examples of why Snowflake cost optimization must happen at the workload level.
A monthly bill can tell you that costs increased. A warehouse dashboard can tell you where credits were consumed. But query-level analysis tells you what actually caused the spend.
If your team wants to reduce Snowflake costs, start by finding queries that spill to disk or remote storage. Then prioritize the ones that repeat often, run on important pipelines, or sit behind business-critical dashboards.
The best outcome is not just a faster query. It is a governance loop where expensive workload patterns are detected, assigned, fixed, and measured.
If your team wants to move from Snowflake cost visibility to workload accountability, Anavsan helps detect cost-heavy workloads, assign ownership, and track optimization impact across queries, warehouses, storage, and AI services.
FAQ
What is query spillage in Snowflake?
Query spillage happens when a query needs more memory than the virtual warehouse can provide. Snowflake then writes intermediate data to local disk or remote cloud storage during execution.
Why does query spillage make Snowflake queries slower?
Queries are fastest when work can be completed in memory. When data spills to disk or remote storage, Snowflake has to read and write intermediate data outside memory, which increases execution time.
Does query spillage increase Snowflake credits?
Yes, it can. Snowflake warehouses consume credits while they are running. If spillage makes a query run longer, the warehouse remains active longer, which can increase credit consumption.
Is remote spillage worse than local spillage?
Usually, yes. Remote spillage is typically slower than local disk spillage because the data has to be written to and read back from remote cloud storage. It is often a stronger signal that the query or warehouse sizing needs attention.
Should I always increase the warehouse size to fix spillage?
No. Increasing warehouse size can help for some memory-heavy workloads, but it should not be the default fix. First check whether the query can process less data, filter earlier, avoid unnecessary columns, or reduce large intermediate results.
How can teams govern query spillage over time?
Teams should monitor query history, identify recurring spilling queries, assign ownership, prioritize fixes, and measure whether the optimization reduced execution time and credits. Query spillage should be treated as part of workload-level cost governance, not just one-off troubleshooting.