Pro Tips
Top 10 Most Expensive Snowflake Query Patterns & Fixes
Sep 28, 2025
Snowflake's pay-per-second billing for compute resources is a dream for scalability, but it can quickly become a nightmare for your budget if queries aren't optimized. Many data professionals find themselves staring at unexpectedly high bills, unaware of the specific SQL patterns that are secretly draining their credits.
This post will pull back the curtain on the top 10 most expensive Snowflake query patterns we frequently observe. More importantly, we'll equip you with immediate, actionable fixes to help you rein in those costs and boost performance. Understanding these patterns is the first step towards a more cost-efficient and performant data ecosystem.
The Culprits: Expensive Query Patterns & Their Remedies
Here are the common offenders that inflate your Snowflake bill, along with practical solutions.
1. Full Table Scans on Large Tables Without Filters
The Pattern:
SELECT * FROM very_large_table; or SELECT column FROM another_huge_table WHERE 1=1;
Why it's Expensive: Snowflake bills for warehouse compute time, and that time scales with the amount of data scanned. A full table scan on a multi-terabyte table forces the warehouse to read every micro-partition, even if you only need a few rows.
The Fix:
Always Filter: Use WHERE clauses to drastically reduce the data scanned (WHERE date_column = '2023-01-01').
Leverage Clustering Keys: If your tables are clustered, filtering on the clustering key makes scans highly efficient.
Avoid SELECT *: Only select the columns you actually need. A before/after sketch follows this list.
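Here is a minimal before/after sketch; the column names (order_id, customer_id, amount, order_date) are illustrative assumptions, and the benefit assumes order_date lines up reasonably well with how the data is clustered or loaded:
-- Expensive: reads every micro-partition and every column
SELECT * FROM very_large_table;
-- Cheaper: the filter prunes micro-partitions, and naming columns avoids reading the rest
SELECT order_id, customer_id, amount
FROM very_large_table
WHERE order_date = '2023-01-01';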
2. Excessive ORDER BY with LIMIT on Unclustered Data
The Pattern:
SELECT * FROM large_table ORDER BY unclustered_column DESC LIMIT 100;
Why it's Expensive: Snowflake has to sort the entire dataset before it can pick the top 100, especially if the data isn't naturally sorted (clustered) on that column. This is a highly compute-intensive operation.
The Fix:
Cluster Tables: If you ORDER BY the same column frequently, consider clustering your table on it.
Filter First: Apply strong filters before sorting to reduce the dataset size.
Push Down Sorting: If possible, sort in a preceding CTE on a smaller subset, as sketched below.
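A minimal sketch of filtering before sorting; the CTE, the seven-day window, and the column names are illustrative assumptions:
-- Expensive: sorts the whole table just to keep 100 rows
SELECT * FROM large_table ORDER BY unclustered_column DESC LIMIT 100;
-- Cheaper: shrink the input first, then sort the much smaller subset
WITH recent AS (
    SELECT id, unclustered_column, amount
    FROM large_table
    WHERE event_date >= DATEADD('day', -7, CURRENT_DATE)
)
SELECT * FROM recent
ORDER BY unclustered_column DESC
LIMIT 100;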
3. Cross Joins (Cartesian Products)
The Pattern:
SELECT * FROM table_A, table_B; or SELECT * FROM table_A CROSS JOIN table_B; (without an explicit ON clause).
Why it's Expensive: It generates every possible combination of rows between the two tables. If table_A has N rows and table_B has M rows, the result is N*M rows, which explodes data volume and compute.
The Fix:
Always Use ON: Specify join conditions (ON table_A.id = table_B.id) to link related rows, as in the sketch below.
Review WHERE Clauses for Implicit Joins: Ensure your WHERE clause isn't accidentally creating a cross join when you intended an inner join.
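A quick sketch of the difference; the linking column b.a_id and the status filter are illustrative assumptions:
-- Accidental Cartesian product: nothing links the two tables, so every row pairs with every row
SELECT * FROM table_A, table_B WHERE table_A.status = 'active';
-- Intended inner join: the ON clause pairs only the related rows
SELECT a.id, a.status, b.order_total
FROM table_A a
JOIN table_B b ON b.a_id = a.id
WHERE a.status = 'active';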
4. Complex UNION or UNION ALL on Large Datasets
The Pattern: Combining multiple large SELECT statements with UNION or UNION ALL.
Why it's Expensive: Each SELECT statement runs independently, and then the results are merged. UNION also has to remove duplicates across the combined result (effectively a global DISTINCT), adding more compute overhead than UNION ALL.
The Fix:
Use UNION ALL When Possible: If duplicate rows are acceptable, UNION ALL avoids the expensive de-duplication step (see the sketch below).
Optimize Each Subquery: Ensure each SELECT statement is as efficient as possible (filters, clustering).
Materialize Intermediates: For very complex unions, consider creating temporary tables or CTEs to break down the problem.
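A minimal comparison; the orders_2022 and orders_2023 tables are illustrative assumptions:
-- UNION: merges the two results and then removes duplicates across all of them
SELECT customer_id FROM orders_2022
UNION
SELECT customer_id FROM orders_2023;
-- UNION ALL: simply concatenates the results; use it whenever duplicates are acceptable
SELECT customer_id FROM orders_2022
UNION ALL
SELECT customer_id FROM orders_2023;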
5. Unnecessary DISTINCT on Large Columns or Datasets
The Pattern:
SELECT DISTINCT large_text_column FROM huge_table; or COUNT(DISTINCT very_wide_column)
Why it's Expensive: Snowflake has to read all the data and then perform a global sort and comparison to identify unique values across the entire result set.
The Fix:
Only Use DISTINCT When Essential: Are you sure you need distinct values at this stage?
Filter First: Reduce the dataset before applying DISTINCT.
Consider Approximate Functions: For COUNT(DISTINCT), APPROX_COUNT_DISTINCT() is much faster and cheaper if an estimate is acceptable; see the example below.
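A minimal example of swapping in the approximate version; user_id and huge_table are illustrative, and APPROX_COUNT_DISTINCT returns an estimate rather than an exact count:
-- Exact but expensive: every value must be de-duplicated globally
SELECT COUNT(DISTINCT user_id) FROM huge_table;
-- Approximate and far cheaper: a HyperLogLog-based estimate
SELECT APPROX_COUNT_DISTINCT(user_id) FROM huge_table;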
6. Suboptimal Window Functions (e.g., ROW_NUMBER() OVER (ORDER BY ...) without partitions)
The Pattern:
SELECT ..., ROW_NUMBER() OVER (ORDER BY some_column) FROM large_table;
Why it's Expensive: Without a PARTITION BY clause, the window function treats the entire dataset as a single partition. This requires a global sort, which is very costly.
The Fix:
Use PARTITION BY: Define logical partitions (PARTITION BY customer_id) so window functions operate on smaller, more manageable subsets, as sketched below.
Optimize ORDER BY Within Partitions: Ensure the ordering within partitions is efficient.
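A minimal before/after sketch; event_id, event_time, and customer_id are illustrative columns:
-- Expensive: one giant partition, so the whole table is sorted globally
SELECT event_id,
       ROW_NUMBER() OVER (ORDER BY event_time) AS rn
FROM large_table;
-- Cheaper: each customer's rows are numbered independently within a much smaller partition
SELECT event_id,
       ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY event_time) AS rn
FROM large_table;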
7. LIKE Clause with Leading Wildcards (%)
The Pattern:
WHERE text_column LIKE '%search_term%'
Why it's Expensive: Snowflake can't prune micro-partitions or apply most search optimizations when a wildcard appears at the beginning of the search string, so it is forced into a full scan of the column.
The Fix:
Avoid Leading Wildcards: If possible, use LIKE 'search_term%' for better performance; see the comparison below.
External Search Indexes: For complex text searches, consider an external search service or specialized text search features if available.
Pre-process/Tokenize: For full-text search capabilities, consider pre-processing text into tokens.
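A small comparison; events and text_column are illustrative, and the anchored version only helps when a prefix match actually satisfies your use case:
-- Expensive: the leading % means every value must be inspected
SELECT * FROM events WHERE text_column LIKE '%search_term%';
-- Cheaper when a prefix match is acceptable: the anchored pattern can skip far more data
SELECT * FROM events WHERE text_column LIKE 'search_term%';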
8. Large IN or NOT IN Lists (especially with subqueries)
The Pattern:
SELECT * FROM my_table WHERE id IN (SELECT large_list_of_ids FROM another_table);
Why it's Expensive: The subquery can be inefficient if it returns a massive list, and the outer query then has to compare every row against that large list.
The Fix:
Use EXISTS or JOIN: Often, EXISTS or an INNER JOIN performs better than IN for large sets; see the rewrites below.
Materialize the Subquery: For extremely large and static lists, consider creating a temporary table or CTE from the subquery result.
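Two possible rewrites, sketched under the assumption that another_table.id is the value being matched; check that the join version doesn't multiply rows if another_table can contain duplicate ids:
-- Original pattern: every row of my_table is checked against a potentially huge list
SELECT * FROM my_table
WHERE id IN (SELECT id FROM another_table);
-- Rewrite 1: EXISTS behaves as a semi-join and stops at the first match
SELECT m.*
FROM my_table m
WHERE EXISTS (SELECT 1 FROM another_table a WHERE a.id = m.id);
-- Rewrite 2: an explicit join against a de-duplicated id list
SELECT m.*
FROM my_table m
JOIN (SELECT DISTINCT id FROM another_table) a ON a.id = m.id;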
9. Excessive Use of QUALIFY with Large Result Sets
The Pattern:
SELECT ..., ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) AS rn FROM large_table QUALIFY rn = 1;
Why it's Expensive: QUALIFY filters rows only after the window function has been computed. If the window function runs over a massive input before the filter is applied, the expensive work has already been done.
The Fix:
Filter Early: Try to reduce the dataset before applying complex window functions and QUALIFY; see the sketch below.
Partition Efficiently: Ensure your PARTITION BY clause is well-defined so individual partitions stay small.
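A sketch of filtering early; customer_id, updated_at, and the 30-day window are illustrative assumptions:
-- Expensive: the window function ranks every row in the table before QUALIFY discards most of them
SELECT *,
       ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY updated_at DESC) AS rn
FROM large_table
QUALIFY rn = 1;
-- Cheaper: cut the input down first, then keep the latest row per customer
SELECT *,
       ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY updated_at DESC) AS rn
FROM large_table
WHERE updated_at >= DATEADD('day', -30, CURRENT_DATE)
QUALIFY rn = 1;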
10. DATE_TRUNC and Similar Functions on Timestamp Columns (Especially in Filters)
The Pattern:
SELECT * FROM large_table WHERE DATE_TRUNC('day', timestamp_column) = '2023-01-01'; or applying DATE_TRUNC('day', timestamp_column) across an unfiltered table.
Why it's Expensive: Wrapping a column in a function inside a WHERE clause can prevent Snowflake from pruning micro-partitions on that column, and running the function over an unfiltered table forces a full scan either way.
The Fix:
Avoid Functions in WHERE: Instead of WHERE DATE_TRUNC('day', timestamp_column) = '2023-01-01', use WHERE timestamp_column >= '2023-01-01' AND timestamp_column < '2023-01-02'; the range version is sketched below.
Pre-compute If Frequent: If you frequently need DATE_TRUNC('day', timestamp_column), consider adding a derived DATE column to the table and clustering on it.
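A minimal sketch of the range rewrite; large_table and timestamp_column come from the pattern above, and the half-open bound keeps the filter equivalent to the original day-level comparison:
-- Expensive: the function on the column blocks micro-partition pruning
SELECT *
FROM large_table
WHERE DATE_TRUNC('day', timestamp_column) = '2023-01-01';
-- Cheaper: an equivalent half-open range on the bare column lets Snowflake prune partitions
SELECT *
FROM large_table
WHERE timestamp_column >= '2023-01-01'
  AND timestamp_column < '2023-01-02';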
How Anavsan Helps: Proactive Cost Prevention
Manually identifying and fixing these patterns across an entire data warehouse is a monumental task. This is where Anavsan becomes indispensable. Our AI-driven platform actively monitors your Snowflake environment, identifies these expensive query patterns before they cause significant cost overruns, and even suggests or applies optimized SQL.
Instant Query Optimization: Our AI instantly analyzes and rewrites inefficient SQL to reduce compute time and cost.
Cost Anomaly Shield: Proactively flags or pauses queries exhibiting expensive patterns, preventing bill shock.
AI Performance Simulators: Test optimization suggestions to see exact cost and performance benefits before deployment.
With Anavsan, you don't just react to high bills; you prevent them, allowing your data engineers to focus on innovation, not optimization.
FAQ: Connecting the Dots to Automation
Q: Can Anavsan fix my existing inefficient queries automatically?
A: Yes. Anavsan’s ⚡ Instant Query Optimization analyzes your Snowflake query history, identifies instances of these 10 expensive patterns, and provides optimized, cost-efficient code suggestions you can accept with one click, saving hours of manual tuning while implementing Snowflake query optimization best practices.
Q: How does Anavsan prevent these expensive patterns from happening in the future?
A: Our Cost Anomaly Shield provides a proactive guardrail. It monitors real-time query execution and can block or automatically tune costly patterns like unnecessary ORDER BY on large datasets or full table scans, ensuring they never run up your bill. This is a core Snowflake cost reduction strategy.
Q: Which plan is best for a Data Engineer who just wants to optimize their own SQL?
A: The Individual Plan is perfect for Data Engineers. It provides the core features of Instant Query Optimization, Real-Time Query Analysis, and the Cost Anomaly Shield to immediately boost your personal query performance and efficiency, a key component of Snowflake data warehouse optimization.
Optimizing Snowflake queries is an ongoing effort, but by understanding these top 10 expensive patterns, you're well on your way to a more efficient and cost-effective data environment. Implementing these fixes manually will yield significant results, but for continuous, autonomous optimization, tools like Anavsan provide an invaluable safety net, ensuring your Snowflake spend always aligns with your business value.
Stop Paying for Slow Queries Today.
Ready to eliminate the top 10 expensive patterns and deploy autonomous Snowflake cost optimization?
Start Your 15-Day Free Trial and deploy your 24/7 AI optimization partner now.
