Sigma Computing is a powerful data modeling and analysis platform built to take full advantage of modern cloud data warehouses. Its native integration with Snowflake offers users the fastest and most capable connection available. Because the underlying warehouse has the largest impact on workbook performance, we always recommend selecting Snowflake as your warehouse for the best performance.
Want to take the guesswork out of optimizing your Snowflake? Reach out to phData to learn more about our Toolkit and free Snowflake optimizer.
Once connected to Snowflake, Sigma uses machine-generated SQL to produce optimized queries. However, there are a few ways users can help optimize their calculations, which will increase performance and decrease Snowflake consumption costs.
In this blog, we will cover several ways users can optimize their calculations to increase performance, save on consumption costs, and decrease waiting times. This is not intended to be an exhaustive list of SQL best practices, but rather to provide specific examples of common use cases and SQL best practices within Sigma workbooks.
1. Use table summaries for high-level date calculations
Table summaries are calculations or aggregations applied to all of the data in an element. Think of these as “Grand Totals” or data source-level results. Almost every analysis involves some sort of date comparison, and most scenarios compare dates to today or to the most recent date available.
By creating today() or Max([Date Field]) as a table summary, Sigma only needs to compute the answer once, and those results can then be used and reused in other calculations.
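As a minimal sketch, assume a hypothetical [Order Date] column and a table summary named Latest Date; the Field = formula labels and the -- lines are annotations for readability, not literal Sigma input:

```
-- Table summary on the element, computed once for the whole data source
Latest Date = Max([Order Date])

-- Calculated column that reuses the summary instead of recalculating Max() per row
In Last 30 Days = [Order Date] >= DateAdd("day", -30, [Latest Date])
```

Because the summary is evaluated once, downstream calculations and filters can reference it without triggering another scan of the date column.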
Maximize Sigma’s parent-child relationships by calculating date fields as early as possible and then using the results instead of recalculating.
2. Model tables to minimize using CountDistinct
CountDistinct is a resource-intensive calculation that has to identify unique values across every row of your ever-growing dataset. Sigma’s grouping functionality makes it significantly easier to count distinct lists. All you need to do is group the table by the item you want to count and add a calculated field equal to “1”. Then, instead of CountDistinct, you can use SUM(), which is far less resource-intensive.
The sketch below shows this more optimized way to calculate the count of unique orders for each Brand.
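This is a minimal sketch assuming hypothetical [Order Number] and [Brand] columns; the field names and -- annotations are illustrative rather than literal Sigma input:

```
-- Resource-intensive: a distinct count evaluated across every row
Unique Orders = CountDistinct([Order Number])

-- Cheaper alternative: group the table by [Order Number], then add a flag at that grouping level
Order Flag = 1

-- Summing the flags at the [Brand] level counts each distinct order exactly once
Unique Orders = Sum([Order Flag])
```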
Want to learn more about maximizing table aggregations in Sigma?
3. Use Booleans whenever possible
A Boolean is a data type that represents binary logic: it can only be True or False. Because a Boolean field can only hold one of two values, it is typically more efficient than other result types. For example, it is faster to return True/False as a Boolean field than it is to return the words “True” or “False” as strings.
Boolean logic is everywhere in analytics and data modeling: Is this date within the current year? True or False. Is this category the one I want to filter to? True or False. Should this customer’s orders be included in the average order value calculation? True or False. Asking Sigma to output a Boolean field and then aggregating or filtering on that field is a small change that probably won’t make much of a difference in an individual element, but those incremental gains compound as data models become more complex.
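As a small sketch, assume a hypothetical [Order Date] column; the field names and -- annotations are illustrative:

```
-- Boolean output: cheap for the warehouse to evaluate, filter, and aggregate
Is Current Year = Year([Order Date]) = Year(Today())

-- String output: the same logic, but every row now produces and compares text values
Is Current Year (text) = If(Year([Order Date]) = Year(Today()), "True", "False")

-- A Boolean can also be aggregated directly, e.g. counting the rows where it is True
Current Year Orders = CountIf([Is Current Year])
```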
Output calculations as Booleans whenever possible in Sigma for optimized warehouse queries.
4. Use First or Last instead of Min or Max
In a similar vein to the Boolean recommendation, use First/Last instead of Min/Max when possible. By asking Sigma to pull the value from the first or last row in a column, Sigma does not need to compare every row against every other row to determine which value is the largest or smallest. Note that this recommendation is only applicable in certain scenarios and depends heavily on the order of your data.
For example, if I’m calculating the last date available in my dataset, I will almost always use Max([Date]) because I don’t want to be restricted in how my date column is sorted. The first scenario where I use First/Last is when a field needs to be treated as an aggregate but holds the same value in every row; for example, a quarterly goal compared against monthly sales.
The second scenario is when I want a value I’ve already calculated to be displayed in a visual element but not re-aggregated; for example, ranking the sales team next to their sum of sales (I want to prevent the rank from being summed across the rows used to calculate the sum of sales).
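Here is a sketch of the first scenario, assuming a hypothetical [Quarterly Goal] column that repeats the same value on every row; the labels and -- annotations are illustrative:

```
-- Max() has to scan and compare every row to find the largest value
Goal (aggregate) = Max([Quarterly Goal])

-- Last() simply takes the value from the last row, which is identical on every row anyway
Goal (aggregate) = Last([Quarterly Goal])
```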
Ask Sigma to check fewer rows of data by utilizing First/Last instead of Min/Max.
5. Keep Dynamic Logic Simple
When creating advanced and nested calculations, break the logic into as many independent steps as possible. Dividing the logic into smaller chunks lets Sigma’s SQL optimizer run as efficiently as possible, enables parallel execution of logic (the True and False results can be evaluated independently, rather than only one branch at a time), and makes your workbooks easier to debug.
A single calculation that references a parameter selector, a time frame, and an aggregate all at once is not optimal.
Instead, use filters and group-by calculations to isolate and calculate what you need, as in the sketch below.
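This is a minimal sketch of breaking a nested formula into independent steps, assuming hypothetical [Order Date] and [Sales] columns and a parameter control referenced here as [Time Frame Param]; the field names and -- annotations are illustrative, not literal Sigma input:

```
-- One deeply nested formula: parameter logic, date logic, and aggregation in a single step
Sales In Period = Sum(If(If([Time Frame Param] = "YTD", Year([Order Date]) = Year(Today()), True), [Sales], 0))

-- The same logic split into independent calculated fields
In Period       = If([Time Frame Param] = "YTD", Year([Order Date]) = Year(Today()), True)
Sales To Count  = If([In Period], [Sales], 0)
Sales In Period = Sum([Sales To Count])
```

Each intermediate field can also be reused by other calculations and filters, so the logic is evaluated once instead of being repeated inside every nested formula.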
Conclusion
Sigma is incredibly powerful and can handle nearly any analysis scenario you can think of. However, there are still some simple best-practice principles developers can follow to provide the best experience for their end users. The strategies listed here will not only improve efficiency and save computing costs but will also help developers build more robust analytic solutions.
FAQs
How can I figure out which calculations are slowing down my workbook?
How can I optimize my data before it gets to Sigma?
What are some other best practices for data modeling in Sigma?
This content was originally posted on phData’s website.