Updated: Feb 14
Why is my report slow? What could I do to prevent my report from becoming slow?
I hear these questions a lot, and the truth is, there is no silver bullet. If you've built your report already and you noticed the performance is not as good as you thought it would be, there are multiple reasons why this can be happening. In this case, it's important that you understand the root cause of the problem, as it's the only way to quickly fix it.
In this blog post, we are going to focus on the second question: What could I do to prevent my report from becoming slow?
Even though we will be focusing on best practices for report development, these could be applied to any stage of the process, including on existing reports with performance issues.
Before we dive into the best practices, I wanted to introduce to you my Power BI Report Development 101 flowchart:
With this flowchart, I tried to summarize the steps I usually take to develop a report in Power BI, being the main ones:
> Analyse data and build your data model
> Report Design
> Prepare to share your report
> Share your report
Inside each one of the "main steps", I included the smaller steps I usually have to go through.
You probably noticed I have all these different colours in my steps. This is because I consider some of the steps as optional, depending on the situation (check the flowchart legend). I'm not going into the details of this flowchart specifically, maybe a topic for another blog post :)
During the post, I will follow the logic on this flowchart and I will share with you some of the things you should take into account in each one of the steps if you want to make sure your report performs well.
Step 1. Analyse data and build your data model
Understand the problem/request:
This first step is often overlooked. Building a report without having a good understanding of the business problem you’re trying to solve is like driving a car with no GPS. If you don’t know where you’re going, the chances are you will probably end up in a completely different place from the one you’re supposed to be.
Know your audience:
With Power BI, it’s really easy to build a report in a very short time, but if you don’t know your audience and what their expectations are, it will be really hard for you to come up with a solution that will fulfil their needs. A report built for C level users will be different from a report built for analysts for example. Executive reports tend to be a lot more high level.
Know your data:
Understanding the data you will use in your report is fundamental for any successful project. Ask yourself questions like: What data do I need/have? What will the data model look like? Do I understand the tables, metrics? Is the data cleaned and in a good shape for the report I want to build?
Define success criteria:
What does success look like for your business stakeholders and end users?
Defining the success criteria for your reporting project will allow you to align all the previous steps. Also, it is common that you enter what it looks like an "infinite loop" of discussions with your end users and stakeholders about small changes, including new features in the report, new KPIs, views etc. Having clearly defined success criteria will allow you to close this "infinite loop of feedback" more easily.
Connect to data
Deciding the connectivity mode you will use for your data sources is one of the first steps you go through when building your Power BI data model. You should invest time on a solution that will optimize your data model performance.
Check on the image below the "rules" I usually follow to decide if I will use Import, Direct Query or Live Connection:
* Import – What to do when your data model is too large (>1GB)
There will be times when your data model will become quite large and you will still want to use Import mode. Here are a few options that can help you improve your performance in these situations:
-Consider using dataflows + shared datasets (link)
-Consider using composite models (link)
-Use parameters to filter large tables and reduce the amount of data loaded (link)
-Consider using incremental refresh (link)
-For very large datasets, consider creating a subset of the model for the most common reporting scenarios (consider using composite models too)
Often, stakeholders/users will ask to have the maximum amount of data possible in the report, and for it to be refreshed in near real time.
• Import only necessary fields and tables as most of the times the reality is that this is not needed. If the source data is only refreshed once a week, refreshing your Power BI report every day won’t bring any value to your solution or your end users.
How can I improve performance when connecting to data?
•Import only necessary fields and tables
• Centralised vs departmental/personal data sources: you should avoid using departmental/personal data sources when possible
• Minimize the use of Excel, csv and text files when practical - check if the project is big enough to create a use case around putting your data in a database
• Use relational database sources when practical - faster, cleaner, easier to deal with
• Prefer connectivity on data sources which support native queries and filters (e.g. SQL Server)
• Delegate as much processing to the data source as possible
• Disable background data (link)
• If safe, disable privacy settings or set both sources to Organisational (link)
• Test data refresh in the Power BI service regularly during development
How can I improve performance when transforming data?
• Leverage query folding
Place filters steps before row-holding steps - operations pushed down to source are often much faster
• Filter out unnecessary columns and rows
Always start with the minimum data you need to build your report. This includes removing redundant columns in related tables, and removing columns that contain values calculated from other columns
• Reduce usage of long-length columns with high precision and cardinality
Examples are columns with decimal places, long text, Date/Time… the more unique values a column contains, the less efficient the compression will be. Consider reduce the number of decimal places, split date and time in two separate columns
• Turn off Auto Date Time (link)
Auto Date Time creates many internal date tables that can be significant in smaller models
• Handle dirty data, incorrect data and errors
• Avoid transformations that scan whole tables like joins etc
If not folded, the entire table needs to be loaded to memory before moving to next step – consider using DAX measures instead
• Don’t load intermediate queries
When using tables that are only used as intermediate queries disable the data load
• Group by and summarize
Load pre-summarized data
If you are using Direct Query mode, you should follow the best practices mentioned previously, but also the ones below, that apply specifically to queries connected to the source using Direct Query:
•Avoid complex Power Query queries
An efficient model design can be achieved by removing the need for the Power Query queries to apply any transformations. It means that each query maps to a single relational database source table or view
•Examine the use of calculated columns and data type changes
Direct Query models support adding calculations and Power Query steps to convert data types. However, better performance is often achieved by materializing transformation results in the relational database source, when possible
•Do not use Power Query relative date filtering
This type of filter translates to an inefficient native query
•Limit parallel queries
You can set the maximum number of connections Direct Query opens for each underlying data source. It controls the number of queries concurrently sent to the data source
Build your data model
How can I improve performance when building my data model?
• Ensure tables have relationships
• Validate and Use Inactive Relationships Purposefully
• Avoid bi-directional relationships against high-cardinality columns
• Avoid excessive bi-directional or many-to-many relationships
• Many-to-many relationships should be single direction
• Aim for star schemas, avoid snowflake schemas
• Hide all fields not used directly by users
• Model should have a date table
• Reduce number of calculated columns
• Reduce usage of calculated tables
• Optimize column data types and precision
• Turn off column hierarchies (IsAvailableInMDX column property)
Here are a few extra things you should take into consideration when using Direct Query mode:
•Avoid relationships on calculated columns
The calculation expression will be embedded into the source queries. Not only is it inefficient, it commonly prevents the use of indexes
•Set relationships to enforce integrity
The Assume Referential Integrity property of Direct Query relationships determines whether Power BI will generate source queries using an inner join rather than an outer join
•Examine the use of calculated columns and data type changes
Better performance is often achieved by materializing transformation results in the relational database source, when possible
How can I improve performance when building DAX measures?
• Use DAX variables if possible
• Try to avoid DAX iterator functions (e.g. sumx, averagex...)
• Consider using the divide() function
• Use calculated measures rather than calculated columns when possible
• Store all your measures in a separate table
• Name your measures in a meaningful way - avoid ambiguity in names of columns and measures
• Format all currency & decimal measures to defined standard (e.g. 2 decimal, thousand separator)
• Use Explicit Measures, not Implicit Measures. Simply put, implicit measures are measures that are automatically assigned an aggregation such as a Sum or a Count by Power BI
Step 2. Report Design
• Use templates (.PBIT files) to speed up and standardize report development instead of starting with an empty .PBIX
• Have a focus on usability of the report for end users