Power BI - Report performance best practices
Why is my report slow? What could I do to prevent my report from becoming slow?
I hear these questions a lot, and the truth is, there is no silver bullet. If you've built your report already and you noticed the performance is not as good as you thought it would be, there are multiple reasons why this can be happening. In this case, it's important that you understand the root cause of the problem, as it's the only way to quickly fix it.
In this blog post, we are going to focus on the second question: What could I do to prevent my report from becoming slow?
Even though we will be focusing on best practices for report development, these could be applied to any stage of the process, including on existing reports with performance issues.
Before we dive into the best practices, I wanted to introduce to you my Power BI Report Development 101 flowchart:
With this flowchart, I tried to summarize the steps I usually take to develop a report in Power BI, being the main ones:
> Analyse data and build your data model
> Report Design
> Prepare to share your report
> Share your report
Inside each one of the "main steps", I included the smaller steps I usually have to go through.
You probably noticed I have all these different colours in my steps. This is because I consider some of the steps as optional, depending on the situation (check the flowchart legend). I'm not going into the details of this flowchart specifically, maybe a topic for another blog post :)
During the post, I will follow the logic on this flowchart and I will share with you some of the things you should take into account in each one of the steps if you want to make sure your report performs well.
Step 1. Analyse data and build your data model
Understand the problem/request:
This first step is often overlooked. Building a report without having a good understanding of the business problem you’re trying to solve is like driving a car with no GPS. If you don’t know where you’re going, the chances are you will probably end up in a completely different place from the one you’re supposed to be.
Know your audience:
With Power BI, it’s really easy to build a report in a very short time, but if you don’t know your audience and what their expectations are, it will be really hard for you to come up with a solution that will fulfil their needs. A report built for C level users will be different from a report built for analysts for example. Executive reports tend to be a lot more high level.
Know your data:
Understanding the data you will use in your report is fundamental for any successful project. Ask yourself questions like: What data do I need/have? What will the data model look like? Do I understand the tables, metrics? Is the data cleaned and in a good shape for the report I want to build?
Define success criteria:
What does success look like for your business stakeholders and end users?
Defining the success criteria for your reporting project will allow you to align all the previous steps. Also, it is common that you enter what it looks like an "infinite loop" of discussions with your end users and stakeholders about small changes, including new features in the report, new KPIs, views etc. Having clearly defined success criteria will allow you to close this "infinite loop of feedback" more easily.
Connect to data
Deciding the connectivity mode you will use for your data sources is one of the first steps you go through when building your Power BI data model. You should invest time on a solution that will optimize your data model performance.
Check on the image below the "rules" I usually follow to decide if I will use Import, Direct Query or Live Connection:
* Import – What to do when your data model is too large (>1GB)
There will be times when your data model will become quite large and you will still want to use Import mode. Here are a few options that can help you improve your performance in these situations:
-Consider using dataflows + shared datasets (link)
-Consider using composite models (link)
-Use parameters to filter large tables and reduce the amount of data loaded (link)
-Consider using incremental refresh (link)
-For very large datasets, consider creating a subset of the model for the most common reporting scenarios (consider using composite models too)
Often, stakeholders/users will ask to have the maximum amount of data possible in the report, and for it to be refreshed in near real time.
• Import only necessary fields and tables as most of the times the reality is that this is not needed. If the source data is only refreshed once a week, refreshing your Power BI report every day won’t bring any value to your solution or your end users.
How can I improve performance when connecting to data?
•Import only necessary fields and tables
• Centralised vs departmental/personal data sources: you should avoid using departmental/personal data sources when possible
• Minimize the use of Excel, csv and text files when practical - check if the project is big enough to create a use case around putting your data in a database
• Use relational database sources when practical - faster, cleaner, easier to deal with
• Prefer connectivity on data sources which support native queries and filters (e.g. SQL Server)
• Delegate as much processing to the data source as possible
• Disable background data (link)
• If safe, disable privacy settings or set both sources to Organisational (link)
• Test data refresh in the Power BI service regularly during development
How can I improve performance when transforming data?
• Leverage query folding
Place filters steps before row-holding steps - operations pushed down to source are often much faster
• Filter out unnecessary columns and rows
Always start with the minimum data you need to build your report. This includes removing redundant columns in related tables, and removing columns that contain values calculated from other columns
• Reduce usage of long-length columns with high precision and cardinality
Examples are columns with decimal places, long text, Date/Time… the more unique values a column contains, the less efficient the compression will be. Consider reduce the number of decimal places, split date and time in two separate columns
• Turn off Auto Date Time (link)
Auto Date Time creates many internal date tables that can be significant in smaller models
• Handle dirty data, incorrect data and errors
• Avoid transformations that scan whole tables like joins etc
If not folded, the entire table needs to be loaded to memory before moving to next step – consider using DAX measures instead
• Don’t load intermediate queries
When using tables that are only used as intermediate queries disable the data load
• Group by and summarize
Load pre-summarized data
If you are using Direct Query mode, you should follow the best practices mentioned previously, but also the ones below, that apply specifically to queries connected to the source using Direct Query:
•Avoid complex Power Query queries
An efficient model design can be achieved by removing the need for the Power Query queries to apply any transformations. It means that each query maps to a single relational database source table or view
•Examine the use of calculated columns and data type changes
Direct Query models support adding calculations and Power Query steps to convert data types. However, better performance is often achieved by materializing transformation results in the relational database source, when possible
•Do not use Power Query relative date filtering
This type of filter translates to an inefficient native query
•Limit parallel queries
You can set the maximum number of connections Direct Query opens for each underlying data source. It controls the number of queries concurrently sent to the data source
Build your data model
How can I improve performance when building my data model?
• Ensure tables have relationships
• Validate and Use Inactive Relationships Purposefully
• Avoid bi-directional relationships against high-cardinality columns
• Avoid excessive bi-directional or many-to-many relationships
• Many-to-many relationships should be single direction
• Aim for star schemas, avoid snowflake schemas
• Hide all fields not used directly by users
• Model should have a date table
• Reduce number of calculated columns
• Reduce usage of calculated tables
• Optimize column data types and precision
• Turn off column hierarchies (IsAvailableInMDX column property)
Here are a few extra things you should take into consideration when using Direct Query mode:
•Avoid relationships on calculated columns
The calculation expression will be embedded into the source queries. Not only is it inefficient, it commonly prevents the use of indexes
•Set relationships to enforce integrity
The Assume Referential Integrity property of Direct Query relationships determines whether Power BI will generate source queries using an inner join rather than an outer join
•Examine the use of calculated columns and data type changes
Better performance is often achieved by materializing transformation results in the relational database source, when possible
How can I improve performance when building DAX measures?
• Use DAX variables if possible
• Try to avoid DAX iterator functions (e.g. sumx, averagex...)
• Consider using the divide() function
• Use calculated measures rather than calculated columns when possible
• Store all your measures in a separate table
• Name your measures in a meaningful way - avoid ambiguity in names of columns and measures
• Format all currency & decimal measures to defined standard (e.g. 2 decimal, thousand separator)
• Use Explicit Measures, not Implicit Measures. Simply put, implicit measures are measures that are automatically assigned an aggregation such as a Sum or a Count by Power BI
Step 2. Report Design
• Use templates (.PBIT files) to speed up and standardize report development instead of starting with an empty .PBIX
• Have a focus on usability of the report for end users
• Address accessibility early in the design phase
• Carefully review the data displayed to assess if it can be easily misinterpreted
How to present information:
• Use careful placement and a clear hierarchy for displaying information on the page
• Have a specific purpose for each page, and for each chart on each page
• Be consistent with placement of items which appear on multiple pages
• Use a layout that focuses on telling the story you want to tell
• Avoid clutter on the page to allow report consumers to focus on what is important
• Use the right visualisation for the data (Pie charts, donut charts, gauges, and other circular chart types aren't a data visualisation best practice)
• Use a colour palette that follows visualization best practices and the company branding guidelines
• Use clear titles, labels, and descriptions
• Use bookmarks to create 'help' information to guide users
• Avoid dense report pages
Aim to minimize both the amount of data displayed as well as the number of visuals displayed on a page
• Avoid “data dump”-style reports with tables containing hundreds of columns and thousands of rows. This is usually created when the user expects to export this data to Excel. If the user intends to export data to Excel, consider designing the report directly within Excel to provide the data required but with a live connection to the dataset
• Consider using slicers and filters in your reports to allow the user to select just the data that is needed to be displayed at any one time.
Set the slicers or filters at the report publishing time, so the initial rendering of the report performs acceptably
• Consider adding an “Apply filters” button to your report, so that your visuals don’t update automatically every time you change a filter in a page (specially useful for Direct Query!)
• Consider using techniques such as bookmarks, drill through pages and tooltips to reduce the amount of data displayed on a page
• Be intentional with selections for each visual
Focus on its ability to convey information, resisting the urge to use a variety of visual types for aesthetic appeal only
• Only use custom visuals from sources that you trust – preferably certified
Test the custom visuals in your report page to ensure fast report loading time
• Change the interactions between visuals
Visuals filtering other visuals unnecessarily can make your report run slow (specially useful for Direct Query!)
• Enable Row Level Security
With RLS, Power BI only imports data the user is authorized to view
Step 3. Prepare to share your report
Row Level Security
RLS works by automatically applying filters to every DAX query, and these filters may have a negative impact on query performance. So, efficient RLS comes down to good model design (Refer to Analyse data and build your data model section).
• In general, it's often more efficient to enforce RLS filters on dimension-type tables, and not fact-type tables
• Rely on well-designed relationships to ensure RLS filters propagate to other model tables
• Avoid using the LOOKUPVALUE DAX function when model relationships could achieve the same result
• Avoid complex RLS rules
• Whenever RLS filters are enforced on Direct Query tables and there are relationships to other Direct Query tables, be sure to optimize the source database
The performance of the queries done through Q&A will depend directly on all the work you’ve done previously on trying to improve your data model performance.
There are a lot of best practices regarding Q&A (naming conventions, synonyms etc) that we are not going to discuss in this post.
Step 4. Share your report
Setup Data Gateway
• Use Enterprise data gateway instead of Personal Gateway
Personal Gateway takes data and imports it into Power BI. Enterprise Gateway (on-premises data gateway) imports nothing, which is more efficient when working with large databases
• Use different Power BI gateways for Scheduled Refresh and Direct Query
If the same gateway is used for Scheduled Data Refresh and Direct Query, the Direct Query performance will slow down when Scheduled Data Refresh is active
Schedule data refresh
Schedule data refresh:
• If possible, enable incremental refresh
With incremental refresh, only data that has changed needs to be refreshed. Less data to refresh reduces overall consumption of memory and other resources.
• Analyse carefully the frequency of your data refresh
The users may want a dataset refreshed 48x a day, but if your data source only refreshes 1x a day, having a refresh frequency higher than that won’t bring any value to your solution
• Think about the refresh time for your dataset
You might want to schedule your refresh at a “quieter” time to ensure you don’t have a lot of concurrent refreshes with other reports
Other relevant best practices:
• Add users to Office or Azure Active Directory groups and then use groups to share report content
If you are using role-based or row-level security, use groups to manage role membership
• Consider promote/certify your dataset so that it can be reused by others in the organisation. This can potentially avoid a lot of dataset duplication
• Ensure the Power BI report and data source are in the same region
With the tenant and data source in the same region, you can reduce network latency. The results are faster data transfer and faster query execution.
Even though this blog post looks huge, I'm pretty sure there are a lot of best practices I missed, and because of that, I will try to keep this post updated :)