Showing posts with label Best Practices. Show all posts

Wednesday, September 28, 2016

Performance Tips and Techniques for Power BI and Power Query (update)

Power BI Performance Tips and Techniques update:

Back in March of this year I created an article and a webinar for the Pragmatic Works Blog.

The original concepts still hold: Power BI and Excel Power Query are very powerful tools for building data models and sharing data discovery and visualizations. As you move from initial prototyping to a more mature model, you may find that performance bogs down if you don't manage the structure of your reporting information. This talk provides tips and techniques for managing that information (link to updated slides).







Since the original webinar I have presented this topic at a few user groups and SQL Saturdays, where I was asked some very good questions.

I'd like to answer those questions here and update the slide deck as well.


1) In Providence at the BI User Group I was asked: What happens in Kasper de Jonge's Memory Usage Tool macro when there is a join between two queries loaded into the model?




If the join is done in the data load, the data model contains only one table, as in the example discussed in the original presentation.

If a relationship is created between two tables loaded into memory, both tables appear in the pivot tables the macro produces, and the memory taken by each column is shown by data type and as a total for the column.

2) In NYC at the Power BI User Group a member asked: Does the HierarchySlicer cause performance problems?
To test this, I used a data table with nearly 14M rows linked to a Date table, with Power BI connected to SQL Server 2016 via DirectQuery.



I compared the Hierarchy Slicer (HierarchySlicer.0.7.9.pbiviz) set to Year / Month on the Date table against a "flat", non-hierarchical YearMonthName slicer. In this case, each new slicer selection took about 2 seconds to repopulate the summary table to the left. The two took the same amount of time; I did not see a difference in performance.

3) At SQL Saturday in Dallas this question came up: Is the SSAS formula engine still single-threaded in Power BI on the portal, as it is on the desktop? The SSAS engine is the same in Power BI on app.powerBI.com as it is on your local machine, so multi-threading will not speed up calculations in either place, although the local machine may or may not have more resources to dedicate to the formula engine thread.

Thank you all for the great questions; please comment with other questions or if you have more information!

Friday, July 8, 2016

World Class Data Organizations and SQL Server 2016


World Class Data Organizations (WCDO) intentionally invest in and build their data cultures to create business value. These businesses integrate data into daily activities beyond report or dashboard reviews. Employees know what data elements represent, how to access them, and how to use them to create new insights.



WCDO and SQL Server 2016 (pdf) presents the primary characteristics and activities of WCDOs and expands on some of the features of SQL Server 2016 designed to enable those activities.

This is the slide deck presented on June 30, 2016, and the beginning of a series of presentations I will deliver on this topic.

Saturday, June 11, 2016

Power BI Best Practices - Tactics

I've been asked to present Power BI best practices to a client on Monday. This two-post series is partly a reference to great articles and partly my own insights on the topic, some gained through painful experience. Part one addresses BI project strategy, and this is part two, on tactics:

#1 Dashboards are "a single place to monitor the current state of the data".
Microsoft support has tips for great dashboards and points out some design techniques that are important to keep in mind.
#2 Design for the user's reporting needs.
Like Delora Bradish's exhortation to always consider reporting when designing your data warehouse, your data model should provide friendly table and column names that reflect familiar business terms well understood by the system's users.

Include only necessary data, and hide any columns (like key values) that are needed to build the data model or calculations but are not shown in reports.

#3 Design your data model to support self-service.
From Marco Russo we are guided to take advantage of data views to stabilize our data loading process:
  • Use views for your data sources to avoid dependence on table structures
  • Create a view schema in the database that consists only of the views for each table to be imported
  • Each view should only contain the columns necessary for reporting from the data model
  • Use meaningful names that will make sense to your users
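As a sketch of those guidelines (the schema, table, and column names here are hypothetical), a reporting view might look like this:

```sql
-- Hypothetical reporting view: it lives in a dedicated schema,
-- exposes only the columns the model needs, and renames them to
-- business-friendly terms.
CREATE SCHEMA pbi;
GO
CREATE VIEW pbi.[Sales] AS
SELECT
    s.SalesOrderID AS [Order Number],
    s.CustomerKey  AS [Customer Key],  -- integer surrogate key; hide it in the model
    s.OrderDate    AS [Order Date],
    s.SalesAmount  AS [Sales Amount]
FROM dbo.FactInternetSales AS s;       -- audit and ETL columns deliberately excluded
```

If the underlying table changes, only the view needs to be updated; the Power BI model keeps importing from pbi.Sales unchanged.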
Double-check relationships in your data model to ensure that self-service reporting will return valid data.

Carefully check data types and specify how, or whether, fields should be summarized.

Separate unrelated data into multiple tables to clarify reporting.

Use a star schema when possible, as snowflakes can limit scalability. Use snowflakes only when lookup tables provide usability improvements without performance implications.
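One way to flatten a snowflake back into a star is to join the outrigger tables into the dimension view itself (a sketch using hypothetical AdventureWorks-style names):

```sql
-- Hypothetical view that collapses Product -> Subcategory -> Category
-- into a single dimension, so the model sees a star rather than a snowflake.
CREATE VIEW pbi.[Product] AS
SELECT
    p.ProductKey                     AS [Product Key],
    p.EnglishProductName             AS [Product],
    sc.EnglishProductSubcategoryName AS [Subcategory],
    c.EnglishProductCategoryName     AS [Category]
FROM dbo.DimProduct AS p
JOIN dbo.DimProductSubcategory AS sc
    ON sc.ProductSubcategoryKey = p.ProductSubcategoryKey
JOIN dbo.DimProductCategory AS c
    ON c.ProductCategoryKey = sc.ProductCategoryKey;
```

The model then needs only one relationship, from the fact table to pbi.Product, instead of a chain of lookup tables.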

#4 Consider security and data refresh early in the design of your project.
Work with appropriate administrators, whether in Security or your DBA's, to build gateways for data refresh or direct query access from on premises data sources to the Power BI portal.

Schedule refreshes to correspond with your data frequency; there is no reason to update data eight times a day if it is loaded weekly.

#5 Take advantage of Power BI Content Packs internally and to access services.
Use organizational Content Packs to provide read-only pre-built solutions to your organization or other groups. Members of the group can copy these and personalize them as needed.

Use the Content Packs published for many common business services. Use this link to see the available content packs from the Power BI portal.

#6 Configure your solution to maximize Q&A Natural Language capabilities.
Chuck Schaeffer's CRM Search article provides considerable information about designing for Q&A:

  • "Datasets not represented with tiles are not considered, anticipate the types of question that may be asked and design the dashboard with tiles from datasets which may respond to sure questions".
  • Questions will join entities to provide answers (example sales and fiscal calendar to answer a question about "Last year's sales revenue").
  • Be sure to fix data types so that Q&A can correctly represent the answers.
  • Be sure to mark fields that should not be aggregated with "Do not summarize".
  • Add data categories wherever possible.
  • Put your topics in their own datasets; break out each topic from a given entity.
  • Add synonyms to improve Natural Language capabilities of PBI on your data set.

#7 Be wary of design issues that might affect your data model's performance.
These are a few tips from my blog entry on Pragmatic Work's site:
Tall, narrow tables are faster than wide tables with many columns, especially if the columns are highly unique (columns with many distinct values are said to exhibit "high cardinality").

Use integer keys for primary/foreign key relationships.

Reduce uniqueness in columns by splitting date and time values, and by reducing the "granularity" of time to hours if that is all reporting requires.
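The split can be done in the source view rather than in the model (column and table names here are hypothetical):

```sql
-- Replace one high-cardinality datetime column with two
-- lower-cardinality columns: a date and, if reporting needs it, an hour.
SELECT
    CAST(OrderTimestamp AS date)   AS [Order Date],  -- at most 365 or 366 values per year
    DATEPART(hour, OrderTimestamp) AS [Order Hour]   -- only 24 possible values
FROM dbo.FactInternetSales;
```

A year of per-second timestamps can hold up to about 31.5M distinct values; the date/hour pair compresses that to a few hundred distinct values per column, which the in-memory engine encodes far more efficiently.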

Include only relevant data in the data model. One powerful example is not to include unnecessary dates in your Date table.

Please comment on my choices, or add your own best practices to this growing list.

The first post in this short series addresses strategic concerns for developing BI projects.