Friday, March 24, 2017
I'd like to help save you some time banging your head and searching the internet by pointing out some of the issues I've run into when working with clients over the last year.
There are many articles and blogs about the technical aspects of loading data into Power BI so I won't cover that here in great detail. However, I would like to call out some basic data transformations and cleansing activities that can make a big difference to the experience of report users.
Once you have loaded a table check the columns:
1. Rename your data columns to use business terminology in the field names.
This isn't a database, this is a reporting data model and your business users need to be able to understand the names of the columns.
2. Add units to your column names.
Again, the objective here is to make the reports readable and easy to interpret. You don't want your uses to wonder if you were summing minutes or seconds in the reporting graphic.
3. Remove unnecessary columns:
Will the column be used for reporting? Really think about this, will the users of your reports NEED to see this column, slice or filter by it? If not, it doesn't belong in your reporting data model.
4. Remove bad data, or unnecessary data records:
Will the RECORDS in your table all be reported on? Again, every record might not have valid data. Check for this and remove records that are not going to provide good insight in your reporting.
5. Remove unused records, even from date tables!
Are the records part of the data set you need? Sometimes we load records that go all the way back years before our reporting range. Unless those records contribute to the value of information shown in the report, remove them. This is ESPECIALLY true of the Date or Calendar tables - think about those extra records. This is much more data than you need to be filtering through for your reports and interactive visualizations.
6. Use data types wisely:
Change data types to be accurate for your business definition of the column. Sometimes colums will come in as whole numbers to Power BI because the sample used to define the column "sees" only numbers in the first 1,000 rows for the preview. If this is a text based column, your data load will error when you load the rest of the data.
7. Use Trim if you think there might be leading or training spaces in your data.
Once you have loaded the data there is still preparation to complete. A common question often comes up right about now though, so let's talk about the
These are similar: that the engine will calculate them for you, and both can be aggregated in your reports and visualizations. The key difference is that calculations are values that are stored in each row of your table, and measures are not calculated until used in the report context.
1. Creating Calculations and Measures
Format your facts as part of your data model setup:
Often you will be told to use custom calculations instead of the default summarization on your columns.
1. You will then know, and be able to define in a data dictionary, exactly how the calculation was built.
2. However, if you do use default summarization, there will be a benefit when viewing reports: you will be able to use the "View records" command to see what records were included in the aggregate. This does not work for measures.
2. Sometimes you need to create a new column as part of each row:
For example, you might want to calculate Amount as a Quantity * Price - that is a calculated column
1. The "context" of the calculation is the row - always
2. Created on data load and stored just like other columns loaded by Power Query
3. Sometimes you want to show calculation results that change as we interact with our reports:
You might need to know the maximum Amount which is max(Quantity * Price), but when reviewed across all rows for orders placed within the current Year it will give a different answer than when viewed for all orders within a given City.
1. In this case the "context" is all of the rows that are being evaluated, and which rows will be determined by "where" the measure is being used
i. Used as a value in a chart by dates, it might be the max Amount per Year, then you might drill down to max by Quarter and max by Month
ii. Used as a value in a Geographic map, it might be the max Amount in the Region, then drill down to a given State or Province, City, etc…
1. Go through and format your facts as part of your data model setup
i. Set default summarization.
ii. Set the format to use for showing the fields
It is easier to do all of these steps for all of your columns at one time so that your reports look great from the start and you will know that the standard has been implemented. It is still likely that you will need to return to update or fix some which you can do a t any time.
2. Create Groups:
You can categorize your records automatically by creating groups that become a new column in your table. "Other" as a category allows for new data values added in your incoming data. Be sure to check overtime that you don't need to add a new category though.
3. Create Hierarchies:
i. Creating a hierarchy is something people do accidentally all of the time, then they worry about damaging their data model by deleting them. Don't worry if you get an accidental hierarchy with an unintended drag/drop of a column in the Fields list, just go ahead and delete that hierarchy - your fields will stay as they were.
ii. To create an intentional hierarchy drag a lower level field over to the highest level in the hierarchy and drop it there. There will be a yellow dotted line shown where it will drop. Each other column added to the "pile" will be added at the bottom.
iii. If necessary rearrange them by dragging them up or down, or delete one using the "x" to the far right.
iv. Adding a Hierarchy to the axis of a visualization will enable drop down.
v. In the axis of a graph - you can remove levels of hierarchy that are not desired for that graph without any effect on the hierarchy.
4. Even if you set the format of your column, you may need to set the visual scale on each individual visualization settings: None, Thousands. Pay attention to this, it makes a big difference to readability
5. You can now "See Data" AND "See Records"
i. See Data will show the values that are in the visualization - as a table.
ii. See Records will show the table records that comprise the aggregations that are shown in the graph. Every record in it's entirety.
iii. As mentioned above, "See Records" only works with data elements (including calculated columns?) not with measures that can be summarized
iv. Trick - if you want a count as your visual value, but your data doesn't have a uniquely identifying column to "count", create a column that can be summarized for the counting (like "Trips") and default it to the value of 1. You can then sum or count the values in this column to know how many unique "trips".
6. If you create a visualization (a bar chart) and the values are the same in every data point you have probably forgotten to create your relationships. You can draw them on the Relationships tab, or us the Manage Relationships dialog box from the Home ribbon bar.
7. How to review your relationships:
If you have many tables and many relationships and want to review them all quickly, the dialog box from Manage Relationships may be the better choice, as it is quickly scanned for errors in column names, etc.…
8. Single and "Both" direction filtering make a big difference in some reports:
i. Single direction filtering enables the value from one slicer or visual to affect the values on the related table. However, filtering the second table by another value will not affect the values shown from the slicer.
ii. The best way to visualize this is to see 2 slicers from two dimensional tables and when the dimensions are both single direction, the two slicers will not affect each other.
iii. If the 2 dimensions are "Both" direction filtering, then selecting a value from one will affect the values shown from the other.
9. Drill down in charts
i. There are 3 drill down buttons on a chart that contains hierarchies:
ii. The single down arrow button is a toggle to turn drill down on, when on selecting a value in the chart will "drill down" to the next level.
iii. The double arrow will drill down to the next level, aggregating at that level ONLY:
i. Ex: if you go from showing 3 years of data down to showing months, the values for each month will be the aggregate of all of the month's values for every year, all together
iv. The forked arrow aggregates at the combination of the level above and the new level:
i. Ex: if you go from showing 3 years down to months, you will see the values for each month of every year.
10. A Date table enables time calculations if certain criteria are met:
i. A date column must have a date data type
ii. The date column must show a continuous value for every date in the time period (ex: 1/1/2017, 1/2/2017, …)
iii. For any time intelligence calculations to work:
i. Joins with the date table must be made on the date column - NOT on a surrogate key.
ii. The values in the time calculation must also be continuous. If you are aggregating ParallelPeriod values, you must be doing it across a dataset that has a value for every date in the time period.
11. Why isn't my calculation working?
i. Check your calc - of course, but if you don't find anything
ii. Save some time by reviewing your:
i. Page and Report filters
ii. Visualization filters
iii. Whether you have Relationship filtering in Single or Both directions
These are some of the "gotchas" that can waste your time when you create your reports and can cause your users confusion when they use your reports.
Are there others that you've spent time resolving that you'd like to share?