In addition to being a fantastic data transformation and modeling tool, Power BI gives the analyst the ability to organize data graphically in a very appealing way.
Assuming you want to display a time-series including also the prediction interval for the forecasting part, Power BI by default allows you to do so thanks to the Line chart visual, but it is the tool itself that determines the forecasting according to the type of series through two different algorithms: a seasonal (ETS AAA) and a non-seasonal (ETS AAN) algorithm.
Take the following time-series as an example:
One can guess by eye that an annual seasonality is present. If we add a forecast in Power BI thanks to the Analytics tab and let the tool automatically determine the seasonality, the results are not the best:
If, on the other hand, we manually assign the number of time-series points associated with seasonality, the results improve dramatically:
As you can see, the forecasting follows the trend of the peaks quite well, although it is slightly overestimated. It basically failed to capture the fact that the trend line is slightly lower than estimated.
For this reason, the analyst may prefer not to use the default forecasting algorithms in Power BI, but to use a more accurate model developed by a data scientist that returns both forecasting values and those associated with the prediction interval band.
Now suppose to perform a batch scoring through the aforesaid model and therefore to have a CSV file as output with all the necessary information to realize the plot. It doesn't exist a visual in Power BI that allows to draw actual values, predicted values and also the upper and the lower bound of the predictive interval taken from an existing table.
This Power BI custom visual that I am sharing with you was created to fill the gap just described. Thanks to it, it is possible to plot the shared time-series forecasting data in this repository (data/ts_forecast.csv) as well:
Unlike the previous forecasting provided by Power BI, it is evident that the values predicted through the external model are slightly underestimated, but more correctly follow the trend line of the actual values.
The main goal I set for myself was not only to give the analyst the ability to plot a time-series forecasting plot given all its component values, but also to create an interactive custom visual.
Since I'm a big fan of the R language and I'm aware that Power BI allows you to create custom visuals using R, I got to think about what R packages might be able to generate a time-series forecasting plot. Anyone who knows a little about R and the Tidyverse framework will surely know about the Tidymodels framework and, specifically for time-series, Matt Dancho's fantastic timetk and modeltime packages. In particular, the modeltime package contains the plot_modeltime_forecast() function that allows you to plot a time-series with the predicted values from one or more models. Moreover, the output of this function can be an interactive plot thanks to the use of plotly.R. It is exactly what I was looking for!
The only problem I encountered was the fact that Power BI service has an outdated R engine (Microsoft R Open 3.4.4) and the available R packages do not provide neither timetk nor modeltime. Because of this, to be able to publish a report that used this custom visual on the service, I had to extract the plot_modeltime_forecast() dependencies from these packages and had to make very minor changes to integrate them with each other. The files extracted from the various packages are as follows:
After making the above changes, the result was as follows:
But let's see now how to install the time-series custom visual.
In order to install a Power BI custom visual, you must click on the ellipses in the Visualizations pane, click on the menu item "Import a visual from file" and then select the custom visual to import:
The custom visual to import is the .pbiviz file located in the "dist" folder of the R HTML visual folder structure once you clone the repository (see references for more details):
Otherwise you can download the latest release of the visual directly from the Releases page on GitHub:
After loading the .pbiviz file, a new icon identifying the time-series custom visual will be displayed in the Visualizations pane, as shown in the figure above.
Right after clicking on the new icon, you will be asked to enable the custom visual. Once you click "Enable", you can interact with the data fields of the custom visual:
The requested data are the following:
- Date: (mandatory) variable containing date values
- Value: (mandatory) actual values of the time-series
- Value Type: (mandatory) variable containing labels about the type of value to which it refers ("actual" or "prediction")
- Confidence Low: (optional) value of the low bound of the predictive interval
- Confidence High: (optional) value of the high bound of the predictive interval
- Model ID: (optional) Numerical ID of the model that scored the predicted values it refers in the dataset. If the predicted values present in the dataset are generated by a single model, this field may be omitted
- Model Description: (optional) Description of the model that scored the predicted values it refers in the dataset. If the predicted values present in the dataset are generated by a single model, this field may be omitted
Once you have entered the first data in the above fields, you can explore the Format tab of the custom visual. In particular, there is the "Plot Settings" section that allows you to interact with the elements of the displayed plot:
Now let's see what is the expected format of the input dataset that feeds the data fields.
Suppose you have a time-series running from time T0 to time Tn, for a total of n observations. We will then have n current values in the series. Suppose now you want to do the forecasting for the values associated with the time range from Ti to Tn. This means that you want to predict the last n-i+1 values of the series. Since the dataset in input has a single variable containing the values (Value), distinguished in their function by the variable Value Type, the variable Value will contain both all the actual values (n values for which Value Type = "actual"), and the prediction values (the n-i+1 values for which Value Type = "prediction"). This means that in the Date column we will find the temporal range from Ti to Tn repeated twice, once for the actual values and once for the predicted values:
If you want to plot the forecasting obtained from 2 models, in addition of adding other n-i+1 rows to the dataset, you have to introduce two new variables: Model ID and Model Description. Through these two variables it is possible to associate the predictions contained in the Value field to a specific model from which they have been obtained. All becomes clearer looking at the following figure:
Let's see now how to put into practice everything seen.
You can find the ts_forecasting.pbix file in the "demo" folder of the repository. In it I have highlighted the differences in applying forecasting to a time-series provided in a CSV file using the standard Power BI features and using the time-series custom visual. The CSV file is located in the "data" folder of the repository and it contains data extracted from the Kaggle's Retail Data Analytics competition made available under the CC0: Public Domain license.
You can also test the time-series custom visual in a live demo published via Power BI's "Publish to web" option. Here is the link to the live demo.
Below is a list of links to key references:
- Describing the forecasting models in Power View
- Timetk, time series analysis in the tidyverse
- Modeltime, tidy time series forecasting with tidymodels
- Create visuals by using R packages in the Power BI service
If you want to learn how to create a Power BI R Custom Visual from scratch, as well as many other topics on how to better integrate Python and R into Power BI, you can find detailed guides in my book "Extending Power BI with Python and R":