Analytics with Power BI and R

Who is this Blog For?

This blog is a guide for BI Developers, Consultants and Data scientists who want to get familiar with developing machine learning solutions inside Power BI.

How to Start?

  1. First download R in your machine. You can download it from any of the given links:

https://mran.revolutionanalytics.com/open

Or

https://docs.microsoft.com/en-us/machine-learning-server/r-client/install-on-windows

  1. Download desktop version of R studio form this URL:

https://www.rstudio.com/products/rstudio/#Desktop

Configure Power BI with R Server:

In Power BI, click on the File > Options and Settings > Options.

Configure Power BI with R Server

Under the “Global” option, click on the “R Scripting” and specify the R version.

R Scripting

Install Required Libraries:

If you are working in R for the first time, you are required to install the packages that you need. In this example, I am going to use “ggplot2″ in Power BI. Open R studio and enter following command, it will download and install the required packages for ggplot2 on your machine.

Install.packages(“ggplot2”)

ggplot2 in Power BI

Download Sample Data:

In power BI, I have a dataset that shows specifications of cars such as: speed in city and highway, cylinder and so forth. If you are interested in downloading this dataset, you can do so from this URL

https://forge.scilab.org/index.php/p/rdataset/source/tree/master/csv/ggplot2/mpg.csv

Import this file into Power BI data model.

Power BI data model

Power BI data model 2

R Visual:

Step 1:

Click on the R visual in Power BI and put in the report canvas.

R Visual in Power BI

Note: You may get a prompt message to enable R script on report. Press Enable button.

Step 2: (Prepare Dataset)

After bringing R visual in the report area, select “cty” (speed in city), “hwy” (speed in highway), and “cyl” (cylinder”). After that, in “R script editor”, you will see R codes there!

R script editor

“#” is a symbol for comments in R language which you can see in R Scripts Editor. Power BI automatically puts the selected fields in a variable name “dataset” so all fields (cty,hwy, and cyl) will be stored in a dataset variable by “<-” sign. Also, it automatically removes the duplicated rows. All of this is explained in R script editor area.

R language in R Scripts Editor

Step 3: (R Scripting)

We are going to put our R code for drawing a two-dimensional graph in power BI. In order to use any R scripts in power BI, we have to install the R version and then call the packages using "library" as shown below

library(ggplot2)

So, from now on, whatever library you use in power BI, call it by library function first. There are some cases where you must install some other packages to make them work. However, based on my experience, I think this part is a bit challenging!

To draw a chart, use “ggplot” function to draw a two-dimensional chart. The first argument is “dataset” which holds our three fields. Then we have another function inside the ggplot, named “aes” that identifies which files should be in x-axis and which files should be in y axis. Finally, I am also interested in showing the car cylinder in chart. This can be done by adding another layer in aes function as “Size”. So, bigger cylinder cars will have bigger dots in the graph.

ggplot(dataset, aes(x=cty, y=hwy,size=cyl))

graph in power BI

However, this just shows the graphs without anything! We need a dot chart here to create that, we need to add another layer with a function name. geom_point enables to draw a scatter chart. This function has a value as pch=(Any Number) the shape in chart. For instance, if I put this value as 20, it will become a filled circle and 23 will become a diamond shape.

geom point enables

Let’s put some colors in chart by providing fill factor in aes function.

aes function

Step 4: (Publish)

Now this report is ready to be published on Power BI service. We don’t need to configure anything special on Power BI service to run R analytics. Just Publish it in your workspace and you are good to go.

Power BI service

Known limitations

R visual is a great feature to use with Power BI. However, it has few limitations:

  • Data size limitations – data used by the R visual for plotting is limited to 150,000 rows. If more than 150,000 rows are selected, only the top 150,000 rows will be used and a message will be displayed on the image.
  • Calculation time limitation – if an R visual calculation exceeds five minutes, the execution time will run out. It will result in an error.
  • Relationships – as with other Power BI Desktop visuals, if data fields from different tables with no defined relationship between them are selected, an error occurs.
  • R visuals are refreshed upon data updates, filtering, and highlighting. However, the image itself is not interactive and cannot be the source of cross-filtering.
  • R visuals respond to highlighting other visuals, but you cannot click on elements in the R visual in order to cross filter other elements.
  • Only plots that are plotted to the R default display device are displayed correctly on the canvas. Avoid explicitly using a different R display device.

In this release, RRO installations are not automatically identified by the 32-bit version of Power BI Desktop, so you must manually provide the path to the R installation directory in Options and settings > Options > R Scripting.

Leave a Reply

Your email address will not be published. Required fields are marked *