One of the coolest things about Business Intelligence is finding new ways to get and analyze data. That’s why one of my favorite features in the PowerBI suite is the ability to extract data from websites using Power Query. It’s a cool tool to have in your toolbox if you want to quickly grab some interesting data you find while surfing and want to do some analysis. In this post I’ll show you how to get started with Web Data Extracts using Power Query.
First things first, we need some web data. I love to play games, particularly board games. My go to website for board games is Board Game Geek which, as it so happens, has some really great data sets for pretty much every board game in existence. The particular dataset we’re going to use is the Top 100 Board Games based on ranking.
Now we need Power Query. If you don’t already have the add-in, you can grab it here.
Once you have Power Query set up, go and ahead and load up Excel. In the top ribbon there should be a tab for Power Query (A). Select the tab and you’ll see all the features available. Click on the “From Web” button (B) and a form will pop up to enter a URL. Use the URL from above and click OK.
This will load the page into Power Query. Once the page has loaded into Excel you’ll see a Navigator window on the right side by default. Within the Navigator there will be multiple elements from the page that you can import through Power Query. Hovering over each element will give you a preview of what’s inside. Select Table 1 and click “Edit”
Once the table is loaded the Query Editor will open. This is where you can transform the data extract you got from the web to make it more suitable for reporting and analysis.
The first thing we’ll do is change the name of the query. Using the Name field in the Properties setting you can changed the name to “Board Game Geek Top 100 Games” (1).
Next remove the blank second column using the Remove Columns button (2).
Next we’ll split the Title column to separate the year value into its own column. Use the Split column button (3). I took the far right 6 characters to split off into a separate column.
Replace the parenthesis in the newly created year column by using the Replace Values button (4).
Now we want to format the shop column to just get the list price value. Use the Split Columns button again to split the Shop column using a custom delimiter character for “$”
Now we’ll split the Shop.2 column again, but this time using the “Space” delimiter. We should now have a bunch of different Shop columns.
Use the Replace Values function again in column Shop.2.1 to replace the value “[Shop]” with nothing. Now we’ll remove all the Shop columns except for column Shop.2.1.
Now that we have all the base columns, we’ll want to rename them to more friendly names.
Rename Column Shop.2.1 to “List Price”
Rename Column Title.1 to “Game Title”
Rename Column Title.2 to “Year”
Now that we have finished with the transformations that we want for our query we’ll want to save, close and load it. In the top left corner of the query editor, select “Close and Load To” (5). This will load the query to either a table in Excel, the data model, or both.
I loaded the query as a table into a worksheet and then summarized the table with a Pivot Table. I can now start doing some analysis of data, or do some further data modelling using Power Pivot.
And that’s it! Power Query is a pretty powerful tool for quick data extracts and transformations from the web. The best part is that it’s all based in Excel which doesn’t require deep knowledge of SQL or ETL processes. You can just find a cool dataset on the web and just load it up into Power Query. And this is just a portion of the types of data sources that work with Power Query. You can pull in data from Azure, from SQL Server, from Oracle, even from Facebook and MS Exchange. This is a fun tool to play around with and come up with some cool data analysis. Happy Data Hunting!
By: Sam Schwartz
Posted by admin