vrijdag 29 mei 2015

Power Query

Introduction

Powerquery is an Excel add-in that can be used for examining data, transforming the data and combine data from multiple sources.  Powerquery is the ETL tool for Excel. You can extract data from multiple sources like Oracle, DB2, Hadoop, online locations and many other sources. You can transform the data, such as conversions, splitting columns, add columns, refine and merging. PowerQuery helps you to connect to sources accross your organization or public data sources.


Power query is one of the Excel Add-ins family and you have to download and install it and use this together with Excel 2013. I'm using version 2.22.4007.242. 

In this blogpost I'll investigate PowerQuery in to more detail. I've borrowed this example of the blogpost "Tutorial Introduction to Power Query".

The Toolbar

First, let's look at the toolbar (or ribbon). Below the toolbar of  PowerQuery :



There are a couple of options available on the ribbon:
  • Get external Data.
  • Excel Data.
  • Combine.
  • WorkBook Queries.
  • Settings.
  • PowerBI.
  • Help.
If we investigate the Get External Data in more detail, we have the following options:
  • From the Web
  • From File
    • From Excel
    • From CSV
    • From XML
    • From Text
    • From Folder
  • From Database
    • From SQL Server Database
    • From Access Database
    • From SQL Server Analysis Database
    • From Oracle Database
    • From IBM DB2 Database
    • From MySQL Database
    • From PostgressSQL Database
    • From Sybase Database
    • From TerData Database
  • From Azure
    • From SQL Azure Database
    • FromAzure Marketplace
    • From Microsoft Azure HDInsight.
    • From Microsoft Azure Blob Storage.
    • From Microsoft Azure Table Sorage
  • From Other Sources
    • From Sgarepoint List
    • From Odata Feed
    • From Hadoop File (HDFS)
    • From Active Directory
    • From Microft Exchange
    • From Dynamics CRM online
    • From Facebook
    • From SAP BusinessObjects BI Universe
    • Fro Salesforce objcts
    • From SalesForce Objects
    • From ODBC
    • BlankQuery
  • Recent Sources

This is quite an extensive list.

Now for this blogpost I'll use the example of the following blogpost "Tutorial Introduction to PowerQuery". In this tutorial the Top 250 movies of the IMDB.com site is combined with the revenue of the top 50 Revenue movies on Wikipedia.

Load the Movie data from IMDB Top 250 


1. Open an Empty WorkBook in Excel, Go the PowerQuery Tab and click on From Web.




2. Enter the URL and Press on OK.




3. Click on Table 0 and examine the data and press Load




4. The data is loaded in Excel and now we have to delete some columns and clean some stuff.


5. Click on the PowerQuery Tab and click on the Launch editor.


5. Remove some columns (Column0, rating and the last column). Right click on the column "Rank & Title" and press Split Column and By Delimiter.
 
The Results of this transformation looks like this screenshot:
 



7. Right click again on the column Rank& Title.2 and split the column again on the year by using the delimiter (and the option Split on the Right most delimiter. Then replace  the ) by nothing. The result should look look like this.


8. Rename the columns to Rank, Title, Year and IMDB Rating and Trim the Title column. The resulting window should look like the screenshot as shown below.



On the right you see the steps I've taken to come to the results.This is very handy. If you take some steps that you may regret, just delete the step and the action is rolled back.

9. Press on Close& Load.

Load the Revenue data from Wikipdia 

10. The next thing is to load the revenue per movie data from the Wiki page. Press From Web, Press Ok and press Load.


11. The data is loaded into Excel and it should like the screenshot below.
 

12. Press on Launch Editor and let's edit some stuff. I've renamed the two queries to IMDB Top250 and Revenue per movie. I've also deleted the last column


Merge the two datasets into one

13. Now press on Merge on the Toolbar, and configure the Merge operation as follows: select the IMDBTop250 and the Revenue per Movie, Click on the columns on how to merge the two columns and press OK.




14. Click on the button on the Column "NewColumn" and select "WorldWide gross" and press OK.


The final result should look like below.


The two datasets are combined and now you can visualise this in multiple ways.

Conclusion

Pretty impressive Self Service BI tooling of Microsoft. I've investigated PowerQuery for a limited time and I've just scratched the surface. The ease on investigating and combining data is really great. There are far more options of PowerQuery I've to investigate;-) IT seems that the PowerQuery language M is also integrated in SQL Server 2016 SSIS. Worthwhile looking into this...

Greetz,
Hennie


Geen opmerkingen:

Een reactie posten