Tuesday, October 11, 2016

Azure Cloud Based Analytics

The elements of analytics are the same whether on premises or in the cloud. Cloud based tools for analytics can provide a platform that is far more extensible, economical and powerful than traditional on-premise. This session will review some cloud analytics use cases and how the tools from Microsoft combine to create business insight and value.
Azure Cloud Tools
Microsoft Intelligence (edited, previously Cortana Intelligence Suite) is a powerful combination of cloud based tools that manage data, provide analysis capability and store or visualize results. Together these tools represent the six elements of data analytics.


The talk on Data Based analytics includes a demonstration of Data Lake, PolyBase, Azure SQL Data Warehouse, Power BI, and Azure Machine Learning. The links to Power BI and Azure ML will take you to the apps, each has a free account you can sign up and use for testing. I encourage you to sign up for the free offer for $200 in Azure Credits (United States based offer, at the time of this posting Oct. 11, 2016).

Analytics projects
According to Gartner analytics projects include 6 elements and there are common challenges for each of these elements that must be addressed in any analytics project:

  1. Data sources - volume alone and the need to move large data sets
  2. Data models - complexity of setup as they simulate business processes
  3. Processing applications - data cleansing, remediation of bad data
  4. Computing power - analysis requires considerable power
  5. Analytics models - design of experiments and management of the models, model complexity
  6. Sharing and storage of results - informative visualizations and results embedded in applications

Tools provided by cloud vendors provide assistance and in some cases, with far superior solutions to these issues.
The Cloud analytics tools are designed to address the challenges we’ve been discussing:
 1) Scalability: Agility/ Flexibility
     -large data sources –power them off when not in use
     - heavy computing capability – pay only for what you use
    - Stand up a new environment – pilot a new capability
   - integrate data sources with multiple visualization and analytics environment
2) Economy –
    - less expensive to stand up than hardware and software,
    - fewer skills necessary in house for implementation, configuration, integration, use,
     - management and maintenance of systems
3) Security
   - Azure is designed for security from the ground up
   - Microsoft spends $1B a year on security research – more than most security firms
   - identity based security throughout the stack

   - protection for disaster recovery, regulatory specific platforms to address security needs
4) Capability
  - Integration with desktop tools for embedding visualizations, integrating predictive analytics
  - Consumer technology style interfaces enable faster learning and require fewer skills than script based on premises tools
  - Collaboration enabled by ties across Office 365 tools and Mobile capabilities
  - Powerful servers can simply process far more data and analysis than most local servers

How can my business benefit from analytics in the cloud?
Data, provided and used as a valuable asset of the firm provides leverage employees apply to problem solving activities. Fact based trend prediction leads to insights that provide business value.
Businesses taking advantage of the cloud to benefit from increased cloud computing power, the ability to handle large data sets and for the mobility provided by access to the analytics platform from any location.



Other articles that may help to answer this question are:

Power BI is a great starting point for moving analytics to the cloud
Power BI brings many customers their first exposure to cloud analytics. Companies find multiple benefits of collaboration between users, mobile access, and the experience of adopting a new way to use data. Workspaces provide secure areas to work together on Power BI solutions and are linked to Exchange groups and OneDrive folders where other collateral can be shared.

Storage in the cloud 
Using Azure cloud based storage is relatively inexpensive, remotely accessible from anywhere, secure and in coming data is free (data moving out of your region is charged). Large data sets can be managed in Azure Blob Storage, Azure Data Lake, Azure SQL DW and Azure SQL Database.

Predictive Analytics with Azure Machine Learning
Azure ML provides a friendly tool for building machine learning experiments, cleaning and visualizing data. The environment provides familiar analytic tools for R and Python scripting, built in models for statistical analysis and a process for encapsulating trained models in web services as predictive experiments. Integration with Excel and other applications is simplified by the web service interface.

Questions:
Q1) In the PolyBase example does modeling the data by casting columns hinder performance of the data load in the "Create Table As" (CTAS) statement?
(Original posting left this question unanswered, edited to add the answer below)
In creating the External Table the columns must be of the same data type.

However, when querying the external table to use CTAS to populate a SQL DW table I found that using cast to change the datatype of the column was faster to complete the new table creation. This was true for two cases, one where I converted in the input column varchar2(20) to an Int and another where I converted the input varchar(100) to a Date data type. I suspect this is due to the increased speed of input for Int and Date data types over varchars, but it indicates that the original query time isn't different.



Q2) Is it simple to switch from cold to hot in Azure Blob Storage?

The Azure hot storage tier is optimized for storing data that is accessed frequently. The Azure cool storage tier is optimized for storing data that is infrequently accessed and long-lived. - Microsoft documentation 
In regions where these settings have been enabled (not East yet), they can be changed in the Configuration blade of the Storage Account.

Q3) When to use Data Lake in comparison to Blob Storage.

In the demo for this talk I read the NYC 2013 Taxi data from the 12 - 2+GB CSV files into Azure SQL DW using CTAS. At this time, Data Lake is still in preview, and we cannot yet use CTAS with Azure Data Lake. This forced me to use Azure Blob Storage.
Additoinal differences are that Data Lake is optimized for parallel analytics workloads and for high throughput and IOPS. Where Blob Storage is not optimized for analytics but is designed to hold any type of file. Data Lake will hold unlimited data and Blob Storage is limited to 500TB. There are cost differences too, I recommend the Azure Pricing Calculator.

I recommend the Microsoft article Comparing Azure Data Lake Store and Azure Blob Storage for details.