Simplifying Big Data Management with Azure Data Lake
Explore how Azure Data Lake offers a scalable solution for big data storage and analytics, addressing challenges such as limited storage, limited processing power, and high costs. Learn about its architecture, features, and benefits for optimizing big data analytics.
Presentation Transcript
Asanka Padmakumara, Business Intelligence Consultant
Blog: asankap.wordpress.com | LinkedIn: linkedin.com/in/asankapadmakumara | Twitter: @asanka_e | Facebook: facebook.com/asankapk
Move Your On-Prem Data to a Lake in the Clouds
Agenda
- Where are we right now?
- Why do we need a Data Lake?
- What is Azure Data Lake?
- How do we get there?
- Demo
- Q & A
What are the challenges?
- Limited storage
- Limited processing power
- High hardware cost
- High maintenance cost
- No disaster recovery
- Availability and reliability issues
- Scalability issues
- Security
Solution: Azure Data Lake
What is Azure Data Lake?
- A highly scalable data storage and analytics service
- Intended for big data storage and analysis
- A faster and more efficient solution than on-prem data centers
- Three services:
  - Analytics: Azure Data Lake Analytics, HDInsight (managed clusters)
  - Storage: Azure Data Lake Storage
Azure Data Lake Store
- Built for Hadoop: compatible with most components in the Hadoop ecosystem via the WebHDFS API
- Unlimited storage, petabyte-scale files
- Performance-tuned for big data analytics: high throughput and IOPS
- Multiple parts of a file stored on multiple servers, enabling parallel reading
- Enterprise-ready: highly available and secure
- All data in one place: any data in its native format, with no schema and no prior processing required
Optimized for Big Data Analytics
- Multiple copies of the same file improve read performance
- Locally redundant (multiple copies of the data within one Azure region)
- Parallel reading and writing
- Configurable throughput
- No limits on file size or total storage
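To illustrate the parallel-reading idea above, here is a minimal conceptual sketch in Python: a file is split into fixed-size chunks that are fetched concurrently and reassembled in order. This is only an analogy run against a local file; the function names and chunk size are made up, and it does not use any Azure API.

```python
import concurrent.futures
import os
import tempfile

def read_chunk(path, offset, size):
    """Read one chunk of a file; in a distributed store each
    chunk could live on (and be served by) a different server."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)

def parallel_read(path, chunk_size=4):
    """Split a file into fixed-size chunks and read them concurrently.

    ThreadPoolExecutor.map preserves input order, so joining the
    chunks reproduces the original byte sequence.
    """
    total = os.path.getsize(path)
    offsets = range(0, total, chunk_size)
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
        chunks = pool.map(lambda off: read_chunk(path, off, chunk_size), offsets)
        return b"".join(chunks)

# Demo on a small temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789abcdef")
    name = tmp.name
data = parallel_read(name)
os.unlink(name)
print(data)  # b'0123456789abcdef'
```

The payoff in a real distributed store is that the chunk reads hit different disks and network links at once, which is where the high aggregate throughput comes from.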
Secure Data in Azure Data Lake Store
- Authentication: Azure Active Directory, with all AAD features; end-user authentication or service-to-service authentication
- Access control: POSIX-style permissions (Read, Write, Execute); ACLs can be enabled on the root folder, on subfolders, and on individual files
- Encryption: encryption at rest, and encryption in transit via HTTPS
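The "POSIX-style permissions" mentioned above are the familiar Read/Write/Execute bits from Unix file systems. As a small illustration of the semantics only (not of the Azure ACL API; the function names here are invented), this sketch decodes a 3-bit permission value the way POSIX tools display it:

```python
def rwx(bits):
    """Decode a 3-bit POSIX permission value (e.g. 0b101) into 'r-x' form."""
    return (
        ("r" if bits & 0b100 else "-")
        + ("w" if bits & 0b010 else "-")
        + ("x" if bits & 0b001 else "-")
    )

def describe_entry(user, perm):
    """Render one ACL entry in the conventional 'user:rwx' style."""
    return f"{user}:{rwx(perm)}"

print(describe_entry("alice", 0b111))  # alice:rwx
print(describe_entry("bob", 0b101))    # bob:r-x  (read + execute, no write)
```

In Data Lake Store, such entries can be attached at the root, on subfolders, and on individual files, as the slide notes.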
How to ingest data into Azure Data Lake Store
- Large data sets: Azure PowerShell, Azure Cross-Platform CLI 2.0, Azure Data Lake Store .NET SDK, Azure Data Factory
- Really large data sets: Azure ExpressRoute, Azure Import/Export service
- Small data sets: Azure Portal, Azure PowerShell, Azure Cross-Platform CLI 2.0, Data Lake Tools for Visual Studio
- Streamed data: Azure Stream Analytics, Azure HDInsight Storm, Data Lake Store .NET SDK
- Relational data: Apache Sqoop, Azure Data Factory
How is it different from Azure Blob Storage?
- Purpose: Data Lake Store is storage optimized for big data analytics workloads; Blob Storage is general purpose.
- Use case: Data Lake Store suits batch, interactive, and streaming analytics and machine-learning data such as log files, IoT data, click streams, and large datasets; Blob Storage suits any type of text or binary data, such as application back ends, backup data, media storage for streaming, and general-purpose data.
- Key concepts: Data Lake Store contains folders, which in turn contain data stored as files; Blob Storage contains containers, which in turn hold data in the form of blobs.
- Size limits: Data Lake Store has no limits on account size, file size, or number of files; Blob Storage accounts are limited to 500 TiB.
- Geo-redundancy: Data Lake Store is locally redundant only (multiple copies of the data within one Azure region); Blob Storage offers locally redundant (LRS), globally redundant (GRS), and read-access globally redundant (RA-GRS) options.
Azure Data Lake Analytics
- Massive processing power with adjustable parallelism
- No servers, VMs, or clusters to maintain: you pay per job
- Use existing .NET, R, and Python libraries
- New language: U-SQL
U-SQL
U-SQL combines the declarative logic of SQL with the procedural logic of C#. It is case sensitive and uses schema-on-read. Example:

    @ExtraRuns =
        SELECT IPLYear,
               Bowler,
               SUM(string.IsNullOrWhiteSpace(ExtraRuns) ? 0 : Convert.ToInt32(ExtraRuns)) AS ExtraRuns,
               ExtraType
        FROM @MatchData
        GROUP BY IPLYear, Bowler, ExtraType;
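To clarify what the U-SQL query above computes (a GROUP BY with a null-tolerant SUM, where blank ExtraRuns values count as zero), here is a rough pure-Python equivalent. The column names mirror the snippet, but the sample rows are entirely made up for illustration:

```python
from collections import defaultdict

# Made-up rows standing in for @MatchData: (IPLYear, Bowler, ExtraRuns, ExtraType).
# Empty strings play the role of the blank values the U-SQL ternary guards against.
match_data = [
    (2017, "Malinga", "2", "Wide"),
    (2017, "Malinga", "",  "Wide"),   # blank -> counted as 0, like the U-SQL
    (2017, "Malinga", "1", "Wide"),
    (2018, "Bumrah",  "4", "NoBall"),
]

# GROUP BY IPLYear, Bowler, ExtraType with SUM(ExtraRuns)
totals = defaultdict(int)
for year, bowler, extra_runs, extra_type in match_data:
    runs = 0 if not extra_runs.strip() else int(extra_runs)
    totals[(year, bowler, extra_type)] += runs

print(totals[(2017, "Malinga", "Wide")])   # 3
print(totals[(2018, "Bumrah", "NoBall")])  # 4
```

The difference in practice is that Azure Data Lake Analytics distributes this aggregation across many nodes, with the parallelism dial mentioned on the previous slide.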
Pricing (https://azure.microsoft.com/en-us/pricing/details/data-lake-store/)
- Pay-as-you-go: 1 TB of storage for one month = $39.94
- Monthly commitment packages: 1 TB of storage for one month = $35
- Usage-based transaction charges:
  - Write operations (per 10,000): $0.05
  - Read operations (per 10,000): $0.004
  - Delete operations: free
  - Transaction size limit: none
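As a quick sanity check of how these line items combine, here is the arithmetic for a hypothetical month with 1 TB stored, one million writes, and ten million reads. The figures are the ones from the slide; actual Azure prices change over time, so treat this as a worked example rather than a quote.

```python
# Figures from the slide above; actual Azure prices may differ today.
STORAGE_PER_TB_MONTH = 39.94    # pay-as-you-go, USD per TB per month
COMMITTED_PER_TB_MONTH = 35.00  # 1 TB monthly commitment package, USD
WRITE_PER_10K = 0.05            # USD per 10,000 write operations
READ_PER_10K = 0.004            # USD per 10,000 read operations

def monthly_cost(tb, writes, reads, committed=False):
    """Storage charge plus usage-based transaction charges, in USD."""
    storage = tb * (COMMITTED_PER_TB_MONTH if committed else STORAGE_PER_TB_MONTH)
    transactions = (writes / 10_000) * WRITE_PER_10K + (reads / 10_000) * READ_PER_10K
    return round(storage + transactions, 2)

# 1 TB, 1M writes (= $5.00), 10M reads (= $4.00):
print(monthly_cost(1, 1_000_000, 10_000_000))                  # 48.94
print(monthly_cost(1, 1_000_000, 10_000_000, committed=True))  # 44.0
```

At this usage level the commitment package saves $4.94 per month, which is simply the $39.94 vs $35 storage difference; the transaction charges are identical under both plans.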