Exploring Data Analytics: Introduction, Terminology, Challenges, Platforms, Tools, Applications
Delve into the world of data analytics through this comprehensive guide covering topics such as the definition of data, big data, analytics vs analysis, the importance of data analytics, real-world applications, and more. Explore the classification of data, the 3Vs of big data, and how data analytics has transformed industries like healthcare and retail. Discover the power of predictive analytics in shaping decisions and driving innovation.
Presentation Transcript
Table of Contents: Introduction; Why Data Analytics; Data Analytics Terminology; Predictive Analytics; Data Analytics Challenges; Data Analytics Platform; Data Analytics Tools; Hadoop; Data Analytics Application; Recommendations
Introduction: What is data? What is big data? Analysis vs. analytics
WHAT IS DATA? A collection of facts and statistics.
WHAT IS DATA? (contd.) CLASSIFICATION OF DATA. Structured: data with a high degree of organization, such as a relational database. Unstructured: information that is difficult to organize using traditional mechanisms, e.g. content from Facebook, WhatsApp, Gmail.
WHAT IS BIG DATA? Complex and dynamic data, characterized by the 3Vs (volume, velocity, variety). 90% of the world's data was produced in the last two years (IBM).
ANALYTICS VS. ANALYSIS. Analytics: extensive use of mathematics and statistics, together with descriptive techniques and predictive models, to gain valuable knowledge. Analysis asks "Why did something happen?"; analytics asks "What is likely to happen?"
WHY DATA ANALYTICS? It enables a move from a reactive strategy to a proactive strategy. Example: it helped in determining the President of the United States.
DATA ANALYTICS IN THE REAL WORLD. Walmart: uses predictive analytics to better identify customer preferences on a regional basis and to stock its branch locations accordingly.
REAL-WORLD APPLICATIONS (contd.) A medical diagnostics company developed the first non-intrusive test for predicting coronary artery disease. Researchers analyzed over 100 million gene samples and identified the 23 primary predictive genes for coronary artery disease. The resulting test, known as the Corus CAD Test, was recognized as one of the Top Ten Medical Breakthroughs of 2010 by TIME Magazine.
Data Analytics terminology: data mining; data warehousing; OLAP; big data analytics; business analytics; descriptive analytics; predictive analytics
PREDICTIVE ANALYTICS. Extracting information from existing data sets in order to determine patterns and predict future outcomes and trends. Predictive analytics is an enabler of big data. It is made practical by faster, cheaper computers and easier-to-use software.
What Is Machine Learning? A type of artificial intelligence that provides computers with the ability to learn without being explicitly programmed. Some applications of ML: spam filtering; topic spotting; weather prediction; medical diagnosis; fraud detection
Types Of Machine Learning. Supervised learning: the model learns from labeled examples, i.e. inputs paired with known outputs.
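Types Of Machine Learning. Unsupervised learning: the model looks for structure in unlabeled data, for example by grouping similar records into clusters.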
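To make unsupervised learning concrete, here is a minimal sketch of k-means clustering on one-dimensional data, written in plain Python. The function name `kmeans_1d`, its parameters, and the customer-spend figures are assumptions made up for illustration; they do not come from the presentation, and a real project would more likely use a library implementation.

```python
import random

def kmeans_1d(values, k, iterations=20, seed=0):
    """Cluster 1-D numbers into k groups using the standard k-means iteration."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = rng.sample(values, k)
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(centroids)

# Hypothetical data: daily spend of eight customers, with no labels attached.
spend = [10, 12, 11, 13, 95, 100, 98, 102]
centers = kmeans_1d(spend, k=2)
print(centers)
```

No labels are supplied; the algorithm discovers the low-spend and high-spend groups from the values alone, which is what distinguishes it from supervised learning.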
Some Algorithms Used For ML: linear regression; decision tree; naive Bayes; k-means
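As an illustration of the first algorithm in this list, the sketch below fits a straight line by ordinary least squares and uses it to forecast the next value. The helper `fit_line` and the yearly sales figures are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: year index and sales, generated from y = 2x + 10.
years = [1, 2, 3, 4, 5]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(years, sales)
forecast = slope * 6 + intercept  # prediction for year 6
print(slope, intercept, forecast)
```

This is supervised learning in its simplest form: the known sales values act as labels, and the fitted line is the model used for prediction.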
R: R is a programming language and an open-source environment. High availability. An interpreted language. Good data handling capability. Most advanced graphical capability. R supports procedural and object-oriented programming. Get better results faster.
SAS: SAS is commercial software developed by SAS Institute. It is expensive. Easy to learn. Good data handling capability. SAS releases updates in a controlled environment. SAS provides dedicated customer support.
DATA ANALYTICS IN CANADIAN RAILWAY
IBM PURE DATA ANALYTICS TOOLS: fast and easy setup; petascale user data capacity; better access to information; customized analytics; integrated third-party software; 3x faster scan rate; 128 GB/sec scan rate per rack; 50% greater data capacity per rack
DATA ANALYTICS PLATFORMS (contd.) Cloudera: Cloudera Inc. was founded in 2008 by big data engineers from Facebook, Google, Oracle and Yahoo. It was the first company to develop and distribute Apache Hadoop-based software. The Cloudera management suite automates the installation process. It uses the HDFS component for file system access. Centralized metadata architecture.
DATA ANALYTICS PLATFORMS (contd.) Hortonworks: Hortonworks, founded in 2011, has quickly emerged as one of the leading vendors of Hadoop. It is a completely open-source platform based on Apache Hadoop for analysing, storing and managing big data. It goes beyond MapReduce in that it enables the inclusion of more data processing frameworks. It uses the HDFS component for file system access. Centralized metadata architecture.
HADOOP: Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware.
HDFS: A specially designed file system for storing huge data sets on a cluster of commodity hardware, with a streaming access pattern.
MAP REDUCE: Apache Hadoop MapReduce is a framework for processing large data sets in parallel across a Hadoop cluster. Data analysis uses a two-step map and reduce process. MapReduce is a programming model that Google has used successfully in processing its big data sets (roughly 20 petabytes, i.e. about 20,000 terabytes, per day). Users specify the computation in terms of a map function and a reduce function.
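The map and reduce functions can be illustrated with the classic word-count example. The sketch below runs in a single Python process and only mimics the programming model; it does not use the Hadoop API, and the function names `map_phase`, `shuffle` and `reduce_phase` as well as the sample documents are assumptions made for illustration.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group all emitted values by key, as the framework does between the two steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

# Hypothetical input records; on a cluster each could be processed on a different node.
documents = ["big data on rails", "big data analytics"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Because each call to `map_phase` depends only on its own record, and each call to `reduce_phase` only on its own key, both steps can be spread across many machines, which is the point of the model.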
EXISTING CHALLENGES IN INDIAN RAIL SYSTEM: delays; signaling problems; broken-down trains; congestion; QoS. One solution to these problems is the analysis of big data for predictive maintenance. Big data in the rail industry can be used in predictive analysis to predict faults before they happen, thus improving services.
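A very simple way to picture fault prediction is a rule that watches a sensor trend and raises an alert before a hard limit is reached. The sketch below uses a moving average as a stand-in for a trained predictive model; the function `flag_for_maintenance`, the window size, the 70-degree threshold and the temperature readings are all hypothetical.

```python
def flag_for_maintenance(readings, window=3, threshold=70.0):
    """Return the indices at which the moving average of the last
    `window` readings rises above `threshold`."""
    alerts = []
    for i in range(window - 1, len(readings)):
        recent = readings[i - window + 1 : i + 1]
        if sum(recent) / window > threshold:
            alerts.append(i)
    return alerts

# Hypothetical axle-bearing temperatures in degrees Celsius, one per inspection.
temps = [60, 61, 62, 64, 69, 75, 82, 90]
alerts = flag_for_maintenance(temps)
print(alerts)
```

Averaging over a window makes the rule react to a sustained rise rather than to a single noisy reading. A production system would replace the fixed threshold with a model learned from historical failure data.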
PREDICTIVE MAINTENANCE: BIG DATA ON RAILS
PREDICTIVE MAINTENANCE (contd.) Choose the right system or subsystem for prediction (the prediction possibility zone; the prediction effectiveness zone). Identify the required data sets as early as possible. Identify the value-add of predictive maintenance (PM) for maintenance strategies. Complement your data science team with rail expertise. Look for the right skills when hiring data scientists.
CHOOSING THE RIGHT SYSTEM OR SUBSYSTEM FOR PREDICTION: the prediction possibility zone; the prediction effectiveness zone
APPLICATION OF DATA ANALYTICS IN INDIAN RAILWAYS
AUTOMATED FARE COLLECTION (AFC): Using ticket vending machines. Using smart cards that provide access to all types of transit services across multiple operating agencies. AFC analytics provides details of how passengers are using the system, identifies trends, and helps improve services.
AUTOMATED PASSENGER COUNTING: Number of passengers boarding and de-boarding each vehicle at a particular station. The rate of increase in passengers over the years can be predicted using the recorded data. Peak hours in a day and peak months in a year can be identified. These data can be used to provide better services and to project evolving ridership trends.
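Identifying peak hours from counting data amounts to summing boardings per hour and ranking the totals. The following sketch assumes the counter output is available as (hour, passengers) pairs; that record layout, the function `peak_hours` and the sample numbers are hypothetical.

```python
from collections import Counter

def peak_hours(boardings, top_n=2):
    """Return the `top_n` hours of the day with the highest total boardings."""
    totals = Counter()
    for hour, count in boardings:
        totals[hour] += count
    return [hour for hour, _ in totals.most_common(top_n)]

# Hypothetical records from one station: (hour of day, passengers boarding).
records = [(8, 120), (9, 200), (13, 60), (18, 180), (9, 50), (18, 40)]
peaks = peak_hours(records)
print(peaks)
```

The same aggregation keyed by month instead of hour would give the peak months of the year.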