Data Analytics: Introduction, Terminology, Challenges, Platforms, Tools, Applications

Table of Contents
Introduction
Why Data Analytics
Data Analytics Terminology
Predictive Analytics
Data Analytics Challenges
Data Analytics Platform
Data Analytics Tools
Hadoop
Data Analytics Application
Recommendations
Introduction
What is data?
What is big data?
Analysis vs. Analytics
WHAT IS DATA?
A collection of facts and statistics
WHAT IS DATA? (contd.)
CLASSIFICATION OF DATA
Structured
High degree of organization, such as a relational database
Unstructured
Information that is difficult to organize using traditional mechanisms
E.g.: Facebook, WhatsApp, Gmail
WHAT IS BIG DATA?
Complex and dynamic
The 3Vs: volume, velocity and variety
90% of the world's data was produced in the last 2 years (IBM)
ANALYTICS VS. ANALYSIS
ANALYTICS
Extensive use of mathematics and statistics, descriptive techniques and predictive models to gain valuable knowledge
ANALYSIS: Why did something happen?
ANALYTICS: What is likely to happen?
WHY DATA ANALYTICS?
Moves organizations from a reactive strategy to a proactive strategy
Has helped in determining the President of the United States
DATA ANALYTICS IN THE REAL WORLD
WALMART
Uses predictive analytics to better identify customer preferences on a regional basis and stock its branch locations accordingly
REAL WORLD APPLICATIONS (contd.)
A medical diagnostics company analyzed data and developed the first non-invasive test for predicting coronary artery disease:
Researchers analyzed over 100 million gene samples
Identified the 23 primary predictive genes for coronary artery disease
The resulting test, known as the “Corus CAD Test,” was recognized as one of the “Top Ten Medical Breakthroughs of 2010” by TIME Magazine
Data Analytics Terminology
Data mining
Data Warehousing
OLAP (Online Analytical Processing)
Big Data Analytics
Business Analytics
Descriptive Analytics
Predictive Analytics
PREDICTIVE ANALYTICS
Extracting information from existing data sets in order to determine patterns and predict future outcomes and trends
Predictive analytics is an enabler of big data
Made practical by faster, cheaper computers and easier-to-use software
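As a minimal illustration of predicting a future value from an existing data set, the Python sketch below fits a straight-line trend by ordinary least squares and extrapolates it one period ahead. The function names `fit_trend` and `forecast` and the sales figures are hypothetical; real predictive models are usually far richer than a single linear trend.

```python
from statistics import mean

def fit_trend(values):
    """Fit y = a + b*t by ordinary least squares, with t = 0, 1, 2, ...

    Needs at least two values (otherwise the slope is undefined).
    """
    t = list(range(len(values)))
    t_bar, y_bar = mean(t), mean(values)
    # Slope b = covariance(t, y) / variance(t); intercept a makes the line pass through the means.
    b = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, values)) / sum(
        (ti - t_bar) ** 2 for ti in t
    )
    a = y_bar - b * t_bar
    return a, b

def forecast(values, steps_ahead=1):
    """Extrapolate the fitted trend steps_ahead periods past the last observation."""
    a, b = fit_trend(values)
    return a + b * (len(values) - 1 + steps_ahead)

monthly_sales = [100, 110, 120, 130, 140]  # hypothetical historical data
print(forecast(monthly_sales))  # 150.0
```

The same pattern (learn parameters from history, then apply them to a future input) underlies more elaborate predictive models as well.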
PREDICTIVE ANALYTICS (contd.)
What Is Machine Learning?
A type of artificial intelligence that provides computers with the ability to learn without being explicitly programmed
Some Applications of ML
Spam filtering
Topic spotting
Weather prediction
Medical diagnosis
Fraud detection
Types of Machine Learning
Supervised learning: the model learns from labeled examples, i.e. inputs paired with known outputs
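To make supervised learning concrete, here is a minimal sketch of a decision stump (a decision tree with a single split) learned from labeled examples. It assumes one numeric feature and labels 0 or 1; the names `train_stump` and `predict` and the toy data are hypothetical.

```python
def train_stump(xs, ys):
    """Learn a one-split decision tree from labeled data (supervised learning).

    xs: numeric feature values; ys: known labels (0 or 1).
    Returns (threshold, left_label, right_label), where x <= threshold goes left,
    chosen to minimize the number of training mistakes.
    """
    best = None
    for threshold in sorted(set(xs)):
        for left_label in (0, 1):
            right_label = 1 - left_label
            errors = sum(
                (left_label if x <= threshold else right_label) != y
                for x, y in zip(xs, ys)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label, right_label)
    return best[1:]

def predict(stump, x):
    """Apply the learned split to a new, unlabeled input."""
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label

# Hypothetical labeled examples: feature value -> class 0 or 1
stump = train_stump([1, 2, 3, 7, 8, 9], [0, 0, 0, 1, 1, 1])
print(stump)               # (3, 0, 1)
print(predict(stump, 5))   # 1
```

A full decision tree repeats this split search recursively on each side; the stump keeps only the first level.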
Types of Machine Learning (contd.)
Unsupervised learning: the model finds structure, such as clusters, in data that has no labels
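A minimal sketch of unsupervised learning is k-means clustering (Lloyd's algorithm) on unlabeled 2-D points, shown below. For simplicity it uses the first k points as the initial centroids, which is an assumption; practical implementations usually use random or k-means++ initialization. The function name `k_means` and the sample points are hypothetical.

```python
import math

def k_means(points, k, iterations=100):
    """Cluster unlabeled 2-D points into k groups with Lloyd's algorithm.

    Simplification: the first k points are the initial centroids.
    Returns (centroids, clusters).
    """
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments no longer change
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = k_means(points, k=2)
print(centroids)
```

No labels are supplied; the two groups emerge purely from the distances between points.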
Some Algorithms Used for ML
Linear Regression
Decision Tree
Naïve Bayes (based on Bayes' theorem)
K-means Algorithm
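As an example of Naïve Bayes applied to spam filtering (one of the ML applications listed earlier), the sketch below trains a multinomial Naïve Bayes text classifier with add-one (Laplace) smoothing. The function names and the four training messages are hypothetical, and whitespace tokenization is a deliberate simplification.

```python
import math
from collections import Counter

def train_naive_bayes(docs):
    """docs: list of (text, label). Returns class counts, per-class word counts and vocabulary."""
    class_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in docs:
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab

def classify(model, text):
    """Pick the label maximizing log P(label) + sum of log P(word | label)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # prior
        for w in text.lower().split():
            # "Naive" assumption: words are independent given the label.
            # Add-one smoothing keeps unseen words from zeroing the probability.
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

docs = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting schedule today", "ham"),
    ("project meeting notes", "ham"),
]
model = train_naive_bayes(docs)
print(classify(model, "money offer now"))  # spam
```

Working in log space avoids underflow when many small word probabilities are multiplied together.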
SOME DATA ANALYTICS TOOLS
R
R is a programming language
Open-source environment
High availability
An interpreted language
Good data-handling capability
Most advanced graphical capability
Supports procedural and object-oriented programming
Gets better results faster
SAS
SAS is commercial software developed by SAS Institute
It is expensive
Easy to learn
Good data-handling capability
SAS releases updates in a controlled environment
SAS provides dedicated customer support
DATA ANALYTICS IN CANADIAN RAILWAY
IBM PUREDATA ANALYTICS TOOLS
Fast and easy setup
Petascale user data capacity
Better access to information
Customized analytics
Integrated third-party software
3x faster scan rate
128 GB/sec scan rate per rack
50% greater data capacity per rack
DATA ANALYTICS PLATFORM
DATA ANALYTICS PLATFORMS (contd.)
Cloudera
Cloudera Inc. was founded in 2008 by big data experts from Facebook, Google, Oracle and Yahoo
First company to develop and distribute Apache Hadoop-based software
Uses the Cloudera Management Suite to automate the installation process
Uses the HDFS component for file system access
Centralized metadata architecture
DATA ANALYTICS PLATFORMS (contd.)
Hortonworks
Hortonworks, founded in 2011, quickly emerged as one of the leading Hadoop vendors
A completely open-source platform based on Apache Hadoop for analysing, storing and managing big data
Goes beyond MapReduce by enabling the inclusion of more data processing frameworks
Uses the HDFS component for file system access
Centralized metadata architecture
HADOOP
Apache Hadoop is an open-source software
framework written in Java for distributed
storage and distributed processing of very large data
sets on computer clusters built from commodity
hardware
HDFS (Hadoop Distributed File System)
A file system specially designed for storing huge data sets on clusters of commodity hardware, with a streaming access pattern
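One way HDFS stores huge files on commodity clusters is by cutting each file into large fixed-size blocks and keeping several replicas of every block on different DataNodes. The Python sketch below only plans that layout; it assumes the Hadoop 2.x+ defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`). The round-robin node choice and the `plan_blocks` helper are hypothetical simplifications: real HDFS lets the NameNode pick nodes with a rack-aware placement policy.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the default dfs.blocksize in Hadoop 2.x and later
REPLICATION = 3                 # default dfs.replication

def plan_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return [(block_index, length_in_bytes, [datanode, ...]), ...] for one file.

    Simplified sketch: replicas go to distinct nodes round-robin,
    not HDFS's real rack-aware placement.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    plan = []
    offset, index = 0, 0
    while offset < file_size:
        length = min(block_size, file_size - offset)  # the last block may be shorter
        nodes = [datanodes[(index + r) % len(datanodes)] for r in range(replication)]
        plan.append((index, length, nodes))
        offset += length
        index += 1
    return plan

# A hypothetical 300 MB file on a four-node cluster
for block in plan_blocks(300 * 1024 * 1024, ["dn1", "dn2", "dn3", "dn4"]):
    print(block)
```

Large blocks suit the streaming access pattern: a reader pulls long sequential runs of data instead of seeking, and losing one DataNode still leaves two copies of each of its blocks.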
 
MAPREDUCE
Apache Hadoop MapReduce is a framework for processing large data sets in parallel across a Hadoop cluster
Data analysis uses a two-step process: map, then reduce
MapReduce is a programming model that Google has used successfully in processing its “big data” sets (~20 petabytes per day)
Users specify the computation in terms of a map function and a reduce function
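The classic illustration of the map and reduce functions is word count. The Python sketch below simulates the model in a single process: a user-written map function, the framework's shuffle (grouping intermediate values by key), and a user-written reduce function. It is not Hadoop's Java API; `map_fn`, `reduce_fn` and `run_mapreduce` are hypothetical names.

```python
from collections import defaultdict

def map_fn(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def reduce_fn(word, counts):
    """Reduce: combine all values emitted for one key."""
    return word, sum(counts)

def run_mapreduce(lines):
    # Shuffle: group intermediate values by key, as the framework does
    # between the map and reduce phases (on a cluster, across machines).
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in sorted(groups.items()))

print(run_mapreduce(["big data is big", "data is everywhere"]))
# {'big': 2, 'data': 2, 'everywhere': 1, 'is': 2}
```

Because each map call sees only one line and each reduce call sees only one key, a real framework can run many of them in parallel on different machines.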
EXISTING CHALLENGES IN INDIAN RAIL SYSTEM
Delays
Signaling problems
Broken-down trains
Congestion
Quality of service (QoS)
One solution to these problems can be the analysis of big data through predictive maintenance
Big data in the rail industry can be used in predictive analysis to predict faults before they happen, thus improving services
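The slides do not specify how faults are predicted. As one simple, hedged illustration, the Python sketch below raises an early warning when a sensor reading rises well above its own recent baseline (a rolling mean plus k standard deviations), before a hard failure limit is reached. The sensor (an axle-bearing temperature), the `early_warnings` helper, the window and k values, and the readings are all hypothetical.

```python
from statistics import mean, stdev

def early_warnings(readings, window=5, k=3.0):
    """Return indexes of readings more than k standard deviations above the rolling baseline.

    readings: equally spaced values from one sensor (hypothetical axle-bearing temperature).
    Each reading is compared with the mean and sample standard deviation of the
    previous `window` readings; window must be at least 2 for stdev().
    """
    alerts = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if readings[i] > mu + k * sigma:
            alerts.append(i)
    return alerts

temps = [60, 61, 60, 62, 61, 60, 61, 75, 62]  # hypothetical readings
print(early_warnings(temps))  # [7]
```

In practice the alert would trigger an inspection or maintenance job, and learned models would replace the fixed threshold once enough failure history is available.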
PREDICTIVE MAINTENANCE: BIG DATA ON RAILS
PREDICTIVE MAINTENANCE (contd.)
Choose the right system or subsystem for prediction
The prediction possibility zone
The prediction effectiveness zone
Identify the required data sets as early as possible
Identify the value-add of predictive maintenance (PM) for maintenance strategies
Complement your data science team with rail expertise
Look for the right skills when hiring data scientists
CHOOSING THE RIGHT SYSTEM OR SUBSYSTEM FOR PREDICTION
The prediction possibility zone
The prediction effectiveness zone
APPLICATION OF DATA ANALYTICS IN INDIAN RAILWAYS
AUTOMATIC VEHICLE LOCATION
PASSENGER INFORMATION SYSTEM
AUTOMATED FARE COLLECTION
Using ticket vending machines
Using smart cards that provide access to all types of transit services across multiple operating agencies
AFC analytics shows how passengers are using the system, identifies trends and helps improve services
AUTOMATED PASSENGER COUNTING
Number of passengers boarding and de-boarding each vehicle at a particular station
The rate of increase in passengers can be predicted over the years using the recorded data
Peak hours in a day and peak months in a year can be identified
This data can be used to provide better services and project evolving ridership trends
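The two uses of passenger-count data named above (finding peak periods and projecting ridership growth) can be sketched in a few lines of Python. The helper names, the counts and the choice of compound annual growth rate (CAGR) for the projection are assumptions for illustration, not the method of any particular railway system.

```python
def peak_period(counts):
    """counts: dict period -> passengers (hour of day, month, ...); returns the busiest period."""
    return max(counts, key=counts.get)

def annual_growth_rate(yearly_totals):
    """Compound annual growth rate between the first and last year (needs at least two years)."""
    years = len(yearly_totals) - 1
    return (yearly_totals[-1] / yearly_totals[0]) ** (1 / years) - 1

def project_next_year(yearly_totals):
    """Assume the historical growth rate continues for one more year."""
    return yearly_totals[-1] * (1 + annual_growth_rate(yearly_totals))

hourly_boardings = {7: 1200, 8: 2100, 9: 1500, 17: 1900, 18: 2300}  # hypothetical station counts
yearly_riders = [1_000_000, 1_100_000, 1_210_000]                   # hypothetical annual totals

print(peak_period(hourly_boardings))
print(round(annual_growth_rate(yearly_riders), 3))
print(round(project_next_year(yearly_riders)))
```

The same `peak_period` helper works for peak months if the dictionary is keyed by month instead of hour.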

Uploaded on Apr 16, 2024

