Course Overview: Extracting Value from Data Analytics

So, what was this course about?
 
Ingredients
Data Analytics
Analyzing & extracting value from data
Humans
As analysts extracting value
As workers helping with the analysis
Course Objectives
Reading and Comprehension Skills
You read ~25 papers
Critical Thinking and Discussion Skills
Actively engaging in critical analysis of papers' flaws and insights
Research Skills
Semester-long meaty project
Presentation Skills
Present the key ideas of a database-style paper
Optimization Objectives
Accuracy
Better, more complete results
Power
Enabler of more interesting analyses
Speed
Want results quickly
Ease of use
For both novice and expert users
Cost
Crowds, resources
Topics Covered
A. Dealing with Unstructured/Noisy Data
1) Crowd-Powered Algorithms
2) Crowd-Powered Systems
B. Dealing with More Data
1) Scalable Data Analytics
2) Approximate Data Analytics
C. Dealing with Novice Analysts
1) Visual Analytics Systems
2) New Interfaces & Usability
D. Dealing with New Scenarios
1) Data Science, ML, and Graph Processing
2) Collaborative Query Processing
For each, we covered:
A) a system or an algorithm, plus
B) connections to other (sometimes old) database topics
Topics Covered
A. New forms of data
1) Crowd-Powered Algorithms
a. CrowdScreen: Filtering data with humans
   cost/latency/accuracy; probabilistic reasoning
b. So Who Won: Max discovery with the crowd
   graph-based maximum-likelihood reasoning
c. Sorts and Joins: Sorting and joins with humans
   new types of interfaces (hybrid), batching
2) Crowd-Powered Systems
a. CrowdDB: DB + Crowds
   data model (CNULL), query constructs, query processing
b. Deco: DB + Crowds
   a more complete language
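CrowdScreen treats crowd filtering as a cost/latency/accuracy tradeoff that is decided by probabilistic reasoning over the votes collected so far. The Python sketch below illustrates only that reasoning step. It is not CrowdScreen's strategy-optimization algorithm. The sketch assumes every worker shares the same hypothetical error rates. It computes the posterior probability that an item passes the filter and keeps asking for votes until a confidence threshold or a vote budget is reached. `WorkerModel`, `decide`, the threshold and the budget are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class WorkerModel:
    # Assumed, illustrative error rates shared by all workers.
    p_yes_if_pass: float = 0.8  # chance a worker says "yes" when the item truly passes
    p_yes_if_fail: float = 0.2  # chance a worker says "yes" when the item truly fails


def posterior_pass(yes: int, no: int, prior: float, m: WorkerModel) -> float:
    """P(item passes | votes), treating worker answers as independent."""
    like_pass = m.p_yes_if_pass ** yes * (1 - m.p_yes_if_pass) ** no
    like_fail = m.p_yes_if_fail ** yes * (1 - m.p_yes_if_fail) ** no
    joint_pass = prior * like_pass
    return joint_pass / (joint_pass + (1 - prior) * like_fail)


def decide(yes: int, no: int, prior: float, m: WorkerModel,
           threshold: float = 0.9, max_votes: int = 7) -> str:
    """Return "pass", "fail", or "ask" (pay for one more vote)."""
    p = posterior_pass(yes, no, prior, m)
    if p >= threshold:
        return "pass"
    if 1 - p >= threshold:
        return "fail"
    if yes + no >= max_votes:
        # Budget (cost) exhausted: settle for the more likely answer.
        return "pass" if p >= 0.5 else "fail"
    return "ask"


print(decide(2, 0, prior=0.5, m=WorkerModel()))  # pass
print(decide(1, 0, prior=0.5, m=WorkerModel()))  # ask
```

Raising `threshold` or `max_votes` buys accuracy with more votes, which means higher cost and latency.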
Topics Covered
B. Dealing with more data
1) Scalable Data Analytics
a. Dremel: Google’s parallel column-store system
   distributed query processing, column stores
b. SparkSQL: DB layer on Spark
   translation from SQL to Spark queries, ...
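Dremel stores records column by column, so a query reads only the columns it references. The sketch below assumes flat records with a uniform schema. It leaves out Dremel's encoding of nested data (repetition and definition levels) and its distributed serving tree. The field names and helper functions are hypothetical.

```python
from typing import Any


def to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Transpose row-oriented records into one list per field (flat, uniform schema assumed)."""
    if not rows:
        return {}
    return {name: [row[name] for row in rows] for name in rows[0]}


def sum_by(columns: dict[str, list[Any]], group_col: str, value_col: str) -> dict[Any, float]:
    """SUM(value_col) GROUP BY group_col, reading only the two referenced columns."""
    totals: dict[Any, float] = {}
    for key, value in zip(columns[group_col], columns[value_col]):
        totals[key] = totals.get(key, 0) + value
    return totals


rows = [
    {"country": "US", "clicks": 3, "url": "a.example"},
    {"country": "FR", "clicks": 1, "url": "b.example"},
    {"country": "US", "clicks": 2, "url": "c.example"},
]
columns = to_columns(rows)
print(sum_by(columns, "country", "clicks"))  # {'US': 5, 'FR': 1}
```

`sum_by` never touches the `url` column. In an on-disk column store, that untouched column becomes skipped I/O.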
Topics Covered
B. Dealing with more data
2) Approximate Analytics: tradeoff between cost, latency, and accuracy (c/l/a)
a. BlinkDB: Approximate Query Answering System
   stratified samples help! Query column sets
b. Sample+Seek
   importance-biased sampling can help
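One reason stratified samples help: a uniform sample rarely contains rows from small groups, so per-group answers for those groups come out missing or noisy. The sketch below follows the spirit of BlinkDB's samples stratified on query column sets. It caps the number of sampled rows per distinct value of one column. It then scales each group's sampled sum back up by that group's true size. The cap, the column names and the single-column stratification are assumptions. BlinkDB's sample creation and runtime sample selection are not shown.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, cap, seed=0):
    """Keep at most `cap` rows per distinct value of `key`, plus each stratum's true size."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)
    return {
        value: (members if len(members) <= cap else rng.sample(members, cap), len(members))
        for value, members in strata.items()
    }


def estimate_sum(sample, measure):
    """Per-group SUM estimate: scale each stratum's sampled sum by N_h / n_h."""
    return {
        value: sum(r[measure] for r in chosen) * (size / len(chosen))
        for value, (chosen, size) in sample.items()
    }


# Hypothetical skewed table: one huge group and one rare group.
rows = [{"city": "big", "sales": 1} for _ in range(1000)]
rows += [{"city": "rare", "sales": 5} for _ in range(3)]
sample = stratified_sample(rows, key="city", cap=10)
print(estimate_sum(sample, "sales"))  # {'big': 1000.0, 'rare': 15.0}
```

A uniform sample of the same size (about 1% of rows) would most likely contain none of the three `rare` rows. The `big` estimate is exact here only because every row has the same value. With varied values it is an estimate with sampling error.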
    
   
Topics Covered
C. Dealing with novice analysts
1) Visual Analytics Systems
a. Trust me, I’m partially right: approximate vis
   online aggregation
b. I’ve seen enough
   ideas of incremental visualization
c. imMens
   in-situ data cube plus brushing and linking
d. Polaris: Basis for Tableau
   idea of a data cube; visualizations = cube aggregates!
e. Zenvisage: visual data exploration
   scalable grouped query execution techniques
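Online aggregation underlies the approximate visualizations in "Trust me, I'm partially right". It shows a running answer with a confidence interval that tightens as more rows are processed. The sketch below computes a running AVG over rows visited in random order, using a normal-approximation interval. `online_avg`, `z=1.96` (roughly 95%) and the reporting interval are illustrative choices, not the paper's implementation.

```python
import math
import random


def online_avg(values, z=1.96, report_every=100, seed=0):
    """Yield (rows_seen, running AVG, CI half-width) while scanning rows in random order."""
    order = list(values)
    random.Random(seed).shuffle(order)  # random order makes every prefix a uniform sample
    n, mean, m2 = 0, 0.0, 0.0
    for x in order:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # Welford's update for the running sum of squared deviations
        if n >= 2 and (n % report_every == 0 or n == len(order)):
            std_err = math.sqrt(m2 / (n - 1) / n)
            # Normal approximation; no finite-population correction, so intervals run wide.
            yield n, mean, z * std_err


for seen, avg, half_width in online_avg(range(1, 1001), report_every=250):
    print(f"after {seen} rows: {avg:.1f} +/- {half_width:.1f}")
```

The half-width shrinks roughly as 1/sqrt(n). That shrinkage is what lets an analyst stop early once the estimate is precise enough.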
Topics Covered
C. Dealing with novice analysts
2) New Interfaces and Usability
a. DBTouch
   touch-based querying of data: pinch+zoom
b. Gestural Query Specification
   completeness of operators; user study!
c. Making Database Systems Usable
   natural language
   interface types: forms, keyword search, query by example (QBE)
d. DataPlay
   building on visual query builders with feedback
e. DataSpread
   spreadsheets+databases
Topics Covered
D. Dealing with new settings
1) Machine Learning and Graph Processing
a. MLBase: Wrapper on ML algorithms
   parameter tuning for ML is a pain
b. Laura Haas’s lecture
   Data Science as a Service
c. MADSkills: Wrapper on traditional databases
   kinds of ML-based analyses of interest
d. GraphLab: Distributed graph analytics tools
   graph analytics systems, “thinking like a vertex”
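"Thinking like a vertex" means writing one update function that a vertex runs using only its neighbors' state, and letting the system run it over the whole graph. The sketch below expresses PageRank that way on a single machine with synchronous iterations. GraphLab itself runs such update functions in parallel across machines and can schedule them asynchronously. The sketch assumes every vertex has at least one out-edge, so dangling vertices are not handled. The function name and parameters are illustrative.

```python
def vertex_pagerank(edges, num_iters=50, damping=0.85):
    """Synchronous vertex-centric PageRank; assumes every vertex has an out-edge."""
    vertices = sorted({v for edge in edges for v in edge})
    out_degree = {v: 0 for v in vertices}
    in_neighbors = {v: [] for v in vertices}
    for src, dst in edges:
        out_degree[src] += 1
        in_neighbors[dst].append(src)
    n = len(vertices)

    def update(v, rank):
        # The per-vertex program: gather neighbor contributions, then apply the formula.
        gathered = sum(rank[u] / out_degree[u] for u in in_neighbors[v])
        return (1 - damping) / n + damping * gathered

    rank = {v: 1.0 / n for v in vertices}
    for _ in range(num_iters):
        # Every vertex reads the previous iteration's ranks (a synchronous superstep).
        rank = {v: update(v, rank) for v in vertices}
    return rank


ranks = vertex_pagerank([("a", "b"), ("c", "b"), ("b", "a"), ("b", "c")])
print({v: round(r, 3) for v, r in ranks.items()})
```

Vertex `b` receives links from both `a` and `c`, and it ends up with the highest rank.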
Topics Covered
D. Dealing with new settings
2) Collaborative Analyses
a. Orpheus: data collaboration
   collaboration as a versioning problem
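Viewing collaboration as a versioning problem works like this: each collaborator checks out a version of a table, edits it and commits a new version. Unchanged records are shared between versions rather than copied. The sketch below keeps one store of immutable records and maps each version id to a list of record ids. It also stores parent pointers, which form a version graph. `VersionedTable` and its methods are hypothetical. Detecting unchanged records by comparing their contents is a simplification. This is not Orpheus's actual schema or command set.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedTable:
    """Hypothetical sketch: each version is a list of record ids into a shared record store."""
    records: dict[int, tuple] = field(default_factory=dict)       # rid -> immutable record
    versions: dict[int, list[int]] = field(default_factory=dict)  # vid -> rids in that version
    parents: dict[int, list[int]] = field(default_factory=dict)   # vid -> parent vids (version graph)
    _rid_of: dict[tuple, int] = field(default_factory=dict)       # record contents -> rid

    def commit(self, rows, parents=()):
        """Save `rows` as a new version; records already stored are shared, not copied."""
        rids = []
        for row in rows:
            rec = tuple(row)
            if rec not in self._rid_of:
                rid = len(self.records)
                self.records[rid] = rec
                self._rid_of[rec] = rid
            rids.append(self._rid_of[rec])
        vid = len(self.versions)
        self.versions[vid] = rids
        self.parents[vid] = list(parents)
        return vid

    def checkout(self, vid):
        """Materialize a version as a list of records."""
        return [self.records[rid] for rid in self.versions[vid]]


table = VersionedTable()
v0 = table.commit([("alice", 10), ("bob", 20)])
v1 = table.commit([("alice", 10), ("bob", 25)], parents=[v0])  # a collaborator edits bob's row
print(table.checkout(v1), len(table.records))  # [('alice', 10), ('bob', 25)] 3
```

Only bob's edited row adds a new record. Alice's row is shared by both versions.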
“Historical” Takeaways
Examples:
Storage layer: column stores, data compression, data sampling
Processing layer: NoSQL, adaptive query processing (QP), parallel QP; cost/latency/accuracy tradeoff in crowdsourcing, interfaces, batching
Usability layer: forms, keyword search, QBE; data integration, data cleaning
Visualization layer: binning, aggregation, data cubes, online aggregation
Applications layer: graph processing, machine learning primitives
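Column stores compress well partly because a column holds values of a single type, often in long runs once the data is sorted. Run-length encoding is one simple scheme, used here purely to illustrate the "data compression" item above. The sketch also evaluates a count directly on the compressed runs. The helper names are hypothetical.

```python
from itertools import groupby


def rle_encode(column):
    """Run-length encode a column into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, length in runs for _ in range(length)]


def rle_count(runs, target):
    """COUNT(*) WHERE col = target, evaluated per run instead of per row."""
    return sum(length for value, length in runs if value == target)


state = ["CA", "CA", "CA", "NY", "NY", "CA"]
runs = rle_encode(state)
print(runs)                   # [('CA', 3), ('NY', 2), ('CA', 1)]
print(rle_count(runs, "CA"))  # 4
```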
Mix of Papers: Vision vs. Details
Visionary (examples):
   Database usability
   DBTouch/GestureDB
   MLBase
Detail-oriented (examples):
   CrowdScreen
   GraphLab
   Dremel, SparkSQL
Mix of Papers: Algorithmic vs. Systems
Algorithmic: probably 30-35%
Systems-oriented: the majority
Not surprising, given that this is a database systems course.
(Hopefully) Lessons Learned
Don’t solve non-problems!
Importance of:
   thinking about users
      interface, language
   careful systems architecture
      generalizable, efficient/powerful, tailored to use cases
Data analytics involves:
   usability
   careful, scalable system architecture (systems)
   principled algorithm design (algorithms)


Uploaded on Dec 10, 2024



