Insights into the World of Statistics and Data Science

Some Musings on Life & “Data Science”
Statistical Learning and Data Science
Friday Center, UNC, Chapel Hill
J. S. Marron
Dept. of Statistics and Operations Research
University of North Carolina
Some Views of Statistics
Statistics
Most People
Some Views of Statistics
Statistics
Bayes
Functional Data
HDLSS
Machine Learning
Sparsity
MCMC
Kernels
Bootstrap
Survival Analysis
Mixed Models
Time Series
Etc.  Etc.   Etc.  …
Reality
Some Views of Statistics
Statistics in Science
Statistics
Medicine
Biology
Agriculture
Psychology
Economics
Geology
Physics
Some Views of Statistics
John Tukey Quote:
Statistics in Science
From:    http://www.morris.umn.edu/~sungurea/introstat/history/w98
Some Views of Statistics
John Tukey Quote:
“The best thing about being a statistician
is that you get to play in everyone's
backyard”
Statistics in Science
From: http://www.york.ac.uk/depts/maths/histstat/tukey_nytimes.htm
Some Views of Statistics
Words coined by John Tukey:
    Bit    (0 – 1 data unit)
    Software
(mention to Computer Science friends…)
Some Views of Statistics
Another Prescient Statistician:
Bill Cleveland
Coined the Term “Data Science”
Cleveland, W. S. (2001). Data science: an action plan for expanding the technical areas of the field of statistics. International Statistical Review.
Some Views of Statistics
Most People
Some Views of Statistics
“Data Science (Analytics)”
 Computer Science
 Math (Applied)
 Bus. / Finance
 Others (Info. Sci., Psych, …)
Caution: a desire to replace old ideas with exciting new ones
Some Views of Statistics
What is (should be) the relationship?
Data Science
Machine Learning
            (Cleveland View)
Some Views of Statistics
What is (should be) the relationship?
Machine Learning
Data Science
The Big Question
What are the Boundaries of Statistics?
NSF/DMS Program Director (late 2004):
“That is not statistics”
The Big Question
What are the Boundaries of Statistics?
OK, then where are they?
We should discuss this much more…
Openly, not in the “Rejection Process (Publications, Grants, etc.)”
Variation
Thoughts From Business Statistics Course
Variation
A Fundamental Concept:
  Sounds Obvious
  Easy to Not Consider (Forget)
{Surprisingly So}
Variation
  Easy to Not Consider (Forget)
E.g. An Explorer Drowned in a Lake That Averaged 6 Inches in Depth…
  Hard to visualize?
Thanks to N. I. Fisher
[Image: Lake Eyre, Australia, from Wikipedia]
[Image: Lake Eyre, Australia, from www.airadventure.com.au]
Variation
  Easy to Not Consider (Forget)
E.g. An Explorer Drowned in a Lake That Averaged 6 Inches in Depth…
  Hard to visualize?
  Key is Variation About “Average”
  Simple Idea Takes a Minute to Recall
(happens a lot)
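The lake example can be made concrete with a few lines of Python. This is only a sketch: the depth readings below are hypothetical, chosen so that the average comes out to 6 inches, and are not taken from the talk.

```python
import statistics

# Hypothetical depth soundings in inches (an assumption for illustration):
# nineteen shallow readings and one deep hole.
depths_in = [1] * 15 + [2, 2, 2, 3] + [96]

mean_depth = statistics.mean(depths_in)      # the "average" depth
median_depth = statistics.median(depths_in)  # a typical reading
spread = statistics.pstdev(depths_in)        # variation about the average
deepest = max(depths_in)                     # what actually matters to the explorer

print(f"mean depth:   {mean_depth} in")
print(f"median depth: {median_depth} in")
print(f"std. dev.:    {spread:.1f} in")
print(f"deepest spot: {deepest} in")
```

The mean alone says "6 inches", while the standard deviation and the maximum show that a single deep spot dominates the picture.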
 
Variation
A Fundamental Concept:
  Sounds Obvious
Common Gross Oversimplification:
  They are going to …
  They all want to …
Group of people: Political, Religious, Ethnic Origin, …
U.S. Presidential Politics ?!?
Variation
Homework C0.1
Find an Example of Ignoring Variation.
Send me an email, with: text, and attribution.
Plan to discuss in class.
Variation
Homework C0.1
Results:    Out of First 10 Quotes, 9 Were From Donald Trump
Ideas on Human Relationships
 
Common Question:
“How Are Dep’t Politics Going?”
Background:
  Long Dubious History
  Merger of Statistics & OR
(More Diverse Interests)
  Rapidly Changing University
Ideas on Human Relationships
Response:
“Best I’ve Seen in Chapel Hill”
Reason:
 Respect Key to Current Interactions
 Moved Beyond “Politics of Disrespect”
Ideas on Human Relationships
Fundamental Observation:
Human Interactions Work Best In An Atmosphere of Respect
  Day to Day Interactions w/ Colleagues
  Reviews of Papers / Grant Proposals
  US Congress
  US Presidential Politics…
Special Thanks
Department of Statistics and Applied Prob.
National University of Singapore
For Many Discussions    
     This Talk
BIG DATA Models & Concepts
Challenge from the Recent Media:
Mayer-Schönberger and Cukier  (2014)
“Big Data: A Revolution That Will Transform How We Live, Work, and Think”
Major Premise:    Differing Data Analytic Goals
“Correlational”  vs. “Causal”
BIG DATA Models & Concepts
“Causal” Data Analysis:
 Goal:  Underlying Causes of Phenomena
 Approach: Classical “Scientific Method”
   Formulate Hypothesis
   Collect Data
   Test Hypothesis
 Consequences:
   Solid Knowledge w/ Measurable Certainty
BIG DATA Models & Concepts
“Correlational” Data Analysis:
 Goal:  Find (and Use) Mere Correlations
 Motivation: Correlations are
   Useful  (e.g. ___  Recognition Software)
   Valuable (Buying and Selling of Data…)
   Insightful????
 Consequences:
   Automatic Solutions to Some Hard Problems
Correlation vs. Causation
How New Is This Discussion?
Naïve Readers [Of Mayer-Schönberger and Cukier  (2014)]:
     This is Exciting!!!
     Great New Ideas!!!
     Change Statistics Curricula!!!
     Start Up “Data Analytics”!!!
Correlation vs. Causation
How New Is This Discussion?
Statistics
      Pattern Recognition
      Artificial Intelligence
      Neural Networks
      Data Mining
      Machine Learning
      Big Data – Data Science
Time
A Small Aside
 
A Personal Apology to
Xiaotong Shen
For My Skepticism About
ASA Section on Data Mining
My (Wrong) Idea:  Name Would Change,
    So Not Appropriate as “Section”
{Great to See Recent Name Change}
Correlation vs. Causation
How New Is This Discussion?
      Pattern Recognition
      Artificial Intelligence
      Neural Networks
      Data Mining
      Machine Learning
      Big Data
Some Came With Major New Ideas
Less So For Others, But More Focus On
Correlation vs. Causation
How New Is This Discussion?
Data Mining
Great Correlational Discovery:
Super Market Scanner Data
Baby Diapers (aka Nappies)   &   Beer
Some Perspective:
 Correlational Discovery Makes Causational Sense
(Too Soon To Totally Dump Causation)
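As a rough illustration of what a "correlational discovery" in scanner data looks like numerically, the sketch below computes the phi coefficient (the Pearson correlation of two yes/no indicators) from basket counts. The counts and the function name `phi_coefficient` are hypothetical, invented for this example; they are not the original supermarket data.

```python
import math

def phi_coefficient(both, only_a, only_b, neither):
    """Pearson correlation between two binary purchase indicators,
    computed from the four cells of a 2x2 table of basket counts."""
    n = both + only_a + only_b + neither
    p_a = (both + only_a) / n   # share of baskets containing item A
    p_b = (both + only_b) / n   # share of baskets containing item B
    p_ab = both / n             # share of baskets containing both
    return (p_ab - p_a * p_b) / math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))

# Hypothetical counts out of 100 baskets (A = diapers, B = beer).
phi = phi_coefficient(both=30, only_a=10, only_b=20, neither=40)
print(f"phi = {phi:.3f}")

# For comparison: counts where the two purchases are independent.
phi_indep = phi_coefficient(both=20, only_a=20, only_b=30, neither=30)
print(f"phi (independent case) = {phi_indep:.3f}")
```

A positive phi only says the two items tend to appear together; deciding why they do is the causal question the slide argues should not be dropped.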
 
Correlation vs. Causation
Relative Emphasis???
Classical Statistics: Correlation vs. Causation
Mayer-Schönberger and Cukier: Correlation vs. Causation
Suggested Actual Future Course: Correlation & Causation
Note:   Changes Are Needed in Curricula, Etc.
The Big Question
What are the Boundaries of Statistics?
NSF/DMS Program Director (late 2004):
“That is not statistics”
We Should Openly Discuss Much More…
How Much Leadership Should We Take?
Let’s Embrace Our Wide Diversity of Opinions on This Point
Challenges for You
  Lead Statistics (D. S.) into the Future
  Promote Increasing Breadth
  Embrace New Ideas
  Advocate Them While Reviewing
  Speak Up Serving On Panels
  Openly Discuss Boundaries

Musings on life intertwined with the intricate realms of statistical learning, data science, and their multi-faceted applications across various disciplines. Delve into the wisdom shared by prominent statisticians like John Tukey and Bill Cleveland, uncovering the essence of statistics in science and society.

  • Statistics
  • Data Science
  • John Tukey
  • Bill Cleveland
  • Insights

Uploaded on Oct 06, 2024 | 2 Views



