Visualizing Categorical Data in Data Analysis

 
Lecture 2
 
Displaying and describing categorical data
 
Make a picture
 
Large tables are inconvenient: we see many rows but cannot
observe any pattern (see next slide)
 
It has about 100 rows
 
Make a picture
 
In the previous table, what if we wanted to see the proportions of
freshmen, sophomores, juniors, and seniors on the Commodores football team?
We would have to draw a chart. A chart should let the eye immediately
capture the differences between proportions.
 
A frequency table
 
We first summarize the table we have into a shorter one:

Freshmen    34
Sophomores  25
Juniors     30
Seniors     14
 
This table is still a bit too hard to read. We can, of
course, compare 4 numbers. But what if we had
more rows? Say, ages 0–2, 2–4, 4–6, and so
on. Or what if the numbers are large: compare
10123248 to 10123419.
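
The summarizing step can be sketched in a few lines of Python. This is only an illustration: the roster list is a hypothetical stand-in for the full table, built to match the counts on the slide, and the variable names are not from the lecture.

```python
from collections import Counter

# Hypothetical roster: one class-year label per player, built to match
# the slide's counts (34 freshmen, 25 sophomores, 30 juniors, 14 seniors).
roster = ["FR"] * 34 + ["SO"] * 25 + ["JR"] * 30 + ["SR"] * 14

counts = Counter(roster)        # frequency table: label -> count
total = sum(counts.values())    # number of players in the roster

# Relative frequencies in percent, rounded to one decimal place.
percent = {year: round(100 * n / total, 1) for year, n in counts.items()}

for year in ["FR", "SO", "JR", "SR"]:
    print(f"{year}: {counts[year]:3d}  {percent[year]:5.1f}%")
```

Percentages are often easier to compare than raw counts, especially when the counts are large.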
 
A bar chart
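
As a rough sketch of what a bar chart encodes, the snippet below draws one in plain text from the class-year counts on the frequency-table slide. The function name and the bar width are assumptions made for illustration; any charting tool would do the same job graphically.

```python
# Counts from the frequency table slide.
counts = {"FR": 34, "SO": 25, "JR": 30, "SR": 14}

def text_bar_chart(counts, width=40):
    """Return one line per category; bar length is proportional to the count."""
    largest = max(counts.values())
    lines = []
    for label, n in counts.items():
        # Bars start at zero, so their lengths compare like the counts do.
        bar = "#" * round(width * n / largest)
        lines.append(f"{label} {bar} {n}")
    return lines

print("\n".join(text_bar_chart(counts)))
```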
 
A pie chart
And many more!
 
Just open MS Word and hit “Insert chart”
 
Why is this one bad?
 
Exploring the relationship
 
A single football player has two
categorical “properties”: say, year of
study and position.
We want to know: are they related or
“independent”? I.e., if a player is a senior,
can we confidently say that he is most
probably not a wide receiver?
 
Let’s switch to the book: Titanic survivors
 
Let’s identify the “who” and the “what”. Can we
now say that someone from the first class had
a better chance of surviving?
 
The bad thing is that we see too much. We
see that 203 first-class passengers survived
versus 178 from the third class. But then we
look down and see that the class totals are 325 vs 706,
so the raw counts are not directly comparable.
 
Instead of “Alive + Total” we now have only
one number per class to compare: the percentage that survived
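
That one number per class is the class's survivors divided by the class's total. A minimal sketch in Python, using the counts from the Titanic table in this lecture; the variable names are illustrative and not taken from the book.

```python
classes = ["First", "Second", "Third", "Crew"]
alive = [203, 118, 178, 212]   # survivors per class (Alive row)
total = [325, 285, 706, 885]   # people per class (Total row)

# Percentage of each class that survived, rounded to a whole percent.
survival_pct = {c: round(100 * a / t) for c, a, t in zip(classes, alive, total)}

# Everyone either survived or did not, so the two percentages sum to 100.
dead_pct = {c: 100 - p for c, p in survival_pct.items()}

for c in classes:
    print(f"{c:6s} alive {survival_pct[c]}%  dead {dead_pct[c]}%")
```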
 
Conditional distributions
 
We can do, for example, this: what share of the
surviving passengers were in the first class? In
the second class? And so on.
Mathematically we ask: what is the
distribution of class CONDITIONED on
the fact that the person survived?
We get the following table
First column reads:
 
203 out of 711 survivors were from the first class.
Or: 28.6% of all survivors were from the first class
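
The same reading can be reproduced for every class at once. The sketch below, with illustrative variable names, restricts attention to the Alive row of the Titanic table and divides each count by the number of survivors.

```python
# Alive row of the Titanic table from the lecture.
alive = {"First": 203, "Second": 118, "Third": 178, "Crew": 212}

survivors = sum(alive.values())  # all survivors, regardless of class

# Distribution of class conditioned on having survived, in percent.
share = {c: round(100 * n / survivors, 1) for c, n in alive.items()}

for c, p in share.items():
    print(f"{c:6s} {alive[c]:3d} of {survivors}  {p}%")
```

The four shares describe one distribution, so up to rounding they add up to 100%.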
 
Rule of thumb
 
The rule of thumb is: we have a table with
one property as the rows (alive/dead) and
another property as the columns (class). We then
restrict ourselves to one particular column
or row. Say, “how are the survivors split
among the classes?” This means that we
care only about survivors, so we
condition on the fact that one survived.
 
Bar chart again
 
We express survivor percentages depending on class
 
One more bar chart
 
And here is a side-by-side chart of survivors vs nonsurvivors
 
We (almost) see that the survival chance
DEPENDS on the class. If all conditional
distributions (conditioned on what?) were the
same, we would say that survival chances and
class are INDEPENDENT
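
One way to make this comparison concrete: if survival and class were independent, every class would survive at the overall rate. The sketch below computes the survivor counts we would then expect and sets them next to the observed counts from the Titanic table. It is an informal comparison with illustrative names, not a formal statistical test.

```python
alive = {"First": 203, "Second": 118, "Third": 178, "Crew": 212}
total = {"First": 325, "Second": 285, "Third": 706, "Crew": 885}

# Overall survival rate, ignoring class.
overall = sum(alive.values()) / sum(total.values())

# Under independence, each class would survive at the overall rate.
expected = {c: total[c] * overall for c in total}

print(f"overall survival rate: {100 * overall:.1f}%")
for c in total:
    print(f"{c:6s} observed {alive[c]:3d}  expected {expected[c]:6.1f}")
```

Large gaps between observed and expected counts suggest that the two properties are related.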
 
Homework
 
Read chapter 2. Work through the examples and carefully read the “what can go wrong” section.
Do p.33+: 1, 4, 5, 6, 17, 31, 34, 37bce, 41abd

Explore methods for displaying and describing categorical data effectively, from frequency tables to bar and pie charts. Understand the importance of visual representation in drawing insights and making comparisons. Dive into examples using football team data and Titanic survivors. Learn to identify relationships between categorical properties and interpret results.

  • Data Analysis
  • Categorical Data
  • Visualization
  • Frequency Tables
  • Charts

Uploaded on Sep 08, 2024



Presentation Transcript


  1. Lecture 2 Displaying and describing categorical data

  2. Make a picture. Large tables are inconvenient: we see many rows but cannot observe any pattern (see next slide)

  3. It has about 100 rows

  4. Make a picture. In the previous table, what if we wanted to see the proportions of freshmen, sophomores, juniors, and seniors on the Commodores football team? We would have to draw a chart. A chart should let the eye immediately capture the differences between proportions.

  5. A frequency table. We first summarize the table we have into a shorter one:

     Freshmen    34
     Sophomores  25
     Juniors     30
     Seniors     14

  6. Freshmen 34, Sophomores 25, Juniors 30, Seniors 14. This table is still a bit too hard to read. We can, of course, compare 4 numbers. But what if we had more rows? Say, ages 0–2, 2–4, 4–6, and so on. Or what if the numbers are large: compare 10123248 to 10123419.

  7. A bar chart. [Figure: bar chart of player counts for FR, SO, JR, SR; vertical axis from 0 to 40]

  8. A pie chart. [Figure: pie chart of player counts for FR, SO, JR, SR]

  9. And many more! Just open MS Word and hit “Insert chart”. Why is this one bad? [Figure: chart of player counts for FR, SO, JR, SR; vertical axis from 0 to 40]

  10. Exploring the relationship. A single football player has two categorical “properties”: say, year of study and position. We want to know: are they related or “independent”? I.e., if a player is a senior, can we confidently say that he is most probably not a wide receiver?

  11. Let’s switch to the book: Titanic survivors

             First  Second  Third  Crew  Total
      Alive    203     118    178   212    711
      Dead     122     167    528   673   1490
      Total    325     285    706   885   2201

      Let’s identify the “who” and the “what”. Can we now say that someone from the first class had a better chance of surviving?

  12. [Titanic counts table from slide 11 repeated] The bad thing is that we see too much. We see that 203 first-class passengers survived versus 178 from the third class. But then we look down and see that the class totals are 325 vs 706, so the raw counts are not directly comparable.

  13. [Titanic counts table from slide 11 repeated]

             First  Second  Third  Crew
      Alive    62%     41%    25%   24%
      Dead     38%     59%    75%   76%

      Instead of “Alive + Total” we now have only one number to compare

  14. Conditional distributions. We can do, for example, this: what share of the surviving passengers were in the first class? In the second class? And so on. Mathematically we ask: what is the distribution of class CONDITIONED on the fact that the person survived?

  15. We get the following table:

                       First  Second  Third   Crew  Total
      Alive              203     118    178    212    711
      % of survivors   28.6%   16.6%    25%  29.8%

      First column reads: 203 out of 711 survivors were from the first class. Or: 28.6% of all survivors were from the first class

  16. Rule of thumb. The rule of thumb is: we have a table with one property as the rows (alive/dead) and another property as the columns (class). We then restrict ourselves to one particular column or row. Say, “how are the survivors split among the classes?” This means that we care only about survivors, so we condition on the fact that one survived.

  17. Bar chart again. We express survivor percentages depending on class. [Figure: bar chart for First, Second, Third, Crew; vertical axis from 0 to 70]

  18. One more bar chart. And here is a side-by-side chart of survivors vs nonsurvivors. [Figure: side-by-side bars for First, Second, Third, Crew; data labels 76, 75, 62, 59, 41, 38, 25, 24]

  19. [Side-by-side bar chart from slide 18 repeated] We (almost) see that the survival chance DEPENDS on the class. If all conditional distributions (conditioned on what?) were the same, we would say that survival chances and class are INDEPENDENT

  20. Homework. Read chapter 2. Work through the examples and carefully read the “what can go wrong” section. Do p.33+: 1, 4, 5, 6, 17, 31, 34, 37bce, 41abd
