Statistical Concepts Through Visualizations

HOW TALL ARE YOU? (IN CENTIMETERS)
IF YOU ARE NOT SURE, HAVE YOUR PARTNER MEASURE YOU!
2.1A DESCRIBING LOCATION IN A DISTRIBUTION
Using percentiles
Making an Ogive
Calculating and Interpreting a z-score
What is a percentile?
Wins in Major League Baseball
The stemplot below shows the number of wins for each of the 30 Major League Baseball teams in 2009.
Key: 5|9 represents a team with 59 wins.
  5  9
  6  2455 
  7  00455589
  8  0345667778
  9  123557
10  3
Calculate and interpret the percentiles for the
Colorado Rockies (92 wins), the New York Yankees
(103 wins) and the Cleveland Indians (65 wins).
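The percentile question above can be checked with a short sketch. It assumes the common introductory definition of a percentile as the percent of observations strictly less than the given value (the slides ask for the definition without stating it), and `percentile_of` is an illustrative helper name, not something from the presentation.

```python
# Wins read off the stemplot (key: 5|9 = 59 wins); 30 teams in total.
wins = [59,
        62, 64, 65, 65,
        70, 70, 74, 75, 75, 75, 78, 79,
        80, 83, 84, 85, 86, 86, 87, 87, 87, 88,
        91, 92, 93, 95, 95, 97,
        103]

def percentile_of(value, data):
    """Percent of observations strictly less than `value` (assumed definition)."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)

rockies = percentile_of(92, wins)    # Colorado Rockies, 92 wins
yankees = percentile_of(103, wins)   # New York Yankees, 103 wins
indians = percentile_of(65, wins)    # Cleveland Indians, 65 wins

for team, p in [("Rockies", rockies), ("Yankees", yankees), ("Indians", indians)]:
    print(f"{team}: about the {round(p)}th percentile")
```

Under this definition, 24 of the 30 teams had fewer wins than the Rockies, 29 had fewer than the Yankees, and 3 had fewer than the Indians.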
Ogive?
Age of Senators in the 103rd Congress (n=100)
Age of Representatives in the 103rd Congress (n=435)
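An ogive plots cumulative relative frequency against the upper end of each class. As a rough sketch, the cumulative relative frequencies for both chambers can be computed from the frequency counts given in the presentation transcript; the function name `cumulative_relative_frequencies` and the class labels are illustrative choices.

```python
from itertools import accumulate

# Age classes and frequency counts as listed in the presentation transcript.
age_classes = ["30 to <40", "40 to <50", "50 to <60",
               "60 to <70", "70 to <80", "80 to <90"]
senators = [1, 16, 49, 22, 11, 1]             # n = 100
representatives = [47, 153, 131, 89, 12, 3]   # n = 435

def cumulative_relative_frequencies(counts):
    """Running total of the counts divided by the overall total."""
    total = sum(counts)
    return [running / total for running in accumulate(counts)]

senator_crf = cumulative_relative_frequencies(senators)
rep_crf = cumulative_relative_frequencies(representatives)

print(f"{'Age':>10}  Senators  Representatives")
for label, s, r in zip(age_classes, senator_crf, rep_crf):
    print(f"{label:>10}  {s:8.2f}  {r:15.2f}")
```

Plotting each cumulative value at the upper boundary of its class (40, 50, ..., 90) and connecting the points gives the ogive.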
 
 
State Median Household Incomes
Below is a cumulative relative frequency graph showing the distribution of median household incomes for the 50 states and the District of Columbia.
a) California, with a median household income of $57,445, is at what percentile? Interpret this value.
b) What is the 25th percentile for this distribution? What is another name for this value?
c) Where is the graph the steepest? What does this indicate about the distribution?
Bellwork: 9/11/19
1) Find the median for France
2) Find the IQR for the Philippines
z-score?
Macy, a 3-year-old female, is 100 cm tall. Brody, her 12-year-old brother, is 158 cm tall. Obviously, Brody is taller than Macy, but who is taller, relatively speaking? That is, relative to other kids of the same ages, who is taller?
(According to the CDC, the heights of three-year-old females have a mean of 94.5 cm and a standard deviation of 4 cm. The mean height for 12-year-old males is 149 cm with a standard deviation of 8 cm.)
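A standardized score (z-score) measures how many standard deviations a value lies from the mean: z = (value - mean) / standard deviation. A minimal sketch of the comparison, using the figures quoted above (`z_score` is an illustrative helper name):

```python
def z_score(value, mean, sd):
    """Number of standard deviations `value` lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Means and standard deviations are the ones quoted in the slide.
macy_z = z_score(100, mean=94.5, sd=4)    # 3-year-old females
brody_z = z_score(158, mean=149, sd=8)    # 12-year-old males

print(f"Macy:  z = {macy_z:.3f}")
print(f"Brody: z = {brody_z:.3f}")
print("Relatively taller:", "Macy" if macy_z > brody_z else "Brody")
```

Macy is 1.375 standard deviations above the mean for her group and Brody is 1.125 above the mean for his, so Macy is the taller of the two relative to peers.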
To make a fair comparison, we should see how these performances rate relative to other hitters during the same year. Calculate the standardized score for each player and compare.
In 2001, Arizona Diamondback Mark Grace’s home run total had a standardized score of z = –0.48. Interpret this value and calculate the number of home runs he hit.
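Both home run questions can be sketched with the same formula, once forward and once solved for the value: value = mean + z × SD. The home run totals, means, and standard deviations below are copied from the table in the presentation transcript; the dictionary layout and variable names are illustrative.

```python
# (home runs, season mean, season SD), as listed in the transcript's table.
seasons = {
    "Babe Ruth (1927)":    (60, 7.2, 9.7),
    "Roger Maris (1961)":  (61, 18.8, 13.4),
    "Mark McGwire (1998)": (70, 20.7, 12.7),
    "Barry Bonds (2001)":  (73, 21.4, 13.2),
}

# Standardize each record-setting total against its own season.
z_scores = {name: (hr - mean) / sd for name, (hr, mean, sd) in seasons.items()}
for name, z in z_scores.items():
    print(f"{name}: z = {z:.2f}")

# Mark Grace, 2001: invert z = (x - mean) / sd to recover x.
mean_2001, sd_2001 = 21.4, 13.2
grace_hr = mean_2001 + (-0.48) * sd_2001
print(f"Mark Grace hit about {round(grace_hr)} home runs")
```

A z-score of -0.48 means Grace's total was 0.48 standard deviations below the 2001 mean, which works out to roughly 15 home runs.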
2.1B DESCRIBING LOCATION IN A DISTRIBUTION
TRANSFORM data
DEFINE and DESCRIBE density curves
 
Here is a dotplot of Kobe Bryant’s point totals for
each of the 82 games in the 2008-2009 regular
season.  The mean of this distribution is 26.8 with
a standard deviation of 8.6 points.
In what percentage of games did he score
within one standard deviation of his mean?
Within two standard deviations?
 
Here is a dotplot of Tim Lincecum’s strikeout totals
for each of the 32 games he pitched in during the
2009 regular season.  The mean of this distribution
is 8.2 with a standard deviation of 2.8.
In what percentage of games were his strikeouts
within one standard deviation of his mean?
Within two standard deviations?
Adding the same number a (either positive, zero, or negative) to each observation:
adds a to measures of center and location (mean, median, quartiles, percentiles), but
does not change the shape of the distribution or measures of spread (range, IQR, standard deviation).
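Multiplying (or dividing) each observation by the same number b (positive or negative; multiplying by zero collapses every value to 0, and dividing by zero is undefined):
multiplies (divides) measures of center and location by b,
multiplies (divides) measures of spread by |b|, but
does not change the shape of the distribution (a negative b mirrors it, reversing the direction of any skew).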
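The two rules above can be checked numerically. This is only an illustration: the data values and the constants `a` and `b` are made up for the demonstration and do not come from the slides.

```python
import math
import statistics as st

data = [2, 4, 4, 5, 10]   # hypothetical observations
a, b = 3, -2              # hypothetical constants

shifted = [x + a for x in data]   # add a to every observation
scaled = [b * x for x in data]    # multiply every observation by b

# Adding a: center moves by a, spread is unchanged.
print(st.mean(data), st.mean(shifted))      # mean goes up by a
print(st.stdev(data), st.stdev(shifted))    # same standard deviation

# Multiplying by b: center is multiplied by b, spread by |b|.
print(st.mean(scaled))                      # b times the original mean
print(st.stdev(scaled), abs(b) * st.stdev(data))
```

Using a negative `b` makes the point that spread is multiplied by |b| rather than b, since a standard deviation can never be negative.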
In 2010, taxi cabs in New York City charged an initial fee of $2.50 plus $2 per mile.
In equation form: fare = 2.50 + 2(miles).
At the end of a month, a businessman collected all of his taxi cab receipts and analyzed the distribution of fares. The distribution was skewed to the right with a mean of $15.45 and a standard deviation of $10.20.
a) What are the mean and standard deviation of the lengths of his cab rides in miles?
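One way to approach part (a) is to solve the fare equation for miles, miles = (fare - 2.50) / 2, and apply the transformation rules: subtracting 2.50 shifts the mean but not the standard deviation, and dividing by 2 divides both. A short sketch (variable names are illustrative):

```python
fare_mean = 15.45   # dollars, from the slide
fare_sd = 10.20     # dollars, from the slide
initial_fee = 2.50  # dollars
rate = 2.0          # dollars per mile

# miles = (fare - initial_fee) / rate
miles_mean = (fare_mean - initial_fee) / rate   # subtract, then divide
miles_sd = fare_sd / rate                       # only the division affects spread

print(f"mean ride length: {miles_mean:.3f} miles")
print(f"SD of ride length: {miles_sd:.2f} miles")
```

The shape is unchanged by this transformation, so the distribution of ride lengths is still skewed to the right.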
Density Curves
In Chapter 1, we developed a kit of
graphical and numerical tools for
describing distributions. Now, we’ll add
one more step to the strategy.
1. Always plot your data: make a graph.
2. Look for the overall pattern (shape, center, and spread) and for striking departures such as outliers.
3. Calculate a numerical summary to briefly describe center and spread.
4. Sometimes the overall pattern of a large number of observations is so regular that we can describe it by a smooth curve.
Definition: A density curve is a curve that is always on or above the horizontal axis, and has area exactly 1 underneath it.
 
A density curve describes the overall pattern
of a distribution. The area under the curve
and above any interval of values on the
horizontal axis is the proportion of all
observations that fall in that interval.
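To make the definition concrete, here is a small numerical check on a made-up density curve, f(x) = x/2 on the interval from 0 to 2 (a triangle with base 2 and height 1). The curve, the function names, and the midpoint-rule integration are all assumptions chosen for illustration; nothing here comes from the slides.

```python
def density(x):
    """Hypothetical density curve: f(x) = x/2 on [0, 2], zero elsewhere."""
    return x / 2 if 0 <= x <= 2 else 0.0

def area_under(f, lo, hi, steps=10_000):
    """Approximate the area under f between lo and hi with the midpoint rule."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

total_area = area_under(density, 0, 2)        # should be 1 for a density curve
share_1_to_2 = area_under(density, 1, 2)      # proportion of observations in [1, 2]

print(f"total area: {total_area:.4f}")
print(f"proportion between 1 and 2: {share_1_to_2:.4f}")
```

The total area comes out to 1, and the area above the interval from 1 to 2 is 0.75, meaning 75% of observations described by this curve fall in that interval.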
 
In this section, we learned that…
There are two ways of describing an individual’s location within a distribution: the percentile and the z-score.
A cumulative relative frequency graph allows us to examine location within a distribution.
It is common to transform data, especially when changing units of measurement. Transforming data can affect the shape, center, and spread of a distribution.
We can sometimes describe the overall pattern of a distribution by a density curve (an idealized description of a distribution that smooths out the irregularities in the actual data).

Explore various statistical concepts such as percentiles, Ogive, z-score, and distribution using real-world examples like Major League Baseball wins, senator ages in the 103rd Congress, and state income distributions. Learn how to interpret data, make analyses, and derive insights through visual representations like stem plots, cumulative frequency graphs, and more.

  • Statistics
  • Data Analysis
  • Visualizations
  • Percentiles
  • Z-score

Uploaded on Sep 30, 2024 | 0 Views



Presentation Transcript


  1. HOW TALL ARE YOU? (IN CENTIMETERS) IF YOU ARE NOT SURE, HAVE YOUR PARTNER MEASURE YOU!

  2. 2.1A DESCRIBING LOCATION IN A DISTRIBUTION Using percentiles Making an Ogive Calculating and Interpreting a z-score

  3. What is a percentile?

  4. Wins in Major League Baseball. The stemplot below shows the number of wins for each of the 30 Major League Baseball teams in 2009. Key: 5|9 represents a team with 59 wins.

       5 | 9
       6 | 2455
       7 | 00455589
       8 | 0345667778
       9 | 123557
      10 | 3

  5. (Same stemplot and key as slide 4.) Calculate and interpret the percentiles for the Colorado Rockies (92 wins), the New York Yankees (103 wins) and the Cleveland Indians (65 wins).

  6. Ogive?

  7. Age of Senators in the 103rd Congress (n=100)

     Age (years)   Freq   Rel. Freq   Cumul. Rel. Freq
     30 to <40        1      0.01
     40 to <50       16      0.16
     50 to <60       49      0.49
     60 to <70       22      0.22
     70 to <80       11      0.11
     80 to <90        1      0.01

  8. Age of Senators in the 103rd Congress (n=100)

     Age (years)   Freq   Rel. Freq   Cumul. Rel. Freq
     30 to <40        1      0.01         0.01
     40 to <50       16      0.16         0.17
     50 to <60       49      0.49         0.66
     60 to <70       22      0.22         0.88
     70 to <80       11      0.11         0.99
     80 to <90        1      0.01         1.00

  9. Age of Representatives in the 103rd Congress (n=435)

     Age (years)   Freq   Rel. Freq   Cumul. Rel. Freq
     30 to <40       47
     40 to <50      153
     50 to <60      131
     60 to <70       89
     70 to <80       12
     80 to <90        3

  10. State Median Household Incomes Below is a cumulative relative frequency graph showing the distribution of median household incomes for the 50 states and the District of Columbia.

  11. State Median Household Incomes Below is a cumulative relative frequency graph showing the distribution of median household incomes for the 50 states and the District of Columbia.

  12. a) California, with a median household income of $57,445, is at what percentile? Interpret this value.

  13. b) What is the 25th percentile for this distribution? What is another name for this value?

  14. c) Where is the graph the steepest? What does this indicate about the distribution?

  15. Bellwork: 9/11/19 1) Find the median for France 2) Find the IQR for the Philippines

  16. z-score?

  17. Macy, a 3-year-old female, is 100 cm tall. Brody, her 12-year-old brother, is 158 cm tall. Obviously, Brody is taller than Macy, but who is taller, relatively speaking? That is, relative to other kids of the same ages, who is taller? (According to the CDC, the heights of three-year-old females have a mean of 94.5 cm and a standard deviation of 4 cm. The mean height for 12-year-old males is 149 cm with a standard deviation of 8 cm.)

  18. Year   Player         HR   Mean    SD
      1927   Babe Ruth      60    7.2    9.7
      1961   Roger Maris    61   18.8   13.4
      1998   Mark McGwire   70   20.7   12.7
      2001   Barry Bonds    73   21.4   13.2

      To make a fair comparison, we should see how these performances rate relative to other hitters during the same year. Calculate the standardized score for each player and compare.

  19. (Same table as slide 18.) In 2001, Arizona Diamondback Mark Grace's home run total had a standardized score of z = -0.48. Interpret this value and calculate the number of home runs he hit.

  20. 2.1B DESCRIBING LOCATION IN A DISTRIBUTION TRANSFORM data DEFINE and DESCRIBE density curves

  21. [Dot plot: Kobe 2009, points per game (PTS), axis from 0 to 70] Here is a dotplot of Kobe Bryant's point totals for each of the 82 games in the 2008-2009 regular season. The mean of this distribution is 26.8 with a standard deviation of 8.6 points. In what percentage of games did he score within one standard deviation of his mean? Within two standard deviations?

  22. [Dot plot: strikeouts per game (SO), axis from 0 to 16] Here is a dotplot of Tim Lincecum's strikeout totals for each of the 32 games he pitched in during the 2009 regular season. The mean of this distribution is 8.2 with a standard deviation of 2.8. In what percentage of games were his strikeouts within one standard deviation of his mean? Within two standard deviations?

  23. Effect of Adding (or Subtracting) a Constant. Adding the same number a (either positive, zero, or negative) to each observation: adds a to measures of center and location (mean, median, quartiles, percentiles), but does not change the shape of the distribution or measures of spread (range, IQR, standard deviation).

  24. Effect of Multiplying (or Dividing) by a Constant. Multiplying (or dividing) each observation by the same number b (positive or negative; multiplying by zero collapses every value to 0, and dividing by zero is undefined): multiplies (divides) measures of center and location by b, multiplies (divides) measures of spread by |b|, but does not change the shape of the distribution (a negative b mirrors it, reversing the direction of any skew).

  25. In 2010, taxi cabs in New York City charged an initial fee of $2.50 plus $2 per mile. In equation form: fare = 2.50 + 2(miles). At the end of a month, a businessman collected all of his taxi cab receipts and analyzed the distribution of fares. The distribution was skewed to the right with a mean of $15.45 and a standard deviation of $10.20.

  26. a) What are the mean and standard deviation of the lengths of his cab rides in miles?

  27. Density Curves. In Chapter 1, we developed a kit of graphical and numerical tools for describing distributions. Now, we'll add one more step to the strategy.

  28. Exploring Quantitative Data 1. Always plot your data: make a graph. 2. Look for the overall pattern (shape, center, and spread) and for striking departures such as outliers. 3. Calculate a numerical summary to briefly describe center and spread. 4. Sometimes the overall pattern of a large number of observations is so regular that we can describe it by a smooth curve.

  29. Definition: A density curve is a curve that is always on or above the horizontal axis, and has area exactly 1 underneath it. A density curve describes the overall pattern of a distribution. The area under the curve and above any interval of values on the horizontal axis is the proportion of all observations that fall in that interval.

  30. In this section, we learned that: There are two ways of describing an individual's location within a distribution: the percentile and the z-score. A cumulative relative frequency graph allows us to examine location within a distribution. It is common to transform data, especially when changing units of measurement. Transforming data can affect the shape, center, and spread of a distribution. We can sometimes describe the overall pattern of a distribution by a density curve (an idealized description of a distribution that smooths out the irregularities in the actual data).
