Understanding Chi-Square Test in Statistics

Karl Pearson introduced the Chi-Square (χ²) test, a statistical test used to determine whether your experimentally observed results are consistent with your hypothesis. The test statistic measures the agreement between actual counts and the counts expected under the null hypothesis. It is a non-parametric test. The chi-square test of independence can be used for any variable; the group (independent) and the test (dependent) variable can be nominal, dichotomous, ordinal, or grouped interval.
 
Introduction
Characteristics of the test
Chi-square distribution
Application of the Chi-square test
Calculation of the Chi-square test
Conditions for the application of the test
Example
Limitations of the test
 
Parametric test - a test in which population constants such as the mean, standard deviation, standard error, correlation coefficient, proportion, etc. are used, and the data tend to follow one assumed or established distribution such as the normal, binomial, or Poisson.
Non-parametric test - a test in which no constant of the population is used. The data do not follow any specific distribution and no assumptions are made in these tests. E.g., to classify goods as good, better, and best, we simply allocate arbitrary numbers or marks to each category.
Hypothesis - a definite statement about the population parameters.
 
H0 states that no association exists between the two cross-tabulated variables in the population, and therefore the variables are statistically independent. E.g., if we want to compare two methods, A and B, for superiority, and we assume that both methods are equally good, then this assumption is called the Null Hypothesis.
H1 proposes that the two variables are related in the population. If we assume that method A is superior to method B, then this assumption is called the Alternative Hypothesis.
 
It denotes the extent of independence (freedom) enjoyed by a given set of observed frequencies. Suppose we are given a set of observed frequencies which are subject to k independent constraints (restrictions); then
d.f. = (number of frequencies) - (number of independent constraints on them)
For an r × c contingency table, d.f. = (r-1)(c-1)
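The degrees-of-freedom rule for a contingency table can be written as a one-line helper (a minimal sketch; the function name is ours, not from the text):

```python
# Degrees of freedom for an r x c contingency table: d.f. = (r-1)(c-1).
# (Function name is ours, for illustration only.)
def degrees_of_freedom(rows: int, cols: int) -> int:
    return (rows - 1) * (cols - 1)

print(degrees_of_freedom(2, 2))  # 1 (a fourfold table)
print(degrees_of_freedom(3, 4))  # 6
```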
 
1 or more categories
Independent observations
A sample size of at least 10
Random sampling
All observations must be used
For the test to be accurate, the expected
frequency should be at least 5
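The sample-size and expected-frequency conditions above can be checked mechanically; a minimal sketch with helper names of our own choosing, using the thresholds stated in the list:

```python
# Rule-of-thumb checks from the conditions above: total sample size of at
# least 10, and every expected cell frequency at least 5. Helper names are
# ours, for illustration only.
def sample_big_enough(observed, minimum=10):
    return sum(observed) >= minimum

def expected_ok(expected, minimum=5):
    return all(e >= minimum for e in expected)

print(sample_big_enough([25, 18, 17]))  # True (N = 60)
print(expected_ok([20, 20, 20]))        # True
print(expected_ok([8, 3, 9]))           # False (one cell below 5)
```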
 
Implying cause rather than association
 
Overestimating the importance of a
finding, especially with large sample
sizes
 
Failure to recognize spurious
relationships
 
Nominal variables only (both IV and
DV)
 
A chi-square analysis is not used to prove a hypothesis; it can, however, refute one. As the chi-square value increases, the probability that the experimental outcome could occur by random chance decreases.
The results of a chi-square analysis tell you:
Whether the difference between what you observe and what you expect is due to sampling error.
The greater the deviation of what we observe from what we would expect by chance, the greater the probability that the difference is NOT due to chance.
 
Critical values for chi-square are found
on tables, sorted by degrees of freedom
and probability levels.  Be sure to use p
< 0.05.
If your calculated chi-square value is greater than the critical value, you reject the null hypothesis.
If your calculated chi-square value is less than the critical value, you fail to reject the null hypothesis.
 
To test the null hypothesis, compare the frequencies that were observed with the frequencies we would expect to observe if the null hypothesis were true.
If the differences between the observed and the expected are small, that supports the null hypothesis.
If the differences between the observed and the expected are large, we will be inclined to reject the null hypothesis.
 
The test normally requires a sufficiently large sample size: in general, N > 20.
There is no single accepted cutoff; the general rules are:
No cells with an observed frequency of 0
No cells with an expected frequency < 5
Applying chi-square to very small samples exposes the researcher to an unacceptable rate of Type II errors.
 
 
Note: chi-square must be calculated on actual count data, not on percentages; substituting percentages would have the effect of pretending the sample size is 100.
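The point above is easy to demonstrate: the statistic scales with the raw counts, so the same proportions give very different chi-square values at different sample sizes (the 60/40 split below is invented for illustration):

```python
# Chi-square grows with the raw counts, which is why percentages must not
# be substituted for counts (illustrative numbers: same 60/40 split, two
# different sample sizes).
def chi_sq(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

small = chi_sq([60, 40], [50, 50])      # N = 100: same as using percentages
large = chi_sq([600, 400], [500, 500])  # N = 1000: same proportions, 10x counts
print(small, large)  # 4.0 40.0
```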
 
Conceptually, the chi-square test of independence statistic is computed by summing, over each cell in the table, the squared difference between the expected and observed frequencies divided by the expected frequency for that cell. We identify the value and probability for this test statistic from the SPSS statistical output.
If the probability of the test statistic is less than or equal to the alpha error rate, we reject the null hypothesis and conclude that our data support the research hypothesis: there is a relationship between the variables.
If the probability of the test statistic is greater than the alpha error rate, we fail to reject the null hypothesis and conclude that there is no relationship between the variables, i.e. they are independent.
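The p-value decision rule above can be sketched in a few lines. One caveat: the closed form used here, sf(x) = exp(-x/2), holds only for 2 degrees of freedom (for other d.f. use a table or a stats library); the 1.90 statistic is taken from the brand-preference example in this text:

```python
import math

# Decision rule: reject H0 when p-value <= alpha. For 2 degrees of freedom
# the chi-square survival function has the closed form exp(-x/2); this does
# NOT generalize to other d.f.
def p_value_df2(chi_sq: float) -> float:
    return math.exp(-chi_sq / 2)

alpha = 0.05
p = p_value_df2(1.90)  # statistic from the brand-preference example
print(round(p, 3))  # 0.387
print("reject H0" if p <= alpha else "fail to reject H0")  # fail to reject H0
```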
 
This test can be used for:

1. Goodness of fit of distributions.
2. Test of independence of attributes.
3. Test of homogeneity.
 
1) Make a hypothesis based on your basic question.
2) Determine the expected frequencies.
3) Create a table with observed frequencies, expected frequencies, and chi-square values using the formula: χ² = Σ (O-E)²/E
4) Find the degrees of freedom: (c-1)(r-1)
5) Find the critical chi-square value in the Chi-Square Distribution table.
6) If the critical (table) value is greater than your calculated chi-square value, you do not reject your null hypothesis, and vice versa.
 
H0: Indian customers have no brand preference.
HA: Indian customers have distinct brand preference.

          Observed   Expected   O-E   (O-E)²/E
Brand A      25         20       5      1.25
Brand B      18         20      -2      0.20
Brand C      17         20      -3      0.45
Total        60         60       0   χ² = 1.90

χ² = Σ (O-E)²/E = 1.90
Degrees of freedom: 3 - 1 = 2

Under a critical value of your choice (e.g. α = 0.05, or 95% confidence), look up the chi-square statistic in a chi-square distribution table:
χ²(α = 0.05, d.f. = 2) = 5.991

Table value: χ² = 5.991
Our calculated value: χ² = 1.90
*If the table value is greater than your calculated value, you do not reject your null hypothesis; the observed differences are consistent with chance.
5.991 > 1.90
We do not reject our null hypothesis.
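The brand-preference calculation above can be reproduced with a short Python sketch (standard library only; the 5.991 critical value is taken from the table, as in the text):

```python
# The worked brand-preference example: observed vs. expected counts.
observed = [25, 18, 17]
expected = [20, 20, 20]  # equal preference under H0

# chi-square statistic: sum of (O - E)^2 / E over all categories
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # goodness-of-fit: k - 1 for k categories

print(round(chi_sq, 2), df)  # 1.9 2

critical = 5.991  # table value for alpha = 0.05, d.f. = 2
print(chi_sq > critical)  # False: we do not reject H0
```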
 
This test enables us to explain whether or not two attributes are associated.
E.g., if we are interested in knowing whether a new medicine is effective in controlling fever, chi-square is useful. We proceed with the H0 that the two attributes, the new medicine and control of fever, are independent, which means that the new medicine is not effective in controlling fever.
If χ² (calculated) > χ² (tabulated) at a certain level of significance for the given degrees of freedom, H0 is rejected and we can conclude that the new medicine is effective in controlling fever.
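A test-of-independence calculation of this kind can be sketched as below. The 2×2 counts are invented purely for illustration, and the sketch omits Yates' continuity correction, which is often applied to small fourfold tables:

```python
# Hypothetical 2x2 counts (invented for illustration): rows = new medicine /
# placebo, columns = fever controlled / not controlled.
table = [[60, 40],
         [40, 60]]

grand = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

chi_sq = 0.0
for i, row in enumerate(table):
    for j, obs in enumerate(row):
        # expected frequency for a cell: row total * column total / grand total
        exp = row_totals[i] * col_totals[j] / grand
        chi_sq += (obs - exp) ** 2 / exp

df = (len(table) - 1) * (len(table[0]) - 1)  # (r-1)(c-1) = 1
print(chi_sq, df)  # 8.0 1
# 8.0 > 3.841 (table value for 1 d.f.), so H0 (independence) is rejected
```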
 
This test can also be used to test whether the occurrence of events follows uniformity or not; e.g., whether the admission of students to a University is uniform across all days of the week can be tested with the help of χ².
If χ² (calculated) > χ² (tabulated), then H0 is rejected and we can conclude that the admission of students to the University is not uniform.
Example (a coin tossed 50 times):

            Head   Tail
Expected     25     25
Observed     28     22
(O-E)²/E    0.36   0.36

χ² = 0.72 < 3.841 (table value for 1 d.f.), so H0 (the coin is unbiased) is not rejected.
 
The Mantel-Haenszel chi-square statistic: Q_MH = (n-1)r²
where r is the Pearson correlation coefficient (which also measures the linear association between row and column):
http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_a0000000659.htm
It tests the alternative hypothesis that there is a linear association between the row and column variables, and follows a chi-square distribution with 1 degree of freedom.
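The Q_MH = (n-1)r² formula can be sketched directly from its definition; the row and column scores below are invented example data, and r is computed from the usual Pearson formula:

```python
import math

# Mantel-Haenszel statistic Q_MH = (n - 1) * r^2, where r is the Pearson
# correlation between row and column scores (example scores are made up).
x = [1, 1, 2, 2, 3, 3]  # row scores
y = [1, 2, 1, 3, 2, 3]  # column scores

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)
r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient

q_mh = (n - 1) * r ** 2  # compared against a chi-square with 1 d.f.
print(r, q_mh)  # 0.5 1.25
```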
 
The data must be from a random sample.
When the test is applied to a fourfold (2×2) table, it will not give a reliable result with one degree of freedom if the expected value in any cell is less than 5.
In contingency tables larger than 2×2, Yates' correction cannot be applied.
Only the absolute values (counts) of the original data should be used for the test.
Presence or absence of association does not measure the strength of the association.
The test does not indicate cause and effect.
Uploaded on Jul 26, 2024