Applied statistics

Covers the fundamental principles of applied statistics: expected value and variance of random variables, the Pearson correlation coefficient and the interpretation of correlation values, statistical inference, Bayes rule and Bayesian decision theory, and Gaussian distributions. Introduces p-values and their role in hypothesis testing, with examples from Gaussian distributions.

yoon701

Uploaded on Mar 08, 2025

Presentation Transcript

Applied statistics Usman Roshan

A few basic stats

Expected value of a random variable, with Bernoulli and Binomial examples. Variance of a random variable, with Bernoulli and Binomial examples. Correlation coefficient (the same as the Pearson correlation coefficient).

Formulas, where $\mu_X$ is the mean and $\sigma_X$ the standard deviation of X:

$$\mathrm{Covariance}(X,Y) = E\big((X-\mu_X)(Y-\mu_Y)\big)$$

$$\mathrm{Correlation}(X,Y) = \frac{\mathrm{Covariance}(X,Y)}{\sigma_X \sigma_Y}$$
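As an illustration of the expected value and variance definitions, the following Python sketch computes both directly from a probability mass function and compares them with the standard closed forms for the Bernoulli (mean p, variance p(1-p)) and Binomial (mean np, variance np(1-p)) distributions. The helper names and the values p=0.3 and n=10 are illustrative assumptions, not taken from the slides.

```python
from math import comb

def expected_value(pmf):
    """E[X] = sum over x of x * P(X = x); pmf maps value -> probability."""
    return sum(x * prob for x, prob in pmf.items())

def variance(pmf):
    """Var[X] = E[(X - E[X])^2], computed from the same pmf."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * prob for x, prob in pmf.items())

def bernoulli_pmf(p):
    """One trial: 1 with probability p, 0 otherwise."""
    return {0: 1 - p, 1: p}

def binomial_pmf(n, p):
    """Number of successes in n independent Bernoulli(p) trials."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

p, n = 0.3, 10  # illustrative values only

bern = bernoulli_pmf(p)
binom = binomial_pmf(n, p)

# Each pair should agree up to floating-point rounding.
print("Bernoulli mean:", expected_value(bern), "closed form:", p)
print("Bernoulli var: ", variance(bern), "closed form:", p * (1 - p))
print("Binomial mean: ", expected_value(binom), "closed form:", n * p)
print("Binomial var:  ", variance(binom), "closed form:", n * p * (1 - p))
```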

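Correlation between variables

The correlation coefficient r measures the strength of the linear relationship between two variables and lies between -1 and 1. A value of 1 means perfect positive correlation and -1 means perfect negative correlation. Under the null hypothesis of no correlation, the statistic f(r) follows a t-distribution with n-2 degrees of freedom, where n is the number of paired observations, and can be used to obtain a p-value:

$$f(r) = r\sqrt{\frac{n-2}{1-r^2}}$$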
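A minimal sketch of the test statistic for a correlation, f(r) = r * sqrt((n-2)/(1-r^2)), using only the Python standard library. The function name and the example values r=0.6 and n=27 are hypothetical. Turning the statistic into a p-value requires the t-distribution with n-2 degrees of freedom; one option, not used here so that the sketch stays dependency-free, is SciPy's `scipy.stats.t.sf`.

```python
from math import sqrt

def correlation_t_statistic(r, n):
    """Return f(r) = r * sqrt((n - 2) / (1 - r^2)) for a sample correlation r
    computed from n paired observations."""
    if n <= 2:
        raise ValueError("need more than 2 observations")
    if not -1 < r < 1:
        raise ValueError("statistic is undefined for |r| >= 1")
    return r * sqrt((n - 2) / (1 - r * r))

r, n = 0.6, 27  # hypothetical values for illustration
t_stat = correlation_t_statistic(r, n)
degrees_of_freedom = n - 2
print("t statistic:", t_stat, "df:", degrees_of_freedom)
```

A larger |r| or a larger n both push the statistic further from zero, which corresponds to a smaller p-value.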

Pearson correlation coefficient (figure from Wikipedia)

Basic stats in Python

Mean and variance calculation: define a list and compute its mean and variance. Correlations: define two lists and compute their correlation.
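The slide gives no code, so the following is only one possible sketch of these two exercises using the standard library. The lists `x` and `y` are made-up data, and the choice of the population variance (dividing by n rather than n-1) is an assumption.

```python
from math import sqrt
from statistics import mean, pvariance

# Made-up example data.
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 5.0]

def covariance(a, b):
    """Population covariance: the mean of (a_i - mean(a)) * (b_i - mean(b))."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)

def pearson(a, b):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    return covariance(a, b) / sqrt(pvariance(a) * pvariance(b))

print("mean of x:", mean(x))
print("variance of x:", pvariance(x))
print("correlation of x and y:", pearson(x, y))
```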

Statistical inference

Topics: p-values; Bayes rule; posterior probability and likelihood; Bayesian decision theory; Bayesian inference under a Gaussian distribution; chi-square test, Pearson correlation coefficient, and t-test.

P-values

What is a p-value? It is the probability, assuming the data come from some null distribution, of obtaining an estimate at least as extreme as the one observed. For example, if your estimate of the mean is 1 and the null distribution is normal with true mean 0, what is the p-value of your estimate? It is the area under the curve of the normal distribution over all values of the mean that are at least your estimate. A small p-value means that an estimate this extreme would be unlikely if the data came from the null distribution, and thus the null distribution could be rejected. A large p-value is consistent with the null distribution but may also be consistent with other distributions.
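The example of an estimate of 1 under a normal null with mean 0 can be sketched with Python's `statistics.NormalDist`. The slide does not state the spread of the null distribution, so a standard deviation of 1 is assumed here purely for illustration; the function name is also hypothetical.

```python
from statistics import NormalDist

def upper_tail_p_value(estimate, null_mean=0.0, null_sd=1.0):
    """Area under the null normal curve for values at least as large as the estimate.
    null_sd=1.0 is an assumption; the slide does not specify it."""
    null = NormalDist(mu=null_mean, sigma=null_sd)
    return 1.0 - null.cdf(estimate)

p_value = upper_tail_p_value(1.0)
print("one-sided p-value:", round(p_value, 4))  # about 0.1587 under the assumed sd of 1
```

With a smaller assumed standard deviation the same estimate would sit further out in the tail and the p-value would shrink.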

P-values from Gaussian distributions

$$P(x \mid C_1) = \frac{1}{\sqrt{2\pi}\,\sigma_1}\, e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}}$$

Figure courtesy of Wikipedia.

P-values from chi-square distributions (figure courtesy of Wikipedia)

Type 1 and type 2 errors (figure courtesy of Wikipedia)

Bayes rule

Fundamental to statistical inference. Based on conditional probability. Posterior = (Likelihood * Prior) / Normalization:

$$P(M \mid x) = \frac{P(x \mid M)\,P(M)}{P(x)} = \frac{P(x \mid M)\,P(M)}{\sum_{M'} P(x \mid M')\,P(M')}$$
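A small sketch of Bayes rule for a finite set of models: each posterior is the likelihood times the prior, divided by the sum of those products over all models. The model names, priors, and likelihood values below are invented for illustration.

```python
def posterior(priors, likelihoods):
    """Return P(M | x) for every model M.
    priors maps model -> P(M); likelihoods maps model -> P(x | M)."""
    joint = {m: likelihoods[m] * priors[m] for m in priors}
    normalization = sum(joint.values())  # P(x), summed over all models
    return {m: value / normalization for m, value in joint.items()}

# Hypothetical numbers: two equally likely models, x is three times
# more probable under M2 than under M1.
priors = {"M1": 0.5, "M2": 0.5}
likelihoods = {"M1": 0.2, "M2": 0.6}

post = posterior(priors, likelihoods)
print(post)
```

Because the priors are equal in this example, the posterior ranking is decided entirely by the likelihoods.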

Hypothesis testing

We can use Bayes rule to help make decisions. An outcome or action is described by a model. Given two models, we pick the one with the higher probability. Coin toss example: use the likelihood to determine which coin generated the tosses.

Likelihood example

Consider a set of coin tosses produced by a coin with P(H)=p (P(T)=1-p). We are given some tosses (training data): HTHHHTHHHTHTH.

Was the above sequence produced by a fair coin? What is the probability that a fair coin produced the above sequence of tosses? What is the p-value of your sequence of tosses assuming the coin is fair? This is the same as asking for the probability that a fair coin generates 9 or more heads out of 13 tosses. Let's start with exactly 9 heads. Solve it with R.

Was the above sequence more likely to be produced by biased coin 1 (p=0.85) or biased coin 2 (p=0.75)? Solution: calculate the likelihood (probability) of the data with each coin. Alternatively, we can ask which coin maximizes the likelihood.
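The slide asks for a solution in R; the sketch below works through the same questions in Python instead, using only the standard library. The function names are illustrative. It computes the probability of the exact sequence under a fair coin, the probability of exactly 9 heads, the one-sided p-value (9 or more heads out of 13), and the likelihood of the sequence under each biased coin.

```python
from math import comb

tosses = "HTHHHTHHHTHTH"
n = len(tosses)
heads = tosses.count("H")

def sequence_likelihood(seq, p):
    """Probability of this exact sequence when P(H) = p; non-H characters count as tails."""
    h = seq.count("H")
    t = len(seq) - h
    return p**h * (1 - p) ** t

def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n tosses."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def upper_tail(k, n, p):
    """Probability of k or more heads in n tosses."""
    return sum(binomial_pmf(j, n, p) for j in range(k, n + 1))

fair_sequence = sequence_likelihood(tosses, 0.5)  # exact sequence, fair coin
exactly_k = binomial_pmf(heads, n, 0.5)           # exactly 9 heads, fair coin
p_value = upper_tail(heads, n, 0.5)               # 9 or more heads, fair coin

lik_coin1 = sequence_likelihood(tosses, 0.85)
lik_coin2 = sequence_likelihood(tosses, 0.75)

print("heads:", heads, "of", n)
print("P(sequence | fair):", fair_sequence)
print("P(exactly 9 heads | fair):", exactly_k)
print("p-value:", p_value)
print("more likely coin:", "coin 1 (p=0.85)" if lik_coin1 > lik_coin2 else "coin 2 (p=0.75)")
```

The likelihood over all possible p is maximized at the observed fraction of heads, 9/13, which is closer to 0.75 than to 0.85.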

Chi-square test

Contingency table. We have two random variables: Label (L), which is 0 or 1, and Feature (F), which is categorical. Null hypothesis: the two variables are independent of each other (unrelated).

            Feature=A                    Feature=B
Label=0     Observed=c1, Expected=X1     Observed=c2, Expected=X2
Label=1     Observed=c3, Expected=X3     Observed=c4, Expected=X4

Under independence, P(L,F) = P(L)P(F), with P(L=0) = (c1+c2)/n and P(F=A) = (c1+c3)/n, where n = c1+c2+c3+c4. Expected values: E(X1) = P(L=0)P(F=A)n.

We can calculate the chi-square statistic for a given feature and use its p-value to judge how plausible it is that the feature is independent of the label:

$$\chi^2 = \sum_i \frac{(e_i - c_i)^2}{e_i}$$

Here c_i are the observed counts and e_i the expected counts. We look up the chi-square value in the chi-square distribution with degrees of freedom = (cols-1)*(rows-1) to get the p-value. Features with very small p-values deviate significantly from the independence assumption and are therefore considered important.
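A minimal sketch of this test for a 2x2 table, assuming all row and column totals are nonzero. Expected counts come from the independence assumption P(L,F) = P(L)P(F). With (cols-1)*(rows-1) = 1 degree of freedom, the p-value has the closed form erfc(sqrt(chi2/2)), which avoids a table lookup; that shortcut applies only to the one-degree-of-freedom case. The function name and the counts are made up for illustration.

```python
from math import erfc, sqrt

def chi_square_2x2(c1, c2, c3, c4):
    """Chi-square test of independence for the table
           F=A  F=B
    L=0    c1   c2
    L=1    c3   c4
    Returns (statistic, p_value, expected_counts)."""
    n = c1 + c2 + c3 + c4
    row0, row1 = c1 + c2, c3 + c4
    col_a, col_b = c1 + c3, c2 + c4
    observed = [c1, c2, c3, c4]
    # Expected count = P(L) * P(F) * n = row_total * col_total / n
    expected = [row0 * col_a / n, row0 * col_b / n,
                row1 * col_a / n, row1 * col_b / n]
    statistic = sum((e - c) ** 2 / e for e, c in zip(expected, observed))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value, expected

# Hypothetical counts in which the feature is strongly associated with the label.
stat, p, exp_counts = chi_square_2x2(30, 10, 10, 30)
print("expected counts:", exp_counts)
print("chi-square statistic:", stat)
print("p-value:", p)
```

A table whose rows are proportional to each other gives a statistic of 0 and a p-value of 1, since the observed counts then match the expected counts exactly.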
