Foundations of Parameter Estimation and Decision Theory in Machine Learning
Explore the foundations of parameter estimation and decision theory in machine learning through topics such as frequentist estimation, properties of estimators, Bayesian parameter estimation, and maximum likelihood estimation. Understand concepts like consistency, the bias-variance trade-off, and the Bayesian approach to parameter estimation.
Presentation Transcript
Parameter Estimation and Decision Theory Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya
Example
Observe whether the sky is cloudy or not cloudy on each of n successive days; predict whether the sky will be cloudy on the (n+1)th day.
Step 1: Parameter estimation. Model the observation as a random variable with a known distribution but an unknown parameter, and estimate the unknown parameter from the data.
Step 2: Decision making. Use the estimated parameter to compute the probability of the event of interest, and decide based on that probability.
Frequentist Estimation Problem
Problem: find the true value of a parameter based on a data sample.
Estimator: a function from the sample space to the parameter space.
Estimate: a specific point in the parameter space, obtained by applying the estimator to an observed sample.
Loss: a measure of the error of the estimate with respect to the true value of the parameter.
Properties of Estimators
Consistency: whether the true value is recovered as the sample size grows to infinity.
Bias: expected deviation of the estimate from the true value.
Variance: spread of the estimate around its expected value.
Mean squared error: decomposes as squared bias plus variance, giving the bias-variance trade-off.
Properties of the maximum likelihood estimator: asymptotically unbiased; consistent; asymptotically achieves the smallest variance among unbiased estimators.
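The bias of an estimator can be checked empirically. A minimal Monte Carlo sketch, using a hypothetical setup: the maximum likelihood estimator of a Gaussian's variance divides the sum of squared deviations by n and is biased, while Bessel's correction (dividing by n - 1) removes the bias.

```python
import random

# Hypothetical experiment: repeatedly draw samples of size n from a standard
# Gaussian (true variance = 1) and average two variance estimators.
random.seed(0)
n, trials = 10, 20000
biased_sum, unbiased_sum = 0.0, 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    m = sum(xs) / n
    ss = sum((x - m) ** 2 for x in xs)
    biased_sum += ss / n          # MLE: E[.] = (n - 1)/n * sigma^2 = 0.9
    unbiased_sum += ss / (n - 1)  # Bessel-corrected: E[.] = sigma^2 = 1.0
print(biased_sum / trials, unbiased_sum / trials)
```

The biased average settles near 0.9 = (n - 1)/n and the corrected one near 1.0, illustrating that bias is a property of the estimator, not of any single estimate.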
Bayesian Parameter Estimation
Model the parameter θ as a random variable with a prior distribution p(θ).
Parameter estimation problem: find the posterior distribution p(θ|D) of θ given the observed data D.
p(θ|D) = p(D|θ) p(θ) / p(D)
Likelihood: L(θ) = p(D|θ)
Bayesian Parameter Estimation: Point Estimation
Maximum likelihood estimator: θ_MLE = argmax_θ L(θ) = argmax_θ p(D|θ)
Maximum a posteriori estimator: θ_MAP = argmax_θ p(θ|D) = argmax_θ p(D|θ) p(θ)
Bayesian estimator: θ_Bayes = E[θ|D] = ∫ θ p(θ|D) dθ
Maximum Likelihood Estimator: Illustration
Given a sequence of coin tosses, guess the probability p of heads.
Model: x_i ~ Ber(p), i.i.d.
Likelihood: L(p) = p(x_1, x_2, ..., x_N; p)
Log-likelihood: ℓ(p) = log L(p) = N_H log p + N_T log(1 − p), where N_H is the number of heads and N_T the number of tails in N tosses.
Maximum likelihood estimate: p_MLE = argmax_p ℓ(p) = N_H / N
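The closed-form maximizer N_H / N can be verified against a brute-force search over the log-likelihood. A small sketch with a hypothetical toss sequence:

```python
import math

tosses = "HHTHTTHHHT"            # hypothetical sequence of N = 10 tosses
n_h = tosses.count("H")          # N_H = 6
n_t = len(tosses) - n_h          # N_T = 4

def log_lik(p):
    # l(p) = N_H log p + N_T log(1 - p)
    return n_h * math.log(p) + n_t * math.log(1 - p)

p_mle = n_h / len(tosses)        # closed form: 0.6
p_grid = max((i / 1000 for i in range(1, 1000)), key=log_lik)
print(p_mle, p_grid)
```

Both the closed form and the grid search land on 0.6, the empirical fraction of heads.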
MAP Estimator: Illustration
Model p as a random variable with a prior distribution p ~ Beta(a, b): prior(p; a, b) ∝ p^(a−1) (1 − p)^(b−1)
Formulate the posterior distribution: posterior(p) ∝ p^(N_H + a − 1) (1 − p)^(N_T + b − 1)
Maximum a posteriori estimate: p_MAP = argmax_p posterior(p) = (N_H + a − 1) / (N + a + b − 2)
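The MAP closed form can be checked the same way as the MLE, by maximizing the unnormalized log posterior over a grid. A sketch with hypothetical counts and hyperparameters:

```python
import math

a, b = 2.0, 2.0                  # hypothetical Beta(2, 2) prior
n_h, n_t = 7, 3                  # hypothetical counts, N = 10

def log_post(p):
    # log of p^(N_H + a - 1) * (1 - p)^(N_T + b - 1), up to a constant
    return (n_h + a - 1) * math.log(p) + (n_t + b - 1) * math.log(1 - p)

p_map = (n_h + a - 1) / (n_h + n_t + a + b - 2)   # closed form: 8/12
p_grid = max((i / 1000 for i in range(1, 1000)), key=log_post)
print(p_map, p_grid)
```

Note how the prior pulls the estimate toward 1/2: the MAP here is 8/12 ≈ 0.667 rather than the MLE 0.7.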
Bayes Estimator: Illustration
Model p as a random variable with a prior distribution p ~ Beta(a, b): prior(p; a, b) ∝ p^(a−1) (1 − p)^(b−1)
Formulate the posterior distribution: posterior(p) ∝ p^(N_H + a − 1) (1 − p)^(N_T + b − 1) = Beta(a + N_H, b + N_T)
Bayes estimate: p_B = E[p | x_1, ..., x_N] = (N_H + a) / (N + a + b)
Bayes Estimator: Analysis
p_B = E[p | x_1, ..., x_N] = (N_H + a) / (N + a + b)
    = [N / (N + a + b)] · (N_H / N) + [(a + b) / (N + a + b)] · (a / (a + b))
A weighted average of the prior mean a/(a+b) and the MLE N_H/N.
The weight of the MLE is proportional to the number of observations N.
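The weighted-average decomposition above is an exact algebraic identity, which a few lines of arithmetic can confirm (hypothetical counts and hyperparameters):

```python
a, b = 2.0, 2.0                  # hypothetical Beta(2, 2) prior
n_h, n_t = 7, 3                  # hypothetical counts
n = n_h + n_t

posterior_mean = (n_h + a) / (n + a + b)   # Bayes estimate: 9/14

prior_mean = a / (a + b)
mle = n_h / n
w = n / (n + a + b)                        # weight on the MLE grows with N
weighted = w * mle + (1 - w) * prior_mean
print(posterior_mean, weighted)
```

As N grows, w approaches 1 and the Bayes estimate converges to the MLE; with little data it stays close to the prior mean.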
Role of priors
Uniform prior vs. Beta prior.
With a uniform prior (Beta(1, 1)), prior(p) ∝ 1, so the posterior is proportional to the likelihood:
posterior(p) ∝ p^(N_H) (1 − p)^(N_T)
The MAP estimate then coincides with the MLE, N_H / N, while the posterior mean (Bayes estimate) is (N_H + 1) / (N + 2).
Decision Theory
Choose a specific point estimate under uncertainty.
Loss functions measure the extent of error.
The choice of estimate depends on the loss function.
Loss functions
0–1 loss: L(θ, θ̂) = 0 if θ̂ = θ, 1 if θ̂ ≠ θ. Minimized by the MAP estimate (posterior mode).
Squared (ℓ2) loss: L(θ, θ̂) = (θ − θ̂)². Expected loss E[(θ − θ̂)² | D] (minimum mean squared error). Minimized by the Bayes estimate (posterior mean).
ℓ1 loss: L(θ, θ̂) = |θ − θ̂|. Minimized by the posterior median.
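These correspondences can be seen empirically by minimizing sample-based expected losses over a grid of candidate estimates. A Monte Carlo sketch using a hypothetical skewed posterior, Beta(2, 6):

```python
import random, statistics

random.seed(1)
samples = [random.betavariate(2, 6) for _ in range(20000)]  # posterior draws

def exp_sq_loss(t):   # E[(theta - t)^2 | D], approximated from samples
    return sum((s - t) ** 2 for s in samples) / len(samples)

def exp_abs_loss(t):  # E[|theta - t| | D]
    return sum(abs(s - t) for s in samples) / len(samples)

grid = [i / 200 for i in range(1, 200)]
best_sq = min(grid, key=exp_sq_loss)    # minimizer of squared loss
best_abs = min(grid, key=exp_abs_loss)  # minimizer of absolute loss
print(best_sq, statistics.mean(samples), best_abs, statistics.median(samples))
```

For a skewed posterior the mean and median differ, so the two loss functions genuinely pick different point estimates: the squared-loss minimizer tracks the posterior mean and the absolute-loss minimizer tracks the posterior median.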
Predictive distribution
Find the probability of the outcome of the (n+1)th experiment given the outcomes of the previous n experiments: p(x_{n+1} | x_1, ..., x_n)
Frequentist: construct a point estimate θ̂ of the parameter from the n outcomes and plug it in: p(x_{n+1} | x_1, ..., x_n) ≈ p(x_{n+1}; θ̂)
Bayesian: consider the entire posterior distribution of θ: p(x_{n+1} | x_1, ..., x_n) = ∫ p(x_{n+1} | θ) p(θ | x_1, ..., x_n) dθ
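For Bernoulli trials with a Beta prior, the Bayesian predictive integral has a closed form: integrating the likelihood over the posterior Beta(a + N_H, b + N_T) gives its mean. A sketch contrasting it with the plug-in prediction, using hypothetical counts and a uniform prior:

```python
a, b = 1.0, 1.0                       # hypothetical uniform Beta(1, 1) prior
n_h, n_t = 3, 1                       # e.g. observed H, H, H, T
n = n_h + n_t

p_mle = n_h / n                       # frequentist point estimate
plug_in = p_mle                       # P(x_{n+1} = H ; p_hat) = p_hat = 0.75

# Bayesian predictive of heads = posterior mean of p
predictive = (a + n_h) / (a + b + n)  # (1 + 3) / (2 + 4) = 2/3
print(plug_in, predictive)
```

The Bayesian prediction (2/3) is more conservative than the plug-in (0.75) because it averages over the remaining uncertainty in p rather than committing to a single estimate.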
Summary
Parameter estimation problem: frequentist vs. Bayesian.
MLE, MAP, and Bayes estimators for Bernoulli trials.
Optimal estimators for different loss functions.
Prediction using estimated parameters.