
Environmental Data Analysis with MATLAB or Python 3rd Edition Lecture 5
Explore the concepts of linear models, measurement error, Fourier series, hypothesis testing, and more in this comprehensive lecture series on environmental data analysis using MATLAB or Python.
SYLLABUS
Lecture 01: Intro; Using MATLAB or Python
Lecture 02: Looking At Data
Lecture 03: Probability and Measurement Error
Lecture 04: Multivariate Distributions
Lecture 05: Linear Models
Lecture 06: The Principle of Least Squares
Lecture 07: Prior Information
Lecture 08: Solving Generalized Least Squares Problems
Lecture 09: Fourier Series
Lecture 10: Complex Fourier Series
Lecture 11: Lessons Learned from the Fourier Transform
Lecture 12: Power Spectra
Lecture 13: Filter Theory
Lecture 14: Applications of Filters
Lecture 15: Factor Analysis and Cluster Analysis
Lecture 16: Empirical Orthogonal Functions and Clusters
Lecture 17: Covariance and Autocorrelation
Lecture 18: Cross-correlation
Lecture 19: Smoothing, Correlation and Spectra
Lecture 20: Coherence; Tapering and Spectral Analysis
Lecture 21: Interpolation and Gaussian Process Regression
Lecture 22: Linear Approximations and Non Linear Least Squares
Lecture 23: Adaptable Approximations with Neural Networks
Lecture 24: Hypothesis Testing
Lecture 25: Hypothesis Testing continued; F-Tests
Lecture 26: Confidence Limits of Spectra, Bootstraps
Goals of the lecture: develop and apply the concept of a linear model
data, d: what we measure
quantitative model: links model parameters to data
model parameters, m: what we want to know
data, d: carats, color, clarity
quantitative model: economic model for diamonds
model parameters, m: dollar value, celebrity value
(Photo credit: Wikipedia Commons)
N = number of observations, d
M = number of model parameters, m
usually (but not always) N > M: many data, a few model parameters
The matrix G is called the data kernel. It embodies the quantitative model: the relationship between the data and the model parameters.
because of observational noise, no m can exactly satisfy this equation; it can only be satisfied approximately: $\mathbf{d} \approx \mathbf{G}\mathbf{m}$
data, $\mathbf{d}^{\mathrm{pre}}$: prediction of data
quantitative model: evaluate equation
model parameters, $\mathbf{m}^{\mathrm{est}}$: estimate of model parameters
data, $\mathbf{d}^{\mathrm{obs}}$: observation of data
quantitative model: solve equation
model parameters, $\mathbf{m}^{\mathrm{est}}$: estimate of model parameters
because of observational noise, $\mathbf{m}^{\mathrm{est}} \neq \mathbf{m}^{\mathrm{true}}$: the estimated model parameters differ from the true model parameters
and $\mathbf{d}^{\mathrm{pre}} \neq \mathbf{d}^{\mathrm{obs}}$: the predicted data differ from the observed data
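The difference between true, observed, and predicted quantities can be sketched with synthetic data. This is only an illustration under stated assumptions: the straight-line model, the values in `mtrue`, the noise level `sigma`, and the trial estimate `mest` are all hypothetical choices made for the example, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed so the run is repeatable

N = 11                                   # number of observations
x = np.linspace(-5.0, 5.0, N)            # auxiliary variable, assumed exactly known
G = np.column_stack((np.ones(N), x))     # data kernel for a straight line, N by 2

mtrue = np.array([1.0, 2.0])             # hypothetical true intercept and slope
sigma = 0.5                              # hypothetical noise standard deviation
dobs = G @ mtrue + sigma * rng.standard_normal(N)   # noisy observations

mest = np.array([1.1, 1.9])              # hypothetical estimate, slightly off
dpre = G @ mest                          # evaluate the equation: predicted data

print("max |dpre - dobs| =", np.max(np.abs(dpre - dobs)))
```

Because `dobs` contains noise, no choice of `mest` would make `dpre` equal `dobs` exactly when N > M.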
interpretation of $x_i$: the model is only linear when the $x_i$'s are neither data nor model parameters
we will call them auxiliary variables
they are assumed to be exactly known
they specify the geometry of the experiment
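MATLAB script for G in straight line case
    M=2;
    G=zeros(N,M);
    G(:,1)=1;     % first column: ones (intercept)
    G(:,2)=x;     % second column: auxiliary variable (slope)
Python script for G in straight line case
    import numpy as np
    M=2;
    G = np.zeros((N,M));
    G[0:N,0:1] = np.ones((N,1));   # first column: ones (intercept)
    G[0:N,1:2] = x;                # second column: x, which must be an N-by-1 column array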
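MATLAB script for G in quadratic case
    M=3;
    G=zeros(N,M);
    G(:,1)=1;       % constant term
    G(:,2)=x;       % linear term
    G(:,3)=x.^2;    % quadratic term
Python script for G in quadratic case
    import numpy as np
    M=3;
    G = np.zeros((N,M));
    G[0:N,0:1] = np.ones((N,1));    # constant term
    G[0:N,1:2] = x;                 # linear term; x must be an N-by-1 column array
    G[0:N,2:3] = np.power(x,2);     # quadratic term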
fitting a sum of cosines and sines (Fourier series)
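A data kernel for a sum of cosines and sines might be built as in the sketch below. The column ordering (a constant column followed by cosine and sine pairs), the number of frequencies `K`, and the period `T` are assumptions made for illustration; the lecture's own layout may differ.

```python
import numpy as np

N = 32                                   # number of observations
K = 3                                    # number of frequencies (assumed)
T = 1.0                                  # period of the fundamental (assumed)
t = np.arange(N) * T / N                 # sample times, the auxiliary variable

M = 2 * K + 1                            # constant term plus K cosine/sine pairs
G = np.zeros((N, M))
G[:, 0] = 1.0                            # constant (zero-frequency) column
for k in range(1, K + 1):
    G[:, 2 * k - 1] = np.cos(2 * np.pi * k * t / T)   # cosine column for frequency k
    G[:, 2 * k] = np.sin(2 * np.pi * k * t / T)       # sine column for frequency k

m = np.zeros(M)
m[1] = 2.0                               # only the first cosine is switched on
dpre = G @ m                             # predicted data: 2 cos(2 pi t / T)
```

The model is still linear: the sines and cosines depend only on the auxiliary variable `t`, while the unknown coefficients `m` enter through a matrix product.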
[Figure: grey-scale images of data kernels. A) Polynomial. B) Fourier series. Axes: row index i (1 to N), column index j (1 to M).]
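any data kernel can be thought of as a concatenation of its columns: $\mathbf{G} = [\,\mathbf{c}^{(1)}\ \mathbf{c}^{(2)}\ \mathbf{c}^{(3)}\ \mathbf{c}^{(4)}\ \cdots\ \mathbf{c}^{(M)}\,]$, where each column $\mathbf{c}^{(j)}$ has N elements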
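thought of this way, the equation $\mathbf{d} = \mathbf{G}\mathbf{m}$ means $\mathbf{d} = \sum_{j=1}^{M} m_j \mathbf{c}^{(j)}$: the data are a mixture of the columns of G, with the model parameters as the mixing coefficients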
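The column view can be checked numerically. The sketch below uses a small quadratic data kernel and made-up model parameters (both hypothetical) and confirms that the matrix product equals the sum of the columns of G, each scaled by its model parameter.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])           # auxiliary variable
G = np.column_stack((np.ones(4), x, x**2))   # quadratic data kernel, N=4, M=3
m = np.array([1.0, -2.0, 0.5])               # hypothetical model parameters

d_matrix = G @ m                             # usual matrix-vector product

d_mix = np.zeros(4)
for j in range(G.shape[1]):
    d_mix += m[j] * G[:, j]                  # add m_j times column j

print(np.allclose(d_matrix, d_mix))          # True
```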
sometimes, models do represent literal mixing but more often the mixing is more abstract
any data kernel also can be thought of as a concatenation of its rows
thought of this way, the equation $\mathbf{d} = \mathbf{G}\mathbf{m}$ means that each datum is a weighted average of the model parameters: $d_i = \sum_{j=1}^{M} G_{ij} m_j$, with the elements of row i of G acting as the weights
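The row view can be illustrated with a small three-point running average. This is a sketch under assumptions: the weights 1/4, 1/2, 1/4, the size N = 5, and the choice to simply drop weights that fall off the ends are all choices made for the example.

```python
import numpy as np

N = 5
w = np.array([0.25, 0.5, 0.25])          # normalized three-point weights
G = np.zeros((N, N))
for i in range(N):
    for k, wk in zip((-1, 0, 1), w):     # offsets: previous, central, next
        j = i + k
        if 0 <= j < N:                   # weights falling off the ends are dropped
            G[i, j] = wk

m = np.array([0.0, 0.0, 4.0, 0.0, 0.0])  # model parameters: a single spike
d = G @ m                                # every datum at once

i = 1
di = np.dot(G[i, :], m)                  # one datum: row i of G dotted with m
```

Each element of `d` is the dot product of one row of `G` with `m`, so the spike is spread over its neighbors.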
sometimes the model represents literal averaging, but more often the averaging is more abstract
[Figure: data kernels for running averages. A) three points. B) five points. C) seven points. Axes: row index i (1 to N), column index j (1 to M).]
MATLAB script: data kernel for a running average
    w = [1, 2, 1]';        % three point weighted average
    La = length(w);        % length of w
    Lw = floor(La/2)+1;    % position of middle weight
    n = sum(w);            % sum of weights
    w = w/n;               % normalized weights
    wf = flipud(w);        % weights, flipped upside-down
    c = zeros(N,1);        % initialize left column
    r = zeros(M,1);        % initialize top row
    c(1:Lw)=wf(Lw:La);     % copy weights to left column
    r(1:Lw)=w(Lw:La);      % copy weights to top row
    G = toeplitz(c,r);     % create data kernel
Python script: data kernel for a running average
    import numpy as np
    import scipy.linalg as la
    from math import floor
    # eda_cvec is a helper from the course's Python library (not defined here)
    w = eda_cvec( 1, 2, 1 );         # unnormalized weights
    La, i = np.shape(w);             # length of average
    Lw = floor(La/2);                # position of central weight
    n = np.sum(w);                   # sum of weights
    w = w/n;                         # normalized weights
    wf = np.flipud(w);               # weights in reversed order
    c = np.zeros((N,1));             # initialize left column of G
    r = np.zeros((M,1));             # initialize top row of G
    c[0:Lw+1,0:1]=wf[Lw:La,0:1];     # copy weights to c
    r[0:Lw+1,0:1]=w[Lw:La,0:1];      # copy weights to r
    G = la.toeplitz(c,r.T);          # data kernel G is Toeplitz
averaging doesn't have to be symmetric
with this data kernel, each $d_i$ is a weighted average of the $m_j$ with $i \geq j$, that is, just past and present model parameters
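A one-sided average of this kind corresponds to a lower-triangular data kernel. The sketch below builds one directly with loops; the weights in `w` and the size `N` are hypothetical values chosen for illustration.

```python
import numpy as np

N = 6
w = np.array([0.5, 0.3, 0.2])            # hypothetical weights: present, one step back, two steps back
G = np.zeros((N, N))
for i in range(N):
    for k, wk in enumerate(w):           # k = number of steps into the past
        j = i - k
        if j >= 0:                       # no model parameters exist before the first
            G[i, j] = wk

m = np.arange(1.0, N + 1.0)              # model parameters 1, 2, ..., 6
d = G @ m                                # each d[i] uses only m[j] with j <= i
```

Every element above the main diagonal of `G` is zero, which is what restricts each datum to past and present model parameters.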
the prediction error: error vector, $\mathbf{e} = \mathbf{d}^{\mathrm{obs}} - \mathbf{d}^{\mathrm{pre}}$
prediction error in straight line case
[Figure: plot of linedata01.txt, data d versus auxiliary variable x, marking an observed datum $d_i^{\mathrm{obs}}$, its prediction $d_i^{\mathrm{pre}}$, and the individual error $e_i$.]
total error: a single number summarizing the error, the sum of squares of the individual errors: $E = \sum_{i=1}^{N} e_i^2 = \mathbf{e}^T \mathbf{e}$
principle of least squares: choose as $\mathbf{m}^{\mathrm{est}}$ the m that minimizes the total error E(m)
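MATLAB script for total error
    dpre = G*mest;     % predicted data
    e = dobs-dpre;     % prediction error
    E = e'*e;          % total error (sum of squared errors)
Python script for total error
    import numpy as np
    dpre = np.matmul(G,mest);    # predicted data
    e = dobs-dpre;               # prediction error
    E = np.matmul(e.T,e);        # total error; a 1-by-1 array when e is an N-by-1 column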
grid search: a strategy for finding the m that minimizes E(m)
try lots of combinations of ($m_1$, $m_2$, ...), that is, a grid of combinations
pick the combination with the smallest E as $\mathbf{m}^{\mathrm{est}}$
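The grid search strategy can be sketched for the straight-line case as follows. The synthetic data, the true parameters `mtrue`, the noise level, and the grid limits and spacing are all assumptions made for this example; it does not use the lecture's data file.

```python
import numpy as np

rng = np.random.default_rng(1)           # fixed seed so the run is repeatable

# synthetic straight-line data (hypothetical values)
N = 30
x = np.linspace(-5.0, 5.0, N)
G = np.column_stack((np.ones(N), x))     # data kernel for a straight line
mtrue = np.array([1.0, 2.0])
dobs = G @ mtrue + 0.5 * rng.standard_normal(N)

# grid of trial (m1, m2) combinations
m1 = np.linspace(0.0, 4.0, 81)
m2 = np.linspace(0.0, 4.0, 81)
E = np.zeros((len(m1), len(m2)))
for p, a in enumerate(m1):
    for q, b in enumerate(m2):
        e = dobs - G @ np.array([a, b])  # prediction error for this trial m
        E[p, q] = e @ e                  # total error: sum of squared errors

# pick the combination with the smallest E
p, q = np.unravel_index(np.argmin(E), E.shape)
mest = np.array([m1[p], m2[q]])
Emin = E[p, q]
print("mest =", mest, " Emin =", Emin)
```

The estimate can only be as fine as the grid spacing, and the number of trial combinations grows rapidly with the number of model parameters.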
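[Figure: error surface E as a function of $m_1$ and $m_2$, each ranging from 0 to 4, marking the point of minimum error, $E_{min}$, at ($m_1^{\mathrm{est}}$, $m_2^{\mathrm{est}}$) and the surrounding region of low error.]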
the best m is at the point of minimum E; choose that as $\mathbf{m}^{\mathrm{est}}$
but, actually, any m in the region of low E is almost as good as $\mathbf{m}^{\mathrm{est}}$, especially since E is affected by measurement error
if the experiment were repeated, the results would be slightly different, anyway
the shape of the region of low error is related to the covariance of the estimated model parameters (more on this in the next lecture)
thinking about error surfaces leads to important insights
but actually calculating an error surface with a grid search so as to locate $\mathbf{m}^{\mathrm{est}}$ is not very practical
in the next lecture we will develop a solution to the least squares problem that doesn't require a grid search