Exploring Algorithm Performance in Data Set 1 with LDA, CART, and K-Means


Linear Discriminant Analysis (LDA), Classification and Regression Trees (CART), and K-Means applied to Data Set 1. CART was trained by tuning the number of leaves. For LDA, both covariance structures (linear and quadratic discriminant types) were explored. K-Means was applied unsupervised to cluster the unlabeled data. In the comparison, Quadratic Discriminant Analysis had the lowest error rate, while CART needed more tuning than LDA and performed slightly worse.


Uploaded on Oct 06, 2024

Presentation Transcript


  1. Final Project Cedric Destin

  2. Data Set 1: Used three algorithms. Supervised (2): Linear Discriminant Analysis (LDA) and Classification and Regression Trees (CART). Unsupervised (1): K-Means.

  3. CART Training: Fit the tree with ClassificationTree.fit, cross-validated it with cvLoss, and found the best number of leaves.
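
As a rough illustration of choosing a tree size by cross-validation, the Python sketch below estimates a k-fold misclassification rate for each candidate size and keeps the one with the lowest error. It is not MATLAB's cvLoss. The helpers `kfold_indices`, `cv_error`, `make_binned_fit`, and `select_best` are hypothetical, and `make_binned_fit` is only a stand-in for a tree on a single feature (equal-width bins playing the role of leaves), not a real CART implementation. Using 10 folds is our choice.

```python
import random
from collections import Counter


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_error(fit, X, y, k=10, seed=0):
    """Pooled misclassification rate over k held-out folds."""
    wrong = 0
    for fold in kfold_indices(len(X), k, seed):
        held = set(fold)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        wrong += sum(model(X[i]) != y[i] for i in fold)
    return wrong / len(X)


def make_binned_fit(n_leaves):
    """Hypothetical stand-in for a tree with n_leaves leaves on one feature:
    split the training range into equal-width bins and let each bin predict
    its majority training label."""
    def fit(X, y):
        lo, hi = min(X), max(X)
        width = (hi - lo) / n_leaves or 1.0  # guard against a zero-width range

        def leaf(x):
            # clamp so points outside the training range fall into an edge bin
            return min(max(int((x - lo) / width), 0), n_leaves - 1)

        votes = [Counter() for _ in range(n_leaves)]
        for xi, yi in zip(X, y):
            votes[leaf(xi)][yi] += 1
        fallback = Counter(y).most_common(1)[0][0]  # for bins with no data
        labels = [v.most_common(1)[0][0] if v else fallback for v in votes]
        return lambda x: labels[leaf(x)]
    return fit


def select_best(candidates, make_fit, X, y, k=10):
    """Return the candidate with the lowest k-fold CV error, plus all errors."""
    errors = {h: cv_error(make_fit(h), X, y, k) for h in candidates}
    return min(errors, key=errors.get), errors
```

For example, with `X = [i / 100 for i in range(100)]` and labels that switch from 0 to 1 at 0.5, a single leaf can do no better than predicting the majority class, while two or more leaves separate the classes almost perfectly. `select_best([1, 2, 4], make_binned_fit, X, y)` therefore should not pick 1.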

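  4. CART Training (Observation): Two methods for tuning. (1) Vary the number of leaves (purity). Each split should reduce the entropy impurity at node $N$, $i(N) = -\sum_j P(\omega_j)\log_2 P(\omega_j)$, where $P(\omega_j)$ is the fraction of the samples at $N$ that belong to class $\omega_j$. A lower $i(N)$ means less uncertainty. (2) Prune the tree to avoid overfitting, judged by the resubstitution error (resubLoss) and the cross-validation error (cvLoss).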
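
The entropy impurity above can be computed directly from the class labels that reach a node. The following is a minimal Python sketch, assuming base-2 logarithms and labels given as a plain list. `entropy_impurity` and `impurity_drop` are illustrative names and are not part of any toolbox. A split is worth making when `impurity_drop` is positive.

```python
import math
from collections import Counter


def entropy_impurity(labels):
    """i(N) = -sum_j P(w_j) * log2 P(w_j), where P(w_j) is the fraction of the
    samples reaching node N that belong to class w_j."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def impurity_drop(parent, left, right):
    """Decrease in entropy impurity from splitting `parent` into two children,
    each child weighted by the fraction of the parent's samples it receives."""
    p_left = len(left) / len(parent)
    return (entropy_impurity(parent)
            - p_left * entropy_impurity(left)
            - (1 - p_left) * entropy_impurity(right))
```

For instance, splitting `[0, 0, 1, 1]` into `[0, 0]` and `[1, 1]` removes one full bit of uncertainty. Splitting it into `[0, 1]` and `[0, 1]` removes none.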

  5. CART Training (Evaluation): Number of leaves: 1. Pruning level: ideal = 6 to 13 levels. p(error) = 0.5303.

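  6. CART Conclusion: Used 6 pruning levels and trained on 528 data points. Splitting criterion: GDI (Gini's diversity index). This measures how often a randomly chosen sample at a node would be mislabeled if it were labeled at random according to the node's class distribution.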
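
The following is a sketch of the GDI computation in Python. It uses the usual definition $1 - \sum_j p_j^2$ over the class proportions $p_j$ at a node. The function names are hypothetical. We believe `'gdi'` is MATLAB's default split criterion for classification trees, but the code below is our own illustration rather than MATLAB's.

```python
from collections import Counter


def gini_diversity_index(labels):
    """GDI = 1 - sum_j p_j**2, where p_j is the fraction of the node's samples
    in class j: the probability that a randomly drawn sample is mislabeled
    when given a label drawn at random from the node's class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_child_gdi(left, right):
    """Impurity of a binary split: each child's GDI weighted by its size.
    A tree grower prefers the split that makes this smallest."""
    n = len(left) + len(right)
    return (len(left) * gini_diversity_index(left)
            + len(right) * gini_diversity_index(right)) / n
```

A pure node scores 0. Two equally frequent classes score 0.5, and four equally frequent classes score 0.75.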

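  7. LDA Training: Used ClassificationDiscriminant with quadratic and linear discriminant types, cross-validated with cvLoss, and varied the regularization parameters Gamma (covariance shrinkage) and Delta (coefficient threshold).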
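
The effect of Gamma can be sketched as shrinking the off-diagonal covariances toward zero. Our reading of the MATLAB documentation for linear discriminants is that the regularized covariance is $(1-\gamma)\Sigma + \gamma\,\mathrm{diag}(\Sigma)$. Treat that formula, and the hypothetical `regularize_covariance` helper below, as assumptions rather than MATLAB's code. Delta, the threshold below which linear coefficients are set to zero, is not shown.

```python
def regularize_covariance(sigma, gamma):
    """Return (1 - gamma) * Sigma + gamma * diag(Sigma) for a square matrix
    given as a list of rows. Diagonal entries are unchanged; off-diagonal
    covariances shrink toward zero as gamma goes from 0 to 1."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    n = len(sigma)
    return [[sigma[i][j] if i == j else (1.0 - gamma) * sigma[i][j]
             for j in range(n)]
            for i in range(n)]
```

With gamma = 0 the covariance is left alone, and with gamma = 1 only the per-feature variances survive.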

  8. LDA (Observation): Tested whether the covariance is better modeled as shared across classes (linear discriminant) or per class (quadratic discriminant). Did not need to change Gamma or Delta. With a uniform prior: linear discriminant 0.3163, quadratic discriminant 0.107.
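
To illustrate the linear-versus-quadratic distinction, the sketch below fits a Gaussian discriminant to a single feature with a uniform prior. It uses either one pooled variance for all classes (linear) or a separate variance per class (quadratic). It is a simplified, one-dimensional stand-in for ClassificationDiscriminant, not its implementation. `fit_discriminant` is a hypothetical name, and every class needs at least two distinct values.

```python
import math
from collections import defaultdict


def fit_discriminant(X, y, quadratic):
    """Gaussian discriminant on a single feature with a uniform prior.
    quadratic=True: each class keeps its own variance (quadratic boundary).
    quadratic=False: all classes share the pooled within-class variance, so
    the boundary is linear. Assumes every class has >= 2 distinct values."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    means = {c: sum(v) / len(v) for c, v in groups.items()}
    variances = {c: sum((xi - means[c]) ** 2 for xi in v) / (len(v) - 1)
                 for c, v in groups.items()}
    if not quadratic:
        dof = len(X) - len(groups)
        pooled = sum(variances[c] * (len(v) - 1) for c, v in groups.items()) / dof
        variances = {c: pooled for c in groups}

    def log_likelihood(x, c):
        # Gaussian log-density up to a constant shared by all classes
        var = variances[c]
        return -0.5 * math.log(var) - (x - means[c]) ** 2 / (2 * var)

    def predict(x):
        # uniform prior: the class with the largest likelihood wins
        return max(groups, key=lambda c: log_likelihood(x, c))

    return predict
```

Suppose two classes share a mean but differ in spread, for example one tight around 0 and one wide around 0. The pooled-variance model assigns every point to the same class. The per-class-variance model labels points near 0 as the tight class and points far from 0 as the wide class. A quadratic discriminant can exploit this kind of structure, but a linear one cannot.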

  9. LDA Conclusion: Quadratic discriminant error = 0.504; linear discriminant error = 0.5646.

  10. K-Means: How to train? Unsupervised. Preparing the data: PCA. Procedure: iterated 10 times. Initial clusters: calculated the 1st k iterations. Problem: the data is unlabeled.
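
The K-Means procedure on this slide can be sketched with Lloyd's algorithm. Each point is assigned to its nearest centroid, each centroid moves to the mean of its points, and the two steps repeat a fixed number of times (10, as on the slide). The slide does not say how the initial clusters were chosen, so seeding with the first k points is an assumption, and the PCA preprocessing step is left out. `kmeans` is a hypothetical helper in plain Python.

```python
import math


def kmeans(points, k, iterations=10):
    """Lloyd's algorithm for k-means: alternate an assignment step and a
    centroid-update step for a fixed number of iterations."""
    # Assumption: seed the centroids with the first k points.
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iterations):
        # assignment step: each point joins its nearest centroid (Euclidean)
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return centroids, assign
```

For example, `kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], 2)` separates the two well-spaced pairs within a couple of iterations, even though both initial centroids start in the same pair.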

  11. Conclusion, Data Set 1: Quadratic Discriminant Analysis error = 0.504; CART error = 0.5303; K-Means error = ???. QDA seems to give better results than CART. I think that modeling the classes in terms of their covariance made it perform slightly better. CART required a little more tuning than QDA. I expected CART to perform slightly better, since it tries to minimize the uncertainty. K-Means worked well, but at first I was not able to specify my centroids and label them.
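
When true labels are available for evaluation, one common way to get an error rate from an unsupervised clustering is to label each cluster with the majority class of its members and count the points that disagree. The Python sketch below follows this approach. `label_clusters` and `clustering_error` are hypothetical names, and majority voting is one option for matching clusters to classes, not a method taken from these slides.

```python
from collections import Counter


def label_clusters(assign, true_labels):
    """Map each cluster id to the most common true label among its members."""
    votes = {}
    for cluster, label in zip(assign, true_labels):
        votes.setdefault(cluster, Counter())[label] += 1
    return {cluster: c.most_common(1)[0][0] for cluster, c in votes.items()}


def clustering_error(assign, true_labels):
    """Fraction of points whose cluster's majority label differs from their own."""
    mapping = label_clusters(assign, true_labels)
    wrong = sum(mapping[cluster] != label
                for cluster, label in zip(assign, true_labels))
    return wrong / len(true_labels)
```

Two different clusters can end up with the same majority label. This rule allows that, so the error it reports is an optimistic view of the clustering.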

  12. Data Set: Playing Around with KNN. With basic training and no tuning, error = 0.4406.
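
For reference, a basic k-nearest-neighbor classifier takes only a few lines. This Python sketch uses Euclidean distance and a majority vote, with k = 1 by default to represent an untuned setup. `knn_predict` and `error_rate` are hypothetical helpers, not MATLAB's implementation.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=1):
    """Label x by majority vote among its k nearest training points."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def error_rate(train_X, train_y, test_X, test_y, k=1):
    """Fraction of test points whose predicted label is wrong."""
    wrong = sum(knn_predict(train_X, train_y, x, k) != t
                for x, t in zip(test_X, test_y))
    return wrong / len(test_y)
```

The only tuning knob here is k. Larger values smooth the decision boundary at the cost of blurring small clusters.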

  13. Data Set 2: Temporal data. Technique: Hidden Markov Models. Training: hmmtrain, starting from calculated initial transition and emission matrices. Decoding: used the hmmtrain estimates in the Viterbi decoder.
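
The decoding step can be sketched as a log-space Viterbi decoder. Given transition and emission matrices (for example, the ones estimated by hmmtrain), it finds the most likely hidden-state sequence for an observation sequence. This is a plain-Python illustration, not MATLAB's decoder. The explicit start distribution `start` is an assumption of this sketch, since MATLAB's toolbox has its own convention for the initial state. States and symbols are 0-based indices, and `viterbi` and `safe_log` are hypothetical names.

```python
import math


def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def viterbi(obs, trans, emit, start):
    """Most likely hidden-state path for a sequence of observed symbols.

    trans[i][j] = P(next state j | state i)
    emit[i][o]  = P(symbol o | state i)
    start[i]    = P(first state is i)
    """
    n = len(trans)
    # delta[j]: best log-probability of any state path that ends in state j
    delta = [safe_log(start[j]) + safe_log(emit[j][obs[0]]) for j in range(n)]
    back = []  # back[t][j]: best predecessor of state j at step t + 1
    for o in obs[1:]:
        ptr, new_delta = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] + safe_log(trans[i][j]))
            ptr.append(best_i)
            new_delta.append(delta[best_i] + safe_log(trans[best_i][j])
                             + safe_log(emit[j][o]))
        back.append(ptr)
        delta = new_delta
    # start from the best final state and follow the back-pointers
    state = max(range(n), key=lambda j: delta[j])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

Consider the textbook two-state weather model, with start `[0.6, 0.4]`, transitions `[[0.7, 0.3], [0.4, 0.6]]`, and emissions `[[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]]`. Under this model, the observation sequence `[0, 1, 2]` decodes to `[1, 0, 0]`. Working in log space avoids the numerical underflow that multiplying many small probabilities would cause on long temporal sequences.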

  14. Conclusion, Data Set 2: Hidden Markov Model error = ???. This process worked up to the Viterbi decoding step.

  15. Question
