Predicting Number of Crew Members on Cruise Ships Using Regression Model


This analysis builds a regression model to predict the number of crew members on cruise ships. The dataset covers 158 cruise ships with six potential predictors: age, tonnage, passengers, length, cabins, and passenger density. The full model (6 predictors, 7 parameters) has an R-squared of 0.9245. Backward elimination based on AIC drops passenger density and age; forward selection and stepwise regression arrive at the same final model, which keeps tonnage, passengers, length, and cabins.


Uploaded on Oct 10, 2024 | 0 Views



Presentation Transcript


  1. Regression Model Building: Predicting Number of Crew Members of Cruise Ships

  2. Data Description
     n = 158 Cruise Ships
     Dependent Variable: Crew Size (100s)
     Potential Predictor Variables:
       - Age (2013 - Year Built)
       - Tonnage (1000s of Tons)
       - Passengers (100s)
       - Length (100s of feet)
       - Cabins (100s)
       - Passenger Density (Passengers/Space)

  3. Data: First 20 Cases

     | Ship        | Cruise Line | Age | Tonnage | Pssngrs | Length | Cabins | PassDens | Crew |
     |-------------|-------------|-----|---------|---------|--------|--------|----------|------|
     | Journey     | Azamara     | 6   | 30.277  | 6.94    | 5.94   | 3.55   | 42.64    | 3.55 |
     | Quest       | Azamara     | 6   | 30.277  | 6.94    | 5.94   | 3.55   | 42.64    | 3.55 |
     | Celebration | Carnival    | 26  | 47.262  | 14.86   | 7.22   | 7.43   | 31.8     | 6.7  |
     | Conquest    | Carnival    | 11  | 110     | 29.74   | 9.53   | 14.88  | 36.99    | 19.1 |
     | Destiny     | Carnival    | 17  | 101.353 | 26.42   | 8.92   | 13.21  | 38.36    | 10   |
     | Ecstasy     | Carnival    | 22  | 70.367  | 20.52   | 8.55   | 10.2   | 34.29    | 9.2  |
     | Elation     | Carnival    | 15  | 70.367  | 20.52   | 8.55   | 10.2   | 34.29    | 9.2  |
     | Fantasy     | Carnival    | 23  | 70.367  | 20.56   | 8.55   | 10.22  | 34.23    | 9.2  |
     | Fascination | Carnival    | 19  | 70.367  | 20.52   | 8.55   | 10.2   | 34.29    | 9.2  |
     | Freedom     | Carnival    | 6   | 110.239 | 37      | 9.51   | 14.87  | 29.79    | 11.5 |
     | Glory       | Carnival    | 10  | 110     | 29.74   | 9.51   | 14.87  | 36.99    | 11.6 |
     | Holiday     | Carnival    | 28  | 46.052  | 14.52   | 7.27   | 7.26   | 31.72    | 6.6  |
     | Imagination | Carnival    | 18  | 70.367  | 20.52   | 8.55   | 10.2   | 34.29    | 9.2  |
     | Inspiration | Carnival    | 17  | 70.367  | 20.52   | 8.55   | 10.2   | 34.29    | 9.2  |
     | Legend      | Carnival    | 11  | 86      | 21.24   | 9.63   | 10.62  | 40.49    | 9.3  |
     | Liberty*    | Carnival    | 8   | 110     | 29.74   | 9.51   | 14.87  | 36.99    | 11.6 |
     | Miracle     | Carnival    | 9   | 88.5    | 21.24   | 9.63   | 10.62  | 41.67    | 10.3 |
     | Paradise    | Carnival    | 15  | 70.367  | 20.52   | 8.55   | 10.2   | 34.29    | 9.2  |
     | Pride       | Carnival    | 12  | 88.5    | 21.24   | 9.63   | 11.62  | 41.67    | 9.3  |
     | Sensation   | Carnival    | 20  | 70.367  | 20.52   | 8.55   | 10.2   | 34.29    | 9.2  |

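  4. Full Model (6 Predictors, 7 Parameters, n=158)

     Regression Statistics

     | Statistic         | Value  |
     |-------------------|--------|
     | Multiple R        | 0.9615 |
     | R Square          | 0.9245 |
     | Adjusted R Square | 0.9215 |
     | Standard Error    | 0.9819 |
     | Observations      | 158    |

     ANOVA

     | Source     | df  | SS     | MS    | F     | Significance F |
     |------------|-----|--------|-------|-------|----------------|
     | Regression | 6   | 1781.5 | 296.9 | 308.0 | 0.0000         |
     | Residual   | 151 | 145.6  | 1.0   |       |                |
     | Total      | 157 | 1927.1 |       |       |                |

     Coefficients

     | Term      | Coefficient | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
     |-----------|-------------|----------------|--------|---------|-----------|-----------|
     | Intercept | -0.52134    | 1.05703        | -0.493 | 0.6226  | -2.610    | 1.567     |
     | Age       | -0.01254    | 0.01420        | -0.884 | 0.3783  | -0.041    | 0.016     |
     | Tonnage   | 0.01324     | 0.01189        | 1.113  | 0.2673  | -0.010    | 0.037     |
     | Pssngrs   | -0.14976    | 0.04759        | -3.147 | 0.0020  | -0.244    | -0.056    |
     | Length    | 0.40348     | 0.11445        | 3.525  | 0.0006  | 0.177     | 0.630     |
     | Cabins    | 0.80163     | 0.08922        | 8.985  | 0.0000  | 0.625     | 0.978     |
     | PassDens  | -0.00066    | 0.01581        | -0.042 | 0.9669  | -0.032    | 0.031     |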
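
The summary statistics on this slide all follow from the ANOVA sums of squares. As a rough check, the sketch below recomputes them from the rounded figures shown on the slide; the variable names are illustrative, and small differences from the slide values come from rounding in the inputs.

```python
import math

# Values copied from the full-model ANOVA table (rounded as shown on the slide).
n = 158
df_reg, df_res = 6, 151
ss_reg, ss_res, ss_tot = 1781.5, 145.6, 1927.1

ms_reg = ss_reg / df_reg                 # mean square for regression
ms_res = ss_res / df_res                 # mean square for residual
f_stat = ms_reg / ms_res                 # overall F statistic
r_sq = ss_reg / ss_tot                   # R Square
adj_r_sq = 1 - (n - 1) / df_res * ss_res / ss_tot   # Adjusted R Square
std_err = math.sqrt(ms_res)              # residual standard error

# t statistic for a single coefficient: estimate / standard error (Cabins row).
t_cabins = 0.80163 / 0.08922

print(f"F = {f_stat:.1f}, R^2 = {r_sq:.4f}, adj R^2 = {adj_r_sq:.4f}")
print(f"std error = {std_err:.4f}, t(Cabins) = {t_cabins:.3f}")
```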

  5. Backward Elimination
     Model-based AIC (minimize):

     $$\mathrm{AIC}(\text{Model}) = n \ln\!\left(\frac{SSE(\text{Model})}{n}\right) + 2\,\mathrm{parms}(\text{Model}) + \text{constant}$$

     Full Model (7 parms, constant = 0): $\mathrm{AIC} = 158 \ln(145.57/158) + 2(7) = 1.055$
     Round 2: $\mathrm{AIC} = 158 \ln(145.57/158) + 2(6) = -0.943$
     Round 3: $\mathrm{AIC} = 158 \ln(146.39/158) + 2(5) = -2.062$

     Full Model

     | Term         | Df | SS     | RSS    | AIC    |
     |--------------|----|--------|--------|--------|
     | - passdens   | 1  | 0.002  | 145.57 | -0.943 |
     | - age        | 1  | 0.753  | 146.32 | -0.13  |
     | - tonnage    | 1  | 1.195  | 146.77 | 0.347  |
     | <none>       |    |        | 145.57 | 1.055  |
     | - passengers | 1  | 9.548  | 155.12 | 9.092  |
     | - length     | 1  | 11.98  | 157.55 | 11.551 |
     | - cabins     | 1  | 77.821 | 223.39 | 66.721 |

     Round 2

     | Term         | Df | SS     | RSS    | AIC    |
     |--------------|----|--------|--------|--------|
     | - age        | 1  | 0.815  | 146.39 | -2.062 |
     | <none>       |    |        | 145.57 | -0.943 |
     | - tonnage    | 1  | 2.007  | 147.58 | -0.78  |
     | - length     | 1  | 12.069 | 157.64 | 9.641  |
     | - passengers | 1  | 14.027 | 159.6  | 11.591 |
     | - cabins     | 1  | 79.556 | 225.13 | 65.944 |

     Round 3

     | Term         | Df | SS     | RSS    | AIC    |
     |--------------|----|--------|--------|--------|
     | <none>       |    |        | 146.39 | -2.062 |
     | - tonnage    | 1  | 3.866  | 150.25 | 0.056  |
     | - length     | 1  | 11.739 | 158.13 | 8.126  |
     | - passengers | 1  | 14.275 | 160.66 | 10.64  |
     | - cabins     | 1  | 78.861 | 225.25 | 64.028 |
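
The AIC values on this slide can be reproduced directly from the residual sums of squares. The following is a minimal sketch, assuming the additive constant is zero as stated for the full model; the function name `aic` is illustrative. Because the RSS inputs are rounded to two decimals, the results agree with the slide only to about two decimal places.

```python
import math

def aic(rss, n, n_params):
    """AIC = n * ln(RSS / n) + 2 * (number of parameters), constant taken as 0."""
    return n * math.log(rss / n) + 2 * n_params

n = 158
aic_full = aic(145.57, n, 7)     # full model: 6 predictors + intercept
aic_round2 = aic(145.57, n, 6)   # after dropping passenger density
aic_round3 = aic(146.39, n, 5)   # after also dropping age

for label, value in [("full", aic_full), ("round 2", aic_round2), ("round 3", aic_round3)]:
    print(f"{label}: AIC = {value:.3f}")
```

Each elimination step lowers the AIC, which is why the procedure continues until no single deletion gives a smaller value.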

  6. Forward Selection (AIC Based)
     Total SS = 1927.08
     Null Model: $\mathrm{AIC} = 158 \ln(1927.08/158) + 2(1) = 397.18$

     Null Model

     | Term         | Df | SS      | RSS     | AIC    |
     |--------------|----|---------|---------|--------|
     | + cabins     | 1  | 1742.21 | 184.88  | 28.82  |
     | + tonnage    | 1  | 1658.03 | 269.05  | 88.1   |
     | + passengers | 1  | 1614.23 | 312.86  | 111.94 |
     | + length     | 1  | 1546.6  | 380.49  | 142.86 |
     | + age        | 1  | 542.66  | 1384.42 | 346.93 |
     | + passdens   | 1  | 46.6    | 1880.48 | 395.32 |
     | <none>       |    |         | 1927.08 | 397.18 |

     Round 2

     | Term         | Df | SS      | RSS    | AIC     |
     |--------------|----|---------|--------|---------|
     | + length     | 1  | 22.9636 | 161.91 | 9.8661  |
     | + passdens   | 1  | 14.9541 | 169.92 | 17.4948 |
     | + tonnage    | 1  | 12.5135 | 172.36 | 19.748  |
     | + passengers | 1  | 7.0656  | 177.81 | 24.6647 |
     | + age        | 1  | 5.4442  | 179.43 | 26.0989 |
     | <none>       |    |         | 184.88 | 28.8215 |

     Round 3

     | Term         | Df | SS      | RSS    | AIC     |
     |--------------|----|---------|--------|---------|
     | + passengers | 1  | 11.6609 | 150.25 | 0.0565  |
     | + passdens   | 1  | 6.3732  | 155.54 | 5.5212  |
     | <none>       |    |         | 161.91 | 9.8661  |
     | + age        | 1  | 1.9702  | 159.94 | 9.9317  |
     | + tonnage    | 1  | 1.2514  | 160.66 | 10.6402 |

     Round 4

     | Term       | Df | SS     | RSS    | AIC      |
     |------------|----|--------|--------|----------|
     | + tonnage  | 1  | 3.8656 | 146.39 | -2.06164 |
     | + age      | 1  | 2.6733 | 147.58 | -0.77996 |
     | + passdens | 1  | 2.5635 | 147.69 | -0.66241 |
     | <none>     |    |        | 150.25 | 0.0565   |

     Round 5

     | Term       | Df | SS      | RSS    | AIC      |
     |------------|----|---------|--------|----------|
     | <none>     |    |         | 146.39 | -2.06164 |
     | + age      | 1  | 0.81467 | 145.57 | -0.94339 |
     | + passdens | 1  | 0.06366 | 146.32 | -0.13037 |
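
The round-by-round tables follow a simple greedy loop: at each round, add the candidate that gives the lowest AIC, and stop when no addition improves on the current model. The sketch below illustrates that loop with NumPy on synthetic data. It is not the software used for the slides, and the data, the predictor names `x0` to `x3`, and the helper names are all hypothetical.

```python
import numpy as np

def rss(design, y):
    """Residual sum of squares from an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

def aic(rss_value, n, n_params):
    return n * np.log(rss_value / n) + 2 * n_params

def forward_select(X, y, names):
    """Greedy forward selection: add the predictor that lowers AIC the most."""
    n = len(y)
    ones = np.ones((n, 1))
    chosen, remaining = [], list(range(X.shape[1]))
    best_aic = aic(rss(ones, y), n, 1)          # null model: intercept only
    while remaining:
        scores = []
        for j in remaining:
            design = np.hstack([ones, X[:, chosen + [j]]])
            scores.append((aic(rss(design, y), n, len(chosen) + 2), j))
        cand_aic, cand = min(scores)
        if cand_aic >= best_aic:                # no candidate improves AIC: stop
            break
        best_aic = cand_aic
        chosen.append(cand)
        remaining.remove(cand)
    return [names[j] for j in chosen], float(best_aic)

# Synthetic data (hypothetical): only x0 and x2 truly affect the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(158, 4))
y = 2.0 + 3.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=158)
selected, final_aic = forward_select(X, y, ["x0", "x1", "x2", "x3"])
print(selected, round(final_aic, 3))
```

Stepwise regression extends the same loop by also scoring the deletion of each predictor already in the model at every round.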

  7. Stepwise Regression (AIC Based)

     Null Model

     | Term         | Df | SS      | RSS     | AIC    |
     |--------------|----|---------|---------|--------|
     | + cabins     | 1  | 1742.21 | 184.88  | 28.82  |
     | + tonnage    | 1  | 1658.03 | 269.05  | 88.1   |
     | + passengers | 1  | 1614.23 | 312.86  | 111.94 |
     | + length     | 1  | 1546.6  | 380.49  | 142.86 |
     | + age        | 1  | 542.66  | 1384.42 | 346.93 |
     | + passdens   | 1  | 46.6    | 1880.48 | 395.32 |
     | <none>       |    |         | 1927.08 | 397.18 |

     Round 2

     | Term         | Df | SS      | RSS     | AIC    |
     |--------------|----|---------|---------|--------|
     | + length     | 1  | 22.96   | 161.91  | 9.87   |
     | + passdens   | 1  | 14.95   | 169.92  | 17.49  |
     | + tonnage    | 1  | 12.51   | 172.36  | 19.75  |
     | + passengers | 1  | 7.07    | 177.81  | 24.66  |
     | + age        | 1  | 5.44    | 179.43  | 26.1   |
     | <none>       |    |         | 184.88  | 28.82  |
     | - cabins     | 1  | 1742.21 | 1927.08 | 397.18 |

     Round 3

     | Term         | Df | SS      | RSS    | AIC     |
     |--------------|----|---------|--------|---------|
     | + passengers | 1  | 11.661  | 150.25 | 0.056   |
     | + passdens   | 1  | 6.373   | 155.54 | 5.521   |
     | <none>       |    |         | 161.91 | 9.866   |
     | + age        | 1  | 1.97    | 159.94 | 9.932   |
     | + tonnage    | 1  | 1.251   | 160.66 | 10.64   |
     | - length     | 1  | 22.964  | 184.88 | 28.821  |
     | - cabins     | 1  | 218.571 | 380.49 | 142.859 |

     Round 4

     | Term         | Df | SS     | RSS    | AIC    |
     |--------------|----|--------|--------|--------|
     | + tonnage    | 1  | 3.866  | 146.39 | -2.062 |
     | + age        | 1  | 2.673  | 147.58 | -0.78  |
     | + passdens   | 1  | 2.563  | 147.69 | -0.662 |
     | <none>       |    |        | 150.25 | 0.056  |
     | - passengers | 1  | 11.661 | 161.91 | 9.866  |
     | - length     | 1  | 27.559 | 177.81 | 24.665 |
     | - cabins     | 1  | 95.781 | 246.03 | 75.974 |

     Round 5

     | Term         | Df | SS     | RSS    | AIC    |
     |--------------|----|--------|--------|--------|
     | <none>       |    |        | 146.39 | -2.062 |
     | + age        | 1  | 0.815  | 145.57 | -0.943 |
     | + passdens   | 1  | 0.064  | 146.32 | -0.13  |
     | - tonnage    | 1  | 3.866  | 150.25 | 0.056  |
     | - length     | 1  | 11.739 | 158.13 | 8.126  |
     | - passengers | 1  | 14.275 | 160.66 | 10.64  |
     | - cabins     | 1  | 78.861 | 225.25 | 64.028 |

  8. Summary of Automated Models
     Backward Elimination
       - Drop Passenger Density (AIC drops from 1.055 to -0.943)
       - Drop Age (AIC drops from -0.943 to -2.062)
       - Stop: Keep Tonnage, Passengers, Length, Cabins
     Forward Selection
       - Add Cabins (AIC drops from 397.18 to 28.82)
       - Add Length (AIC drops from 28.82 to 9.8661)
       - Add Passengers (AIC drops from 9.8661 to 0.0565)
       - Add Tonnage (AIC drops from 0.0565 to -2.06)
       - Stop: Keep Tonnage, Passengers, Length, Cabins
     Stepwise
       - Same as Forward Selection

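  9. All Possible (Subset) Regressions
     $p'$ = number of parameters (including intercept) in the model

     $$R^2(\text{Model}) = \frac{SS_{\text{Regression}}(\text{Model})}{\text{Total } SS} = 1 - \frac{SS_{\text{Residual}}(\text{Model})}{\text{Total } SS} \qquad \text{Goal: maximize within reason}$$

     $$\text{Adj-}R^2(\text{Model}) = 1 - \left(\frac{n-1}{n-p'}\right)\frac{SS_{\text{Residual}}(\text{Model})}{\text{Total } SS} \qquad \text{Goal: maximize}$$

     $$C_p(\text{Model}) = \frac{SS_{\text{Residual}}(\text{Model})}{s^2} + 2p' - n \quad \text{where } s^2 = MS_{\text{Residual}}(\text{Full Model}) \qquad \text{Goal: } C_p \le p'$$

     $$\mathrm{BIC}(\text{Model}) = n \ln\!\left(\frac{SS_{\text{Residual}}(\text{Model})}{n}\right) + \ln(n)\,p' + \text{constant} \qquad \text{Goal: minimize}$$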
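
The four criteria can be evaluated from a model's residual sum of squares alone. The sketch below applies them to two models whose RSS values appear in the elimination tables: the four-predictor model (RSS 146.39, p' = 5) and the full model (RSS 145.57, p' = 7). One assumption is labeled in the code: the BIC constant is taken to be -n ln(Total SS / n), a choice made here only because it comes close to the tabulated BIC values; the slides do not state which constant was used.

```python
import math

n = 158
ss_total = 1927.08
s2_full = 145.57 / (n - 7)   # MS_Residual of the full model (7 parameters)

def criteria(ss_res, p):
    """Return (R^2, adjusted R^2, Cp, BIC) for a model with p parameters."""
    r2 = 1 - ss_res / ss_total
    adj_r2 = 1 - (n - 1) / (n - p) * ss_res / ss_total
    cp = ss_res / s2_full + 2 * p - n
    # Assumption: constant = -n * ln(Total SS / n), so the log term uses RSS / Total SS.
    bic = n * math.log(ss_res / ss_total) + math.log(n) * p
    return r2, adj_r2, cp, bic

r2_4, adj_4, cp_4, bic_4 = criteria(146.39, 5)   # tonnage, passengers, length, cabins
r2_f, adj_f, cp_f, bic_f = criteria(145.57, 7)   # all six predictors

print(f"4 predictors: R2={r2_4:.3f} adjR2={adj_4:.3f} Cp={cp_4:.3f} BIC={bic_4:.3f}")
print(f"full model:   R2={r2_f:.3f} adjR2={adj_f:.3f} Cp={cp_f:.3f} BIC={bic_f:.3f}")
```

For the full model, Cp equals p' by construction, since its residual mean square is the s² used in the formula.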

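  10. All Possible (Subset) Regressions (Best 4 per Group)
      A 1 marks a term included in the model.

      | #preds | Int | Age | Ton | Pass | Lngth | Cabin | PassDen | R-Sq  | Adj-R2 | Cp      | BIC      |
      |--------|-----|-----|-----|------|-------|-------|---------|-------|--------|---------|----------|
      | 1      | 1   | 0   | 0   | 0    | 0     | 1     | 0       | 0.904 | 0.903  | 37.772  | -360.238 |
      | 1      | 1   | 0   | 1   | 0    | 0     | 0     | 0       | 0.86  | 0.859  | 125.086 | -300.954 |
      | 1      | 1   | 0   | 0   | 1    | 0     | 0     | 0       | 0.838 | 0.837  | 170.523 | -277.122 |
      | 1      | 1   | 0   | 0   | 0    | 1     | 0     | 0       | 0.803 | 0.801  | 240.675 | -246.201 |
      | 2      | 1   | 0   | 0   | 0    | 1     | 1     | 0       | 0.916 | 0.915  | 15.952  | -376.131 |
      | 2      | 1   | 0   | 0   | 0    | 0     | 1     | 1       | 0.912 | 0.911  | 24.261  | -368.502 |
      | 2      | 1   | 0   | 1   | 0    | 0     | 1     | 0       | 0.911 | 0.909  | 26.792  | -366.249 |
      | 2      | 1   | 0   | 0   | 1    | 0     | 1     | 0       | 0.908 | 0.907  | 32.443  | -361.332 |
      | 3      | 1   | 0   | 0   | 1    | 1     | 1     | 0       | 0.922 | 0.921  | 5.857   | -382.878 |
      | 3      | 1   | 0   | 0   | 0    | 1     | 1     | 1       | 0.919 | 0.918  | 11.341  | -377.413 |
      | 3      | 1   | 0   | 1   | 1    | 0     | 1     | 0       | 0.918 | 0.916  | 14.023  | -374.808 |
      | 3      | 1   | 1   | 0   | 0    | 1     | 1     | 0       | 0.917 | 0.915  | 15.909  | -373.002 |
      | 4      | 1   | 0   | 1   | 1    | 1     | 1     | 0       | 0.924 | 0.922  | 3.847   | -381.933 |
      | 4      | 1   | 1   | 0   | 1    | 1     | 1     | 0       | 0.923 | 0.921  | 5.084   | -380.652 |
      | 4      | 1   | 0   | 0   | 1    | 1     | 1     | 1       | 0.923 | 0.921  | 5.197   | -380.534 |
      | 4      | 1   | 0   | 1   | 0    | 1     | 1     | 1       | 0.919 | 0.917  | 13.056  | -372.631 |
      | 5      | 1   | 1   | 1   | 1    | 1     | 1     | 0       | 0.924 | 0.922  | 5.002   | -377.752 |
      | 5      | 1   | 0   | 1   | 1    | 1     | 1     | 1       | 0.924 | 0.922  | 5.781   | -376.939 |
      | 5      | 1   | 1   | 0   | 1    | 1     | 1     | 1       | 0.924 | 0.921  | 6.24    | -376.462 |
      | 5      | 1   | 1   | 1   | 0    | 1     | 1     | 1       | 0.92  | 0.917  | 14.904  | -367.717 |
      | 6      | 1   | 1   | 1   | 1    | 1     | 1     | 1       | 0.924 | 0.921  | 7       | -372.692 |

      Lowest BIC: -382.878 (Passengers, Length, Cabins). Lowest Cp: 3.847 (Tonnage, Passengers, Length, Cabins), which also ties for the highest Adj-R2 shown (0.922).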

  11. Cross-Validation
      Hold-out Sample (Training Sample = 100, Validation = 58)
        - Fit the model on the Training Sample and obtain regression estimates
        - Apply the regression estimates from the Training Sample to the Validation Sample X levels to obtain predicted values
        - MSEP = sum(obs - pred)^2 / n
        - Fit the model on the Validation Sample and compare its regression coefficients with those of the Training Sample model
      PRESS Statistic (delete observations 1-at-a-time)
        - Fit the model with each observation deleted 1-at-a-time
        - Obtain the residual for each observation when it was deleted
        - PRESS = sum(obs - pred(deleted))^2
      K-fold Cross-validation
        - Extension of PRESS to where K groups of cases are deleted
        - Useful for computationally intensive models
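
K-fold cross-validation as outlined above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the procedure used to produce the slides; the data, coefficients, and function names are hypothetical. Each fold is held out once, the model is fit on the remaining cases, and the squared prediction errors on the held-out cases are accumulated.

```python
import numpy as np

def fit(X, y):
    """Ordinary least-squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def kfold_mse(X, y, k, seed=0):
    """Return the cross-validated MSE and the size of each fold."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)   # random, near-equal folds
    sse = 0.0
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        beta = fit(X[train], y[train])              # fit without the held-out fold
        resid = y[held_out] - X[held_out] @ beta    # predict the held-out cases
        sse += float(resid @ resid)
    return sse / n, [len(f) for f in folds]

# Synthetic data (hypothetical): intercept plus four predictors, unit-variance noise.
rng = np.random.default_rng(1)
n = 158
X = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])
y = X @ np.array([1.0, 0.5, -0.2, 0.4, 0.9]) + rng.normal(size=n)

mse_cv, fold_sizes = kfold_mse(X, y, k=10)
print(round(mse_cv, 3), fold_sizes)
```

Setting k equal to n reduces this to the leave-one-out scheme behind the PRESS statistic.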

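  12. Hold-Out Sample
      n_in = 100, n_out = 58

      Training Sample

      | Term        | Estimate | Std Err | t-stat | P-Value |
      |-------------|----------|---------|--------|---------|
      | (Intercept) | -1.1018  | 0.7735  | -1.424 | 0.1576  |
      | tonnage     | 0.0048   | 0.0118  | 0.407  | 0.6851  |
      | passengers  | -0.1919  | 0.0545  | -3.525 | 0.0007  |
      | length      | 0.4565   | 0.1457  | 3.132  | 0.0023  |
      | cabins      | 0.9506   | 0.1451  | 6.551  | 0.0000  |

      Validation Sample

      | Term        | Estimate | Std Err | t-stat | P-Value |
      |-------------|----------|---------|--------|---------|
      | (Intercept) | -0.0970  | 0.9142  | -0.106 | 0.9159  |
      | tonnage     | 0.0286   | 0.0124  | 2.303  | 0.0252  |
      | passengers  | -0.1234  | 0.0582  | -2.119 | 0.0388  |
      | length      | 0.2321   | 0.1917  | 1.211  | 0.2313  |
      | cabins      | 0.7058   | 0.1060  | 6.656  | 0.0000  |

      Coefficients keep their signs, but significance levels change a lot. See Tonnage and Length.

      $$\text{MSEP} = \frac{1}{n_V}\sum_{i=1}^{n_V}\left(y_{iV} - \hat{y}_{iV}^{\,T}\right)^2 = 0.7578 \qquad \text{Bias}^2 = 0.0005182738$$

      where $\hat{y}_{iV}^{\,T}$ is the prediction for validation case $i$ from the training-sample estimates.

      $$\text{Percent Bias of MSEP} = 100 \times \frac{\text{Bias}^2}{\text{MSEP}} = 100 \times \frac{0.0005182738}{0.7578} = 0.06838787\ (\%)$$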
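
The percent-bias figure is a one-line calculation once MSEP and the squared bias are known. The sketch below repeats it from the slide's numbers and also shows how MSEP and bias could be computed from observed and predicted values. Treating bias as the mean prediction error is an assumption, though it is consistent with the slide figures; the three-case toy data and the function names are hypothetical.

```python
def msep_and_bias(observed, predicted):
    """MSEP = mean squared prediction error; bias = mean prediction error (assumed)."""
    errors = [o - p for o, p in zip(observed, predicted)]
    n_v = len(errors)
    msep = sum(e * e for e in errors) / n_v
    bias = sum(errors) / n_v
    return msep, bias

def percent_bias(msep, bias_sq):
    """Share of MSEP attributable to squared bias, in percent."""
    return 100 * bias_sq / msep

# Figures taken from the slide.
slide_percent = percent_bias(0.7578, 0.0005182738)

# Hypothetical toy data, for illustration only.
toy_msep, toy_bias = msep_and_bias([10.0, 8.0, 6.0], [9.5, 8.5, 6.0])

print(f"percent bias from slide figures: {slide_percent:.4f}%")
```

A percent bias well below 1% indicates that almost all of the prediction error is random scatter rather than systematic over- or under-prediction.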

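  13. Testing Bias = 0 from Training Data to Validation
      Mean prediction error (Bias) = -0.02276563, s = 0.8778456

      $$s_{\text{mean}} = \frac{s}{\sqrt{n_V}} = \frac{0.8778456}{\sqrt{58}} = 0.1152668$$

      $$t = \frac{\text{Bias}}{s_{\text{mean}}} = \frac{-0.02276563}{0.1152668} = -0.1975$$

      No evidence of systematic bias for the samples.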
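
The test of zero bias is a one-sample t-test on the validation prediction errors. A minimal sketch using the mean error, standard deviation, and validation sample size given on this slide (the variable names are illustrative): dividing the mean error by its standard error gives a t statistic of about -0.1975.

```python
import math

mean_error = -0.02276563   # mean prediction error in the validation sample
s = 0.8778456              # standard deviation of the prediction errors
n_v = 58                   # validation sample size

std_error = s / math.sqrt(n_v)    # standard error of the mean error
t_stat = mean_error / std_error   # one-sample t statistic for H0: bias = 0

print(f"standard error = {std_error:.7f}, t = {t_stat:.4f}")
```

With 57 degrees of freedom the two-sided 5% critical value is roughly 2, so a t statistic this close to zero gives no evidence of systematic bias.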

  14. PRESS Statistic

      $$\hat{Y}_{i(i)} = \hat{\beta}_{0(i)} + \hat{\beta}_{1(i)} X_{i1} + \cdots + \hat{\beta}_{p(i)} X_{ip}$$

      where the regression was fit without case $i$.

      $$\text{PRESS} = \sum_{i=1}^{n}\left(Y_i - \hat{Y}_{i(i)}\right)^2$$

      Compare PRESS/n with MS(Residual) for the full model.

      Note:

      $$Y_i - \hat{Y}_{i(i)} = \frac{Y_i - \hat{Y}_i}{1 - P_{ii}}$$

      where $P_{ii}$ is the $i$-th diagonal element of $P = X(X'X)^{-1}X'$.

      PRESS/n = 0.9801, MS(Residual) = 0.96

      Model appears to be valid: very little difference between PRESS/n and MS(Residual).
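
The note on this slide means PRESS does not require n separate refits: each deleted residual equals the ordinary residual divided by one minus the corresponding diagonal element of the projection matrix. The sketch below checks that identity numerically on synthetic data (hypothetical values, not the cruise-ship data) by computing PRESS both ways.

```python
import numpy as np

# Synthetic data (hypothetical): intercept plus three predictors.
rng = np.random.default_rng(0)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=n)

# Shortcut: ordinary residuals scaled by 1 / (1 - P_ii).
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
P = X @ np.linalg.inv(X.T @ X) @ X.T            # projection ("hat") matrix
press_shortcut = float(np.sum((resid / (1 - np.diag(P))) ** 2))

# Brute force: refit the model n times, leaving out one case each time.
press_refit = 0.0
for i in range(n):
    keep = np.arange(n) != i
    b_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    press_refit += float((y[i] - X[i] @ b_i) ** 2)

sse = float(resid @ resid)
print(press_shortcut, press_refit, sse)
```

PRESS is never smaller than the ordinary residual sum of squares, because every residual is divided by a factor between 0 and 1.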

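  15. K-fold Cross-Validation Results (k=10)

      Fold 1

      | ObsNum | Predicted | CV_Pred | Crew   | CV_Res |
      |--------|-----------|---------|--------|--------|
      | 33     | 0.611     | 0.625   | 0.600  | -0.025 |
      | 39     | 11.081    | 11.131  | 10.680 | -0.451 |
      | 51     | 8.670     | 8.670   | 9.450  | 0.780  |
      | 54     | 9.240     | 9.240   | 8.000  | -1.240 |
      | 80     | 4.649     | 4.657   | 4.380  | -0.277 |
      | 84     | 8.750     | 8.750   | 13.000 | 4.250  |
      | 87     | 7.300     | 7.280   | 6.140  | -1.140 |
      | 98     | 11.240    | 11.303  | 11.000 | -0.303 |
      | 107    | 3.725     | 3.716   | 3.730  | 0.014  |
      | 116    | 4.567     | 4.574   | 4.450  | -0.124 |
      | 117    | 3.149     | 3.160   | 3.240  | 0.080  |
      | 126    | 8.820     | 8.810   | 7.600  | -1.210 |
      | 127    | 15.140    | 15.240  | 13.600 | -1.640 |
      | 151    | 3.590     | 3.570   | 4.700  | 1.130  |
      | 158    | 2.647     | 2.602   | 1.800  | -0.802 |

      Fold 1: SSE = 27.976, # obs = 15, MSE(CV) = 1.865

      Fold 10

      | ObsNum | Predicted | CV_Pred | Crew   | CV_Res |
      |--------|-----------|---------|--------|--------|
      | 16     | 12.060    | 12.043  | 11.600 | -0.443 |
      | 29     | 8.466     | 8.459   | 9.090  | 0.631  |
      | 38     | 7.060     | 7.069   | 6.360  | -0.709 |
      | 44     | 10.958    | 10.926  | 10.900 | -0.026 |
      | 45     | 8.430     | 8.430   | 7.660  | -0.770 |
      | 46     | 6.163     | 6.151   | 6.360  | 0.209  |
      | 56     | 5.515     | 5.535   | 5.300  | -0.235 |
      | 64     | 6.902     | 6.909   | 6.120  | -0.789 |
      | 65     | 6.830     | 6.830   | 5.310  | -1.520 |
      | 68     | 12.719    | 12.674  | 13.130 | 0.456  |
      | 92     | 9.118     | 9.106   | 8.690  | -0.416 |
      | 110    | 11.370    | 11.350  | 12.380 | 1.030  |
      | 145    | 1.605     | 1.629   | 1.600  | -0.029 |
      | 147    | 2.925     | 2.939   | 2.950  | 0.012  |
      | 155    | 8.640     | 8.630   | 12.000 | 3.370  |

      Fold 10: SSE = 17.522, # obs = 15, MSE(CV) = 1.168

      Summary by Fold

      | Fold  | n(Fold) | SSE(Fold) | MS(Fold) |
      |-------|---------|-----------|----------|
      | 1     | 15      | 27.980    | 1.865    |
      | 2     | 16      | 60.200    | 3.763    |
      | 3     | 16      | 6.620     | 0.414    |
      | 4     | 16      | 6.660     | 0.416    |
      | 5     | 16      | 9.780     | 0.611    |
      | 6     | 16      | 7.790     | 0.487    |
      | 7     | 16      | 8.530     | 0.533    |
      | 8     | 16      | 6.120     | 0.383    |
      | 9     | 16      | 2.350     | 0.147    |
      | 10    | 15      | 17.520    | 1.168    |
      | Total | 158     | 154       | 0.972    |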
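
The overall cross-validated MSE is the pooled sum of squared CV residuals divided by the total number of cases, not the simple average of the per-fold mean squares. A short sketch using the fold sizes and SSE values reported on this slide (the variable names are illustrative):

```python
# Fold sizes and fold-level SSE values as reported in the summary table.
fold_n = [15, 16, 16, 16, 16, 16, 16, 16, 16, 15]
fold_sse = [27.980, 60.200, 6.620, 6.660, 9.780, 7.790, 8.530, 6.120, 2.350, 17.520]

fold_ms = [sse / size for sse, size in zip(fold_sse, fold_n)]   # MS(Fold)
total_n = sum(fold_n)
total_sse = sum(fold_sse)
mse_cv = total_sse / total_n    # pooled cross-validated MSE

print(f"total n = {total_n}, total SSE = {total_sse:.2f}, MSE(CV) = {mse_cv:.3f}")
```

The pooled value is close to both PRESS/n and MS(Residual) from the earlier slides, even though individual folds (fold 2 in particular) vary widely.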
