Utilizing Machine Learning for Conversion and Bounce Analysis

Machine learning techniques such as deep learning and random forests are used to analyze the drivers of bounce and conversion rates, as presented at Velocity 2016 New York. The process involves vectorizing and balancing the data, smoothing it for optimal performance, and validating on a separate dataset to prevent overfitting.



Presentation Transcript


  1. Using machine learning to determine drivers of bounce and conversion (part 2) — Velocity 2016 New York

  2. Tammy Everts @tameverts, Pat Meenan @patmeenan

  3. What we did (and why we did it)

  4. Get the code: https://github.com/WPO-Foundation/beacon-ml

  5. Deep learning: weights

  6. Random forest: lots of random decision trees

  7. Vectorizing the data: Everything needs to be numeric. Strings are converted to several yes/no (1/0) inputs, e.g. device manufacturer: Apple would be a discrete input. Watch out for input explosion (UA string).
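
A minimal sketch of this vectorization step, assuming a pandas DataFrame with an illustrative device_manufacturer column (not the talk's actual code):

import pandas as pd

# Hypothetical beacon rows; 'device_manufacturer' is an illustrative string column.
df = pd.DataFrame({'device_manufacturer': ['Apple', 'Samsung', 'Apple', 'LG'],
                   'page_load_ms': [1200, 3400, 900, 2100]})

# One-hot encode the string column: each manufacturer becomes its own 1/0 input.
vectorized = pd.get_dummies(df, columns=['device_manufacturer'])
print(vectorized.columns.tolist())

# A raw UA string would explode into thousands of such columns, so parse it
# into a few coarse fields (browser, OS, mobile) before encoding.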

  8. Balancing the data: With a 3% conversion rate, a model can be 97% accurate by always guessing "no". Subsample the data for a 50/50 mix.
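
A rough sketch of that 50/50 subsampling, assuming numpy arrays for the features and labels (variable names are illustrative, not the talk's code):

import numpy as np

def balance_50_50(x, y, seed=0):
    """Downsample the majority class so converted and non-converted rows are 50/50."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y == 1)          # e.g. converted sessions (~3%)
    neg = np.flatnonzero(y == 0)          # non-converted sessions (~97%)
    neg_sample = rng.choice(neg, size=len(pos), replace=False)
    keep = rng.permutation(np.concatenate([pos, neg_sample]))
    return x[keep], y[keep]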

  9. Smoothing the data: ML works best on normally distributed data.
     from sklearn.preprocessing import StandardScaler
     scaler = StandardScaler()
     x_train = scaler.fit_transform(x_train)
     x_val = scaler.transform(x_val)

  10. Validation data: Train on 80% of the data, validate on 20% to prevent overfitting. Reported accuracy comes from the validation set, not the training set.
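
One way to get that 80/20 split with scikit-learn; this is a sketch with placeholder data, and the stratify argument is an addition to keep the class mix identical in both sets:

import numpy as np
from sklearn.model_selection import train_test_split

x = np.random.rand(1000, 93)               # placeholder feature matrix (93 inputs)
y = np.random.randint(0, 2, size=1000)     # placeholder bounce/convert labels

# Hold out 20% for validation and report accuracy on the held-out rows only.
x_train, x_val, y_train, y_val = train_test_split(
    x, y, test_size=0.2, stratify=y, random_state=42)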

  11. Input/output relationships: SSL is highly correlated with conversions; long sessions are highly correlated with not bouncing. Remove such correlated features from training.
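
One possible way to spot and drop such trivially correlated inputs before training, assuming an all-numeric (post-vectorization) pandas DataFrame; the 0.9 threshold is an arbitrary illustration:

import pandas as pd

def drop_leaky_features(df, target_col, threshold=0.9):
    """Drop feature columns whose absolute correlation with the label exceeds the threshold."""
    corr = df.corr()[target_col].abs()
    leaky = [col for col in corr.index
             if col != target_col and corr[col] > threshold]
    return df.drop(columns=leaky), leaky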

  12. Training random forest:
     from sklearn.ensemble import RandomForestClassifier
     clf = RandomForestClassifier(n_estimators=FOREST_SIZE, criterion='gini',
         max_depth=None, min_samples_split=2, min_samples_leaf=1,
         min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None,
         bootstrap=True, oob_score=False, n_jobs=12, random_state=None,
         verbose=2, warm_start=False, class_weight=None)
     clf.fit(x_train, y_train)

  13. Feature importances: clf.feature_importances_
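
To make the raw importance array readable, pair it with the input names and sort; a small sketch assuming clf is the fitted forest from the previous slide and feature_names is a list matching the columns of x_train (not shown in the talk's code):

import numpy as np

importances = clf.feature_importances_
order = np.argsort(importances)[::-1]                 # highest importance first
for rank, idx in enumerate(order[:10], start=1):
    print(f"{rank:2d}. {feature_names[idx]}  {importances[idx]:.4f}")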

  14. Training deep learning:
     from keras.models import Sequential
     model = Sequential()
     model.add(...)
     model.compile(optimizer='adagrad', loss='binary_crossentropy', metrics=["accuracy"])
     model.fit(x_train, y_train, nb_epoch=EPOCH_COUNT, batch_size=32,
         validation_data=(x_val, y_val), verbose=2, shuffle=True)

  15. Understanding deep learning

  16. Brute force FTW: 93 input features. Train 93 models with 1 input each, measuring the prediction accuracy of each. Train 92 models with 2 inputs (the top feature from the first round plus one remaining feature), measuring combined prediction accuracy. Lather, rinse, repeat.
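
A compressed sketch of that greedy, brute-force loop; it uses a small random forest purely for speed of illustration (the talk ran this with its deep-learning models), and variable names are illustrative:

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def greedy_feature_search(x_train, y_train, x_val, y_val, n_rounds=5):
    """Greedily grow a feature set, one feature per round, by validation accuracy."""
    remaining = list(range(x_train.shape[1]))
    selected = []
    for _ in range(n_rounds):
        scores = {}
        for feat in remaining:
            cols = selected + [feat]
            clf = RandomForestClassifier(n_estimators=50, n_jobs=-1, random_state=0)
            clf.fit(x_train[:, cols], y_train)
            scores[feat] = accuracy_score(y_val, clf.predict(x_val[:, cols]))
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
        print(f"round {len(selected)}: added feature {best}, accuracy {scores[best]:.3f}")
    return selected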

  17. Visualizing the model: Take the trained model (X inputs). Vary inputs from 100 ms to 20 seconds in 100 ms intervals. Apply the data smoothing from the training set. Call model.predict_proba.
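
A sketch of that sweep for a single timing feature, assuming a scikit-learn-style classifier with predict_proba, the StandardScaler fitted earlier, and a hypothetical baseline_row representing a typical session (none of these names are from the talk's code):

import numpy as np

def sweep_timer(model, scaler, baseline_row, timer_idx):
    """Vary one timer from 100 ms to 20 s in 100 ms steps and get P(convert) at each value."""
    times_ms = np.arange(100, 20001, 100)
    rows = np.tile(baseline_row, (len(times_ms), 1))   # copies of a typical session
    rows[:, timer_idx] = times_ms                       # overwrite just the one timer
    rows = scaler.transform(rows)                       # same smoothing as training
    return times_ms, model.predict_proba(rows)[:, 1]    # probability of the positive class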

  18. What we learned

  19. What's in our beacon? Top-level: domain, timestamp, SSL. Session: start time, length (in pages), total load time. User agent: browser, OS, mobile ISP. Geo: country, city, organization, ISP, network speed. Bandwidth. Timers: base, custom, user-defined. Custom metrics. HTTP headers. https://docs.soasta.com/whatsinbeacon/

  20. Finding 1: Maybe everything doesn't matter after all

  21. Bounce rate

  22. Finding 2: DOM ready (aka DOM content loaded) and average session load time were the best indicators of bounce rate

  23. Up to 89.5% accuracy

  24. Finding 3: When it came to getting high predictability, conversion data was tougher than bounce data

  25. 81% prediction accuracy was as high as we got

  26. Finding 4: Pages with more scripts were less likely to convert

  27. Finding 5: The number of DOM elements matters a lot

  28. Finding 6: Mobile-related measurements weren't meaningful predictors of conversions

  29. Finding 7: Some conventional metrics were not as important as we thought

  30. Importance (bounce) by feature: Start render ranked 69, versus an expected ~top 3

  31. Things to watch out for (other than dangling prepositions)

  32. Yep, checkout pages are SLOW

  33. Takeaways

  34. 1. YMMV 2. Do try this at home 3. Gather your RUM data (lots of it) 4. Run the machine learning against it 5. If you get unexpected results, keep digging

  35. Thanks! @patmeenan @tameverts
