Algorithms for Big Data: Special Topics on CSCE 689

Algorithms for Big Data: Special Topics on CSCE 689
Slide Note
Embed
Share

This course covers advanced algorithms for handling big data, focusing on topics such as Euclidean space, CountSketch, ?2 estimation, moment estimation, Johnson-Lindenstrauss lemma, and algorithm implementation. Dive into the intricacies of Euclidean space, distance functions, heavy-hitters problems, frequency estimation, and more. Explore how algorithms can efficiently process and analyze large datasets, providing approximations and solutions for complex data challenges. Gain insights into algorithmic techniques for optimizing data analysis and understanding the nuances of statistical approximations in the context of big data applications.

  • Algorithms
  • Big Data
  • CSCE 689
  • Data Analysis
  • Statistical Approximations

Uploaded on Mar 04, 2025 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. CSCE 689: Special Topics on Algorithms for Big Data October 2, 2023

  2. Recall: Euclidean Space and ?2Norm For ? ??, the ?2norm of ? is denoted by ?2and defined as: 2+ ?2 2+ + ?? 2 ?2= ?1 For ?,? ??, the distance function ? is denoted by 2 and defined as ? ?2

  3. Recall: CountSketch Summary CountSketch solves the ?2 heavy-hitters problem: Given a set ? of ? elements from [?] that induces a frequency vector ? ?? and a threshold parameter ? (0,1), output a list that includes: The items from [?] that have frequency at least ? ?2 No items with frequency less than ? 2 ?2 1 ?2log2? bits of space Space usage: ?

  4. ?2 Estimation Goal: Given a set ? of ? elements from [?] that induces a frequency vector ? ?? and an accuracy parameter ? (0,1), output a (1 + ?)-approximation to ?2 Find ? such that 1 ? ?2 ? 1 + ? ?2 2 ? 1 + ? ?2 2 Find ? such that 1 ? ?2

  5. ?2 Moment Estimation 2 ? 1 + ? ?2 2 Goal: Find ? such that 1 ? ?2

  6. ?2 Moment Estimation 2 ? 1 + ? ?2 2 Goal: Find ? such that 1 ? ?2 1 7 7 7 3 7 7 1 4 1 1 1 1 5 1 1 7 1 7 5 1 7 7

  7. ?2 Moment Estimation 2 ? 1 + ? ?2 2 Goal: Find ? such that 1 ? ?2 1 7 7 7 3 7 7 1 4 1 1 1 1 5 1 1 7 1 7 5 1 7 7 ?1 ?2 ?3 ?4 ?5 ?6 ?7 10 0 1 1 2 0 9

  8. Johnson-Lindenstrauss Lemma Distributional Johnson-Lindenstrauss Lemma: Given ?? ? with ? = ? ?2 and each entry drawn from ? ?? and setting ? = ?, then with probability at least 1 ? log 1/? 1 ?? 0,1 , then for any 1 ? ?2 ?2 1 + ? ?2

  9. ?2 Moment Estimation log 1/? ?2 Algorithm: Generate ?? ? with ? = ? 1 ?? 0,1 . Set ? = ? and each entry drawn from Whenever there is an update to a coordinate of ?, update ?

  10. ?2 Moment Estimation log 1/? ?2 Algorithm: Generate ?? ? with ? = ? 1 ?? 0,1 . Set ? = ? 1 7 7 7 3 7 7 1 4 1 1 1 1 5 1 1 7 1 7 5 1 7 7 and each entry drawn from Whenever there is an update to a coordinate of ?, update ?

  11. ?2 Moment Estimation log 1/? ?2 Algorithm: Generate ?? ? with ? = ? 1 ?? 0,1 . Set ? = ? 1 7 7 7 3 7 7 1 4 1 1 1 1 5 1 1 7 1 7 5 1 7 7 and each entry drawn from Whenever there is an update to a coordinate of ?, update ? ? = ? + ?1 ? = ? + ?7 ? = ? + ?7

  12. ?2 Moment Estimation log 1/? ?2 Algorithm: Generate ?? ? with ? = ? 1 ?? 0,1 . Set ? = ? 1 7 7 7 3 7 7 1 4 1 1 1 1 5 1 1 7 1 7 5 1 7 7 and each entry drawn from Whenever there is an update to a coordinate of ?, update ? ? = ? + ?1, ? = ? + ?1 ? = ? + ?7, ? = ? + ?7 ? = ? + ?7, ? = ? + ?7

  13. AMS Algorithm Generate a random sign vector ? 1,+1? Maintain ? = ?,? Output ? ?2

  14. 1

  15. 1

  16. 2

  17. 1

  18. 2

  19. 1

  20. 1

  21. 1

  22. 2

  23. 1

  24. 1

  25. 2

  26. 2

  27. 2

  28. 1

  29. AMS Algorithm What values of ? did you get? ? = ?,? = ?1?1+ ?2?2+ + ???? What values of ?did you get? ? = ?2= ?,?????????

  30. AMS Algorithm What values of ?did you get? ? = ?2= ?,????????? ?1 ?2 ?3 ?4 ?5 ?6 ?7 9 6 0 0 0 0 0

  31. AMS Algorithm What is E ? ? ? = ?,? = ?1?1+ ?2?2+ + ???? ? = ?2= ?,????????? E ? = ?,?E ???????? = ?E ??2= ?2 2

  32. AMS Algorithm What is Var ? ? ? = ?,? = ?1?1+ ?2?2+ + ???? ?2= ?4= ?,?,?,????????????????? E ?2= ?,?,?,?E ???????????????? = ? E ??4+ 6 ? ?E ??2?? 6 ?2 2 4

  33. AMS Algorithm 2 with By Chebyshev s inequality, ? will be a 9-approximation to ?2 probability 2 3

  34. AMS Algorithm How to get (1 + ?)-approximation? 1 ?2 times and take the average Repeat ?

  35. AMS Algorithm 1 ?2 words of space or ? 1 ?2log? bits of space Space of algorithm: ?

Related


More Related Content