Challenges and Tools for Processing Streaming Data

Explore the challenges of streaming data processing, including consistency, throughput vs. latency trade-offs, and the complications that time introduces. Discover tools like windowing and watermarks used to process unbounded data streams efficiently.

  • Streaming
  • Data Processing
  • Challenges
  • Tools
  • Windowing




Presentation Transcript


  1. Streaming. COS 518: Advanced Computer Systems, Lecture 11. Daniel Suo

  2. What is streaming? Fast data! Fast processing! Lots of data!

  3. Streaming = unbounded data. (Batch = bounded data.)

  4. Other definitions are somewhat misleading. We can use batch frameworks for stream processing (how?). Batch frameworks can also handle scenarios historically covered by stream frameworks (e.g., low latency, approximate results).

  5. Three major challenges. Consistency: historically, streaming systems were created to decrease latency and made many sacrifices (at-most-once processing, anyone?). Throughput vs. latency: typically a trade-off (why?). Time: as we will soon see, streaming introduces some new challenges. We've covered consistency in a lot of detail, so let's investigate time.
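The throughput vs. latency trade-off can be made concrete with a small sketch (the helper name and shape are illustrative, not from the lecture): batching items before processing amortizes per-batch overhead, which raises throughput, but the first item in each batch must wait for the batch to fill, which raises latency.

```python
def micro_batches(stream, batch_size):
    """Group a stream into fixed-size batches. Larger batches amortize
    per-batch overhead (higher throughput), but early items wait longer
    for their batch to fill before being processed (higher latency)."""
    batch = []
    for item in stream:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly partial, batch
        yield batch
```

With `batch_size=1` we get minimal latency; with a large `batch_size` we approach a batch framework, which is one reason the batch/stream boundary is blurrier than it looks.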

  6. Our lives used to be easy

  7. ...but if you give a data scientist some data... Once we move to unbounded data, we need new methods to process it, whether for the sake of capacity (not enough machines) or availability (the data doesn't exist yet). The easiest thing to do is to window by processing time.

  8. Windowing by processing time is great. Easy to implement and to verify correctness. Great for applications like filtering or monitoring.
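A minimal sketch of processing-time tumbling windows (function names are illustrative, not from any framework): each item is assigned to a window based solely on its wall-clock arrival time, so windows fill in order and can be closed deterministically.

```python
def assign_window(arrival_ts, window_secs=60):
    # Processing-time windowing: the window depends only on WHEN the
    # item arrived (a wall-clock timestamp), never on any timestamp
    # the item itself carries.
    return (arrival_ts // window_secs) * window_secs

def tumbling_counts(arrival_times, window_secs=60):
    """Count items per processing-time window."""
    counts = {}
    for ts in arrival_times:
        w = assign_window(ts, window_secs)
        counts[w] = counts.get(w, 0) + 1
    return counts
```

Because arrival times are monotonically non-decreasing, once a new window opens the previous one is complete; this is exactly the property we lose in the next slide.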

  9. But what if we care about when events happen? If we associate event times, then items could now arrive out of order! (why?)
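To see why out-of-order arrival matters, consider a toy example (data and names are illustrative): records carry event timestamps, but network delays and retries mean the arrival order need not match the event order, so a window that already "looked" complete can still receive data.

```python
def event_time_windows(records, window=5):
    """Group (event_time, value) records by event-time window.
    Arrival order is the list order, which may differ from event order."""
    wins = {}
    for et, val in records:
        w = (et // window) * window  # window [w, w + window)
        wins.setdefault(w, []).append(val)
    return wins

# The event at t=1 arrives last, after window [0, 5) already appeared
# finished to anyone processing records in arrival order.
records = [(3, "a"), (7, "b"), (1, "c")]
```

Here `event_time_windows(records)` places `"c"` into window `[0, 5)` even though it arrives after a record from window `[5, 10)`, which is precisely the problem the tools on the next slide address.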

  10. Time creates new wounds

  11. This would be nice

  12. But not the case, so we need tools. Windows: how should we group together data? Watermarks: how can we mark when the last piece of data in some window has arrived? Triggers: how can we initiate an early result? Accumulators: what do we do with the results (correct, modified, or retracted)? All topics covered in next week's readings!
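As a preview, here is a hedged sketch of a heuristic watermark (the names and the `max_delay` heuristic are assumptions for illustration, not the readings' definitions): a window's result is emitted once the watermark, taken here as the largest event time seen minus an allowed delay, passes the window's end; anything arriving later than that is dropped.

```python
def run_with_watermark(records, window=5, max_delay=2):
    """Process (event_time, value) records; emit a window's count once
    the heuristic watermark passes the window's end."""
    buffers, emitted = {}, []
    watermark = float("-inf")
    for et, val in records:
        w = (et // window) * window
        if w + window <= watermark:
            # The watermark already passed this window's end: the
            # record is late and, in this simple sketch, dropped.
            continue
        buffers.setdefault(w, []).append(val)
        # Heuristic: assume no record lags the largest event time
        # seen so far by more than max_delay.
        watermark = max(watermark, et - max_delay)
        for ws in sorted(buffers):
            if ws + window <= watermark:
                emitted.append((ws, len(buffers.pop(ws))))
    return emitted, buffers
```

Note the sacrifice this heuristic makes: a record more than `max_delay` behind the frontier is silently lost, which is why real systems pair watermarks with triggers and accumulation modes to revise results instead of dropping data.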
