Advanced Data Solutions for Uber and Beyond


An overview of data management at Uber built around Hive. It covers the specialty services hDrone (data registration) and Janus (unified query execution), along with features the team expects from Hive: transaction support, geo-query capabilities, automatic tuning, and more.

  • Data Solutions
  • Uber
  • Advanced
  • Hive
  • Janus


Presentation Transcript


  1. Hive @ Uber (Mohammad Islam, DATA)

  2. Data @ Uber (architecture diagram; labels: Kafka, Ingestion Layer, Sharded MySQL, HDFS, DB)

  3. Data @ Uber: Specialty in Uber data
       • Out-of-order data arrival
       • Duplicate records (machine failure/replay)
       • Highly nested structure
       • Geo information
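The first two properties on this slide, out-of-order arrival and replayed duplicates, can be illustrated with a minimal sketch. It assumes each record carries a unique producer-assigned `event_id` and its own `event_time`; both field names and the `Event` type are hypothetical and do not come from the presentation.

```python
from typing import Dict, Iterable, List, NamedTuple


class Event(NamedTuple):
    event_id: str    # hypothetical unique key assigned by the producer
    event_time: int  # hypothetical event timestamp (epoch seconds)
    payload: str


def dedupe_and_order(events: Iterable[Event]) -> List[Event]:
    """Drop replayed duplicates, then restore event-time order."""
    first_seen: Dict[str, Event] = {}
    for event in events:
        # A replay after a machine failure re-sends the same event_id;
        # keep only the first copy that arrived.
        first_seen.setdefault(event.event_id, event)
    # Arrival order is not event order, so sort by the event's own timestamp.
    return sorted(first_seen.values(), key=lambda e: (e.event_time, e.event_id))


arrived = [
    Event("b", 20, "trip started"),
    Event("a", 10, "trip requested"),  # arrived late
    Event("b", 20, "trip started"),    # replayed duplicate
]
cleaned = dedupe_and_order(arrived)
```

Here `cleaned` holds two events, with the late-arriving "a" placed before "b". Holding every key in memory only works for small batches; it is meant to show the idea, not a production approach.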

  4. hDrone: Data registration service
       • Registration includes: create new table, add a new partition, schema evolution, registration backfill
       • Pros: central control; the data producer does not need to handle the details
       • Cons: yet another service to manage
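The registration steps listed on this slide correspond to ordinary HiveQL DDL statements. The sketch below only builds those statements as strings; it is not hDrone's code, and the function names, the `rides.trips` table, the `datestr` partition column, the paths, and the Parquet default are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Column = Tuple[str, str]  # (column name, Hive type)


def create_table_ddl(table: str, columns: List[Column],
                     partition_cols: List[Column], location: str,
                     file_format: str = "PARQUET") -> str:
    """Create new table: an external, partitioned table over existing files."""
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    parts = ", ".join(f"{name} {hive_type}" for name, hive_type in partition_cols)
    return (f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} ({cols}) "
            f"PARTITIONED BY ({parts}) STORED AS {file_format} "
            f"LOCATION '{location}'")


def add_partition_ddl(table: str, spec: Dict[str, str], location: str) -> str:
    """Add a new partition that points at a directory of newly landed data."""
    spec_sql = ", ".join(f"{key}='{value}'" for key, value in spec.items())
    return (f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION ({spec_sql}) LOCATION '{location}'")


def add_columns_ddl(table: str, new_columns: List[Column]) -> str:
    """Schema evolution: append columns to an existing table."""
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in new_columns)
    return f"ALTER TABLE {table} ADD COLUMNS ({cols})"


def backfill_ddl(table: str, base_path: str, dates: List[str]) -> List[str]:
    """Registration backfill: register partitions that were never registered."""
    return [add_partition_ddl(table, {"datestr": d}, f"{base_path}/{d}")
            for d in dates]


# Hypothetical table and paths, for illustration only.
create_stmt = create_table_ddl(
    "rides.trips", [("trip_id", "STRING"), ("fare", "DOUBLE")],
    [("datestr", "STRING")], "/data/trips")
backfill_stmts = backfill_ddl("rides.trips", "/data/trips",
                              ["2015-06-01", "2015-06-02"])
```

Because every statement uses `IF NOT EXISTS` where Hive allows it, re-running a backfill is harmless, which suits a central service that may retry.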

  5. hDrone: Data registration service (architecture diagram; labels: hDrone, INotify, Hive Registration Task, ThreadPool, HDFS, Hive, catchUp)

  6. Janus: Unified query execution service

  7. Expected Feature: Transaction
       • Hive transaction support
       • Update/delete/insert
       • Required for incremental ingestion
       • Issue: only ORC supports it!
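To see why incremental ingestion needs row-level insert, update, and delete, consider applying a batch of upstream changes to an existing snapshot. The sketch below works on an in-memory dictionary; it is not how Hive implements transactions, and the `Change` tuple layout and the `apply_changes` name are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (operation, primary key, row); row is None for deletes. Illustrative layout.
Change = Tuple[str, int, Optional[dict]]


def apply_changes(snapshot: Dict[int, dict],
                  changes: List[Change]) -> Dict[int, dict]:
    """Return a new snapshot with the changes applied in order."""
    result = dict(snapshot)  # leave the caller's snapshot untouched
    for op, key, row in changes:
        if op in ("insert", "update"):
            result[key] = row
        elif op == "delete":
            result.pop(key, None)  # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown operation: {op}")
    return result


snapshot = {1: {"city": "SF", "fare": 10.0}, 2: {"city": "NYC", "fare": 12.5}}
changes = [
    ("update", 1, {"city": "SF", "fare": 11.0}),
    ("delete", 2, None),
    ("insert", 3, {"city": "LA", "fare": 9.0}),
]
merged = apply_changes(snapshot, changes)
```

Without update and delete support in the table format, the only alternative is to rewrite the whole table or partition on every ingestion run.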

  8. Expected Feature: Geo
       • Geo/spatial query support
       • Uber's business is inherently geo-aware
       • City Ops staff may not be technical (limited SQL experience)
       • The Esri library can be a good start but may need more
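A typical geo-aware question is whether a point, such as a pickup location, falls inside a region such as a city boundary. The sketch below answers it with the standard ray-casting test, the kind of containment check a spatial library would provide. It treats coordinates as planar, which is an assumption that only holds for small areas, and the names are illustrative rather than taken from any library.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar x/y


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: count how many polygon edges a rightward ray crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        # Only edges that straddle the ray's y coordinate can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing flips inside/outside
    return inside


# Hypothetical square "city boundary" for illustration.
city = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
pickup_inside = point_in_polygon((2.0, 2.0), city)
pickup_outside = point_in_polygon((5.0, 2.0), city)
```

Points that lie exactly on an edge are not handled consistently by this simple version, which is one reason a dedicated spatial library is the better choice in practice.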

  9. Hive (auto) Tuning
       • Hive has a bunch of knobs for better performance
       • They are not easy for everybody to remember
       • It would be excellent if the Hive execution/planner engine could auto-set the best configurations
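The wish on this slide can be pictured as a small rule-based helper that picks session settings from a few facts about the query. The sketch below is a toy: the property names are real Hive configuration keys, but the rules, thresholds, and values are invented for illustration and are not recommendations from the presentation.

```python
from typing import Dict, List


def suggest_settings(input_bytes: int, small_table_bytes: int) -> Dict[str, str]:
    """Pick a few Hive session settings using made-up, illustrative rules."""
    settings = {
        "hive.exec.parallel": "true",
        "hive.vectorized.execution.enabled": "true",
    }
    # Illustrative rule: allow map-join conversion only for a small join side.
    small_enough = small_table_bytes <= 25_000_000
    settings["hive.auto.convert.join"] = "true" if small_enough else "false"
    # Illustrative rule: give each reducer more data when the input is very large.
    per_reducer = 1_000_000_000 if input_bytes > 10**12 else 256_000_000
    settings["hive.exec.reducers.bytes.per.reducer"] = str(per_reducer)
    return settings


def render_set_statements(settings: Dict[str, str]) -> List[str]:
    """Turn the settings into SET statements to run before the query."""
    return [f"SET {key}={value};" for key, value in sorted(settings.items())]


statements = render_set_statements(
    suggest_settings(input_bytes=2 * 10**12, small_table_bytes=10 * 10**6))
```

A real auto-tuner would sit inside the planner and use table statistics; this version only shows the shape of the idea, where users state facts and the system chooses the knobs.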

  10. More
       • HiveServer2 (HS2) stability
       • Column-level security (for non-Hive apps)
       • Parquet performance

  11. Q & A
