
Advanced Data Solutions for Uber and Beyond
Explore the intricacies of data management at Uber, including specialty services like hDrone and Janus. Learn about transaction support, geo-query capabilities, auto-tuning in Hive, and more. Dive into a world of advanced data solutions tailored for Uber's complex needs.
Presentation Transcript
Hive @ Uber Mohammad Islam DATA
Data @ Uber
[Diagram; labels: Kafka, Ingestion Layer, Sharded MySQL, HDFS, DB]
Data @ Uber: What is special about Uber data
- Out-of-order data arrival
- Duplicate records (from machine failure or replay)
- Highly nested structure
- Geo information
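The first two points can be illustrated with a minimal sketch, assuming each record carries a producer-assigned unique id and an event timestamp. The `Event` type and its field names are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Event:
    event_id: str    # hypothetical unique id assigned by the producer
    event_time: int  # hypothetical event timestamp (epoch seconds)
    payload: str


def dedupe_and_order(events: Iterable[Event]) -> List[Event]:
    """Drop replayed duplicates, then restore event-time order."""
    seen: Dict[str, Event] = {}
    for ev in events:
        # Keep the first copy of each id; a replayed record reuses its id.
        seen.setdefault(ev.event_id, ev)
    # Arrival order is not trustworthy, so sort by event time
    # (the id breaks ties to keep the result deterministic).
    return sorted(seen.values(), key=lambda ev: (ev.event_time, ev.event_id))
```

This keeps everything in memory, which only works for a small batch; a real ingestion job would likely apply the same idea per partition or time window.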
hDrone: Data registration service
Registration includes:
- Create new table
- Add a new partition
- Schema evolution
- Registration backfill
Pros:
- Central control
- Data producer does not need to handle the details
Cons:
- Yet another service to manage
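The slides do not say which statements hDrone issues. As a rough sketch, the first three registration steps map onto standard Hive DDL, which a service could generate as below. The helper names, the external-table choice, and the Parquet storage format are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Column = Tuple[str, str]  # (column name, Hive type)


def _cols(columns: List[Column]) -> str:
    """Render a column list such as 'id string, fare double'."""
    return ", ".join(f"{name} {hive_type}" for name, hive_type in columns)


def create_table_ddl(table: str, columns: List[Column],
                     partition_cols: List[Column], location: str) -> str:
    """Create new table (assumed external, stored as Parquet)."""
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} ({_cols(columns)}) "
        f"PARTITIONED BY ({_cols(partition_cols)}) "
        f"STORED AS PARQUET LOCATION '{location}'"
    )


def add_partition_ddl(table: str, spec: Dict[str, str], location: str) -> str:
    """Add a new partition; IF NOT EXISTS makes a backfill safe to re-run."""
    spec_sql = ", ".join(f"{key}='{value}'" for key, value in spec.items())
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({spec_sql}) LOCATION '{location}'"
    )


def add_columns_ddl(table: str, new_columns: List[Column]) -> str:
    """Schema evolution, limited here to appending columns."""
    return f"ALTER TABLE {table} ADD COLUMNS ({_cols(new_columns)})"
```

For example, `add_partition_ddl("trips", {"dt": "2015-01-01"}, "/data/trips/dt=2015-01-01")` yields a single `ALTER TABLE ... ADD IF NOT EXISTS PARTITION` statement. The sketch does no quoting or escaping, so it is only suitable for trusted inputs.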
hDrone: Data registration service
[Architecture diagram; labels: hDrone, INotify, Hive Registration Task, ThreadPool, HDFS, Hive, catchUp]
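Only the component labels of this diagram survive, so the data flow is not certain. One plausible reading is that new HDFS paths become registration tasks, which a thread pool executes against Hive. The sketch below follows that reading; the function names and the `key=value` directory layout are assumptions, and the `register` callback stands in for whatever actually calls Hive.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Optional


def partition_spec_from_path(path: str) -> Optional[Dict[str, str]]:
    """Parse key=value path segments, e.g. /data/trips/dt=2015-01-01."""
    spec = dict(seg.split("=", 1) for seg in path.split("/") if "=" in seg)
    return spec or None


def register_all(paths: Iterable[str],
                 register: Callable[[Dict[str, str]], None],
                 workers: int = 4) -> List[Dict[str, str]]:
    """Turn new paths into registration tasks and run them on a thread pool."""
    specs = [s for s in map(partition_spec_from_path, paths) if s is not None]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every task and re-raises the first failure.
        list(pool.map(register, specs))
    return specs
```

Under this reading, a catch-up pass could call `register_all` again with paths listed directly from HDFS, to cover events missed while the service was down.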
Janus: Unified query execution service
Expected Feature: Transaction
- Hive transaction support
- Update/delete/insert
- Required for incremental ingestion
- Issue: only the ORC format supports it
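To show why incremental ingestion needs update and delete, here is a minimal in-memory sketch that applies a change log to a base snapshot keyed by primary key. The `(op, key, row)` change format is hypothetical. It only illustrates the semantics a transactional table would provide and is not how Hive implements them.

```python
from typing import Dict, Iterable, Optional, Tuple

Row = Dict[str, object]
Change = Tuple[str, str, Optional[Row]]  # (operation, primary key, row data)


def apply_changes(base: Dict[str, Row],
                  changes: Iterable[Change]) -> Dict[str, Row]:
    """Apply insert/update/delete changes in order; the input is not mutated."""
    result = {key: dict(row) for key, row in base.items()}
    for op, key, row in changes:
        if op == "insert":
            result[key] = dict(row or {})
        elif op == "update":
            if key in result:
                # Merge changed columns into the existing row.
                result[key].update(row or {})
        elif op == "delete":
            result.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return result
```

Without row-level update and delete, the same result requires rewriting whole partitions on every ingestion run.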
Expected Feature: Geo
- Geo/spatial query support
- Uber's business is inherently geo-aware
- City ops staff may not be technical (limited SQL experience)
- The Esri library can be a good start, but more may be needed
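A typical geo query asks whether a point, such as a pickup location, falls inside a polygon, such as a city boundary. The sketch below shows that containment test using the standard ray-casting method in plain Python. It is an illustration only and not the Esri implementation; it treats coordinates as planar and ignores points exactly on an edge.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar x, y


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: count edge crossings of a ray going right from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```

Exposing such a test as a SQL function is what lets a non-programmer filter trips by city without writing geometry code.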
Hive (auto) Tuning
- Hive has a bunch of knobs for better performance
- They are not easy for everybody to remember
- It would be excellent if the Hive execution/planner engine could set the best configuration automatically
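As a rough illustration of what auto-tuning could mean, the sketch below derives two settings from input sizes. The property names `mapreduce.job.reduces` and `hive.auto.convert.join` are real, but the heuristic and the default thresholds in the function signature are assumptions chosen for the example, not recommendations from the slides.

```python
import math
from typing import Dict

MIB = 1024 * 1024


def suggest_settings(input_bytes: int,
                     small_table_bytes: int,
                     bytes_per_reducer: int = 256 * MIB,  # illustrative
                     max_reducers: int = 1009,            # illustrative
                     map_join_threshold: int = 25 * MIB   # illustrative
                     ) -> Dict[str, str]:
    """Pick a reducer count and decide whether a map-side join is plausible."""
    reducers = math.ceil(input_bytes / bytes_per_reducer)
    reducers = max(1, min(max_reducers, reducers))
    use_map_join = small_table_bytes <= map_join_threshold
    return {
        "mapreduce.job.reduces": str(reducers),
        "hive.auto.convert.join": "true" if use_map_join else "false",
    }


def as_set_statements(settings: Dict[str, str]) -> str:
    """Render the settings as Hive SET statements, one per line."""
    return "\n".join(f"SET {key}={value};" for key, value in settings.items())
```

A planner-integrated version would have far better information (statistics, cluster load) than this two-input heuristic.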
More
- HiveServer2 (HS2) stability
- Column-level security (for non-Hive apps)
- Parquet performance