Apache Avro: Data Serialization System Overview

Avro is a data serialization system implemented in many programming languages. It provides rich data structures, a compact binary format, and simple integration with dynamic languages. The slides cover schema declaration, primitive and complex types such as records and enums, encodings, compression, and code generation.

Uploaded on Mar 10, 2025

Presentation Transcript


  1. Apache Avro. CMSC 491: Hadoop-Based Distributed Computing, Spring 2016. Adam Shook

  2. Overview: Avro is a data serialization system, implemented in C, C++, C#, Java, JavaScript, Perl, PHP, Python, and Ruby.

  3. Avro Provides: rich data structures; a compact, fast, binary data format; a container file to store persistent data; remote procedure call (RPC); and simple integration with dynamic languages.

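  4. Schema Declaration: a schema is written in JSON as one of three forms: a JSON string naming a defined type (for example "long"); a JSON object of the form {"type": "typeName", ...attributes...}; or a JSON array, representing a union of types.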
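To make the three declaration forms concrete, here is a minimal sketch that parses a declaration with Python's standard `json` module and reports which form it uses. The function name `declaration_form` is hypothetical (it is not part of any Avro library), and it only checks the outer shape, not whether the type names are valid.

```python
import json

def declaration_form(schema_text):
    """Classify a schema declaration into one of the three JSON forms."""
    schema = json.loads(schema_text)
    if isinstance(schema, str):
        # e.g. "long", or the name of a previously defined named type
        return "string"
    if isinstance(schema, dict):
        # {"type": "typeName", ...attributes...}
        if "type" not in schema:
            raise ValueError("a schema object needs a 'type' attribute")
        return "object"
    if isinstance(schema, list):
        # e.g. ["string", "null"]: a union of the listed types
        return "union"
    raise ValueError(f"not a schema declaration: {schema_text!r}")

for text in ['"long"', '{"type": "array", "items": "string"}', '["string", "null"]']:
    print(f"{text} -> {declaration_form(text)}")
```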

  5. Primitive Types: null, boolean, int, long, float, double, bytes, and string
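The Avro specification defines a binary encoding for each primitive type: null takes zero bytes, a boolean is one byte, int and long use a zig-zag variable-length encoding, float and double are little-endian IEEE 754, and bytes and string are a long length followed by the data (UTF-8 for strings). The sketch below implements those rules with only the standard library; the function names are illustrative, not the Avro library's API.

```python
import struct

def _varint(u):
    """Write a non-negative integer 7 bits at a time, low-order group first."""
    out = bytearray()
    while u > 0x7F:
        out.append((u & 0x7F) | 0x80)  # high bit set: more bytes follow
        u >>= 7
    out.append(u)
    return bytes(out)

def encode_long(n):
    if not -2**63 <= n < 2**63:
        raise OverflowError("long is a 64-bit signed integer")
    return _varint((n << 1) ^ (n >> 63))   # zig-zag: 0->0, -1->1, 1->2, -2->3

def encode_int(n):
    if not -2**31 <= n < 2**31:
        raise OverflowError("int is a 32-bit signed integer")
    return encode_long(n)   # in-range ints zig-zag to the same value as longs

def encode_null():
    return b""                       # null is written as zero bytes

def encode_boolean(b):
    return b"\x01" if b else b"\x00"

def encode_float(x):
    return struct.pack("<f", x)      # 4-byte little-endian IEEE 754

def encode_double(x):
    return struct.pack("<d", x)      # 8-byte little-endian IEEE 754

def encode_bytes(b):
    return encode_long(len(b)) + bytes(b)   # length prefix, then raw bytes

def encode_string(s):
    return encode_bytes(s.encode("utf-8"))  # UTF-8 bytes, encoded like bytes

print(encode_long(-64).hex(), encode_long(64).hex(), encode_string("foo").hex())
# prints: 7f 8001 06666f6f
```

Zig-zag encoding keeps small negative numbers as short as small positive ones, which is why -64 fits in one byte while 64 needs two.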

  6. Complex Types: records, enums, arrays, maps, unions, and fixed

  7. Record Example: LinkedList
      {
        "type": "record",
        "name": "LongList",
        "aliases": ["LinkedLongs"],                      // old name for this
        "fields": [
          {"name": "value", "type": "long"},             // each element has a long
          {"name": "next", "type": ["LongList", "null"]} // optional next element
        ]
      }
      The comments are here for descriptive purposes only; JSON itself does not allow comments.
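In Avro's binary encoding, a record is just its field values written in schema order, with no field names or separators, and a union is written as the zero-based index of the chosen branch followed by that branch's value. Applied to the LongList schema above (where "next" is ["LongList", "null"], so branch 0 is LongList and branch 1 is null), a list becomes a chain of value/branch pairs. This is a hand-written sketch with hypothetical function names, not generated or library code.

```python
def encode_long(n):
    """Zig-zag variable-length encoding used for Avro int and long."""
    u = (n << 1) ^ (n >> 63)
    out = bytearray()
    while u > 0x7F:
        out.append((u & 0x7F) | 0x80)
        u >>= 7
    out.append(u)
    return bytes(out)

def decode_long(buf, pos):
    """Read one zig-zag varint starting at pos; return (value, next_pos)."""
    shift = u = 0
    while True:
        b = buf[pos]
        pos += 1
        u |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            return (u >> 1) ^ -(u & 1), pos

def encode_long_list(values):
    if not values:
        raise ValueError("a LongList record always has at least one element")
    out = bytearray()
    for i, value in enumerate(values):
        out += encode_long(value)      # field "value": long
        if i + 1 < len(values):
            out += encode_long(0)      # field "next": branch 0 (LongList), a record follows
        else:
            out += encode_long(1)      # field "next": branch 1 (null), nothing follows
    return bytes(out)

def decode_long_list(buf):
    values, pos = [], 0
    while True:
        value, pos = decode_long(buf, pos)
        values.append(value)
        branch, pos = decode_long(buf, pos)
        if branch == 1:
            return values
        if branch != 0:
            raise ValueError(f"bad union branch index {branch}")

print(encode_long_list([1, 2]).hex())   # prints: 02000402
```

Because no field names are stored, the reader needs the schema to know that the bytes mean "value, then next".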

  8. Enum Example (Playing Cards): { "type": "enum", "name": "Suit", "symbols": ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"] }
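An enum value is encoded as an int: the zero-based position of the symbol in the schema's "symbols" list. The sketch below encodes and decodes Suit values that way; `SUIT`, `encode_enum`, and `decode_enum` are hypothetical names chosen for illustration.

```python
SUIT = ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"]   # "symbols" from the Suit schema

def encode_int(n):
    u = (n << 1) ^ (n >> 31)          # zig-zag for 32-bit ints
    out = bytearray()
    while u > 0x7F:
        out.append((u & 0x7F) | 0x80)
        u >>= 7
    out.append(u)
    return bytes(out)

def decode_int(data):
    shift = u = 0
    for byte in data:
        u |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    return (u >> 1) ^ -(u & 1)

def encode_enum(symbol, symbols=SUIT):
    # Only the symbol's position is written; list.index raises ValueError
    # if the symbol is not part of the enum.
    return encode_int(symbols.index(symbol))

def decode_enum(data, symbols=SUIT):
    return symbols[decode_int(data)]

print(encode_enum("HEARTS").hex(), encode_enum("CLUBS").hex())   # prints: 02 06
```

Since only the position is stored, a reader needs the writer's symbol list to recover the names, which the writer's schema supplies.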

  9. Array: { "type": "array", "items": "string" }

  10. Maps: { "type": "map", "values": "long" }
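Arrays and maps (slides 9 and 10) share a block structure in the binary encoding: each block starts with a long item count followed by that many items, and a block with count 0 ends the sequence. Map keys are always strings, so each map item is a string key followed by a value. The sketch below writes everything in a single block, which is a simplification; writers may split data into several blocks, and a negative count signals that a block byte size follows. Function names are hypothetical.

```python
def encode_long(n):
    u = (n << 1) ^ (n >> 63)          # zig-zag
    out = bytearray()
    while u > 0x7F:
        out.append((u & 0x7F) | 0x80)
        u >>= 7
    out.append(u)
    return bytes(out)

def encode_string(s):
    data = s.encode("utf-8")
    return encode_long(len(data)) + data

def encode_array(items, encode_item):
    """{"type": "array", "items": ...}: one block of items, then a 0 count."""
    out = bytearray()
    if items:
        out += encode_long(len(items))      # block count
        for item in items:
            out += encode_item(item)
    out += encode_long(0)                   # a zero-count block ends the array
    return bytes(out)

def encode_map(entries, encode_value):
    """{"type": "map", "values": ...}: blocks of (string key, value) pairs."""
    out = bytearray()
    if entries:
        out += encode_long(len(entries))
        for key, value in entries.items():
            out += encode_string(key) + encode_value(value)
    out += encode_long(0)
    return bytes(out)

print(encode_array(["a", "b"], encode_string).hex())   # prints: 040261026200
print(encode_map({"a": 1}, encode_long).hex())         # prints: 0202610200
```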

  11. Unions: represented using JSON arrays. ["string", "null"] declares a schema that may be either a string or null. A union may not contain more than one schema with the same type, except for the named types record, fixed, and enum. Two arrays or two maps? No. But two record types with different names? Yes! A union also cannot immediately contain another union.
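The union rules above can be checked mechanically: unnamed types are compared by type alone, named types (record, enum, fixed) by name, and a nested array is rejected outright. This is a minimal sketch; `union_problem` is a hypothetical helper, and it compares bare names without resolving namespaces into full names.

```python
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
NAMED = {"record", "enum", "fixed"}

def union_problem(branches):
    """Return a description of the first rule violation, or None if allowed."""
    seen = set()
    for branch in branches:
        if isinstance(branch, list):
            return "a union may not immediately contain another union"
        if isinstance(branch, dict):
            kind = branch["type"]
            # Named types are told apart by name; other types by type alone.
            key = ("named", branch["name"]) if kind in NAMED else kind
        elif branch in PRIMITIVES:
            key = branch
        else:
            key = ("named", branch)   # reference to a named type defined elsewhere
        if key in seen:
            return f"more than one branch of the same type: {key}"
        seen.add(key)
    return None

print(union_problem(["string", "null"]))   # prints: None
```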

  12. Fixed: { "type": "fixed", "size": 16, "name": "md5" }
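A fixed value is encoded as exactly "size" raw bytes with no length prefix, because the schema already fixes the length. The sketch below stores a 16-byte MD5 digest (from Python's `hashlib`) as the md5 fixed type and compares it with the bytes encoding, which adds a length prefix. The helper names are hypothetical.

```python
import hashlib

def encode_long(n):
    u = (n << 1) ^ (n >> 63)          # zig-zag
    out = bytearray()
    while u > 0x7F:
        out.append((u & 0x7F) | 0x80)
        u >>= 7
    out.append(u)
    return bytes(out)

def encode_fixed(value, size=16):
    """fixed: exactly `size` raw bytes, with no length prefix."""
    if len(value) != size:
        raise ValueError(f"fixed value must be {size} bytes, got {len(value)}")
    return bytes(value)

def encode_bytes(value):
    """bytes, for comparison: a long length prefix, then the data."""
    return encode_long(len(value)) + bytes(value)

digest = hashlib.md5(b"hello, avro").digest()   # an MD5 digest is always 16 bytes
print(len(encode_fixed(digest)), len(encode_bytes(digest)))   # prints: 16 17
```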

  13. A bit on naming: records, enums, and fixed types are all named. The full name is composed of the name and a namespace. Names start with [A-Za-z_] and can contain only [A-Za-z0-9_]. A namespace is a dot-separated sequence of such names. Named types can be aliased to map a writer's schema to a reader's.
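The naming rules translate directly into a regular expression plus a small full-name helper. In this sketch, an empty string stands for "no namespace", and a name that already contains dots is treated as a full name; the helper names are hypothetical, and alias handling is left out.

```python
import re

# First character from [A-Za-z_], the rest from [A-Za-z0-9_]
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_valid_name(name):
    return NAME_RE.fullmatch(name) is not None

def is_valid_namespace(namespace):
    # Empty means "no namespace"; otherwise every dot-separated part
    # must itself be a valid name.
    return namespace == "" or all(is_valid_name(part) for part in namespace.split("."))

def full_name(name, namespace=""):
    if "." in name:          # already a full name: the namespace is ignored
        return name
    return f"{namespace}.{name}" if namespace else name

print(full_name("User", "example.avro"))   # prints: example.avro.User
```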

  14. Encodings! Avro has two: binary and JSON. Binary is compact and efficient for machines; JSON is easier for humans to read. Details of how each is encoded can be found at http://avro.apache.org/docs/current/spec.html
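To see the trade-off, the sketch below encodes one record of the User schema from slide 17 both ways. In the binary encoding, fields are written in order with unions as branch index plus value; in the JSON encoding, a non-null union value is wrapped as {"typeName": value} and null stays null. The record contents are hypothetical, and the helpers are hand-written for this one schema rather than a general encoder.

```python
import json

def encode_long(n):
    u = (n << 1) ^ (n >> 63)          # zig-zag
    out = bytearray()
    while u > 0x7F:
        out.append((u & 0x7F) | 0x80)
        u >>= 7
    out.append(u)
    return bytes(out)

def encode_string(s):
    data = s.encode("utf-8")
    return encode_long(len(data)) + data

def user_to_binary(user):
    """Binary: field values in schema order, with no names or delimiters."""
    out = bytearray(encode_string(user["name"]))
    # favorite_number: union ["int", "null"] -> branch index, then value
    if user["favorite_number"] is None:
        out += encode_long(1)
    else:
        out += encode_long(0) + encode_long(user["favorite_number"])
    # favorite_color: union ["string", "null"]
    if user["favorite_color"] is None:
        out += encode_long(1)
    else:
        out += encode_long(0) + encode_string(user["favorite_color"])
    return bytes(out)

def user_to_json(user):
    """JSON: non-null union values are wrapped as {"<type name>": value}."""
    def wrap(value, type_name):
        return None if value is None else {type_name: value}
    return json.dumps({
        "name": user["name"],
        "favorite_number": wrap(user["favorite_number"], "int"),
        "favorite_color": wrap(user["favorite_color"], "string"),
    })

user = {"name": "Alyssa", "favorite_number": 256, "favorite_color": None}  # hypothetical record
print(user_to_binary(user).hex())   # prints: 0c416c7973736100800402
print(user_to_json(user))
# prints: {"name": "Alyssa", "favorite_number": {"int": 256}, "favorite_color": null}
```

The binary form is 11 bytes; the JSON form is several times longer but readable without the schema.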

  15. Compression: container files support the null (no compression), deflate, and snappy (optional) codecs.
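According to the Avro specification, the deflate codec writes each data block as a raw RFC 1951 deflate stream (not the zlib format of RFC 1950, so there is no zlib header or checksum). Python's `zlib` module produces that format when given a negative `wbits` value. This sketch compresses only a block's payload; the surrounding container-file framing (object counts, block sizes, sync markers) is not shown, and snappy is omitted because it is not in the standard library. The function names are hypothetical.

```python
import zlib

def compress_block(data, codec="null"):
    if codec == "null":
        return data                                   # stored as-is
    if codec == "deflate":
        # wbits=-15: raw deflate stream, no zlib header or trailer
        c = zlib.compressobj(6, zlib.DEFLATED, -15)
        return c.compress(data) + c.flush()
    raise ValueError(f"codec not covered by this sketch: {codec}")

def decompress_block(data, codec="null"):
    if codec == "null":
        return data
    if codec == "deflate":
        return zlib.decompress(data, -15)
    raise ValueError(f"codec not covered by this sketch: {codec}")

block = b"avro " * 200
packed = compress_block(block, "deflate")
print(len(block), len(packed) < len(block))
```

For the snappy codec, the specification additionally follows each compressed block with a 4-byte, big-endian CRC32 of the uncompressed data.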

  16. Other Features: RPC via protocols (message passing between readers and writers); schema resolution (for when the schema and the data don't align); Parsing Canonical Form (transform schemas into PCF to determine whether two schemas are the same); and schema fingerprints (to uniquely identify schemas).
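Parsing Canonical Form and fingerprints work together: two schemas that differ only in irrelevant details reduce to the same PCF string, and hashing that string gives a compact identifier. The sketch below ports the 64-bit Rabin fingerprint (CRC-64-AVRO) described in the Avro specification. Its `primitive_pcf` helper is a deliberately limited assumption: it canonicalizes only primitive schemas (reducing {"type": "int"} to "int"), whereas real PCF also strips attributes, orders keys, and expands full names for complex types.

```python
import json

EMPTY = 0xC15D213AA4D7A795   # fingerprint of the empty input

def _make_table():
    table = []
    for i in range(256):
        fp = i
        for _ in range(8):
            # Shift right; if the low bit was set, fold in the polynomial
            fp = (fp >> 1) ^ (EMPTY & -(fp & 1))
        table.append(fp)
    return table

FP_TABLE = _make_table()

def fingerprint64(data):
    fp = EMPTY
    for byte in data:
        fp = (fp >> 8) ^ FP_TABLE[(fp ^ byte) & 0xFF]
    return fp

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}

def primitive_pcf(schema_text):
    schema = json.loads(schema_text)
    if isinstance(schema, dict):
        schema = schema.get("type")   # {"type": "int"} -> "int"; other attributes dropped
    if not isinstance(schema, str) or schema not in PRIMITIVES:
        raise NotImplementedError("this sketch only canonicalizes primitive schemas")
    return json.dumps(schema)          # e.g. '"int"'

a = fingerprint64(primitive_pcf('{"type": "int"}').encode("utf-8"))
b = fingerprint64(primitive_pcf('"int"').encode("utf-8"))
print(a == b, hex(a))
```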

  17. Code Generation!
      [shadam1@491vm ~]$ cat user.avsc
      {"namespace": "example.avro",
       "type": "record",
       "name": "User",
       "fields": [
           {"name": "name", "type": "string"},
           {"name": "favorite_number", "type": ["int", "null"]},
           {"name": "favorite_color", "type": ["string", "null"]}
       ]
      }
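Before generating classes, it helps to see what user.avsc actually requires of a record: a string name, plus two fields that may each be null. The sketch below is a hypothetical, hand-written conformance check (not part of the Avro library) that covers only the types this schema uses, and it assumes every field must be present because the schema declares no defaults.

```python
import json

USER_SCHEMA = json.loads("""
{"namespace": "example.avro", "type": "record", "name": "User",
 "fields": [
   {"name": "name", "type": "string"},
   {"name": "favorite_number", "type": ["int", "null"]},
   {"name": "favorite_color", "type": ["string", "null"]}
 ]}
""")

CHECKS = {
    "null": lambda v: v is None,
    "string": lambda v: isinstance(v, str),
    # Avro int is 32-bit signed; bool is excluded because it subclasses int in Python
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool) and -2**31 <= v < 2**31,
}

def conforms(datum, schema):
    if isinstance(schema, list):                        # union: any branch may match
        return any(conforms(datum, branch) for branch in schema)
    if isinstance(schema, dict) and schema["type"] == "record":
        names = {f["name"] for f in schema["fields"]}
        return (isinstance(datum, dict) and set(datum) == names
                and all(conforms(datum[f["name"]], f["type"]) for f in schema["fields"]))
    if isinstance(schema, str) and schema in CHECKS:
        return CHECKS[schema](datum)
    raise NotImplementedError(f"type not covered by this sketch: {schema!r}")

print(conforms({"name": "Alyssa", "favorite_number": 256, "favorite_color": None}, USER_SCHEMA))
```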

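  18. Code Generation!
      [shadam1@sandbox ~]$ java -jar avro-tools-1.7.6.jar compile \
          schema user.avsc .
      Input files to compile: user.avsc
      [shadam1@sandbox ~]$ vi example/avro/User.java
      The generated class is written under the output directory (.) in a path that follows the schema's namespace, so example.avro.User lands in example/avro/User.java.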

  19. Java and Python Demo! https://github.com/adamjshook/hadoop-demos/tree/master/avro

  20. References: http://avro.apache.org