SerDes – Serializer and Deserializer

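Hadoop, and Hive in particular, supports a wide variety of data formats through SerDes (Serializer/Deserializer). A SerDe defines how data is read and written: the deserializer turns stored input bytes into rows, and the serializer turns rows back into bytes in the output format.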
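To make the two directions concrete, here is a minimal sketch of the SerDe idea. The class `DelimitedTextSerDe` and its methods are hypothetical illustrations, not Hive's actual Java SerDe API. The only detail taken from Hive is the default field delimiter for text tables, `\x01` (Ctrl-A).

```python
from typing import Dict, List


class DelimitedTextSerDe:
    """Hypothetical text SerDe: one record per line, fields split by a delimiter."""

    def __init__(self, column_names: List[str], delimiter: str = "\x01") -> None:
        # "\x01" (Ctrl-A) mirrors Hive's default field delimiter for text tables.
        self.column_names = column_names
        self.delimiter = delimiter

    def serialize(self, row: Dict[str, object]) -> bytes:
        # Serializer: in-memory row -> stored bytes (every value written as text).
        fields = [str(row[name]) for name in self.column_names]
        return (self.delimiter.join(fields) + "\n").encode("utf-8")

    def deserialize(self, data: bytes) -> Dict[str, str]:
        # Deserializer: stored bytes -> in-memory row (values come back as strings).
        fields = data.decode("utf-8").rstrip("\n").split(self.delimiter)
        return dict(zip(self.column_names, fields))


serde = DelimitedTextSerDe(["id", "name"])
raw = serde.serialize({"id": 1, "name": "alice"})
print(raw)                     # b'1\x01alice\n'
print(serde.deserialize(raw))  # {'id': '1', 'name': 'alice'}
```

Note that `id` comes back as the string `'1'`. A text format stores no type information, so a real engine uses the table schema to convert values after deserializing. Binary formats such as Avro carry a schema with the data instead.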

Here are the commonly used SerDes in Hadoop/Hive.

Types of SerDes:
1. Text SerDes:
  - CSV
  - JSON
  - XML
2. Binary SerDes (more compact than text)
  - SequenceFile
  - Avro
3. Columnar SerDes (efficient reads, especially for queries that touch only a few columns)
  - RCFile
  - ORC
  - Parquet
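The gap between the row-oriented formats (text, SequenceFile, Avro) and the columnar ones (RCFile, ORC, Parquet) comes down to how values are laid out. The sketch below is a simplified model under stated assumptions: the helper functions and the sample table are hypothetical, and real ORC and Parquet files also split data into stripes or row groups and add encodings and statistics. It counts how many stored values must be touched to read a single column in each layout.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def to_row_layout(rows: List[Row], columns: List[str]) -> List[list]:
    # Row-oriented (CSV, SequenceFile, Avro): all fields of a record sit together.
    return [[row[c] for c in columns] for row in rows]


def to_column_layout(rows: List[Row], columns: List[str]) -> Dict[str, list]:
    # Column-oriented (RCFile, ORC, Parquet): all values of a column sit together.
    return {c: [row[c] for row in rows] for c in columns}


def read_column_from_rows(row_layout: List[list], columns: List[str], name: str) -> Tuple[list, int]:
    # Every whole record is read to pull out one field; count the values touched.
    idx = columns.index(name)
    values, touched = [], 0
    for record in row_layout:
        touched += len(record)
        values.append(record[idx])
    return values, touched


def read_column_from_columns(col_layout: Dict[str, list], name: str) -> Tuple[list, int]:
    # Only the requested column is read.
    values = list(col_layout[name])
    return values, len(values)


# Hypothetical sample table.
columns = ["id", "city", "amount"]
rows = [
    {"id": 1, "city": "Delhi", "amount": 250},
    {"id": 2, "city": "Pune", "amount": 100},
    {"id": 3, "city": "Delhi", "amount": 75},
]
row_layout = to_row_layout(rows, columns)
col_layout = to_column_layout(rows, columns)

print(read_column_from_rows(row_layout, columns, "amount"))  # ([250, 100, 75], 9)
print(read_column_from_columns(col_layout, "amount"))        # ([250, 100, 75], 3)
```

Both layouts return the same answer, but the columnar one reads a third of the values here. The gap grows with the number of columns a table has and the query skips. Storing similar values next to each other (for example, the repeated "Delhi") also tends to help encoding and compression.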

For best performance:

Use ORC with Apache Hive

Use Parquet with Apache Spark

These SerDes can be combined with different compression codecs, e.g.:
1. gzip
2. lz4
3. snappy - very widely used, since it favours speed over compression ratio
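A codec simply compresses the bytes a SerDe produces and restores them losslessly on read. Python's standard library includes gzip but not snappy or lz4 (those need third-party packages), so this sketch swaps in gzip to show the round trip. The helper names and the sample data are hypothetical. In Hadoop and Hive the codec is chosen through configuration or table properties, not called by hand like this.

```python
import gzip


def compress_block(data: bytes, level: int = 6) -> bytes:
    # gzip trades CPU time for a higher compression ratio; level ranges 0-9.
    return gzip.compress(data, compresslevel=level)


def decompress_block(blob: bytes) -> bytes:
    # Lossless: decompression must return exactly the original bytes.
    return gzip.decompress(blob)


# Hypothetical sample: 1,000 CSV lines with many repeated values.
csv_bytes = "".join(f"{i},Delhi,250\n" for i in range(1000)).encode("utf-8")
compressed = compress_block(csv_bytes)

assert decompress_block(compressed) == csv_bytes
print(f"raw: {len(csv_bytes)} bytes, gzip: {len(compressed)} bytes")
```

Running the same data through snappy or lz4 would typically give a larger output in less time. That speed is why snappy is a popular default for frequently scanned data.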


Rahul Aggarwal

http://guardiancoder.in

Senior Data Scientist and Gen-AI Engineer #DataScience #AI #RNN #CNN #GenAI #ChatGPT #LLMs
