Big Data – A Game Changer!!!

As data volumes keep growing, big data analysis and the Hadoop platform are gaining popularity day by day. The Hadoop architecture is designed to handle very large datasets on clusters of cheap commodity hardware.

  • HDFS and MapReduce are the two building blocks of Hadoop.
    1. HDFS [Hadoop Distributed File System] is the storage layer. It stores very large data files in a highly available, fault-tolerant system.
    2. MR [MapReduce] is the processing layer. It processes that data in parallel across the commodity machines, which gives high throughput for batch jobs (it is not a low-latency engine).

  • To understand Hadoop, it also helps to know the other technologies in its ecosystem:
    1. Oozie – Workflow scheduler for end-to-end process execution
    2. Sqoop – Data ingestion tool for static [at rest] data, typically bulk transfers between relational databases and Hadoop
    3. Flume/Kafka – Data ingestion tools for dynamic [in motion] data, i.e. real-time data
    4. Pig – Data pre-processing tool, mainly for ETL/ELT logic written as Pig Latin scripts
    5. Hive – Data-warehouse layer for Hadoop, queried with the SQL-like HiveQL
    6. HBase – NoSQL database for Hadoop
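To make the MapReduce processing model mentioned above a little more concrete, here is a minimal single-machine sketch of its three phases (map, shuffle, reduce) applied to a word count. It is plain Python, not the Hadoop API; the function names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are my own assumptions for illustration. On a real cluster, the map and reduce calls would run in parallel on different machines.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    # Map: every line is processed independently (parallel on a real cluster).
    mapped = [pair for line in lines for pair in map_phase(line)]
    # Shuffle: bring all values for the same word together.
    grouped = shuffle_phase(mapped)
    # Reduce: each word is summed independently of the others.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data big ideas", "data at rest"]
    print(word_count(sample))
```

Because each line is mapped on its own and each word is reduced on its own, the work can be split across many machines without changing the result.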

Apache Spark [an Apache project; its creators founded Databricks]:

A faster framework for data processing than traditional MapReduce.

It works in memory where possible and plans each job as a DAG [Directed Acyclic Graph] of operations.

Learn the RDD, DataFrame and Dataset APIs for data processing.

It is built in Scala and supports Scala, Java, Python and R.
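The idea behind the DAG-based execution can be sketched in a few lines of plain Python. This is a toy model, not Spark code: the class `LazyPipeline` and its methods are hypothetical names I chose for illustration, and a simple linear chain stands in for a full graph. The point it shows is that transformations are only recorded, and nothing is computed until an action such as `collect` asks for the result.

```python
class LazyPipeline:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not a Spark class)."""

    def __init__(self, data, steps=()):
        self._data = data
        self._steps = tuple(steps)

    def map(self, fn):
        # Transformation: record the step, do no work yet.
        return LazyPipeline(self._data, self._steps + (("map", fn),))

    def filter(self, pred):
        # Transformation: record the step, do no work yet.
        return LazyPipeline(self._data, self._steps + (("filter", pred),))

    def plan(self):
        # The recorded chain of operations, in execution order.
        return [name for name, _ in self._steps]

    def collect(self):
        # Action: only now is the recorded plan executed, in one pass.
        items = iter(self._data)
        for name, fn in self._steps:
            items = map(fn, items) if name == "map" else filter(fn, items)
        return list(items)


if __name__ == "__main__":
    pipeline = (
        LazyPipeline(range(1, 6))
        .map(lambda x: x * x)
        .filter(lambda x: x % 2 == 1)
    )
    print(pipeline.plan())     # the plan exists before any data is touched
    print(pipeline.collect())  # execution happens here
```

Knowing the whole plan before running it is what lets an engine chain steps in memory instead of writing intermediate results to disk between stages, as classic MapReduce does.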

In upcoming blog posts, I will elaborate on these latest technologies further.




Rahul Aggarwal
http://guardiancoder.in

Senior Data Scientist and Gen-AI Engineer #DataScience #AI #RNN #CNN #GenAI #ChatGPT #LLMs

1 comment so far

techno

Request you to elaborate it…thanks
