The course covers the most important areas of the Big Data world. The first part presents the fundamental mechanisms and tools: the Hadoop platform, YARN, HDFS, MapReduce, and Hive. The second part focuses on Apache Spark. The third part covers stream processing. The fourth part is a hands-on project.

During the course we use Google Cloud Platform and its services, in particular Dataproc: a rich, fully functional distribution of a Hadoop cluster that ships with many additional tools such as Spark, Hive, Apache Kafka, and Flink.
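For reference, a Dataproc cluster for the exercises can be created from the command line with the `gcloud` CLI. A minimal sketch (the cluster name, region, image version, and chosen optional components are assumptions; adjust them to your own project and quota):

```shell
# Sketch only: cluster name, region, and image version are placeholders.
# Spark and Hive are included in the base Dataproc image; extra tools
# such as Flink are enabled via --optional-components.
gcloud dataproc clusters create example-cluster \
  --region=europe-west1 \
  --image-version=2.1-debian11 \
  --optional-components=FLINK,JUPYTER \
  --enable-component-gateway
```

Remember to delete the cluster (`gcloud dataproc clusters delete example-cluster --region=europe-west1`) when you finish, since a running cluster incurs charges.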