Data Analytics

Introduces the Big Data framework built on Hadoop and Spark. Topics include the principles of HDFS, YARN, and MapReduce; HBase, a distributed column-oriented database; real-time data processing and parallel processing in Spark; Spark RDD optimization techniques; and SparkML. Also covers using Pig and Hive to process and analyze large datasets stored in HDFS, and using Sqoop and Flume for data ingestion.

Course ID

ECE 475