TIEE3045 Syllabus - Big Data Analytics - 2022 Regulation Anna University

TIEE3045

BIG DATA ANALYTICS

L T P C

2 0 2 3

COURSE OBJECTIVES:
• To understand big data.
• To learn and use NoSQL big data management.
• To learn MapReduce analytics using Hadoop and related tools.
• To work with MapReduce applications.
• To understand the usage of Hadoop-related tools for big data analytics.

UNIT I

UNDERSTANDING BIG DATA

5

Introduction to big data – convergence of key trends – unstructured data – industry examples of big data – web analytics – big data applications – big data technologies – introduction to Hadoop – open source technologies – cloud and big data – mobile business intelligence – crowdsourcing analytics – inter- and trans-firewall analytics.

UNIT II

NOSQL DATA MANAGEMENT

7

Introduction to NoSQL – aggregate data models – key-value and document data models – relationships – graph databases – schemaless databases – materialized views – distribution models – master-slave replication – consistency – Cassandra – Cassandra data model – Cassandra examples – Cassandra clients.


UNIT III

MAP REDUCE APPLICATIONS

6

MapReduce workflows – unit tests with MRUnit – test data and local tests – anatomy of a MapReduce job run – classic MapReduce – YARN – failures in classic MapReduce and YARN – job scheduling – shuffle and sort – task execution – MapReduce types – input formats – output formats.
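
Illustration for this unit: the map, shuffle and sort, and reduce stages listed above can be sketched in plain Python as a word count. This is a minimal single-process sketch for intuition only, not Hadoop code; the function names (map_phase, shuffle_and_sort, reduce_phase, word_count) are hypothetical and are not part of any Hadoop API.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_and_sort(pairs):
    """Shuffle and sort: group values by key, then order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three stages in order over an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    return [reduce_phase(key, values) for key, values in shuffle_and_sort(mapped)]


if __name__ == "__main__":
    # Hypothetical two-line input, used only for demonstration.
    for word, count in word_count(["big data", "big analytics"]):
        print(word, count)
```

On a real cluster the map and reduce functions run as separate tasks on different nodes and the framework performs the shuffle over the network; here everything runs in one process so the data flow between stages is easy to follow.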

UNIT IV

BASICS OF HADOOP

6

Data format – analyzing data with Hadoop – scaling out – Hadoop streaming – Hadoop pipes – design of Hadoop distributed file system (HDFS) – HDFS concepts – Java interface – data flow – Hadoop I/O – data integrity – compression – serialization – Avro – file-based data structures – Cassandra-Hadoop integration.

UNIT V

HADOOP RELATED TOOLS

6

HBase – data model and implementations – HBase clients – HBase examples – praxis. Pig – Grunt – Pig data model – Pig Latin – developing and testing Pig Latin scripts. Hive – data types and file formats – HiveQL data definition – HiveQL data manipulation – HiveQL queries.

TOTAL: 60 PERIODS

COURSE OUTCOMES: After the completion of this course, students will be able to:
• Describe big data and use cases from selected business domains.
• Explain NoSQL big data management.
• Install, configure, and run Hadoop and HDFS.
• Perform MapReduce analytics using Hadoop.
• Use Hadoop-related tools such as HBase, Cassandra, Pig, and Hive for big data analytics.

TEXT BOOKS:
1. Michael Minelli, Michele Chambers, and Ambiga Dhiraj, "Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today's Businesses", Wiley, 2013.
2. Eric Sammer, "Hadoop Operations", O'Reilly, 2012.
3. Pramod J. Sadalage and Martin Fowler, "NoSQL Distilled", Addison-Wesley, 2013.

REFERENCES:
1. E. Capriolo, D. Wampler, and J. Rutherglen, "Programming Hive", O'Reilly, 2012.
2. Lars George, "HBase: The Definitive Guide", O'Reilly, 2011.
3. Eben Hewitt, "Cassandra: The Definitive Guide", O'Reilly, 2010.
4. Alan Gates, "Programming Pig", O'Reilly, 2011.
