PTCCS368 Syllabus - Stream Processing - 2023 Regulation Anna University

PTCCS368

STREAM PROCESSING

L T P C
2 0 2 3

COURSE OBJECTIVES:
• Introduce Data Processing terminology, definition & concepts
• Define different types of Data Processing
• Explain the concepts of Real-time Data processing
• Select appropriate structures for designing and running real-time data services in a business environment
• Illustrate the benefits and drive the adoption of real-time data services to solve real world problems

UNIT I

FOUNDATIONS OF DATA SYSTEMS

6

Introduction to Data Processing, Stages of Data processing, Data Analytics, Batch Processing, Stream processing, Data Migration, Transactional Data processing, Data Mining, Data Management Strategy, Storage, Processing, Integration, Analytics, Benefits of Data as a Service, Challenges

UNIT II

REAL-TIME DATA PROCESSING

6

Introduction to Big data, Big data infrastructure, Real-time Analytics, Near real-time solution, Lambda architecture, Kappa Architecture, Stream Processing, Understanding Data Streams, Message Broker, Stream Processor, Batch & Real-time ETL tools, Streaming Data Storage

UNIT III

DATA MODELS AND QUERY LANGUAGES

6

Relational Model, Document Model, Key-Value Pairs, NoSQL, Object-Relational Mismatch, Many-to-One and Many-to-Many Relationships, Network data models, Schema Flexibility, Structured Query Language, Data Locality for Queries, Declarative Queries, Graph Data models, Cypher Query Language, Graph Queries in SQL, The Semantic Web, CODASYL, SPARQL

UNIT IV

EVENT PROCESSING WITH APACHE KAFKA

6

Apache Kafka, Kafka as Event Streaming platform, Events, Producers, Consumers, Topics, Partitions, Brokers, Kafka APIs, Admin API, Producer API, Consumer API, Kafka Streams API, Kafka Connect API.
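As an illustration of how the Unit IV terms fit together (events, producers, consumers, topics, partitions, offsets), here is a minimal in-memory sketch. It is a toy model, not the Kafka client API: the class `ToyTopic` and its methods are hypothetical names, and `zlib.crc32` stands in for the hash used by a real partitioner. The point it tries to show is that events with the same key land in the same partition, so their relative order is preserved for a consumer reading that partition.

```python
import zlib


class ToyTopic:
    """In-memory stand-in for a topic (hypothetical; not the Kafka API)."""

    def __init__(self, num_partitions):
        # Each partition is an append-only list; the list index acts as the offset.
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Assumption: a CRC32 hash of the key, modulo the partition count.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def produce(self, key, value):
        # Append the event to the partition chosen by its key.
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def consume(self, partition, offset):
        # Return every event in the partition from the given offset onward.
        return self.partitions[partition][offset:]


topic = ToyTopic(num_partitions=3)
for key, value in [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "click"),
    ("user-1", "logout"),
]:
    topic.produce(key, value)

# All "user-1" events share one partition, so they are read back in order.
p = topic.partition_for("user-1")
user1_events = [v for k, v in topic.consume(p, 0) if k == "user-1"]
print(user1_events)
```

A consumer that remembers the last offset it read can resume from that point, which is the idea behind committed offsets.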

UNIT V

REAL-TIME PROCESSING USING SPARK STREAMING

9

Structured Streaming, Basic Concepts, Handling Event-time and Late Data, Fault-tolerant Semantics, Exactly-once Semantics, Creating Streaming Datasets, Schema Inference, Partitioning of Streaming datasets, Operations on Streaming Data, Selection, Aggregation, Projection, Watermarking, Window operations, Types of Time windows, Join Operations, Deduplication
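To make the Unit V ideas of event time, tumbling windows, watermarking and late data concrete, the following is a simplified pure-Python sketch. It does not use Spark; the function `tumbling_counts` and its parameters are hypothetical, and the watermark here is updated on every event, whereas a micro-batch engine would typically update it between batches. The watermark is taken as the largest event time seen so far minus an allowed delay, and an event is dropped as late when its window has already closed relative to that watermark.

```python
from collections import defaultdict


def tumbling_counts(events, window_size, max_delay):
    """Count events per (window_start, key), dropping events that arrive too late.

    events: iterable of (event_time, key) in arrival order.
    window_size: length of each tumbling window, in the same units as event_time.
    max_delay: how far the watermark trails the largest event time seen.
    """
    counts = defaultdict(int)
    dropped = []
    max_event_time = float("-inf")
    for event_time, key in events:
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - max_delay
        window_start = (event_time // window_size) * window_size
        window_end = window_start + window_size
        if window_end <= watermark:
            # The window closed before this event arrived: treat it as late.
            dropped.append((event_time, key))
            continue
        counts[(window_start, key)] += 1
    return dict(counts), dropped


# Events in arrival order; (8, "a") and (9, "a") arrive out of order.
events = [(1, "a"), (4, "a"), (12, "b"), (8, "a"), (27, "a"), (9, "a")]
counts, dropped = tumbling_counts(events, window_size=10, max_delay=5)
print(counts)   # {(0, 'a'): 3, (10, 'b'): 1, (20, 'a'): 1}
print(dropped)  # [(9, 'a')]
```

The event at time 8 is still accepted because the watermark is only 7 when it arrives; the event at time 9 arrives after the watermark has advanced to 22 and is discarded.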

30 PERIODS

PRACTICAL EXERCISES: 30 PERIODS
1. Install MongoDB
2. Design and Implement Simple application using MongoDB
3. Query the designed system using MongoDB
4. Create an Event Stream with Apache Kafka
5. Create a Real-time Stream processing application using Spark Streaming
6. Build a Micro-batch application
7. Real-time Fraud and Anomaly Detection
8. Real-time personalization, Marketing, Advertising

COURSE OUTCOMES:
CO1: Understand the applicability and utility of different streaming algorithms.
CO2: Describe and apply current research trends in data-stream processing.
CO3: Analyze the suitability of stream mining algorithms for data stream systems.
CO4: Program and build stream processing systems, services and applications.
CO5: Solve problems in real-world applications that process data streams.

TOTAL:60 PERIODS

TEXT BOOKS:
1. Streaming Systems: The What, Where, When and How of Large-Scale Data Processing by Tyler Akidau, Slava Chernyak, Reuven Lax, O’Reilly publication
2. Designing Data-Intensive Applications by Martin Kleppmann, O’Reilly Media
3. Practical Real-time Data Processing and Analytics: Distributed Computing and Event Processing using Apache Spark, Flink, Storm and Kafka, Packt Publishing

REFERENCES:
1. https://spark.apache.org/docs/latest/streaming-programming-guide.html
2. https://kafka.apache.org
