PTCCS368 Syllabus - Stream Processing - 2023 Regulation Anna University
PTCCS368 | STREAM PROCESSING | L T P C: 2 0 2 3
COURSE OBJECTIVES:
• Introduce Data Processing terminology, definition & concepts
• Define different types of Data Processing
• Explain the concepts of Real-time Data processing
• Select appropriate structures for designing and running real-time data services in a business environment
• Illustrate the benefits and drive the adoption of real-time data services to solve real-world problems
UNIT I | FOUNDATIONS OF DATA SYSTEMS | 6
Introduction to Data Processing, Stages of Data processing, Data Analytics, Batch Processing, Stream processing, Data Migration, Transactional Data processing, Data Mining, Data Management Strategy, Storage, Processing, Integration, Analytics, Benefits of Data as a Service, Challenges
UNIT II | REAL-TIME DATA PROCESSING | 6
Introduction to Big data, Big data infrastructure, Real-time Analytics, Near real-time solution, Lambda architecture, Kappa Architecture, Stream Processing, Understanding Data Streams, Message Broker, Stream Processor, Batch & Real-time ETL tools, Streaming Data Storage
UNIT III | DATA MODELS AND QUERY LANGUAGES | 6
Relational Model, Document Model, Key-Value Pairs, NoSQL, Object-Relational Mismatch, Many-to-One and Many-to-Many Relationships, Network data models, Schema Flexibility, Structured Query Language, Data Locality for Queries, Declarative Queries, Graph Data models, Cypher Query Language, Graph Queries in SQL, The Semantic Web, CODASYL, SPARQL
UNIT IV | EVENT PROCESSING WITH APACHE KAFKA | 6
Apache Kafka, Kafka as Event Streaming platform, Events, Producers, Consumers, Topics, Partitions, Brokers, Kafka APIs, Admin API, Producer API, Consumer API, Kafka Streams API, Kafka Connect API.
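As an illustration of how the Unit IV terms fit together, the following is a minimal in-memory sketch, not the Kafka client API. It shows events with the same key landing in the same partition, which keeps them in order for a consumer reading that partition from an offset. The names `ToyTopic`, `choose_partition` and `NUM_PARTITIONS` are hypothetical, and CRC32 is only a stand-in hash (the Kafka Java client's default partitioner uses murmur2 for keyed records).

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

NUM_PARTITIONS = 3  # hypothetical partition count for the toy topic


def choose_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition using a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class ToyTopic:
    """In-memory stand-in for a topic: one append-only log per partition."""

    def __init__(self, num_partitions: int = NUM_PARTITIONS) -> None:
        self.num_partitions = num_partitions
        self.partitions: Dict[int, List[Tuple[str, str]]] = defaultdict(list)

    def produce(self, key: str, value: str) -> Tuple[int, int]:
        """Append an event and return its (partition, offset)."""
        partition = choose_partition(key, self.num_partitions)
        self.partitions[partition].append((key, value))
        return partition, len(self.partitions[partition]) - 1

    def consume(self, partition: int, offset: int = 0) -> List[Tuple[str, str]]:
        """Read all events of one partition starting at the given offset."""
        return self.partitions[partition][offset:]


if __name__ == "__main__":
    topic = ToyTopic()
    for action in ("login", "click", "logout"):
        partition, offset = topic.produce("user-1", action)
        print(f"user-1 {action!r} -> partition {partition}, offset {offset}")
```

A real broker adds replication, retention and consumer groups on top of this idea; the sketch only covers key-to-partition routing and offset-based reads.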
UNIT V | REAL-TIME PROCESSING USING SPARK STREAMING | 9
Structured Streaming, Basic Concepts, Handling Event-time and Late Data, Fault-tolerant Semantics, Exactly-once Semantics, Creating Streaming Datasets, Schema Inference, Partitioning of Streaming datasets, Operations on Streaming Data, Selection, Aggregation, Projection, Watermarking, Window operations, Types of Time windows, Join Operations, Deduplication
30 PERIODS
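The Unit V topics of event time, late data, watermarking and tumbling windows can be sketched in plain Python. This is a simplified per-event model, not Spark's API or its exact micro-batch semantics. The constants `WINDOW_SIZE` and `ALLOWED_LATENESS` and the function names are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

WINDOW_SIZE = 10      # seconds per tumbling window (assumed value)
ALLOWED_LATENESS = 5  # watermark trails the max event time by this much (assumed)

Event = Tuple[int, str]  # (event_time_seconds, payload)


def window_start(event_time: int, size: int = WINDOW_SIZE) -> int:
    """Return the start of the tumbling window containing event_time."""
    return event_time - (event_time % size)


def count_by_window(events: Iterable[Event]) -> Tuple[Dict[int, int], List[Event]]:
    """Count events per event-time window, dropping events that arrive too late.

    An event is dropped when its window has already closed, that is, when the
    window end is at or before the current watermark.
    """
    counts: Dict[int, int] = defaultdict(int)
    dropped: List[Event] = []
    watermark = float("-inf")
    for event_time, payload in events:
        start = window_start(event_time)
        if start + WINDOW_SIZE <= watermark:
            dropped.append((event_time, payload))
            continue
        counts[start] += 1
        # Advance the watermark; it never moves backwards.
        watermark = max(watermark, event_time - ALLOWED_LATENESS)
    return dict(counts), dropped


if __name__ == "__main__":
    # Events listed in arrival order; event times are out of order.
    arrivals = [(1, "a"), (4, "b"), (12, "c"), (3, "d"), (27, "e"), (8, "f")]
    counts, dropped = count_by_window(arrivals)
    print(counts)   # {0: 3, 10: 1, 20: 1}
    print(dropped)  # [(8, 'f')]
```

Here the event at time 3 arrives out of order but is still counted, because the watermark (7) has not yet passed the end of its window (10). The event at time 8 arrives after the watermark has reached 22, so it is dropped.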
PRACTICAL EXERCISES: | 30 PERIODS
1. Install MongoDB
2. Design and implement a simple application using MongoDB
3. Query the designed system using MongoDB
4. Create an Event Stream with Apache Kafka
5. Create a Real-time Stream processing application using Spark Streaming
6. Build a Micro-batch application
7. Real-time Fraud and Anomaly Detection
8. Real-time Personalization, Marketing and Advertising
COURSE OUTCOMES:
CO1: Understand the applicability and utility of different streaming algorithms.
CO2: Describe and apply current research trends in data-stream processing.
CO3: Analyze the suitability of stream mining algorithms for data stream systems.
CO4: Program and build stream processing systems, services and applications.
CO5: Solve problems in real-world applications that process data streams.
TOTAL: 60 PERIODS
TEXT BOOKS:
1. Streaming Systems: The What, Where, When and How of Large-Scale Data Processing by Tyler Akidau, Slava Chernyak, Reuven Lax, O’Reilly Media
2. Designing Data-Intensive Applications by Martin Kleppmann, O’Reilly Media
3. Practical Real-time Data Processing and Analytics: Distributed Computing and Event Processing using Apache Spark, Flink, Storm and Kafka, Packt Publishing
REFERENCES:
1. https://spark.apache.org/docs/latest/streaming-programming-guide.html
2. https://kafka.apache.org