## OCS353 Syllabus - Data Science Fundamentals - 2021 Regulation - Open Elective | Anna University

OCS353

DATA SCIENCE FUNDAMENTALS

L T P C

2 0 2 3

COURSE OBJECTIVES:
● Familiarize students with the data science process.
● Understand the data manipulation functions in NumPy and Pandas.
● Explore different types of machine learning approaches.
● Understand and practice visualization techniques using tools.
● Learn to handle large volumes of data with case studies.

UNIT I

INTRODUCTION

6

Data Science: Benefits and uses – facets of data - Data Science Process: Overview – Defining research goals – Retrieving data – data preparation - Exploratory Data analysis – build the model – presenting findings and building applications - Data Mining - Data Warehousing – Basic statistical descriptions of Data.
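As a small illustration of the "basic statistical descriptions of data" topic, the sketch below uses Python's standard `statistics` module. The sample values and the `marks` and `summary` names are made up for illustration and are not part of the syllabus.

```python
import statistics

# Hypothetical sample data (made up for illustration only).
marks = [42, 55, 61, 61, 68, 74, 90]

summary = {
    "mean": statistics.mean(marks),
    "median": statistics.median(marks),
    "mode": statistics.mode(marks),
    "stdev": statistics.stdev(marks),  # sample standard deviation
    "quartiles": statistics.quantiles(marks, n=4),  # Q1, Q2, Q3 cut points
}

for name, value in summary.items():
    print(name, value)
```

Measures of central tendency (mean, median, mode) and of spread (standard deviation, quartiles) are usually the first summaries computed during exploratory data analysis.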

UNIT II

DATA MANIPULATION

9

Python Shell - Jupyter Notebook - IPython Magic Commands - NumPy Arrays – Universal Functions – Aggregations – Computation on Arrays – Fancy Indexing – Sorting arrays – Structured data – Data manipulation with Pandas – Data Indexing and Selection – Handling missing data – Hierarchical indexing – Combining datasets – Aggregation and Grouping – String operations – Working with time series – High performance

UNIT III

MACHINE LEARNING

5

The modeling process - Types of machine learning - Supervised learning - Unsupervised learning - Semi-supervised learning - Classification, regression - Clustering – Outliers and Outlier Analysis
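One common way to approach the outlier analysis topic is the interquartile range (IQR) rule, sketched below with the standard library only. The syllabus does not prescribe a method, so this is just one possible rule. The `iqr_outliers` helper, the multiplier `k=1.5`, and the sample readings are illustrative assumptions.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical readings (made up for illustration only).
readings = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(readings)
print(outliers)
```

The rule flags points far from the middle half of the data. Whether a flagged point is an error or a genuinely rare observation still needs domain judgement.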

UNIT IV

DATA VISUALIZATION

5

Importing Matplotlib – Simple line plots – Simple scatter plots – visualizing errors – density and contour plots – Histograms – legends – colors – subplots – text and annotation – customization – three dimensional plotting - Geographic Data with Basemap - Visualization with Seaborn

UNIT V

HANDLING LARGE DATA

5

Problems - techniques for handling large volumes of data - programming tips for dealing with large data sets - Case studies: Predicting malicious URLs, Building a recommender system - Tools and techniques needed - Research question - Data preparation - Model building – Presentation and automation.

TOTAL: 60 PERIODS

COURSE OUTCOMES: At the end of this course, the students will be able to:
CO1: Gain knowledge of the data science process.
CO2: Perform data manipulation functions using NumPy and Pandas.
CO3: Understand different types of machine learning approaches.
CO4: Perform data visualization using tools.
CO5: Handle large volumes of data in practical scenarios.

TEXT BOOKS:
1. David Cielen, Arno D. B. Meysman, and Mohamed Ali, “Introducing Data Science”, Manning Publications, 2016.
2. Jake VanderPlas, “Python Data Science Handbook”, O’Reilly, 2016.

REFERENCES:
1. Robert S. Witte and John S. Witte, “Statistics”, Eleventh Edition, Wiley Publications, 2017.
2. Allen B. Downey, “Think Stats: Exploratory Data Analysis in Python”, Green Tea Press, 2014.