CS6007 - INFORMATION RETRIEVAL (Syllabus) 2013-regulation Anna University

CS6007 INFORMATION RETRIEVAL

L T P C
3 0 0 3

OBJECTIVES: The student should be made to:
• Learn the information retrieval models.
• Be familiar with web search engines.
• Be exposed to link analysis.
• Understand Hadoop and MapReduce.
• Learn document text mining techniques.

UNIT I INTRODUCTION 9

Introduction – History of IR – Components of IR – Issues – Open-source search engine frameworks – The impact of the web on IR – The role of artificial intelligence (AI) in IR – IR versus web search – Components of a search engine – Characterizing the web.

UNIT II INFORMATION RETRIEVAL 9

Boolean and vector-space retrieval models – Term weighting – TF-IDF weighting – Cosine similarity – Preprocessing – Inverted indices – Efficient processing with sparse vectors – Language-model-based IR – Probabilistic IR – Latent semantic indexing – Relevance feedback and query expansion.


UNIT III WEB SEARCH ENGINE – INTRODUCTION AND CRAWLING 9

Web search overview, web structure, the user, paid placement, search engine optimization/spam – Web size measurement – Web search architectures – Crawling – Meta-crawlers – Focused crawling – Web indexes – Near-duplicate detection – Index compression – XML retrieval.

UNIT IV WEB SEARCH – LINK ANALYSIS AND SPECIALIZED SEARCH 9

Link analysis – Hubs and authorities – PageRank and HITS algorithms – Searching and ranking – Relevance scoring and ranking for the web – Similarity – Hadoop and MapReduce – Evaluation – Personalized search – Collaborative filtering and content-based recommendation of documents and products – Handling the “invisible” web – Snippet generation, summarization, question answering, cross-lingual retrieval.

UNIT V DOCUMENT TEXT MINING 9

Information filtering, organization and relevance feedback – Text mining – Text classification and clustering – Categorization algorithms: naive Bayes, decision trees and nearest neighbor – Clustering algorithms: agglomerative clustering, k-means, expectation maximization (EM).

TOTAL: 45 PERIODS

OUTCOMES: Upon completion of the course, students will be able to:
• Apply information retrieval models.
• Design a web search engine.
• Use link analysis.
• Use Hadoop and MapReduce.
• Apply document text mining techniques.

TEXT BOOKS:
1. C. Manning, P. Raghavan and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
2. Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval: The Concepts and Technology behind Search, 2nd Edition, ACM Press Books, 2011.
3. Bruce Croft, Donald Metzler and Trevor Strohman, Search Engines: Information Retrieval in Practice, 1st Edition, Addison Wesley, 2009.
4. Mark Levene, An Introduction to Search Engines and Web Navigation, 2nd Edition, Wiley, 2010.

REFERENCES:
1. Stefan Buettcher, Charles L. A. Clarke and Gordon V. Cormack, Information Retrieval: Implementing and Evaluating Search Engines, The MIT Press, 2010.
2. David A. Grossman and Ophir Frieder, Information Retrieval: Algorithms and Heuristics, The Information Retrieval Series, 2nd Edition, Springer, 2004.
3. Manu Konchady, Building Search Applications: Lucene, LingPipe, and Gate, 1st Edition, Mustru Publishing, 2008.
