Information Retrieval and the Internet

Summary:
Today’s society is extremely information-orientated: it has access to more information than at any
other time in history. Indeed, this flow of information is now vital to many individuals, businesses
and whole economies.
The Internet is one of the largest repositories of human knowledge. This, coupled with its ever-increasing
size, creates an urgent need for methods that search this information source accurately and
quickly. As a result, research into Information Retrieval (IR) methods has gained far more importance
within computer research in recent years.
This Final Year Project report compares several methods used to power Information Retrieval
systems, highlighting in particular those aspects relevant to searching on the Internet.
A thorough evaluation of existing commercial IR systems that use these methods forms the bulk
of this document, leading on to an assessment of how well these systems are coping with the demand
placed upon them in the Internet environment.
Additionally, this report documents a limited implementation of an IR system that draws upon the
research outlined above.
The system built is intended to run on the website of ClearIT, an IT consultancy company; potential
customers will use the search engine to find information within the ClearIT site.

TABLE OF CONTENTS

Summary
Chapter 1: Introduction to Information Retrieval
1.1 Definition of an Information Retrieval System
1.3 The Problem
1.4 Minimum Requirements
Chapter 2: Information Retrieval and Searching the Web
2.1 Distinctions between Information Retrieval and Data Retrieval
2.2 Challenges Faced by IR Systems on the Internet
2.2.1 Problems with the Data on the Internet
2.2.2 Problems with Language
2.2.3 Problems with Different Media
Chapter 3: Background Research into Methods of Information Retrieval on the Web
3.1 The Classic Models
3.1.1 Boolean Model
3.1.2 The Vector Model
3.1.3 The Probabilistic Model
3.2 Directories and Browsing
3.3 Indexing
3.4 Index Optimisation Techniques
3.4.1 Lexical Analysis
3.4.2 Stop Words
3.4.3 Stemming
3.4.4 Index Terms Selection
3.4.5 Thesaurus
3.5 Index File Structure
3.6 Query Languages
Chapter 4: Evaluation of Current Information Retrieval Systems
4.1 Difficulties of Retrieval Evaluation
4.2 Introduction to the Systems Under Evaluation
4.3 Evaluation Criteria
4.3.1 Size
4.3.2 Freshness
4.3.3 Features
Chapter 5: Results from Information System Evaluation
5.1 Size
5.2 Freshness
5.3 Features
Chapter 6: Further Analysis of Top Ranking Search Engine
6.1 Testing Method
6.2 Test Results
Chapter 7: Are Current Information Retrieval Systems on the Internet Meeting Demand?
7.1 Internet Size
7.2 Search Engine Index Coverage
7.3 Conclusion
Chapter 8: Implementation of a Small Information Retrieval System
8.1 Background
8.1.1 ClearIT
Ctxjt
Justin Turner
8.2 Solutions Considered
8.3 Solution Chosen
8.4 Solution Implemented
8.5 Evaluation of Implementation
8.5.1 Results
Chapter 9: Conclusion
9.1 Findings
Appendix A: Reflection
Appendix B
Appendix C
Appendix D