NewsIndexer is a document retrieval system. It has two main tasks: finding documents relevant to a user query, and ranking the matching results by relevance. This project parses simple news articles and indexes a reasonably sized subset of the given news corpus. Indexing is a way of storing data so that information can be retrieved quickly and accurately.

The system consists of two main components: a parser and an indexer. The Parser converts a given text file into a Document representation. A Document is simply a collection of fields. Once a file has been converted into a Document, the IndexWriter writes its fields to the corresponding indexes. There are four kinds of index: Term, Place, Author and Category. The system also provides an index introspection mechanism that can later be built upon to support queries.
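The parser-to-indexer flow described above can be sketched roughly as follows. This is a minimal illustration in Python, not the project's actual code: the `key: value` header layout, the `FieldName` enum, the `parse` function, the `postings` method and the in-memory dictionaries are all assumptions made for the example. Only the names `Document` and `IndexWriter` and the four index kinds come from the description.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field
from enum import Enum


class FieldName(Enum):
    """The four index kinds named in the description."""
    TERM = "term"
    PLACE = "place"
    AUTHOR = "author"
    CATEGORY = "category"


@dataclass
class Document:
    """A Document is a collection of fields, keyed by field name."""
    doc_id: str
    fields: dict = field(default_factory=dict)  # FieldName -> list of str


def parse(doc_id, text):
    """Hypothetical parser: assumes optional 'Key: value' lines for
    place, author and category; every other line is body text."""
    doc = Document(doc_id)
    body = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip().lower()
        if sep and key in ("place", "author", "category"):
            doc.fields.setdefault(FieldName(key), []).append(value.strip().lower())
        else:
            body.append(line)
    # Body text is lowercased and split into alphanumeric tokens.
    doc.fields[FieldName.TERM] = re.findall(r"[a-z0-9]+", " ".join(body).lower())
    return doc


class IndexWriter:
    """Writes each field of a Document to its own inverted index
    (assumed here to be an in-memory map from key to document ids)."""

    def __init__(self):
        self.indexes = {name: defaultdict(set) for name in FieldName}

    def add_document(self, doc):
        for name, values in doc.fields.items():
            for value in values:
                self.indexes[name][value].add(doc.doc_id)

    def postings(self, name, key):
        """Simple introspection: sorted ids of documents containing key."""
        return sorted(self.indexes[name].get(key, set()))
```

A short usage example under the same assumptions:

```python
writer = IndexWriter()
writer.add_document(parse("d1", "Author: Jane Doe\nPlace: London\nMarkets rallied in London."))
writer.add_document(parse("d2", "Author: Jane Doe\nPlace: Paris\nMarkets fell."))
print(writer.postings(FieldName.AUTHOR, "jane doe"))
print(writer.postings(FieldName.TERM, "london"))
```

Keeping one index per field kind means an author or place lookup never has to scan the term index, at the cost of maintaining four separate structures.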
chandana1332/News-Indexer