Explain Information Retrieval. Also, Basic Measures for Text Retrieval.

Information Retrieval

  • Information retrieval deals with the retrieval of information from a large number of text-based documents. 
  • Examples of information retrieval systems include online library catalog systems, online document management systems, and web search systems.
  • The main problem in an information retrieval system is to locate the relevant documents in a document collection based on a user's query. Such a query consists of keywords describing an information need. In this kind of search, the user takes the initiative to pull relevant information out of the collection. This is appropriate when the user has an ad-hoc, i.e. short-term, information need. If the user has a long-term information need, the retrieval system can instead take the initiative to push any newly arrived information item to the user.
  • This push-based access to information is called information filtering, and the corresponding systems are known as filtering systems or recommender systems.
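
The difference between pulling (ad-hoc query) and pushing (filtering) can be sketched in a few lines of Python. This is only an illustration under simple assumptions: documents are plain strings, a query is a list of keywords, and a document matches when it contains every keyword. The names `matches`, `pull` and `FilteringSystem` are made up for this example and do not come from any library.

```python
def matches(query_terms, text):
    """Return True if every query keyword occurs as a word in the text."""
    words = set(text.lower().split())
    return all(term.lower() in words for term in query_terms)


def pull(query_terms, collection):
    """Ad-hoc retrieval: the user asks once and gets the matching document ids."""
    return [doc_id for doc_id, text in collection.items()
            if matches(query_terms, text)]


class FilteringSystem:
    """Information filtering: long-term profiles are checked against new items."""

    def __init__(self):
        self.profiles = {}  # user name -> list of keywords (hypothetical profile format)

    def subscribe(self, user, query_terms):
        """Store a user's long-term information need."""
        self.profiles[user] = list(query_terms)

    def push(self, text):
        """Return the users to whom a newly arrived item should be pushed."""
        return [user for user, terms in self.profiles.items()
                if matches(terms, text)]


collection = {
    "d1": "data mining concepts",
    "d2": "text retrieval systems",
    "d3": "web search and text mining",
}
pulled = pull(["text", "mining"], collection)

system = FilteringSystem()
system.subscribe("alice", ["retrieval"])
system.subscribe("bob", ["warehouse"])
notified = system.push("new survey on text retrieval")
```

In the pull case the query is used once against the existing collection; in the push case the query is stored and every new item is tested against it.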


Basic Measures for Text Retrieval

We need to check the accuracy of a system when it retrieves a number of documents on the basis of the user's input. Let the set of documents relevant to a query be denoted as {Relevant} and the set of the retrieved documents as {Retrieved}. The set of documents that are both relevant and retrieved can be denoted as {Relevant} ∩ {Retrieved}. In a Venn diagram of the two sets, this intersection is the region where they overlap.


a) Precision

Precision is the fraction of retrieved documents that are in fact relevant to the query, often expressed as a percentage. It is defined as

Precision = |{Relevant} ∩ {Retrieved}| / |{Retrieved}|

b) Recall

Recall is the fraction of documents relevant to the query that were in fact retrieved, often expressed as a percentage. It is defined as

Recall = |{Relevant} ∩ {Retrieved}| / |{Relevant}|
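
As a small worked example, both measures can be computed directly from the two sets. The sketch below uses made-up document ids, and the function name `precision_recall` is only illustrative. Returning 0.0 when a set is empty is an assumption of this sketch, since the formulas above are undefined in that case.

```python
def precision_recall(relevant, retrieved):
    """Compute precision and recall from the relevant and retrieved sets."""
    relevant, retrieved = set(relevant), set(retrieved)
    hits = relevant & retrieved  # {Relevant} ∩ {Retrieved}
    # Assumption: report 0.0 instead of dividing by zero for an empty set.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {"d1", "d2", "d3", "d4"}   # documents that answer the query
retrieved = {"d3", "d4", "d5"}        # documents the system returned
precision, recall = precision_recall(relevant, retrieved)
```

Here two of the three retrieved documents are relevant, so precision is 2/3, and two of the four relevant documents were retrieved, so recall is 2/4 = 0.5.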
