| Relevance Ranking |
| C2 is among the few advanced library systems that offer relevance searching. Most users look for information about a particular topic, and their queries are often vague. Relevance retrieval provides a mechanism by which the system determines how relevant each retrieved record is to the user’s original query. |
| Term Weighting |
| Relevance ranking is a process by which C2 assigns a significance value (a weight) to each search term used in the query. Each document retrieved during a search is then assigned a weighted value, and the retrieved documents are ranked in descending order of that value. The ranking algorithm used in C2 is based on the inverse document frequency (IDF) formula: a search term receives a higher weight the less frequently it occurs in the database as a whole and the more frequently it occurs within an individual document. The term weighting is calculated on a logarithmic scale. |
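| The weighting described above can be illustrated with a minimal sketch. The exact formula used by C2 is not given here, so the code assumes a common textbook form: a term's weight is log(N / df), where N is the number of documents and df is the number of documents containing the term, and a document's score is the sum of (occurrences in the document × term weight) over the query terms. The names `idf_weights` and `rank` and the whitespace tokenization are hypothetical, chosen only for illustration. |

```python
import math
from collections import Counter


def idf_weights(documents):
    """Return {term: log(N / df)}; df = number of documents containing the term.

    Assumed formula for illustration, not C2's actual implementation.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for text in documents:
        # Count each term once per document, however often it repeats.
        doc_freq.update(set(text.lower().split()))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def rank(query, documents):
    """Return (doc_index, score) pairs for matching documents, best first."""
    weights = idf_weights(documents)
    query_terms = query.lower().split()
    scored = []
    for doc_id, text in enumerate(documents):
        term_counts = Counter(text.lower().split())
        # More occurrences in the document and a rarer term both raise the score.
        score = sum(term_counts[t] * weights.get(t, 0.0) for t in query_terms)
        if score > 0:
            scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = [
        "relevance ranking relevance",
        "relevance search",
        "boolean search",
    ]
    for doc_id, score in rank("relevance", docs):
        print(doc_id, round(score, 3))
```

| In this sketch a term that appears in every document gets a weight of log(1) = 0, so it contributes nothing to the ranking, which matches the idea that common terms say little about relevance. |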
| Dynamic Maintenance |
| The database term metrics are maintained dynamically by the C2 relevance indexing system, so that term weightings always reflect the true weight of the term in the database. The term metrics are automatically recalculated to reflect the database content as it changes from day to day. |
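| One way to picture dynamic maintenance is an index that adjusts its per-term document counts whenever a record is added or removed, so that a weight computed afterwards reflects the current database. The sketch below is an assumption-laden illustration, not the C2 indexing system: the class `TermMetrics`, its methods, and the log(N / df) weight are all hypothetical. |

```python
import math
from collections import Counter


class TermMetrics:
    """Keeps document frequencies current as records come and go (illustrative)."""

    def __init__(self):
        self.doc_terms = {}        # doc_id -> set of distinct terms in that record
        self.doc_freq = Counter()  # term -> number of records containing it

    def add(self, doc_id, text):
        """Add a record, replacing any existing record with the same id."""
        if doc_id in self.doc_terms:
            self.remove(doc_id)
        terms = set(text.lower().split())
        self.doc_terms[doc_id] = terms
        self.doc_freq.update(terms)

    def remove(self, doc_id):
        """Remove a record and decrement the counts of the terms it held."""
        for term in self.doc_terms.pop(doc_id, set()):
            self.doc_freq[term] -= 1
            if self.doc_freq[term] == 0:
                del self.doc_freq[term]

    def weight(self, term):
        """Assumed weight log(N / df), computed from the current counts."""
        df = self.doc_freq.get(term, 0)
        if df == 0:
            return 0.0
        return math.log(len(self.doc_terms) / df)


if __name__ == "__main__":
    metrics = TermMetrics()
    metrics.add(1, "relevance ranking")
    metrics.add(2, "boolean search")
    print(round(metrics.weight("relevance"), 3))  # one of two records contains it
    metrics.add(3, "relevance search")
    print(round(metrics.weight("relevance"), 3))  # now two of three records
```

| Because the weight is derived from the counts at query time, no separate batch recalculation is needed in this sketch; a production system might instead cache weights and refresh them on a schedule. |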
| Benefits to Users |
| Retrieved records are ranked according to their relevance to the user’s request. Records that are highly relevant are ranked first, followed by records that are less relevant. Since the most relevant records appear at the top of the list, users do not need to look through long lists to find records of interest. |
| Boolean Limitations |
| Most library systems in use today employ Boolean searching techniques to control the search results. One of the main problems with using Boolean logic in search queries is that it is too complex for untrained users. As a result, intermediaries and trained staff carry out most online searches. Another problem is that a Boolean search effectively splits the database into two sets of documents: one set that matches the query and another that does not. There is no mechanism by which the user can judge the degree of relevance of the retrieved documents. |
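| The two-set split can be seen in a small sketch of a Boolean AND query. The function name `boolean_and` and the whitespace tokenization are hypothetical, used only to illustrate the point that every matching record is treated as equally good. |

```python
def boolean_and(query, documents):
    """Split documents into those containing every query term and the rest."""
    terms = set(query.lower().split())
    matched, unmatched = [], []
    for doc_id, text in enumerate(documents):
        words = set(text.lower().split())
        # A record either satisfies the query or it does not; there is no score.
        if terms <= words:
            matched.append(doc_id)
        else:
            unmatched.append(doc_id)
    return matched, unmatched


if __name__ == "__main__":
    docs = [
        "relevance ranking in library systems",
        "relevance relevance ranking ranking",
        "boolean search",
    ]
    print(boolean_and("relevance ranking", docs))
```

| Records 0 and 1 both land in the matched set with nothing to distinguish them, even though record 1 mentions the query terms more often. |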