Abstract
On the Web, browsing and searching categories is a popular method of finding documents. Two well-known category-based search systems are the Yahoo! and DMOZ hierarchies, which are maintained by experts who assign documents to categories. However, manual categorisation by experts is costly, subjective, and not scalable with the increasing volumes of data that must be processed. Several methods have been investigated for effective automatic text categorisation. These include selection of categorisation methods, selection of pre-categorised training samples, use of hierarchies, and selection of document fragments or features. In this paper, we further investigate categorisation into Web hierarchies and the role of hierarchical information in improving categorisation effectiveness. We introduce new strategies to reduce errors in hierarchical categorisation. In particular, we propose novel techniques that shift the assignment into higher level categories when lower level assignment is uncertain. Our results show that absolute error rates can be reduced by over 2%.
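The idea of shifting an assignment to a higher-level category when the lower-level assignment is uncertain can be illustrated with a minimal sketch. It is not the paper's actual method: the abstract does not specify how uncertainty is measured, so the per-category `confidence` scores, the `threshold` value, the `parent_of` map, and the function name `back_off_assignment` are all hypothetical and serve only to show the general shape of such a strategy.

```python
from typing import Dict, Optional


def back_off_assignment(
    leaf: str,
    parent_of: Dict[str, Optional[str]],
    confidence: Dict[str, float],
    threshold: float = 0.5,
) -> str:
    """Walk from the predicted leaf category towards the root and return
    the first category whose confidence meets the threshold.

    All inputs are illustrative assumptions: `parent_of` maps each category
    to its parent (None for the root), and `confidence` holds a classifier
    score per category. The root is returned if nothing else qualifies.
    """
    node = leaf
    # Move up one level while the current assignment is uncertain
    # and a parent category exists.
    while parent_of.get(node) is not None and confidence.get(node, 0.0) < threshold:
        node = parent_of[node]
    return node


# Hypothetical three-level hierarchy: Top > Science > Physics.
parent_of = {"Top": None, "Science": "Top", "Physics": "Science"}

# Uncertain at the leaf, confident one level up.
uncertain_leaf = {"Physics": 0.3, "Science": 0.8, "Top": 1.0}
shifted = back_off_assignment("Physics", parent_of, uncertain_leaf)

# Confident at the leaf: the assignment stays where it is.
confident_leaf = {"Physics": 0.7, "Science": 0.8, "Top": 1.0}
kept = back_off_assignment("Physics", parent_of, confident_leaf)
```

In this sketch a document that would be placed in `Physics` with low confidence is assigned to `Science` instead, trading specificity for a lower chance of a wrong leaf assignment. The threshold controls that trade-off and would need to be tuned on held-out data.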
Original language | English |
---|---|
Pages | 525-531 |
Number of pages | 7 |
Publication status | Published - 2002 |
Event | Proceedings of the Eleventh International Conference on Information and Knowledge Management (CIKM 2002), McLean, VA, United States. Duration: 4 Nov 2002 → 9 Nov 2002 |
Conference
Conference | Proceedings of the Eleventh International Conference on Information and Knowledge Management (CIKM 2002) |
---|---|
Country/Territory | United States |
City | McLean, VA |
Period | 4 Nov 2002 → 9 Nov 2002 |
Keywords
- Categorisation
- Error reduction
- Hierarchical categorisation