Strategies for minimising errors in hierarchical web categorisation

Wahyu Catur Wibowo, Hugh E. Williams

Research output: Contribution to conference › Paper › peer-review

11 Citations (Scopus)

Abstract

On the Web, browsing and searching categories is a popular method of finding documents. Two well-known category-based search systems are the Yahoo! and DMOZ hierarchies, which are maintained by experts who assign documents to categories. However, manual categorisation by experts is costly, subjective, and not scalable with the increasing volumes of data that must be processed. Several methods have been investigated for effective automatic text categorisation. These include selection of categorisation methods, selection of pre-categorised training samples, use of hierarchies, and selection of document fragments or features. In this paper, we further investigate categorisation into Web hierarchies and the role of hierarchical information in improving categorisation effectiveness. We introduce new strategies to reduce errors in hierarchical categorisation. In particular, we propose novel techniques that shift the assignment into higher level categories when lower level assignment is uncertain. Our results show that absolute error rates can be reduced by over 2%.
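The abstract does not say how uncertainty is measured or how far up the hierarchy an assignment should move. The following is therefore only a minimal sketch of the general idea of backing off to a higher-level category, under stated assumptions. Several details are hypothetical and not taken from the paper: a classifier that outputs one score per category, categories named by `/`-separated paths, a margin rule that treats the decision as uncertain when the top two scores are closer than `margin`, and a backoff target equal to the deepest category the two candidates share. The helper names `common_ancestor` and `assign_with_backoff`, the category names, and the 0.1 threshold are illustrative only. This is not the authors' method.

```python
from typing import Dict


def common_ancestor(a: str, b: str) -> str:
    """Deepest shared ancestor of two '/'-separated category paths ('' if none)."""
    shared = []
    for x, y in zip(a.split("/"), b.split("/")):
        if x != y:
            break
        shared.append(x)
    return "/".join(shared)


def assign_with_backoff(scores: Dict[str, float], margin: float = 0.1) -> str:
    """Assign to the best-scoring category, or back off to a higher level.

    Assumed uncertainty rule (hypothetical): the decision is uncertain when the
    best and second-best scores differ by less than `margin`. In that case the
    document goes to the deepest ancestor the two candidates share.
    """
    if not scores:
        raise ValueError("no category scores given")
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best, best_score = ranked[0]
    if len(ranked) < 2:
        return best
    runner_up, second_score = ranked[1]
    if best_score - second_score >= margin:
        return best  # confident: keep the lower-level assignment
    ancestor = common_ancestor(best, runner_up)
    # With no shared ancestor there is no higher category to shift to.
    return ancestor if ancestor else best


# Two close sibling scores: the assignment shifts up to their parent.
print(assign_with_backoff({"Computers/Software": 0.46,
                           "Computers/Hardware": 0.41,
                           "Arts/Music": 0.13}))
```

The shift trades specificity for accuracy. A document placed in `Computers` rather than the wrong child of `Computers` is less precise but is no longer misfiled. Raising `margin` sends more documents to higher-level categories.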

Original language: English
Pages: 525-531
Number of pages: 7
Publication status: Published - 1 Dec 2002
Event: Proceedings of the Eleventh International Conference on Information and Knowledge Management (CIKM 2002) - McLean, VA, United States
Duration: 4 Nov 2002 - 9 Nov 2002

Conference

Conference: Proceedings of the Eleventh International Conference on Information and Knowledge Management (CIKM 2002)
Country: United States
City: McLean, VA
Period: 4/11/02 - 9/11/02

Keywords

  • Categorisation
  • Error reduction
  • Hierarchical categorisation

