Simple and Accurate Feature Selection for Hierarchical Categorisation

Wahyu Catur Wibowo, Hugh E. Williams

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

19 Citations (Scopus)

Abstract

Categorisation of digital documents is useful for organisation and retrieval. While document categories can be a set of unstructured category labels, some document categories are hierarchically structured. This paper investigates automatic hierarchical categorisation and, specifically, the role of features in the development of more effective categorisers. We show that a good hierarchical machine learning-based categoriser can be developed using small numbers of features from pre-categorised training documents. Overall, we show that by using a few terms, categorisation accuracy can be improved substantially: unstructured leaf level categorisation can be improved by up to 8.6%, while top-down hierarchical categorisation accuracy can be improved by up to 12%. In addition, unlike other feature selection models - which typically require different feature selection parameters for categories at different hierarchical levels - our technique works equally well for all categories in a hierarchical structure. We conclude that, in general, more accurate hierarchical categorisation is possible by using our simple feature selection technique.

Original language: English
Title of host publication: Proceedings of the 2002 ACM Symposium on Document Engineering
Editors: R. Furuta, J.I. Maletic, E. Munson
Publisher: Association for Computing Machinery (ACM)
Pages: 111-118
Number of pages: 8
ISBN (Print): 1581135947, 9781581135947
Publication status: Published - 2002
Event: Proceedings of the 2002 ACM Symposium on Document Engineering in Conjunction with 11th ACM International Conference on Information and Knowledge Management (CIKM 2002) - McLean, VA, United States
Duration: 8 Nov 2002 → 9 Nov 2002

Publication series

Name: Proceedings of the 2002 ACM Symposium on Document Engineering

Conference

Conference: Proceedings of the 2002 ACM Symposium on Document Engineering in Conjunction with 11th ACM International Conference on Information and Knowledge Management (CIKM 2002)
Country/Territory: United States
City: McLean, VA
Period: 8/11/02 → 9/11/02

Keywords

  • Categorisation
  • Error Reduction
  • Hierarchical Categorisation
  • Web Hierarchies
