TY - GEN
T1 - Clustering patent document in the field of ICT (Information & Communication Technology)
AU - Widodo, Agus
AU - Budi, Indra
PY - 2011
Y1 - 2011
N2 - The current classification of patent data, which refers to the IPC (International Patent Classification) of the WIPO (World Intellectual Property Organization), is deemed not to reflect the classification of the field of ICT (Information & Communication Technology). ICT applications are usually included in sections G (Physics) and H (Electricity). This paper evaluates the eight groupings of patents based on the IPC classes (G01, G06, G09, G11, H01, H03, H04, and H06) of patents registered with the Directorate General of Intellectual Property Rights in Indonesia from 1991 to 2000. The algorithms used for grouping are KMeans, KMeans, Hierarchical Clustering, and combinations of these three algorithms with SVD (Singular Value Decomposition). For external validation, Purity and F-Measure are used, whereas Silhouette is used for internal validation. The experimental results show that SVD improves the clustering results. In addition, using the abstract does not necessarily improve clustering performance, and using phrases as the index does not always yield better clusters than using words. Moreover, no cluster has a purity greater than 50%, which indicates that the existing IPC classification does not accommodate the field of ICT appropriately.
AB - The current classification of patent data, which refers to the IPC (International Patent Classification) of the WIPO (World Intellectual Property Organization), is deemed not to reflect the classification of the field of ICT (Information & Communication Technology). ICT applications are usually included in sections G (Physics) and H (Electricity). This paper evaluates the eight groupings of patents based on the IPC classes (G01, G06, G09, G11, H01, H03, H04, and H06) of patents registered with the Directorate General of Intellectual Property Rights in Indonesia from 1991 to 2000. The algorithms used for grouping are KMeans, KMeans, Hierarchical Clustering, and combinations of these three algorithms with SVD (Singular Value Decomposition). For external validation, Purity and F-Measure are used, whereas Silhouette is used for internal validation. The experimental results show that SVD improves the clustering results. In addition, using the abstract does not necessarily improve clustering performance, and using phrases as the index does not always yield better clusters than using words. Moreover, no cluster has a purity greater than 50%, which indicates that the existing IPC classification does not accommodate the field of ICT appropriately.
KW - Clustering
KW - Information & Communication Technology
KW - KMeans
KW - Patent
KW - Singular Value Decomposition
UR - http://www.scopus.com/inward/record.url?scp=80052572914&partnerID=8YFLogxK
U2 - 10.1109/STAIR.2011.5995789
DO - 10.1109/STAIR.2011.5995789
M3 - Conference contribution
AN - SCOPUS:80052572914
SN - 9781612843537
T3 - 2011 International Conference on Semantic Technology and Information Retrieval, STAIR 2011
SP - 203
EP - 208
BT - 2011 International Conference on Semantic Technology and Information Retrieval, STAIR 2011
T2 - 2011 International Conference on Semantic Technology and Information Retrieval, STAIR 2011
Y2 - 28 June 2011 through 29 June 2011
ER -