Journal of Engineering and Applied Sciences

Year: 2012
Volume: 7
Issue: 4
Page No. 342 - 347

Study of Ontology or Thesaurus Based Document Clustering and Information Retrieval

Authors : G. Bharathi and D. Venkatesan

Abstract: Document clustering automatically generates clusters from a whole document collection and is used in many fields, including data mining and information retrieval. Clustering text data faces a number of new challenges, the most important being the volume of text data, its high dimensionality, its sparsity and its complex semantics. These characteristics require clustering techniques that scale to large, high-dimensional data and that can handle sparsity and semantics. In the traditional vector space model, the unique words occurring in the document set are used as the features. However, because of the synonym problem and the polysemy problem, such a bag of original words cannot represent the content of a document precisely. Most existing text clustering methods depend only on term strength and document frequency: single terms are used as features for representing the documents and are treated independently, an approach that is easily applied to non-ontological clustering. To address these issues, this study surveys recent research on ontology or thesaurus based document clustering.
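
The synonym problem described in the abstract can be illustrated with a minimal sketch. It is not taken from the surveyed work: the toy thesaurus `THESAURUS`, the helper names and the two sample documents are all hypothetical, chosen only to show how mapping words to shared concepts changes the similarity of two documents in a vector space model.

```python
import math
from collections import Counter

# Hypothetical toy thesaurus (an assumption for illustration only):
# each synonym maps to one shared concept label.
THESAURUS = {"car": "vehicle", "automobile": "vehicle"}


def bag_of_words(text):
    """Traditional representation: every unique word is its own feature."""
    return Counter(text.lower().split())


def bag_of_concepts(text, thesaurus=THESAURUS):
    """Thesaurus-based representation: synonyms collapse into one concept."""
    return Counter(thesaurus.get(word, word) for word in text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


doc1 = "red car"
doc2 = "red automobile"

word_sim = cosine(bag_of_words(doc1), bag_of_words(doc2))
concept_sim = cosine(bag_of_concepts(doc1), bag_of_concepts(doc2))

# The concept-based similarity is higher because "car" and "automobile"
# are treated as the same feature.
print(word_sim, concept_sim)
```

With plain words, "car" and "automobile" count as unrelated features, so the two documents look only partly similar. After the thesaurus lookup they share every feature. This sketch covers synonymy only. Polysemy would additionally require choosing the intended sense of each word, which a flat lookup table cannot do.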

How to cite this article:

G. Bharathi and D. Venkatesan, 2012. Study of Ontology or Thesaurus Based Document Clustering and Information Retrieval. Journal of Engineering and Applied Sciences, 7: 342-347.
