Asian Journal of Information Technology

Year: 2011
Volume: 10
Issue: 8
Page No. 341 - 347

The Use of Hartigan Index for Initializing K-Means++ in Detecting Similar Texts of Clustered Documents as a Plagiarism Indicator

Authors: Diana Purwitasari, I. Wayan Surya Priantara, Putu Yuwono Kusmawan, Umi Laili Yuhana and Daniel Oranova Siahaan

Abstract: Plagiarism is increasingly alarming, especially in the field of education, where many written works contain passages plagiarized from other people's works. Similar sentences, as a plagiarism indicator, can be detected with Winnowing, an n-gram-based hashing algorithm. Winnowing generates a document fingerprint by converting the text of a document into a collection of hash values, and similar fingerprints between documents indicate similar texts and therefore possible plagiarism. Because plagiarism usually occurs among documents on similar topics, documents with similar topics should first be clustered. K-means++ is a clustering algorithm that requires the number of clusters as input, and the Hartigan index is used to recommend that number. After the documents were clustered, a document's fingerprint was first compared with the cluster fingerprints rather than with every individual document. The document was then compared only with the members of the closest cluster selected in that first comparison.
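The fingerprint-and-compare pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`normalize`, `kgram_hashes`, `winnow`, `jaccard`, `containment`, `detect`) are hypothetical. The k-gram length `k=5` and window size `w=4` are assumed values. CRC32 from `zlib` stands in for whatever rolling hash the original method uses. A cluster fingerprint is assumed to be the union of its members' fingerprints, and the clusters themselves (produced in the paper by K-means++ with a Hartigan-index cluster count) are taken as given input.

```python
import re
import zlib
from typing import Dict, List, Set, Tuple


def normalize(text: str) -> str:
    """Lowercase and drop everything except letters and digits (assumed preprocessing)."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def kgram_hashes(text: str, k: int) -> List[int]:
    """Hash every character k-gram; CRC32 is a stand-in for a rolling hash."""
    s = normalize(text)
    return [zlib.crc32(s[i:i + k].encode("utf-8")) for i in range(len(s) - k + 1)]


def winnow(text: str, k: int = 5, w: int = 4) -> Set[int]:
    """Winnowing: keep the minimum hash of every window of w consecutive k-gram hashes.
    Only hash values are kept (positions are dropped), so the fingerprint is a set."""
    hashes = kgram_hashes(text, k)
    if not hashes:
        return set()
    if len(hashes) < w:
        return {min(hashes)}  # text shorter than one full window
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Shared fingerprints divided by all distinct fingerprints of both documents."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(query: Set[int], other: Set[int]) -> float:
    """Fraction of the query's fingerprints that also occur in `other` (assumed cluster score)."""
    return len(query & other) / len(query) if query else 0.0


def detect(query: str, clusters: Dict[str, Dict[str, str]],
           k: int = 5, w: int = 4) -> Tuple[str, List[Tuple[str, float]]]:
    """Two-stage search: pick the closest cluster, then rank only its members."""
    q = winnow(query, k, w)
    doc_fps = {cid: {name: winnow(text, k, w) for name, text in docs.items()}
               for cid, docs in clusters.items()}
    # Assumption: a cluster fingerprint is the union of its members' fingerprints.
    cluster_fps = {cid: set().union(*fps.values()) for cid, fps in doc_fps.items()}
    # Stage 1: compare the query with cluster fingerprints, not with every document.
    best = max(cluster_fps, key=lambda cid: containment(q, cluster_fps[cid]))
    # Stage 2: compare the query only with the members of the closest cluster.
    ranked = sorted(((name, jaccard(q, fp)) for name, fp in doc_fps[best].items()),
                    key=lambda pair: pair[1], reverse=True)
    return best, ranked


if __name__ == "__main__":
    clusters = {
        "c1": {"a.txt": "The quick brown fox jumps over the lazy dog.",
               "b.txt": "A quick brown fox leaped over a sleeping dog."},
        "c2": {"c.txt": "K-means++ picks initial centroids with probability "
                        "proportional to squared distance."},
    }
    best, ranked = detect("the quick brown fox jumps over the lazy dog", clusters)
    print(best, ranked)
```

Run on the toy data, the sketch selects cluster `c1` and ranks `a.txt` first with a score of 1.0, because its normalized text is identical to the query. Restricting the second stage to one cluster is what saves work: with C clusters of roughly n/C documents each, a query is compared against about C + n/C fingerprints instead of n.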

How to cite this article:

Diana Purwitasari, I. Wayan Surya Priantara, Putu Yuwono Kusmawan, Umi Laili Yuhana and Daniel Oranova Siahaan, 2011. The Use of Hartigan Index for Initializing K-Means++ in Detecting Similar Texts of Clustered Documents as a Plagiarism Indicator. Asian Journal of Information Technology, 10: 341-347.
