Abstract: Mining news blogs is a remarkably new research area. Here, we propose a feature word selection method for clustering online news comments using Hadoop on big data, which achieves structurally superior clustering of online comments. The data are processed on the Hadoop platform to convert the unstructured news comments into a structured format for further classification. A Naive Bayesian classifier is applied before the k-means clustering algorithm. For clustering, the most frequent nouns appearing across all online comments are selected to construct an overall noun set. Local noun sets are constructed from the frequently occurring nouns. The global noun set is the intersection of the local and overall noun sets. The distinct noun set is constructed by removing the global noun set from the corresponding local noun set.
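The noun set construction described above can be illustrated with a minimal Python sketch. It is not the authors' Hadoop implementation. It assumes that nouns have already been extracted from each comment, that a local noun set is built per group of comments (here, hypothetically, per news article), and that "most frequent" means a top-k cutoff. The names `top_nouns`, `build_noun_sets`, `overall_k`, and `local_k`, as well as the sample data, are illustrative only.

```python
from collections import Counter


def top_nouns(noun_lists, k):
    """Return the set of the k most frequent nouns across the given lists."""
    counts = Counter(noun for nouns in noun_lists for noun in nouns)
    return {noun for noun, _ in counts.most_common(k)}


def build_noun_sets(groups, overall_k, local_k):
    """Build overall, local, global, and distinct noun sets.

    groups maps a group name (assumed here: one news article) to a list of
    comments, where each comment is a list of already extracted nouns.
    """
    # Overall noun set: most frequent nouns across every comment.
    all_comments = [nouns for comments in groups.values() for nouns in comments]
    overall = top_nouns(all_comments, overall_k)

    per_group = {}
    for name, comments in groups.items():
        # Local noun set: most frequent nouns within this group only.
        local = top_nouns(comments, local_k)
        # Global noun set: intersection of the local and overall sets.
        global_set = local & overall
        # Distinct noun set: local nouns that remain after removing the global ones.
        distinct = local - global_set
        per_group[name] = {"local": local, "global": global_set, "distinct": distinct}
    return overall, per_group


# Hypothetical sample data: nouns per comment, grouped by article.
groups = {
    "article_a": [["election", "vote", "budget"], ["election", "vote"], ["election"]],
    "article_b": [["match", "goal", "vote"], ["match", "vote"], ["match", "goal"], ["match"]],
}

overall, sets = build_noun_sets(groups, overall_k=2, local_k=3)
for name, noun_sets in sets.items():
    print(name, sorted(noun_sets["distinct"]))
```

Under these assumptions, the distinct noun sets keep only the nouns that separate one group from the rest, which is the property a feature word selection step for k-means would rely on.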
Anu Sunil Kumar, Remya Anand and G. Deepa, 2018. Clustering Online News Comments Using Hadoop on Bigdata. Journal of Engineering and Applied Sciences, 13: 5226-5229.