
Published In

Volume 2 Issue 3
October-2016

 

Paper Id

IJIRCT1601013

 

Page Number

71-77

 

Authors

  1. K.Gunavathi
  2. M.Manikandan
  3. S.Thilagavathi

 


Paper Details

Title

Concept Based Text Document Clustering

Abstract

A cluster is a collection of data objects that are similar to one another. Because the objects in a cluster can be treated collectively as one group, clustering may be considered a form of data compression. In some applications clustering is also called data segmentation, because it partitions large data sets into groups according to their similarity. Documents are indexed using related or semantically related keywords. A topic-based weighting scheme is proposed to index the text. It involves identifying topic candidates, determining their importance, and detecting similar and synonymous topics. The indexing algorithm uses topic frequency to determine the importance and existence of the topics. A concept-based weighting scheme is also used to index the documents; it likewise identifies topic candidates, determines their importance, and detects similar and synonymous topics. In this system a number of medical documents are collected and then pre-processed, which includes tokenization and stop word removal. Finally, the topic-based weighting scheme is compared with other indexing schemes to show that topic-based indexing reduces the dimensionality of the data, which is efficient even for very large databases, and provides an understandable description of the discovered clusters by their frequent term sets.
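
The pre-processing and topic-frequency steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop word list, the synonym table (a stand-in for an ontology lookup), and the function names are all assumptions made for illustration, and the abstract does not specify how topic importance is actually computed beyond using topic frequency.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop word list; the paper does not give one.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to", "with"}

# Assumption: a hypothetical synonym table standing in for an ontology lookup.
SYNONYMS = {"tumour": "tumor", "neoplasm": "tumor"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def preprocess(text):
    """Tokenize, then drop stop words."""
    return [t for t in tokenize(text) if t not in STOP_WORDS]


def topic_frequencies(text):
    """Count candidate topics, merging synonyms into one canonical term."""
    canonical = [SYNONYMS.get(t, t) for t in preprocess(text)]
    return Counter(canonical)


doc = "The tumour is a neoplasm of the lung. Lung tumor cells are abnormal."
freqs = topic_frequencies(doc)
print(freqs.most_common(2))  # [('tumor', 3), ('lung', 2)]
```

Merging synonyms before counting is what lets three different surface words contribute to a single topic, which is one plausible way the indexed vocabulary ends up smaller than the raw term set.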

Key Words

Clustering algorithms, Indexing, Topic-based weighting scheme, Concept-based weighting scheme, MeSH ontology.


Citation

K.Gunavathi, M.Manikandan, S.Thilagavathi, "Concept Based Text Document Clustering", IJIRCT, Volume 2, Issue 3, Pages 71-77, October-2016, https://www.ijirct.org/viewPaper.php?paperId=IJIRCT1601013
