
Published in:

Volume 2 Issue 3
October-2016
eISSN: 2454-5988

 

Unique Identifier

 

IJIRCT1601013


 

Page Number

 
71-77

 

 

Paper Details

Title

Concept Based Text Document Clustering


Abstract

A cluster is a collection of data objects that are similar to one another. Because the objects in a cluster can be treated collectively as one group, clustering may be considered a form of data compression. In some applications it is also called data segmentation, because it partitions large data sets into groups according to their similarity. Documents are indexed by related or semantically related keywords. A topic-based weighting scheme is proposed to index the text: it identifies topic candidates, determines their importance, and detects similar and synonymous topics. The indexing algorithm uses topic frequency to determine the importance and existence of the topics. A concept-based weighting scheme is used to index the documents; it likewise identifies topic candidates, determines their importance, and detects similar and synonymous topics. In this system, a number of medical documents are collected and then pre-processed, which includes tokenization and stop word removal. Finally, the topic-based weighting scheme is compared with other indexing schemes to show that topic-based indexing reduces the dimensionality of the data, remains efficient even for very large databases, and provides an understandable description of the discovered clusters by their frequent term sets.
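
The pre-processing and topic-frequency steps named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the stop-word list, the synonym table (standing in for an ontology lookup), and the function names `preprocess` and `topic_frequencies` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to", "with"}

# Hypothetical synonym table standing in for an ontology lookup.
SYNONYMS = {"tumour": "tumor", "neoplasm": "tumor"}


def preprocess(text):
    """Lowercase the text, tokenize on runs of letters, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def topic_frequencies(text):
    """Count tokens after mapping synonyms onto one canonical topic."""
    canonical = [SYNONYMS.get(t, t) for t in preprocess(text)]
    return Counter(canonical)


doc = "The tumour is a neoplasm; a tumor of the liver."
freqs = topic_frequencies(doc)
print(freqs.most_common(2))  # [('tumor', 3), ('liver', 1)]
```

Merging synonyms before counting is what shrinks the vocabulary here: three surface forms collapse into one topic, which is one simple way the dimensionality reduction claimed in the abstract could arise.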


Key Words

Clustering algorithms, Indexing, Topic-based weighting scheme, Concept-based weighting scheme, MeSH ontology.

 
