Paper Title
Improving Efficiency of Similarity of Document Network Using Bisect K-Means
Abstract
A new approach is proposed for identifying clusters of documents that describe similar clinical conditions,
using bisect k-means and topic detection. Identifying cases with similar clinical characteristics in a
database of clinical documents is a common problem in clinical informatics. The main goal of document clustering is to
identify the document clusters that describe similar clinical cases, and to identify the topics of the documents that
belong to the same cluster. To achieve this goal, the system first builds a document network by linking the reports in
the VAERS dataset. We then apply the bisect k-means clustering algorithm to this network to find similar
documents. To evaluate the clustering algorithm, the system uses two performance parameters: memory and time.
The results show that bisect k-means clustering outperforms the k-means algorithm used in the previously
available system. We are also working on a topic detection procedure for the document clusters. Topic detection
will make it easier to understand the overall content of the documents that belong to the same cluster.
Index Terms - Bisect k-means clustering, topic detection, document clusters.
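
The bisect k-means step named in the abstract can be illustrated with a minimal sketch. This is not the system's implementation: it assumes each document has already been turned into a numeric vector (toy 2-D points stand in for them here), it assumes the cluster with the largest sum of squared errors is the one split next, and it seeds each split with two far-apart points so the run is deterministic. The function names (two_means, bisecting_kmeans, sse) are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sse(points):
    """Sum of squared distances from each point to the cluster mean."""
    center = mean(points)
    return sum(sq_dist(p, center) for p in points)


def two_means(points, iters=20):
    """Split one cluster into two with plain 2-means.

    Seeding (an assumption of this sketch): the point farthest from the
    cluster mean, then the point farthest from that first seed.
    """
    first = max(points, key=lambda p: sq_dist(p, mean(points)))
    second = max(points, key=lambda p: sq_dist(p, first))
    centers = [first, second]
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            nearer = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            groups[nearer].append(p)
        if not groups[0] or not groups[1]:
            break
        centers = [mean(groups[0]), mean(groups[1])]
    if not groups[0] or not groups[1]:
        # Degenerate case (e.g. all points identical): split in half.
        half = len(points) // 2
        return points[:half], points[half:]
    return groups


def bisecting_kmeans(points, k):
    """Start from one cluster and repeatedly bisect until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        splittable = [c for c in clusters if len(c) >= 2]
        if not splittable:
            break
        worst = max(splittable, key=sse)  # assumed selection rule: largest SSE
        clusters.remove(worst)
        clusters.extend(two_means(worst))
    return clusters


# Toy stand-ins for document vectors: three well-separated groups.
toy_points = [[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10],
              [20, 0], [21, 0], [20, 1]]
toy_clusters = bisecting_kmeans(toy_points, 3)
```

Because each step runs 2-means on a single cluster rather than reassigning every point against all k centers, a split touches only part of the data, which is one commonly cited reason bisecting variants can be cheaper than standard k-means.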