Large Scale Data Clustering Using Various-Widths Clustering Approach

Agashe, Harshal R.; Banait, S. S.

Please use this identifier to cite or link to this item: http://localhost:8080/xmlui/handle/123456789/991

Title:	Large Scale Data Clustering Using Various-Widths Clustering Approach
Authors:	Agashe, Harshal R.; Banait, S. S.
Keywords:	Clustering, k-Nearest Neighbor, Tree Index, Large Scale Data, MapReduce
Issue Date:	Jan-2017
Publisher:	International Journal for Scientific Research & Development
Abstract:	The k-nearest neighbor (k-NN) technique is widely used and among the most powerful for clustering, but it incurs a large computational cost on high-dimensional datasets. The proposed work focuses on k-NN based on various clustering widths for large-scale data. We propose a modified k-NN approach that combines a MapReduce parallel computing algorithm with cluster grouping, with the goal of improving performance in terms of clustering time, preprocessing cost, and querying cost on high-dimensional data. First, we present a k-NN method that uses various-widths clustering to efficiently extract the k nearest neighbors of an input query object from the dataset. The dataset is first clustered using a global width; each cluster that satisfies a predefined criterion (i.e., a threshold value) is then recursively clustered using its local width. Earlier work used the triangle inequality to prune unlikely clusters; we instead design a tree-based approach in which cluster centers are grouped into a tree-based index to maximize the number of clusters pruned. To reduce processing time and clustering time, we design a parallel computing algorithm based on MapReduce.
URI:	http://192.168.3.232:8080/jspui/handle/123456789/991
ISSN:	2321-0613
Appears in Collections:	PG - Students

Files in This Item:

File	Description	Size	Format
IJSRDV5I10324.pdf	Large Scale Data Clustering Using Various-Widths Clustering Approach	330.08 kB	Adobe PDF
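
The clustering and pruning steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes Euclidean distance, treats the "threshold value" as a maximum cluster size (max_size), and takes half of a cluster's radius as its "local width"; all function and parameter names are hypothetical. It demonstrates the earlier triangle-inequality pruning that the abstract mentions, not the tree-based index or the MapReduce stage.

```python
import heapq
import math


def dist(a, b):
    # Euclidean distance (an assumption; the abstract does not name a metric).
    return math.dist(a, b)


def fixed_width_clusters(points, width):
    """One pass: a point joins the nearest center within `width`, else starts a cluster."""
    clusters = []  # list of (center, members)
    for p in points:
        best, best_d = None, width
        for c in clusters:
            d = dist(p, c[0])
            if d <= best_d:
                best, best_d = c, d
        if best is None:
            clusters.append((p, [p]))
        else:
            best[1].append(p)
    return clusters


def various_widths(points, width, max_size):
    """Re-cluster any cluster larger than `max_size` with a smaller local width."""
    out = []  # list of (center, radius, members)
    for center, members in fixed_width_clusters(points, width):
        radius = max(dist(center, m) for m in members)
        if len(members) > max_size and radius > 0:
            # Assumed local width: half the radius, which always splits the cluster.
            out.extend(various_widths(members, radius / 2, max_size))
        else:
            out.append((center, radius, members))
    return out


def knn(clusters, q, k):
    """k nearest neighbors of q, skipping clusters that cannot hold a closer point."""
    # Triangle inequality: every member m satisfies dist(q, m) >= dist(q, center) - radius.
    order = sorted(clusters, key=lambda c: dist(q, c[0]) - c[1])
    heap = []  # max-heap of the k best so far, stored as (-distance, point)
    for center, radius, members in order:
        lower = dist(q, center) - radius
        if len(heap) == k and lower >= -heap[0][0]:
            break  # clusters are sorted by lower bound, so the rest are pruned too
        for m in members:
            d = dist(q, m)
            if len(heap) < k:
                heapq.heappush(heap, (-d, m))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, m))
    return sorted((-nd, m) for nd, m in heap)


if __name__ == "__main__":
    pts = [((i * 37 % 101) / 101, (i * 53 % 103) / 103) for i in range(300)]
    clusters = various_widths(pts, width=0.3, max_size=20)
    print(knn(clusters, (0.5, 0.5), k=3))
```

Points are passed as tuples of floats. Sorting clusters by their lower bound lets the search stop at the first cluster that cannot improve the current k-th distance, which is the pruning effect that the tree-based index described in the abstract aims to strengthen.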
