16 January 2006 Document clustering: applications in a collaborative digital library
Fuad Rahman, Aman Kumar, Yuilya Tarnikova, Hassan Alam
Proceedings Volume 6067, Document Recognition and Retrieval XIII; 60670K (2006) https://doi.org/10.1117/12.650161
Event: Electronic Imaging 2006, 2006, San Jose, California, United States
Abstract
This paper introduces a document clustering method for a commercial document repository, FileShare(R). FileShare(R) is a collaborative digital library that lets groups of people working on common projects share and access documents through a standard web browser (e.g. Microsoft(R) Internet Explorer(R), Netscape(R) or Opera(R)). As the number of documents in a digital library grows, displaying them in this environment becomes a major challenge. The paper proposes a clustering method that uses a modified version of the traditional K-Means algorithm, together with lexical chaining, to categorize documents in the FileShare(R) repository by theme. The proposed algorithm is unsupervised and has shown very high accuracy in a typical experimental setup.
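For readers unfamiliar with the baseline, the following is a minimal sketch of traditional K-Means applied to documents. It is not the authors' modified algorithm: the abstract does not describe the modification, and the lexical chaining step is replaced here by plain word counts. The function names (`tf_vector`, `cosine`, `mean_vector`, `kmeans`), the cosine similarity measure, the farthest-first seeding, and the sample documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term counts scaled to unit length (assumed representation)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (their dot product)."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def mean_vector(vectors):
    """Centroid of a cluster: summed member vectors, rescaled to unit length."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values()))
    return {t: w / norm for t, w in total.items()} if norm else {}


def kmeans(docs, k, max_iter=20):
    """Plain K-Means over documents; returns one cluster label per document."""
    vectors = [tf_vector(d) for d in docs]
    # Deterministic farthest-first seeding (an assumption, chosen so the
    # sketch is reproducible): each new seed is the document least similar
    # to the seeds already picked.
    centroids = [vectors[0]]
    while len(centroids) < k:
        far = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(far)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document joins its most similar centroid.
        new_labels = [
            max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: recompute each centroid from its current members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = mean_vector(members)
    return labels


# Hypothetical sample documents, invented for this illustration.
docs = [
    "budget report finance quarterly budget",
    "finance budget forecast quarterly",
    "network server router firewall network",
    "server firewall network outage",
]
labels = kmeans(docs, 2)
```

In this sketch the two finance documents and the two networking documents end up in separate clusters because they share no terms. A lexical-chain representation, as named in the abstract, would presumably group related words rather than rely on exact term overlap, but how the paper does this is not stated here.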
© (2006) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Fuad Rahman, Aman Kumar, Yuilya Tarnikova, and Hassan Alam "Document clustering: applications in a collaborative digital library", Proc. SPIE 6067, Document Recognition and Retrieval XIII, 60670K (16 January 2006); https://doi.org/10.1117/12.650161
KEYWORDS
Digital libraries
Databases
Distance measurement
Internet
Genetic algorithms
Human-machine interfaces
Visualization