Description : A set of Java classes for indexing, ranking, and finding similarities. The package includes a simple interface for quick indexing and ranking.
The classes are based on the Apache Lucene and apache-mime frameworks. The Weka and Crawler frameworks have been added to the project for compatibility with other ranking methodologies.
Language : Java / Project file : NetBeans.
First edition : 2014-05
Last update : 2015-06-01 (fixed some bugs in the data retrieval section)
Version : 0.2
License : AGPL-3.0
Direct link to download :
Version 0.2
datacrawler-0.2.tar.gz
Issues
2016-01-10 : The classes don't work with Lucene 4 anymore
2018-04-05 : The project was moved to Git, but this page will also continue to be updated.
https://github.com/tOSuser/DataCrawler