International Conference «Mathematical and Information Technologies, MIT-2016»

Mansurova M., Barakhnin V.B., Khibatkhanuly E., Aubakirov S., Musina A.

Parallel Text Document Clustering based on Genetic Algorithm

Reporter: Barakhnin V.B.

This work describes a parallel implementation of the FRIS-Tax algorithm for clustering a corpus of documents. The algorithm is based on evaluating the similarity between objects in a competitive situation, which leads to the notion of the competitive similarity function. The similarity measure is defined on attributes of the bibliographic descriptions of the documents. The running time of FRIS-Tax increases exponentially with the number of articles. For this reason, parallel computing technologies were used to speed up two stages of the work. The first is the selection of individuals in the genetic algorithm: the parallel genetic algorithm is implemented on the high-performance platform MPJ Express. The second is the clustering algorithm itself. Load testing revealed the two slowest stages of FRIS-Tax: finding the first pillar and finding the next pillar. Java 8 Streams were used to speed up these stages. To monitor the execution of the algorithm, we developed a web interface that shows the current values of the genetic parameters and the achieved values of the fitness function. The work presents measured execution times that demonstrate the advantage of the parallel implementation.
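The first-pillar search mentioned above can be illustrated with a minimal sketch that uses Java 8 parallel streams. The abstract does not give the formulas, so the sketch assumes the form of the competitive similarity function commonly used in the FRiS literature, F = (r2 - r1) / (r1 + r2), where r1 is the distance from an object to its own pillar and r2 is the distance to a competitor. It further assumes a virtual competitor at a fixed distance rStar from every object. The class name, the method names and the precomputed distance matrix are hypothetical and are not taken from the authors' implementation.

```java
import java.util.stream.IntStream;

// Illustrative sketch only: names and the exact formula are assumptions.
final class FrisFirstPillarSketch {

    // Competitive similarity of an object to its own pillar (distance rOwn)
    // against a competitor at distance rCompetitor. Values lie in [-1, 1].
    static double fris(double rOwn, double rCompetitor) {
        double sum = rOwn + rCompetitor;
        return sum == 0.0 ? 0.0 : (rCompetitor - rOwn) / sum;
    }

    // Mean competitive similarity over all objects if `candidate` were the
    // only pillar and the competitor were virtual, at distance rStar.
    static double meanFris(double[][] dist, int candidate, double rStar) {
        return IntStream.range(0, dist.length)
                .mapToDouble(z -> fris(dist[candidate][z], rStar))
                .average()
                .orElse(Double.NEGATIVE_INFINITY);
    }

    // Scores every candidate in parallel, then picks the best one.
    // Candidates are independent, so the scoring needs no synchronization.
    static int findFirstPillar(double[][] dist, double rStar) {
        int n = dist.length;
        if (n == 0) {
            throw new IllegalArgumentException("empty distance matrix");
        }
        double[] score = IntStream.range(0, n)
                .parallel()
                .mapToDouble(c -> meanFris(dist, c, rStar))
                .toArray();
        int best = 0;
        for (int c = 1; c < n; c++) {
            if (score[c] > score[best]) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Four objects on a line at positions 0, 1, 2 and 10.
        double[] pos = {0, 1, 2, 10};
        double[][] dist = new double[pos.length][pos.length];
        for (int i = 0; i < pos.length; i++) {
            for (int j = 0; j < pos.length; j++) {
                dist[i][j] = Math.abs(pos[i] - pos[j]);
            }
        }
        System.out.println("first pillar index: " + findFirstPillar(dist, 3.0));
    }
}
```

With the toy data in `main`, the object at position 1 is selected, since it is the most central member of the dense group. Scoring each candidate costs a pass over all objects, which is why this stage is a natural target for data parallelism.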
