International Conference «Mathematical and Information Technologies, MIT-2016»

28 August – 5 September 2016

Vrnjacka Banja, Serbia – Budva, Montenegro

Nugumanova A., Mansurova M., Alimzhanov Y., Baiburin Y.

Using Non-Negative Matrix Factorization for Text Segmentation

Reporter: Nugumanova A.

In topic modeling, non-negative matrix factorization (NMF) maps documents into a space of topics. The basis matrix reduces the dimensionality of the initial vector representations of documents, which is widely used in text classification, clustering and information retrieval. The features matrix gives the distribution of the collection's words over topics. Sorting the elements (words) of each extracted topic by decreasing weight identifies the most valuable (highest-weighted) words of that topic. This work studies whether such valuable words, which we call topic-representative words, can improve the quality of topic segmentation of documents.
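A minimal sketch of the factorization and word-ranking steps follows, in plain Python with no third-party libraries. It makes several assumptions. The input is a documents × terms count matrix V, factored as V ≈ W H with k topics. The rows of W are the reduced k-dimensional document representations, and the rows of H hold each topic's word weights. Mapping W and H onto the authors' "basis" and "features" matrices is our assumption. So is the choice of Lee and Seung multiplicative updates for the Frobenius norm. The toy vocabulary, the counts and the helper names `nmf`, `frobenius_error` and `top_words` are hypothetical illustrations, not the authors' code.

```python
import random

# Hypothetical toy collection: rows are documents, columns are term counts.
VOCAB = ["matrix", "factor", "topic", "football", "goal", "team"]
V = [
    [3, 2, 1, 0, 0, 0],
    [2, 3, 2, 0, 0, 0],
    [0, 0, 0, 3, 2, 2],
    [0, 0, 0, 2, 3, 1],
]


def nmf(V, k, iters=500, seed=0, eps=1e-9):
    """Approximate V (n x m, non-negative) by W (n x k) times H (k x m)
    using Lee and Seung multiplicative updates for the Frobenius norm."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise; eps avoids division by zero
        WtV = [[sum(W[i][a] * V[i][j] for i in range(n)) for j in range(m)]
               for a in range(k)]
        WtW = [[sum(W[i][a] * W[i][b] for i in range(n)) for b in range(k)]
               for a in range(k)]
        WtWH = [[sum(WtW[a][b] * H[b][j] for b in range(k)) for j in range(m)]
                for a in range(k)]
        H = [[H[a][j] * WtV[a][j] / (WtWH[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise, using the updated H
        VHt = [[sum(V[i][j] * H[a][j] for j in range(m)) for a in range(k)]
               for i in range(n)]
        HHt = [[sum(H[a][j] * H[b][j] for j in range(m)) for b in range(k)]
               for a in range(k)]
        WHHt = [[sum(W[i][b] * HHt[b][a] for b in range(k)) for a in range(k)]
                for i in range(n)]
        W = [[W[i][a] * VHt[i][a] / (WHHt[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return W, H


def frobenius_error(V, W, H):
    """Squared Frobenius norm of V - WH."""
    k = len(H)
    return sum((V[i][j] - sum(W[i][a] * H[a][j] for a in range(k))) ** 2
               for i in range(len(V)) for j in range(len(V[0])))


def top_words(H, vocab, t):
    """For each topic (row of H), return its t words in order of decreasing weight."""
    return [[vocab[j] for j in sorted(range(len(row)), key=lambda j: -row[j])[:t]]
            for row in H]


if __name__ == "__main__":
    W, H = nmf(V, k=2)
    print("error:", round(frobenius_error(V, W, H), 4))
    for a, words in enumerate(top_words(H, VOCAB, 3)):
        print("topic", a, words)
    # Each document is now a 2-dimensional row of W instead of a 6-dimensional count vector.
    for d, row in enumerate(W):
        print("doc", d, [round(x, 3) for x in row])
```

Sorting each row of `H` by decreasing weight, as `top_words` does, yields candidate topic-representative words. On block-structured toy data like this, the two topics would typically separate into the algebra words and the football words. Multiplicative updates only reach a local optimum, though, so the exact factors depend on the random seed.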


