International Conference «Mathematical and Information Technologies, MIT-2016»

28 August – 5 September 2016

Vrnjacka Banja, Serbia – Budva, Montenegro

Barakhnin V.B., Kozhemyakina O.Y., Zabaykin A.V., Pastushkov I.S.

An algorithm for the automated determination of the genre type and stylistic coloring of texts in the Russian language

Reporter: Barakhnin V.B.

In the automated analysis of natural-language texts, the problem arises of determining their genre type and stylistic coloring. The first stage in solving this problem is the development of appropriate classifiers. Russian texts (primarily literary ones) are traditionally divided into the high, neutral and low styles, a division that goes back to the works of M. V. Lomonosov. Historically, each style is characterized by the ratio of Old Slavonic (Church Slavonic) words to Russian words (the group of words common to both Old Slavonic and Russian is considered separately), by the proportion of archaisms, and by the use of certain syntactic constructions. In classical theory, in turn, the genre of a literary text strictly dictates the choice of style. The classical lyric genres (according to the most complete classification, given in the works of D. M. Magomedova) comprise the system of canonical genres, namely the Ode, the Elegy, the Idyll, the Epistle (Message) and the Ballad, complemented by the non-canonical Fragment and Short story in verse.
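The stylistic indicators named above (the ratio of Old Slavonic to Russian words, with common words counted separately, and the share of archaisms) can be illustrated as simple token proportions. The sketch below is not the authors' method: the lexicons `OLD_SLAVONIC`, `RUSSIAN_ONLY`, `COMMON` and `ARCHAISMS` are hypothetical toy sets, the input is assumed to be already lemmatized, and the rule in `guess_style` is a hypothetical simplification introduced only for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicons of lemmas; a real system would need full dictionaries.
OLD_SLAVONIC = {"глас", "град", "брег", "злато", "древо"}
RUSSIAN_ONLY = {"голос", "город", "берег", "золото", "дерево"}
COMMON = {"небо", "море", "бог", "слава"}  # words shared by both languages
ARCHAISMS = {"око", "десница", "ланиты", "глас"}


@dataclass
class StyleFeatures:
    slavonic: float   # share of Old Slavonic words
    russian: float    # share of Russian-only words
    common: float     # share of words common to both languages
    archaisms: float  # share of archaisms


def tokenize(text: str) -> list[str]:
    # For str patterns, \w matches Cyrillic letters in Python 3
    return re.findall(r"\w+", text.lower())


def style_features(text: str) -> StyleFeatures:
    tokens = tokenize(text)
    n = len(tokens) or 1  # avoid division by zero on empty input

    def share(lexicon: set[str]) -> float:
        return sum(t in lexicon for t in tokens) / n

    return StyleFeatures(
        slavonic=share(OLD_SLAVONIC),
        russian=share(RUSSIAN_ONLY),
        common=share(COMMON),
        archaisms=share(ARCHAISMS),
    )


def guess_style(f: StyleFeatures) -> str:
    # Hypothetical rule, for illustration only: Slavonic words dominate -> high,
    # Russian-only words dominate -> low, otherwise neutral.
    if f.slavonic > f.russian:
        return "high"
    if f.russian > f.slavonic:
        return "low"
    return "neutral"


if __name__ == "__main__":
    feats = style_features("глас град небо голос")
    print(feats, guess_style(feats))
```

For real Russian text the tokens would first have to be lemmatized, since an inflected form such as «брегом» would otherwise not match the dictionary entry «брег».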

However, in practice it frequently happens that a text whose genre is traditionally associated with a particular style contains a wide range of lexemes belonging to other styles. We have constructed an original two-dimensional genre/style classifier that makes it possible to determine the characteristics of a literary text (especially poetry) more accurately; these characteristics are then used in the subsequent stages of automated analysis.
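A two-dimensional genre/style classifier can be pictured as a grid whose rows are the genres listed above and whose columns are the three styles. The following minimal sketch assumes that the genre is already known and that a hypothetical lexicon lookup has counted how many lexemes of each style the text contains. The illustrative function `classify_2d` places the text in the cell of its dominant style and also reports the share of lexemes from other styles, which is one possible way to quantify the stylistic mixing described above; it is an assumption for illustration, not the authors' classifier.

```python
from collections import Counter
from dataclasses import dataclass

# Genres from the classification cited above; styles after Lomonosov.
GENRES = ("ode", "elegy", "idyll", "epistle", "ballad",
          "fragment", "short story in verse")
STYLES = ("high", "neutral", "low")


@dataclass(frozen=True)
class Cell:
    genre: str
    style: str


def classify_2d(genre: str, style_counts: dict[str, int]) -> tuple[Cell, float]:
    """Place a text in the genre x style grid.

    style_counts (hypothetical input) maps each style to the number of
    lexemes of that style found in the text. Returns the cell of the
    dominant style and the share of lexemes belonging to other styles.
    """
    if genre not in GENRES:
        raise ValueError(f"unknown genre: {genre!r}")
    total = sum(style_counts.get(s, 0) for s in STYLES)
    if total == 0:
        return Cell(genre, "neutral"), 0.0  # no style-marked lexemes at all
    # max() keeps the first style in STYLES order on a tie
    dominant = max(STYLES, key=lambda s: style_counts.get(s, 0))
    foreign_share = 1 - style_counts.get(dominant, 0) / total
    return Cell(genre, dominant), foreign_share


if __name__ == "__main__":
    # Hypothetical per-text style counts, for illustration only
    corpus = [
        ("ode", {"high": 8, "neutral": 2}),
        ("elegy", {"high": 6, "neutral": 3, "low": 1}),
    ]
    grid = Counter()  # how many texts fall into each cell
    for genre, counts in corpus:
        cell, mixing = classify_2d(genre, counts)
        grid[cell] += 1
        print(cell, round(mixing, 2))
    print(grid)
```

Keeping the foreign-style share alongside the cell lets a text such as an elegy written largely in high-style vocabulary be recorded as such, rather than being forced into its genre's traditional style.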

In turn, the assignment of a text to a particular section of the constructed two-dimensional classifier can also be automated. For this purpose, we developed an algorithm for describing the semantic fields associated with the various genre and stylistic types of texts.
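How semantic fields are matched against a text is not specified here. As a hedged illustration only, the sketch below represents each section of the grid by a hypothetical set of indicative lemmas (`SEMANTIC_FIELDS`, toy data) and assigns a text to the section whose field it covers best, using plain set overlap in place of the algorithm developed by the authors, whose details are not given in this abstract.

```python
import re
from typing import Optional

# Hypothetical semantic fields (toy data, not from the paper): each section of
# the genre/style grid is described by a small set of indicative lemmas.
SEMANTIC_FIELDS: dict[tuple[str, str], set[str]] = {
    ("ode", "high"): {"слава", "держава", "престол", "глас"},
    ("elegy", "neutral"): {"печаль", "грусть", "разлука", "слеза"},
    ("idyll", "neutral"): {"луг", "пастух", "ручей", "стадо"},
}


def field_coverage(tokens: set[str], field: set[str]) -> float:
    """Fraction of the field's lemmas that occur in the text."""
    return len(tokens & field) / len(field)


def assign_section(text: str) -> tuple[Optional[tuple[str, str]], float]:
    """Return the best-covered section and its coverage, or (None, 0.0)."""
    tokens = set(re.findall(r"\w+", text.lower()))
    best: Optional[tuple[str, str]] = None
    best_score = 0.0
    for section, field in SEMANTIC_FIELDS.items():
        score = field_coverage(tokens, field)
        if score > best_score:  # strict '>' keeps the first section on a tie
            best, best_score = section, score
    return best, best_score


if __name__ == "__main__":
    print(assign_section("Печаль, разлука и слеза"))
```

A fuller treatment would lemmatize the text and weight lemmas by how characteristic they are of each field; plain coverage is used here only because it is the simplest overlap measure to read.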


© 1996-2019, Institute of computational technologies of SB RAS, Novosibirsk