Georgian document processing software tools

Author: Vazha Gujaraidze
Co-authors: Bitchiko Tchelidze, Zaza Papunashvili, Daviti Gaboshvili, Tamazi Kvizhinadze, Archil Euashvili, Giorgi Muradashvili
Keywords: text classification, lemmatization, stop words
Abstract:

This work presents a method for classifying documents. Information retrieval is not the outcome of a single type of operation: its success and relevance depend on the recall and adequacy of the whole retrieval cycle. One important part of this cycle is classification, which is the initial stage of text retrieval. The work also describes natural language processing methods for the information retrieval task, together with their main stages and significant properties. Natural Language Processing (NLP) involves acquiring knowledge from the syntax and semantics of a natural language text. Such an approach can be considered “semantic” in the sense that the content and structure of a document are defined using non-statistical methods. At present, integrating statistical/probabilistic models with syntactic/semantic models is considered the best way to increase the effectiveness of the retrieval process. The work describes the initial text-processing steps that are necessary for the first stage of classification. It considers natural language processing methods for the classification task, namely concept-pattern-based information retrieval. The classification problem can be viewed as a combination of machine learning and information retrieval (IR) methods.
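
As an illustration only, the initial text-processing steps named in the keywords (stop-word removal and lemmatization) could be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' tools: the function name `preprocess`, the tiny stop-word set and the lemma lookup table are all hypothetical, and a real system for Georgian would rely on full stop-word lists and a morphological analyser instead of a dictionary lookup.

```python
import re

# Hypothetical toy resources, for illustration only.
# "და" = and, "არის" = is, "ეს" = this
STOP_WORDS = {"და", "არის", "ეს"}
# Maps inflected forms to a dictionary form: "წიგნები" (books) -> "წიგნი" (book)
LEMMAS = {"წიგნები": "წიგნი", "წიგნს": "წიგნი"}


def preprocess(text):
    """Tokenize, drop stop words, then replace known forms with their lemma."""
    # \w+ matches runs of Unicode letters and digits, including Georgian script.
    tokens = re.findall(r"\w+", text)
    kept = [t for t in tokens if t not in STOP_WORDS]
    # Forms missing from the lookup table are passed through unchanged.
    return [LEMMAS.get(t, t) for t in kept]


if __name__ == "__main__":
    print(preprocess("ეს არის წიგნები და რვეული"))
```

The resulting list of normalized tokens is the kind of input a classifier would typically consume. The table-based lemmatizer is chosen here only for brevity.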