El'Manuscript-14

The project is supported by the Russian Foundation for Basic Research, project #07-04-12140в

(c) "Information Technologies and Textual Heritage", 2008-2020

On a Method for Automatic Grammatical Markup of Early Printed Texts
Written by: Артем Викторович Андреев   
Sunday, 07 September 2014
A method is proposed for unsupervised morphosyntactic markup of old texts for which neither an exact grammar nor a vocabulary is available. The method builds all possible mappings from text forms to grammemes and then prunes them using a loose context-free (CF) grammar. The forms are then lemmatized by minimizing morphological variation. The method was tested on two old Lithuanian documents from the late 16th century by M. Dauksha and proved rather efficient and accurate (up to 80%).

Attached file: andreev_elmanuscript2014 (594.8 kB)
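
The three steps named in the abstract (enumerating form-to-grammeme mappings, pruning them with a loose CF grammar, and lemmatizing by minimizing variation) can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the ending table `ENDINGS`, the grammeme labels, the three binary rules in `RULES`, the toy word forms, and the greedy "most shared stem" heuristic used as a stand-in for minimizing morphological variation. The author's actual grammar, data, and lemmatization criterion may differ substantially.

```python
from collections import Counter
from itertools import product

# Hypothetical ending -> grammeme table (invented for this sketch).
ENDINGS = {"as": ["N.nom"], "s": ["N.nom"], "o": ["N.gen", "V.3"]}

# Hypothetical loose CF grammar, binary rules only: (left, right) -> parent.
RULES = {
    ("N.gen", "N.nom"): "NP",
    ("NP", "V.3"): "S",
    ("N.nom", "V.3"): "S",
}

def candidates(form):
    """Return every (stem, grammeme) pair licensed by the ending table."""
    return [(form[:len(form) - len(end)], tag)
            for end, tags in ENDINGS.items()
            if form.endswith(end) and len(form) > len(end)
            for tag in tags]

def accepts(tags, start="S"):
    """CYK recognition of a grammeme sequence under RULES."""
    n = len(tags)
    if n == 0:
        return False
    table = {(i, i + 1): {tags[i]} for i in range(n)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = set()
            for k in range(i + 1, j):
                for pair in product(table[(i, k)], table[(k, j)]):
                    if pair in RULES:
                        cell.add(RULES[pair])
            table[(i, j)] = cell
    return start in table[(0, n)]

def tag_sentence(forms):
    """Enumerate all grammeme assignments, keep those the grammar accepts."""
    options = [sorted({tag for _, tag in candidates(f)}) for f in forms]
    return [seq for seq in product(*options) if accepts(seq)]

def lemmatize(forms):
    """Greedy stand-in for minimizing variation: for each form, pick the
    candidate stem shared by the largest number of forms."""
    stems = {f: {stem for stem, _ in candidates(f)} for f in forms}
    support = Counter(s for cands in stems.values() for s in cands)
    return {f: max(sorted(c), key=lambda s: support[s]) if c else f
            for f, c in stems.items()}

if __name__ == "__main__":
    print(tag_sentence(["pono", "tarnas", "mato"]))
    print(lemmatize(["ponas", "pono", "tarnas"]))
```

In this toy setting the three forms admit four grammeme sequences, and the grammar leaves only the one that parses as a sentence. The lemmatizer returns stems rather than dictionary headwords, which is a simplification of this sketch.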
 