Bulgarian National Corpus - "Written Heritage" Community

El'Manuscript-14

Bulgarian National Corpus

Written by: Светла Коева
Thursday, 07 August 2014
Lecture

We will discuss several key concepts in corpus development and reconsider them in light of recent advances in Natural Language Processing. We propose a data-driven approach to corpus design that combines the best practices of traditional corpus linguistics with recent technologies for fast collection, automatic metadata description, and annotation of large amounts of data. We will illustrate this approach by describing the compilation, structuring, documentation, and annotation (morphosyntactic tagging, lemmatisation, word-sense annotation, annotation of noun phrases and named entities) of the Bulgarian National Corpus (http://ibl.bas.bg/en/BGNC_access_en.htm; http://ibl.bas.bg/en/BGNC_en.htm; http://search.dcl.bas.bg/). We will conclude with a brief evaluation of the quality of the corpus and an outline of its applications in Natural Language Processing and linguistic research.
