The present paper attempts to define the philological preconditions for the digital
processing of texts written in Middle Bulgarian with the help of software
applicable to other recensions of Church Slavonic. Taking the Slavonic Dioptra
as an example, it discusses the adaptations required on the levels of graphetics,
graphematics, and morphology.
The Dioptra is a voluminous Greek
didactic poem composed as a dialogue of body and soul, which was translated
into Middle Bulgarian Church Slavonic around the middle of the fourteenth
century. As was first noted by Franz von Miklosich, it contains an abundance of
remarkable lexical material, which until now has not been analysed
conclusively. Therefore, the bilingual critical edition currently being prepared
at the University of Vienna is to be complemented by a dictionary that will eventually disclose
the lexicon of the poem. In view of its considerable length (the Dioptra consists of approx.
62.000 words), a largely automated lemmatisation appears highly desirable. This
requires a tool for approximate string matching directly applicable to Middle
Bulgarian texts, which, to my knowledge, does not yet exist. The
present paper lists those deviations of the Dioptra from Old Church Slavonic (OCS)
that are relevant to the automated processing of the text. Its goal is to outline, from a
philological point of view, the prerequisites for adapting to the Dioptra the approximate
string matching techniques developed for other variants of Slavonic.
In this respect, OCS is unquestionably a more natural point of
reference than Old Russian. The results can be expected to apply to
other Middle Bulgarian texts as well.
Our edition relies on the L’viv manuscript of the
Dioptra (LNB NAN imeni Stefanyka MV-418), as this is the only completely
preserved Middle Bulgarian testimony of the poem. First of all, in order to
allow fuzzy string matching, the software processing the text should be capable
of abstracting from certain graphic peculiarities of the ms represented in the
print version. Thus, the 12 letters (out of a total of 51 used in our edition)
representing positional or arbitrary allographs should be assigned to their superordinate
graphemes (2 and ¬ to е; s to ł; ∙ and ¶ to и; w, 3, and 5 to о; У and ? to №; û to ¥; and v to y). Additionally, the lemmata in
the dictionary should appear in a corresponding “abstract” form, relieving the
reader of some time-consuming guesswork. Of course, the actual spelling is to
be preserved in the individual entries listed under the respective headwords.
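A minimal sketch of this normalisation in Python, assuming the text is available in the font encoding used in this paper; the names `ALLOGRAPHS`, `normalise`, and `build_index` are hypothetical. The sketch maps each of the 12 allographs to its superordinate grapheme and groups the attested spellings under the resulting "abstract" form, so that the actual spelling remains available for the individual entries. Since a plain character map is context-blind (it would also rewrite a real "?" or "2"), it assumes that punctuation and numerals have already been separated from the word tokens.

```python
from collections import defaultdict

# The 12 allographs and their superordinate graphemes, as listed in the text.
# Assumption: keys are the characters of the font encoding printed in this
# paper (e.g. "2", "w"); a Unicode transcription would need other keys.
ALLOGRAPHS = {
    "2": "е", "¬": "е",
    "s": "ł",
    "∙": "и", "¶": "и",
    "w": "о", "3": "о", "5": "о",
    "У": "№", "?": "№",
    "û": "¥",
    "v": "y",
}
_TABLE = str.maketrans(ALLOGRAPHS)


def normalise(form: str) -> str:
    """Replace every allograph in a word form by its superordinate grapheme."""
    return form.translate(_TABLE)


def build_index(forms: list[str]) -> dict[str, list[str]]:
    """Group attested spellings under their normalised ("abstract") form,
    keeping the original spellings for the individual entries."""
    index: dict[str, list[str]] = defaultdict(list)
    for form in forms:
        index[normalise(form)].append(form)
    return dict(index)
```

For instance, `build_index(["5ко", "око"])` files both spellings under the single headword form `око`.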
I do not expect the operations necessary for a simplification of this
kind to cause much trouble. By contrast, the frequent alternations of graphemes
resulting from phonetic shifts can be assumed to pose a much bigger challenge,
both to computer scientists entrusted with the task of adapting existing
software to the requirements of Middle Bulgarian and to philologists
processing the data thus obtained. I have examined the spelling principles of the
Middle Bulgarian Dioptra mss in detail in a recent paper; therefore, I shall only give a brief overview here.
The following graphematic alternations appear regularly in the L’viv ms of
the Dioptra (and, of course, in many other Middle Bulgarian mss):
ł ~ з / з ~ ł: only a few cases contradict the etymological spelling; most of these deviations
seem to be lexicalised (e.g., the adjective полезн¥и is always spelt with з, whereas the noun полłа is invariably spelt with ł).
л ~ ø: epenthetic л is omitted comparatively frequently.
ъ ~ ø (/ ü / о): weak ъ may be omitted, but is usually preserved in spelling;
it is hardly ever replaced by ü; о-vocalism occurs only in a few words (любовü, начтокъ) and seems to be lexicalised.
¥ ~ и: the two are
only exceptionally confused with one another; a few cases of regular,
lexicalised commutation occur (нинэ, посилати).
ь ~ ø / е / ъ: weak ü may be omitted or replaced by ъ, but is usually preserved; strong historic ü appears as е; weak ü vocalised in order to split consonant clusters is written either
as ü or as ъ.
э ~ я: a complementary distribution prevails; э is used after soft consonants, я at the word onset and at morpheme boundaries; after
vowels only а appears.
ѧ ~ ©: as a rule, the choice of one of the nasal graphemes is
influenced, but not strictly determined, by the quality of the preceding sound;
ѧ is preferred at the word onset and
after soft consonants and front vowels, © after hard consonants; the distribution
after sibilants and non-front vowels is more ambiguous.
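To give an idea of how quickly these alternations multiply, the following hypothetical sketch applies an excerpt of them independently at every position of a word form (characters in the font encoding used in this paper; `ALTERNATIONS` and `spelling_variants` are invented names). The positional and lexical restrictions described above are deliberately ignored, so the sketch overgenerates by design.

```python
from itertools import product

# Illustrative excerpt of the alternations listed above; "" stands for
# omission. Positional and lexical restrictions are ignored on purpose.
ALTERNATIONS = {
    "з": ["з", "ł"],
    "ł": ["ł", "з"],
    "ъ": ["ъ", "", "ü"],
    "ü": ["ü", "", "ъ", "е"],
    "э": ["э", "я"],
}


def spelling_variants(form: str) -> set[str]:
    """All spellings obtained by applying the alternations independently
    at each position of the form."""
    options = [ALTERNATIONS.get(ch, [ch]) for ch in form]
    return {"".join(choice) for choice in product(*options)}
```

A form with two jers and one з, such as съзъдати, already yields 3 × 2 × 3 = 18 candidate spellings. Since only a limited set of such variants is actually realised in the ms, a generator of this kind would have to be filtered against the attested forms rather than used to enumerate candidates blindly.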
In general, the spelling of the L’viv ms of the Dioptra seems to be
fairly consistent and highly lexicalised. Words
deviating from a presupposed OCS standard are likely to be spelt in the same
way in their other occurrences as well:
although the total
number of possible variations is rather high, only a limited set is realised.
Once appropriate parameters have been defined, this can be expected to facilitate approximate
string matching significantly.
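One conceivable form of such parameters is a weighted edit distance in which the alternations listed above count as cheap operations. The following is a sketch under explicitly hypothetical assumptions: the grapheme pairs are taken from the list (in the font encoding used in this paper), but the weight `CHEAP = 0.2` and all function names are invented for illustration. Making every omission of л cheap is a further simplification, since in the ms only epenthetic л is affected.

```python
# Hypothetical cost parameters: pairs follow the alternations listed above,
# the numeric weights are assumptions for illustration only.
CHEAP = 0.2
CHEAP_SUBSTITUTIONS = {frozenset(pair) for pair in
                       [("ł", "з"), ("ъ", "ü"), ("ü", "е"), ("э", "я")]}
CHEAP_INDELS = {"ъ", "ü", "л"}  # weak jers and (simplified) epenthetic л


def sub_cost(x: str, y: str) -> float:
    """Cost of replacing grapheme x by grapheme y."""
    if x == y:
        return 0.0
    return CHEAP if frozenset((x, y)) in CHEAP_SUBSTITUTIONS else 1.0


def indel_cost(ch: str) -> float:
    """Cost of omitting or inserting a grapheme."""
    return CHEAP if ch in CHEAP_INDELS else 1.0


def weighted_distance(a: str, b: str) -> float:
    """Levenshtein distance in which the listed alternations are cheap."""
    prev = [0.0]
    for ch in b:
        prev.append(prev[-1] + indel_cost(ch))
    for x in a:
        cur = [prev[0] + indel_cost(x)]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + indel_cost(x),        # delete x
                           cur[j - 1] + indel_cost(y),     # insert y
                           prev[j - 1] + sub_cost(x, y)))  # substitute
        prev = cur
    return prev[-1]


def best_match(form: str, lemmata: list[str]) -> str:
    """Return the lemma with the lowest weighted distance to the form."""
    return min(lemmata, key=lambda lemma: weighted_distance(form, lemma))
```

With these weights, полłа lies at a distance of only 0.2 from полза, while an unrelated word stays far away; the actual weights would have to be tuned on the data.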
A pivotal point in the automated processing of a text is evidently the
correct assignment of inflexion forms. In the following, I give an overview of
the desinences present in the Dioptra which do not occur, or do not occur regularly, in
OCS (merely graphematic phenomena covered above are not quoted expressly; e.g. землэ = nom. sg. fem. ja-stem). For comparison, I have used [Diels,
1963]. Most of these endings are by no means uncommon
in Middle Bulgarian; not a few occur sporadically even in OCS (those mentioned
by Diels are given in italics).
-а nom. sg. fem. and masc. of former ī-stems,
which were adapted to the ja-stem paradigm (млъниа, с©диа)
-е nom. sg. masc. jo-stems: proper names
ending in -ιος in Greek (e.g. григорие)
nom. sg. neutr. of the short form of the part. praet.
act. (и дрэво е
ветхо же и изгнивъше; according to [Diels, 1963: 242], also attested in Supr.)
acc. sg. of r-stems (матере, дъωере; according to [Diels, 1963: 178], also in
Sav. and Supr.)
pl. of some masc. jo-stems (коне, коваче, прэлþбодэе)
-еве nom. pl. of monosyllabic masc. jo-stems
(rare! e.g. врачеве, плачеве, краеве; cf. [Diels, 1963: 159])
-еи loc. sg. fem. long form of soft adjectives
(rare! въ послэднеи старости; въ прочеи твари)
gen. pl. of masc. jo-stems (e.g. м©жеи; cf. [Diels, 1963: 159])
-емъ loc. sg. masc./neutr. of the long form of
soft adjectives, comparatives, and part. praes./praet. act. (въ насто©ωемъ житии)
-ехъ loc. pl. of masc./neutr. jo-stems (въ агньцехъ)
-ие nom. pl. masc. of jo-stems, especially of those ending
in -tel’, -ar’, and soft monosyllabic roots (e. g. родителие, р¥барие, царие, м©жие)
-ии gen. pl. of masc. jo-stems (м©жии)
-м¥ 1. pers. pl. of the athemat. verbs (есм¥, вэм¥, имам¥, дам¥; according to [Ivanova-Mirčeva and Charalampiev, 1999: 134], this ending is already attested in OCS)
-ове nom. pl. of monosyllabic masc.
o-stems (e.g. родове; cf. [Diels, 1963: 156])
-омоy dat. sg. masc./neutr. of the long form of
hard adjectives, comparatives, part. praes. act., praes. pass., praet. act.,
praet. pass. (e. g. богатэ©ωомоy)
-омъ instr. sg. and dat. pl. of neutr. jo-stems ending in -ie in nom. sg. (искоyшениомъ зъмииноN и зависти диаволе); rarely also of masc. stems ending
in a vowel (after the loss of intervocalic j; e.g. къ садоyкеомъ, къ иоyдеомъ)
loc. sg. masc./neutr. of the long form of hard adjectives (въ четврътомъ словэ) and the part. praes./praet. pass. (въ ... насажденомъ раи)
-охъ loc. pl. of masc./neutr. o-stems (въ нэдрохъ; masc. already in OCS, cf. [Diels, 1963: 157])
pl. of masc. jo-stems (оyч·телми; according to [Diels, 1963: 157], -ъми is
attested with OCS o-stems)
pl. of the neutr. jo-stems ending in -ie in nom. sg. (wUвэωанми)
-эмъ instr. (!) sg. masc./neutr. of hard adjectives (съ шоyмомъ велицэмъ; otherwise also as the regular loc. form)
-ѧ nom./acc. pl. of r-stems (дъωерѧ)
acc. pl. of masc. n-stems (степенѧ)
-©(и) nom. sg. masc. short
(long) form of the part. praes. act. replacing -¥(и)
Many of these morphological innovations,
which affected almost exclusively the nominal and adjectival inflexion, were
caused by inter-paradigmatic equalisation. Therefore, most of the respective
desinences should be readily identifiable by software designed for OCS, as
they appear in an identical or similar form in at least one other
paradigm (e.g. -üми in the ĭ-stems, -ове in
the former ŭ-stems). Intra-paradigmatic
neutralisation (as in -эмъ for the instr. sg. of masculine and neuter adjectives),
on the other hand, is not common enough to aggravate seriously the problem of homonymy, which can
be expected to leave the editor with a lot of manual work anyway.
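The homonymy problem can be illustrated with a hypothetical suffix lookup covering only a few of the endings listed above (`NON_OCS_ENDINGS` and `analyse` are invented names; a real lemmatiser would also need stem classes, the complete list, and the regular OCS endings). Every matching ending contributes all of its analyses, so a form such as искоyшениомъ receives three candidates that would still have to be disambiguated by context or by hand.

```python
# Hypothetical excerpt of an ending table built from the list above.
NON_OCS_ENDINGS = {
    "еве": ["nom. pl. masc. jo-stem (monosyllabic)"],
    "ове": ["nom. pl. masc. o-stem (monosyllabic)"],
    "ехъ": ["loc. pl. masc./neutr. jo-stem"],
    "охъ": ["loc. pl. masc./neutr. o-stem"],
    "емъ": ["loc. sg. masc./neutr. long form (soft adj., comp., part. act.)"],
    "омъ": ["instr. sg. neutr. jo-stem in -ie",
            "dat. pl. neutr. jo-stem in -ie",
            "loc. sg. masc./neutr. long form (hard adj., part. pass.)"],
    "ие": ["nom. pl. masc. jo-stem"],
}


def analyse(form: str) -> list[tuple[str, str]]:
    """Return (stem, analysis) candidates for every listed ending the form
    carries, checking longer endings first."""
    results = []
    for ending in sorted(NON_OCS_ENDINGS, key=len, reverse=True):
        if form.endswith(ending) and len(form) > len(ending):
            stem = form[: -len(ending)]
            results.extend((stem, analysis) for analysis in NON_OCS_ENDINGS[ending])
    return results
```

For example, `analyse("родове")` yields the single candidate `("род", "nom. pl. masc. o-stem (monosyllabic)")`, whereas the three analyses returned for искоyшениомъ show where manual work would remain.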
All in all, despite the loss of case inflexion in the contemporary vernacular, the Dioptra preserves, with respect to
morphology, an artificial standard close to OCS. Therefore,
the digital processing of the poem does not seem less promising than the
processing of OCS or Old Russian texts.
The letter 2 is preferred after vowels, at the word onset, and at the
end of lines, but may occur in any position; ¬ appears only in ¬T΅ (= ¬стъ) and, occasionally, in ¬„ωе.
The letter ∙ is frequently, yet not obligatorily, used in front of
vowels, but may appear in any position; ¶ is restricted to Greek loanwords (¶„нд∙ктиwн, ¶„2реи) and names of Greek or Hebrew origin (¶„ппократъ, ¶„2„зек∙илъ).
Both w and 3 may appear in any position; w is clearly preferred at the
word onset; 5 is, as is well known, restricted to the word oko.
Digraphic № is by far the most common, but may be replaced by У in any position; ? (a v set above an о) occurs only exceptionally.