LE-PAROLE

WP1.1

Finnish Lexicon Documentation

* * *

Document first version date: 19-Feb-97
Document date: 2-Jun-98
Document ID: P-WP3.6-WP-HEL-1
Version: 08
Doc. type:
Document status: to be validated
Validation type:
Comments:

        Name                  Organisation   Purpose
From    Anu Airola            HEL
To      ALL PAROLE Partners   ....           Validation

1. General Design Information

Construction of the Lexicon

The words in the Finnish lexicon are selected from three sources. First, all the simple nouns and all the proper nouns in the Frequency Dictionary of Finnish (Saukkonen, Haipus, Niemikorpi & Sulkala 1979) are included. The second source is the FINTAG corpus, compiled from texts representing different text types (for example academic prose, textbooks, newspapers, and magazines). The size of this corpus is about 1.3 million running words. The criterion for including a word in the lexicon was a frequency of no less than 4 for nouns and no less than 2 for verbs and adjectives in this corpus. The third source is a sample of the hs90 corpus containing about 1.0 million running words from the newspaper Helsingin Sanomat (1990).
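
The frequency criterion for the FINTAG corpus can be illustrated with a minimal sketch. The thresholds come from the paragraph above; the input shape (a list of lemma and category pairs), the function name and the toy data are assumptions made for illustration and do not describe the tools actually used.

```python
from collections import Counter

# Thresholds as stated in the text: nouns need a corpus frequency of at
# least 4, verbs and adjectives at least 2.
MIN_FREQ = {"NOUN": 4, "VERB": 2, "ADJECTIVE": 2}


def select_lemmas(tagged_tokens):
    """Return the (lemma, category) pairs that meet the frequency threshold.

    tagged_tokens is an iterable of (lemma, category) pairs; this input
    shape is an assumption of the sketch.
    """
    counts = Counter(tagged_tokens)
    return sorted(
        pair
        for pair, n in counts.items()
        if pair[1] in MIN_FREQ and n >= MIN_FREQ[pair[1]]
    )


# Toy input, invented for illustration (not taken from FINTAG).
sample = (
    [("kauppa", "NOUN")] * 4
    + [("kivi", "NOUN")] * 3
    + [("lukea", "VERB")] * 2
)
selected = select_lemmas(sample)
```

In this toy input the noun occurring three times falls below the noun threshold and is left out, while the verb occurring twice is kept.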

2. Current Lexicon Contents

2.1 Morphological layer

2.1.1 Summary of the morphological information

Number of simple morphological units          25421
Number of compound morphological units
Number of affix morphological units           60
Number of agglutinated morphological units
Number of graphical morphological units       25421
Number of simple inflection modes
Number of simple compound inflection modes

Category       Subcategory      Number of Units
Noun           common           17785
Noun           proper           746
Verb           normal           2941
Verb           impersonal       88
Adjective      qualificative    2931
Adjective      pronominal       23
Adjective      noninflecting    11
Adjective      ordinal          10
Adverb                          569
Ad-adjective                    42
Adposition     postposition     151
Adposition     preposition      44
Pronoun        personal         6
Pronoun        demonstrative    6
Pronoun        reflexive        1
Pronoun        relative         2
Pronoun        interrogative    5
Pronoun        indefinite       17
Numeral        cardinal         24
Conjunction    coordinative     8
Conjunction    subordinative    11

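
The per-category counts can be cross-checked against the number of simple morphological units reported in the summary. The sketch below only sums the figures listed in this section; the dictionary name and layout are illustrative.

```python
# Unit counts per (category, subcategory), copied from the table in 2.1.1.
UNITS = {
    ("Noun", "common"): 17785,
    ("Noun", "proper"): 746,
    ("Verb", "normal"): 2941,
    ("Verb", "impersonal"): 88,
    ("Adjective", "qualificative"): 2931,
    ("Adjective", "pronominal"): 23,
    ("Adjective", "noninflecting"): 11,
    ("Adjective", "ordinal"): 10,
    ("Adverb", ""): 569,
    ("Ad-adjective", ""): 42,
    ("Adposition", "postposition"): 151,
    ("Adposition", "preposition"): 44,
    ("Pronoun", "personal"): 6,
    ("Pronoun", "demonstrative"): 6,
    ("Pronoun", "reflexive"): 1,
    ("Pronoun", "relative"): 2,
    ("Pronoun", "interrogative"): 5,
    ("Pronoun", "indefinite"): 17,
    ("Numeral", "cardinal"): 24,
    ("Conjunction", "coordinative"): 8,
    ("Conjunction", "subordinative"): 11,
}

# Sum over all categories; this equals the 25421 simple morphological units.
total = sum(UNITS.values())
```
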
2.1.2 General remarks

In the Finnish lexicon the lemmatised form for nominals is nominative singular, or, if the word does not have this particular form, the lemmatised form is one of the existing word forms (for example nominative plural for pluralia tantum). For verbs the lemmatised form is 1st infinitive, or when this does not exist, the lemmatised form is 3rd person singular in active indicative present tense. For adverbs and adpositions inflected in locative cases there is one Morphological Unit for every different word form. The criteria for splitting morphological units are considered to be in line with the GENELEX principles.
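
The choice of the lemmatised form can be sketched as a simple preference order. The form labels ("NOM_SG", "INF1", "ACT_IND_PRES_SG3"), the function name and the example words other than those found in this document are hypothetical; only the preference order itself is taken from the paragraph above.

```python
def choose_lemma(category, forms):
    """Pick the lemmatised form following the rules described above.

    forms maps hypothetical form labels (e.g. "NOM_SG") to existing
    word forms; the labels are assumptions of this sketch.
    """
    if not forms:
        raise ValueError("a word needs at least one existing form")
    if category == "VERB":
        # 1st infinitive, else 3rd person singular active indicative present.
        preferred = ["INF1", "ACT_IND_PRES_SG3"]
    else:
        # Nominals: nominative singular.
        preferred = ["NOM_SG"]
    for label in preferred:
        if label in forms:
            return forms[label]
    # Otherwise fall back to one of the existing forms,
    # e.g. nominative plural for pluralia tantum.
    return next(iter(forms.values()))


lemma_noun = choose_lemma("NOUN", {"NOM_SG": "kivi", "GEN_SG": "kiven"})
lemma_plural_only = choose_lemma("NOUN", {"NOM_PL": "sakset"})
lemma_verb = choose_lemma("VERB", {"INF1": "lukea"})
```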

Some non-standardized categories, features and feature values have been added to the morphological layer in order to make the model more suitable for the Finnish language.

Among these are two minor grammatical categories postulated for the Finnish lexicon, namely ad-adjectives and post-adverbs. Ad-adjectives are adverbs used to modify adjectives and other adverbs. Ad-adjectives never modify a verb, which is considered an argument to separate them from other adverbs and to postulate a new grammatical category 'ad-adjective'. Post-adverbs are adverbs that require a nominal complement. In this regard they behave like adpositions, but the difference is that a post-adverb does not determine the case of its complement like an adposition does.

Adjectives with no inflection are classified as a separate grammatical subcategory, i.e. NONINFLECTING. They never display the predicative function, and their position in the syntagma is immediately before the headword and after other noun modifiers that agree with the headword in case and number. (Vilkuna 1996.) Also pronouns that function like adjectives, i.e. modify nouns and agree with their headword in the normal way, are considered to form a separate grammatical subcategory, PRONOMINAL.

The Finnish case system contains four grammatical cases (i.e. NOMINATIVE, PARTITIVE, GENITIVE, and ACCUSATIVE), six locative cases, which are structured according to two dimensions, i.e. location and direction (see the table below), two abstract locative cases (ESSIVE and TRANSLATIVE) and finally the so-called 'marginal cases' (ABESSIVE, COMITATIVE and INSTRUCTIVE). (See for example Karlsson 1987 and Vilkuna 1996.)

The system of the Finnish local cases according to Karlsson (1987:99):



                          LOCATION
                          INSIDE      OUTSIDE
            STATIC        inessive    adessive
DIRECTION   AWAY FROM     elative     ablative
            TOWARDS       illative    allative
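
The two-dimensional structure of the local cases can also be written as a lookup table. The sketch below restates the table after Karlsson (1987:99); the dictionary and function names are illustrative only.

```python
# (location, direction) -> local case, restating the table above.
LOCAL_CASES = {
    ("INSIDE", "STATIC"): "inessive",
    ("OUTSIDE", "STATIC"): "adessive",
    ("INSIDE", "AWAY FROM"): "elative",
    ("OUTSIDE", "AWAY FROM"): "ablative",
    ("INSIDE", "TOWARDS"): "illative",
    ("OUTSIDE", "TOWARDS"): "allative",
}


def local_case(location, direction):
    """Return the local case for a location/direction combination."""
    return LOCAL_CASES[(location, direction)]


example_case = local_case("INSIDE", "TOWARDS")
```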

An additional personal ending needed in the Finnish personal inflection is the fourth person indefinite, which is a special personal ending for the passive verb forms referring to personal but indefinite actor(s) of the process or action described by the verb.

In marking possession in Finnish the word signifying what is possessed also takes an ending. The possessive suffixes in the 3rd person singular and in the 3rd person plural are the same, and the (additional) value SGPL3 for the morphological feature POSSESSOR is thus meant to indicate the fact that a word form containing the possessive suffix in question is ambiguous between the two possible interpretations. (See e.g. Karlsson 1996.)

The morphological features TENSE and MOOD are classified as one distributional class in Finnish morphology, i.e. all the sequences standing for some possible combination of tense and mood are always indicated by only one surface morph. Although the functional endings of the infinitives and participles are in complementary distribution with the morphemes indicating tense and mood, they are considered to form a separate morphological feature (called NONFINITE here). Non-finite forms can take a case ending and a possessive suffix, and participles are also inflected for number. (Karlsson 1987.)

2.1.3 Inflection modes

2.1.3.1 General principles

In the Finnish lexicon the information needed to generate the different word forms (i.e. about 6000 word forms for a noun, and about 12000 word forms for a verb) is included in the different kinds of stems a word can have and in the different kinds of attributes that are linked to the stems. A noun can have either 5 or 6 stems, an adjective 7 or 8 stems, and a verb either 6, 7 or 8 stems.

In the same way every single variant of an inflectional ending with the corresponding attributes is listed in the lexicon (for example the illative ending has no less than 42 different allomorphs).

The basic idea is that an ending can be matched with a word stem if the same attributes are found both in the Radg element describing the behavior of the stem and in the Radg element describing the behavior of the ending, and if those attributes have exactly the same values. In addition, some attributes are needed for linking different suffixes.
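
The matching condition can be sketched as follows, using the attribute values of the noun aihe and two of its endings as given in the appendix. The sketch reads the condition as "every stem-related attribute of the ending must occur on the stem with the same value"; this reading, the exclusion of the linking attributes and of 'nieme' from the comparison, and the function name are assumptions, not a description of the actual implementation.

```python
# Attributes not compared against the stem in this sketch: the linking
# attributes listed in this document plus "nieme", which is treated here
# as a running index (an assumption).
NON_STEM_ATTRS = {"nieme", "px", "px3", "vqending", "gradation"}


def ending_matches_stem(stem_radg, ending_radg):
    """True if every stem-related attribute of the ending has the same
    value in the Radg of the stem."""
    for name, value in ending_radg.items():
        if name in NON_STEM_ATTRS:
            continue
        if stem_radg.get(name) != value:
            return False
    return True


# Values copied from the appendix (strong plural stem of "aihe").
aihe_spl = {
    "nieme": "4", "back": "YES", "stemtype": "SPL", "endingv": "YESEV",
    "ending2v": "NOEN", "has2v": "YESVV", "basei": "NOBI", "syll1p3": "NOSY1",
}
plural_i = {"nieme": "5", "stemtype": "SPL", "endingv": "YESEV"}
illative_sii = {
    "nieme": "41", "stemtype": "SPL", "endingv": "YESEV",
    "has2v": "YESVV", "px": "YESPX", "px3": "NOP",
}
# Hypothetical ending that requires a weak plural stem.
weak_plural_ending = {"stemtype": "WPL", "endingv": "YESEV"}

matches = [
    ending_matches_stem(aihe_spl, plural_i),
    ending_matches_stem(aihe_spl, illative_sii),
    ending_matches_stem(aihe_spl, weak_plural_ending),
]
```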

Below are the lists containing the attributes needed to describe the Finnish inflectional morphology in the LE-PAROLE lexicon model. All the possible combinations of the word stems for the main parts of speech are also stated in the tables below. All frequencies of the combinations are given in the table at 2.1.3.6 (Frequencies of the different combinations of stems). One example of the inflection of the nominals is also given in the appendix.

The most frequently used inflected forms of pronouns are listed in the lexicon according to the common LE-PAROLE lexicon model for encoding morphological information.

The nouns, verbs, and adjectives in the lexicon are also marked with a code indicating the inflectional category the word belongs to. The categories used are those defined in "The Basic Dictionary of Finnish" ("Suomen kielen perussanakirja"). We have used the codes referring to the different inflectional categories as id-numbers, which identify the different MFGs in the SGML-encoded lexicon. In some instances there may be the symbol "DEFAULTMF" instead of the specific code for an inflectional category.

2.1.3.2 Attributes and Combination of stems for nouns and adjectives

Attributes needed for adding suffixes to nominal stems

ATTRIBUTES      VALUES
stemtype        BASE   base form
                SSG    strong vowel stem
                WSG    weak vowel stem
                SPL    strong plural stem
                WPL    weak plural stem
                CON    consonant stem
dstem           PVE    positive stem
                CVE    comparative stem
                SVE    superlative stem
back            does the stem contain back or front vowels
has2v           is there a long vowel at the end of the vowel stem
endingv         is there a vowel at the end of the stem in question
basei           is there the vowel 'i' at the end of the base form
syll1p3         is the word 2-syllabic or not
syll2p3         is the word monosyllabic or not
contexte_var    this attribute is used mainly for restricting the application of certain rules
vq              vowel quality
stemc           does the word have a consonant stem
ending2v        are there two different vowels at the end of the vowel stem

Attributes needed for linking different suffixes

px          This attribute tells if a case ending must/must not/can be followed by one of the possessive suffixes.
px3         This attribute tells if a certain type of case endings and a certain type of possessive suffixes match.
vqending    Vowel quality linked to the suffixes.

2.1.3.3 Nouns

Stemtypes and their attributes

stemtype=Base    back, endingv, has2v, basei, vq, syll2p3
stemtype=SSg     back, has2v, stemc, vq, syll2p3
stemtype=WSg     back, syll1p3
stemtype=SPl     back, endingv, ending2v, has2v, basei, syll1p3
stemtype=WPl     back, endingv, has2v, syll1p3
stemtype=Con     back
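
The list of attributes per stemtype can be used to check that a Radg element of a noun carries the attributes expected for its stemtype. The following sketch does this for the strong plural stem of aihe from the appendix; the function name and the upper-casing of the stemtype value are assumptions of the sketch.

```python
# Attributes per noun stemtype, as listed above (stemtype names upper-cased
# to match the values used in the SGML example in the appendix).
NOUN_STEM_ATTRS = {
    "BASE": {"back", "endingv", "has2v", "basei", "vq", "syll2p3"},
    "SSG": {"back", "has2v", "stemc", "vq", "syll2p3"},
    "WSG": {"back", "syll1p3"},
    "SPL": {"back", "endingv", "ending2v", "has2v", "basei", "syll1p3"},
    "WPL": {"back", "endingv", "has2v", "syll1p3"},
    "CON": {"back"},
}


def missing_attributes(radg):
    """Return the expected attributes that the given Radg does not carry."""
    expected = NOUN_STEM_ATTRS[radg["stemtype"].upper()]
    return expected - set(radg)


# Radg of the strong plural stem of "aihe", copied from the appendix.
aihe_spl = {
    "nieme": "4", "back": "YES", "stemtype": "SPL", "endingv": "YESEV",
    "ending2v": "NOEN", "has2v": "YESVV", "basei": "NOBI", "syll1p3": "NOSY1",
}
complete = missing_attributes(aihe_spl)
# Hypothetical incomplete weak vowel stem, for contrast.
incomplete = missing_attributes({"stemtype": "WSG", "back": "YES"})
```
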

Different combinations of noun stems

Nouns with 5 stems

Stemtype   Examples
Base       kauppa (shop)   kivi (stone)
SSg        kauppa/na       kive/nä
WSg        kaupa/n         kive/n
SPl        kauppo/ina      kiv/inä
WPl        kaupo/issa      kiv/issä

Nouns with 6 stems

Stemtype   Examples
Base       vene (boat)   ajatus (thought)   käsi (hand)
SSg        venee/nä      ajatukse/na        käte/nä
WSg        venee/n       ajatukse/n         käde/n
SPl        vene/inä      ajatuks/ina        käs/inä
WPl        vene/issä     ajatuks/issa       käs/issä
Con        venet/tä      ajatus/ta          kät/tä
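
How the listed stems combine with endings can be sketched with the six stems of käsi. The stems and endings are the ones shown in the table (the part after the slash); in the lexicon itself an ending is selected by attribute matching, which this sketch deliberately leaves out. All names are illustrative.

```python
# The six stems of "käsi" (hand), copied from the table above.
KASI_STEMS = {
    "BASE": "käsi", "SSG": "käte", "WSG": "käde",
    "SPL": "käs", "WPL": "käs", "CON": "kät",
}

# One example ending per stemtype, as shown after the slash in the table.
EXAMPLE_ENDINGS = {"SSG": "nä", "WSG": "n", "SPL": "inä", "WPL": "issä", "CON": "tä"}


def build_forms(stems, endings):
    """Concatenate each stem with the example ending for its stemtype."""
    return {stemtype: stem + endings.get(stemtype, "") for stemtype, stem in stems.items()}


kasi_forms = build_forms(KASI_STEMS, EXAMPLE_ENDINGS)
```
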

2.1.3.4 Adjectives

Stemtypes and their attributes

dstem=PVE, stemtype=BASE    back, endingv, has2v, basei, vq, syll2p3
dstem=PVE, stemtype=SSG     back, has2v, stemc, vq, syll2p3
dstem=PVE, stemtype=WSG     back, syll1p3
dstem=PVE, stemtype=SPL     back, endingv, ending2v, has2v, basei, syll1p3
dstem=PVE, stemtype=WPL     back, endingv, has2v, syll1p3
dstem=PVE, stemtype=CON     back
dstem=CVE                   back
dstem=SVE                   back

Different combinations of adjective stems

Adjectives with 7 stems

Dstem   Stemtype   Examples
PVE     BASE       korkea (high)   vapaa (free)
PVE     SSG        korkea/na       vapaa/na
PVE     WSG        korkea/n        vapaa/n
PVE     SPL        korke/ina       vapa/ina
PVE     WPL        korke/issa      vapa/issa
CVE                korkea/mpi      vapaa/mpi
SVE                korke/in        vapa/in

Adjectives with 8 stems

Dstem   Stemtype   Examples
PVE     BASE       kirkas (bright)   lämmin (warm)
PVE     SSG        kirkkaa/na        lämpimä/nä
PVE     WSG        kirkkaa/n         lämpimä/n
PVE     SPL        kirkka/ina        lämpim/inä
PVE     WPL        kirkka/issa       lämpim/issä
PVE     CON        kirkas/ta         lämmin/tä
CVE                kirkkaa/mpi       lämpimä/mpi
SVE                kirkka/in         lämpim/in

2.1.3.5 Verbs

Attributes

Attributes needed for adding suffixes to verb stems

vstem       SSGVST    strong vowel stem
            WSGVST    weak vowel stem
            SSGPAST   strong vowel stem: past tense
            WSGPAST   weak vowel stem: past tense
            SSGCOND   conditional stem
            CONVST    consonant stem
            PASS      passive stem
            CONPOTN   potential stem
            SSGINF    infinitive stem
back        does the stem contain back or front vowels
endingv     is there a vowel at the end of the stem in question
has2v       is there a long vowel at the end of the vowel stem
stemc       does the verb have a consonant stem
stempotn    does the verb have a potential stem
steminf     does the verb have an infinitive stem
vq          vowel quality
cq          consonant quality

Attributes needed for linking different suffixes

gradation   weak/strong/strongpast
vqending    Vowel quality linked to the suffixes.

Stemtypes and their attributes

vstem=SSGVST     back, endingv, has2v, stemc, stempotn, steminf, vq
vstem=WSGVST     back
vstem=SSGPAST    back
vstem=WSGPAST    back
vstem=SSGCOND    back
vstem=SSGINF     back
vstem=PASS       back, endingv, has2v, cq
vstem=CONVST     back, cq
vstem=CONPOTN    back, cq

Different combinations of verb stems

Verbs with 6 stems

Vstem      Examples
SSGVST     muista/vat (remember)   juo/vat (drink)
WSGVST     muista/n                juo/n
SSGPAST    muist/ivat              jo/ivat
WSGPAST    muist/in                jo/in
SSGCOND    muista/isin             jo/isin
PASS       muiste                  juo

Verbs with 7 stems

Vstem      Example
SSGVST     luke/vat (read)
WSGVST     lue/n
SSGPAST    luk/ivat
WSGPAST    lu/in
SSGCOND    luk/isin
PASS       lue
SSGINF     luki

Verbs with 8 stems

Vstem      Example
SSGVST     valitse/vat (choose)
WSGVST     valitse/n
SSGPAST    valits/ivat
WSGPAST    valits/in
SSGCOND    valits/isin
CONVST     valit/koon
PASS       valit
CONPOTN    valin
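
The three combinations of verb stems differ only in which stems are present besides the six common ones. The sketch below classifies a verb by its set of stems; the stem inventories are read off the tables above, while the function and variable names are illustrative.

```python
# Stems shared by all three combinations in the tables above.
CORE = frozenset({"SSGVST", "WSGVST", "SSGPAST", "WSGPAST", "SSGCOND", "PASS"})

# Number of stems -> set of vstem values in that combination.
COMBINATIONS = {
    6: CORE,
    7: CORE | {"SSGINF"},
    8: CORE | {"CONVST", "CONPOTN"},
}


def stem_combination(stems):
    """Return 6, 7 or 8 depending on which combination the stems form."""
    present = frozenset(stems)
    for count, combination in COMBINATIONS.items():
        if present == combination:
            return count
    raise ValueError("not one of the listed stem combinations")


# Stems copied from the tables ("read" and "choose").
lukea = {"SSGVST": "luke", "WSGVST": "lue", "SSGPAST": "luk", "WSGPAST": "lu",
         "SSGCOND": "luk", "PASS": "lue", "SSGINF": "luki"}
valita = {"SSGVST": "valitse", "WSGVST": "valitse", "SSGPAST": "valits",
          "WSGPAST": "valits", "SSGCOND": "valits", "CONVST": "valit",
          "PASS": "valit", "CONPOTN": "valin"}
classes = (stem_combination(lukea), stem_combination(valita))
```
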

2.1.3.6 Frequencies of the different combinations of stems

Number of stems   Noun (common)   Adjective (qualificative)   Verb (normal)
5                 10305           -                           -
6                 7480            -                           2204
7                 -               92                          187
8                 -               4003                        946

2.2 Syntactic layer

General

In the Finnish lexicon the main syntactic information encoded for verbs contains firstly the possible type of the object a verb can take, if any, and secondly, for intransitive verbs, the property of a verb to select a specific case and its possibility to take a clausal subject. We also have a category for intransitive verbs which can occur in existential constructions, i.e. for verbs which can take a subject in the partitive. In addition, there are some constructions for different mental and communicative verbs, which typically take a complement indicating RECIPIENT.

In addition to intransitive and transitive verbs, there is a third category for verbs in the lexicon, a category for impersonal verbs. By 'impersonal' we refer to verbs, or uses of certain verbs, which lack a person contrast.

The syntactic encoding of nouns concerns the possible post-nominal complements a noun can take, i.e. different phrasal complements inflected in some local case, as well as infinitive constructions and clauses governed by a noun.

Adjectives in the lexicon are categorized according to three types of complements: a) phrasal complements inflected in the genitive, b) phrasal complements inflected in some local case, and c) infinitival complements, which can modify a subgroup of adjectives.

Prepositions and postpositions are categorized according to the form of the complement they occur with, i.e. a phrasal complement either in the genitive or in the partitive, or, in some rare cases, in some local case.

The so-called ad-adjectives form a functionally motivated subcategory differentiated from adverbs. Adverbs themselves are marked in the lexicon with information concerning e.g. polarity, the possible context they can occur in.

Syntactic functions

Syntactic functions used in the syntactic frames for verbs:

HEAD

ADVERBIAL for NPs indicating MANNER or INSTRUMENT

CLAUSCOMP for infinitive complements (i.e. 3rd infinitive in illative, elative or abessive, 1st infinitive, a participle or a permissive construction)

OBJECT realized as an NP, a that-clause, a Wh-clause, a participial construction, an infinitive construction or a so-called permissive construction

OBLIQUE for phrasal complements in some local case or in essive or translative (the latter two cases are restricted here to predicative functions)

SUBJECT realized as an NP in nominative, partitive or genitive or a clausal subject (a that-clause, a Wh-clause or 1st infinitive construction)

Syntactic functions used in the syntactic frames for nouns:

HEAD

NCOMP for phrasal complements inflected in some local case or in partitive

NCLAUSCOMP for complements realized either as infinitive constructions or that- or Wh-clauses

Syntactic functions used in the syntactic frames for adjectives:

HEAD

SUBJECT

ACOMP for phrasal complements inflected in some local case and for clausal complements realized as a third infinitive

SUBJPRED for adjectives in predicative position with a clausal subject

NLEFTATTRIBUTIVE for adjectives with no inflection. Noninflecting adjectives never display the predicative function, and their position in the syntagma is immediately before the headword and after other noun modifiers that agree with the headword in case and number.

Syntactic functions used in describing the syntactic behavior of prepositions, postpositions, ad-adjectives and adverbs:

HEAD

NPREPCOMP for NPs depending on a preposition

NPOSTCOMP for NPs depending on a postposition

AMODIFIER for an ad-adjective modifying an adjective

ADVMODIFIER for an ad-adjective modifying an adverb

ADVERBIAL for an adverb as a sentence modifier
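
The inventories of syntactic functions listed above can be collected into one table and used to check the functions occurring in a syntactic frame. This is a sketch only: the grouping key "ADPOSITION_ADVERB", the representation of a frame as a list of function names and the function name are assumptions made for illustration.

```python
# Syntactic functions per category, copied from the lists above.
# "ADPOSITION_ADVERB" is a hypothetical key covering prepositions,
# postpositions, ad-adjectives and adverbs.
SYNTACTIC_FUNCTIONS = {
    "VERB": {"HEAD", "ADVERBIAL", "CLAUSCOMP", "OBJECT", "OBLIQUE", "SUBJECT"},
    "NOUN": {"HEAD", "NCOMP", "NCLAUSCOMP"},
    "ADJECTIVE": {"HEAD", "SUBJECT", "ACOMP", "SUBJPRED", "NLEFTATTRIBUTIVE"},
    "ADPOSITION_ADVERB": {
        "HEAD", "NPREPCOMP", "NPOSTCOMP", "AMODIFIER", "ADVMODIFIER", "ADVERBIAL",
    },
}


def unknown_functions(category, frame):
    """Return the functions in the frame that are not listed for the category."""
    return sorted(set(frame) - SYNTACTIC_FUNCTIONS[category])


ok_frame = unknown_functions("NOUN", ["HEAD", "NCOMP"])
bad_frame = unknown_functions("NOUN", ["HEAD", "OBJECT"])
```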

Our first approximation was to treat all the different forms of the phrasal post-nominal complements or modifiers (i.e. NPs inflected in different local cases), as well as the different kinds of clausal complements, as different syntactic units. The solution adopted for encoding adjectives was, however, to have one syntactic unit with different descriptions, so the encoding of nouns has been modified accordingly.

Framesets are not used in the Finnish lexicon (they are not obligatory information).

Number of Syntactic descriptions   124
Number of framesets

Category       Subcategory      Number of Units
Noun           common           17785
Noun           proper           746
Verb           normal           2941
Verb           impersonal       88
Adjective      qualificative    2931
Adjective      noninflecting    11
Adjective      pronominal       23
Adjective      ordinal          10
Adverb                          569
Ad-adjective                    42
Adposition     postposition     151
Adposition     preposition      44
Numeral        cardinal         24

3. Bibliography

Karlsson, Fred. 1987. FINNISH GRAMMAR. Second edition. WSOY: Juva.

Saukkonen, Pauli, Marjatta Haipus, Antero Niemikorpi & Helena Sulkala. 1979. SUOMEN KIELEN TAAJUUSSANASTO. WSOY: Porvoo.

Vilkuna, Maria. 1996. SUOMEN LAUSEOPIN PERUSTEET. Edita: Helsinki.

Dictionaries

SUOMEN KIELEN PERUSSANAKIRJA I-III. ("The Basic Dictionary of Finnish".) Kotimaisten kielten tutkimuskeskuksen julkaisuja 55. Valtion painatuskeskus: Helsinki.

Corpora

FINTAG. The Department of General Linguistics, University of Helsinki.

hs90. The Department of General Linguistics, University of Helsinki.

Appendix - An Example

Below are the relevant parts of the entries needed in building one of the approximately 6000 different forms of the Finnish noun aihe, 'subject'. The word form in question is aiheisiinsakin, 'even to his/her/their subjects':

aihe - i  - sii - nsa   - kin
SPL    PL   ILL   SGPL3   CLITICPARTICLE

The order of the endings is indicated by means of attributes linked to the element Um_Aff, e.g. 'mustbeattachedto' or 'mustbefollowedby'.

<Um_S
    id="N60"
    appellation="aihe"
    catgram="NOUN"
    sscatgram="COMMON"
    autonomie="YES"
    usyn_l="Usyn85">
<Umg
    mf="DEFAULTMF">
<Lib>aihe</Lib>
<Radg
    nieme="4"
    back="YES"
    stemtype="SPL"
    endingv="YESEV"
    ending2v="NOEN"
    has2v="YESVV"
    basei="NOBI"
    syll1p3="NOSY1">
<Lib>aihe</Lib></Radg>
</Umg>
</Um_S>

<Um_Aff
    id="NUM2"
    typaff="SUFFIX"
    mustbeattachedto="STEMORNONFINITE"
    mustbefollowedby="CASE">
<Umg
    mf="MFGEMPTY">
<Radg
    nieme="5"
    stemtype="SPL"
    endingv="YESEV">
<Lib>i</Lib></Radg>
<MorphFeature
    featurename="NUMBER"
    featurevalue="PLURAL">
</MorphFeature></Umg>
</Um_Aff>

<Um_Aff
    id="CASE4"
    typaff="SUFFIX"
    mustbeattachedto="NUMBERORNONFINITE">
<Umg
    mf="MFGEMPTY">
<Radg
    nieme="41"
    stemtype="SPL"
    endingv="YESEV"
    has2v="YESVV"
    px="YESPX"
    px3="NOP">
<Lib>sii</Lib></Radg>
<MorphFeature
    featurename="CASE"
    featurevalue="ILLATIVE">
</MorphFeature></Umg>
</Um_Aff>

<Um_Aff
    id="POSS5"
    typaff="SUFFIX"
    mustbeattachedto="CASE">
<Umg
    mf="MFGEMPTY">
<Radg
    nieme="1"
    back="YES">
<Lib>nsa</Lib></Radg>
<MorphFeature
    featurename="POSSESSOR"
    featurevalue="SGPL3">
</MorphFeature></Umg>
</Um_Aff>

<Um_Aff
    id="CPL2"
    typaff="SUFFIX"
    maybeattachedto="CASEORPOSSORPERSONORIMPERATIVE">
<Umg
    mf="MFGEMPTY">
<Radg
    nieme="1">
<Lib>kin</Lib></Radg>
<MorphFeature
    featurename="CLITICPARTICLE"
    featurevalue="YESCPL">
</MorphFeature></Umg>
</Um_Aff>
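
The way the ordering attributes constrain the sequence of endings can be sketched as follows. The ids, surface forms and attribute values are copied from the entries above. The "kind" labels, the reading of values such as "STEMORNONFINITE" as a list of alternatives, and the function itself are assumptions of this sketch; it is not the procedure used with the lexicon.

```python
# One record per Um_Aff entry above. "kind" is a hypothetical label for what
# the affix expresses; "attach_to" holds the alternatives read off the
# mustbeattachedto/maybeattachedto values (an assumed interpretation).
AFFIXES = [
    {"id": "NUM2", "kind": "NUMBER", "lib": "i",
     "attach_to": {"STEM", "NONFINITE"}, "followed_by": "CASE"},
    {"id": "CASE4", "kind": "CASE", "lib": "sii",
     "attach_to": {"NUMBER", "NONFINITE"}, "followed_by": None},
    {"id": "POSS5", "kind": "POSS", "lib": "nsa",
     "attach_to": {"CASE"}, "followed_by": None},
    {"id": "CPL2", "kind": "CLITIC", "lib": "kin",
     "attach_to": {"CASE", "POSS", "PERSON", "IMPERATIVE"}, "followed_by": None},
]


def assemble(stem, affixes):
    """Concatenate stem and affixes, checking the ordering constraints."""
    form = stem
    previous_kind = "STEM"
    required_next = None
    for affix in affixes:
        if previous_kind not in affix["attach_to"]:
            raise ValueError(f"{affix['id']} cannot attach to {previous_kind}")
        if required_next is not None and affix["kind"] != required_next:
            raise ValueError(f"{affix['id']} found where {required_next} is required")
        form += affix["lib"]
        previous_kind = affix["kind"]
        required_next = affix["followed_by"]
    if required_next is not None:
        raise ValueError(f"form must be followed by {required_next}")
    return form


word_form = assemble("aihe", AFFIXES)
```

With the endings in the order given in the appendix, the sketch yields the word form aiheisiinsakin; placing the case ending directly after the stem is rejected because CASE4 must be attached to a number ending or a non-finite form.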