| Document first version date | 19-Feb-97 |
| Document date | 2-Jun-98 |
| Document ID | P-WP3.6-WP-HEL-1 |
| Version | 08 |
| Doc. type | |
| Document status | to be validated |
| Validation type | |
| Comments | |

| | Name | Organisation | Purpose |
| From | Anu Airola | HEL | |
| To | ALL PAROLE Partners | .... | Validation |
1. Construction of the Lexicon
The words in the Finnish lexicon are selected from three sources. First, all the simple nouns and all the proper nouns in the Frequency Dictionary of Finnish (Saukkonen, Haipus, Niemikorpi & Sulkala 1979) are included in the Finnish lexicon. The second source is the FINTAG corpus, compiled from texts representing different text types (for example academic prose, textbooks, newspapers, and magazines). The size of this corpus is about 1.3 million running words. The criterion for including a word in the lexicon was a frequency of no less than 4 for nouns and no less than 2 for verbs and adjectives in this corpus. The third source is a sample of the hs90 corpus containing about 1.0 million running words from the newspaper Helsingin Sanomat 1990.
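The frequency criterion for the FINTAG corpus can be illustrated with a short Python sketch. The thresholds come from the text; the function name `select_lemmas`, the part-of-speech labels and the toy frequency counts are hypothetical, and the sketch simply ignores any part of speech for which no threshold is stated.

```python
# Minimum corpus frequencies stated in the text.
# Parts of speech without a stated threshold are skipped (an assumption).
MIN_FREQ = {"noun": 4, "verb": 2, "adjective": 2}


def select_lemmas(freqs):
    """Return the lemmas whose frequency meets the threshold for their part of speech."""
    return sorted(
        lemma
        for (lemma, pos), count in freqs.items()
        if pos in MIN_FREQ and count >= MIN_FREQ[pos]
    )


# Toy counts, invented for illustration only (not corpus data).
toy_counts = {
    ("kauppa", "noun"): 7,
    ("vene", "noun"): 3,
    ("lukea", "verb"): 2,
    ("kirkas", "adjective"): 1,
}
selected = select_lemmas(toy_counts)
```

With these toy counts only `kauppa` and `lukea` pass, since `vene` falls below the noun threshold and `kirkas` below the adjective threshold.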
2. Current Lexicon Contents
2.1 Morphological layer
2.1.1 Summary of the morphological information
| Number of simple morphological units | 25421 |
| Number of compound morphological units | |
| Number of affix morphological units | 60 |
| Number of agglutinated morphological units | |
| Number of graphical morphological units | 25421 |
| Number of simple inflection modes | |
| Number of simple compound inflection modes | |
| Category | Subcategory | Number of Units |
| Noun | common | 17785 |
| Noun | proper | 746 |
| Verb | normal | 2941 |
| Verb | impersonal | 88 |
| Adjective | qualificative | 2931 |
| Adjective | pronominal | 23 |
| Adjective | noninflecting | 11 |
| Adjective | ordinal | 10 |
| Adverb | | 569 |
| Ad-adjective | | 42 |
| Adposition | postposition | 151 |
| Adposition | preposition | 44 |
| Pronoun | personal | 6 |
| Pronoun | demonstrative | 6 |
| Pronoun | reflexive | 1 |
| Pronoun | relative | 2 |
| Pronoun | interrogative | 5 |
| Pronoun | indefinite | 17 |
| Numeral | cardinal | 24 |
| Conjunction | coordinative | 8 |
| Conjunction | subordinative | 11 |
In the Finnish lexicon the lemmatised form for nominals is nominative singular, or, if the word does not have this particular form, the lemmatised form is one of the existing word forms (for example nominative plural for pluralia tantum). For verbs the lemmatised form is 1st infinitive, or when this does not exist, the lemmatised form is 3rd person singular in active indicative present tense. For adverbs and adpositions inflected in locative cases there is one Morphological Unit for every different word form. The criteria for splitting morphological units are considered to be in line with the GENELEX principles.
There are some non-standardized categories, features and feature values added in the morphological layer in order to make the model more suitable for the Finnish language.
Among these are two minor grammatical categories postulated for the Finnish lexicon, namely ad-adjectives and post-adverbs. Ad-adjectives are adverbs used to modify adjectives and other adverbs. Ad-adjectives never modify a verb, which is considered an argument to separate them from other adverbs and to postulate a new grammatical category 'ad-adjective'. Post-adverbs are adverbs that require a nominal complement. In this regard they behave like adpositions, but the difference is that a post-adverb does not determine the case of its complement like an adposition does.
Adjectives with no inflection are classified as a separate grammatical subcategory, i.e. NONINFLECTING. They never display the predicative function, and their position in the syntagma is immediately before the headword and after other noun modifiers that agree with the headword in case and number. (Vilkuna 1996.) Also pronouns that function like adjectives, i.e. modify nouns and agree with their headword in the normal way, are considered to form a separate grammatical subcategory, PRONOMINAL.
The Finnish case system contains four grammatical cases (i.e. NOMINATIVE, PARTITIVE, GENITIVE, and ACCUSATIVE), six locative cases, which are structured according to two dimensions, i.e. location and direction (see the table below), two abstract locative cases (ESSIVE and TRANSLATIVE) and finally the so-called 'marginal cases' (ABESSIVE, COMITATIVE and INSTRUCTIVE). (See for example Karlsson 1987 and Vilkuna 1996.)
The system of the Finnish local cases according to Karlsson (1987:99):
| | LOCATION: INSIDE | LOCATION: OUTSIDE |
| STATIC | inessive | adessive |
| DIRECTION: AWAY FROM | elative | ablative |
| DIRECTION: TOWARDS | illative | allative |
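The two dimensions of the local case system can be read as a lookup from a (location, direction) pair to a case name. The following is a minimal Python sketch of that reading; the key spellings and the function name `local_case` are illustrative and are not part of the lexicon model.

```python
# The six local cases, indexed by the two dimensions of Karlsson's table.
# Key spellings ("inside", "away_from", ...) are illustrative assumptions.
LOCAL_CASES = {
    ("inside", "static"): "inessive",
    ("outside", "static"): "adessive",
    ("inside", "away_from"): "elative",
    ("outside", "away_from"): "ablative",
    ("inside", "towards"): "illative",
    ("outside", "towards"): "allative",
}


def local_case(location, direction):
    """Look up the local case for a location/direction pair."""
    return LOCAL_CASES[(location, direction)]


example = local_case("inside", "towards")
```

Here `example` is `"illative"`, the case used in the word form discussed in the appendix.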
An additional personal ending needed in the Finnish personal inflection is the fourth person indefinite, which is a special personal ending for the passive verb forms referring to personal but indefinite actor(s) of the process or action described by the verb.
In marking possession in Finnish the word signifying what is possessed also takes an ending. The possessive suffixes in the 3rd person singular and in the 3rd person plural are the same, and the (additional) value SGPL3 for the morphological feature POSSESSOR is thus meant to indicate the fact that a word form containing the possessive suffix in question is ambiguous between the two possible interpretations. (See e.g. Karlsson 1996.)
The morphological features TENSE and MOOD are classified as one distributional class in Finnish morphology, i.e. all the sequences standing for some possible combination of tense and mood are always indicated by only one surface morph. Although the functional endings of the infinitives and participles are in complementary distribution with the morphemes indicating tense and mood, they are considered to form a separate morphological feature (called NONFINITE here). Non-finite forms can take a case ending and a possessive suffix, and participles are also inflected for number. (Karlsson 1987.)
2.1.3 Inflection modes
2.1.3.1 General principles
In the Finnish lexicon the information needed to generate the different word forms (i.e. about 6000 word forms for a noun, and about 12000 word forms for a verb) is included in the different kinds of stems a word can have and in the different kinds of attributes that are linked to the stems. A noun can have either 5 or 6 stems, an adjective 7 or 8 stems, and a verb either 6, 7 or 8 stems.
In the same way every single variant of an inflectional ending with the corresponding attributes is listed in the lexicon (for example the illative ending has no less than 42 different allomorphs).
The basic idea is that an ending can be matched with a word stem if, on the one hand, exactly the same attributes are found in the element Radg describing the behavior of the stem and in the Radg describing the behavior of the ending and if, on the other hand, these attributes have exactly the same values. In addition, some attributes are needed for linking different suffixes.
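The matching condition can be sketched in Python using the attribute values of the entries for aihe given in the appendix. This is one possible reading, not the model's definitive procedure. It assumes that only the attributes an ending specifies are compared, that the linking attributes (px, px3, vqending) are set aside, and that 'nieme' is an identifier, since its value differs between stem and ending in the appendix. The function name `ending_matches_stem` is hypothetical.

```python
# Attributes the text describes as linking suffixes to one another.
LINKING_ATTRS = {"px", "px3", "vqending"}
# Assumed to be an identifier rather than a matching attribute.
IGNORED_ATTRS = {"nieme"}


def ending_matches_stem(stem, ending):
    """True if every stem-selecting attribute of the ending has the same value on the stem."""
    for name, value in ending.items():
        if name in LINKING_ATTRS or name in IGNORED_ATTRS:
            continue
        if stem.get(name) != value:
            return False
    return True


# Radg attributes copied from the appendix entries.
aihe_spl = {
    "nieme": "4", "back": "YES", "stemtype": "SPL", "endingv": "YESEV",
    "ending2v": "NOEN", "has2v": "YESVV", "basei": "NOBI", "syll1p3": "NOSY1",
}
plural_i = {"nieme": "5", "stemtype": "SPL", "endingv": "YESEV"}
illative_sii = {
    "nieme": "41", "stemtype": "SPL", "endingv": "YESEV",
    "has2v": "YESVV", "px": "YESPX", "px3": "NOP",
}

matches_plural = ending_matches_stem(aihe_spl, plural_i)
matches_illative = ending_matches_stem(aihe_spl, illative_sii)
```

Under these assumptions both endings match the strong plural stem of aihe, while an ending marked with a different stemtype would be rejected.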
Below are the lists containing the attributes needed to describe the Finnish inflectional morphology in the LE-PAROLE lexicon model. All the possible combinations of the word stems for the main parts of speech are also stated in the tables below. All frequencies of the combinations are given in the table at 2.1.3.6 (Frequencies of the different combinations of stems). One example of the inflection of the nominals is also given in the appendix.
The most frequently used inflected forms of pronouns are listed in the lexicon according to the common LE-PAROLE lexicon model for encoding morphological information.
The nouns, verbs, and adjectives in the lexicon are also marked with a code indicating the inflectional category the word belongs to. The categories used are those defined in "The Basic Dictionary of Finnish" ("Suomen kielen perussanakirja"). We have used the codes referring to the different inflectional categories as id-numbers, which identify the different MFGs in the SGML-encoded lexicon. In some instances there may be the symbol "DEFAULTMF" instead of the specific code for an inflectional category.
2.1.3.2 Attributes and Combination of stems for nouns and adjectives
Attributes needed for adding suffixes to nominal stems
| ATTRIBUTES | VALUES | |
| stemtype | | |
| | BASE | base form |
| | SSG | strong vowel stem |
| | WSG | weak vowel stem |
| | SPL | strong plural stem |
| | WPL | weak plural stem |
| | CON | consonant stem |
| dstem | | |
| | PVE | positive stem |
| | CVE | comparative stem |
| | SVE | superlative stem |
| back | | does the stem contain back or front vowels |
| has2v | | is there a long vowel in the end of the vowel stem |
| endingv | | is there a vowel in the end of the stem in question |
| basei | | is there the vowel 'i' in the end of the base form |
| syll1p3 | | is the word 2-syllabic or not |
| syll2p3 | | is the word monosyllabic or not |
| contexte_var | | this attribute is used mainly for restricting the application of certain rules |
| vq | | vowel quality |
| stemc | | does the word have a consonant stem |
| ending2v | | is there two different vowels in the end of the vowel stem |
Attributes needed for linking different suffixes
| px | This attribute tells if a case ending must/must not/can be followed by one of the possessive suffixes. |
| px3 | This attribute tells if a certain type of case endings and a certain type of possessive suffixes match. |
| vqending | Vowel quality linked to the suffixes. |
2.1.3.3 Nouns
Stemtypes and their attributes
| stemtype=Base | back, endingv, has2v, basei, vq, syll2p3 |
| stemtype=SSg | back, has2v, stemc, vq, syll2p3 |
| stemtype=WSg | back, syll1p3 |
| stemtype=SPl | back, endingv, ending2v, has2v, basei, syll1p3 |
| stemtype=WPl | back, endingv, has2v, syll1p3 |
| stemtype=Con | back |
Different combinations of noun stems
Nouns with 5 stems
| Stemtype | Examples | |
| Base | kauppa (shop) | kivi (stone) |
| SSg | kauppa/na | kive/nä |
| WSg | kaupa/n | kive/n |
| SPl | kauppo/ina | kiv/inä |
| WPl | kaupo/issa | kiv/issä |
Nouns with 6 stems
| Stemtype | Examples | | |
| Base | vene (boat) | ajatus (thought) | käsi (hand) |
| SSg | venee/nä | ajatukse/na | käte/nä |
| WSg | venee/n | ajatukse/n | käde/n |
| SPl | vene/inä | ajatuks/ina | käs/inä |
| WPl | vene/issä | ajatuks/issa | käs/issä |
| Con | venet/tä | ajatus/ta | kät/tä |
2.1.3.4 Adjectives
Stemtypes and their attributes
| dstem=PVE, stemtype=BASE | back, endingv, has2v, basei, vq, syll2p3 |
| dstem=PVE, stemtype=SSG | back, has2v, stemc, vq, syll2p3 |
| dstem=PVE, stemtype=WSG | back, syll1p3 |
| dstem=PVE, stemtype=SPL | back, endingv, ending2v, has2v, basei, syll1p3 |
| dstem=PVE, stemtype=WPL | back, endingv, has2v, syll1p3 |
| dstem=PVE, stemtype=CON | back |
| dstem=CVE | back |
| dstem=SVE | back |
Different combinations of adjective stems
Adjectives with 7 stems
| Dstem | Stemtype | Examples | |
| PVE | BASE | korkea (high) | vapaa (free) |
| PVE | SSG | korkea/na | vapaa/na |
| PVE | WSG | korkea/n | vapaa/n |
| PVE | SPL | korke/ina | vapa/ina |
| PVE | WPL | korke/issa | vapa/issa |
| CVE | | korkea/mpi | vapaa/mpi |
| SVE | | korke/in | vapa/in |
Adjectives with 8 stems
| Dstem | Stemtype | Examples | |
| PVE | BASE | kirkas (bright) | lämmin (warm) |
| PVE | SSG | kirkkaa/na | lämpimä/nä |
| PVE | WSG | kirkkaa/n | lämpimä/n |
| PVE | SPL | kirkka/ina | lämpim/inä |
| PVE | WPL | kirkka/issa | lämpim/issä |
| PVE | CON | kirkas/ta | lämmin/tä |
| CVE | | kirkkaa/mpi | lämpimä/mpi |
| SVE | | kirkka/in | lämpim/in |
2.1.3.5 Verbs
Attributes
Attributes needed for adding suffixes to verb stems
| vstem | SSGVST | strong vowel stem |
| | WSGVST | weak vowel stem |
| | SSGPAST | strong vowel stem: past tense |
| | WSGPAST | weak vowel stem: past tense |
| | SSGCOND | conditional stem |
| | CONVST | consonant stem |
| | PASS | passive stem |
| | CONPOTN | potential stem |
| | SSGINF | infinitive stem |
| back | | does the stem contain back or front vowels |
| endingv | | is there a vowel in the end of the stem in question |
| has2v | | is there a long vowel in the end of the vowel stem |
| stemc | | does the verb have a consonant stem |
| stempotn | | does the verb have a potential stem |
| steminf | | does the verb have an infinitive stem |
| vq | | vowel quality |
| cq | | consonant quality |
Attributes needed for linking different suffixes
| gradation | weak/strong/strongpast |
| vqending | Vowel quality linked to the suffixes. |
Stemtypes and their attributes
| vstem=SSGVST | back, endingv, has2v, stemc, stempotn, steminf, vq |
| vstem=WSGVST | back |
| vstem=SSGPAST | back |
| vstem=WSGPAST | back |
| vstem=SSGCOND | back |
| vstem=SSGINF | back |
| vstem=PASS | back, endingv, has2v, cq |
| vstem=CONVST | back, cq |
| vstem=CONPOTN | back, cq |
Different combinations of verb stems
Verbs with 6 stems
| Vstem | Examples | |
| SSGVST | muista/vat (remember) | juo/vat (drink) |
| WSGVST | muista/n | juo/n |
| SSGPAST | muist/ivat | jo/ivat |
| WSGPAST | muist/in | jo/in |
| SSGCOND | muista/isin | jo/isin |
| PASS | muiste | juo |
Verbs with 7 stems
| Vstem | Example |
| SSGVST | luke/vat (read) |
| WSGVST | lue/n |
| SSGPAST | luk/ivat |
| WSGPAST | lu/in |
| SSGCOND | luk/isin |
| PASS | lue |
| SSGINF | luki |
Verbs with 8 stems
| Vstem | Example |
| SSGVST | valitse/vat (choose) |
| WSGVST | valitse/n |
| SSGPAST | valits/ivat |
| WSGPAST | valits/in |
| SSGCOND | valits/isin |
| CONVST | valit/koon |
| PASS | valit |
| CONPOTN | valin |
2.1.3.6 Frequencies of the different combinations of stems
| Number of stems | Noun (common) | Adjective (qualificative) | Verb (normal) |
| 5 | 10305 | - | - |
| 6 | 7480 | - | 2204 |
| 7 | - | 92 | 187 |
| 8 | - | 4003 | 946 |
2.2 Syntactic layer
General
In the Finnish lexicon the main syntactic information encoded for verbs contains firstly the possible type of the object a verb can take, if any, and secondly, for intransitive verbs, the property of a verb to select a specific case and its possibility to take a clausal subject. We also have a category for intransitive verbs which can occur in existential constructions, i.e. for verbs which can take a subject in the partitive. In addition, there are some constructions for different mental and communicative verbs, which typically take a complement indicating RECIPIENT.
In addition to intransitive and transitive verbs, there is a third category for verbs in the lexicon, a category for impersonal verbs. By 'impersonal' we refer to verbs, or uses of certain verbs, which lack a person contrast.
The syntactic encoding of nouns concerns the possible post-nominal complements a noun can take, i.e. different phrasal complements inflected in some local case, as well as infinitive constructions and clauses governed by a noun.
Adjectives in the lexicon are categorized according to three types of complements, which are phrasal complements inflected a) in the genitive and b) in some local cases. Thirdly there is a subgroup of adjectives which can be modified by an infinitival complement.
Prepositions and postpositions are categorized according to the form of the complement they occur with, i.e. a phrasal complement either in the genitive or in the partitive, or, in some rare cases, in some local case.
The so-called ad-adjectives form a functionally motivated subcategory differentiated from adverbs. Adverbs themselves are marked in the lexicon with information concerning e.g. polarity, the possible context they can occur in.
Syntactic functions
Syntactic functions used in the syntactic frames for verbs:
HEAD
ADVERBIAL for NPs indicating MANNER or INSTRUMENT
CLAUSCOMP for infinitive complements (i.e. 3rd infinitive in illative, elative or abessive, 1st infinitive, a participle or a permissive construction)
OBJECT realized as an NP, a that-clause, a Wh-clause, a participial construction, an infinitive construction or a so-called permissive construction
OBLIQUE for phrasal complements in some local case or in essive or translative (the latter two cases are restricted here to predicative functions)
SUBJECT realized as an NP in nominative, partitive or genitive or a clausal subject (a that-clause, a Wh-clause or 1st infinitive construction)
Syntactic functions used in the syntactic frames for nouns:
HEAD
NCOMP for phrasal complements inflected in some local case or in partitive
NCLAUSCOMP for complements realized either as infinitive constructions or that- or Wh-clauses
Syntactic functions used in the syntactic frames for adjectives:
HEAD
SUBJECT
ACOMP for phrasal complements inflected in some local case and for clausal complements realized as a third infinitive
SUBJPRED for adjectives in predicative position with a clausal subject
NLEFTATTRIBUTIVE for adjectives with no inflection. Noninflecting adjectives never display the predicative function, and their position in the syntagma is immediately before the headword and after other noun modifiers that agree with the headword in case and number.
Syntactic functions used in describing the syntactic behavior of prepositions, postpositions, ad-adjectives and adverbs:
HEAD
NPREPCOMP for NPs depending on a preposition
NPOSTCOMP for NPs depending on a postposition
AMODIFIER for an ad-adjective modifying an adjective
ADVMODIFIER for an ad-adjective modifying an adverb
ADVERBIAL for an adverb as a sentence modifier
Our first approximation was to consider all the different forms of the phrasal post-nominal complements or modifiers (i.e. NPs inflected in different local cases), as well as the different kinds of clausal complements, as different Syntactic units. The solution adopted for encoding adjectives was, however, to have one syntactic unit with different descriptions, so the encoding of nouns has been modified accordingly.
Framesets are not used in the Finnish lexicon (they are not obligatory information).
| Number of Syntactic descriptions | 124 |
| Number of framesets | |
| Category | Subcategory | Number of Units |
| Noun | common | 17785 |
| Noun | proper | 746 |
| Verb | normal | 2941 |
| Verb | impersonal | 88 |
| Adjective | qualificative | 2931 |
| Adjective | noninflecting | 11 |
| Adjective | pronominal | 23 |
| Adjective | ordinal | 10 |
| Adverb | | 569 |
| Ad-adjective | | 42 |
| Adposition | postposition | 151 |
| Adposition | preposition | 44 |
| Numeral | cardinal | 24 |
3. Bibliography
Karlsson, Fred. 1987. FINNISH GRAMMAR. Second edition. WSOY: Juva.
Saukkonen, Pauli, Marjatta Haipus, Antero Niemikorpi & Helena Sulkala. 1979. SUOMEN KIELEN TAAJUUSSANASTO. WSOY: Porvoo.
Vilkuna, Maria. 1996. SUOMEN LAUSEOPIN PERUSTEET. Edita: Helsinki.
Dictionaries
SUOMEN KIELEN PERUSSANAKIRJA I-III. ("The Basic Dictionary of Finnish".) Kotimaisten kielten tutkimuskeskuksen julkaisuja 55. Valtion painatuskeskus: Helsinki.
Corpora
FINTAG. The Department of General Linguistics, University of Helsinki.
hs90. The Department of General Linguistics, University of Helsinki.
Appendix - An Example
Below are the relevant parts of the entries needed in building one of the approximately 6000 different forms of the Finnish noun aihe, 'subject'. The word form in question is aiheisiinsakin, 'even to his/her/their subjects':
| aihe | i | sii | nsa | kin |
| SPL | PL | ILL | SGPL3 | CLITICPARTICLE |
The order of the endings is indicated by means of the attributes linked to the element Um_Aff; e.g. 'mustbeattachedto' or 'mustbefollowedby'.
<Um_S
id="N60"
appellation="aihe"
catgram="NOUN"
sscatgram="COMMON"
autonomie="YES"
usyn_l="Usyn85">
<Umg
mf="DEFAULTMF">
<Lib>aihe</Lib>
<Radg
nieme="4"
back="YES"
stemtype="SPL"
endingv="YESEV"
ending2v="NOEN"
has2v="YESVV"
basei="NOBI"
syll1p3="NOSY1">
<Lib>aihe</Lib></Radg>
</Umg>
</Um_S>
<Um_Aff
id="NUM2"
typaff="SUFFIX"
mustbeattachedto="STEMORNONFINITE"
mustbefollowedby="CASE">
<Umg
mf="MFGEMPTY">
<Radg
nieme="5"
stemtype="SPL"
endingv="YESEV">
<Lib>i</Lib></Radg>
<MorphFeature
featurename="NUMBER"
featurevalue="PLURAL">
</MorphFeature></Umg>
</Um_Aff>
<Um_Aff
id="CASE4"
typaff="SUFFIX"
mustbeattachedto="NUMBERORNONFINITE">
<Umg
mf="MFGEMPTY">
<Radg
nieme="41"
stemtype="SPL"
endingv="YESEV"
has2v="YESVV"
px="YESPX"
px3="NOP">
<Lib>sii</Lib></Radg>
<MorphFeature
featurename="CASE"
featurevalue="ILLATIVE">
</MorphFeature></Umg>
</Um_Aff>
<Um_Aff
id="POSS5"
typaff="SUFFIX"
mustbeattachedto="CASE">
<Umg
mf="MFGEMPTY">
<Radg
nieme="1"
back="YES">
<Lib>nsa</Lib></Radg>
<MorphFeature
featurename="POSSESSOR"
featurevalue="SGPL3">
</MorphFeature></Umg>
</Um_Aff>
<Um_Aff
id="CPL2"
typaff="SUFFIX"
maybeattachedto="CASEORPOSSORPERSONORIMPERATIVE">
<Umg
mf="MFGEMPTY">
<Radg
nieme="1">
<Lib>kin</Lib></Radg>
<MorphFeature
featurename="CLITICPARTICLE"
featurevalue="YESCPL">
</MorphFeature></Umg>
</Um_Aff>
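The ordering constraints in the entries above can be checked mechanically before the morphs are concatenated. The following Python sketch is an illustration only. The class labels (STEM, NUMBER, CASE, POSS, CLITIC), the tuple layout and the function name `build_word_form` are assumptions. Reading a value such as NUMBERORNONFINITE as "NUMBER or NONFINITE" is likewise an interpretation of the attribute names, and 'maybeattachedto' is treated here like 'mustbeattachedto' for simplicity.

```python
# Each morph: (surface form, class, classes it may attach to, class that must follow).
# None means "no constraint". Class labels are assumptions of this sketch.
MORPHS = [
    ("aihe", "STEM", None, None),
    ("i", "NUMBER", {"STEM", "NONFINITE"}, "CASE"),
    ("sii", "CASE", {"NUMBER", "NONFINITE"}, None),
    ("nsa", "POSS", {"CASE"}, None),
    ("kin", "CLITIC", {"CASE", "POSS", "PERSON", "IMPERATIVE"}, None),
]


def build_word_form(morphs):
    """Check the attachment constraints left to right, then concatenate the morphs."""
    for i, (form, _cls, attach_to, followed_by) in enumerate(morphs):
        if attach_to is not None:
            # The preceding morph must belong to one of the permitted classes.
            if i == 0 or morphs[i - 1][1] not in attach_to:
                raise ValueError(f"'{form}' cannot attach here")
        if followed_by is not None:
            # The following morph must exist and belong to the required class.
            if i + 1 >= len(morphs) or morphs[i + 1][1] != followed_by:
                raise ValueError(f"'{form}' must be followed by {followed_by}")
    return "".join(form for form, *_ in morphs)


word = build_word_form(MORPHS)
```

For the five morphs listed, the constraints are satisfied and `word` is `aiheisiinsakin`; placing the case ending directly after the stem would raise a `ValueError` under these assumptions.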