LE-PAROLE
WP3.8
German Lexicon Documentation
* * *
Results:
Over 20,000 lexical entries have been converted according to the PAROLE DTD at the morphological and syntactic levels. The composition by word class is given in the table under "Current Lexicon Contents".
For more information on the work, or access to the German PAROLE Lexicon (which is freely available upon filling in a User Agreement), please contact the German coordinator Wolfgang Teubert (wolfgang.teubert@ids-mannheim.de).
2. Current Lexicon Contents
| Category | Subcategory | Number of Units |
|---|---|---|
| nouns | | 15,500 |
| adjectives | | 4,000 |
| verbs | | 3,000 |
| adverbs | | 590 |
| function words | | ca 500 |
Adjectives:
The list of adjectives provided was whittled down by comparison with extensive corpora (ca 103 million words and ca 204 million words). If no example sentences were found in either corpus, or if the word classed in the PAROLE lemma list as an adjective was clearly mis-tagged, then the item was removed. Attributive and/or predicative use is specified. This work was manually checked.
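The filtering rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used in the project: the function names `has_example` and `filter_adjectives`, the toy corpora, and the `mistagged` set are all hypothetical, and a simple prefix match on word forms stands in for a real corpus query.

```python
import re

def has_example(lemma, sentences):
    """True if some sentence contains a word form starting with the lemma.

    Assumption: a prefix match (e.g. 'schön' matching 'schönes') is a crude
    stand-in for proper lemmatised corpus lookup.
    """
    pattern = re.compile(r"\b" + re.escape(lemma) + r"\w*", re.IGNORECASE)
    return any(pattern.search(s) for s in sentences)

def filter_adjectives(candidates, corpora, mistagged):
    """Keep candidates that are not flagged as mis-tagged and are attested
    in at least one of the corpora."""
    kept = []
    for lemma in candidates:
        if lemma in mistagged:
            continue  # clearly mis-tagged as an adjective: remove
        if not any(has_example(lemma, corpus) for corpus in corpora):
            continue  # no example sentence in either corpus: remove
        kept.append(lemma)
    return kept

# Toy data, invented for this sketch (not taken from the PAROLE material).
corpus_a = ["Das ist ein schönes Haus.", "Der Tisch ist neu."]
corpus_b = ["Sie trägt einen roten Mantel."]
candidates = ["schön", "rot", "neu", "blümerant", "gern"]
mistagged = {"gern"}  # hypothetical result of a manual tagging check

result = filter_adjectives(candidates, [corpus_a, corpus_b], mistagged)
print(result)
```

In the project itself the result was checked manually, so an automatic filter of this kind would at most produce a candidate list for human review.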
Nouns:
Again, the lemma list was checked against corpus material. The "default" value is that no special syntactic information is attached to the noun (example: Tisch). Other types are "ntype(mass)", "s-comp(C-daß)", "s-comp(C-wh/ob)" and "v-comp(V-zu-inf)". Example sentences extracted from the corpus are available for a subset of nouns which take a subcategorised subclause or an infinitive.
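As a rough illustration of the noun classification, the labels can be modelled as a lookup with a default. The sketch below is an assumption-laden Python example, not the PAROLE DTD encoding: the dictionary layout, the function name `noun_type`, the restriction to one label per noun, and every example noun other than Tisch are hypothetical.

```python
from collections import Counter

# Syntactic labels named in the text; "default" means no special information.
NOUN_TYPES = {
    "default",
    "ntype(mass)",
    "s-comp(C-daß)",
    "s-comp(C-wh/ob)",
    "v-comp(V-zu-inf)",
}

# Hypothetical entries for illustration only. Nouns without an entry
# (such as Tisch) fall back to the default value.
entries = {
    "Wasser": "ntype(mass)",
    "Behauptung": "s-comp(C-daß)",
    "Frage": "s-comp(C-wh/ob)",
    "Versuch": "v-comp(V-zu-inf)",
}

def noun_type(entries, lemma):
    """Return the syntactic label for a noun, or 'default' if none is attached."""
    label = entries.get(lemma, "default")
    if label not in NOUN_TYPES:
        raise ValueError(f"unknown noun type: {label}")
    return label

lemmas = ["Tisch", "Wasser", "Behauptung", "Frage", "Versuch", "Stuhl"]
counts = Counter(noun_type(entries, lemma) for lemma in lemmas)
print(noun_type(entries, "Tisch"))
print(counts["default"])
```

A single label per noun keeps the sketch short; a noun that allows more than one complement type would need a set of labels instead.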
Adverbs:
For the list of 500 adverbs in the German PAROLE list, examples of several selected uses were identified in the 200 million word corpus. The main information included is on the following properties:
2.2 Validation
The German lexicon was validated by a member of the German PAROLE subconsortium, IAI in Saarbrücken. A full validation report was produced and is available from MAN. On the basis of this, several points were improved.
2.3 Multilingual Linking
To fulfil the work item on multilingual links in WP 3.8, a workshop of all the German subconsortium partners was held. The agenda and papers from the workshop are available on the MAN website (http://www.ids-mannheim.de/MLF). The full proceedings will be published later this year.
2.4 Further work
Since the end of the PAROLE project, the lexicon has been altered and improved. This has been deemed necessary for two reasons.