LE-PAROLE

WP3.8

 

 

 

German Lexicon Documentation

 

* * *

 

Results:

Over 20,000 lexical entries have been converted according to the PAROLE DTD at the morphological and syntactic levels. The composition by word class is given in the table under "Morphological and syntactic layer" below.

For more information on the work, or access to the German PAROLE Lexicon (which is freely available upon filling in a User Agreement), please contact the German coordinator Wolfgang Teubert (wolfgang.teubert@ids-mannheim.de).

2. Current Lexicon Contents

2.1 Morphological and syntactic layer

 

 

| Category | Subcategory | Number of Units |
|---|---|---|
| nouns | | 15,500 |
| adjectives | | 4,000 |
| verbs | | 3,000 |
| adverbs | | 590 |
| function words | | ca. 500 |

Adjectives:

The list of adjectives provided was reduced by comparison with two extensive corpora (ca. 103 million words and ca. 204 million words). An item was removed if no example sentences were found in either corpus, or if a word classed as an adjective in the PAROLE lemma list was clearly mis-tagged. Attributive and/or predicative use is specified for each remaining adjective. This work was checked manually.
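The removal rule for adjectives can be illustrated with a minimal Python sketch. It is not the tooling used in the project: the `Candidate` record, its field names and the sample lemmas and counts are all hypothetical, and the mis-tagging flag stands in for a manual judgement.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one adjective candidate from the lemma list."""
    lemma: str
    hits_corpus_a: int  # example sentences found in the first corpus
    hits_corpus_b: int  # example sentences found in the second corpus
    mis_tagged: bool    # manual judgement: clearly not an adjective


def keep_adjective(c: Candidate) -> bool:
    # Keep an item only if it is attested in at least one corpus
    # and was not judged to be mis-tagged.
    attested = c.hits_corpus_a > 0 or c.hits_corpus_b > 0
    return attested and not c.mis_tagged


def filter_adjectives(candidates):
    """Return the lemmas that survive the removal rule, in input order."""
    return [c.lemma for c in candidates if keep_adjective(c)]


# Invented sample data, for illustration only.
sample = [
    Candidate("schnell", 12, 30, False),  # attested in both corpora
    Candidate("xyzig", 0, 0, False),      # unattested, so removed
    Candidate("gestern", 5, 9, True),     # attested but mis-tagged, so removed
    Candidate("selten", 0, 3, False),     # attested in one corpus only
]
kept = filter_adjectives(sample)
```

In this sketch a single attestation in one corpus is enough to keep an item, which matches the rule that only items found in neither corpus are dropped.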

Nouns:

Again, the lemma list was checked against corpus material. The "default" value is that no special syntactic information is attached to the noun (example: Tisch). Other types are "ntype(mass)", "s-comp(C-daß)", "s-comp(C-wh/ob)" and "v-comp(V-zu-inf)". Example sentences extracted from the corpus are available for a subset of nouns which take a subcategorised subclause or an infinitive.
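As an illustration of the noun labels listed above, the following Python sketch models them as an enumeration. The enum, its member names and the helper function are assumptions made for this example and do not reflect the PAROLE DTD encoding; only the label strings and the default example "Tisch" are taken from the text.

```python
from enum import Enum


class NounSyntax(Enum):
    """Syntactic labels for nouns, using the label strings from the text."""
    DEFAULT = "default"                 # no special syntactic information
    MASS = "ntype(mass)"
    DASS_CLAUSE = "s-comp(C-daß)"
    WH_OB_CLAUSE = "s-comp(C-wh/ob)"
    ZU_INFINITIVE = "v-comp(V-zu-inf)"


# Labels for nouns that take a subcategorised subclause or an infinitive;
# corpus example sentences exist for a subset of such nouns.
COMPLEMENT_TAKING = {
    NounSyntax.DASS_CLAUSE,
    NounSyntax.WH_OB_CLAUSE,
    NounSyntax.ZU_INFINITIVE,
}


def may_have_corpus_example(label: NounSyntax) -> bool:
    """True if nouns with this label are candidates for a corpus example."""
    return label in COMPLEMENT_TAKING


# "Tisch" is the default example given in the text.
tisch = NounSyntax.DEFAULT
```

A label can be looked up from its string form, for example `NounSyntax("ntype(mass)")`.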

 

Adverbs:

For the list of 500 adverbs in the German PAROLE list, examples of several selected uses were identified in the 200 million word corpus. The main information included is on the following properties:

2.2 Validation

The German lexicon was validated by a member of the German PAROLE subconsortium, IAI in Saarbrücken. A full validation report was produced and is available from MAN. On the basis of this, several points were improved.

2.3 Multilingual Linking

To fulfil the work item on multilingual links in WP 3.8, a workshop of all the German subconsortium partners was held. The agenda and papers from the workshop are available on the MAN website (http://www.ids-mannheim.de/MLF). The full proceedings will be published later this year.

2.4 Further work

Since the end of the PAROLE project, the lexicon has been altered and improved. This has been deemed necessary for two reasons.

  1. The partially unsatisfactory nature of the original entries, as provided by CIS, meant that frequency was not taken into account as it should have been, leading to a skewed distribution of words in the lexicon. For instance, the lack of strong verbs was a major failing of the PAROLE lexicon; most of these have now been added.
  2. The SIMPLE work has led to further gaps and failings becoming evident. These have been systematically noted and either corrected straight away or recorded for future work. Although the SIMPLE work does not require any improvement to the PAROLE lexicon, we are hoping to combine the two to provide a better resource.