SIMPLE LE4-8346
DANISH SIMPLE - LEXICON DOCUMENTATION
* * *
| Document first version date | 28/8/1999 |
| Document date | 25/4/2000 |
| Document ID | Danish Simple-Lexicon Documentation Prefinal version prepared for evaluation May 19 |
| Version | 02 |
| Doc. type | |
| Document status | prefinal |
| Validation type | |
| Comments | |

| | Name | Organisation | Purpose |
| From | Bolette Pedersen | COP | Documentation |
| | Sanni Nimb | | |
| | Sussi Olsen | | |
| To | evaluation panel | | |
1. General design information
1.1. Lexicon population
The Danish SIMPLE lexicon adds semantic descriptions to 8,200 of the 20,000 Danish PAROLE lexicon entries. These 8,200 morphological entries amount to 10,000 semantic units because of polysemy and homonymy. 7,000 of the semantic units are nouns, 2,000 are verbs, and 1,000 are adjectives (by April 25, 9,700 semu's have been encoded).
The entries to be encoded in SIMPLE have been chosen on the basis of three different criteria:
In the case of nouns, we have aimed for a relatively ‘closed approach’ to lexicon population, so that all relevant readings of a given word are encoded. We have primarily based our reading distinction strategy on a medium-sized monolingual lexicon as well as on corpus examinations (i.e. in some cases we have deviated from the lexicon because the corpus revealed either fewer or other ambiguities than the ones represented in the lexicon).
In the case of verbs, a closed approach has not been feasible, first of all because the Danish PAROLE lexicon did not adopt such an approach when describing the syntax of Danish verbs. For instance, Danish is characterised by very frequent use of phrasal verb constructions (see also Section 2.7), and not all of these have been encoded in syntax.
In relation to lexicon population it is important for us to stress that the elaboration of a Danish computational lexicon does not stop with the PAROLE/SIMPLE project. An ongoing project at Center for Sprogteknologi is concerned with the task of scaling up the PAROLE/SIMPLE lexicon to 100,000 semantic units (see Braasch et al. 1998). With respect to phrasal verbs in particular, our aim is to extend the existing phrasal verb descriptions into something that corresponds better to the presence of phrasal verbs in Danish corpora.
1.2. Background resources
Two background resources have played an important role in the building of the Danish SIMPLE data, namely corpora and a medium-sized Danish lexicon. First of all, the decision was made very early in the project that all data should be described on the basis of corpus examinations and that each semantic unit should be supported by an illustrative example from the corpus. This means that if a meaning of a word shows significant frequency in the corpus, we represent it in the SIMPLE lexicon, even if the particular meaning is not represented in the traditional dictionary we use as our other important background resource (for instance the metaphorical meaning of puslespil (puzzle)). Conversely, if a meaning is represented in the lexicon but has no occurrences in the corpus, the particular meaning has in most cases been omitted.
Our corpus examinations are primarily based on two corpora. The most important is the Berlingske corpus of about 20 mill. tokens, consisting of newspaper articles concerning various topics. In the cases where there are few or no examples of a given word in this corpus, the DK-korpus (Bergenholtz 1990), a balanced corpus of 4 mill. words composed of novels, newspapers, journals, magazines and miscellaneous, is used. We have chosen the corpus tool Xkwic (Christ 1993) for our corpus examinations. Xkwic is part of the IMS corpus toolbox developed at the University of Stuttgart and available through the internet.
Nudansk Ordbog is a medium-sized Danish lexicon with a rather consistent reading distinction policy. We have obtained the right to exploit this resource as long as the material is not used for commercial purposes. Almost all definitions have been extracted from an electronic version of this source. All encoded words in our lexicon include a definition; in cases where we did not find an appropriate definition in Nudansk Ordbog, either because the word was not represented or because the definition was for some reason inappropriate, we have written one ourselves. It has been of great help to have this resource as a reference point.
1.3. Material selected for the evaluation
For adjectives we have so far encoded only required information (cf. Lenci et al. 2000), whereas for nouns and verbs we have encoded both recommended and, in several cases, optional information. The two latter word classes (which represent 9/10 of the Danish SIMPLE material) therefore show much more of the Danish SIMPLE lexicon, and 50 noun meanings and 50 verb meanings have been selected for evaluation purposes. Due to time limits, a very careful word selection has not been possible; on the other hand, the rather random selection of words illustrates the lexicon as a whole well, since the material presented here has not received any special elaboration. However, as can be seen below, the material represents different aspects of the SIMPLE ontology, since concrete, abstract and event nouns are all represented, as well as a large set of the ontological verb types.
The selected words are shown in the two lists below. Note that only 31 morphological noun units and 15 verb units have been selected, but that these spread into 100 different meanings due to homonymy and polysemy. In the SGML files eval_nouns_DK.sgml and eval_verbs_DK.sgml the morphological, syntactic and semantic units are all given, as well as all other SGML objects referred to in the entries. The two files have been successfully parsed using the English version of the DTD delivered by LexiQuest.
NOUNS: (eval_nouns_DK.sgml)
pige (girl)
koreaner (Korean)
bror (brother)
søn (son)
republikaner (republican)
forbruger (consumer)
opdrætter (breeder)
oplæser (reciter, newsreader)
biologi (biology)
årsag (cause)
marina (marina)
lystbådehavn (marina)
land (country)
skole (school)
bibliotek (library)
minut (minute)
aften (evening)
hjelm (helmet)
styrthjelm (crash helmet)
visir (visor)
menukort (menu)
spisekort (menu)
paraply (umbrella)
skadedyr (vermin)
papegøje (parrot)
enebær (juniper berry)
nellike (pink (flower), clove)
kanin (rabbit)
sild (herring)
storm (storm)
tordenvejr (thunder)
VERBS: (eval_verbs_DK.sgml)
finde (find, think of, think, stand (I can't stand it))
disponere (take the necessary steps, predispose, have disposal of)
bryde (break, change)
sende (send)
spille (act, play)
såre (hurt)
så (sow)
behandle (treat, cure)
bede (ask, pray)
male (paint, grind)
tale (speak, talk)
regne (rain, calculate, count on)
stoppe (stop, end)
læse (study, read)
springe (jump, explode)
The easiest way to ‘follow’ a word from morphology to semantics is to simply search on the word form throughout the file. For a verb like læse (study, read) this gives the following results (note that since the original Danish PAROLE lexicon covers 20,000 morphological units and around 60,000 syntactic units not all links are necessarily encoded in the semantic part of the lexicon which only covers 10,000 semantic units):
MORPHOLOGY
<MuS (morphological unit)
id="UM029573"
naming="LÆSE"
gramcat="VERB"
gramsubcat="MAIN"
synulist="Usyn12 Usyn3796 Usyn3797 Usyn3798 Usyn3800 Usyn3801 Usyn3802 Usyn3803">
<Gmu
attestation="RO86"
inp="MFG0131">
<Spelling>læse</Spelling></Gmu></MuS>
SYNTAX
<SynU (syntactic unit)
id="Usyn3797"
naming="læse"
attestation="cn"
description="Dv2P-i"><correspSynUSemU
targetsemu="USEM_V_læse_COE_1"
correspondence="arg12i"></SynU>
<SynU
id="Usyn12"
naming="læse"
attestation="cn"
description="Dv2N0"><correspSynUSemU
targetsemu="USEM_V_læse_COE_1"
correspondence="arg12"><correspSynUSemU
targetsemu="USEM_V_læse_COE_3"
correspondence="arg12"></SynU>
<SynU
id="Usyn3800"
naming="læse"
attestation="cn"
description="Dv2P-paa"></SynU>
<SynU
id="Usyn3801"
naming="læse"
attestation="cn"
description="Dv2P-til"><correspSynUSemU
targetsemu="USEM_V_læse_COE_2"
correspondence="arg12til"></SynU>
<SynU
id="Usyn3802"
naming="læse"
attestation="cn"
description="Dv2xP0-op-til"><correspSynUSemU
targetsemu="USEM_V_læse_op_COE_1"
correspondence="arg12til"></SynU>
<SynU
id="Usyn3796"
naming="læse"
attestation="cn"
description="Dv3N0P0-for">
<correspSynUSemU
targetsemu="USEM_V_læse_SPE_1"
correspondence="arg122P"></SynU>
<SynU
id="Usyn3803"
naming="læse"
attestation="cn"
description="Dv2t"><correspSynUSemU
targetsemu="USEM_V_læse_COE_1"
correspondence="arg12t"></SynU>
<SynU
id="Usyn3798"
naming="læse"
attestation="cn"
description="Dv2xN0-op">
<correspSynUSemU
targetsemu="USEM_V_læse_op_SPE_1"
correspondence="arg12"></SynU>
SEMANTICS
COGNITIVE EVENTS
<SemU
id="USEM_V_læse_COE_1"
naming="læse"
example=" Det er ikke en bog , man gider at læse to gange , men sjov er den . "
comment="full BSP"
freedefinition="se på og forstå en tekst (NDONY)" /look at and understand a text/
weightvalsemfeaturel="
WVSFTemplateCognitiveEventPROT
WVSFTemplateSuperTypePsychologicalEventPROT
WVSFEventTypeProcessPROT
TSVP_Cognition_TS_classificateur_de_verbe">
<PredicativeRepresentation
typeoflink="MASTER"
predicate="PREDhumsem_COE_1">
/selectional restrictions ARG1=human ARG2=semiotic /
<RWeightValSemU
weight="PROTOTYPICAL"
comment="Type-defining semantic relation"
target="USEM_N_erkendelsesproces_COE_1"
semr="SRIsa">
</SemU>
<SemU
id="USEM_V_læse_op_COE_1"
naming="læse_op (til)"
example=" På en videregående uddannelse kan man ikke , som på gymnasiet , bare læse op til eksamen "
comment="full BSP"
freedefinition="forberede sig til en eksamen" /prepare an exam/
weightvalsemfeaturel="
WVSFTemplateCognitiveEventPROT
WVSFTemplateSuperTypePsychologicalEventPROT
WVSFEventTypeProcessPROT
TSVP_Cognition_TS_classificateur_de_verbe">
<PredicativeRepresentation
typeoflink="MASTER"
predicate="PREDhum_COE_1">
/selectional restriction ARG1=human ARG2=unrestricted/
<RWeightValSemU
weight="PROTOTYPICAL"
comment="Type-defining semantic relation"
target="USEM_N_erkendelsesproces_COE_1"
semr="SRIsa">
</SemU>
<SemU
id="USEM_V_læse_COE_2"
naming="læse"
example=" En ordentlig arbejder , der ville frem i geledderne måtte helst læse til cand.polit "
comment="full BSP"
freedefinition=" være ved at tage en boglig uddannelse i noget (NDONY)" /take an education to become something/
weightvalsemfeaturel="
WVSFTemplateCognitiveEventPROT
WVSFTemplateSuperTypePsychologicalEventPROT
WVSFEventTypeProcessPROT
TSVP_Cognition_TS_classificateur_de_verbe">
<PredicativeRepresentation
typeoflink="MASTER"
predicate="PREDhumprof_COE_1">
/selectional restriction ARG1=human ARG2=profession/
<RWeightValSemU
weight="PROTOTYPICAL"
comment="Type-defining semantic relation"
target="USEM_N_erkendelsesproces_COE_1"
semr="SRIsa">
</SemU>
<SemU
id="USEM_V_læse_COE_3"
naming="læse"
example="Han trådte som 20-årig ind i redemtoristordenen og læste teologi hos Mauterne i Østrig "
comment="full BSP"
freedefinition=" være ved at tage en boglig uddannelse i noget (NDONY)"
weightvalsemfeaturel="
WVSFTemplateCognitiveEventPROT
WVSFTemplateSuperTypePsychologicalEventPROT
WVSFEventTypeProcessPROT
TSVP_Cognition_TS_classificateur_de_verbe">
<PredicativeRepresentation
typeoflink="MASTER"
predicate="PREDhumdom_COE_1">
/selectional restriction ARG1=human ARG2=domain /
<RWeightValSemU
weight="PROTOTYPICAL"
comment="Type-defining semantic relation"
target="USEM_N_erkendelsesproces_COE_1"
semr="SRIsa">
</SemU>
SPEECH ACTS
<SemU
id="USEM_V_læse_op_SPE_1"
naming="læse_op"
example="jeg er heller ikke i stand til at læse op , hvad mine medarbejdere skriver"
comment="full SN"
freedefinition="udtale noget skrevet, så andre kan høre det (NDONY)" /read aloud/
weightvalsemfeaturel="
WVSFTemplateSpeechActPROT
WVSFTemplateSuperTypeActPROT
WVSFEventTypeProcessPROT
TSVP_COMMUNICATION_TS_classificateur_de_verbe">
<PredicativeRepresentation
typeoflink="MASTER"
predicate="PRED2hum_sem_SPE_1">
/selectional restriction ARG1=human ARG2=semiotic /
<RWeightValSemU
weight="PROTOTYPICAL"
comment="Type-defining semantic relation"
target="USEM_N_talehandling_SPE_1"
semr="SRIsa">
</SemU>
<SemU
id="USEM_V_læse_SPE_1"
naming="læse"
example="han læste for pigen "
comment="full SN"
freedefinition="læse højt af en tekst for nogen" /read aloud to somebody /
weightvalsemfeaturel="
WVSFTemplateSpeechActPROT
WVSFTemplateSuperTypeActPROT
WVSFEventTypeProcessPROT
TSVP_COMMUNICATION_TS_classificateur_de_verbe">
<PredicativeRepresentation
typeoflink="MASTER"
predicate="PRED3hum_sem_hum_SPE_1">
/selectional restrictions ARG1=human ARG2=semiotic (can be omitted) ARG3=human/
<RWeightValSemU
weight="PROTOTYPICAL"
comment="Type-defining semantic relation"
target="USEM_N_talehandling_SPE_1"
semr="SRIsa">
</SemU>
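The lookup described above (search on the word form, then follow the links from morphology through syntax to semantics) can also be sketched programmatically. The Python sketch below is illustrative only and is not part of the delivered material: it uses regular expressions on an abbreviated excerpt instead of a real SGML parser, and the function name `trace_links` and the `SAMPLE` string are assumptions made for the example.

```python
import re

# Abbreviated excerpt modelled on the læse entries above (assumption: the
# attribute names synulist, id and targetsemu are used as in the listing).
SAMPLE = '''
<MuS id="UM029573" naming="LÆSE" gramcat="VERB"
synulist="Usyn12 Usyn3800 Usyn3801">
<SynU id="Usyn12" naming="læse" description="Dv2N0"><correspSynUSemU
targetsemu="USEM_V_læse_COE_1" correspondence="arg12"><correspSynUSemU
targetsemu="USEM_V_læse_COE_3" correspondence="arg12"></SynU>
<SynU id="Usyn3800" naming="læse" description="Dv2P-paa"></SynU>
<SynU id="Usyn3801" naming="læse" description="Dv2P-til"><correspSynUSemU
targetsemu="USEM_V_læse_COE_2" correspondence="arg12til"></SynU>
'''

def trace_links(sgml, mus_id):
    """Map each SynU id listed by a MuS to the SemU ids it links to."""
    mus = re.search(r'<MuS\b[^>]*\bid="%s"[^>]*>' % re.escape(mus_id), sgml)
    if mus is None:
        return {}
    synulist = re.search(r'synulist="([^"]*)"', mus.group(0))
    wanted = synulist.group(1).split() if synulist else []
    links = {}
    # Each SynU block runs from its start tag to the closing </SynU>.
    for block in re.finditer(r'<SynU\b(.*?)</SynU>', sgml, re.S):
        body = block.group(1)
        syn_id = re.search(r'\bid="([^"]*)"', body).group(1)
        if syn_id in wanted:
            links[syn_id] = re.findall(r'targetsemu="([^"]*)"', body)
    return links

links = trace_links(SAMPLE, "UM029573")
for syn_id, semus in links.items():
    print(syn_id, "->", semus or "(no semantic link encoded)")
```

On this excerpt the sketch finds two semantic units for Usyn12, none for Usyn3800 and one for Usyn3801, which mirrors the remark above that not every syntactic unit has a link into the semantic part of the lexicon.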
We also include as evaluation material two papers (Nimb & Pedersen 2000, Pedersen & Nimb 2000) where we focus on metaphoric senses and on phrasal verbs, respectively. These papers give a more thorough description as well as the linguistic background of the specific phenomena that have required special attention during the Danish lexicon encoding.
1.4. Current Lexicon Contents
Table 1: Overall statistics
| Number of full Semu's linked to syntax and morphology | by April 25: 9,700 semu’s |
| Number of predicative Semu’s | 2,035 |
| Semu per category: Nouns (required, recommended and optional information) | 6,700 |
| Semu per category: Verbs (required, recommended and optional information) | 2,000 |
| Semu per category: Adjectives (required information only) | 1,000 |
| Number of dummies | approx. 1,000 |
The following schemas show the templates represented in the lexicon.
CONCRETE NOUN TEMPLATES REPRESENTED:
Part
Body part
Group
Human group
Concrete entity
Location
3D location
Geopol
Area
Openings
Building
Artifactual area
Material
Artifact
Artifact material
Furniture
Clothing
Container
Artwork
Instrument
Money
Vehicle
Semiotic artifact
Food
Artifact food
Flavouring
Physical object
Organic object
Animal
Earth
Air
Water
Human
People
Ideo
Kinship
Social status
Agent of temporary activity
Agent of persistent activity
Profession
Vegetal
Plant
Flower
Fruit
Microorganism
Substance
Natural Substance
Substance food
Drink
Artifactual drink
CONCRETE NOUN TEMPLATES NOT REPRESENTED
Entity
Living entity
Role
ABSTRACT NOUN TEMPLATES REPRESENTED:
Quality
Social property
Psychical property
Physical property
Colour
Physical power
Shape
Representation
Information
Language
Number
Sign
Unit of measurement
Abstract
Cognitive fact
Convention
Domain
Institution
Moral standards
Time
ABSTRACT NOUN TEMPLATES, NOT REPRESENTED:
Property
Movement of thought
EVENT TEMPLATES REPRESENTED:
Event
Weather
Cause Aspectual
Aspectual
State
Exist
Relational state
Identificational state
Constitutive state
Stative location
Stative possession
Act
Non-relational act
Relational act
Purpose act
Move
Caused Motion
Speech act
Reporting event
Commisives
Cognitive event
Judgment
Caused experience event
Perception
Change
Relational change
Change possession
Change Location
Natural transition
Change of State
Change of Value
Acquire knowledge
Cause Change
Creation
Physical creation
Mental creation
Symbolic creation
Copy creation
Cause relational change
Cause Change of State
Cause change of value
Cause change of location
Cause natural transition
EVENT TEMPLATES, NOT REPRESENTED
Disease
Stimuli
Cooperative Act
Cause Act
Cooperative Speech act
Directives
Expressives
Declaratives
Psychological event
Experience Event
Modal event
Constitutive change
Cause constitutive change
Give knowledge
PROPERTY TEMPLATES REPRESENTED
Modal
Temporal
Emotive
Manner
Emphasizer
Physical property
Psychological property
Social property
Temporal property
Relational property
Intensional
PROPERTY TEMPLATES NOT REPRESENTED
Object-related
Intensifying property
Extensional
1.5. Validation
In order to check the grammatical consistency of our encoded SGML templates we have adapted an SGML parser which validates our files against the document type definition (DTD).
Apart from the validation taken care of by the SGML parser, we have written a few Unix procedures which help check for other sources of mistakes. One procedure compares ‘id’ and ‘naming’ and produces a list of semantic units where the two do not match. Another writes a list of the target semu’s referred to via the semantic relations in the qualia structure and checks these against the already encoded entries. This list is essentially a list of dummy candidates (i.e. words that have not been fully encoded yet and should therefore be established as dummy semu’s), but the list is checked manually, and wrong references, misspellings, empty targets and other mistakes are sorted out. This is possible only because every ‘id’ includes an abbreviation of the ontological type to which it belongs (e.g. USEM_V_bevæge_sig_MOV_1). Only when a word has more than one sense within the same ontological type do the different senses receive consecutive reading numbers (e.g. USEM_N_kort_SEM_1 vs. USEM_N_kort_SEM_2).
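The two checks described above can be sketched as follows. This is a Python illustration under stated assumptions, not the Unix procedures actually used in the project: the id layout `USEM_<category>_<naming>_<type abbreviation>_<reading number>` is inferred from the examples in this document, and the function names and the miniature lexicon (including its deliberately misspelt naming) are hypothetical.

```python
import re

# Assumed id layout, inferred from the examples in this section:
# USEM_<category>_<naming>_<type abbreviation>_<reading number>
ID_PATTERN = re.compile(r'^USEM_([A-Z]+)_(.+)_([A-Z]{3})_(\d+)$')

def naming_mismatches(semus):
    """Return ids whose embedded naming differs from the 'naming' attribute."""
    bad = []
    for semu in semus:
        match = ID_PATTERN.match(semu["id"])
        # Ids that do not follow the assumed layout are flagged as well.
        if match is None or match.group(2) != semu["naming"]:
            bad.append(semu["id"])
    return bad

def dummy_candidates(semus):
    """Return relation targets that are not (yet) encoded as semantic units."""
    encoded = {semu["id"] for semu in semus}
    targets = {t for semu in semus for t in semu["targets"]}
    return sorted(targets - encoded)

# Hypothetical miniature lexicon; the second entry has a misspelt naming.
lexicon = [
    {"id": "USEM_N_trappe_ART_1", "naming": "trappe",
     "targets": ["USEM_N_genstand_ENT_1", "USEM_N_trin_ART_1"]},
    {"id": "USEM_N_trin_ART_1", "naming": "trinn",
     "targets": ["USEM_N_genstand_ENT_1"]},
]

print(naming_mismatches(lexicon))
print(dummy_candidates(lexicon))
```

As in the procedure described above, the second list is only a list of candidates: a target may be missing because it still has to be established as a dummy, or because the reference itself is misspelt, so manual inspection remains necessary.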
As regards purely linguistic consistency checking, a great deal of work still remains. Although the lexical guidelines (Lenci et al. 2000) have ensured a large degree of consistency between the different parts of the lexicon by providing templates for each ontological type, many cases of inconsistency can still be found. A browser helps us ensure that the use of relations is appropriate; for instance, hyponyms and hyperonyms are checked against the lexicon material in order to discover whether the members of a homogeneous semantic class refer to the same hyperonym, and whether the hyperonyms of a given hyponym really are hyperonyms at the same level of analysis.
1.6. Remaining work
Within the scope of the SIMPLE project, 300 nouns need to be encoded, linked to syntax and parsed. Further linguistic validation of the whole lexicon material is also foreseen in the last phase of the project.
2. Semantic encoding
2.1. Criteria for Syntax-Semantic linking
Non-predicative nouns are linked simply by relating a syntactic unit to the semantic unit(s) to which it corresponds. In the case of adresse, two links are established from one syntactic unit: one to a ‘representation’ interpretation, as in brevet skal være forsynet med navn og adresse på bagsiden (the letter should be supplied with name and address on the back), and one to a ‘location’ interpretation, as in folk afstår fra at flytte ind på visse adresser (people refrain from moving in at certain addresses):
<SynU
id="Usyn10003"
naming="adresse"
attestation="ns"
description="Dn0">
<CorrespSynUSemU
targetsemu="USEM_N_adresse_REP_1">
<CorrespSynUSemU
targetsemu="USEM_N_adresse_LOC_1"></SynU>
For events, a linking procedure between syntactic complements and semantic arguments has also been established. Here we have followed the LINDA specifications (Underwood et al. 1996), which give a principled analysis of the argument structure of Danish verbs and nouns. For a further description of the argument structure applied in this lexicon we therefore refer to that manual.
In the syntactic unit below for ride (ride) we can see how the valency pattern Dv2P-paa in syntax is mapped onto the semantic frame arg12paa by means of the feature ‘correspondence’:
<SynU
id="Usyn4713"
naming="ride"
attestation="n"
description="Dv2P-paa"><correspSynUSemU
targetsemu="USEM_V_ride_MOV_1"
correspondence="arg12paa"></SynU>
The correspondence feature is further specified below where it can be seen how each complement (position) in syntax is linked to an argument in semantics; thus subject is linked to ARG1 and the valency bound prepositional phrase to ARG2:
<Correspondence
id="arg12paa"
naming="mapping for divalent verb with prepositional object"
corresargpos1="ARG1_P_CNPrsubj ARG2_P_CPP-paa">
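The mapping stated in the ‘correspondence’ object above can be read off mechanically. The Python sketch below shows one way to do so; treating `_P_` as the separator between the semantic argument and the syntactic position label is an assumption based on this single example, and the function name `parse_correspondence` is hypothetical.

```python
def parse_correspondence(corresargpos):
    """Split a corresargpos1 value into {semantic argument: syntactic position}."""
    mapping = {}
    for item in corresargpos.split():
        # Assumed layout: <ARGn>_P_<position label>, e.g. ARG2_P_CPP-paa
        argument, _, position = item.partition("_P_")
        mapping[argument] = position
    return mapping

mapping = parse_correspondence("ARG1_P_CNPrsubj ARG2_P_CPP-paa")
for argument, position in mapping.items():
    print(argument, "<-", position)
```

For ride this pairs ARG1 with the subject position and ARG2 with the prepositional phrase headed by paa, as described in the text.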
In some cases, more than one description is given in the syntactic unit, and it is then sometimes necessary to specify which description links to which semantic unit. Below is given the case of bevæge (Dv4NPa0Pa0-fra-til) (‘move’, causative) and bevæge sig (Dv4refNPa0Pa0-fra-til) (‘move’, reflexive, decausative). The two descriptions link to the semantic templates CAUSED MOTION and MOVE, respectively:
<SynU
id="Usyn3515"
naming="bevæge"
attestation="cn"
description="Dv4NPa0Pa0-fra-til"
descriptionl="Dv3refNPa0Pa0-fra-til">
<correspSynUSemU
targetsemu="USEM_V_bevæge_sig_MOV_1"
correspondence="arg1_ADJ_ADJfratil"
description="Dv3refNPa0Pa0-fra-til">
<correspSynUSemU
targetsemu="USEM_V_bevæge_CAM_1"
correspondence="arg12_ADJ_ADJfratil"
descriptionl="Dv4NPa0Pa0-fra-til">
</SynU>
2.2. Criteria for assigning Domain Features
Most of the vocabulary for this deliverable belongs to the domain General. Specific readings belonging to particular domains have been assigned an appropriate domain from the domain list. With respect to domain assignment we have to a large degree followed the encodings made in Nudansk Ordbog. See Section 3 for the statistics for Domain.
2.3. Criteria for assigning Semantic Class and Template Type
Semantic Class and Template Types have been assigned according to the guidelines given by the Specification Group. In most cases, the templates are so well defined in the guidelines that it has been more or less unproblematic to assign templates to the words. In some cases, however, the features proposed in the templates have been too specific to account for all the words that would naturally fit into the template. This is in particular the case for events. To give an example, the template CHANGE_LOCATION has the event type ‘transition’ as a type-defining feature. However, in the Danish lexicon we have encountered several ‘change of location’ verbs which denote processes rather than transitions, such as falde (fall) and dale (descend), where the result phase is not expressed implicitly. One could argue that such verbs should therefore rather be encoded under the template MOVE. But the ‘change of location’ feature seems to be so essential for these two verbs that it does not seem appropriate to encode them as ‘manner of motion’ verbs either.
Also for the group of abstract nouns we have sometimes found it difficult to assign templates to the words. Too many words did not seem to fit into the seven more specific abstract template types and therefore simply had to be assigned the mother node "abstract entity". In this template group we therefore find very different words like alibi (alibi), fødekæde (food chain) and harmoni (harmony), which do not share much meaning content. We also found it somewhat difficult to distinguish between the groups "Moral Standards" and "Cognitive Fact", for instance in the case of the word holdning (attitude), which on the one hand just means a way of thinking about something, but on the other hand could be considered a question of morals. In the template group "Cognitive Fact" we have encoded words of ‘thinking’ such as tanke (thought) and viden (knowledge), but also words of ‘feeling’ such as jalousi (jealousy) and henrykkelse (delight), though one could discuss whether these words of ‘feeling’ are events rather than cognitive facts.
2.3.1. Language specific typing
In accordance with the guidelines, argument structure and selectional restrictions are encoded as language specific typing. For the basis of selectional restrictions we have applied the template ontology by stating restrictions as follows:
<InformArg
id="ArgHuman"
comment="human"
status="CHECK"
weightvalsemfeaturel="WVSFTemplateHumanPROT">
<InformArg
id="ArgAnimal"
comment="animal"
status="DEFAULTCHECK"
weightvalsemfeaturel="WVSFTemplateAnimalPROT">
<InformArg
id="ArgHumanAnimal"
comment="animal or human"
status="CHECK"
weightvalsemfeaturel="WVSFTemplateHumanPROT WVSFTemplateAnimalPROT">
<InformArg
id="ArgHumanVehicle"
comment="human or vehicle"
status="CHECK"
weightvalsemfeaturel="WVSFTemplateHumanPROT WVSFTemplateVehiclePROT">
We have found it rather inconvenient that optionality is encoded here as a value of the feature ‘status’. Since optionality is already stated in syntax, we see no reason for encoding it again here (although we have followed the guidelines in this respect).
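The InformArg objects above state each selectional restriction as one or more template features; judging from their comments (‘animal or human’, ‘human or vehicle’), several features are read as alternatives. The Python sketch below illustrates how such a restriction could be checked against the template of an argument filler. It is a simplification under stated assumptions: the reading of a feature name as `WVSFTemplate` + template name + `PROT` is inferred from the listing, the function names are hypothetical, and the ontology hierarchy (for example a subtype of Human satisfying a Human restriction) is ignored.

```python
# InformArg ids and their weightvalsemfeaturel values, copied from the listing above.
RESTRICTIONS = {
    "ArgHuman": "WVSFTemplateHumanPROT",
    "ArgAnimal": "WVSFTemplateAnimalPROT",
    "ArgHumanAnimal": "WVSFTemplateHumanPROT WVSFTemplateAnimalPROT",
    "ArgHumanVehicle": "WVSFTemplateHumanPROT WVSFTemplateVehiclePROT",
}

PREFIX, SUFFIX = "WVSFTemplate", "PROT"

def allowed_templates(inform_arg_id):
    """Return the set of template names an argument may belong to."""
    features = RESTRICTIONS[inform_arg_id].split()
    # Assumption: a feature name is PREFIX + template name + SUFFIX.
    return {feature[len(PREFIX):-len(SUFFIX)] for feature in features}

def satisfies(inform_arg_id, filler_template):
    """True if the filler's template is one of the allowed alternatives."""
    return filler_template in allowed_templates(inform_arg_id)

print(sorted(allowed_templates("ArgHumanVehicle")))
print(satisfies("ArgHumanVehicle", "Vehicle"))
print(satisfies("ArgHuman", "Animal"))
```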
The very large number of semantic units represented under the template ARTIFACT (457 semu’s) indicates that this category may require further splitting. We have felt the need for an additional subtemplate denoting electronic or mechanical devices.
The interesting thing about electronic and mechanical devices is that they exhibit a different distribution from other artifacts, in the sense that they can ‘work by themselves’ and can thus often fill selectional slots in much the same way as human beings. This holds in particular for computers; consider for example the following corpus excerpt:
Så spørger computeren om cyklisten holder rigtigt og børnene skal så ved hjælp af musen klikke på enten ‘ja’ eller ‘nej’
(then the computer asks whether the biker is in the right place and the kids are then to click on either ‘yes’ or ‘no’ with the mouse)
2.3.3. Criteria for encoding Semantic Relations
We have focused on linguistically relevant semantic relations. All type-defining, obligatory semantic relations have been encoded. Apart from this some essential relations have been encoded in cases where we believed them to have strong linguistic relevance. In most cases, we have followed the definition given in Nudansk Ordbog. This means that when a feature has been represented as part of the definition for a given word, we have included this feature as a semantic relation in the formal part of the semantic unit.
Consider the relation ‘has_as_parts’. In many cases this is a semantic relation which describes what we would call a ‘world-knowledge’ aspect of a word. For instance, we would not encode a ‘has_as_parts’ relation on the noun hus (house), since we believe that it is not linguistically crucial for this word that a house contains walls, roof, floors, windows, etc. This hypothesis is supported by the definition in Nudansk Ordbog for the word hus: en bygning som udgør en selvstændig enhed, og som anvendes til beboelse (a building which constitutes an independent unit and which is used for habitation). In contrast, for the noun trappe (staircase) the definition does imply a ‘has_as_parts’ relation: et antal sammenhængende trin som man kan gå op el. ned ad (a number of connected steps which one can walk up or down); thus this word is encoded with the relation trappe ‘has_as_parts’ trin:
<SemU
id="USEM_N_trappe_ART_1"
naming="trappe"
example=" Ruten i Leeds er uhyggelig hård - indeholder således en lang trappe, der skal forceres med cyklen på ryggen"
comment="full BSP"
freedefinition=" et antal sammenhængende trin som man kan gå op el. ned ad (NDO)"
weightvalsemfeaturel="
WVSFTemplateArtifactPROT
WVSFUnificationPathConcreteentity-Agentive-TelicPROT
TSVP_ARTIFACT_TS_classificateur_de_nom_C">
<RWeightValSemU
weight="PROTOTYPICAL"
comment="Type-defining semantic relation"
target="USEM_N_genstand_ENT_1"
semr="SRIsa">
<RWeightValSemU
weight="PROTOTYPICAL"
comment="Type-defining semantic relation"
target="USEM_V_fremstille_1"
semr="SRCreatedby">
<RWeightValSemU
weight="PROTOTYPICAL"
comment="Type-defining semantic relation, gå op og ned"
target="USEM_V_gå_1"
semr="SRUsedfor">
<RWeightValSemU
weight="ESSENTIAL"
comment="Semantic relation"
target="USEM_N_trin_ART_1"
semr="SRHasaspart">
</SemU>
A similar situation can be found with many compounds in Danish. Here an essential (non-type-defining) feature can often be used to express exactly the relation that holds between the two parts of the compound; consider for instance the examples below of two kinds of containers in Danish, vinflaske (wine bottle), which ‘contains vin’ (wine), and blikdåse (tin can), which is ‘made of blik’ (tin):
<SemU
id="USEM_N_vinflaske_CON_1"
naming="vinflaske"
example="en vinflaske kan genbruges syv til otte gange"
comment="full BKK"
freedefinition="flaske til vin"
weightvalsemfeaturel="
WVSFTemplateContainerPROT
WVSFUnificationPathConcreteentity-ArtifactAgentive-TelicPROT
TSVP_NOTION_TS_classificateur_de_nom_C">
<RWeightValSemU
weight="PROTOTYPICAL"
comment="Type-defining semantic relation"
target="USEM_N_flaske_CON_1"
semr="SRIsa">
<RWeightValSemU
weight="PROTOTYPICAL"
comment="Type-defining semantic relation"
target="USEM_V_fremstille_1"
semr="SRCreatedby">
<RWeightValSemU
weight="PROTOTYPICAL"
comment="Type-defining semantic relation"
target="USEM_V_indeholde_1"
semr="SRUsedfor">
<RWeightValSemU
weight="ESSENTIAL"
comment="Semantic relation"
target="USEM_N_vin_ARD_1"
semr="SRContains">
</SemU>
<SemU
id="USEM_N_blikdåse_CON_1"
naming="blikdåse"
example="en urtepotteunderskål, hvori man omvendt har sat en tom blikdåse, som fyldes med vand"
comment="full BKK"
freedefinition="dåse lavet af blik"
weightvalsemfeaturel="
WVSFTemplateContainerPROT
WVSFUnificationPathConcreteentity-ArtifactAgentive-TelicPROT
TSVP_NOTION_TS_classificateur_de_nom_C">
<RWeightValSemU
weight="PROTOTYPICAL"
comment="Type-defining semantic relation"
target="USEM_N_dåse_CON_1"
semr="SRIsa">
<RWeightValSemU
weight="PROTOTYPICAL"
comment="Type-defining semantic relation"
target="USEM_V_fremstille_1"
semr="SRCreatedby">
<RWeightValSemU
weight="PROTOTYPICAL"
comment="Type-defining semantic relation"
target="USEM_V_indeholde_1"
semr="SRUsedfor">
<RWeightValSemU
weight="ESSENTIAL"
comment="Semantic relation"
target="USEM_N_blik_ARS_1"
semr="SRMadeof">
</SemU>
In general, we have applied a template-driven approach in the sense that each encoder has been responsible for a specific set of templates, in order to ensure as large a degree of consistency among encoders as possible as regards the semantic relations to be applied within a template type. For instance, we have striven towards a homogeneous level of specificity as well as a consensus on which of the more general Targetsemu’s are to be applied for each relation.
2.3.4. Criteria for encoding Derivation Relations.
Derivation relations are not encoded in the Danish lexicon.
Synonymy
We have chosen to give information on synonyms in the cases where a synonym is mentioned in the Danish dictionary we use to retrieve our definitions, as long as the synonym is represented in the PAROLE dictionary.
An example is the pair of words knække and brække (both meaning "cause to break"), shown below, which are encoded in the template group "cause change of state":
1) <SemU
id="USEM_V_brække_CCS_1"
naming="brække"
example="Jeg var målløs. Han sparkede på bilen, knuste lygterne og brækkede antennen"
comment="full BC 200203548 SN"
freedefinition="få noget til at brække(NDO)"
weightvalsemfeaturel="
WVSFTemplateCauseChangeofStatePROT
WVSFTemplateSuperTypeCauseRelationalChangePROT
WVSFEventTypeTransitionPROT
TSVP_CHANGE_TS_classificateur_de_verbe_C">
<PredicativeRepresentation
typeoflink="MASTER"
predicate="PRED_brække_CCS_1">
<RWeightValSemU
semr="SRAgentiveCause"
target="USEM_V_ændre_CCS_1"
weight="PROTOTYPICAL">
<RWeightValSemU
semr="SRResultingState"
target="USEM_ADJ_itu_QUA_1"
weight="PROTOTYPICAL">
<RWeightValSemU
weight="ESSENTIAL"
comment="Synonym relation"
target="USEM_V_knække_CCS_1"
semr="SRSynonym">
</SemU>
2)
<SemU
id="USEM_V_knække_CCS_1"
naming="knække"
example="hvis man knækker skaftet udleveres en ny spade"
comment="full BC 200203548 SN"
freedefinition="få noget til at knække (NDO)"
weightvalsemfeaturel="
WVSFTemplateCauseChangeofStatePROT
WVSFTemplateSuperTypeCauseRelationalChangePROT
WVSFEventTypeTransitionPROT
TSVP_CHANGE_TS_classificateur_de_verbe_C">
<PredicateRepresentation
typeoflink="MASTER"
predicate="PRED_knække_CCS_1">
<RWeightValSemU
semr="SRAgentiveCause"
target="USEM_V_ændre_CCS_1"
weight="PROTOTYPICAL">
<RWeightValSemU
semr="SRResultingState"
target="USEM_ADJ_itu_QUA_1"
weight="PROTOTYPICAL">
<RWeightValSemU
weight="ESSENTIAL"
comment="Synonym relation"
target="USEM_V_brække_CCS_1"
semr="SRSynonym">
</SemU>
We expect that links between synonyms in the dictionary could be very useful for many purposes, for instance in applications for information retrieval. Encoding synonyms together also helps to speed up the encoding process, since the entries of two, or sometimes even three, synonymous words can easily be made at the same time.
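Since the synonym link is encoded in both entries (brække points to knække and vice versa), reciprocity can be checked mechanically. The following Python sketch is illustrative only: the dictionary layout and the function name `missing_reciprocal_links` are assumptions made here and are not part of the SIMPLE tools; only the two SemU identifiers are taken from the entries above.

```python
# Hypothetical in-memory view of SRSynonym links: source SemU id -> set of target ids.
# Only the two identifiers come from the lexicon examples; the layout is assumed.
synonym_links = {
    "USEM_V_brække_CCS_1": {"USEM_V_knække_CCS_1"},
    "USEM_V_knække_CCS_1": {"USEM_V_brække_CCS_1"},
}


def missing_reciprocal_links(links):
    """Return sorted (source, target) pairs whose reverse synonym link is absent."""
    missing = []
    for source, targets in links.items():
        for target in targets:
            if source not in links.get(target, set()):
                missing.append((source, target))
    return sorted(missing)


print(missing_reciprocal_links(synonym_links))  # prints []
```

A one-directional link, e.g. `{"a": {"b"}}`, would be reported as `[("a", "b")]`.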
Polysemy
Regular polysemy - when groups of related words display the same ambiguity - is handled in a uniform way in the SIMPLE model via the identification of a set of well-established regular semantic classes for nouns, which are adjusted for each of the languages involved. While unsystematic ambiguous readings of a word are represented as totally unrelated semantic units, regular polysemous senses can be encoded as interlinked semantic units. This is represented by the information slot complex, whose value is the polysemous class to which the semantic unit belongs as seen below for Dragør (Dragør - Danish village) in the semantic unit for the human group sense of the word:
<SemU
id="USEM_N_Dragør_HUG_1"
naming="Dragør"
example=" Dragør må i år af med godt 31 mill. kr. til den kommunale udligning"
/This year Dragør must pay approx. 31 mill. crowns to the community equalization /
comment="full BSP"
freedefinition="de mennesker der bor i Dragør eller som træffer belutningerne der"
weightvalsemfeaturel="
WVSFTemplateHumanGroupPROT
WVSFTemplateSuperTypeGroupPROT
TSVP_GROUP_NAMES_TS_classificateur_de_nom_C">
<RWeightValSemU
weight="PROTOTYPICAL"
comment="Type-defining semantic relation"
target="USEM_N_befolkning_HUG_1"
semr="SRIsa">
<RWeightValSemU
weight="PROTOTYPICAL"
comment="Type-defining semantic relation"
target="USEM_N_indbygger_HUM_1"
semr="SRHasasmember">
<RWeightValSemU
weight="PROTOTYPICAL"
target="USEM_N_Dragør_GEO_1"
semr="SRPolysemyHumanGroup-GeopoliticalLocation">
</SemU>
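In the entry above, the name of the polysemy relation appears to consist of the prefix SRPolysemy followed by the class of the current unit and the class of the linked unit, separated by a hyphen. Assuming that this naming pattern holds generally (it is inferred from this one example, not taken from the specifications), the two classes can be recovered as in the following Python sketch; the function name `split_polysemy_relation` is hypothetical.

```python
PREFIX = "SRPolysemy"


def split_polysemy_relation(semr):
    """Split e.g. 'SRPolysemyHumanGroup-GeopoliticalLocation' into its two class names.

    Assumes the pattern SRPolysemy<SourceClass>-<TargetClass>; returns None otherwise.
    """
    if not semr.startswith(PREFIX) or "-" not in semr:
        return None
    source_class, target_class = semr[len(PREFIX):].split("-", 1)
    return source_class, target_class


print(split_polysemy_relation("SRPolysemyHumanGroup-GeopoliticalLocation"))
# prints ('HumanGroup', 'GeopoliticalLocation')
```

Relations that do not follow the assumed pattern, such as SRIsa, yield None.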
In the Danish lexicon the most productive cases of regular polysemy involving concrete nouns are the following:
Other well-known polysemous pairs are not productive in Danish, as for example 'people / language' and 'flower / colour', where only a few examples of each can be found. This difference relates to the distinction made by Apresjan (apud Malmgren, 1988) between productive and regular polysemy. Here productive polysemy refers to cases where more or less the whole group of nouns within a semantic class display the same polysemy relations, whereas regular polysemy refers to cases where at least two words - but not the whole class - follow the same polysemy pattern.
A more extensive, empirically based study of regular polysemous classes of Danish nouns has not yet been carried out. However, the corpus-oriented approach used during the encoding of the Danish SIMPLE lexicon facilitates the identification of new polysemous classes, since differences in the distributional patterns of the encoded word senses are a good indication of whether a regular polysemy relation could be involved. It should be noted, however, that the common polysemy classes established in the project are not totally unproblematic in this respect. One would expect the established classes to exhibit different distributional patterns in the corpus; however, this is not always the case. A well-established test for examining such patterns is the so-called zeugma test: two different senses of a word are expected to create a zeugma (i.e. nonsense) if they are put together in the same phrase, as is the case for the regular polysemy class that holds between geopolitical location and human group:
*Danmark, som er et fladt og grønt land, nedlagde veto mod forslaget i Europakommissionen
(Denmark, which is a flat and green country, vetoed the proposal in the European Commission)
Nevertheless, for the semiotic artifact/information polysemy relation this is not the case as seen in the example below which clearly combines the two senses in one construction:
menukortet, der var dekoreret med en kopi af Arne Haugen Sørensens maleri ‘Skovkentaur med dame’, var varieret og ganske indbydende
(the menu, which was decorated with a copy of Arne Haugen Sørensen's painting ‘Forest centaur with lady’, was varied and rather appetising)
This example leads to the discussion of the constraints that should be satisfied in order to establish two semantic units. If the senses are not distinguished in the corpus by different distributions, what then are the criteria for defining two senses? In the particular case of semiotic artifact/information, we are tempted to believe that the phenomenon should be categorised as a case of semantic vagueness rather than as a case of polysemy, since in a given context we can refer to either meaning aspect or to both at the same time.
We have not encoded regular polysemy relations on verbs. It is characteristic of Danish that it has far fewer cases of regular polysemy for verbs than e.g. English, and we found that it would require a more detailed investigation to decide which of the many classes described in the guidelines would be relevant in the Danish lexicon. However, this work is foreseen in the Danish follow-up lexicon project.
2.5. Representation of Predicative information.
Words which take arguments all include a predicate object, in which the argument structure is described. In the Danish lexicon we have chosen to name the arguments in accordance with the LINDA specifications (Underwood et al. 1996), where a principled analysis is given of the argument structure of Danish verbs and nouns. The grammatical subject is in most cases assigned ARG1, the grammatical and prepositional object ARG2, and weakly bound prepositional complements are assigned the function ADJUNCT. ARG0 is reserved for semantically empty subjects in the LINDA specifications, as in constructions like det regner ("it is raining"); this kind of argument is not described in a predicate, but is only handled in the syntactic description.
As regards selectional restrictions, we apply ontological types only. When we want to express that an argument can refer to human groups only, we simply refer to the ontological type ‘human group’ via the so-called InformArg objects:
<InformArg
id="ArgHumanGroup"
comment="human"
status="CHECK"
weightvalsemfeaturel="WVSFTemplateHumanGroupPROT">
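How such a restriction might be checked against a candidate argument filler can be sketched as follows. This Python fragment is an illustration under assumptions: the function `satisfies` and the representation of a semantic unit as a set of feature names are hypothetical; the feature names and SemU identifiers are copied from the InformArg object above and from the Dragør and knække entries in Section 2.3.4.

```python
# Feature required by the InformArg object "ArgHumanGroup" (from the example above).
ARG_HUMAN_GROUP = "WVSFTemplateHumanGroupPROT"

# Hypothetical view of semantic units as sets of weighted feature names.
semu_features = {
    "USEM_N_Dragør_HUG_1": {
        "WVSFTemplateHumanGroupPROT",
        "WVSFTemplateSuperTypeGroupPROT",
    },
    "USEM_V_knække_CCS_1": {
        "WVSFTemplateCauseChangeofStatePROT",
        "WVSFEventTypeTransitionPROT",
    },
}


def satisfies(semu_id, required_feature):
    """True if the semantic unit carries the feature required by the argument."""
    return required_feature in semu_features.get(semu_id, set())


print(satisfies("USEM_N_Dragør_HUG_1", ARG_HUMAN_GROUP))  # prints True
print(satisfies("USEM_V_knække_CCS_1", ARG_HUMAN_GROUP))  # prints False
```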
The semantic roles are assigned to each argument according to the list in the guidelines on events.
We have only felt the need to introduce one additional role, "NonProtoAgent", for subjects of the type flaget vajer (the flag waves).
Phrasal verbs
Phrasal verbs have caused several problems during the encoding phase. Phrasal verbs are very frequent in Danish and therefore it is important to strive towards a principled treatment of these.
In traditional Danish lexicography, we distinguish between two kinds of phenomena, namely phrasal constructions and phrasal verbs. The basic criterion for this distinction is transparency: if either the verb or the particle is not transparent in meaning, i.e. diverges from its original or prototypical meaning, then we prefer a lexicalised interpretation, which in traditional lexicography means that we would establish a sublemma to the verb in question. This is the case for vaske op (lit. ‘wash up’, meaning ‘do the dishes’), where vaske more or less preserves its original meaning whereas op (‘up’) clearly does not. In contrast, if the meaning is more or less predictable on the basis of the original meaning of the two words, then we prefer a valency interpretation of the particle, as in for instance grave noget op/ned (‘dig something up/down’) (see Braasch & Pedersen in press).
In the Danish Parole syntax such a distinction has not been established mainly due to the fact that the syntax does not really allow for such a distinction: irrespective of the internal nature of the particle construction, the particle is always expressed in the so-called ‘self’. This gives an overall splitting strategy as follows:
[Figure: current splitting strategy across the levels MORPHOLOGY, SYNTAX and SEMANTICS]
We interpret this as a kind of lexicalisation, with the consequence that all phrasal verbs and phrasal constructions in Danish are treated as lexicalisations. This lack of distinction causes problems when dealing with semantics. As it is now, we have been forced to encode different semantic units for what is basically the same meaning of a word, since the particles in such cases are not assigned a valency function but are rather considered part of the lemma.
Consider the example below for the verb løbe. Two syntactic units have been established; the first one describes a construction like han løb (fra Roskilde) (til København) (he ran from Roskilde to Copenhagen); the second a construction like han løb ud (he ran out). Semantically, we would prefer to treat these as one semantic unit with a directional adjunct which can be expressed either as a PP or as a directional particle. However, as it is now we are forced to encode, apart from the ‘basic’ sense of løbe, a phrasal verb construction løbe ud/ind/op/ned (run out/in/up/down) which is fully predictable in meaning and which furthermore is considered to take only one argument (since the directional particle is considered to be a lexicalised part of the lexeme løbe).
<SynU
id="Usyn2016"
naming="løbe"
attestation="cn"
description="Dv3Pa0Pa0v-fra-til"><correspSynUSemU
targetsemu="USEM_V_løbe_MOV_1"
correspondence="arg1_ADJ_ADJ" ></SynU>
<SynU
id="Usyn5152"
naming="løbe"
attestation="cn"
description="Dv1xdv-dir"><correspSynUSemU
targetsemu="USEM_V_løbe_ud_ind_op_ned_MOV_1"
correspondence="arg1" ></SynU>
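The consequence of the current encoding can be made explicit with a small Python sketch. The record layout is an assumption made for illustration; the identifiers, descriptions and correspondence values are copied from the two SynU entries above, and splitting the correspondence value on underscores is likewise only an assumption about its format.

```python
# Each tuple: (SynU id, description, target SemU, correspondence), as in the entries above.
syntactic_units = [
    ("Usyn2016", "Dv3Pa0Pa0v-fra-til", "USEM_V_løbe_MOV_1", "arg1_ADJ_ADJ"),
    ("Usyn5152", "Dv1xdv-dir", "USEM_V_løbe_ud_ind_op_ned_MOV_1", "arg1"),
]

for synu_id, description, semu_id, correspondence in syntactic_units:
    # Assumed format: one underscore-separated slot per linked argument or adjunct.
    slots = correspondence.split("_")
    print(synu_id, "->", semu_id, slots)

# The same motion sense ends up in two distinct semantic units.
distinct_semus = {semu_id for _, _, semu_id, _ in syntactic_units}
print(len(distinct_semus))  # prints 2
```

Under this reading, the first unit links one argument and two adjuncts, while the particle construction links a single argument only.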
We would have preferred a valency interpretation of all particles at the syntactic level, leaving it to the semantics to decide whether the meaning is predictable or not. This would also fit nicely into the ‘split late’ strategy adopted in the project and would leave the semantic distinction where it belongs: in semantics. Consider the figure below where such an approach is adopted for grave and vaske respectively:
[Figure: MORPHOLOGY, SYNTAX and SEMANTICS levels for grave and vaske; surviving label: vaske op (do the dishes)]
Such a strategy would also be convenient for the really complex cases (which again are rather frequent in Danish) where both a predictable and a non-predictable meaning are found, as for gå op, which can mean either ‘go up’ or ‘cancel out’:
[Figure: MORPHOLOGY, SYNTAX and SEMANTICS levels for gå; surviving label: gå op (cancel out)]
Here the predictable sense (go up) is treated as one semantic unit together with the normal gå sense with an optional directional adjunct, whereas the ‘cancel out’ has its own semu belonging to a different node in the event ontology.
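The preferred organisation can be sketched in Python as follows. This is only an illustration of the idea: the function `semantic_units_for`, the tables `BASIC` and `LEXICALISED`, the set of directional particles and all semantic unit identifiers for gå are hypothetical and do not occur in the lexicon.

```python
# Particles treated as directional adjuncts with predictable meaning (assumed set).
DIRECTIONAL_PARTICLES = {"ud", "ind", "op", "ned"}

# Both identifiers below are invented for this sketch.
BASIC = {"gå": "USEM_V_gå_basic"}
# Non-predictable verb + particle readings get their own semantic unit.
LEXICALISED = {("gå", "op"): ["USEM_V_gå_op_cancel_out"]}


def semantic_units_for(verb, particle=None):
    """Return candidate semantic units; the split is made in semantics only."""
    candidates = []
    if particle is None or particle in DIRECTIONAL_PARTICLES:
        # Predictable reading: basic sense, particle fills the directional adjunct.
        candidates.append(BASIC[verb])
    candidates.extend(LEXICALISED.get((verb, particle), []))
    return candidates


print(semantic_units_for("gå"))         # prints ['USEM_V_gå_basic']
print(semantic_units_for("gå", "ned"))  # prints ['USEM_V_gå_basic']
print(semantic_units_for("gå", "op"))   # prints ['USEM_V_gå_basic', 'USEM_V_gå_op_cancel_out']
```

In this arrangement the syntax needs only one description of the verb with an optional particle, and ambiguity arises only where a non-predictable reading exists.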
In the longer term, we will consider such a reorganisation of our lexicon; however, within the scope of SIMPLE, we are not capable of performing such a large change to the PAROLE lexicon.
Figurative senses
When using a corpus to find the distribution of the different meanings of the words being encoded, we have noticed that in many cases the concrete meaning of a word is rarely represented in the text, whereas a figurative sense of the word occurs with high frequency. We have not systematically encoded these figurative word senses, which we find somewhat problematic, since the figurative sense seems to be the most frequent use of many words in written language. As an example we could mention the word stormvejr (stormy weather), which is only encoded as a weather phenomenon, but which in the corpus is mostly used in the sense of a hectic situation. Often these kinds of meanings are not described in the dictionary we use as our resource, and since they are very abstract, they are quite difficult to place in the semantic hierarchy, at least in the case of abstract nouns. As regards verbs, though, the event ontology seems to cover the figurative senses very well. We have only been missing one ontological type, namely one to cover the metaphoric event senses ‘to move in time’ or ‘time passing’, which we have found to be quite common figurative senses of motion verbs in our corpus. One example is the verb passere (pass), which is encoded with the concrete sense ‘Change of location’, but which also has a figurative sense ‘to move in time’:
vi skal passere år 2000 , før alle danske biler kører med katalysator
(we will have to pass the year 2000 before all Danish cars run with a catalytic converter)
In a future extension of the Danish SIMPLE lexicon, we do feel a need for developing the treatment of figurative senses of words in order to be able to cover written text in a better way.
Domains applied in the encodings:
agriculture
air_transport
arts
astronomy
baby_care
biochemistry
botany
bus_transport
business
car_transport
chemistry
civil_law
commerce
computing
diplomacy
drink
economics
education
entomology
ethnology
fashion
film
finance
fishing
food
freshwater_fishing
furnishing
geography
geology
geometry
gymnastics
health
history
home_and_garden
hotel_business
inland_waterway_transport
law
librarianship
life sciences
linguistics
livestock_farming
logic
mathematics
mechanical_engineering
media
medicine
military
mineralogy
music
ornithology
physical sciences
physics
physiology
poetics
politics
politics and government
psychology
publishing
rail_transport
religion
restauration
road_transport
sailing_yachting_and_boating
sciences
sea_transport
ship_building
sociology
sports and leisure
subway_transport
taxation
transport
trucking
zoology
Semantic Classes applied in the encodings:
ABSTRACT
AGENCY
AMPHIBIAN
ANIMAL
ARTIFACT
ATTRIBUTE
BIO
BIRD
BODY
BODY_PART
BUILDING
CHANGE
COGNITION
COGNITIVE_FACT
COLOR
COMMUNICATION
COMPETITION
CONCRETE
CONSUMPTION
CONTACT
COULEUR
CREATION
CURRENCY
DAY
EMOTION
ETHNOS
FISH
FLOWER
FORM
FRUIT
FURNITURE
GARMENT
GEOG
GEOGRAPHY
GROUP_NAMES
HUMAN
IDEO
INANIMATE
INSECT
INSTRUMENT
LETTER
LIVING_BEING
LOCATION
MAMMAL
MATTER
MEASURE_UNIT
MICROORGANISM
MOLLUSC
MONTH
MOTION
MUSHROOM
NOTION
OBJECT
OCCUPATION
OCCUPATION_AGENT
PART
PERCEPTION
PERIOD
PERIODE
PLANT
POSSESSION
PSYCHOLOGICAL_FEATURE
REPTILE
SHRUB
SOCIAL
STATIVE
SUBSTANCE
TIME_PERIOD
TREE
VEHICLE
WEATHER
List of Polysemy Relations applied:
Agentofpersistentactivity-Profession
Animal-Food
Animal-Material
Area-Humangroup
Area-Institution
Building-HumanGroup
Building-Institution
Container-Amount
Convention-Semioticartifact
Flavouring-Plant
Flower-Colour
Flower-Plant
Food-Animal
Fruit-Plant
GeopoliticalLocation-HumanGroup
HumanGroup-Building
HumanGroup-GeopoliticalLocation
HumanGroup-Institution
Information-Semioticartifact
Institution-Building
Institution-HumanGroup
Language-People
Location-HumanGroup
Material-Animal
Material-Plant
Opening-Artifact
People-Language
Plant-Flavouring
Plant-Flower
Plant-Fruit
Plant-Material
Plant-Substance
Plant-Substancefood
Semioticartifact-Container
Semioticartifact-Information
Substance-Colour
Substance-Plant
List of Semantic Relations applied in the encodings:
Agentive
AgentiveCause
Concerns
Constitutiveactivity
Contains
Createdby
Derivedfrom
Hasascolour
Hasasmember
Hasaspart
Indirecttelic
Instrument
Isa
Isafollowerof
Isamemberof
Isapartof
Isin
Istheabilityof
Istheactivityof
Isthehabitof
Livesin
Madeof
Measuredby
Objectoftheactivity
Producedby
Produces
Propertyof
Purpose
Quantifies
Relates
Relatedto
ResultingState
Resultof
Successor
Successorof
Synonym
Telic
Usedas
Usedby
Usedfor
Bibliography
Braasch, A., A. B. Christensen, S. Olsen & B.S. Pedersen (1998). 'A Large-Scale Lexicon for Danish in the Information Society', in: Proceedings from First International Conference on Language Resources & Evaluation, Granada 1998.
Braasch, A., B. Pedersen (1999). ‘En stor sprogteknologisk ordbog for dansk - med særligt fokus på håndtering af flertydighed i den niveaudelte ordbog’, in: P. Widell (ed.) 7. Møde om Udforskning af Dansk Sprog, Århus Universitet.
Bergenholtz, H., (1990). 'DK87-DK90: Dansk korpus med almensproglige tekster', in: M. Kunøe & Erik Larsen (eds.) 3. Møde om Udforskning af Dansk Sprog, Aarhus Universitet.
Boje, F. & L. Schøsler (eds.) (1992). ‘DISEM - A Semantic MT-Component’, in: CST Working Papers no. 1, Center for Sprogteknologi, Copenhagen.
Christ, O. (1993). The Xkwic User Manual. Institut für maschinelle Sprachverarbeitung, Universität Stuttgart.
Kjærulff Nielsen: Engelsk-Dansk Ordbog, Gyldendal, Copenhagen.
Malmgren, S. (1988). ‘On Regular Polysemy in Swedish’, in: Studies in Computer-Aided Lexicography, Almqvist & Wiksell, Stockholm.
Nimb, S. & B. Pedersen (2000). ‘Treating Metaphoric Senses in a Danish Computational Lexicon – different cases of regular polysemy’, in: EURALEX 2000, Stuttgart, Germany.
Pedersen, B. & S. Nimb (2000). ‘Semantic Encoding of Danish Verbs in SIMPLE – Adapting a verb-framed model to a satellite-framed Language’, in: Second International Conference on Language Resources and Evaluation, Athens, Greece.
Politikens Store Nye Nudansk Ordbog, Version 2.1, Politikens Forlag, Copenhagen.
Underwood, N., C. Povlsen, P. Paggio, A. Neville, B.S. Pedersen, L. Jørgensen, B. Ørsnes, A. Braasch (1996). LINDA, Linguistic Specifications for Danish, Technical Report, Center for Sprogteknologi, Copenhagen.