SIMPLE LE4-8346
WP01
SIMPLE - LEXICON DOCUMENTATION
* * *
Document first version date: 26/04/00
Document date: 28/04/00
DocumentID: WP1
Version: 02
Doc. type: QAP*
Document status: to be validated
Validation type:
Comments:

From: Leiden team LEI (organisation: INL; purpose: documentation)
To: Coordinators, Reviewer (purpose: Documentation deliverable D.03.3.2)
Lexicon Documentation DUTCH
0 Introduction
This documentation concerns the SIMPLE part (semantic layer) of the Dutch PAROLE lexicon. For extensive documentation of the Dutch PAROLE lexicon itself, we refer to the INL website: www.inl.nl.
Contents of this report:
1. General design information
1.1. Lexicon population
1.2. Current lexicon contents
1.3. Sample of 100 entries
1.4. Tools
1.5. Impact of SIMPLE/PAROLE
1.6. Remaining work
2. Semantic encoding
2.1. Criteria for Syntax-Semantic linking
2.2. Criteria for assigning Domain features
2.3. Criteria for assigning Semantic class and template type
2.4. Classes derived from encoding
2.5. Representation of Predicative information
2.6. Problems encountered
3. Statistics
4. Bibliography
5. List of Appendices
Appendices 1-4
1. General design information
1.1. Lexicon population
The Dutch SIMPLE lexicon as of 28 April 2000 contains 10,472 semantic units (Usems): 7,326 noun Usems, 2,114 verb Usems and 1,032 adjective Usems. The 10,472 Usems are distributed over 3,710 lemmata (headwords): 2,797 nouns, 559 verbs and 354 adjectives. Of these, 681 lemmata cover one or more Base Concepts (see below).
For each part of speech, the starting point for the list of lemmata to be provided with Usems was the set of English Base Concepts (BCs) selected by the Linguistic Specification Group from the EuroWordnet lexicon (see the general SIMPLE documentation). For each BC, a set of related Dutch equivalents (near-synonyms) was chosen with the concept, rather than the English word, in mind, to avoid mere translation. For a number of BCs, no appropriate Dutch equivalent could be found. The whole set of Dutch equivalents was checked for occurrence in the Dutch PAROLE lexicon, so that the semantic descriptions could be connected to the syntactic descriptions in the PAROLE lexicon. For nouns, occurrence in the Dutch EuroWordnet lexicon could also be checked (thanks to the cooperation of Piek Vossen), which resulted in a fine-tuned list of Dutch equivalents. For verbs and adjectives, this comparison was not possible for pragmatic reasons. For each BC, a 'prototypical' Dutch equivalent was selected. There were three reasons for selecting more than one prototypical equivalent per BC: (1) 'real' synonyms between which any choice would be too arbitrary; (2) no single lemma covering the BC could be found; (3) for nouns, the preferred prototypical lemma was not yet in the Dutch EuroWordnet lexicon at the time. In total, 681 SIMPLE lemmata cover one or more BCs: 371 nouns, 172 verbs and 138 adjectives. See appendix 1 for the noun, verb and adjective lemmata covering one or more BCs.
The lemma lists were extended on the basis of an automatically lemmatized type-frequency list derived from the Dutch PAROLE Reference corpus (the corpus itself is not yet lemmatized; cf. 1.5). This list was compared with the PAROLE lemma list, and the matching entries were ranked from high to low frequency. In the lower frequency range, priority was given to lemmata corresponding to BCs.
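The ranking step described above can be sketched in a few lines of Python. This is a hedged illustration only, not the procedure INL actually ran: the function `rank_candidates`, its inputs and in particular the `low_freq_threshold` marking where the 'lower frequencies' begin are hypothetical, since the report does not give that boundary.

```python
def rank_candidates(freq_list, parole_lemmas, bc_lemmas, low_freq_threshold):
    """Rank corpus lemmata as candidates for the SIMPLE lemma list (hypothetical sketch).

    freq_list: dict lemma -> corpus frequency (the lemmatized type-frequency list)
    parole_lemmas: set of lemmata in the PAROLE lexicon
    bc_lemmas: set of lemmata corresponding to Base Concepts
    low_freq_threshold: assumed frequency below which BC lemmata get priority
    """
    # Keep only lemmata that also occur in the PAROLE lemma list.
    matches = [(lemma, f) for lemma, f in freq_list.items() if lemma in parole_lemmas]
    high = [m for m in matches if m[1] >= low_freq_threshold]
    low = [m for m in matches if m[1] < low_freq_threshold]
    # High-frequency range: ranked purely from high to low frequency.
    high.sort(key=lambda m: -m[1])
    # Low-frequency range: BC lemmata first (False sorts before True), then by frequency.
    low.sort(key=lambda m: (m[0] not in bc_lemmas, -m[1]))
    return [lemma for lemma, _ in high + low]


# Toy data, invented for illustration.
freqs = {"huis": 900, "kat": 40, "boot": 30, "fiets": 500, "xyz": 1000}
print(rank_candidates(freqs, {"huis", "kat", "boot", "fiets"}, {"boot"}, 100))
# ['huis', 'fiets', 'boot', 'kat']
```

Note how the BC lemma 'boot' overtakes the more frequent 'kat' only inside the low-frequency range, and how 'xyz' drops out because it is not in the PAROLE list.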
Coverage and completeness
BC-meanings were covered by Dutch Usems as precisely as possible, in view of the multilingual links foreseen.
Other meanings per lemma were selected by consulting several medium-sized dictionaries of Dutch and, where necessary, other reference works (among others Wordnet). The general criterion was that meanings shared by at least two of the dictionaries were selected as Usems for SIMPLE, on the assumption that such meanings can be considered 'standard Dutch'. However, meanings considered outdated or obsolete (relics from older dictionaries) were not included. Domain-specific meanings were included because of their importance for language technology (cf. the domain field in SIMPLE).
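The general two-dictionary criterion can be expressed as a simple filter. The Python sketch below is illustrative only: `select_meanings`, the dictionary labels and the meaning labels are hypothetical, obsolete meanings are assumed to have been flagged by a lexicographer beforehand, and the separate handling of BC meanings and domain-specific meanings is not modelled.

```python
def select_meanings(attestations, obsolete, min_dictionaries=2):
    """Select candidate Usems for one lemma (hypothetical sketch).

    attestations: dict meaning label -> set of dictionaries listing that meaning
    obsolete: set of meaning labels judged outdated or obsolete
    """
    selected = []
    for meaning, dictionaries in attestations.items():
        # Shared by enough dictionaries to count as 'standard Dutch' ...
        shared = len(dictionaries) >= min_dictionaries
        # ... and not a relic carried over from older dictionaries.
        if shared and meaning not in obsolete:
            selected.append(meaning)
    return selected


attestations = {
    "sense_a": {"D1", "D2"},
    "sense_b": {"D1"},
    "sense_c": {"D1", "D2", "D3"},
}
print(select_meanings(attestations, obsolete={"sense_c"}))  # ['sense_a']
```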
Because of the rather limited size of the PAROLE lexicon (ca. 20,000 lemmata) and the SIMPLE lexicon (3,710 lemmata), target Usems in the qualia roles (formal, agentive, constitutive and telic) were chosen on the basis of their suitability as target Usems, rather than their occurrence in the PAROLE or SIMPLE lexicon. The tables in 2.3.3 show that, at the lemma level, the PAROLE lexicon covers between 76% and 85% of the target Usems, and the (much smaller) SIMPLE lexicon between 36% and 50%.
1.2. Current Lexicon Contents
The standard templates delivered by the Specification Group were loaded into a database. For each Usem, the lexicographers fill in a database template form. The template forms are automatically converted into the SIMPLE SGML format by purpose-built software (see section 1.4 for further details about the tools). To enhance quality and consistency across lexicographers, we used a lexicographer's manual (written in Dutch), which was based on the guidelines but specified some aspects and some working procedures in more detail.
Apart from the obligatory template fields, the lexicographers compiled the recommended fields 'type hierarchy information' ('template_supertype', 'unification_path'), 'polysemous class' and 'qualia roles' (formal, agentive, constitutive and telic), because of their relevance for establishing relationships between different (groups of) lemmata irrespective of part of speech, and more specifically between particular meanings of lemmata. It is precisely this kind of information that is missing from dictionaries, or treated there only partially or unsystematically. These relationships in particular will be used in the longer term in one of our institutional projects (see 1.5).
So far, the obligatory fields and the recommended fields mentioned above have been compiled for all noun Usems. For the verb and adjective Usems, the recommended and obligatory fields have been compiled, except for the obligatory fields 'predicative representation', 'selectional restrictions' and the link with syntax, which have not yet been finished (as reported in the last bimonthly report). This is mainly due to the complexity of these fields, the time we needed to become familiar with the matter and its representation in SGML, and the problems we encountered (cf. 2.1, 2.5). Work is now proceeding without major problems, and completion is guaranteed by permanent staff with thorough knowledge of the matter. Additionally, we have found a way to continue the contracts of temporary staff (see the last bimonthly report). Apart from the 7,326 noun Usems, ca. 450 verb Usems and ca. 150 adjective Usems now have these fields finished.
Appendices 1-4 show more detailed information on the current lexicon contents.
1.3. Sample of 100 entries
The Dutch sample of 100 entries has the following characteristics.
As a starting point for composing the entry list, we adopted the distribution figures applied in SIMPLE: 70% noun, 20% verb and 10% adjective Usems. The sample contains 71 noun entries with 263 Usems, 20 verb entries with 90 Usems and 10 adjective entries with 43 Usems. In addition to the 20 verb entries, 9 Usems of 6 verbs that have a master link with noun Usems in the sample have been included, giving a total of 99 verb Usems. The sample contains 405 Usems in total.
The noun and adjective entries are the most frequent ones from the corresponding lemma lists (cf. 1.1). Because of the short time available to prepare the review, the selection of the verbs is less systematic: we selected 2 x 10 verbs from 2 working files used for compiling the predicative fields (cf. 1.2). To demonstrate the non-master/master link between noun and verb Usems, the 9 Usems of the corresponding 6 verbs were added. These noun/verb Usem pairs are:
begin_1 ~ beginnen_4
gebruik_1 ~ gebruiken_1
leven_1, _2, _6, _7, _8 ~ leven_1, _6
onderzoek_1 ~ onderzoeken_1
werk_1, _4, _5, _6, _7 ~ werken_1, _2
wil_1, _2, _3 ~ willen_1, _3
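The sample figures can be cross-checked mechanically. The Python sketch below simply re-enters the counts and the noun/verb pairs given in this section (the variable names are illustrative) and recomputes the number of added verb Usems, the verb total, the overall total and the entry-level distribution.

```python
# Noun/verb Usem pairs with a non-master/master link, as listed in section 1.3.
# Each Usem is written as (lemma, sense number).
PAIRS = [
    ([("begin", 1)], [("beginnen", 4)]),
    ([("gebruik", 1)], [("gebruiken", 1)]),
    ([("leven", n) for n in (1, 2, 6, 7, 8)], [("leven", 1), ("leven", 6)]),
    ([("onderzoek", 1)], [("onderzoeken", 1)]),
    ([("werk", n) for n in (1, 4, 5, 6, 7)], [("werken", 1), ("werken", 2)]),
    ([("wil", n) for n in (1, 2, 3)], [("willen", 1), ("willen", 3)]),
]

# Verb Usems added to the sample, and the distinct verbs they belong to.
added_verb_usems = {usem for _, verbs in PAIRS for usem in verbs}
added_verbs = {lemma for lemma, _ in added_verb_usems}

# Entry and Usem counts per part of speech, as reported for the sample.
entries = {"noun": 71, "verb": 20, "adjective": 10}
usems = {"noun": 263, "verb": 90 + len(added_verb_usems), "adjective": 43}

print(len(added_verb_usems), len(added_verbs))  # 9 6
print(usems["verb"], sum(usems.values()))       # 99 405

# Share of entries per part of speech, compared with the 70/20/10 target.
total_entries = sum(entries.values())
for pos, n in entries.items():
    print(pos, round(100 * n / total_entries))  # noun 70, verb 20, adjective 10
```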
See appendices 2B-4B for the number of Usems per template type, per domain and semantic class for the Usems in this sample, separately for each Part of Speech.
1.4. Tools
Software tools used by INL for SIMPLE are:
Database load tool
This tool (written in Perl) converts the standard 'skeleton' templates provided by the Linguistic Specification Group into an SQL command file consisting mainly of insert statements. The tool searches for the relevant attribute-value pairs and builds the appropriate insert statement for them. The attribute-value pairs are easy to recognise, as they are mostly in the format attribute:value.
When the command file is executed against the SIMPLE database, the skeleton template is loaded.
As an example, take the template for Artwork, and assume for simplicity that it consists only of:
Usem: 1
Template_Type: [Artwork]
Unification_path: [Concrete_entity | ArtifactAgentive | Telic]
Applying the tool results in the following insert statement:
insert into template (USEM, TEMPLATE_TYPE, UNIFICATION_PATH) values (1,'[Artwork]', '[Concrete_entity | ArtifactAgentive | Telic]');
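The INL tool is written in Perl and its code is not reproduced here. As a hedged illustration of the same idea, the Python sketch below turns `attribute: value` lines into one insert statement. The function name `template_to_insert`, the upper-casing of attribute names into column names and the quoting rules are assumptions, chosen so that the example above is reproduced (apart from the space after the first comma).

```python
import re

# Matches 'Attribute: value' lines; the attribute name may contain letters and underscores.
PAIR = re.compile(r"\s*([A-Za-z_]+)\s*:\s*(.+?)\s*$")


def template_to_insert(template_text, table="template"):
    """Build one SQL insert statement from attribute:value lines (hypothetical sketch)."""
    columns, values = [], []
    for line in template_text.splitlines():
        match = PAIR.match(line)
        if not match:
            continue  # skip lines that are not attribute:value pairs
        attribute, value = match.groups()
        columns.append(attribute.upper())
        # Numbers are written bare; other values become SQL string literals,
        # with embedded single quotes doubled.
        values.append(value if value.isdigit() else "'" + value.replace("'", "''") + "'")
    return "insert into {} ({}) values ({});".format(
        table, ", ".join(columns), ", ".join(values))


artwork = """Usem: 1
Template_Type: [Artwork]
Unification_path: [Concrete_entity | ArtifactAgentive | Telic]"""
print(template_to_insert(artwork))
```

Doubling embedded quotes is the standard SQL way of escaping them; a production tool would more likely use parameterised statements than build SQL text.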
Data entry tool
This tool is a data entry form resembling the standard templates, built with Uniface, a visual development environment available for a variety of platforms and database systems. Although we do not use the latest release, our release is still perfectly suited to the SIMPLE tasks.
The main functionality needed for SIMPLE was to enable the lexicographer to add, modify or delete information about Usems. Nearly all of this functionality could be generated with Uniface, so we only had to write some additional code, e.g. for checking purposes.
Report tools
Report tools (all written in Perl) are used to collect information from the SIMPLE database that is not easy to obtain by means of an SQL query. A typical approach is to retrieve a set of data from the database and then apply the appropriate report tool.
As an example, consider the report tool for obtaining the number of Usems per domain. One Usem can have more than one domain value (separated by commas). We need the separate values, so the domain field has to be split. As this cannot be done in SQL, we use a report tool. For example, the data set obtained from the database (one row per Usem):
POLITICS_AND_GOVERNMENT, HISTORY, MONARCHY
FINANCE, ECONOMICS
FINANCE, SOCIOLOGY, GENERAL
POLITICS_AND_GOVERNMENT, GENERAL
The result after applying the tool:
DOMAIN | FREQUENCY
ECONOMICS | 1
FINANCE | 2
GENERAL | 2
HISTORY | 1
MONARCHY | 1
POLITICS_AND_GOVERNMENT | 2
SOCIOLOGY | 1
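The splitting and counting done by the Perl report tool can be sketched in Python with `collections.Counter`. This is an illustration under the assumption that each row holds the comma-separated domain values of one Usem; the function name `domain_frequencies` is hypothetical. Applied to the four rows above, it returns ECONOMICS 1, FINANCE 2, GENERAL 2, HISTORY 1, MONARCHY 1, POLITICS_AND_GOVERNMENT 2 and SOCIOLOGY 1.

```python
from collections import Counter


def domain_frequencies(rows):
    """Count Usems per domain value; each row holds the domain field of one Usem."""
    counts = Counter()
    for row in rows:
        # Split the multi-valued field on commas and drop surrounding spaces.
        counts.update(value.strip() for value in row.split(",") if value.strip())
    return dict(sorted(counts.items()))


rows = [
    "POLITICS_AND_GOVERNMENT, HISTORY, MONARCHY",
    "FINANCE, ECONOMICS",
    "FINANCE, SOCIOLOGY, GENERAL",
    "POLITICS_AND_GOVERNMENT, GENERAL",
]
print("DOMAIN".ljust(25) + "FREQUENCY")
for domain, frequency in domain_frequencies(rows).items():
    print(domain.ljust(25) + str(frequency))
```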
Conversion to SGML
The software for conversion to SGML consists of a large suite of Perl and C++ programs that convert the database contents to SGML and establish the link with the PAROLE lexicon. The conversion also involves correcting formal errors (resulting from manual work in the database template forms) and creating additional SGML objects. The latter is fully automated, the former as far as possible.
1.5. Impact of SIMPLE/PAROLE
The PAROLE lexicon is distributed by ELRA and, for researchers in the Netherlands and Belgium only, by our institute. The PAROLE lexicon (without the SIMPLE part) is also used in two (inter)national projects: the Dutch-Flemish project Corpus Gesproken Nederlands (Corpus of Spoken Dutch, comparable to the BNC) and the Dutch project ToKeN2000, which aims at a sophisticated knowledge retrieval system, including modules for automatic language generation and spoken answers to users' questions. The latter project will also use the SIMPLE part of PAROLE.
The SIMPLE data will be used in the institutional project Integrated Language Database of 8th-21st Century Dutch, a long-term project approved by the Dutch and Flemish governments. This project aims at creating a database in which data from linguistically annotated texts, electronic dictionaries and linguistic files are linked in a meaningful way, to serve as an instrument for research into the Dutch language and culture throughout the centuries. In particular, the SIMPLE data establishing relationships between different (meanings of) lemmata are of interest for this project (cf. 1.2) and will be developed further in this framework. Furthermore, the PAROLE POS tagset, with some extensions for the historical periods, will probably be used for POS tagging of the texts in the database.
A current activity of our institute is to make the PAROLE corpus accessible over the Internet in a way similar to (but more modern than) the three INL corpora already operational (see www.inl.nl). Work is in progress on automatic lemmatization, POS tagging according to the PAROLE tagset and global syntactic tagging. A retrieval system is being developed that will give access to this corpus on the basis of linguistic parameters.
1.6. Remaining work
As explained in section 1.2, compilation of the fields 'predicative representation', 'selectional restrictions' and the link with syntax has not yet been finished for verbs and adjectives, but completion is guaranteed by permanent staff with thorough knowledge of the matter. Additionally, we have found a way to continue the contracts of temporary staff (see the last bimonthly report).
Permanent staff will carry out the checks on quality and consistency.
2. Semantic encoding
2.1. Criteria for Syntax-Semantic linking
General criteria for assigning readings to syntactic descriptions.
Syntax is linked to semantics by connecting the positions of the complementation frames of the PAROLE lexicon entries with the arguments of the SIMPLE predicates. The starting point for the link between Usem and Usyn is therefore the syntactic descriptions (if any) of a lemma in the PAROLE lexicon, which are corpus based. To establish the connection, we first had to decide on the semantic correspondence of each syntactic complementation frame of the lexicon entry with the specific argument structure of the related Usem. Secondly, we had to decide which of the complements of a frame could actually be semantically linked to an argument. These decisions were made by interpreting the relevant predicate and then comparing it with the example phrases of the syntactic complementation frames in question, to see whether a correspondence could be established.
Criteria for defining language-specific 'correspondences' for predicative SynUs.
Adjective: link between syntax and semantics
In the Dutch PAROLE lexicon, the basic fact that an adjective modifies a noun is considered to belong to the grammar. As a consequence, the syntactic frames contain no position for the noun in question. In our lexicon, therefore, adjectival complementation as found in ‘the man is angry with me’ is described in the phrasing ‘the “with me angry” man’ (grammatical in Dutch), where the entry ‘angry’ has a one-place frame with the prepositional complement PP(with) in the first and only position (P0).
For that reason, we have so far described adjectival predication with one-place predicates:
where <arg0>=PP(with) with selectional restriction [Living_Entity]
This means that we cannot describe what kind of noun (for example, only living entities) the adjective selects for. Moreover, subject clauses, which occupy the same P0 position in the frame, as in:
it is easy for me to do that (read: ‘to do that’ is easy for me)
cannot be linked to a semantic argument either.
This causes a considerable loss of semantic information, and we plan to change the one-place predicates into two-place predicates:
He is angry with me, the ‘with me angry’ man
where <arg0>=NP-[Living_Entity] and <arg1>=PP(with) [Living_Entity]
Consequently, only <arg1> can be linked to a syntactic position, and <arg0> has to be left unlinked, which is allowed in SIMPLE.
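The planned two-place encoding can be pictured as a small data structure. The Python sketch below is purely illustrative: the `Argument` class and the `linked_and_unlinked` helper are hypothetical and do not reflect the SIMPLE SGML format; they only show that <arg1> is tied to frame position P0 while <arg0> has no frame position and stays unlinked.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Argument:
    name: str
    restriction: str
    position: Optional[str] = None  # syntactic frame position, or None if unlinked


def linked_and_unlinked(arguments):
    """Split predicate arguments into those linked to a frame position and those left unlinked."""
    linked = {a.name: a.position for a in arguments if a.position is not None}
    unlinked = [a.name for a in arguments if a.position is None]
    return linked, unlinked


# Planned two-place encoding of 'angry': the modified noun (arg0) has no frame position,
# while the PP(with) complement occupies P0, the only position in the frame.
angry = [
    Argument("arg0", "Living_Entity"),                 # NP, absent from the syntactic frame
    Argument("arg1", "Living_Entity", position="P0"),  # PP(with)
]
print(linked_and_unlinked(angry))  # ({'arg1': 'P0'}, ['arg0'])
```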
Prototypicality prevails over corpus based presence
As said above, the selection of syntactic complements in the PAROLE lexicon was based on their frequency in our corpora. Sometimes, however, we considered a certain complementation pattern non-prototypical from a semantic point of view. In these cases, the predicate differs from the corpus-based complementation patterns. Consequently, the ID shows that the non-prototypical complementation pattern is possible, but the respective positions have not been linked to an argument. Cf. boek_4 (book_4), with the gloss ‘deel van een meerdelig boekwerk’ (part of a multi-volume work). We chose a predicate with only one (prototypical) argument: the name of the book, as in the book Genesis. The two possible syntactic complements, PP(van), naming the author of the book, and PP(over), naming its topic, are not considered prototypical for this specific noun Usem and are therefore not linked to the only available argument.
Selectional restriction prevents linking
Sometimes different syntactic complements (e.g. an N or a PP) could be linked to one semantic argument, but this is not possible because of the selectional restriction on the argument, e.g.:
where <arg0> = Role_ProtoAgent and <arg1> = Role_ProtoPatient
The PAROLE lexicon has three descriptions for nota:
(1) de nota Kunstbeleid (the note (called) Art management)
(2) de nota van de minister (the note of the minister)
(3) de nota over het kunstbeleid (the note on art management)
Only two syntactic frames (2 and 3) can be linked. The complement in (1) cannot be linked to <arg1> because of its role: Role_Adjunct instead of Role_ProtoPatient.
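The effect of the role restriction on linking can be illustrated with a toy matcher. In this hedged Python sketch, linking is reduced to matching the role a lexicographer assigns to each complement against the roles of the predicate arguments. The `Complement` class and `link_complements` are hypothetical, and mapping PP(van) to <arg0> and PP(over) to <arg1> is our reading of the example rather than something stated explicitly above.

```python
from dataclasses import dataclass


@dataclass
class Complement:
    frame: int    # number of the PAROLE description (1-3 for nota)
    example: str  # example phrase of the frame
    role: str     # role assigned to the complement by the lexicographer


def link_complements(complements, arguments):
    """Link each complement to the first argument with the same role (hypothetical sketch).

    arguments: dict argument name -> role required by the predicate
    Returns a dict frame number -> argument name; frames without a match stay unlinked.
    """
    links = {}
    for comp in complements:
        for arg, role in arguments.items():
            if comp.role == role:
                links[comp.frame] = arg
                break
    return links


nota_args = {"arg0": "Role_ProtoAgent", "arg1": "Role_ProtoPatient"}
nota_frames = [
    Complement(1, "de nota Kunstbeleid", "Role_Adjunct"),
    Complement(2, "de nota van de minister", "Role_ProtoAgent"),
    Complement(3, "de nota over het kunstbeleid", "Role_ProtoPatient"),
]
print(link_complements(nota_frames, nota_args))  # {2: 'arg0', 3: 'arg1'}
```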
2.2. Criteria for assigning Domain Features
The general SIMPLE criterion for selecting a domain value from the SIMPLE domain list is the topic of the texts in which a Usem usually appears, or is most likely to appear. The most specific domain in the hierarchy was selected. If no suitable specific domain was available (e.g. for a particular branch of sport), the immediately dominating node was selected. Usems of very common usage have the value 'general'. A Usem can have one or more domain values. If a Usem is both domain specific (i.e. likely to appear in domain-specific texts) and of very common usage, both the specific domain values and the value 'general' are assigned. See appendix 3A for frequencies of Usems per domain value, separately for each part of speech (3.A.1-3.A.3).
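The selection rules can be summarised as a small function. This Python sketch is hypothetical: each topic is passed as a path from general to specific, `DOMAINS` is an invented fragment rather than the actual SIMPLE domain list, and the lexicographer's judgement about which topics and which degree of common usage apply is taken as given.

```python
def assign_domains(topic_paths, domain_list, common_usage):
    """Assign domain values to one Usem (hypothetical sketch of the criteria above).

    topic_paths: one path per topic, from general to specific, e.g. ["SPORT", "KORFBALL"]
    domain_list: set of values available in the domain hierarchy
    common_usage: True if the Usem is also of very common usage
    """
    domains = []
    for path in topic_paths:
        # Walk up from the most specific topic to the first one in the domain list.
        for topic in reversed(path):
            if topic in domain_list:
                if topic not in domains:
                    domains.append(topic)
                break
    if common_usage:
        domains.append("GENERAL")
    return domains


# Hypothetical fragment of a domain list, for illustration only.
DOMAINS = {"SPORT", "ECONOMICS", "FINANCE", "GENERAL"}
print(assign_domains([["SPORT", "KORFBALL"]], DOMAINS, common_usage=False))   # ['SPORT']
print(assign_domains([["ECONOMICS", "FINANCE"]], DOMAINS, common_usage=True))  # ['FINANCE', 'GENERAL']
```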
2.3. Criteria for assigning Semantic Class and Template Type
The principles for selecting a semantic class from the SIMPLE hierarchy were essentially the same as for domain: whenever possible, the most specific one. Only one value is assigned, selected from the list. A semantic class value can be specified by a distinctive feature. See appendix 4A for frequencies per semantic class, separately for each part of speech (4.A.1-4.A.3).
The selection of a template type was based on knowledge of the ontology and on the linguistic tests and the information per template type provided by the Specification Group. Furthermore, the suitability of the qualia roles and their values was used as a check on the correct template type: the selection was probably wrong if the qualia roles were not suitable for the particular meaning of a Usem. Irrespective of its status as core or recommended, the template type considered most suitable and most specific was selected.
After the final and complete set of template types was provided by the Specification Group, the lexicographers checked the selection of template type for noun Usems corresponding with Base Concepts, as at the time of compilation the set of template types was much smaller.
2.3.1. Language specific typing
No language specific typing was applied.
2.3.2. Template subtyping for language specific encoding
No language specific template subtyping was applied.
2.3.3. Criteria for encoding Semantic Relations
Relations. In the standard templates, template-specific relations for the qualia roles are 'predefined', with an indication of their status (type-defining, optional). These were taken as the starting point. Whenever a predefined relation could not be filled in meaningfully, this is marked as such in the database, and the relation is then not implemented in the SGML file. Relations were added if they were judged essential for the particular Usem. These cases are marked as such in the database and implemented in the SGML file.
Choice of target Usems. As said above (1.1), because of the limited size of both the SIMPLE lexicon and the PAROLE lexicon, it was not considered desirable to select only target Usems that are in those lexicons. For this reason, our first criterion for the selection of target Usems was their suitability for the specific relation between the Usem and its target Usem.
The current coverage of target Usems in the lexicons is as follows. For each part of speech, the target Usems were collected and compared with the complete entry lists of the PAROLE lexicon and the SIMPLE lexicon, respectively. Note that 'target lemmata' are lemmatized target Usems, i.e. a target lemma groups all target Usems with an identical word form.
Part of Speech | Number of target Usems | Number of target lemmata | Target lemmata in PAROLE | Target lemmata in SIMPLE
NOUNS | 18,523 | 3,860 | 2,939 (76%) | 1,583 (41%)
VERBS | 4,211 | 1,099 | 944 (85%) | 556 (50%)
ADJECTIVES | 1,068 | 471 | 378 (80%) | 170 (36%)
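The percentages in these tables are relative to the number of target lemmata. Recomputing them, as in the Python sketch below (figures copied from the tables; the names are illustrative), suggests that they were truncated rather than rounded: 944 of 1,099 verb target lemmata is 85.9%, which is listed as 85%.

```python
# Target-lemma coverage figures from the tables above:
# (target lemmata, of which in PAROLE, of which in SIMPLE)
COVERAGE = {
    "nouns": (3860, 2939, 1583),
    "verbs": (1099, 944, 556),
    "adjectives": (471, 378, 170),
}


def percentages(total, in_parole, in_simple):
    """Return coverage percentages, truncated to whole numbers."""
    # Integer division truncates; this reproduces the published 85% and 50% for verbs,
    # which ordinary rounding would turn into 86% and 51%.
    return 100 * in_parole // total, 100 * in_simple // total


for pos, counts in COVERAGE.items():
    print(pos, percentages(*counts))
```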
2.3.4. Criteria for encoding Derivation Relations (optional).
Not applicable. This field has not been compiled.
2.4. Classes derived from encoding
Polysemous relations given in the standard templates provided by the Specification Group were judged for their suitability for a particular Usem. If a relation was judged unsuitable, it is marked as such in the database template field and not implemented in the SGML file.
Only polysemous relations that have a counterpart in another Usem of the lemma concerned are implemented in the SGML file.
Synonymy and hyponymy are not (yet) encoded in our lexicon.
2.5. Representation of Predicative information.
Type of link: Master/Non-master
The first step on the predication path was to decide on a Usem’s Type of Link with an item of another part of speech. Our starting point was the basic SIMPLE assumption (Del. 2.1, pp. 34-35): "We assume that verbs and adjectives always have a master relation with the predicate". For that reason, even denominal verbs and adjectives are considered Master (their true (etymological) derivation is to be described in the derivation field of the template). Nouns were therefore the only part of speech requiring a decision. Consequently, noun Usems are Master only if they cannot be semantically related to an adjectival or verbal predicate. All other noun Usems are Non-master, independently of the actual etymological relation between the noun and the verb or adjective.
Some criteria for Non-Master Usems
Formal resemblance prevails
Whenever a noun Usem could be semantically related to the predication of two different verbs, the semantically closest one was taken. In the case of:
feeling~feel|feel
we chose ‘voelen’.
Verb prevails over Adjective
Whenever a noun Usem could be semantically related to a verbal as well as to an adjectival predicate, the verb was given priority, cf:
association~ associate/associative
Two or more possible adjectives
Whenever a single noun Usem could be semantically related to two (or more) adjectives, the most meaningful one was chosen (the suffix -achtig (‘-like’) can be added to nearly every noun):
monster ~ monsterlijk/monsterachtig (we choose ‘monsterlijk’)
mos ~ mossig/mosachtig (we choose ‘mossig’)
paniek ~ panisch/paniekerig/paniekachtig (we choose ‘panisch’)
Relation of a noun to itself
Sometimes a noun might be semantically related to an adjective, but this adjective is in fact the noun itself in an adjectival function, cf:
model_(noun), model_(adj) in: een model vader (an exemplary father)
We considered the nouns model and moslim to be Masters.
Relation of a noun to another noun: Master
Whenever a noun Usem could only be semantically related to another noun, we considered that noun Usem to be a Master, cf:
ministerie~minister (ministry~minister)
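The criteria above can be read as a decision procedure. The Python sketch below is a hypothetical formalisation, not a tool used by INL: semantic closeness and meaningfulness are a lexicographer's judgement, so candidate predicates are assumed to be passed in the lexicographer's order of preference, and the only automatic preference is that -achtig adjectives come last.

```python
def link_type(lemma, pos, related_verbs=(), related_adjectives=()):
    """Decide Master/Non-master for one Usem (hypothetical sketch of the criteria above).

    related_verbs / related_adjectives: predicates the Usem can be semantically related to,
    listed in the lexicographer's order of preference.
    Returns ("Master", None) or ("Non-master", chosen_predicate).
    """
    # Verbs and adjectives always have a master relation with their predicate.
    if pos in ("verb", "adjective"):
        return ("Master", None)
    # A noun used as its own adjective (een model vader) does not count as a relation.
    adjectives = [a for a in related_adjectives if a != lemma]
    # Verb prevails over adjective.
    if related_verbs:
        return ("Non-master", related_verbs[0])
    if adjectives:
        # -achtig can be added to nearly any noun, so such forms are moved to the end
        # (the sort is stable, so the lexicographer's order is otherwise kept).
        adjectives.sort(key=lambda a: a.endswith("achtig"))
        return ("Non-master", adjectives[0])
    # Related only to another noun, or to nothing: the noun Usem is a Master.
    return ("Master", None)


print(link_type("monster", "noun", (), ("monsterachtig", "monsterlijk")))  # ('Non-master', 'monsterlijk')
print(link_type("ministerie", "noun"))                                     # ('Master', None)
```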
Some criteria for predication
Prototypicality prevails over maximization
The Guidelines propose to maximize predication. We restricted this principle somewhat by the criterion of prototypicality. For example, locative arguments are possible in many cases, and they are often part of the PAROLE descriptions of nouns. We, however, only included them in the SIMPLE predication if they were prototypical for the Usem in question. Take for example ‘artikel’ (article), where we consider the locative argument to be prototypical, because ‘artikel_2’, the item published in a paper, differs from, e.g., ‘artikel_3’, an item which only shows up in dictionaries.
Semantic Role
As the semantic roles mentioned in the SIMPLE Guidelines are defined in merely syntactic terms, we needed a more semantically driven notion to decide on the semantic role of verbal arguments. Syntactic subjects, for example, vary considerably in their semantic role: they can have the role ProtoAgent, ProtoPatient or Underspecified. We therefore introduced the notion of ‘Control’ to decide whether a syntactic subject has the role ProtoAgent (cf. Oppentocht 1999). Our working definition of this role is: "someone having control over the act expressed in the verbal or nominal predicate". According to the Guidelines, the direct object is always assigned the ProtoPatient role. We lacked a way to distinguish whether the object is ‘undergoing the event’ or ‘benefiting from the event’. Moreover, syntactic subjects may also have Role_ProtoPatient, cf.
So in addition to the Guidelines definition: "for the direct object and strongly bound prepositional complements", we supplied "and for those subjects which experience or undergo the event expressed by the nominal or verbal predicate".
As a consequence, Role_Underspecified is often assigned to subjects which neither control nor experience or undergo the action expressed in the verbal or nominal predicate, cf.:
his anger showed how much he was affected by this
In addition to the closed list of Semantic Roles, we would very much welcome the semantic roles Beneficiary and Goal.
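The resulting role assignment for the argument types discussed here can be sketched as follows. This Python function is a hypothetical simplification: whether a referent has control or undergoes the event is a lexicographer's judgement passed in as a flag, and only subjects, direct objects and strongly bound prepositional complements are covered.

```python
def role_of(function, has_control=False, undergoes=False):
    """Assign a semantic role to a verbal or nominal argument (hypothetical sketch).

    function: "subject", "direct_object" or "bound_pp" (strongly bound prepositional complement)
    has_control: the referent controls the act expressed by the predicate
    undergoes: the referent experiences or undergoes the event
    """
    if function in ("direct_object", "bound_pp"):
        return "Role_ProtoPatient"  # Guidelines definition
    if function == "subject":
        if has_control:
            return "Role_ProtoAgent"  # the 'Control' criterion
        if undergoes:
            return "Role_ProtoPatient"  # extension to experiencing/undergoing subjects
        return "Role_Underspecified"
    raise ValueError("function not covered by this sketch: " + function)


# 'his anger' in "his anger showed how much he was affected by this"
print(role_of("subject"))  # Role_Underspecified
```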
2.6. Problems encountered.
Problems are reported on in the respective sections.
3. Statistics
See appendices 1 and 2A-4A for information concerning the complete dataset. For information about the 100 delivered entries, see appendices 2B-4B.
4. Bibliography
Oppentocht, L. (1999), Lexical Semantic Classification of Dutch verbs. Towards constructing NLP and human-friendly definitions. Ph.D. dissertation, University of Leiden, The Netherlands.
5. List of appendices
Appendix 1: List of lemmata covering one or more base concepts, for nouns (1.1), verbs (1.2) and adjectives (1.3), respectively.
Appendix 2.A: Number of Usems per template type in the complete dataset, for nouns (2.A.1), verbs (2.A.2) and adjectives (2.A.3), respectively.
Appendix 2.B: Number of Usems per template type in the sample of 100 entries, for nouns (2.B.1), verbs (2.B.2) and adjectives (2.B.3), respectively.
Appendix 3.A: Number of Usems per Domain in the complete dataset, for nouns (3.A.1), verbs (3.A.2) and adjectives (3.A.3), respectively.
Appendix 3.B: Number of Usems per Domain in the sample of 100 entries, for nouns (3.B.1), verbs (3.B.2) and adjectives (3.B.3), respectively.
Appendix 4.A: Number of Usems per Semantic Class in the complete dataset, for nouns (4.A.1), verbs (4.A.2) and adjectives (4.A.3), respectively.
Appendix 4.B: Number of Usems per Semantic Class in the sample of 100 entries, for nouns (4.B.1), verbs (4.B.2) and adjectives (4.B.3), respectively.
Appendix 1.1: List of NOUN lemmata covering one or more base concepts
aantal
aanval
aarde
actie
activiteit
afbeelding
afdeling
affiche
agglomeratie
akte
alternatief
apparaat
argumentatie
arts
associatie
auto
avond
baan
bedrag
bedrijf
behandeling
behoefte
bekende
beleid
belevingswereld
bemanning
bericht
bestraffing
bestuur
bevel
beweging
bewering
bezitting
bezoek
bezoeker
biografie
bloedvat
boek
boot
bouw
bouwgrond
bouwmateriaal
brandweer
brief
brochure
buis
buitenkant
bureau
cel
cijfer
club
commercie
commissie
communicatiemiddel
computerprogramma
constructie
cursus
daad
dag
dans
database
denkproces
deskundige
dessin
ding
discipline
district
doel
doelwit
domein
doorgang
drank
drankje
eenheid
eensgezindheid
eigendom
eigenschap
eind
einde
element
etmaal
exemplaar
expert
fabriek
familielid
feest
feestdag
figuur
fout
functie
functionaris
gang
gas
gebaar
gebeurtenis
gebied
gebouw
gedeelte
gedrag
gedragscode
gegeven
geheel
geld
gelijke
geloof
gelovige
geluid
gemeenschap
gemeente
geneesmiddel
gepeins
geslacht
getal
gevoel
gevolg
gewaarwording
gewoonte
gezelschap
god
godsdienst
groep
grond
grondslag
grootte
haar
handeling
hoeveelheid
hond
hoofd
hout
huid
huis
huishouden
hulp
hulpverlener
ideologie
inboedel
informatie
inhoud
instelling
instituut
instrument
jaar
jongen
kaart
kamer
kant
kantoor
karakteristiek
keer
kenmerk
kerk
keuze
kind
kleur
kuil
kunst
kunstenaar
kwestie
land
landbouwproduct
leider
letsel
letter
leven
lichaam
lichaamsdeel
licht
lichtbron
lied
lijn
maand
machine
macht
machthebber
man
manager
manier
materiaal
medewerker
medicijn
mensheid
menu
methode
militair
mislukking
mogelijkheid
moment
musicus
muziek
muziekstuk
naam
nacht
naslagwerk
natuur
natuurkunde
natuurwetenschap
niveau
nummer
olie
omtrek
onderdeel
onderneming
onderwerp
ontwikkeling
oorlog
oorlogvoering
oorzaak
operatie
oppervlak
organisatie
organisme
overeenkomst
overheid
paard
papier
parcours
partij
peil
periode
pijpleiding
plaat
plaats
plicht
positie
poster
preparaat
prestatie
probleem
procedure
proces
product
productie
programma
programmatuur
prospectus
provincie
rand
reactie
redenering
regel
regering
regio
reis
relatie
resultaat
richting
rol
route
ruimte
samenstelling
samenvoeging
schilderij
school
schrijver
segment
seks
serie
set
situatie
software
sokkel
soldaat
soort
sport
sportveld
spraak
staat
stad
stadium
stadsgewest
status
stel
stem
stemgeluid
steun
stijl
stoel
strategie
structuur
struik
substantie
systeem
taak
taal
tafel
tas
team
teken
tekst
telefoonnummer
terrein
thema
theorie
tijd
tijdstip
titel
toename
toestand
transformatie
transportmiddel
trend
trilling
uiteinde
uiterlijk
unie
unit
universum
vaardigheid
vaartuig
vacht
vakgebied
valuta
vel
veld
verandering
verband
verbond
vereniging
verklaring
verlies
vermogen
verplaatsing
verrichting
vertegenwoordiging
vertoning
vertrek
vervoermiddel
verzameling
vis
visitekaartje
vlakte
voedselvoorraad
vogel
volk
volksvertegenwoordiging
voorrecht
voorstelling
voorwerp
voorzitter
vorm
vorming
vriend
vriendin
vrouw
vrucht
wapen
water
wedstrijd
weer
weg
werk
werkdag
werknemer
werkplek
werkwijze
wezen
wijn
wijziging
wildgroei
wind
winkel
woning
woord
zaak
zak
ziekte
zijde
zin
zone
zorg
Appendix 1.2: List of VERB lemmata covering one or more base concepts
aankunnen
aantonen
afbreken
afnemen
bedekken
bedoelen
bedriegen
begrijpen
behandelen
beheersen
bekijken
beoordelen
bepalen
bereiken
berekenen
beschadigen
beschermen
beschrijven
besluiten
bespreken
bestaan
besteden
betalen
betreffen
bevestigen
bewegen
bezeren
bezitten
bezorgen
beëindigen
binnengaan
blijven
breken
brengen
concluderen
creëren
dalen
delen
denken
doden
doen
doodgaan
doorgaan
draaien
dragen
duwen
eindigen
ervaren
eten
fabriceren
gaan
gebeuren
gedragen
geven
halen
handelen
hanteren
hebben
helpen
herinneren
herscheppen
houden
identificeren
komen
kopen
krijgen
kwijtraken
laten
leggen
leiden
leven
leveren
lijken
lopen
maken
markeren
nadenken
nemen
ondergaan
onderscheiden
ontdekken
onthouden
ontzien
opdelen
opgeven
ophouden
opwinden
ordenen
overeenkomen
overeenstemmen
pakken
plannen
proberen
raken
rangschikken
samenwerken
scheiden
scheppen
schoonmaken
schrijven
slaan
sluiten
spelen
sterven
steunen
sturen
toelaten
toenemen
toepassen
toestaan
toevoegen
tonen
transformeren
treffen
uiten
uitleggen
uitspreken
uitvoeren
vallen
variëren
vastleggen
vastmaken
vechten
veranderen
verbeteren
verbinden
verbrokkelen
verdelen
vergroten
verhinderen
verklaren
verkopen
verlangen
verliezen
verminderen
veroorzaken
verplaatsen
verschaffen
verslechteren
versnipperen
versplinteren
verspreiden
vertegenwoordigen
vertellen
vertrekken
vervormen
verwachten
verwijderen
verzamelen
verzoeken
verzorgen
vinden
voelen
voltooien
voorbijgaan
voortzetten
vormen
vragen
vullen
wachten
wegdoen
weggaan
werken
weten
wijzigen
willen
worden
zakken
zeggen
zetten
zien
zijn
Appendix 1.3: List of ADJECTIVE lemmata covering one or more base concepts
aanwezig
aardig
afgelopen
algemeen
arm
behaaglijk
belangrijk
belangwekkend
binnenlands
breed
commercieel
compleet
correct
cruciaal
cultureel
dagelijks
democratisch
dichtbij
diep
direct
donker
dood
doorzichtig
duidelijk
duur
echt
economisch
eender
eenvoudig
eerlijk
effectief
enkel
essentieel
federaal
financieel
gebruikelijk
gehard
gelijk
gelukkig
gemakkelijk
gereed
geslaagd
gevaarlijk
gewoon
gezamenlijk
groot
hard
hartelijk
hecht
heftig
helder
huidig
huiselijk
huishoudelijk
identiek
individueel
industrieel
interessant
internationaal
jong
juist
juridisch
klaar
klein
koel
koninklijk
kort
koud
krachtig
laat
landelijk
lang
lokaal
makkelijk
manlijk
mannelijk
medisch
menselijk
microscopisch
militair
moeilijk
mogelijk
mooi
nationaal
nieuw
nobel
normaal
nucleair
nuttig
onafhankelijk
onjuist
onmogelijk
onwaarschijnlijk
oorspronkelijk
open
oud
perfect
plaatselijk
politiek
populair
prachtig
professioneel
publiek
rechtstreeks
regionaal
rijk
seksueel
serieus
significant
snel
sociaal
sterk
succesvol
toekomstig
traditioneel
veilig
verantwoordelijk
verkeerd
verleden
vermoedelijk
vers
verschillend
verschuldigd
volledig
volmaakt
vrij
vroeg
waar
waarschijnlijk
warm
werkelijk
wezenlijk
zacht
zeker
ziek
zwaar
zwak
zwart
Appendix 2.A.1: Number of Usems per template type in the complete dataset for NOUNS
Template_type Freq.
--------------------------------------- --------
[3_D_Location] 45
[Abstract_Entity] 172
[Acquire_knowledge] 8
[Act] 16
[Agent_of_persistent_activity] 96
[Agent_of_temporary_activity] 126
[Agentive] 23
[Air_Animal] 14
[Amount] 157
[Animal] 11
[Area] 59
[Artifact] 226
[Artifact_Food] 22
[Artifactual_area] 55
[Artifactual_drink] 13
[Artifactual_material] 18
[Artwork] 24
[Aspectual] 25
[Body_part] 111
[Building] 101
[Cause_act] 3
[Cause_aspectual] 10
[Cause_change] 13
[Cause_change_location] 21
[Cause_change_of_state] 49
[Cause_change_of_value] 14
[Cause_constitutive_change] 17
[Cause_experience_event] 17
[Cause_motion] 9
[Cause_natural_transition] 5
[Cause_relational_change] 15
[Change] 15
[Change_of_location] 38
[Change_of_possession] 10
[Change_of_state] 18
[Change_of_value] 20
[Clothing] 24
[Cognitive_event] 89
[Cognitive_fact] 87
[Color] 3
[Commissive_speech_act] 1
[Concrete_entity] 84
[Constitutive] 43
[Constitutive_change] 7
[Constitutive_state] 17
[Container] 61
[Convention] 101
[Cooperative_activity] 119
[Cooperative_speech_act] 12
[Copy_creation] 5
[Creation] 10
[Declarative_speech_act] 15
[Directive_speech_act] 18
[Disease] 24
[Domain] 84
[Drink] 6
[Earth-Animal] 30
[Entity] 108
[Event] 104
[Exist] 8
[Experience_event] 70
[Expressive_speech_act] 7
[Flavouring] 2
[Flower] 4
[Food] 15
[Fruit] 7
[Furniture] 19
[Geopolitical_Location] 40
[Give_knowledge] 37
[Group] 140
[Human] 182
[Human_Group] 224
[Identificational_state] 18
[Ideo] 17
[Information] 243
[Institution] 202
[Instrument] 100
[Judgement] 4
[Kinship] 39
[Language] 4
[Living_entity] 7
[Location] 221
[Material] 6
[Mental_creation] 4
[Micro_organism] 2
[Modal_event] 23
[Money] 18
[Moral_standard] 11
[Move] 47
[Movement_of_thought] 13
[Natural_substance] 46
[Natural_transition] 9
[Non_relational_act] 5
[Number] 12
[Opening] 25
[Organic_object] 13
[Part] 404
[People] 7
[Perception] 16
[Phenomenon] 106
[Physical_creation] 8
[Physical_object] 12
[Physical_power] 9
[Physical_property] 63
[Plant] 33
[Profession] 183
[Proper_noun] 3
[Property] 112
[Psych_property] 50
[Psychological_event] 28
[Purpose_act] 270
[Quality] 37
[Relational_act] 196
[Relational_change] 13
[Relational_state] 94
[Reporting_event] 25
[Representation] 135
[Role] 39
[Semiotic_artifact] 190
[Shape] 41
[Sign] 44
[Social-status] 75
[Social_Property] 24
[Speech_act] 27
[State] 117
[Stative_location] 43
[Stative_possession] 17
[Stimulus] 24
[Substance] 40
[Substance_food] 3
[Symbolic_creation] 12
[Telic] 12
[Time] 142
[Transaction] 33
[Unit_of_measurement] 68
[Vegetal_entity] 10
[Vehicle] 52
[Water-Animal] 9
[Weather_verb] 8
Total: 7326 (139 template types)
Appendix 2.A.2: Number of Usems per template type in the complete dataset for VERBS
Template_type Freq.
--------------------------------------- --------
[Acquire_knowledge] 14
[Act] 15
[Agentive] 2
[Aspectual] 47
[Cause] 20
[Cause_act] 15
[Cause_aspectual] 20
[Cause_change] 18
[Cause_change_location] 54
[Cause_change_of_state] 86
[Cause_change_of_value] 12
[Cause_constitutive_change] 33
[Cause_experience_event] 22
[Cause_motion] 14
[Cause_natural_transition] 10
[Cause_relational_change] 24
[Change] 25
[Change_of_location] 54
[Change_of_possession] 30
[Change_of_state] 45
[Change_of_value] 23
[Cognitive_event] 104
[Commissive_speech_act] 8
[Constitutive_change] 7
[Constitutive_state] 12
[Cooperative_activity] 24
[Cooperative_speech_act] 5
[Copy_creation] 2
[Creation] 18
[Declarative_speech_act] 20
[Directive_speech_act] 36
[Event] 85
[Exist] 13
[Experience_event] 29
[Expressive_speech_act] 12
[Give_knowledge] 35
[Identificational_state] 36
[Judgement] 12
[Mental_creation] 2
[Modal_event] 33
[Move] 58
[Natural_transition] 8
[Non_relational_act] 22
[Perception] 23
[Phenomenon] 15
[Physical_creation] 24
[Psychological_event] 51
[Purpose_act] 220
[Relational_act] 232
[Relational_change] 22
[Relational_state] 116
[Reporting_event] 27
[Speech_act] 37
[State] 76
[Stative_location] 31
[Stative_possession] 23
[Stimulus] 11
[Symbolic_creation] 16
[Transaction] 26
Total: 2114 (59 template types)
Appendix 2.A.3: Number of Usems per template type in the complete dataset for ADJECTIVES
Template_type Freq.
--------------------------------------- --------
[emotive] 2
[emphasizer] 19
[extensional] 42
[intensional] 56
[intensity] 33
[manner] 24
[modal] 11
[object-related] 64
[phys_property] 246
[psych_property] 328
[relation] 59
[social_property] 71
[temporal] 16
[temporal_property] 61
Total: 1032 (14 template types)
Appendix 2.B.1: Number of Usems per template type in the sample of 100 entries, for NOUNS
Template_type Freq.
--------------------------------------- --------
[3_D_LOCATION] 1
[ABSTRACT_ENTITY] 11
[AGENT_OF_PERSISTENT_ACTIVITY] 1
[AGENT_OF_TEMPORARY_ACTIVITY] 2
[AIR_ANIMAL] 1
[AMOUNT] 8
[AREA] 5
[ARTIFACTUAL_AREA] 3
[ARTIFACTUAL_DRINK] 1
[ARTIFACT] 5
[ASPECTUAL] 2
[BODY_PART] 8
[BUILDING] 3
[CAUSE_CHANGE_OF_STATE] 1
[CHANGE_OF_STATE] 1
[COGNITIVE_EVENT] 1
[COGNITIVE_FACT] 2
[CONCRETE_ENTITY] 2
[CONSTITUTIVE] 4
[CONSTITUTIVE_STATE] 1
[CONTAINER] 1
[CONVENTION] 2
[COOPERATIVE_ACTIVITY] 6
[DOMAIN] 2
[ENTITY] 4
[EVENT] 3
[EXIST] 1
[EXPERIENCE_EVENT] 1
[GEOPOLITICAL_LOCATION] 5
[GROUP] 2
[HUMAN] 13
[HUMAN_GROUP] 20
[INFORMATION] 7
[INSTITUTION] 12
[INSTRUMENT] 1
[KINSHIP] 4
[LOCATION] 5
[MODAL_EVENT] 1
[MONEY] 2
[MOVE] 1
[NATURAL_SUBSTANCE] 3
[NUMBER] 3
[PART] 13
[PHENOMENON] 3
[PHYSICAL_OBJECT] 1
[PHYSICAL_PROPERTY] 2
[PLANT] 1
[PROFESSION] 4
[PROPERTY] 5
[PSYCHOLOGICAL_EVENT] 1
[PSYCH_PROPERTY] 1
[PURPOSE_ACT] 5
[RELATIONAL_ACT] 3
[REPRESENTATION] 6
[ROLE] 1
[SEMIOTIC_ARTIFACT] 10
[SIGN] 3
[SOCIAL-STATUS] 1
[SOCIAL_PROPERTY] 1
[STATE] 5
[STATIVE_POSSESSION] 1
[STIMULUS] 1
[SUBSTANCE] 1
[TIME] 22
[UNIT_OF_MEASUREMENT] 10
Total: 263
Appendix 2.B.2: Number of Usems per template type in the sample of 100 entries, for VERBS
Template_type Freq.
--------------------------------------- --------
[ACQUIRE_KNOWLEDGE] 1
[ACT] 2
[ASPECTUAL] 4
[CAUSE] 1
[CAUSE_ASPECTUAL] 1
[CAUSE_CHANGE_LOCATION] 1
[CAUSE_CHANGE_OF_STATE] 5
[CAUSE_CONSTITUTIVE_CHANGE] 2
[CAUSE_RELATIONAL_CHANGE] 1
[CHANGE_OF_POSSESSION] 1
[CHANGE_OF_STATE] 2
[COGNITIVE_EVENT] 7
[COMMISSIVE_SPEECH_ACT] 1
[CONSTITUTIVE_STATE] 1
[COOPERATIVE_SPEECH_ACT] 1
[DECLARATIVE_SPEECH_ACT] 2
[DIRECTIVE_SPEECH_ACT] 1
[EVENT] 2
[EXIST] 4
[EXPERIENCE_EVENT] 1
[EXPRESSIVE_SPEECH_ACT] 1
[GIVE_KNOWLEDGE] 4
[IDENTIFICATIONAL_STATE] 2
[JUDGEMENT] 1
[MODAL_EVENT] 1
[MOVE] 2
[NON_RELATIONAL_ACT] 1
[PERCEPTION] 2
[PSYCHOLOGICAL_EVENT] 1
[PURPOSE_ACT] 14
[RELATIONAL_ACT] 15
[RELATIONAL_CHANGE] 1
[RELATIONAL_STATE] 2
[REPORTING_EVENT] 4
[SPEECH_ACT] 1
[STATE] 1
[STATIVE_LOCATION] 1
[STATIVE_POSSESSION] 1
[TRANSACTION] 3
Total: 99
Appendix 2.B.3: Number of Usems per template type in the sample of 100 entries, for ADJECTIVES
Template_type Freq.
--------------------------------------- --------
[EMPHASIZER] 6
[INTENSIONAL] 2
[OBJECT-RELATED] 2
[PHYS_PROPERTY] 9
[PSYCH_PROPERTY] 10
[RELATION] 2
[SOCIAL_PROPERTY] 1
[TEMPORAL_PROPERTY] 11
Total: 43
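Each frequency table in these appendices ends with a stated total, so the row counts can be checked mechanically. The following is a minimal Python sketch, assuming the tables are available as plain text in the layout used here (one "label count" pair per line, closed by a "Total" line). The names `check_table` and `SAMPLE_ADJ_TEMPLATES` are hypothetical and introduced only for illustration; they are not part of the SIMPLE tools. Applied to Appendix 2.B.3, the row counts add up to the stated total.

```python
from __future__ import annotations

import re

# A data row: a label (bracketed template type, domain or class name)
# followed by whitespace and a count, e.g. "[PHYS_PROPERTY] 9".
ROW = re.compile(r"^\s*(.+?)\s+(\d+)\s*$")
# The closing line, e.g. "Total: 43"; the colon is optional.
TOTAL = re.compile(r"^\s*Total:?\s*(\d+)\s*$")


def check_table(text: str) -> tuple[dict[str, int], int, int]:
    """Parse one appendix frequency table (hypothetical helper).

    Returns (counts per label, sum of the row counts, stated total).
    The stated total is -1 if the table has no "Total" line.
    """
    counts: dict[str, int] = {}
    stated = -1
    for line in text.splitlines():
        total = TOTAL.match(line)
        if total:  # test for the total first so it is not read as a data row
            stated = int(total.group(1))
            continue
        row = ROW.match(line)
        if row:  # header and dashed separator lines never end in a number
            counts[row.group(1)] = int(row.group(2))
    return counts, sum(counts.values()), stated


# Appendix 2.B.3, copied verbatim.
SAMPLE_ADJ_TEMPLATES = """\
Template_type Freq.
--------------------------------------- --------
[EMPHASIZER] 6
[INTENSIONAL] 2
[OBJECT-RELATED] 2
[PHYS_PROPERTY] 9
[PSYCH_PROPERTY] 10
[RELATION] 2
[SOCIAL_PROPERTY] 1
[TEMPORAL_PROPERTY] 11
Total: 43
"""

counts, row_sum, stated = check_table(SAMPLE_ADJ_TEMPLATES)
print(len(counts), row_sum, stated)  # 8 43 43
```

The same check can be run on any of the "Total:"-terminated tables; a mismatch between the row sum and the stated total would point to a row lost or duplicated when the query output was copied into this report.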
Appendix 3.A.1: Number of Usems per Domain in the complete dataset, for NOUNS
Domain Freq.
--------------------------------------- --------
ACCOUNTING 13
ACOUSTICS 30
ADMINISTRATIVE_LAW 2
ADVERTISING 33
AEROSPACE_ENGINEERING 26
AGRICULTURE 27
AGRICULTURE-FISHING-FORESTRY 6
AIRFORCE 23
AIR_CONDITIONING 2
AIR_TRANSPORT 26
ALCHEMY 2
AMERICAN_FOOTBALL 29
ANATOMY 89
ANESTHESIOLOGY 2
ANGLING 9
ANTIQUITY 11
ARABLE_FARMING 24
ARBORICULTURE 2
ARCHAEOLOGY 24
ARCHERY 4
ARCHITECTURE 40
ARMY 49
ARTS 71
ASTROLOGY 9
ASTRONOMY 34
ATHLETICS 21
AUDIOVISUAL 22
AUTOMATION 1
AUTOMOBILE_ENGINEERING 32
BABY_CARE 17
BACTERIOLOGY 5
BADMINTON 16
BAKERY 19
BALLET 18
BANKING 61
BASEBALL 17
BASKETBALL 25
BASKETRY 1
BEEKEEPING 6
BILLIARDS 13
BIOCHEMISTRY 4
BOOKBINDING 15
BOTANY 98
BOXING 11
BREWING 15
BUDDHISM 5
BUILDING 71
BUILDING_CRAFTS 26
BULLFIGHTING 1
BUSINESS 130
BUS_TRANSPORT 7
BUTCHERY 15
CANON_LAW 1
CARDIOLOGY 8
CARDS 25
CARTOGRAPHY 45
CAR_TRANSPORT 6
CATTLE_FARMING 26
CERAMICS 13
CEREAL_FARMING 10
CHEMISTRY 56
CHESS 29
CHRISTIANITY 42
CHURCH_OF_ENGLAND 1
CIRCUS 12
CITY_PLANNING 36
CIVIL_ENGINEERING 55
CIVIL_LAW 8
CLEANING 8
CLIMBING 5
CLOTHING_INDUSTRY 53
COKING_INDUSTRY 8
COMMERCE 134
COMMERCIAL_LAW 3
COMPUTING 26
CONSTITUTIONAL_LAW 4
CONSTRUCTION 46
COSMETICS 9
CRAFT_INDUSTRY 37
CREATIVE_WRITING 86
CRICKET 19
CRIME 84
CRIMINAL_LAW 14
CROQUET 11
CUISINE 50
CYCLING 28
CYTOLOGY 3
DANCE 38
DEATH 34
DEMOGRAPHY 17
DENTISTRY 6
DERMATOLOGY 8
DIPLOMACY 37
DISTILLING 13
DRINK 29
DRUGS 15
DYEING 3
EAR-NOSE-THROAT 8
EARTH_SCIENCES 12
ECOLOGY 5
ECONOMICS 103
EDUCATION 108
ELECTRICAL_ENGINEERING 34
ELECTRICAL_WORK 19
ELECTRICITY 13
ELECTRONIC_ENGINEERING 26
EMBRYOLOGY 6
EMPLOYMENT 84
ENOLOGY 13
ENTOMOLOGY 16
EQUESTRIAN_SPORT 17
ETHNOLOGY 21
FAMILY_PLANNING 6
FASHION 58
FENCING 3
FEUDALISM 20
FILM 55
FINANCE 109
FIRE 7
FIREFIGHTING 17
FISHING 24
FLOWER_GROWING 10
FOOD 35
FORESTRY 20
FORTIFICATION 6
FRESHWATER_FISHING 4
FRUIT_AND_VEGETABLES 28
FURNISHING 40
FURNITURE 31
GAMES 87
GARDENING 27
GAS 5
GENEALOGY 2
GENERAL 6315
GENETICS 11
GEOGRAPHY 77
GEOLOGY 22
GEOMETRY 36
GEOPOLITICS 5
GLASSMAKING 11
GLAZING 2
GOLF 11
GOVERNMENT-ADMINISTRATION 140
GRAPHIC_ARTS 76
GYMNASTICS 4
HAIR 10
HEALTH 31
HEALTH_AND_MEDICINE 61
HEATING 8
HEMATOLOGY 3
HERALDRY 10
HERPETOLOGY 3
HIGHER_EDUCATION 57
HINDUISM 2
HISTOLOGY 3
HISTORY 130
HOME_AND_GARDEN 17
HOME_LAUNDRY 5
HOROLOGY 8
HORSESHOEING 4
HORSE_RACING 7
HOTEL_BUSINESS 38
HOUSE_PAINTING 5
HUMAN_SCIENCES 7
HUNTING_AND_SHOOTING 20
HYDROGRAPHY 14
HYDROLOGY 19
HYGIENE 10
ICHTHYOLOGY 7
INLAND_WATERWAY_TRANSPORT 42
INSURANCE 18
INTELLIGENCE 2
INTERNATIONAL_AFFAIRS 116
INTERNATIONAL_LAW 8
ISLAM 1
JEWELRY 18
JUDAISM 7
KITCHEN_EQUIPMENT 20
KNITTING 5
LAW 199
LAW_ENFORCEMENT 137
LEISURE 84
LIBRARIANSHIP 14
LIFE_SCIENCES 32
LINGUISTICS 73
LITURGY 2
LIVESTOCK_FARMING 6
LOCKSMITHING 9
LOGIC 4
MAGIC_AND_WITCHCRAFT 12
MAIL 15
MAMMALOGY 40
MANAGEMENT 76
MANUFACTURING_INDUSTRY 68
MARITIME_LAW 2
MARKETING 23
MARRIAGE 31
MARTIAL_ARTS 6
MASONRY 9
MATHEMATICS 57
MECHANICAL_ENGINEERING 30
MEDIA 27
MEDICINE 80
MEETING 168
METALLURGY 11
METEOROLOGY 63
METROLOGY 15
MICROSCOPY 2
MILITARY 176
MILITARY_LAW 3
MINERALOGY 15
MINING-GENERAL 15
MONARCHY 23
MUSIC 168
MYCOLOGY 1
MYTHOLOGY 9
NAVY 44
NEUROANATOMY 1
NEUROLOGY 16
NEWSPAPER_PUBLISHING 58
NUCLEAR_ENGINEERING 4
NUCLEAR_PHYSICS 8
OBSTETRICS 7
OCEANOGRAPHY 18
OFFICE_EQUIPMENT 12
OIL_INDUSTRY 10
ONCOLOGY 3
OPERA 74
OPHTHALMOLOGY-OPTOMETRY 10
OPTICS 4
ORNITHOLOGY 23
ORTODOX_CHURCH 7
PACKAGING 12
PAINTMAKING 5
PALEOBIOLOGY 7
PALMISTRY 2
PAPERHANGING 4
PAPERMAKING 9
PARAPSYCHOLOGY 5
PEDIATRICS 1
PENAL_SYSTEM 40
PERFUMERY 2
PETS 14
PHARMACY 28
PHILATELY 4
PHILOSOPHY 45
PHONETICS 5
PHOTOGRAPHY 39
PHYSICAL_SCIENCES 22
PHYSICS 48
PHYSIOLOGY 22
PIG_FARMING 14
PLASTERING 2
PLUMBING 6
POETICS 6
POLITICS 208
POLITICS_AND_GOVERNMENT 115
POLO 15
POTTERY 12
POULTRY_FARMING 12
PRIMARY_AND_SECONDARY_EDUCATION 38
PRINTING 53
PROTESTANTISM 12
PSYCHIATRY 16
PSYCHOANALYSIS 10
PSYCHOLOGY 272
PUBLISHING 105
PYROTECHNICS 5
QUARRYING 3
RADIO-TELEVISION 94
RAIL_TRANSPORT 30
REAL_ESTATE 59
RELIGION 77
RESTAURATION 75
RETAIL 110
RHETORIC 15
ROAD_TRANSPORT 44
ROMAN_CATHOLICISM 36
ROOFING 6
ROWING 8
RUGBY 31
SAILING_YACHTING_AND_BOATING 39
SCIENCES 73
SCOUTING 6
SCULPTURE 34
SEA_FISHING 15
SEA_TRANSPORT 78
SEISMOLOGY 2
SERVICE_INDUSTRY 27
SEWING 10
SEX 23
SHAVING 2
SHEEP_FARMING 22
SHIP_BUILDING 38
SHOEMAKING 10
SHOWS 56
SKIING 4
SMOKING 3
SOAPMAKING 2
SOCCER 68
SOCIAL_ACTION 48
SOCIAL_SECURITY 29
SOCIOLOGY 133
SPORT 242
SPORTS_AND_LEISURE 26
STATISTICS 5
STEEL_INDUSTRY 9
SUBWAY_TRANSPORT 7
SURFACE_TREATMENT 2
SURFING 6
SURGERY 14
SURVEYING 5
SWIMMING 10
TANNING 2
TAXATION 33
TELECOMMUNICATIONS 33
TENNIS 15
TEXTILES 19
THEATER 81
THEOLOGY 29
TILING 7
TOBACCO_INDUSTRY 5
TOPOGRAPHY 59
TOWN_AND_COUNTRY_PLANNING 62
TRANSPORT 78
TRUCKING 4
TYPOGRAPHY 17
UPHOLSTERING 3
UTILITIES 15
VENERY 7
VERSIFICATION 4
VETERINARY_MEDICINE 8
VIROLOGY 1
VITICULTURE 4
VOLCANOLOGY 1
WASHING 7
WASTE_TREATMENT 15
WATER 8
WATER_SPORT 8
WHEELWRIGHTING 1
WOODWORKING 36
WOOL_INDUSTRY 6
WRESTLING 6
ZOOLOGY 65
Total: 16458
Appendix 3.A.2: Number of Usems per Domain in the complete dataset, for VERBS
Domain Freq.
--------------------------------------- --------
ACCOUNTING 6
ACOUSTICS 11
AEROSPACE_ENGINEERING 2
AGRICULTURE 4
AIRFORCE 1
AIR_TRANSPORT 5
AMERICAN_FOOTBALL 3
ARABLE_FARMING 9
ARCHERY 1
ARCHITECTURE 6
ARMY 2
ARTS 6
ASTROLOGY 1
ASTRONOMY 1
ATHLETICS 2
AUDIOVISUAL 2
AUTOMOBILE_ENGINEERING 3
BABY_CARE 1
BADMINTON 2
BAKERY 3
BALLET 3
BANKING 2
BASEBALL 2
BASKETBALL 3
BOOKBINDING 2
BOTANY 2
BOXING 2
BREWING 3
BUILDING 1
BUILDING_CRAFTS 5
BUSINESS 12
BUTCHERY 1
CARDS 11
CARTOGRAPHY 1
CATTLE_FARMING 3
CEREAL_FARMING 3
CHEMISTRY 8
CHRISTIANITY 2
CIRCUS 3
CIVIL_ENGINEERING 2
CLEANING 4
CLIMBING 3
CLOTHING_INDUSTRY 8
COMMERCE 22
CONSTRUCTION 10
COSMETICS 1
CRAFT_INDUSTRY 2
CREATIVE_WRITING 10
CRICKET 2
CRIME 19
CRIMINAL_LAW 2
CROQUET 2
CUISINE 11
CYCLING 5
DANCE 6
DEATH 15
DENTISTRY 1
DIPLOMACY 2
DISTILLING 3
DRINK 4
DRUGS 4
EAR-NOSE-THROAT 5
EARTH_SCIENCES 2
ECONOMICS 4
EDUCATION 24
ELECTRICAL_ENGINEERING 3
ELECTRICAL_WORK 2
EMBRYOLOGY 2
EMPLOYMENT 15
ENOLOGY 1
ENTOMOLOGY 2
EQUESTRIAN_SPORT 3
ETHNOLOGY 2
FAMILY_PLANNING 2
FASHION 9
FENCING 2
FILM 5
FINANCE 33
FIREFIGHTING 2
FISHING 3
FLOWER_GROWING 1
FOOD 7
FORESTRY 3
FORTIFICATION 1
FRUIT_AND_VEGETABLES 3
GAMES 18
GARDENING 7
GENERAL 2048
GEOGRAPHY 3
GEOMETRY 1
GOLF 2
GOVERNMENT-ADMINISTRATION 5
GRAPHIC_ARTS 13
HAIR 1
HEALTH 6
HEALTH_AND_MEDICINE 11
HIGHER_EDUCATION 2
HISTORY 10
HOME_AND_GARDEN 4
HOME_LAUNDRY 1
HOROLOGY 6
HORSE_RACING 1
HOTEL_BUSINESS 10
HOUSE_PAINTING 1
HUNTING_AND_SHOOTING 10
HYDROGRAPHY 2
HYDROLOGY 4
HYGIENE 1
ICHTHYOLOGY 1
INLAND_WATERWAY_TRANSPORT 10
INSURANCE 2
INTERNATIONAL_AFFAIRS 9
JEWELRY 1
KNITTING 2
LAW 41
LAW_ENFORCEMENT 47
LEISURE 14
LIBRARIANSHIP 1
LIFE_SCIENCES 4
LINGUISTICS 9
LIVESTOCK_FARMING 1
LOCKSMITHING 1
LOGIC 1
MAGIC_AND_WITCHCRAFT 1
MAIL 6
MANAGEMENT 5
MANUFACTURING_INDUSTRY 8
MARKETING 1
MARRIAGE 14
MARTIAL_ARTS 2
MASONRY 1
MATHEMATICS 6
MECHANICAL_ENGINEERING 2
MEDICINE 19
MEETING 32
METALLURGY 1
METEOROLOGY 8
METROLOGY 2
MILITARY 33
MINING-GENERAL 1
MONARCHY 1
MUSIC 32
MYTHOLOGY 1
NAVY 2
NEUROLOGY 2
NEWSPAPER_PUBLISHING 4
OBSTETRICS 4
OCEANOGRAPHY 2
OPERA 8
OPHTHALMOLOGY-OPTOMETRY 2
OPTICS 1
ORNITHOLOGY 4
PALMISTRY 1
PENAL_SYSTEM 4
PETS 4
PHILOSOPHY 2
PHONETICS 1
PHOTOGRAPHY 5
PHYSICAL_SCIENCES 3
PHYSICS 5
PHYSIOLOGY 2
PIG_FARMING 1
PLUMBING 2
POETICS 1
POLITICS 14
POLITICS_AND_GOVERNMENT 2
POLO 1
POULTRY_FARMING 1
PRIMARY_AND_SECONDARY_EDUCATION 2
PRINTING 5
PROTESTANTISM 3
PSYCHOANALYSIS 1
PSYCHOLOGY 71
PUBLISHING 3
QUARRYING 1
RADIO-TELEVISION 10
RAIL_TRANSPORT 2
REAL_ESTATE 2
RELIGION 20
RESTAURATION 12
RETAIL 17
ROAD_TRANSPORT 6
ROMAN_CATHOLICISM 2
ROOFING 1
RUGBY 3
SAILING_YACHTING_AND_BOATING 10
SCIENCES 15
SCULPTURE 8
SEA_FISHING 1
SEA_TRANSPORT 19
SEISMOLOGY 1
SEWING 3
SEX 9
SHAVING 1
SHEEP_FARMING 2
SHIP_BUILDING 7
SHOEMAKING 2
SHOWS 6
SOCCER 10
SOCIAL_ACTION 2
SOCIAL_SECURITY 3
SOCIOLOGY 10
SPORT 39
SPORTS_AND_LEISURE 3
STEEL_INDUSTRY 3
SUBWAY_TRANSPORT 1
SURFACE_TREATMENT 1
SURFING 1
SURVEYING 1
SWIMMING 1
TAXATION 8
TELECOMMUNICATIONS 5
TENNIS 2
TEXTILES 6
THEATER 11
THEOLOGY 2
TILING 1
TOPOGRAPHY 1
TOWN_AND_COUNTRY_PLANNING 2
TRANSPORT 22
TYPOGRAPHY 2
UTILITIES 4
VENERY 1
VERSIFICATION 1
VETERINARY_MEDICINE 2
VITICULTURE 3
WASHING 2
WASTE_TREATMENT 2
WATER_SPORT 1
WOODWORKING 6
WOOL_INDUSTRY 1
ZOOLOGY 8
Total: 3391
Appendix 3.A.3: Number of Usems per Domain in the complete dataset, for ADJECTIVES
Domain Freq.
--------------------------------------- --------
ACOUSTICS 8
AEROSPACE_ENGINEERING 1
AGRICULTURE 2
AIR_TRANSPORT 1
ANATOMY 4
ANTIQUITY 4
ARABLE_FARMING 4
ARCHAEOLOGY 2
ARCHITECTURE 3
ARTS 10
BABY_CARE 1
BAKERY 1
BANKING 1
BOTANY 3
BUILDING 1
BUSINESS 7
BUS_TRANSPORT 1
BUTCHERY 2
CARTOGRAPHY 3
CHEMISTRY 2
CHRISTIANITY 2
CITY_PLANNING 1
CIVIL_ENGINEERING 1
CLEANING 4
CLOTHING_INDUSTRY 10
COKING_INDUSTRY 1
COMMERCE 9
CONSTRUCTION 1
CREATIVE_WRITING 1
CRIME 5
CUISINE 10
DEATH 1
DIPLOMACY 2
DISTILLING 2
DRINK 4
EAR-NOSE-THROAT 1
ECONOMICS 11
EDUCATION 6
ELECTRICITY 1
EMPLOYMENT 4
ENOLOGY 3
ETHNOLOGY 8
FASHION 15
FILM 1
FINANCE 13
FISHING 1
FOOD 4
FRUIT_AND_VEGETABLES 2
FURNITURE 2
GARDENING 1
GENERAL 956
GENETICS 1
GEOGRAPHY 5
GEOLOGY 2
GEOMETRY 6
GEOPOLITICS 1
GOVERNMENT-ADMINISTRATION 1
GRAPHIC_ARTS 14
HEALTH 2
HEALTH_AND_MEDICINE 16
HEATING 4
HERALDRY 1
HIGHER_EDUCATION 3
HISTORY 17
HOME_AND_GARDEN 2
HOME_LAUNDRY 1
HOROLOGY 1
HOUSE_PAINTING 2
HUNTING_AND_SHOOTING 1
HYDROLOGY 2
HYGIENE 1
INLAND_WATERWAY_TRANSPORT 2
INTERNATIONAL_AFFAIRS 4
JEWELRY 5
JUDAISM 1
KNITTING 1
LAW 10
LAW_ENFORCEMENT 7
LEISURE 4
LIFE_SCIENCES 4
LINGUISTICS 9
MANAGEMENT 1
MANUFACTURING_INDUSTRY 1
MATHEMATICS 4
MEDICINE 3
MEETING 11
METALLURGY 3
METEOROLOGY 12
MILITARY 8
MILITARY_LAW 1
MONARCHY 2
MUSIC 10
NUCLEAR_PHYSICS 3
OBSTETRICS 1
OCEANOGRAPHY 1
OPHTHALMOLOGY-OPTOMETRY 3
OPTICS 1
PAINTMAKING 1
PALEOBIOLOGY 1
PHILOSOPHY 5
PHOTOGRAPHY 1
PHYSICAL_SCIENCES 1
PHYSICS 4
PHYSIOLOGY 4
POLITICS 10
POLITICS_AND_GOVERNMENT 22
PSYCHIATRY 3
PSYCHOANALYSIS 3
PSYCHOLOGY 55
PUBLISHING 1
RAIL_TRANSPORT 1
REAL_ESTATE 1
RELIGION 9
RESTAURATION 10
RETAIL 5
ROAD_TRANSPORT 1
SAILING_YACHTING_AND_BOATING 2
SCIENCES 2
SEA_FISHING 1
SEA_TRANSPORT 3
SEWING 1
SEX 5
SHAVING 1
SHOEMAKING 1
SOAPMAKING 1
SOCIAL_ACTION 2
SOCIAL_SECURITY 1
SOCIOLOGY 25
SPORT 4
STATISTICS 2
SURVEYING 1
TANNING 1
TEXTILES 4
THEOLOGY 1
TOPOGRAPHY 2
TOWN_AND_COUNTRY_PLANNING 4
TRANSPORT 7
UTILITIES 2
VIROLOGY 1
VITICULTURE 2
VOLCANOLOGY 1
WASHING 1
WATER 1
WOODWORKING 1
ZOOLOGY 6
Total: 1562
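The Domain totals (16458 for nouns, 3391 for verbs, 1562 for adjectives) are larger than the Usem counts given in section 1.1 (7326, 2114 and 1032), which suggests that a single Usem can carry more than one Domain feature. Assuming each counted row is one Domain feature on one Usem, the average number of Domain features per Usem can be sketched as follows; `domains_per_usem` is a hypothetical name used only for illustration.

```python
from __future__ import annotations

# Stated totals of the Domain tables (Appendices 3.A.1-3.A.3).
DOMAIN_TOTALS = {"noun": 16458, "verb": 3391, "adjective": 1562}
# Usem counts per part of speech, as given in section 1.1.
USEMS = {"noun": 7326, "verb": 2114, "adjective": 1032}


def domains_per_usem() -> dict[str, float]:
    """Average number of Domain features per Usem, per part of speech.

    Assumes each row counted in the Domain tables is one Domain
    feature assigned to one Usem.
    """
    return {pos: DOMAIN_TOTALS[pos] / USEMS[pos] for pos in USEMS}


for pos, ratio in domains_per_usem().items():
    print(f"{pos}: {ratio:.2f}")
# noun: 2.25
# verb: 1.60
# adjective: 1.51
```

Under this assumption, noun Usems carry on average more Domain features than verb or adjective Usems.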
Appendix 3.B.1: Number of Usems per Domain in the sample of 100 entries, for NOUNS
Domain Freq.
--------------------------------------- --------
ACOUSTICS 1
AGRICULTURE 2
AIR_TRANSPORT 1
AMERICAN_FOOTBALL 1
ANATOMY 6
ANGLING 1
ANTIQUITY 1
ARCHAEOLOGY 1
ARCHITECTURE 1
ARTS 1
ASTRONOMY 3
BABY_CARE 1
BADMINTON 2
BAKERY 1
BANKING 4
BASEBALL 1
BASKETBALL 1
BOOKBINDING 3
BOTANY 2
BOXING 1
BUILDING_CRAFTS 1
BUSINESS 9
CARDS 2
CATTLE_FARMING 1
CHEMISTRY 2
CHRISTIANITY 2
CIVIL_LAW 1
CLEANING 1
COMMERCE 6
COMPUTING 2
CONSTRUCTION 5
COSMETICS 2
CRAFT_INDUSTRY 7
CREATIVE_WRITING 3
CRICKET 1
CRIME 4
CROQUET 1
CUISINE 4
DEATH 1
DEMOGRAPHY 1
DISTILLING 1
DRINK 2
EARTH_SCIENCES 1
ECONOMICS 5
EDUCATION 2
EMPLOYMENT 2
ENOLOGY 1
ENTOMOLOGY 2
ETHNOLOGY 1
FAMILY_PLANNING 1
FASHION 1
FEUDALISM 1
FINANCE 6
FIREFIGHTING 1
FISHING 2
FOOD 3
FORESTRY 1
GAMES 1
GENERAL 244
GEOGRAPHY 9
GEOLOGY 1
GEOPOLITICS 1
GOLF 1
GOVERNMENT-ADMINISTRATION 7
GRAPHIC_ARTS 2
HEALTH_AND_MEDICINE 1
HERPETOLOGY 1
HISTORY 3
HOME_AND_GARDEN 1
HOME_LAUNDRY 2
HOROLOGY 1
HUMAN_SCIENCES 2
HYDROGRAPHY 1
HYDROLOGY 1
ICHTHYOLOGY 1
INLAND_WATERWAY_TRANSPORT 2
INTERNATIONAL_AFFAIRS 2
JEWELRY 1
LAW 4
LAW_ENFORCEMENT 4
LEISURE 2
LIFE_SCIENCES 3
LINGUISTICS 4
LOGIC 1
MAMMALOGY 2
MANAGEMENT 3
MANUFACTURING_INDUSTRY 8
MARRIAGE 3
MATHEMATICS 6
MEETING 9
METEOROLOGY 1
MILITARY 2
MONARCHY 1
MUSIC 2
NEWSPAPER_PUBLISHING 2
OPERA 1
ORNITHOLOGY 1
PAPERMAKING 1
PEDIATRICS 1
PERFUMERY 1
PETS 2
PHILOSOPHY 2
PHOTOGRAPHY 2
PHYSICAL_SCIENCES 1
PHYSIOLOGY 1
POLITICS 8
POLITICS_AND_GOVERNMENT 11
POLO 1
PRIMARY_AND_SECONDARY_EDUCATION 1
PRINTING 5
PROTESTANTISM 1
PSYCHOANALYSIS 1
PSYCHOLOGY 7
PUBLISHING 3
RADIO-TELEVISION 1
REAL_ESTATE 1
RELIGION 1
RESTAURATION 2
RETAIL 6
ROAD_TRANSPORT 2
RUGBY 1
SAILING_YACHTING_AND_BOATING 1
SCIENCES 2
SEA_TRANSPORT 3
SERVICE_INDUSTRY 4
SEX 1
SHAVING 2
SHOWS 1
SOCCER 2
SOCIOLOGY 7
SPORT 9
SPORTS_AND_LEISURE 1
STATISTICS 1
TENNIS 2
THEATER 1
THEOLOGY 1
TILING 1
TOPOGRAPHY 2
TOWN_AND_COUNTRY_PLANNING 1
TRANSPORT 5
UTILITIES 1
WASHING 3
WATER 1
WATER_SPORT 1
WOODWORKING 1
WRESTLING 1
Total: 586
Appendix 3.B.2: Number of Usems per Domain in the sample of 100 entries, for VERBS
Domain Freq.
--------------------------------------- --------
ARTS 2
BREWING 1
BUILDING_CRAFTS 1
CIRCUS 1
COMMERCE 3
CONSTRUCTION 1
CRIME 2
CUISINE 1
DRINK 1
DRUGS 1
EARTH_SCIENCES 1
ELECTRICAL_ENGINEERING 1
ELECTRICAL_WORK 1
EMPLOYMENT 3
ETHNOLOGY 2
FAMILY_PLANNING 1
FASHION 1
FINANCE 2
GARDENING 1
GENERAL 98
GOVERNMENT-ADMINISTRATION 2
HEALTH 1
HEALTH_AND_MEDICINE 3
HOROLOGY 1
HOTEL_BUSINESS 2
INLAND_WATERWAY_TRANSPORT 1
INTERNATIONAL_AFFAIRS 1
LAW_ENFORCEMENT 4
LEISURE 2
LIFE_SCIENCES 2
MANAGEMENT 1
MEETING 1
METROLOGY 1
MILITARY 2
PHILOSOPHY 1
PHYSICAL_SCIENCES 1
POLITICS 2
PSYCHOLOGY 4
RELIGION 1
RESTAURATION 2
RETAIL 1
SCIENCES 3
SEA_TRANSPORT 2
SHOWS 1
SOCCER 1
SOCIAL_ACTION 1
SOCIOLOGY 2
SPORT 3
TAXATION 3
THEOLOGY 1
TRANSPORT 1
WOODWORKING 1
Total: 180
Appendix 3.B.3: Number of Usems per Domain in the sample of 100 entries, for ADJECTIVES
Domain Freq.
--------------------------------------- --------
ARABLE_FARMING 1
ARTS 1
CARTOGRAPHY 1
CLOTHING_INDUSTRY 1
CUISINE 1
GENERAL 41
GEOMETRY 1
GRAPHIC_ARTS 1
JEWELRY 1
LEISURE 1
LINGUISTICS 1
SEA_FISHING 1
SURVEYING 1
TOPOGRAPHY 1
TRANSPORT 2
Total: 56
Appendix 4.A.1: Number of Usems per Semantic Class in the complete dataset, for NOUNS
Semantic class Freq.
-------------------------------------------------------------------------
ABSTRACT 660
ACT 463
ACTIVITY 2
ADMINISTRATIVE 37
AFFECTION 13
AGENCY 202
AMOUNT 157
ANIMAL 10
ARTIFACT 439
ARTIFACT, EDIBLE 39
ATTRIBUTE 249
BIO 77
BIRD 13
BODY_PART 111
BUILDING 105
CHANGE 223
COGNITION 168
COGNITIVE_FACT 87
COLLECTIVE 350
COLOR 3
COMMUNICATION 177
COMPETITION 1
CONSUMPTION 2
CONTAINER 2
CREATION 42
CURRENCY 18
DAY 18
EMOTION 93
ETHNOS 8
EVENT 72
FEELING 1
FISH 5
FLOWER 4
FOOD 2
FORM 41
FRUIT, EDIBLE 7
FURNITURE 19
GARMENT 24
GEOGRAPHY 40
GROUP 24
HUMAN 293
HUMAN_COLLECTIVE 2
IDEO 17
ILLNESS 14
INANIMATE 10
INDIVIDUAL_NAMES 3
INSECT 6
INSTRUMENT 100
LETTER 12
LIVING_BEING 7
LOCATION 401
MAMMAL 30
MATTER 45
MEASURE_UNIT 80
MICROORGANISM 2
MONTH 14
MOTION 115
MOVE 1
MUSIC 1
NOTION 329
OBJECT 25
OCCUPATION 84
OCCUPATION_AGENT 355
OPERATION 2
PART 404
PERCEPTION 18
PERIOD 21
PHENOMENON 135
PLANT 18
POSSESSION 55
PROCESS 5
PSYCHOLOGICAL_FEATURE 50
QUANTITY 1
RELATION 1
SHRUB 6
SITU 26
STATE 32
STATIVE 331
SUBSTANCE 68
SUBSTANCE, EDIBLE 20
SYSTEM_OF_THOUGHT 13
TIME 4
TIME_PERIOD 89
TOPS 4
TREE 9
VEHICLE 52
WEATHER 8
Total: 7326
Appendix 4.A.2: Number of Usems per Semantic Class in the complete dataset, for VERBS
Semantic class Freq.
-------------------------------------------------------------------------
BODY 19
CHANGE 417
COGNITION 210
COMMUNICATION 202
COMPETITION 35
CONSUMPTION 17
CONTACT 112
CREATION 102
EMOTION 64
ILLNESS 1
MOTION 202
MOVE 1
NOTION 2
PERCEPTION 54
PHENOMENON 25
POSSESSION 89
SOCIAL 172
STATE 12
STATIVE 378
Total: 2114
Appendix 4.A.3: Number of Usems per Semantic Class in the complete dataset, for ADJECTIVES
Semantic class Freq.
-------------------------------------------
ABSTRACT 23
AGENCY 1
AMOUNT 10
ATTRIBUTE (TAKEN FROM NOUNS) 516
COGNITION (TAKEN FROM VERBS) 11
COGNITIVE_FACT 3
COLOUR 13
DIRECTION 2
EMOTION (TAKEN FROM VERBS) 2
ENTITY 6
EVENT 5
FACULTY 3
NUMBER 5
OCCUPATION 2
OPERATION 1
PERIOD 34
PROCESS 2
PSYCHOLOGICAL_FEATURE 345
STATE 25
SUBSTANCE 1
SYSTEM_OF_THOUGHT 2
TIME_PERIOD 20
Total: 1032
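Unlike the Domain tables, the semantic-class tables have stated totals equal to the Usem counts given in section 1.1 (7326 noun, 2114 verb and 1032 adjective Usems, 10,472 in all), which is consistent with each Usem being assigned a single semantic class. A minimal Python sketch of that cross-check for the adjective table follows; the names `ADJECTIVE_CLASSES`, `USEMS` and `class_total` are hypothetical and introduced only for illustration.

```python
from __future__ import annotations

# Usems per semantic class for adjectives, copied from Appendix 4.A.3.
ADJECTIVE_CLASSES = {
    "ABSTRACT": 23, "AGENCY": 1, "AMOUNT": 10,
    "ATTRIBUTE (TAKEN FROM NOUNS)": 516, "COGNITION (TAKEN FROM VERBS)": 11,
    "COGNITIVE_FACT": 3, "COLOUR": 13, "DIRECTION": 2,
    "EMOTION (TAKEN FROM VERBS)": 2, "ENTITY": 6, "EVENT": 5, "FACULTY": 3,
    "NUMBER": 5, "OCCUPATION": 2, "OPERATION": 1, "PERIOD": 34, "PROCESS": 2,
    "PSYCHOLOGICAL_FEATURE": 345, "STATE": 25, "SUBSTANCE": 1,
    "SYSTEM_OF_THOUGHT": 2, "TIME_PERIOD": 20,
}

# Usem counts per part of speech, as given in section 1.1.
USEMS = {"noun": 7326, "verb": 2114, "adjective": 1032}


def class_total(table: dict[str, int]) -> int:
    """Sum the Usem counts over all semantic classes in one table."""
    return sum(table.values())


print(class_total(ADJECTIVE_CLASSES) == USEMS["adjective"])  # True
print(sum(USEMS.values()))  # 10472
```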
Appendix 4.B.1: Number of Usems per Semantic Class in the sample of 100 entries, for NOUNS
Semantic class Freq.
-------------------------------------------------------------------------
ABSTRACT 26
ACT 11
ACTIVITY 1
AGENCY 12
AMOUNT 8
ARTIFACT 15
ATTRIBUTE 8
BIO 11
BIRD 1
BODY 8
BUILDING 3
CHANGE 4
COGNITION 1
COGNITIVE_FACT 2
COLLECTIVE 17
CURRENCY 2
DAY 5
EMOTION 2
EVENT 4
GEOGRAPHY 5
HUMAN 13
INDIVIDUAL_NAMES 1
INSTRUMENT 1
LETTER 2
LOCATION 14
MATTER 1
MEASURE_UNIT 13
MONTH 2
MOTION 1
NOTION 12
OBJECT 1
OCCUPATION 2
OCCUPATION_AGENT 6
PART 13
PERIOD 2
PHENOMENON 4
POSSESSION 1
PSYCHOLOGICAL_FEATURE 1
STATE 2
STATIVE 7
SUBSTANCE 3
SUBSTANCE, EDIBLE 1
TIME_PERIOD 13
TREE 1
Total: 263
Appendix 4.B.2: Number of Usems per Semantic Class in the sample of 100 entries, for VERBS
Semantic class Freq.
-------------------------------------------------------------------------
BODY 1
CHANGE 20
COGNITION 17
COMMUNICATION 13
COMPETITION 2
CONSUMPTION 5
CONTACT 4
CREATION 1
EMOTION 1
MOTION 4
PERCEPTION 2
POSSESSION 7
SOCIAL 6
STATE 3
STATIVE 13
Total: 99
Appendix 4.B.3: Number of Usems per Semantic Class in the sample of 100 entries, for ADJECTIVES
Semantic class Freq.
-------------------------------------------------------------------------
ATTRIBUTE (TAKEN FROM NOUNS) 24
PERIOD 4
PSYCHOLOGICAL_FEATURE 10
TIME_PERIOD 5
Total: 43