SIMPLE LE4-8346

WP03.9

 

 

 

SIMPLE - LEXICON DOCUMENTATION FOR ITALIAN

 

 

* * *

Document first version date: 01/05/00
Document date: 01/05/00
Document ID: Deliverable D3.9.2, WP03.9
Version: 01
Doc. type: QAP*
Document status: to be validated
Validation type:
Comments:

        Name                      Organisation   Purpose
From    Nilda Ruimy               ILC-Pisa       Documentation
        Cristina Del Fiorentino
        Monica Monachini
        Marisa Ulivieri
To      TM                                       Documentation

 

 

 

 

 

 

 

 

 

 

1. General design information

 

The SIMPLE Semantic Lexicon has been developed in the framework of the SIMPLE project, which started in April 1998 and ran for twenty-four months. The project aimed at adding semantic information on top of the PAROLE morphological and syntactic lexica, which cover 12 languages. Across the three description levels, these lexica share a common model, linguistic specifications, DTD and exchange format. PAROLE and SIMPLE language resources are general, all-purpose NLP lexica.

 

1.1. Lexicon population

 

1.1.1. WordNet 1.5 Base Concepts

 

The SIMPLE Italian Lexicon consists of 10,105 word senses encoded at semantic level and released in SGML format. These semantic units (hereafter SemUs) are distributed among the categories of nouns (7,063), verbs (2,032) and adjectives (1,010). Each of them is linked to its corresponding morphological and syntactic units (hereafter SynUs) in the PAROLE Lexicon.

The starting point for the selection of the core SIMPLE Lexicon population was a set of WordNet 1.5 Base Concepts (hereafter BCs) (500 nouns, 200 verbs and 185 adjectives), ranked on the basis of the frequency parameter.

Italian lexical items corresponding to base concepts were looked up in the EuroWordNet database, in which the WN1.5 base concepts (which serve as Inter-Lingua-Index for the project) were already linked to local synsets. The Italian synsets linked to the selected set of WN1.5 base concepts through the two relations ‘Eq_Synonym’ and ‘Eq_Near_Synonym’ were automatically extracted. By contrast, the base concepts for which no immediate Italian equivalent was identified in EuroWordNet (those which had a different type of link to BCs, i.e. hyperonymy, hyponymy or meronymy) were disregarded. For nouns, for example, these amount to 19 items. These concepts are nevertheless indirectly represented in our lexicon through their link to hyperonyms that are present in the list of candidate lexical units to encode.
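The extraction step can be pictured with a minimal sketch. It assumes that each EuroWordNet link is available as a (base concept, relation, Italian synset) record; the record layout, the identifiers, the sample synsets and the label "hyperonymy" are illustrative assumptions, and only the two relation names come from the text.

```python
# Relation names taken from the text; everything else below is hypothetical.
EQUIVALENCE_RELATIONS = {"Eq_Synonym", "Eq_Near_Synonym"}

# Hypothetical link records: (base_concept_id, relation, italian_synset).
links = [
    ("bc_001", "Eq_Synonym", ["luogo"]),
    ("bc_002", "Eq_Near_Synonym", ["costruire", "fabbricare"]),
    ("bc_003", "hyperonymy", ["attrezzo"]),  # no immediate Italian equivalent
]


def select_equivalents(links):
    """Split links into those kept (equivalence relations) and those disregarded."""
    kept, disregarded = [], []
    for base_concept, relation, synset in links:
        if relation in EQUIVALENCE_RELATIONS:
            kept.append((base_concept, synset))
        else:
            disregarded.append((base_concept, synset))
    return kept, disregarded


kept, disregarded = select_equivalents(links)
print(len(kept), "base concepts kept;", len(disregarded), "disregarded")
```

The disregarded list is kept rather than thrown away so that such concepts can still be checked against the hyperonyms present in the candidate list.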

The translated BCs were then checked against the encoded SynUs of the PAROLE Lexicon. This was a crucial step in order to ensure the linking of SIMPLE SemUs to the corresponding items of the PAROLE morphological and syntactic lexica.

Table 1 below summarizes the results.

 

 

 

 

            BCs   Italian SemUs   Italian Lemmas   SIMPLE / PAROLE Intersection   Missing in PAROLE lexicon
NOUN        500   764             656              579                            10 simple units, 67 MWUs
VERB        200   472             346              306                            31 simple units, 9 MWUs
ADJECTIVE   185   270             191              174                            17 simple units

Table 1.
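As a quick consistency check on Table 1, the sketch below verifies that, for each category, the lemmas found in PAROLE plus the lemmas missing from it add up to the number of Italian lemmas. Reading the columns this way is our assumption; the figures themselves are copied from the table.

```python
# Figures copied from Table 1; the dictionary layout is only illustrative.
table1 = {
    "NOUN": {"lemmas": 656, "in_parole": 579, "missing": [10, 67]},
    "VERB": {"lemmas": 346, "in_parole": 306, "missing": [31, 9]},
    "ADJECTIVE": {"lemmas": 191, "in_parole": 174, "missing": [17]},
}


def row_is_consistent(row):
    """True if intersection + missing units equals the number of lemmas."""
    return row["in_parole"] + sum(row["missing"]) == row["lemmas"]


checks = {pos: row_is_consistent(row) for pos, row in table1.items()}
print(checks)
```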

 

It should be noted that, for those base concepts translated in EWN by means of Italian MWUs, a synonymous single-word lemma generally exists in the PAROLE syntactic lexicon. As for single-word lemmas missing in PAROLE, most of them were subsequently encoded at the morphological and syntactic levels so that they could become candidates for semantic encoding.

 

Besides this first core of SemUs, which is common to all partners of the SIMPLE project and ensures, to a certain extent, uniform coverage across languages as well as the possibility of comparing and assessing data, the semantic lexicon population consists of a subset of entries of the Italian PAROLE lexicon.

 

A 10,000-entry lexical database is relatively small. Nonetheless, we aimed as far as possible at a closure of the lexicon by coding most of the single-word senses used as target SemUs of the relations filling the Qualia roles. However, some relations are still open, since they point to dummy entries.

In the SIMPLE lexicon, dummy elements are of three types:

 

As for the Italian lexicon, the problem of dummy entries will be overcome soon since an extension of PAROLE and SIMPLE lexica is foreseen in the Italian National Project CLIPS that aims at building a large Italian lexical database.
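The notion of closure discussed above can be illustrated with a minimal sketch: given the set of encoded SemUs and the relations they carry, the open relations are those whose target is not itself an encoded entry. The sample entries and relation triples are invented for illustration and are not taken from the lexicon.

```python
# Hypothetical encoded SemUs (identified here simply by their lemma).
encoded = {"martello", "attrezzo", "fabbricare"}

# Hypothetical relations: (source SemU, relation, target SemU).
relations = [
    ("martello", "isa", "attrezzo"),
    ("martello", "created_by", "fabbricare"),
    ("martello", "used_for", "martellare"),  # target not encoded: a dummy
]


def dummy_targets(encoded, relations):
    """Return the targets that are pointed to but not encoded, sorted by name."""
    return sorted({target for _, _, target in relations if target not in encoded})


print(dummy_targets(encoded, relations))
```

A lexicon is closed in this sense when the function returns an empty list.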

 

 

1.1.2. PAROLE’s Entries

 

The set of words to be encoded at semantic level was selected among the PAROLE lexicon entries, according to their frequency in the PAROLE corpus. This selection was performed in the following way.

Using an automatic procedure, PAROLE non frame-bearing lexical entries were extracted on the basis of key words appearing in their definitions. Those key words correspond to prototypical target SemUs of the formal quale in SIMPLE template types. For example, entries whose definition had ramo, dominio, branca, disciplina (field, branch, discipline) as genus term were candidate SemUs for the template Domain; malattia, affezione (disease, affection), for the template Disease, etc.
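The key-word procedure described above might be sketched as follows. The mapping from genus terms to templates reproduces the examples given in the text; treating the first word of a definition as its genus term is a simplifying assumption of this sketch, and the second sample definition is invented.

```python
# Genus terms and templates as listed in the text.
GENUS_TO_TEMPLATE = {
    "ramo": "Domain",
    "dominio": "Domain",
    "branca": "Domain",
    "disciplina": "Domain",
    "malattia": "Disease",
    "affezione": "Disease",
}


def candidate_template(definition):
    """Suggest a template from the genus term, assumed to be the first word."""
    words = definition.lower().split()
    if not words:
        return None
    genus = words[0].strip(",.;:")
    return GENUS_TO_TEMPLATE.get(genus)


print(candidate_template("branca della medicina che studia le malattie del bambino"))
print(candidate_template("Malattia infiammatoria della congiuntiva"))
print(candidate_template("Mobile su cui ci si siede"))
```

Entries for which no template is suggested (the third call returns `None`) would simply not be proposed as candidates by this step.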

Argument-bearing words were retrieved in two different ways: in the ILC-DMI, according to key words occurring in their definitions; in the PAROLE lexicon (where frame-bearing entries are assigned no definition but rather an example of use), according to their syntactic descriptions. For quality-denoting words, for example, we searched for the description identifier of mass nouns subcategorizing for an optional 'of_pp', while for feeling-denoting nouns we looked for identifiers of deverbal nouns subcategorizing for an optional 'of_pp' (corresponding to the verb subject) and an optional 'for_pp'. Within the redundant set of entries obtained, identifying the quality- or feeling-denoting nouns was then quite easy.

 

In an effort towards completeness of entries, we decided to encode the main readings of each lexical unit selected. For each word of the coding list, readings were therefore distinguished on the basis of meaning differences, and their syntactic descriptions were studied in order to establish the appropriate link between syntactic and semantic units. As a general rule, we tried to deal with all members of derivation paradigms, e.g.: aggredire, aggressore, aggressione (to mug, mugger, mugging), provided they were encoded at syntactic level.

The 10,037 senses encoded correspond to 7,285 lemmas, which means an average of 1.38 reading distinctions per lemma.

 

Focusing on the PAROLE lexicon from a semantic perspective has sometimes led us to revise it, whether to insert BC entries, to cancel, add or adjust syntactic descriptions, or to make uniform the syntactic descriptions of words whose clustering in a semantic class revealed similar behaviour.

 

 

 

1.2. Dealing with the Ontology

 

The SIMPLE Semantic Type System consists of a Core Ontology, whose use is mandatory, and a Recommended Ontology, which is optional.

The Core Ontology consists of the upper, general types of the hierarchy, which meet a large consensus across languages and provide the most essential information for describing word senses. The Recommended Ontology consists of the lower, specific types of the hierarchy, which provide more granular information about word meaning.

 

For encoding the Italian lexicon, we chose to use the whole Ontology, which consists of a set of 153 semantic types (Appendix A), and tried to balance the population of each type. Predictably, meanings fitting some semantic types were easier to retrieve in the PAROLE lexicon than others. For example, there was no problem identifying nouns to fill the Animal, Human, or Instrument templates, and indeed only the most frequent ones were selected. On the other hand, top templates such as Telic (used to encode very underspecified non-concrete nouns which only convey a Telic dimension, e.g.: scopo, 'goal') or Entity, Concrete_Entity, Representation, Act or Change, which subsume more specific types, were used for the coding of a restricted set of word senses. We endeavoured in fact to use the most specific type whenever possible, since this allowed us to provide more granular information.

 

Since lexicographers' subjectivity is a reality that cannot be disregarded in a lexicon building process, each lexicographer was assigned a given portion of the Ontology, so that word senses belonging to a particular semantic area were described according to the same approach and interpretation. Moreover, systematically coding sets of word senses belonging to the same semantic type helps to keep the coding consistent. In this way, synonyms and near synonyms are encoded in a very similar way and target SemUs are used more consistently. For example, in order to avoid a proliferation of synonyms, the verb fabbricare (to build) was chosen as the prototypical target SemU in the Agentive relation 'created_by' for all instrument-denoting nouns.

 

The coding of a selected area of the lexicon, say Living entities, Artifacts, Properties, or Change-denoting senses, always started with the description of word meanings belonging to top types. This gave us at once, as existing entries, the more generic words that would then be used as targets of the Formal relation throughout the template population. In this way, we avoided an undesirable creation of dummy words. Similarly, when dealing with adjectives, where the formal quale is expressed through an antonymic relation, antonym adjectives were encoded one after the other in order to immediately turn dummy links into real ones.

 

 

2. Semantic Encoding

 

In the Italian SIMPLE lexicon, entries were coded using the encoding tool supplied by the Catalan site. This tool, whose first version was delivered in January 1999, is an interface for encoding, checking and browsing the SIMPLE data, which are stored in a relational database (MSAccess). It was created by Marta Villegas and Teresa Sadurní and developed by Teresa Sadurní at the Institut d'Estudis Catalans (IEC), in Barcelona. The tool proved most valuable for the encoding process: its many options for inserting entries and for copying and maintaining existing data made it possible to code entries accurately and quickly and to check the consistency of the data continuously.

 

In the following, the different steps of the encoding of a semantic entry will be explained and illustrated by means of the relevant SGML objects.

 

For a deeper insight into the SIMPLE model and any related theoretical issue, the reader is referred to the SIMPLE Linguistic Specifications. Let us only say here that, in SIMPLE lexical entries, word meaning is described by means of two descriptive objects: valued features and relations.

Valued features are expressed through the SGML objects 'WeightValSemFeature' and Relations through 'RWeightValSemU' objects. The weight of these objects may be either 'Prototypical', for type defining information, i.e. the information that intrinsically characterizes a semantic type, or 'Essential' for optional information, i.e. the one that is not crucial to define a semantic type but rather provides information about lexical units.
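To make the two descriptive objects concrete, here is a minimal sketch of how they might be modelled in a program. The class names mirror the SGML object names; the field names, the `by_weight` helper and the spelling "PROTOTYPICAL" for the weight value are assumptions of this sketch (only "ESSENTIAL" appears in the SGML fragments of this document). The sample relation is the antonymy link of rosso quoted in the section on the Qualia Structure.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Weight(Enum):
    PROTOTYPICAL = "PROTOTYPICAL"  # type-defining information (spelling assumed)
    ESSENTIAL = "ESSENTIAL"        # optional information about the lexical unit


@dataclass
class WeightValSemFeature:
    feature: str   # feature name (hypothetical field name)
    value: str
    weight: Weight


@dataclass
class RWeightValSemU:
    semr: str      # relation name
    target: str    # identifier of the target SemU
    weight: Weight


@dataclass
class SemU:
    id: str
    naming: str
    features: List[WeightValSemFeature] = field(default_factory=list)
    relations: List[RWeightValSemU] = field(default_factory=list)

    def by_weight(self, weight: Weight) -> list:
        """Return every feature and relation carrying the given weight."""
        return [obj for obj in self.features + self.relations if obj.weight is weight]


rosso = SemU(
    id="USem61898",
    naming="rosso",
    relations=[RWeightValSemU("SRAntonymMult", "USemD2657", Weight.ESSENTIAL)],
)
optional_info = rosso.by_weight(Weight.ESSENTIAL)
type_defining = rosso.by_weight(Weight.PROTOTYPICAL)
print(len(optional_info), len(type_defining))
```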

In the Italian lexicon, all entries are described by means of both descriptive objects. A total set of 1001 features (Appendix B) and 96 relations (Appendix C) were used.

 

 

2.1. Linking a Semantic Unit to its Syntactic Correspondent

 

Every semantic entry is linked to (at least) one syntactic entry. This linking is formally expressed in the Correspondence object ‘CorrespSynUSemU’, which relates the syntactic and semantic layers. The correspondence object is embedded in the representation of the SynU. The syntax-semantic links may be of different types:

<SynU id="SYNU_XXX_N">
    <CorrespSynUSemU targetsemu="USem59592">
</SynU>

 

<SynU id="SYNU_libro_N"
      example="Insieme di fogli che contengono un testo stampato o manoscritto, rilegati e provvisti di copertina"
      description="n-0-x_c">
    <CorrespSynUSemU targetsemu="USem4046"> <!-- Semiotic_artifact reading -->
    <CorrespSynUSemU targetsemu="USem4047"> <!-- Information reading -->
</SynU>

 

<SynU id="SYNU_adatto_A_2"
      example="bagaglio adatto per viaggiare"
      description="a-infper-x_pred_post_g">
    <CorrespSynUSemU targetsemu="USemD6679"
                     correspondence="RED2to1P1Arg0">
</SynU>

<SynU id="SYNU_adatto_A_3"
      example="una persona adatta a fare qlco"
      description="a-np-infa[s]-x_pred_post_g">
    <CorrespSynUSemU targetsemu="USemD6679"
                     correspondence="ISObivalent">
</SynU>

<SynU id="SYNU_adatto_A_4"
      example="adatto alla situazione"
      description="a-ppa-x_pred_post_g">
    <CorrespSynUSemU targetsemu="USemD6679"
                     correspondence="RED2to1P1Arg0">
</SynU>

<SynU id="SYNU_adatto_A_5"
      example="adatto per quel lavoro"
      description="a-ppper-x_pred_post_g">
    <CorrespSynUSemU targetsemu="USemD6679"
                     correspondence="RED2to1P1Arg0">
</SynU>
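The SGML fragments above show links in both directions: one SynU pointing to several SemUs (libro) and several SynUs pointing to one SemU (adatto). A minimal sketch of how such links could be indexed both ways is given below; the identifiers and correspondence values are copied from the fragments, while the tuple layout and the function name are illustrative assumptions.

```python
from collections import defaultdict

# (SynU id, target SemU id, correspondence value or None), copied from the fragments.
links = [
    ("SYNU_libro_N", "USem4046", None),
    ("SYNU_libro_N", "USem4047", None),
    ("SYNU_adatto_A_2", "USemD6679", "RED2to1P1Arg0"),
    ("SYNU_adatto_A_3", "USemD6679", "ISObivalent"),
    ("SYNU_adatto_A_4", "USemD6679", "RED2to1P1Arg0"),
    ("SYNU_adatto_A_5", "USemD6679", "RED2to1P1Arg0"),
]


def index_links(links):
    """Build SynU -> SemUs and SemU -> SynUs lookup tables."""
    by_synu, by_semu = defaultdict(list), defaultdict(list)
    for synu, semu, _correspondence in links:
        by_synu[synu].append(semu)
        by_semu[semu].append(synu)
    return dict(by_synu), dict(by_semu)


by_synu, by_semu = index_links(links)
print(by_synu["SYNU_libro_N"])     # one syntactic unit, two readings
print(len(by_semu["USemD6679"]))   # one reading, several syntactic frames
```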

 

 

2.2. Linking Semantic Arguments to Syntactic Positions

 

Besides the linking of entries of the two description levels, semantic arguments of predicative entries are linked to syntactic positions through the values of the feature 'Correspondence'. Note, however, that some positions may have no corresponding arguments and some arguments may not be linked to any syntactic position. In the Italian lexicon, the ‘Correspondence’ feature is assigned to all verbs, to predicative nouns (either deverbal or simple ones) and to frame-bearing adjectives.

The types of correspondence values that were used in the Italian lexicon are:

<SynU id="SYNU_costruzione_N_2"
      example="la costruzione dell'edificio da parte della ditta"
      description="nv-ppdi-ppdapartedi)-x_m">
    <CorrespSynUSemU targetsemu="USem4173"
                     correspondence="CROSSEDbivalent">
</SynU>

<SynU id="SYNU_sciare_V"
      example="Piero va a sciare"
      description="i-xa">
    <CorrespSynUSemU targetsemu="USem6784"
                     correspondence="AUG1to2">
</SynU>

 

 

2.3. Gloss

 

A lexicographic gloss, inspired by the definitions of medium-sized Italian dictionaries, was assigned to each semantic entry. For most entries, an example of use was provided as well.

 

 

2.4. Feature Assignment

 

2.4.1. Template Type Assignment

 

In the Italian lexicon, template type assignment was performed taking into account not only the Core but also the Recommended SIMPLE Ontology, which provides a more granular structuring of information (Appendix A).

 

Template type assignment was decided only after the distinction of readings was established. A template type is assigned according to the semantic type a word sense belongs to. Templates consist in fact of a cluster of structured information, which includes the semantic type. Since templates are organized in a hierarchical structure, template type assignment amounts to scanning a selected area of the hierarchy in order to choose and instantiate the template which provides the most adequate kind and amount of information necessary both to define the semantics of a given word sense and to discriminate it from the other possible senses of the same lexical item.

 

Following the Generative Lexicon Theory, the SIMPLE model is based on the assumption that lexical units differ as to the degree of complexity their semantics conveys. The Generative Lexicon makes it possible to represent lemmas of heterogeneous complexity in a uniform way. As a matter of fact, some word senses can be exhaustively characterized in terms of a monodimensional, taxonomic relation to other lexical units. This is the case of words such as luogo (location), defined as a type of concrete entity; uccello (bird), a type of animal; virus (virus), a type of living entity; udito (hearing), a type of physical property; disciplina (discipline), a type of abstract entity; fenomeno (phenomenon) or cerimonia (ceremony), a type of event, etc. These word meanings are assigned simple types.

On the other hand, word senses denoting a more complex bundle of information, whose meaning consists of orthogonal dimensions and cannot be captured by a mere subtype relation, are assigned unified, i.e. multidimensional types. This is for example the case of all words denoting artifacts: their characterization as types of concrete entities can in no way be deemed sufficient. They inherit their constitutive properties from different semantic types (orthogonal inheritance): they are concrete entities, intentionally created by some human process, for a certain purpose. Only by taking into account all of these meaning dimensions can one provide an adequate description of their semantic content.

 

While for trivial cases template assignment was based merely on world knowledge, for more complex cases an analysis of the definitions showed that the elements of meaning generally map quite easily onto the dimension(s) expressed via qualia roles. This highlights the adequacy of qualia relations for capturing key aspects of word meaning, especially for nouns, as illustrated in the table below. Such a mapping has in some cases guided the selection of the most adequate template type.

 

 

 

SemU: Manufatto (artifact)
Dictionary definition: Oggetto fatto a mano o con attrezzi manuali (Agentive: created_by)
    Object which has been made by hand or with manual tools
Template type: Artifact

SemU: Botte (barrel)
Dictionary definition: Recipiente di legno (Constitutive: made_of) fatto di doghe arcuate tenute unite da cerchi di ferro (Agentive: created_by) che serve per la conservazione e il trasporto (Telic: used_for) di liquidi, specialmente vino (Constitutive: contains)
    Wooden container made of curved staves held together by metal strips, used for keeping and transporting liquids, especially wine
Template type: Container

SemU: Materiale (material)
Dictionary definition: Tutto ciò che serve per creare o costruire qualche cosa (Telic: used_for)
    Everything which serves for creating or building something
Template type: Material

SemU: Organo (organ)
Dictionary definition: Ogni parte (Constitutive: is_a_part_of) del corpo animale/vegetale avente una particolare funzione (Telic: used_for)
    Each part of a (human/animal) body or plant having a particular function
Template type: Body_part

SemU: Banconota (banknote)
Dictionary definition: Biglietto di banca emesso dalla banca centrale (Agentive: created_by) a cui lo Stato attribuisce valore di moneta legale (Telic: used_for)
    Banknote issued by the Central Bank which is assigned the value of legal currency by the government
Template type: Money

SemU: Pensiero (thinking)
Dictionary definition: Qualsiasi rappresentazione mentale, prodotto dell'attività del pensiero o dell'immaginazione (Agentive: result_of)
    Any mental representation, product of the activity of thinking or imagination
Template type: Cognitive_fact

SemU: Pediatria (pediatrics)
Dictionary definition: Branca (Constitutive: part_of) della medicina che studia (Telic: purpose) le malattie del bambino (Constitutive: concerns)
    Branch of medicine which studies children's diseases
Template type: Domain

Table 2.

 

 

Note that those dimensions that are not explicitly expressed in the definition of a word meaning can still be retrieved, since they are inherited by virtue of its membership of a semantic type, as shown in the table below.

 

 

SemU: Cazzuola (trowel)
Dictionary definition: Attrezzo del muratore (Agentive: used_by) di forma triangolare, per distendere la calcina (Telic: used_for)
    Mason's tool, triangular in shape, used for spreading mortar
Inherited quale (from Instrument): Agentive: created_by

SemU: Pane (bread)
Dictionary definition: Alimento costituito da un impasto d’acqua e farina, per lo più condito con sale (Constitutive: made_of), lievitato e cotto al forno (Agentive: created_by), in forme diverse
    Nutriment made of a mixture of water and flour, generally seasoned with salt, leavened and baked, in different shapes
Inherited quale (from Food): Telic: used_for

SemU: Sedia (chair)
Dictionary definition: Mobile su cui ci si siede (Telic: used_for), costituito da un piano orizzontale che poggia su quattro gambe e da una spalliera (Constitutive: made_of)
    Piece of furniture to sit on, with a horizontal plane resting on four legs and a support for the back
Inherited quale (from Furniture): Agentive: created_by

SemU: Comunicato (communiqué)
Dictionary definition: Notizia d'interesse generale divulgata da un mezzo di informazione (Agentive: result_of)
    Piece of news of general interest released by a news medium
Inherited quale (from Information): Telic: indirect_telic

Table 3.

 

Clearly, template type assignment has not been an easy task for all word senses considered. The selection of the most suitable type has sometimes been quite difficult, and the resulting choice may be debatable. For example, would materiale (material) be better encoded in the template Material or, considering the underspecified genus part of its definition (everything which serves for creating or building something), rather in the Telic top type?

More than for nouns, some difficulty in assigning the adequate template type has sometimes been encountered for verbs and for adjectives.

 

From the practical point of view of coding, note that assigning a semantic type to an entry implies letting it inherit a further piece of information, i.e. the position of the type within the whole hierarchy. This information is provided by means of one of two features: WVSFTemplateSuperTypeXXXPROT (for Simple types) or WVSFUnificationPathXXX-YYY-ZZZPROT (for Unified types).

 

 

2.4.2. Semantic Class Assignment

 

The assignment of a semantic class to SIMPLE entries is meant to provide a mapping between SIMPLE's ontology that encompasses both monodimensional and multidimensional types and LEXIQUEST's monodimensional organization of semantic types.

 

In a significant number of cases, SIMPLE types and LEXIQUEST's semantic classes for nouns coincide, e.g.: Cognitive_fact, Vehicle, Container, Amount, Number, even though the label sometimes differs slightly, e.g.: Mouvement_of_thought vs. System of thought, Unit_of_measurement vs. Measure_unit, Quality vs. Attribute, People vs. Ethnos, Human_group vs. Human, Institution vs. Agency, Abstract_entity vs. Abstract, Psych_property vs. Psychological_feature, etc.

 

In some cases, different LEXIQUEST labels correspond to a single SIMPLE type, and a choice thus had to be made:

 

It is worth noting that in the Italian Lexicon, the use of different semantic classes for SemUs encoded in the same template generally corresponds to a virtual subtyping, which is in fact indicated by a different hyperonymic relation (see below Instrument).

 

Conversely, a single semantic class was sometimes linked to a number of SIMPLE types:

 

For verbs, the 15 semantic classes provided by LexiQuest proved to be insufficient. These classes were assigned to all relevant verb entries: 'Motion' to entries encoded in Move, Cause_Motion, Cause_change_of_location; 'Emotion' to Psychological_Event, Experience_event, Cause_experience_event; 'Change' to all Change and Cause_Change types, etc. A few verb entries, however, still lack semantic class information.
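The assignment of verb classes described above amounts to a lookup from template type to semantic class. The sketch below covers only the pairs named in the text; it does not enumerate the subtypes of Change and Cause_Change, and the name of the unmapped template is hypothetical. A template with no entry yields `None`, which mirrors the verb entries that still lack semantic class information.

```python
# Template type -> LexiQuest semantic class, limited to the pairs named in the text.
VERB_CLASS_BY_TEMPLATE = {
    "Move": "Motion",
    "Cause_Motion": "Motion",
    "Cause_change_of_location": "Motion",
    "Psychological_Event": "Emotion",
    "Experience_event": "Emotion",
    "Cause_experience_event": "Emotion",
    "Change": "Change",
    "Cause_Change": "Change",
}


def semantic_class(template):
    """Return the semantic class for a verb template, or None if none applies."""
    return VERB_CLASS_BY_TEMPLATE.get(template)


print(semantic_class("Cause_Motion"))
print(semantic_class("Some_other_template"))  # hypothetical, unmapped template
```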

 

As for adjectives, the semantic classes of nouns were assigned, wherever possible. For other adjectives, according to the Project Specifications, the use of meaning components in the Constitutive Role is considered to make up for the lack of specific semantic classes and to provide equivalent information.

 

 

 

2.4.3. Domain Assignment

 

Domain information is to be selected among the elements of LexiQuest's domain list and is meant to indicate the topic of the texts in which the SemU at hand is most likely to appear. Surprisingly, the assignment of domain information has not been as straightforward as it might at first appear.

In compliance with the Project Guidelines, no specific feature has been selected for common or unclassifiable word senses, which amounted to assigning the domain 'General'.

For word senses which may occur in texts dealing with different topics, several domains have been selected, such as: 'Manufacturing_Industry', 'Craft_Industry', 'Service_Industry', 'Construction' for operaio (worker); 'Banking', 'Commerce', 'Economics' for banconota (banknote). Note that, since multiple choices were not allowed in the previous version of the tool, the very first set of encoded word senses may still need to be revised in this regard.

For a fairly large number of readings, we felt the need to assign the domain 'General' in addition to one or more specific domains. Since no 'General' domain exists in LexiQuest's domain list and this value is only assigned by default, it is not possible to distinguish entries that were assigned specific domains only from those that were meant to pertain to a 'General' domain as well. Therefore, for the time being, the 'General' domain is to be understood as assigned by default to all SemUs in the Italian Lexicon. This provisional solution is acceptable only because of the relatively restricted number of lemmas encoded in the framework of this project. The SIMPLE lexicon consists in fact of a majority of lexical units denoting meanings that may be found in general texts and of a very small number of word senses pertaining exclusively to specific domains. Should the SIMPLE lexicon be extended with domain-specific terms, a solution to this problem would be urgently needed.

As recommended in the Project Guidelines, the most specific domain value has always been selected, hence 'Islam', rather than 'Religion' for the SemU moschea (mosque); 'Cuisine' rather than 'Food' for arrosto (roast); 'Biochemistry' rather than 'Chemistry' for proteina (protein).
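The "most specific value" guideline can be sketched as a filter over candidate domains. The sketch assumes that the domain list can be read as a hierarchy and encodes only the three specific/general pairs mentioned above as parent links; whether LexiQuest's list is actually organized in this way is an assumption, as are the function names.

```python
# Child -> parent links; only the pairs mentioned in the text (hierarchy assumed).
PARENT = {"Islam": "Religion", "Cuisine": "Food", "Biochemistry": "Chemistry"}


def ancestors(domain):
    """Return all more general domains above the given one."""
    found = []
    while domain in PARENT:
        domain = PARENT[domain]
        found.append(domain)
    return found


def most_specific(candidates):
    """Drop every candidate that is an ancestor of another candidate."""
    covered = {a for domain in candidates for a in ancestors(domain)}
    return [domain for domain in candidates if domain not in covered]


print(most_specific(["Religion", "Islam"]))                 # moschea
print(most_specific(["Banking", "Commerce", "Economics"]))  # banconota
```

Unrelated domains, as in the banconota example, are all kept, which is consistent with the multiple domain values allowed for word senses occurring in texts on different topics.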

Domain information is sometimes a relevant element of sense discrimination. Consider the coding of the word console (consul), a term which denotes both a diplomat and an authority in ancient Rome. The two senses are encoded in the template type Social_status and are distinguished by three elements: i) the lexicographic gloss, ii) the target SemUs of Formal and Constitutive roles, iii) the domain value 'Diplomacy' vs. 'Politics_and_government' and 'Antiquity'. The same holds for pasta (dough, pastry and pasta) in Artifact_food: the 'dough' and 'pastry' senses are assigned the domain value 'Bakery' while 'pasta' is assigned the value 'Food'.

Another example is colletto whose three meanings (collar of a dress/suit; collar of a plant; neck of a tooth) are all encoded in the template Part and are respectively assigned 'Clothing_Industry' and 'Fashion'; 'Botanics'; and 'Dentistry'.

 

Domain information has been assigned much less often to verbs than to nouns because of the highly versatile nature of verbs. Domain values have been ascribed to specific senses such as: suonare (to play) 'Music'; imprigionare (to jail) 'Penal_system'; esportare, importare (to export, to import) 'Business'; navigare (to sail) 'Sea_transport'; friggere (to fry) 'Cuisine'; convertire (to convert) 'Religion'; sposare (to marry) 'Marriage'; sceneggiare (to dramatize) 'Film', 'Theater'.

 

No Domain Information has been assigned to adjectives.

 

 

 

2.4.4. Other Types of Features

 

Some semantic features have been assigned with a view to easing the retrieval of entries that are not encoded under the same semantic type but still share a common feature. This is for example the case of:

 

The binary feature ‘Connotation’ that captures the 'common sense feeling' about a property or an event has been mostly used for Quality, Experience_event, and Expressive_speech_act typed entries. In a few cases, such optional information proved to be relevant for discrimination purposes, e.g.: for the word evento (event), evento_1 (fact which already occurred or may occur) and evento_2 (event of great relevance) were distinguished, besides the gloss, through the constitutive feature 'connotation=positive' ascribed to the second reading.

 

The use of other more specific features will be illustrated below in the section that describes the treatment of entries template by template.

 

 

2.5. The Qualia Structure

 

In SIMPLE templates, the different dimensions of word meaning are captured in the Extended Qualia structure, which consists of four roles. In the Qualia Structure, the information is expressed mainly in terms of relations between lexical units but also by valued features. Although Qualia Structure information is not mandatory, we attempted to provide the widest range of different types of information which entries may carry by filling in the relevant roles; in the most underspecified entries, Qualia information only consists in the Formal Role relation.

 

The Formal Role provides a broad characterization of an entity with respect to other entities. Formal quale information, which is expressed by the 'isa' hyperonymic relation for simple nouns and event-denoting entities, is deemed quite important in the Italian lexicon since it creates an intermediate level between semantic types and lexical units. In fact, the target SemU of the 'isa' relation gives in most cases more granular information than the semantic type does, and allows a further subtyping of entries sharing the same template: for example, 'isa' mammifero, rettile, felino, pachidermo (mammal, reptile, feline, pachyderm) makes it possible to differentiate entries encoded in the Earth_Animal type.

As a general rule, we thus endeavoured to assign the closest hyperonym and to avoid as far as possible circular 'isa' relations.
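Circular 'isa' relations of the kind we tried to avoid can be detected by following the hyperonymy chain of an entry. The sketch below assumes a single 'isa' target per SemU; both sample mappings are invented for illustration and do not reproduce actual entries of the lexicon.

```python
def isa_cycle(isa, start):
    """Follow 'isa' links from start; return the looping chain, or None."""
    chain = [start]
    seen = {start}
    current = start
    while current in isa:
        current = isa[current]
        chain.append(current)
        if current in seen:
            return chain  # the last element repeats an earlier one
        seen.add(current)
    return None


# Hypothetical hyperonymy chains (SemU -> target of its 'isa' relation).
acyclic = {"felino": "mammifero", "mammifero": "animale"}
circular = {"scopo": "fine", "fine": "scopo"}

print(isa_cycle(acyclic, "felino"))
print(isa_cycle(circular, "scopo"))
```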

For adjectives, the formal role is not expressed by a hyperonymic relation but rather by antonymy. Three different antonymic relations were used:

 

<SemU id="USem61898"
      naming="rosso"
      ......
    <RWeightValSemU weight="ESSENTIAL"
                    comment="bianco"
                    target="USemD2657"
                    semr="SRAntonymMult">

<SemU id="USemD6473"
      naming="italiano"
      example="ragazzo italiano"
      .....
    <RWeightValSemU weight="ESSENTIAL"
                    comment="francese"
                    target="USem61732"
                    semr="SRAntonymMult">

 

The Constitutive Role expresses the internal constitution of an entity. Typical constitutive relations which were used are: 'is_a_member_of', 'is_a_part_of', 'has_as_member', 'resulting_state', 'lives_in', 'has_as_property', etc.

Constitutive relations were filled in as much as possible: not only when they were part of the type-defining information, such as 'is_a_part_of' for the template Part or 'has_as_member' for the templates Group and Human_group, but also to give optional additional information whenever it was deemed necessary in order to better grasp and describe the word meaning at hand, provided this did not imply a proliferation of dummy entries. We used, for example, the optional information 'constitutive_activity' in the template type Animal and its subtypes; 'has_as_part' in Plant, Building, Instrument, Vehicle; 'made_of', mostly for Artifact_Material/Food/Drink; 'has_as_colour', for Plant, Fruit, Natural_Substance; 'contains', mainly for Container and Semiotic_Artifact.

Special mention should be made of the constitutive relation 'concerns', which is never type-defining but has still been widely used. In our experience, it proved quite useful in many cases for expressing aspects of word meaning that are only present in the lexical gloss and would otherwise be lost. In the template Disease, for example, this relation was used whenever possible to indicate the organ affected by the disease, e.g.: for congiuntivite (conjunctivitis), occhio (eye). Similarly, some semantic units typed as Clothing were assigned uomo (man) or donna (woman) as target of the 'concerns' relation. Other examples: for sbocciare (to bloom), fiore (flower); for odore (smell), olfatto (sense of smell); for rublo (rouble), Russia.

As for adjectives, the crucial importance of the Constitutive Role will be illustrated in the section devoted to adjective encoding.

 

The Agentive Role provides information concerning the origin of an entity. Typical agentive relations which were used are 'created_by', for all kinds of artifacts; 'result_of', used mostly in all templates subsumed by the type Representation; 'caused_by', used to indicate the cause of a stimulus or a disease; 'agentive_prog', in the template Agent_of_temporary_activity, to indicate the action which determines the way a person is referred to, e.g.: pedone or scioperante (pedestrian, striker); 'agentive_cause', in all causative templates, with prototypical target SemUs such as fare or causare (make, cause); 'agentive_experience', in the Experience_event template, with the prototypical targets provare, sentire (to feel).

 

The Telic Role specifies the function of an entity, the purpose for which it exists or has been created. The main Telic relations that have been used are the following ones: 'used_as', mostly in Substance, Natural_substance and Artifactual_material; 'used_for' in Artifact, Building, Substance, Flavouring, Container, Clothing; 'is_the_activity_of' in Profession; 'object_of_the_activity' mostly in Clothing, Fruit, Food and subtypes; 'indirect_telic' in Representation and subtypes; 'telic', mainly in Purpose_act and Institution.

 

 

 

In the Formal role, the target of the hyperonymic relation may be the most adequate one or a more general one, as happened for some verbs whose 'isa' target is just agire (to act). In the Constitutive role, the target for type-defining relations is generally easy to identify and, obviously, optional information is only provided if a target is indeed identified. Conversely, a word meaning may clearly convey an Agentive or a Telic dimension which is linguistically relevant and yet problematic to express. For the Telic role, this phenomenon may have three causes:

Istituzione (institution):
Ente od organo istituito per determinati scopi pratici
(Organization created for specific practical purposes)

Strumento (tool):
Attrezzo o dispositivo atto al compimento di determinate operazioni
(Tool or device able to perform particular operations)

Apparecchio (apparatus):
Dispositivo semplice o complesso per specifiche realizzazioni
(Simple or complex device for specific realizations)

Locale (premises):
Parte di un edificio destinata ad un uso determinato
(Part of a building intended for a particular use)

 

Biblioteca (library):
Luogo ove sono raccolti e conservati libri | Edificio, sala con grandi raccolte di libri a disposizione del pubblico per lettura e consultazione
(Place where books are gathered and stored | Building, room with large quantities of books at the disposal of the public for reading and consultation)

 

Cantina (cellar):
Locale fresco, interrato o seminterrato, adibito alla produzione e conservazione familiare del vino o di derrate alimentari
(Cool room, in a basement, used for domestic production and storage of wine and food)

 

Carta (paper):
Materiale ottenuto dalla lavorazione di fibre di cellulosa, che si presenta in forma di fogli sottili e pieghevoli, adatti a vari usi
(Material obtained from cellulose fibers which is usually constituted by thin and folding sheets, suitable for different purposes)

 

Let us consider the specific case of carta. The last part of the definition clearly conveys Telic information. It is undeniable that paper has some kind of use, and this leads us to consider this information linguistically relevant. However, the information is totally underspecified as to a possible value. As a matter of fact, the appropriate Telic relation could be filled with a number of SemUs, such as scrivere (to write), disegnare (to draw), stampare (to print), incartare (to wrap up in paper), etc., given the large and heterogeneous range of uses that paper may have. Hence, no single semantic type can be found which could express a generalization over the different functions of carta. On the other hand, defining at lexical level only the most prototypical usage of carta would certainly be restrictive, with the consequence that relevant information would be lost. It goes without saying that awareness of the different possible uses of carta depends on the world knowledge of each individual.

From a practical point of view, the encoding options in such cases seem to be:

(i) instantiating the role as many times as necessary in order to cover all possible functions of the meaning being described:

 

used_for (<carta_1>, <scrivere>: [Symbolic_creation])

used_for (<carta_1>, <disegnare>: [Symbolic_creation])

used_for (<carta_1>, <stampare>: [Symbolic_creation])

used_for (<carta_1>, <incartare>: [Cause_change_of_state])

 

but this would be a time-consuming and, in any case, non-exhaustive solution;

 

(ii) providing lexically underspecified Telic information.

 

 

On the other hand, how is it possible to express for example the Telic dimension which undeniably exists in promettere (to promise); volere (to want); minacciare (to threaten); intenzione (intention); or the Agentive dimension existing in dimenticare (to forget), rompersi (to break), morire (to die)? Finally, while the target SemU of the Agentive relation for marxismo (Marxism) is obvious, no SemU can be found to express the agentive of socialismo (socialism) and yet this dimension is undoubtedly present in this word meaning.

For all the above problematic cases, we decided to use the features ‘WVSFTelicYesPROT’, ‘WVSFTelicYesESS’, ‘WVSFAgentiveYesPROT’, ‘WVSFAgentiveYesESS’ in order to preserve linguistically relevant information while avoiding the creation of underspecified or odd relations.
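The contrast between enumerating relations and flagging the dimension with a feature can be illustrated with a minimal Python sketch. The feature names and the 'used_for' targets come from the text; the dictionary layout and the helper has_telic_dimension are hypothetical and are not part of the SIMPLE model or its SGML format.

```python
# Hypothetical in-memory view of two ways to record the Telic dimension of carta.

# Option (i): one 'used_for' relation per possible function (never exhaustive).
carta_with_relations = {
    "naming": "carta",
    "relations": [
        ("used_for", "scrivere"),
        ("used_for", "disegnare"),
        ("used_for", "stampare"),
        ("used_for", "incartare"),
    ],
    "features": [],
}

# Feature-based encoding: a single feature records that a Telic dimension
# exists without committing to any target SemU.
carta_with_feature = {
    "naming": "carta",
    "relations": [],
    "features": ["WVSFTelicYesPROT"],
}

# Telic relation names listed in this document; feature names as quoted above.
TELIC_RELATIONS = {"used_for", "used_as", "telic", "indirect_telic",
                   "is_the_activity_of", "object_of_the_activity"}
TELIC_FEATURES = {"WVSFTelicYesPROT", "WVSFTelicYesESS"}


def has_telic_dimension(semu):
    """Return True if the entry expresses a Telic dimension by relation or feature."""
    by_relation = any(name in TELIC_RELATIONS for name, _ in semu["relations"])
    by_feature = any(f in TELIC_FEATURES for f in semu["features"])
    return by_relation or by_feature
```

Under either encoding a consumer of the lexicon can still detect that carta has a Telic dimension; only the relation-based variant commits to specific target SemUs.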

 

 

 

In the Italian lexicon, besides the type-defining information, which was always provided, optional relations and features were filled whenever possible. Note that some information which might be rejected as 'world knowledge only' proves in fact to be linguistically relevant. For example, the information provided in the 'Constitutive_activity' relation may be exploited in the selectional restrictions of the corresponding verb arguments, as illustrated in the example below.

id = "USem1915"

naming = "cane"

freedefinition = "mammifero domestico"

weightvalsemfeaturel="TSVP_MAMMAL_TS_classificateur_de_nom_C TSVP_MAMMALOGY_TS_domaine_D WVSFHabitatEarthPROT

WVSFTemplateEarth-AnimalPROT WVSFTemplateSuperTypeAnimalPROT"

Relations:

semr = "SRIsa"

weight = "PROTOTYPICAL"

comment = "mammifero" (Animal)

target = "USem1123"

semr = "SRConstitutiveactivity"

weight = "ESSENTIAL"

comment = "abbaiare" (Non_Relational_Act)

target = "USem6391"

id = "USem6391"

naming = "abbaiare"

freedefinition = "verso del cane"

weightvalsemfeaturel="TSVP_COMMUNICATION_TS_classificateur_de_verbe_C WVSFEventTypeProcessPROT

WVSFTemplateNonRelationalActPROT WVSFTemplateSuperTypeActPROT"

Predicative Representation:

predicate = "PREDabbaiare#1"

typeoflink = "Master"

Predicates:

id = "PREDabbaiare#1"

naming = "abbaiare#1"

type = "LEXICAL"

multilingual = No

argumentl="ARG0abbaiare#1"

Arguments:

id = "ARG0abbaiare#1"

semanticrolel = "Role_ProtoAgent"

informargl = "INFARGS1"

InformArg:

id = "INFARGS1"

semu = "USem1915"

Relations:

semr = "SRIsa"

weight = "PROTOTYPICAL"

comment = "verso" (Non_Relational_Act)

target = "USem6386"

semr = "SRTypicalof"

weight = "PROTOTYPICAL"

comment = "cane" (Earth_animal)

target = "USem1915"
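The link between the two entries above can be traced programmatically. The following Python sketch is only an illustration: the identifiers are copied from the example, while the dictionary layout and the two helper functions are assumptions made here and do not reflect any actual SIMPLE tooling.

```python
# Identifiers are taken from the example above; the dict layout is an assumption.
semus = {
    "USem1915": {"naming": "cane",
                 "relations": [("SRIsa", "USem1123"),
                               ("SRConstitutiveactivity", "USem6391")]},
    "USem6391": {"naming": "abbaiare",
                 "predicate": "PREDabbaiare#1",
                 "relations": [("SRIsa", "USem6386"),
                               ("SRTypicalof", "USem1915")]},
}
predicates = {"PREDabbaiare#1": {"arguments": ["ARG0abbaiare#1"]}}
arguments = {"ARG0abbaiare#1": {"role": "Role_ProtoAgent", "informarg": "INFARGS1"}}
informargs = {"INFARGS1": {"semu": "USem1915"}}


def preferred_argument_semus(verb_id):
    """Collect the SemUs named as selectional preferences of a verb's arguments."""
    pred = predicates[semus[verb_id]["predicate"]]
    result = []
    for arg_id in pred["arguments"]:
        info = informargs[arguments[arg_id]["informarg"]]
        if "semu" in info:
            result.append(info["semu"])
    return result


def activity_matches_restriction(noun_id):
    """True if some constitutive activity of the noun selects that noun as argument."""
    for rel, target in semus[noun_id]["relations"]:
        if rel == "SRConstitutiveactivity" and noun_id in preferred_argument_semus(target):
            return True
    return False
```

Here activity_matches_restriction("USem1915") holds because abbaiare, the constitutive activity of cane, names cane as the preferred filler of its only argument.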

 

 

 

2.6. Other Types Of Relations

 

Polysemic relation

Different senses (encoded in different templates) of polysemous lexical items belonging to regular polysemous classes were linked to each other using the relation ‘SRPolysemyX-Y’, where X and Y are two template types, e.g.:

 

<SemU

id = "USem3980"

naming = "aumentare"

comment = "BC 10"

freedefinition = "rendere più grande, più intenso, più numeroso; accrescere"

...

<RWeightValSemU

weight = "ESSENTIAL"

comment = "aumentare"

target = "USem3981"

semr = "SRPolysemyChangeofvalue&Causechangeofvalue"> ...

 

<SemU

id = "USem3981"

naming = "aumentare"

example = "la popolazione è aumentata del 10 %"

...

<RWeightValSemU

weight = "ESSENTIAL"

comment = "aumentare"

target = "USem3980"

semr = "SRPolysemyChangeofvalue&Causechangeofvalue">....
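In the example, each of the two senses of aumentare points to the other through the same polysemy relation. Assuming that this reciprocity is intended for all polysemy links (an assumption, not a statement of the specifications), a consistency check could be sketched in Python as follows; the triple representation and the function name are hypothetical.

```python
# Polysemy links from the aumentare example, as (source, relation, target) triples.
links = [
    ("USem3980", "SRPolysemyChangeofvalue&Causechangeofvalue", "USem3981"),
    ("USem3981", "SRPolysemyChangeofvalue&Causechangeofvalue", "USem3980"),
]


def missing_reverse_links(triples):
    """Return polysemy links whose reverse link (same relation) is absent."""
    present = set(triples)
    return [(s, r, t) for (s, r, t) in triples
            if r.startswith("SRPolysemy") and (t, r, s) not in present]
```

For the two links above the function returns an empty list; dropping either link makes the other one show up as unpaired.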

 

The following polysemic relations are encoded in the Italian lexicon for abstract and concrete nouns (Table 4) and for adjectives and event-denoting lexical units (Table 5):

 

Polysemous Class | Examples | Related templates
Activity-Profession | musicista (musician) | [Agent_of_persistent_activity] [Profession]
Animal-Food | agnello (lamb) | [Animal] [Substance_Food]; [Air-Animal] [Substance_Food]; [Earth-Animal] [Substance_Food]; [Water-Animal] [Substance_Food]
Animal-Fur | volpe (fox) | [Animal] [Artifactual_material]; [Air-Animal] [Artifactual_material]; [Earth-Animal] [Artifactual_material]; [Water-Animal] [Artifactual_material]
Artifact-Information | libro (book) | [Semiotic_artifact] [Information]
Convention-Artifact | contratto (contract) | [Convention] [Semiotic_artifact]
Building-Institution | scuola (school) | [Building] [Institution]
Figure-Ground | finestra (window) | [Opening] [Artifact]
Container-Content | scatola (box) | [Container] [Amount]
Substance-Color: Flower-Colour | turchese (turquoise); viola (violet) | [Natural_substance] [Colour]; [Flower] [Colour]
People-Institution | chiesa (church) | [Human_Group] [Institution]
People-Language | italiano (Italian) | [People] [Language]
Place-People: Organization-Location | citta' (city); giornale (newspaper) | [Location] [Human_group]; [Area] [Human_group]; [Geopolitical_Location] [Human_Group]; [Building] [Human_Group]
Producer-Product: Plant-Fruit; Plant-Flower | limone (lemon tree/lemon); violetta (violet) | [Plant] [Fruit]; [Plant] [Flower]
Plant-Spice | pepe (pepper) | [Plant] [Flavouring]
Tree-Wood | noce (walnut tree/walnut) | [Plant] [Natural_substance]
Plant-Drink | caffè (coffee/coffee) | [Plant] [Artifactual_drink]

Table 4

 

Polysemous Class | Examples | Related templates
Inchoative-Causative | cominciare (to begin) | [Aspectual] [Cause_aspectual]
Inchoative-Causative | suonare (to ring) | [Causeact] [Nonrelational_act]
Inchoative-Causative | trasformare (to transform) | [Cause_change] [Change]
Inchoative-Causative | asciugare (to dry) | [Change_of_state] [Cause_change_of_state]
Inchoative-Causative | diminuire (decrease) | [Change_of_value] [Cause_change_of_value]
Inchoative-Causative | attaccare (to stick) | [Constitutive_change] [Cause_constitutive_change]
Inchoative-Causative | angosciare (to grieve at st.; to anguish) | [Experience_Event] [Cause_Experience_Event]
Inchoative-Causative | rotolare (to roll) | [Move] [Cause_motion]
Inchoative-Causative | collegare (to link) | [Relational_change] [Cause_relational_change]
Inchoative-Causative | temere (to fear) | [Cognitive_event] [Experience_event]
- | italiano (Italian) | [Nationality] [Style]
- | freddo (cold) | [Temperature] [Behaviour]

Table 5

 

 

Derivational relation

Derivation was marked by means of the following relations:

 

Synonymic relation

Synonymic relations were assigned in two cases:

 

 

2.7. Predicative Entries

 

Predicative entries are assigned a predicative representation which consists in the assignment of a predicate, the type of link the entry holds with the predicate and the description of the arguments: predicate’s arity, semantic role of each argument and selectional restrictions.

 

2.7.1. Predicate Assignment

Each predicative SemU, be it verb, deverbal, deadjectival or simple noun, is assigned one lexical predicate. A total of 2,754 predicates were created in the Italian lexicon.

For verbs, predicate names coincide with the SemU naming, e.g.: SemU andare ↔ Pred andare. Deverbal nouns share the predicate of their verbal base, i.e. accusare, accusatore, accusato, accusa (to accuse, accuser, accused, accusation) all point to the predicate accusare, whether or not they are encoded in the same semantic type. By contrast, polysemic entries of a verb may give rise to different predicates if they have a different arity (this is the case of inchoative and causative readings of verbs, which point to two different predicates, a monovalent and a bivalent one) or even only different selectional restrictions on arguments.

<Predicate

id="PREDrompere-1"

naming="rompere-1"

comment="inchoative reading"

type="LEXICAL"

multilingual="No"

argumentl="ARG0rompere-1">

<Predicate

id="PREDrompere-2"

naming="rompere-2"

comment="causative reading"

type="LEXICAL"

multilingual="No"

argumentl="ARG0rompere-2 ARG1rompere-2">

 

<Predicate

id="PREDesporre-2"

naming="esporre-2"

type="LEXICAL"

multilingual="No"

argumentl="ARG0esporre-2 ARG1esporre-2 ARG2esporre-2">

<Predicate

id="PREDesporre-3"

naming="esporre-3"

type="LEXICAL"

multilingual="No"

argumentl="ARG0esporre-3 ARG1esporre-3 ARG2esporre-3">

<Argument

id="ARG0esporre-2"

semanticrolel="RoleProtoAgent"

informargl="INFARGN2">

<Argument

id="ARG0esporre-3"

semanticrolel="RoleProtoAgent"

informargl="INFARGT90">

<Argument

id="ARG1esporre-2"

semanticrolel="RoleProtoPatient"

informargl="INFARGT97">

<Argument

id="ARG1esporre-3"

semanticrolel="RoleProtoPatient"

informargl="INFARGN2">

<Argument

id="ARG2esporre-2"

semanticrolel="RoleLocation"

informargl="INFARGT97">

<Argument

id="ARG2esporre-3"

semanticrolel="RoleUnderspecified"

informargl="INFARGT35">

<InformArg

id="INFARGN2"

weightvalsemfeaturel="TSVP_PLUS_TS_HUMAN_T">

<InformArg

id="INFARGT35"

weightvalsemfeaturel="WVSFTemplateEventPROT">

<InformArg

id="INFARGT90"

weightvalsemfeaturel="WVSFTemplateEntityPROT">

<InformArg

id="INFARGT97"

weightvalsemfeaturel="WVSFTemplateConcreteEntityPROT">
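The two situations illustrated above (different arity for rompere, same arity but different restrictions for esporre) can be read off the attribute values directly. The Python sketch below copies those values from the fragments; the dictionary layout and the helpers arity and restrictions are hypothetical names introduced only for illustration.

```python
# Values copied from the SGML fragments above; the dict layout is an assumption.
predicates = {
    "PREDrompere-1": "ARG0rompere-1",
    "PREDrompere-2": "ARG0rompere-2 ARG1rompere-2",
    "PREDesporre-2": "ARG0esporre-2 ARG1esporre-2 ARG2esporre-2",
    "PREDesporre-3": "ARG0esporre-3 ARG1esporre-3 ARG2esporre-3",
}

# Argument id -> InformArg id (selectional restriction), given for esporre only.
informarg_of = {
    "ARG0esporre-2": "INFARGN2",
    "ARG1esporre-2": "INFARGT97",
    "ARG2esporre-2": "INFARGT97",
    "ARG0esporre-3": "INFARGT90",
    "ARG1esporre-3": "INFARGN2",
    "ARG2esporre-3": "INFARGT35",
}


def arity(pred_id):
    """Number of arguments listed in the predicate's argumentl attribute."""
    return len(predicates[pred_id].split())


def restrictions(pred_id):
    """InformArg ids of a predicate's arguments, in argument order."""
    return [informarg_of[a] for a in predicates[pred_id].split()]
```

With these data, the two rompere predicates differ in arity (1 versus 2), whereas the two esporre predicates are both trivalent and differ only in the restrictions attached to their arguments.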

 

 

 

 

2.7.2. Link to Predicate

 

The link the SemU holds with the predicate is expressed through the feature ‘typeoflink’. In the Italian lexicon, a predicative representation has been assigned to the following classes with the instantiation of the following links:

 

 

 

 

 

Besides verbs, the type of link 'Master' was used for the following classes of predicative non-deverbal nouns:

 

 

 

 

 

 

2.7.3. Arguments

 

2.7.3.1. Semantic Role

The third part of the predicative representation concerns the description of predicate arguments. Each semantic argument is assigned a semantic role. ‘ProtoAgent’ was assigned to verb subjects, provided the subject was not felt to undergo the event passively; ‘ProtoPatient’ to verb objects, some verb subjects and strongly bound PPs; ‘Role_2Participant’ to indirect objects; ‘Role_SOA_ARG’ to clausal complements; ‘Role_Location’, ‘Role_Direction’ and ‘Role_Origin’ to complements of stative location or movement verbs; ‘Role_Kinship’ to all SemUs encoded under the Kinship type; ‘Role_HeadQuantified’ for amount-denoting nouns.

 

 

2.7.3.2. Selectional Restrictions

Restrictions on arguments are clearly not to be taken as real restrictions but rather as preferences of combinations, in prototypical situations. The SIMPLE model offers three possibilities to semantically restrict arguments:

<Predicate

id="PREDricoprire-1" (to cover)

naming="ricoprire-1"

type="LEXICAL"

multilingual="No"

argumentl="ARG0ricoprire-1 ARG1ricoprire-1 ARG2ricoprire-1">

.....

<Argument

id="ARG2ricoprire-1"

semanticrolel="RoleUnderspecified"

informargl="INFARGN13">

<InformArg

id="INFARGN13"

weightvalsemfeaturel="WVSFTemplateMaterialPROT WVSFTemplateSubstance">

 

<Predicate

id="PREDaggredire-1"

naming="aggredire-1"

type="LEXICAL"

multilingual="No"

argumentl="ARG0aggredire-1 ARG1aggredire-1">

<Argument

id="ARG0aggredire-1"

semanticrolel="RoleProtoAgent"

informargl="INFARGN3">

......

<InformArg

id="INFARGN3"

weightvalsemfeaturel="TSVP_PLUS_TS_HUMAN_T WVSFTemplateAnimalPROT">

 

 

<Predicate

id="PREDpattinare-1"

naming="pattinare-1"

type="LEXICAL"

multilingual="No"

argumentl="ARG0pattinare-1 ARG1pattinare-1">

...

<Argument

id="ARG1pattinare-1"

semanticrolel="RoleUnderspecified"

informargl="INFARGS14">

<InformArg

id="INFARGS14"

comment="pattino"

semu="USem62518">
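Judging from the three fragments above, an argument may be restricted by template types (ricoprire), by a semantic feature combined with a template type (aggredire), or by a specific SemU (pattinare). The Python sketch below shows how such restrictions might be checked against a candidate filler. It treats the listed values as alternatives, which is an assumption made here in keeping with the idea of preferences; the candidate entries, their labels and the function satisfies are hypothetical.

```python
def satisfies(restriction, candidate):
    """True if the candidate matches at least one listed preference.

    Alternatives are treated as a disjunction; this is an assumption made
    for the sketch, in line with restrictions being preferences.
    """
    if "semu" in restriction:
        return candidate["id"] == restriction["semu"]
    values = restriction["weightvalsemfeaturel"].split()
    labels = set(candidate["templates"]) | set(candidate["features"])
    return any(v in labels for v in values)


# Restrictions copied from the InformArg elements above.
INFARGN13 = {"weightvalsemfeaturel": "WVSFTemplateMaterialPROT WVSFTemplateSubstance"}
INFARGN3 = {"weightvalsemfeaturel": "TSVP_PLUS_TS_HUMAN_T WVSFTemplateAnimalPROT"}
INFARGS14 = {"semu": "USem62518"}

# Hypothetical candidate fillers: the id "X1" and all labels are invented;
# only the id of pattino comes from the example.
pattino = {"id": "USem62518", "templates": [], "features": []}
lupo = {"id": "X1", "templates": ["WVSFTemplateAnimalPROT"], "features": []}
```

In this toy setting lupo is an acceptable ARG0 for aggredire but not an acceptable ARG2 for ricoprire, and only pattino fills the ARG1 of pattinare.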

 

 

3. Language Specific Typing

 

 

Going through the SIMPLE ontology, we comment below on some relevant points concerning the typing of entries in the Italian lexicon.

 

3.1. Top Types

 

The top type Entity has been used to encode a few very abstract word meanings such as Dio, entita', cosa, spirito (God, entity, thing, spirit). For such senses, the information provided consists, as for all SemUs, of type hierarchy, domain, semantic class and, as far as qualia structure is concerned, of a generic isa relation in the formal quale.

 

Very underspecified non-concrete nouns, not easy to formalize from a semantic point of view, such as scopo, obiettivo (goal, objective), which only convey a bare Telic dimension; origine, causa, motivo (origin, cause, motive) etc., which lexically instantiate the Agentive quale; or parte, elemento, modo, maniera (part, element, way, manner), which are intrinsically Constitutive, are encoded respectively in the top types Telic, Agentive and Constitutive. For such word senses, defined in dictionaries either by means of underspecified genus terms, e.g.: scopo: "cio' a cui si tende, che si desidera ottenere" (something you hope to achieve), or by synonymy, taxonomic information obviously does not make any sense. We therefore encode a relation in the qualia dimension their meaning instantiates and, whenever possible, a synonymic relation.

 

<SemU

id="USem3376"

naming="scopo"

freedefinition="cio' a cui si tende, che si desidera ottene"

weightvalsemfeaturel="TSVP_ABSTRACT_TS_classificateur_de_nom_C WVSFTemplateSuperTypeTopPROT WVSFTemplateTelicPROT">

<RWeightValSemU

weight="PROTOTYPICAL"

comment="ottenere"

target="USem59859"

semr="SRTelic">

<RWeightValSemU

weight="ESSENTIAL"

comment="obiettivo"

target="USem3985"

semr="SRSynonym">

 

Constitutive subtypes

 

Prototypical predicative entries denoting 'parts' and 'groups' are encoded in the Constitutive type. For those entries, the selectional restriction on the argument is the loosest one (Entity).

 

On the other hand, word senses which are perceived more as part of some entity rather than as autonomous units are encoded as Part members. This kind of perception is sometimes quite subjective. While a consensus would probably be found regarding membrana (membrane) as 'part of a cell', what about carburatore (carburettor): is it perceived more as 'part of an engine' or as 'apparatus'? This is the reason why, in the Linguistic Specifications, freedom was left to the partners to add the semantic relation 'is_a_part_of' as additional information to describe SemUs of other semantic types. In the Italian Lexicon, this relation was used in a number of types such as Artifact, Instrument, Building, Opening, Natural_substance, Time, Unit_of_measurement, Money, etc. Besides, all entries encoded in the template Part or bearing the relation ‘is_a_part_of’ are assigned the semantic feature ‘Plus_Ts_Part_T’ and can therefore be automatically retrieved.
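The automatic retrieval mentioned above amounts to filtering entries on the feature. A minimal Python sketch follows; only the feature name and the relation name come from the text, while the entries, the template chosen for carburatore and both helper functions are hypothetical.

```python
# Hypothetical entries; typing carburatore as Instrument is an assumption,
# since the text deliberately leaves that choice open.
entries = [
    {"naming": "membrana", "template": "Part",
     "relations": ["is_a_part_of"], "features": ["Plus_Ts_Part_T"]},
    {"naming": "carburatore", "template": "Instrument",
     "relations": ["is_a_part_of"], "features": ["Plus_Ts_Part_T"]},
    {"naming": "bottiglia", "template": "Container",
     "relations": ["used_for"], "features": []},
]


def part_like(items):
    """Names of entries flagged with the 'Plus_Ts_Part_T' feature."""
    return [e["naming"] for e in items if "Plus_Ts_Part_T" in e["features"]]


def feature_is_consistent(entry):
    """Check that the feature is present exactly when the entry is typed as
    Part or bears 'is_a_part_of'. The 'only when' direction is an assumption;
    the text only states that all such entries receive the feature."""
    expected = entry["template"] == "Part" or "is_a_part_of" in entry["relations"]
    return expected == ("Plus_Ts_Part_T" in entry["features"])
```

The filter returns membrana and carburatore regardless of the template in which each of them happens to be encoded.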

 

SemUs denoting body parts are encoded in the specific template type which is linked to the semantic class 'Body_part'. An optional 'indirect_telic' relation makes it possible to express the functionality of organs, e.g.: occhio (eye) ‘indirect_telic’ vedere (to see).

 

The type Group is assigned to those words whose meaning denotes a collection of any kind of entities (except humans, which are more specifically placed in the type Human_group), e.g.: collezione, stormo, mandria, costellazione, attrezzatura, equipaggiamento (collection, flock, herd, constellation, equipment, outfit). Most of these units are predicative. Their semantic characterization differs as to semantic class, domain, target of the Constitutive type-defining relation 'has_as_member' and, consequently, selectional restrictions.

 

The type Amount is assigned to quantity-denoting word readings. The PPdi (of) complement of all the entries typed as Amount is either a mass noun or the plural form of a count noun. Besides prototypical lexical units indicating a quantity, e.g. quantita', grado (quantity, degree), a relevant number of SemUs encoded in this template consist of the content reading of container-denoting nouns, e.g. un cucchiaio di sale, una bottiglia di vino, una scatola di cioccolatini (a spoonful of salt, a bottle of wine, a box of chocolates). In this case, a specific polysemic relation links the two entries:

 

 

<SemU

id="USemD1268"

naming="bottiglia"

freedefinition="recipiente di vetro o plastica che serve a contenere liquidi"

weightvalsemfeaturel="TSVP_CONTAINER_TS_classificateur_de_nom_C WVSFTemplateContainerPROT WVSFUnificationPathConcreteentity-ArtifactAgentive-TelicPROT">

<RWeightValSemU

weight="ESSENTIAL"

comment="bottiglia"

target="USem2438"

semr="SRPolysemyContainer-Amount">

<RWeightValSemU

weight="PROTOTYPICAL"

comment="recipiente"

target="USem2965"

semr="SRIsa">

<RWeightValSemU

weight="PROTOTYPICAL"

comment="fabbricare"

target="USemD387"

semr="SRCreatedby">

<RWeightValSemU

weight="PROTOTYPICAL"

comment="contenere"

target="USemD883"

semr="SRUsedfor">

<RWeightValSemU

weight="ESSENTIAL"

comment="liquido"

target="USem1388"

semr="SRContains">

<RWeightValSemU

weight="ESSENTIAL"

comment="vetro"

target="USem3144"

semr="SRMadeof">

<RWeightValSemU

weight="ESSENTIAL"

comment="plastica"

target="USem3007"

semr="SRMadeof">

 

<SemU

id="USem2438"

naming="bottiglia"

freedefinition="la quantita' di liquido contenuto in una bottiglia"

weightvalsemfeaturel="TSVP_AMOUNT_TS_classificateur_de_nom_C WVSFTemplateAmountPROT WVSFTemplateSuperTypeConstitutivePROT">

<PredicativeRepresentation

typeoflink="Master"

predicate="PREDbottiglia-1">

<RWeightValSemU

weight="PROTOTYPICAL"

comment="quantita'"

target="USemD1595"

semr="SRIsa">

<RWeightValSemU

weight="PROTOTYPICAL"

comment="liquido"

target="USem1388"

semr="SRQuantifies">

<RWeightValSemU

weight="ESSENTIAL"

comment="bottiglia"

target="USemD1268"

semr="SRPolysemyContainer-Amount">

<Predicate

id="PREDbottiglia-1"

naming="bottiglia-1"

type="LEXICAL"

multilingual="No"

argumentl="ARG0bottiglia-1">

<Argument

id="ARG0bottiglia-1"

example="una bottiglia di vino"

semanticrolel="RoleHeadQuantified"

informargl="INFARGN10">

<InformArg

id="INFARGN10"

weightvalsemfeaturel="TSVP_PLUS_TS_LIQUID_T WVSFStateLiquidPROT">

 

 

 

3.2. Concrete_Entity

 

3.2.1. Location

 

The top type Location encodes general words denoting places, such as luogo, posto, localita' (location, place, locality) and has no type defining relation besides the Formal one.

Specific simple subtypes, on the other hand, make it possible to encode further information. For natural locations, these are tridimensionality, for mare, montagna, rilievo, altura (sea, mountain, natural elevation) in 3_D_location, and bidimensionality, for spiaggia, campo (beach, field) in Area. A further subtype, Geopolitical_location, describes proper nouns such as Italia, Milano and common nouns referring to geopolitical locations, e.g.: nazione, citta', paese, quartiere (nation, town, village, quarter). The common nouns are polysemous with the corresponding Human_group reading.

 

Besides simple types, Location also subsumes three unified types.

 

Opening makes it possible to represent the agentive dimension of word meanings such as buco, tunnel (hole, tunnel). Some of the entries encoded in this template, e.g.: finestra, porta (window, door), show a polysemic relation with their corresponding artifact reading.

 

In Building and Artifactual_area types, both Agentive and Telic dimensions are expressed. An interesting feature of some building-typed entries is their polysemic relation with another reading, e.g.: casa (house/home) Building-Human_group; ditta (company) Building-Institution, or even with both of them, e.g.: scuola, banca, chiesa, parlamento (school, bank, church, parliament).

 

Artifactual_area encodes entries referring to areas or surfaces which have been intentionally created, e.g. piazza, autostrada, percorso, strada (square, highway, course, route).

 

 

3.2.2. Material

 

Lexical items belonging to this class are entities of different types which are used as material and are underspecified with respect to both their natural/artifactual nature and composition. The qualia structure of the unified type Material instantiates the Formal and Telic dimensions. The target SemU of the isa relation is the SemU materiale (material) and the distinctive feature of the class components is provided by the target of their Telic relation, i.e.: imbottire, rivestire, ricoprire (to fill, to cover, to line). Only a few entries are encoded in this template since those materials which are specified for their artifactual nature and composition are collocated in the more specific template Artifactual_material, and those which are derived from natural substances, e.g.: argento (silver) are encoded in the template Natural_substance, which is also optionally specified for the Telic role.

Except for mattone (brick), the SemUs typed as Artifactual_material are all mass nouns. In this template, two relevant dimensions of word sense are highlighted: the Agentive and the Telic ones. The target SemU of the Agentive relation 'created_by', i.e.: lavorazione, fusione, raffinazione, conciare, cuocere (working, melting, refining, to tan, to cook), provides information about the process through which artifactual materials such as polistirolo, ottone, benzina, coccodrillo, porcellana are obtained. Moreover, an additional optional Agentive relation 'derived_from' indicates, whenever possible, the composition of the derived products, e.g.: benzina ← petrolio; carta ← cellulosa; bambagia ← cotone (gasoline, oil; paper, cellulose; cotton wool, cotton).

The Telic dimension is expressed by the relations 'used_for' and 'used_as'. The latter has a generic target materiale (material) unless more specific uses may be indicated, e.g.: for benzina: solvente and carburante (gasoline: solvent, fuel); for kerosene: combustibile (kerosene, fuel). The relation 'used_for' is filled in a number of cases with the generic verb fabbricare (to fabricate). For other words, it has been possible to provide a more precise indication of their use, e.g.: calcestruzzo, cemento, mattone: costruire (concrete, cement, brick: to build); catrame: rivestire (tar, to cover); collante: incollare (glue, to glue).

 

Note that some entries encoded in the template Artifactual_material, e.g.: coccodrillo, lucertola, tartaruga, visone (crocodile, lizard, tortoise, mink) display a polysemic relation with the corresponding reading encoded in Animal (or its subtypes).

 

 

3.2.3. Artifacts

 

The top type Artifact subsumes a number of subtypes such as Artwork, Instrument, Vehicle, Container, Clothing, Money, Furniture, Semiotic_artifact, and the already discussed Artifactual_material. As for all top types which subsume more informative subtypes, Artifact includes only a few members, i.e. words broadly denoting artifacts (most of them are synonyms) and representing the top of the taxonomy, e.g. manufatto, strumento, utensile, apparecchio, dispositivo, arnese, attrezzo, macchina (artifact, instrument, utensil, apparatus, device, tool, implement, machine). For these word senses, Formal, Agentive and Telic relations are filled in with the most generic target SemUs. They are defined as hyponyms of manufatto (which is itself defined as a kind of entity), the Agentive relation is generically expressed by the verb fabbricare (to fabricate) and the Telic dimension is, in most cases, provided by the Telic feature ‘WVSFTelicYesPROT’.

 

Artwork is the only subtype of Artifact lacking a type defining Telic quale. A Telic relation - as well as a Constitutive one - may however be added, as optional information, as was done for dramma (drama).

 

<SemU

id="USem902"

naming="dramma"

example="i drammi di Shakespeare"

freedefinition="componimento teatrale di tono serio"

weightvalsemfeaturel="TSVP_ARTIFACT_TS_classificateur_de_nom_C TSVP_THEATER_TS_domaine_D WVSFTemplateArtworkPROT WVSFUnificationPathConcreteentity-ArtifactAgentivePROT">

<RWeightValSemU

weight="ESSENTIAL"

comment="rappresentare"

target="USemD3056"

semr="SRObjectoftheactivity">

<RWeightValSemU

weight="ESSENTIAL"

comment="atto"

target="USemD5580"

semr="SRHasaspart">

<RWeightValSemU

weight="PROTOTYPICAL"

comment="comporre"

target="USem5062"

semr="SRCreatedby">

<RWeightValSemU

weight="PROTOTYPICAL"

comment="DUMMYopera_d'arteN1"

target="USemD609"

semr="SRIsa">

<RWeightValSemU

weight="ESSENTIAL"

comment="tragedia"

target="USemD5065"

semr="SRSynonym">

<RWeightValSemU

weight="ESSENTIAL"

comment="DUMMYscenaN1"

target="USemD624"

semr="SRHasaspart">

 

 

Instrument

A virtual subtyping of the type Instrument is possible by means of the hyperonymic relation encoded in the Formal role. In the Italian Lexicon, the 'isa' relation has in fact made it possible to create an intermediate level between types and lexical units. For this purpose, we relied on dictionary definitions and their taxonomical partition. Thus, instruments are subclassified as:

 

(tool) (parallel bars, trapeze, rowing-machine, scythe, fork, axe, pincers)

(tool) (grill, cutter, drill, trowel, welder, chisel)

(tool) (nutcracker, corkscrew, mill, shears, fan)

 

Each of these subclasses subsumes word senses belonging to different domains of usage. As to the Telic dimension, inherited from the Artifact type and represented by the 'used_for' relation, it makes it possible to group instruments belonging to different taxonomies, e.g.: bidente (pitchfork) 'isa' arnese; vanga (spade) 'isa' attrezzo; zappa (hoe) 'isa' attrezzo, and all of them are linked through the same Telic: dissodare (to plough).
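The difference between the taxonomic grouping and the functional grouping can be made concrete with a short Python sketch. The three instruments and their 'isa' and 'used_for' targets are those given in the text; the tuple layout and the helper group_by are hypothetical.

```python
from collections import defaultdict

# (naming, 'isa' target, 'used_for' target), as given in the text.
instruments = [
    ("bidente", "arnese", "dissodare"),
    ("vanga", "attrezzo", "dissodare"),
    ("zappa", "attrezzo", "dissodare"),
]


def group_by(rows, index):
    """Group instrument names by the value found at the given tuple position."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[index]].append(row[0])
    return dict(groups)


by_isa = group_by(instruments, 1)    # taxonomic view: arnese vs. attrezzo
by_telic = group_by(instruments, 2)  # functional view: all under dissodare
```

Grouping on the 'isa' target splits the three words into two classes, whereas grouping on the Telic target brings them together under dissodare.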

 

 

Vehicle

Here again a virtual subpartition of the type is performed through the 'isa' relation, which makes it possible to differentiate four-wheel vehicles, i.e. autoveicoli, e.g.: automobile, ambulanza, autobus (car, ambulance, bus), from two-wheel ones, i.e. ciclomotori, e.g.: motocicletta, vespa (motorbike, vespa). Vehicles which, for different reasons, do not fit into these two well-defined classes, e.g.: carro, cingolato, astronave, bicicletta (cart, tracked vehicle, space vessel, bicycle), are assigned a more generic 'isa' relation: veicolo (vehicle). The Constitutive relation, although not marked as type defining, allows further distinctions and therefore enables the extraction of subclasses. In the 'has_as_part' relation, the targets motore (motor) and ruote (wheels) are shared by autoveicoli and ciclomotori. Air vehicles are marked as having wings besides motore. Cycles, on the other hand, are marked as having pedals besides wheels. The Telic relation of vehicles is either trasportare or viaggiare (to transport or to travel). More specific uses, such as that of navette (shuttle), could not be lexically specified. Domain information is a quite relevant feature here, since it makes it possible to distinguish the specific area of use of vehicles, i.e.: 'Road_transport', 'Bus_transport', 'Car_transport', 'Rail_transport', 'Air_transport', 'Sea_transport', etc.

 

 

Container

The lexical units typed as containers are assigned either recipiente or contenitore (container) as target of the 'isa' relation, depending on whether they contain liquids or solids, hence bottiglia (bottle) 'isa' recipiente; scatola (box) 'isa' contenitore. Generic terms denoting containers are assigned the target contenitore.

A relevant number of word meanings encoded as containers display a polysemic relation with a corresponding reading encoded in the template Amount (see above the entry for bottiglia).

 

Semiotic_artifact

This type encodes objects which are physical supports of information, e.g. libro, rivista, contratto, regolamento, lettera, documento (book, magazine, contract, regulation, letter, document). The corresponding readings denoting the information itself are typed as Information and a relation of regular polysemy holds between them. This is reflected in the selectional restrictions of the predicates: while the arg1 of leggere or scrivere (to read, to write) selects the information reading, verbs such as strappare, bruciare, portare (to tear, to burn, to carry) select the concrete, artifact one.

 

 

3.2.4. Food

 

The top type Food encodes, as usual, generic terms such as cibo, alimento, nutrimento, piatto, portata (food, aliment, nutriment, dish, course), etc. The Telic dimension is lexically represented by the semantic unit mangiare (to eat).

The type Artifact_food provides the same information as Food, plus the Agentive dimension. Thus, everything which is processed in order to be eaten - either cooked, e.g. arrosto (roast), or only prepared, e.g. insalata (salad) - is better encoded in this template. Typical verbs used as target of the Agentive relation are cucinare, preparare, cuocere, impastare (to cook, to prepare, to knead).

The type Flavouring comprises all those substances and plants which are used either to flavour or to season food. For plant-derived flavourings, the target of the 'isa' relation is aroma (spice), and a polysemic relation links the flavouring and plant readings.

 

 

3.2.5. Living_entities

 

The template Animal is used to encode classes of animals such as mammifero, insetto, rettile, anfibio, etc. (mammal, insect, reptile, amphibian). The three subtypes Earth_animal, Air_animal and Water_animal are used for describing animal-denoting nouns. Additional Constitutive information is encoded whenever it points to a characteristic feature of the animal described, e.g. tromba (trunk) for elefante (elephant). For animals living in two environments, both are marked, e.g. rana (frog) is encoded as Earth_animal and its description includes the additional feature 'habitat=water'; anatra (duck) is encoded as Air_animal, with the additional feature 'habitat=water'.
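
The marking of a second living environment can be sketched as follows. This Python fragment is a simplified illustration only: the mapping from subtype to environment, the entry layout and the function name living_environments are assumptions made for the example, while the templates and the 'habitat' feature values are those given in the text.

```python
# Environment implied by each Animal subtype (an assumption of this sketch).
TEMPLATE_ENVIRONMENT = {
    "Earth_animal": "earth",
    "Air_animal": "air",
    "Water_animal": "water",
}

# Hypothetical simplified entries; templates and features follow the text.
ANIMALS = {
    "rana": {"template": "Earth_animal", "habitat": "water"},
    "anatra": {"template": "Air_animal", "habitat": "water"},
    "elefante": {"template": "Earth_animal", "habitat": None},
}


def living_environments(entry):
    """Return the environment implied by the template plus any extra habitat."""
    environments = [TEMPLATE_ENVIRONMENT[entry["template"]]]
    extra = entry.get("habitat")
    if extra and extra not in environments:
        environments.append(extra)
    return environments


# Entries whose description covers two living environments.
two_environment_animals = sorted(
    name for name, entry in ANIMALS.items()
    if len(living_environments(entry)) == 2
)
```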

Some members of animal subtypes present a polysemic relation either with a food reading, a skin or fur reading, or with both, e.g.: agnello (lamb) is encoded as Earth_animal, Substance_food and Artifactual_material.

 

The type Human encompasses generic terms denoting humans, such as persona, uomo, donna, bambino, femmina, maschio (person, man, woman, child, female, male), characterized - except for the underspecified persona - by the features 'age' and 'sex'. These entries are assigned the semantic class 'Bio'. This template also encodes a significant number of metaphorical uses of animal names, e.g. Luca e' un orso, un leone, una volpe (Luca is a bear, a lion, a fox). These entries are easily retrievable by means of the 'metaphor' relation, which links them to the animal reading. Other metaphorical uses, linked not to animal names but rather to Social_status, e.g. nababbo, califfo (nabob, caliph), have also been encoded. Such metaphorical meanings are assigned the semantic class 'Situ'. The Constitutive relation 'has_as_property' is used to express the kind of physical or psychological property which is meant in the case of deadjectival nouns such as calvo, biondo (bald, blond).

Nouns denoting humans who are (or have been) the object of an event (result_of), e.g. laureato, inviato (graduate, correspondent), are also typed as Human.

The type Human is also where names of persons are encoded. Such entries are retrievable by means of their specific semantic class 'Individual_names'.

 

Most of the nouns typed as People have a polysemic relation with the reading encoded in the template Language, which denotes either a language or a dialect.

 

Only a few lexical items are encoded in the template Role, e.g. membro, seguace (member, follower), since its subtypes Kinship, Ideo and Social_status provide more information.

 

Kinship nominals, which are relational nouns, are encoded in the template Kinship. The members of this class subcategorize for a human argument. Hence, the following representation is given:

 

<SemU

id="USem4026"

naming="figlio"

example="Guido e' figlio di Maria"

freedefinition="ogni individuo di sesso maschile rispetto a chi l'ha generato"

weightvalsemfeaturel="TSVP_BIO_TS_classificateur_de_nom_C WVSFAgeYoungPROT WVSFSexMalePROT WVSFTemplateKinshipPROT WVSFTemplateSuperTypeRolePROT">

<PredicativeRepresentation

typeoflink="Master"

predicate="PREDfiglio-1">

<RWeightValSemU

weight="PROTOTYPICAL"

comment="persona"

target="USemD735"

semr="SRIsa">

<RWeightValSemU

weight="PROTOTYPICAL"

comment="famiglia"

target="USemD5487"

semr="SRIsamemberof">

<Predicate

id="PREDfiglio-1"

naming="figlio-1"

example="il figlio di Maria e Piero"

type="LEXICAL"

multilingual="No"

weightvalsemfeaturel="TSVP_PLUS_TS_HUMAN_T"

argumentl="ARG0figlio-1">

<Argument

id="ARG0figlio-1"

example="figlio di Maria"

semanticrolel="RoleKinship"

informargl="INFARGN2">

<InformArg

id="INFARGN2"

weightvalsemfeaturel="TSVP_PLUS_TS_HUMAN_T">
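
 

The semantic relations of such an entry can be read off the RWeightValSemU elements. The Python fragment below is a minimal sketch of how this might be done with regular expressions; it is not a tool delivered with the lexicon. The function name relations and the constant SAMPLE are hypothetical, and the sample reproduces only the two relation elements of the entry above, written on one line each.

```python
import re

# Excerpt of the entry for figlio, reduced to its two semantic relations.
SAMPLE = """
<RWeightValSemU weight="PROTOTYPICAL" comment="persona" target="USemD735" semr="SRIsa">
<RWeightValSemU weight="PROTOTYPICAL" comment="famiglia" target="USemD5487" semr="SRIsamemberof">
"""

# A start tag: element name followed by its attribute list, up to '>'.
TAG_RE = re.compile(r"<(\w+)\s+([^>]*)>")
# A single attribute of the form name="value".
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def relations(sgml):
    """Return (relation, target id, comment) for each RWeightValSemU tag."""
    found = []
    for name, attribute_text in TAG_RE.findall(sgml):
        if name == "RWeightValSemU":
            attributes = dict(ATTR_RE.findall(attribute_text))
            found.append(
                (attributes["semr"], attributes["target"], attributes["comment"])
            )
    return found


figlio_relations = relations(SAMPLE)
```

Since the whitespace between the element name and its attributes is matched with \s+, the same patterns also apply when each attribute stands on its own line, as in the listing above.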

 

Ideo encodes nouns denoting people who follow some ideological movement, e.g.: integralista, marxista, impressionista (integralist, marxist, impressionist). The target of the Constitutive relation 'is_a_follower_of' is a member of the type Movement_of_thought, e.g.: integralismo, marxismo, impressionismo (integralism, marxism, impressionism).

 

The template Social_status encodes nouns which refer to people having a special social role in different fields: religion, aristocracy, government, e.g. Papa, duca, sindaco (Pope, duke, mayor). Social_status differs from Profession in that it lacks the Telic role. In some borderline cases this difference is not easy to establish: is senatore (senator) better classified as a profession or as a social status?

 

Agent_of_temporary_activity is used to encode word meanings such as ambasciatore, messaggero, visitatore, pedone (ambassador, messenger, visitor, pedestrian), i.e. a human referred to with a particular semantic unit by virtue of the action that he is performing (or has performed, e.g.: assassino (murderer)). This action is specified as target of the Agentive relation.

 

Agent_of_persistent_activity allows to encode nouns denoting humans which have a par