1. INTRODUCTION

This document is the second part of the overview of the conceptual model underlying the general purpose lexicons constructed in the course of the LE-PAROLE project.

The LE-PAROLE model is based on the GENELEX model with contributions from the MLAP/LE EAGLES project.

This report describes the PAROLE syntactic model.

Please note that the document contains many examples and SGML encodings. Most of them were extracted from PAROLE lexicons, but some were created for the purpose of illustration and do not correspond to encoded data.

2. DESCRIPTION OF SYNTACTIC UNITS

The syntactic level is where all the information about the syntactic behaviour(s) of a lexical unit is described, especially what cannot be predicted from just knowing its morphosyntactic category and subcategory. As for morphology, complex and structured objects are defined in order to support explicitly the syntactic properties of each lexical unit.

The "syntax" layer of the PAROLE conceptual level deals with:

These descriptive objects of the syntactic level consist mostly of:

  • Description (Frame) (a syntactic behaviour as the association of one Self and a Construction)
  • Self (the lexical unit characteristics or constraints)
  • Construction : list of Positions (slots) (a complementation frame inserted or not in a wider context)
  • Position (Slot) (a complement or an element of context)
  • Syntagma (Slot realization) (one of the possible surface realizations of a Position; it can bear complex information and, when non-terminal, it can be structurally constrained as a list of Positions)
  • Typed-feature (restriction to be added on SyntagmaNTC or SyntagmaT or on Self)
  • Association of Frames in Related objects

The basic descriptive element of a syntactic behaviour is a Description.

    2.1 Articulation with the Morphological Layer

    The morphological layer of the model describes lexical units from a morphological viewpoint.

    The description of lexical units also requires the representation of other types of information, for example concerning the particular syntactic behaviour of the entry. The objective here is to record as accurately as necessary the specific characteristics that distinguish it from the general behaviour associated with its grammatical category assigned in the morphological layer. For instance, we know that verbs have one subject and 0 to n complements. We will thus have to specify the number of complements and their nature. The syntactic layer of PAROLE is dedicated to the recording of this type of information.

    A morphological unit (Mu) having one and only one syntactic category at the level of the morphological layer (grammatical category) may have one or several "syntactic behaviours".

      Examples:

      L'homme arrive à Paris.
      (The man arrives in Paris)
      movement verb
      L'homme arrive à comprendre.
      (The man succeeds in understanding)
      modal verb
      L'homme vole une pomme.
      (The man steals an apple)
      transitive verb
      L'oiseau vole rapidement.
      (The bird flies quickly)
      intransitive verb

    The PAROLE model provides several ways for dealing with surface variation depending on how we use and combine the different syntactic objects: viz. syntactic unit, description, self, construction, position and syntagma. Since the model is not restrictive in this respect, one of the central tasks of syntactic encoding is to decide how the different surface realizations are modelled into subcategorization frames and how the resulting set of frames belonging to a particular lexical entry are organized into syntactic units.

    The PAROLE model presented concerns both simple and compound lexical units. Besides, the formalization of simple words obviously applies to the representation of the external syntax of compound words.

    Each morphological unit of type MuC (compound) and MuS (simple) may be in relation with at least one syntactic unit. If a Mu has several syntactic behaviours, then it may be in relation with several SynUs, it may be associated with only one SynU containing more than one description, or the construction associated with it may have alternating categories in its positions. In our approach, we consider that MuAff (affix) and MuCont (concatenate) do not have distinct syntactic behaviours that need to be represented in a lexicon.

    2.2 Syntactic Units

    A Syntactic Unit is equivalent to a syntactic entry or reading, and has one base description and possibly several derived descriptions.

    If the entry belongs to a major category, then the SynU gives a minimum description of its complementation pattern (complementation of the Verb, the Noun, the Adjective, the Adverb) through a base Description (attribute ‘description’). Transformed descriptions may be specified in the attribute ‘descriptionl’.

    If the entry does not belong to a major category, the SynU may express the context in which the entry inserts.

    All SynUs may bear a CombUF, a combination of four usage features: level of language (‘style’), frequency (‘frequency’), geographical variant (‘vargeog’) and datation (‘dating’).

    A SynU can be simple or compound. A compound syntactic unit is lexicalized with respect to phrase structure. It bears a Composition that contains the list of its components and the Self of its base Description has an internal structure (for a complete description of compound SynU, please refer to §2.12).

    The element TransfSynU is used to put in relation two specific SynUs; it is especially useful for derived words (e.g. cook - cooker).

    The Descriptions inside a SynU may be related by means of a FrameSet or not. The element FrameSet bears an identifier and thus it may be shared by several objects. It is used to express systematic relations between descriptions (e.g. pronominalisation).

    2.3 Descriptions

    A SynU is defined by a Base description and 0 to n related descriptions.

    Descriptions are shared objects defined by a Self and a Construction. The Self describes the morphosyntactic and semantic characteristics of the head of the SynU and the Construction specifies the complementation pattern of the head.

    A description is not intrinsically basic or transformed, but it fulfils these roles for a given SynU. The same description may play both the role of a base description for a SynU and that of a related description for another.

    A Description may not bear a Construction if one wants to describe the behaviour of the Mu without specifying its context of occurrence. But a Description consists at least of a Self.

    A Description can belong to one or more FrameSets.
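
    The relations described in §2.2 and §2.3 can be pictured with a small data-structure sketch. It is only an illustration: the Python class and field names below, and the identifiers D_trans, D_intr, SynU_A and SynU_B, are hypothetical and are not taken from the PAROLE DTD.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Self:
    """Characteristics of the head (reduced to a category for this sketch)."""
    category: str  # e.g. "V", "N", "A"


@dataclass
class Construction:
    """Complementation pattern, reduced here to its identifier."""
    id: str


@dataclass
class Description:
    """A Self plus an optional Construction; may be shared between SynUs."""
    id: str
    self_: Self
    construction: Optional[Construction] = None  # a Description may lack one
    framesets: List[str] = field(default_factory=list)


@dataclass
class SynU:
    """One base Description and 0..n related Descriptions."""
    id: str
    base: Description
    related: List[Description] = field(default_factory=list)


# Hypothetical data: the same Description object is a related description
# for one SynU and the base description for another.
transitive = Description("D_trans", Self("V"), Construction("NT162"))
intransitive = Description("D_intr", Self("V"))

synu_a = SynU("SynU_A", base=transitive, related=[intransitive])
synu_b = SynU("SynU_B", base=intransitive)
```

    Storing Descriptions as separate objects referenced by SynUs, rather than copying them, mirrors the statement that a description is not intrinsically basic or transformed but fulfils these roles for a given SynU.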

    2.4 Self

    The Self carries the information of the lexical item when inserted in the context given by the Construction.

    For compound syntactic units, Self also makes it possible to describe the internal structure of the compound unit through the ‘syntagmats’ and ‘syntagmatsl’ features.

    Given a syntactic construction, Self makes it possible to express all the characteristics of the entry for this construction as a caller through IntervConst:

  • category of the entry: it may be different from its morphological category (grammatical category of the Mu), which makes it possible to indicate a difference between a morphological category and a functional category (category indicated by the syntagmatic label of the SynU). It is thus possible to describe the adjectival behaviour of the noun "abricot" (apricot), in "la robe abricot" (the apricot dress).
  • Example:

    abricot
    (apricot)

    Self : IntervConst : A

  • Function: Self is not always the Head of the Syntagma where it is inserted. In copular constructions, adjectives can be considered as the head of an adjectival syntagma which itself depends on the copula.
  • Example:

    Il est intéressant de remarquer cela
    (It is interesting to note that)

    bc : P0 P1 P2
    P0 : PRO[LEX:il][MORPHSUBCAT:IMPERSONAL]
    P1 : V[SYNSUBCAT:COPULA]
    P2[Function:SUBJPRED]:AP
    AP : (P0) SELF P1
    Self : IntervConst[Function:HEAD]:A
    P0 : ADVP
    P1 : PP[PREP:de]

  • Thematic roles
  • Using syntactic features on SyntagmaT linked to Self allows one to encode information such as conjugation auxiliaries if it is a verb, morphological restrictions, preverbal particles, etc.
  • Example of conjugation auxiliary:

    tomber
    (to fall)

    Self : IntervConst : V[AuxFeature:ETRE]

    Example of a morphological restriction:

    lustres
    Self : IntervConst : N[TNUMBER:PLURAL]

    Example of a preverbal particle for uses of French "true pronominal verbs":

    s'en aller
    (to go away)

    Self : IntervConst : V[NPRONOMINAL:SEEN]

    2.5 Constructions

    A Construction describes the syntactic context required and/or restricted by the described entry. In other words, for verbs, a construction describes what is usually called a "complementation pattern".

    A construction is defined by:

    1. a list of positions (InstantiatedPositionC),
    2. their surface organization (OrderConstraint),
    3. a syntagmatic label (attribute ‘syntlabel’),
    4. the insertion point of the Self (attribute ‘selfinsertion’),
    5. interdependence between positions (attribute ‘solidarity’),
    6. it can also bear restricting features (SyntFeatureClosed / SyntFeatureOpen).

    The list of non-terminal syntagmatic labels is given in §2.7.3.

    A list of available syntactic features is given in §2.8.

    As there is the possibility to rewrite some Syntagmas and insert the Self (together with its immediate context) in a wider context (a tree, as deep as necessary), Self is not always inserted at the top level of the construction. So, the attribute ‘selfinsertion’ is used to specify the insertion level for the entry. It has to be noted that this rewriting possibility for wider context is generally not useful for verb description, as the verb usually does not need to be inserted in wider context. But this is very useful for other categories.

    One may also wish to express the insertion point of Self in the construction (or the phrase in which Self occurs). To do so, the attribute ‘selfinsertion’ that takes the value i is used. It means that Self is inserted before the position Pi. If Self comes after all the positions of the construction, the value of i will be the value of the last position + 1. If one does not want to record the insertion point, the attribute will not be documented.

      Example of the French Construction NT162:

    This Construction has 2 PositionCs and its Self inserts before the second position. It is used to encode verbal heads which are preceded by a subject and followed by a direct object.

      Examples:

      Paul lit un livre
      Paul reads a book

      Sgml encoding:

      <Construction
      id="NT162"
      syntlabel="Clause"
      selfinsertion="1">
      <InstantiatedPositionC
      range="0"
      optional="RATHERNOO"
      positionc="P0SynSN">
      <InstantiatedPositionC
      range="1"
      optional="RATHERNOO"
      positionc="P1SynSN"></Construction>
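
    The insertion rule for ‘selfinsertion’ can be sketched as follows. This is a minimal illustration, assuming that ranks are 0-based and contiguous as in NT162; the function name linearize and the placeholder string "SELF" are hypothetical.

```python
from typing import List, Optional


def linearize(position_ids: List[str], selfinsertion: Optional[int]) -> List[str]:
    """Return the positions in rank order with Self inserted before the
    position whose rank equals `selfinsertion`.

    If `selfinsertion` is None, the insertion point is not recorded and
    the positions are returned unchanged.
    """
    if selfinsertion is None:
        return list(position_ids)
    if not 0 <= selfinsertion <= len(position_ids):
        raise ValueError("selfinsertion must lie between 0 and last rank + 1")
    return position_ids[:selfinsertion] + ["SELF"] + position_ids[selfinsertion:]


# NT162: Self is inserted before the position of rank 1.
nt162 = linearize(["P0SynSN", "P1SynSN"], 1)

# Self after all positions: the value is the last rank + 1.
final = linearize(["P0SynSN", "P1SynSN"], 2)
```

    For NT162 this yields subject, Self, object, which corresponds to "Paul lit un livre".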

    2.6 Positions

    2.6.1 Definition

    A Position is an element entering into the definition of a construction or a non-terminal phrase.

    A Position is associated to a set of three elements:

    1. distribution (‘syntagmacl’ attribute): a position can be filled by a terminal phrase (SyntagmaT) or by a non-terminal phrase (SyntagmaNTC), which can itself be rewritten,
    2. function: mandatory in the PAROLE model,
    3. thematic roles (attribute ‘throle’): not mandatory in the PAROLE model.

    The attribute ‘repetable’ serves to encode if a position can be repeated several times in a Construction.

    Positions may be shared by different constructions although their rank (P0, P1, ...) in these constructions may vary.

    2.6.2 List of available functions

    Here follows the final list of functions used in the PAROLE project. Not all functions are used by all Partners; please refer to P-WP1.1-MEMO-ERLI-5 V2: "Annex to TA: Encoding features and values for the morphological layer in the lexicon merged tags" for more information on this point.

    HEAD, SUBJECT, OBJECT, INDIRECTOBJECT, OBLIQUE, SUBJPRED, OBJPRED, NCOMP, NSUBJ, NOFCOMP, NPREPCOMP, NAPPOSITION, NADJUNCT, NCLAUSCOMP, NDETERMINATIVE, NATTRIBUTIVE, NMODIFIER, ACOMP, APREPCOMP, ACLAUSCOMP, AADJUNCT, AMODIFIER, ADVCOMP, ADVPREPCOMP, ADVMODIFIER, DETMODIFIER, PREPDEPENDENT, CONJDEPENDENT, PREPOBJ, ADVERBIAL, COMPL, CLAUSCOMP, NGENATTRIBUTIVE, NLEFTATTRIBUTIVE, NPOSTPCOMP, NRIGHTATTRIBUTIVE, REALSUBJ.

    2.6.3 Link between Positions, SyntagmaNTC and Constructions

    Contrary to the GENELEX model, Positions are not directly linked to Constructions or SyntagmaNTC. This link is ensured through InstantiatedPositionC objects.

    If a SyntagmaNTC or a Construction consists of 2 Positions, then 2 InstantiatedPositionCs are linked to the SyntagmaNTC or the Construction. InstantiatedPositionC objects encode:

    1. the corresponding position,
    2. the rank of the position in the SyntagmaNTC and/or the Construction (‘range’ attribute),
    3. the optionality of the Position (‘optional’ attribute).

    Example of the French Construction NT162 that has 2 PositionCs and 2 InstantiatedPositionCs (please refer to §2.5 for an explanation of the Construction):

      Sgml encoding:

      <Construction
      id="NT162"
      syntlabel="Clause"
      selfinsertion="1">
      <InstantiatedPositionC
      range="0"
      optional="RATHERNOO"
      positionc="P0SynSN">
      <InstantiatedPositionC
      range="1"
      optional="RATHERNOO"
      positionc="P1SynSN"></Construction>

      <PositionC
      id="P0SynSN"
      comment="Fonction attribuee"
      function="SUBJECT"
      syntagmacl="SynSN">

      <PositionC
      id="P1SynSN"
      comment="Fonction attribuee"
      function="OBJECT"
      syntagmacl="SynSN">
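
    The indirection through InstantiatedPositionC can be illustrated by resolving each instance to the PositionC it references. The sketch below uses the identifiers and values of NT162; the Python classes, the field name rank (standing for the SGML attribute ‘range’) and the function resolve are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PositionC:
    id: str
    function: str
    syntagmacl: str


@dataclass(frozen=True)
class InstantiatedPositionC:
    rank: int        # value of the SGML attribute 'range'
    optional: str    # optionality value, e.g. "RATHERNOO"
    positionc: str   # identifier of the referenced PositionC


def resolve(
    instances: List[InstantiatedPositionC],
    positions: Dict[str, PositionC],
) -> List[Tuple[int, str, PositionC]]:
    """Return (rank, optionality, PositionC) triples sorted by rank."""
    resolved = []
    for inst in sorted(instances, key=lambda i: i.rank):
        if inst.positionc not in positions:
            raise KeyError(f"unknown PositionC: {inst.positionc}")
        resolved.append((inst.rank, inst.optional, positions[inst.positionc]))
    return resolved


positions = {
    "P0SynSN": PositionC("P0SynSN", "SUBJECT", "SynSN"),
    "P1SynSN": PositionC("P1SynSN", "OBJECT", "SynSN"),
}
nt162 = [
    InstantiatedPositionC(1, "RATHERNOO", "P1SynSN"),
    InstantiatedPositionC(0, "RATHERNOO", "P0SynSN"),
]
functions = [pos.function for _, _, pos in resolve(nt162, positions)]
```

    Because rank and optionality live on the instance and not on the PositionC, the same PositionC can be reused by another Construction at a different rank.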

    2.7 Syntagma

    2.7.1 General definition

    A position may be filled either by one or several terminal phrases (SyntagmaT) or by one or several non-terminal phrases (SyntagmaNTC).

    A phrase occupying a position is formally described by a terminal or non-terminal syntagmatic label with which a set of constraints is associated if any.

    SyntagmaNTCs and Constructions share the same definition, they have the same attributes and they are connected to the same objects. Please refer to §2.5 for information concerning the structure of SyntagmaNTC.

     

    SyntagmaT are defined by:

    1. a terminal syntagmatic label,
    2. syntactic features (not mandatory).

    2.7.2 Syntagmatic labels for SyntagmaT

    Here follows the final list of terminal syntagmatic labels used in the PAROLE project. Not all labels are used by all Partners; please refer to P-WP1.1-MEMO-ERLI-5 V2: "Annex to TA: Encoding Features and values for the morphological layer in the lexicon merged tags" for more information on this point.

    Available values:

    V, N, A, PRO, ADV, CONJ, ADP, DET, ART, NUM, RES, UNIQUE, INTER, ADADJ, POSTADV, E.

    2.7.3 Syntagmatic labels for SyntagmaNTC

    Here follows the final list of non-terminal syntagmatic labels used in the PAROLE project. Not all labels are used by all Partners; please refer to P-WP1.1-MEMO-ERLI-5 V2: "Annex to TA: Encoding Features and values for the morphological layer in the lexicon merged tags" for more information on this point.

    Available values:

    NP, VP, PP, AP, ADVP, Clause, NG, DETP, PSP, WITHOUTE.

    2.7.4 Alternatives of realization

    A Position can be filled by one or more Syntagma types. This is useful to encode alternatives of distribution, that is to say distribution paradigms.

    For instance, if a lexicographer wishes to encode that a transitive verb can either take a clausal direct object or a nominal direct object, he can do so by linking 2 SyntagmaNTCs to the same position.

    Example of the Catalan verb sentir, which can take as a complement either a VP or a VP introduced by a:

     

    Examples:

      He sentit a dir que apujaran els preus.
      I have heard that there will be a rise in the prices

      He sentit dir que apujaran els preus.

      Construction of the verb sentir (hear):

      <Construction
      id="CSnOinfA"
      selfinsertion="1"
      syntlabel="Clause">
      <InstantiatedPositionC
      range="0"
      optional="NOO"
      positionc="Snp">
      <InstantiatedPositionC
      range="1"
      optional="NOO"
      positionc="OvpinfA"></Construction>

      Position that encodes the alternatives of realizations:

      <PositionC
      id="OvpinfA"
      function="OBJECT"
      syntagmacl="VPinf VPinfA">
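
    A reader of such a Position might recover the alternatives as follows. This sketch assumes that the value of ‘syntagmacl’ is a whitespace-separated list of Syntagma identifiers, as the example above suggests; the function names are hypothetical.

```python
from typing import List


def alternatives(syntagmacl: str) -> List[str]:
    """Split a 'syntagmacl' value into the identifiers of its fillers."""
    return syntagmacl.split()


def accepts(syntagmacl: str, candidate: str) -> bool:
    """True when `candidate` is one of the possible fillers of the position."""
    return candidate in alternatives(syntagmacl)


ovpinfa = alternatives("VPinf VPinfA")
```

    A Position with a single filler, such as syntagmacl="SynSN", is then simply the one-element case of the same list.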

    2.7.5 Structure constraints

    In most cases, the label referring to the phrase is sufficient to describe it as a position filler, and no constraint on its structure needs to be expressed for the described entry.

    However one may need to express constraints on the structure in which a lexical entry inserts.

    To do so, it is possible to use:

    1. either syntactic sub-category features,
    2. or lists of embedded positions that make it possible to describe n-depth syntactic trees; in that case, this is a tree-structured rewriting of phrases.

    2.7.5.1 Tree-structured rewriting

    2.7.5.1.1 Description

    Positions can be rewritten recursively because the SyntagmaNTC and the Construction both have the same definition.

    2.7.5.1.2 Examples of use

    AP will be rewritten in French as follows:

     

       

    The optionality of the ADVP pre-modifier is indicated using parentheses.

  • description of left-positioned attributive adjective
  • In these schemata, the star (*) encodes the fact that the position can be repeated.

  • description of right-positioned attributive adjectives
  • Example:

  • description of subject predicative adjectives
  • Example:

    2.7.5.2 Partial rewriting of Phrases

    For some entries we may want to express partial restrictions on a phrase without having to rewrite it completely, because it is not always possible; for instance, in the case of a verbal phrase, we do not know how many positions there are if we do not know the head.

    In that case, the structure of a "prototypic" phrase (list of positions) is maintained, but certain position fillers are restricted both at the level of their list (removal of fillers) and at the level of constraints (addition of restricting features) on each.

    The solution is then to give only the list of positions that is restricted or the type of fillers, and to use the attribute ‘positionl’, whose values (OPEN, CLOSED) specify whether the list of rewriting positions specifies the rewriting of the phrase entirely (CLOSED) or partially (OPEN).
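
    One possible reading of OPEN and CLOSED is sketched below. It assumes that positions are compared by a simple label, that CLOSED requires the listed positions to be exactly those of the phrase, and that OPEN only requires them to occur in the same relative order among possibly more positions. This interpretation and the function names are assumptions made for illustration, not part of the PAROLE specification.

```python
from typing import Sequence


def is_subsequence(listed: Sequence[str], observed: Sequence[str]) -> bool:
    """True when `listed` occurs within `observed` in the same relative order."""
    remaining = iter(observed)
    # `item in remaining` consumes the iterator up to the first match,
    # so later items must be found after earlier ones.
    return all(item in remaining for item in listed)


def rewriting_matches(
    listed: Sequence[str], observed: Sequence[str], positionl: str
) -> bool:
    """Check an observed list of positions against a (partial) rewriting."""
    if positionl == "CLOSED":
        return list(listed) == list(observed)
    if positionl == "OPEN":
        return is_subsequence(listed, observed)
    raise ValueError("positionl must be OPEN or CLOSED")


open_ok = rewriting_matches(["OBJECT"], ["SUBJECT", "OBJECT"], "OPEN")
closed_ok = rewriting_matches(["OBJECT"], ["SUBJECT", "OBJECT"], "CLOSED")
```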

    2.8 Features

    Features are restrictions added to the syntagmatic label in the specification of a Phrase.

    The different usable types and sub-types of features, their attributes and the values they can take are presented below.

    Because time passed between the specification of the features and the encoding, some features were ultimately not used by the partners. For these features, only the available values are indicated; no explanation or SGML encoding is given.

    2.8.1 Lexical features

    Lexical features (LexFeature) make it possible to constrain all or part of the lexicalization of a phrase.

    To do so, one has to specify:

    There are two types of lexical features: introducers and the LEX feature itself.

    2.8.1.1 Introducers (INTROD, PREP, CONJ, RELPRO, INTPRO and POSTP)

    These features make it possible to specify the lexicalization of phrase introducers without having to rewrite them. They are not ambiguous and only apply to non-terminal phrases (SyntagmaNTC).

    6 types of features are distinguished:

    1.- PREP is used for the preposition introducing a non-terminal phrase, typically a Prepositional Phrase

    Example of the encoding of the adjunct introduced by about in English:

    Sgml encoding:

    2.- CONJ is used for the conjunction introducing a sentence typically in a that-clause.

    Example of the encoding of the que that introduces that-clauses in French:

    Sgml encoding:

    3.- RELPRO is used for the relative pronoun introducing a sentence

    4.- INTPRO is used for interrogative pronouns that can be specified by verbs requiring interrogative clauses (mainly verbs of speech).

    Example of si (whether) that introduces interrogative clauses of verbs like demander (ask):

    Sgml encoding:

    5.- INTROD is used for any non-terminal phrase introducer (particle, "recategorizer", etc.) not belonging to any of the other types.

    In the Danish Lexicon, the value at for the INTROD feature (corresponding to English that) encodes the presence of the introducer at in subordinate clauses.

    Example:

    Sgml encoding:

    6.- POSTP

    2.8.1.2 LEX

    The LEX feature makes it possible to specify the lexicalization of:

    A particular case of restriction is when the phrase is saturated by its head, i.e. the noun phrase is limited to the noun. To express this property, we use the attribute ‘saturesynt’.

    When it has the value YESSA, the head of the phrase is the only leaf with the Introducer, if any. This applies to phrases entering in the definition of simple and compound SynUs.

    The field saturesynt must always have the value YESSA when the feature applies to a terminal category: by definition, the leaf is saturated by the lexicalizing element.

    In the rewriting alternative, we will use the LEX feature (see next paragraph) on terminal phrases.

    Example of the French verb accepter (accept), which subcategorizes for a VP complement introduced by de:

    Example:

    Sgml encoding:

    2.8.1.3 Cooccurrence of lexical features

    A particular phrase can bear one and only one LEX feature, which lexicalizes the terminal phrase in the one case, and the head, and only the head, of the non-terminal phrase in the other case.

    Since each non-terminal phrase has only one introducer, a phrase can bear only one feature specifying its introducer.

    For non-terminal phrases, however, an introducer may be combined with the LEX feature.

    These remarks are also valid by reflection for the corresponding RefLex features which are described in the chapter on compound syntactic units (cf. §2.12).
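
    The co-occurrence constraints of §2.8.1 can be restated as a small consistency check. The sketch below is only an illustration: the function name, the representation of features as (name, value) pairs and the wording of the messages are hypothetical.

```python
from typing import List, Tuple

# Introducer feature names listed in §2.8.1.1.
INTRODUCERS = {"INTROD", "PREP", "CONJ", "RELPRO", "INTPRO", "POSTP"}


def check_lexical_features(
    terminal: bool, features: List[Tuple[str, str]]
) -> List[str]:
    """Return the list of violations (empty when the combination is allowed)."""
    names = [name for name, _ in features]
    problems = []
    if names.count("LEX") > 1:
        problems.append("more than one LEX feature")
    introducers = [name for name in names if name in INTRODUCERS]
    if len(introducers) > 1:
        problems.append("more than one introducer feature")
    if terminal and introducers:
        problems.append("introducer on a terminal phrase")
    return problems


# A non-terminal phrase may combine one introducer with one LEX feature.
# The LEX value "partir" is hypothetical.
combined = check_lexical_features(False, [("PREP", "de"), ("LEX", "partir")])
# Introducers only apply to non-terminal phrases.
on_terminal = check_lexical_features(True, [("PREP", "de")])
```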

    2.8.2 Morphological features

    They make it possible to express a restriction on the value of a morphological feature of the phrase or of one of its components: MOOD (mood of the verb), TENSE (tense of the verb), PERSON (person), GENDER (gender), TNUMBER (number), POSSESSOR (number of the possessor) and GENDERPOSS (gender of the possessor).

    MOOD, TENSE, PERSON, GENDER, TNUMBER and POSSESSOR are features that can combine with one another to form a specific combination of morphological features.

    Here follows the list of available values for morphological features as they are specified in the document P-WP1.1-MEMO-ERLI-7(V3): " List of feature names and feature values used in the PAROLE lexicon DTD ".

    MOOD:

    Available values:

    Example of Italian verbs that sub-categorize for a that-clause whose Mood is indicative:

    Example:

    Sgml encoding:

    TENSE:

    Available values:

    PRESENT, IMPERFECT, FUTURE, PAST, PLUSQUEPARFAIT.

    In the Danish Lexicon, the value PAST of the TENSE feature is used for verbs to mark constructions with past participle.

    Example:

    jeg mindes ikke brevet sendt
    I do not remember that the letter has been posted

    Sgml encoding:

    PERSON:

    Available values:

    In the Spanish lexicon, PERSON is used to distinguish 'terciopersonal' verbs; in this case, the value is "3". These verbs do not inflect for person and always occur in the third person singular form. They include:

    (a) 'meteorological' verbs

    Example:

    llover
    rain

    (b) verbs (exclusively) taking sentential subjects

    Example:

    me consta que le gustas
    I know that he likes you

    (c) and impersonal verbs (no subject) such as bastar

    Example:

    me basta con eso
    to me suffices with that

    Sgml encoding:

    GENDER:

    Available values:

    MASCULINE, FEMININE, NEUTER, GCOMMON, MF, CONT, INDISCRIMINATE, OO, INANIMATE, NONMASCULINE, NONNEUTER.

    GENDER is used for the description of impersonal adjectives in the Greek Lexicon. In Greek, besides impersonal verbs, there are two kinds of impersonal expressions, both introduced by the verb "eimai" (be) in the third person singular, followed by either a noun or an adjective in a specific form: both the noun and the adjective must be in the singular nominative case, and, furthermore, the adjective appears only in the neuter gender. Impersonal expressions subcategorise for clausal complements.

    Sgml encoding: