Grup de Recerca per a l'Estudi del Repertori Lingüístic (GRERLI)
Spencer Corpus
The Spencer corpus is made up of 4 subcorpora:
Spanish L1. Texts obtained in Cordoba (monolingual environment) and Barcelona (bilingual environment). Spencer Project: Developing Literacy in different contexts and in different languages
Catalan L1. Texts obtained in Barcelona. Projects Discourse processing and organization of expository texts, both oral and (ref.: 1999-RED-5020-2A) and Linguistic depersonalization resources: crosslinguistic, developmental, and didactic perspectives (ref.: BSO2000-0676)
Spanish L2. Texts collected in Murcia and Madrid from subjects of Arab, Chinese, and Korean origin. Project The development of linguistic repertoire in non-native speakers of Spanish and Catalan (ref.: SEJ2006-11083)
Catalan L2. Texts collected in the Barcelona metropolitan area from subjects of Arab, Chinese, and Korean origin. Project The development of linguistic repertoire in non-native speakers of Spanish and Catalan (ref.: SEJ2006-11083)
These 4 subcorpora consist of texts produced by native and non-native speakers of Spanish and Catalan, in two registers (narrative and expository) and two modalities (oral and written), all elicited under the same production conditions (Berman and Verhoeven, 2002; Aparici, Argerich, Perera, Rosado and Tolchinsky (eds.), 2000; Tolchinsky and Rosado, 2005).
The subjects fall into 4 groups, defined by age or level of schooling: 9 years old (4th year of elementary school), 12 years old (2nd year of middle school), 16 years old (2nd year of high school) and adults (university students). The native-speaker subcorpora (Spanish L1 and Catalan L1) include the productions of 20 subjects per age group (800 texts in total), and the non-native-speaker subcorpora include the productions of an average of 10 subjects per group (450 texts in total).
Access to the Spencer Corpus (http://clic.ub.edu/es/spencer-es)
CesCa Corpus: Written Scholastic Catalan in Catalonia
The CesCa project aims to provide the educational community with a fundamental tool for understanding its students' language use: a reference corpus of written scholastic Catalan in Catalonia, together with the data derived from processing it.
The project has collected and processed 2,426 texts produced by children from the last year of early education (P5) through the last year of compulsory education (4th of ESO), from 31 regional education centers in Catalonia.
The corpus contains vocabularies produced in 5 lexical fields:
- food names
- articles of clothing
- natural phenomena
- leisure activities
- personality traits.
Here you will find organized information about:
- frequency of word use: forms and lemmas
- relationship between forms and lemmas
- distribution of lemmas by school level, by how long the subjects have spoken Catalan, and by their native language.