Corpus description

Corpus collection

The NOCANDO Corpus is a corpus of spoken narrative texts. It was created by recording free picture-based narrations by native speakers of five languages: Catalan, Italian, Spanish, English, and German.

Speakers

The participants were mostly students at the Universitat Pompeu Fabra in Barcelona. A smaller number came from different working environments.

Catalan and Spanish speakers were undergraduate or graduate students, with a mean age of 22 for Catalan speakers (range 18 to 30) and 20 for Spanish speakers (range 17 to 29). All were from Catalonia except one Catalan speaker from the Comunidad Valenciana and one Spanish speaker from Castilla y León.

Italian, English, and German speakers were mostly recently arrived Erasmus students, with a mean age of 29 for Italian speakers (range 20 to 56), 27 for English speakers (range 20 to 41), and 34 for German speakers (range 22 to 67). Italian speakers came from different parts of Italy. English speakers came from the United States and the UK. German speakers came from different parts of Germany.

Methodology

Speakers were asked to tell a story by following the pictures of three textless picture books:

Mayer, M. (1973). Frog on his own. New York: Dial Books, Penguin.

Mayer, M. (1974). Frog goes to dinner. New York: Puffin Books, Penguin.

Mayer, M. and Mayer, M. (1975). One frog too many. New York: Dial Books, Penguin.

For 40 speakers, the recordings were made in an acoustically isolated room provided by the Universitat Pompeu Fabra. For the remaining 28 speakers, they were made with a digital recorder in a regular room at the same university.

The three books were presented to each speaker in random order. Speakers could browse each book before starting its narration.

Total number of speakers: 68
Total number of narrations: 222
Total duration: ca. 16 hours (2 to 9 minutes per narration)

	Catalan	Italian	Spanish	German	English
Speakers	19	16	13	9	11
Recording time (h:mm:ss)	4:02:43	4:04:32	2:35:20	2:09:13	2:32:20
Word count	37555	27392	25077	15944	21970 (estimated)
Segment count	5856	4306	3801	2154	3140 (estimated)

Table: Quantitative information on each language represented in the NOCANDO corpus.
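
The per-language figures in the table can be aggregated with a few lines of Python. The following is only a minimal sketch: the dictionary layout and the helper names (CORPUS, to_seconds, to_hms, totals) are assumptions made for illustration and are not part of the corpus distribution. The numbers are copied from the table, and the English word and segment counts are estimates, as noted there.

```python
# Per-language figures copied from the table above.
# The layout and names below are illustrative assumptions, not a corpus API.
# English word and segment counts are marked "estimated" in the table.
CORPUS = {
    "Catalan": {"speakers": 19, "time": "4:02:43", "words": 37555, "segments": 5856},
    "Italian": {"speakers": 16, "time": "4:04:32", "words": 27392, "segments": 4306},
    "Spanish": {"speakers": 13, "time": "2:35:20", "words": 25077, "segments": 3801},
    "German": {"speakers": 9, "time": "2:09:13", "words": 15944, "segments": 2154},
    "English": {"speakers": 11, "time": "2:32:20", "words": 21970, "segments": 3140},
}


def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(seconds: int) -> str:
    """Convert a number of seconds back to an 'h:mm:ss' string."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def totals(corpus: dict) -> dict:
    """Sum speakers, recording time, words, and segments over all languages."""
    return {
        "speakers": sum(row["speakers"] for row in corpus.values()),
        "seconds": sum(to_seconds(row["time"]) for row in corpus.values()),
        "words": sum(row["words"] for row in corpus.values()),
        "segments": sum(row["segments"] for row in corpus.values()),
    }


if __name__ == "__main__":
    t = totals(CORPUS)
    print("Speakers:", t["speakers"])               # 68
    print("Recording time:", to_hms(t["seconds"]))  # 15:24:08
    print("Words:", t["words"])                     # 127938
    print("Segments:", t["segments"])               # 19257
```

The speaker total matches the 68 speakers reported above. The summed recording time comes to 15:24:08, and the word and segment totals inherit the uncertainty of the estimated English figures.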