A treasure trove of Welsh vocabulary

20th May 2005 at 01:00
February 2005 saw the launch of Ein Geiriau Ni (Our Words), aka Egni (Welsh for "energy") - a Corpus of Children's Literature in Welsh. This comes after two years of painstaking development by the educational psychology team of Swansea Local Education Authority with the help of the National Foundation for Educational Research's (NFER) Welsh Unit - the same development team which created the All Wales Reading Test Series. But what exactly is Egni and, more importantly, how can it be of use in the teaching of Welsh?

The developers' first step was to collect almost 500 Welsh children's books, ranging in difficulty from the elementary pages of Sali Mali to key stage 3 history textbooks, although the majority were key stages 1 and 2 fiction. A panel of teachers and language experts tagged the books at national curriculum levels 1-5. They were then scanned into a database, or corpus, until the number of words exceeded 2 million, making it the largest corpus of Welsh vocabulary ever produced. Following a lengthy period of correcting the spellings which had not scanned properly, the corpus was ready for analysis.

Its most basic use is the production of word frequency lists. One complication in counting Welsh word frequencies is that a word's initial consonant can be "mutated", as when "cath" (cat) becomes "ei gath" (his cat) or "fy nghath" (my cat). This poses no problem for the corpus, as mutated forms can be counted separately or recorded under the basic "cath". Different verb forms, such as "aeth" (he/she went) and "af" (I shall go), can likewise be counted separately or gathered under the verb-noun "mynd" (to go).
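For readers who like to see the idea in code, the two ways of counting can be sketched in a few lines of Python. This is only an illustration, not how Egni itself is built: the `BASE_FORMS` table, the `count_words` function and the sample word list are all hypothetical, and a real corpus would need a far larger lookup or a proper morphological analyser.

```python
from collections import Counter

# Hypothetical lookup from mutated or inflected forms to a base form.
# Only the examples mentioned in the article are included.
BASE_FORMS = {
    "gath": "cath",     # soft mutation of "cath"
    "nghath": "cath",   # nasal mutation of "cath"
    "aeth": "mynd",     # verb form gathered under the verb-noun
    "af": "mynd",
}

def count_words(tokens, group_forms=False):
    """Count tokens, optionally folding known forms into their base form."""
    if group_forms:
        tokens = [BASE_FORMS.get(token, token) for token in tokens]
    return Counter(tokens)

# A toy list of words, not a real sentence.
sample = ["cath", "gath", "nghath", "aeth", "af"]

separate = count_words(sample)                   # each form counted on its own
grouped = count_words(sample, group_forms=True)  # forms gathered together

print(separate)
print(grouped)
```

Keeping the grouping optional mirrors the choice described above: the same text can yield either a list of surface forms or a list of base words.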

The frequency lists are often revealing. In presenting a new language to learners, the emphasis is often placed on nouns, yet of the most frequent 200 words in this large sample only 20 are nouns. The two commonest are "mam" and "dad", virtually proper nouns, while others can be used as prepositions or adverbially, such as "lawer tro" (many times). Boys' names are much more common than girls', suggesting that boys are more often the main characters in children's books. Incidentally, the most popular names for boys are Tomos and Huw, and for the girls Catrin and Llio.

Each word was tagged with the book it came from, so word frequencies can be established for particular NC levels. The frequency of word combinations is also available, and so we see that the most frequent combinations of the adjective "bach" (small) are "yn ddistaw bach" (quietly), "y tŷ bach" (the little house - or toilet), and "y mochyn bach" (the little pig), which probably comes in threes.
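A rough sketch of both kinds of count, again in Python, might look like this. It assumes each book is stored as a level number with a list of words; the `BOOKS` data, `frequencies_by_level` and `combinations_ending_with` are invented for illustration and say nothing about Egni's actual database.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (national curriculum level, list of words) pairs.
BOOKS = [
    (1, ["y", "mochyn", "bach", "yn", "y", "tŷ", "bach"]),
    (2, ["y", "mochyn", "bach", "yn", "ddistaw", "bach"]),
]

def frequencies_by_level(books):
    """Build one word-frequency counter per curriculum level."""
    by_level = defaultdict(Counter)
    for level, tokens in books:
        by_level[level].update(tokens)
    return by_level

def combinations_ending_with(books, word, size=3):
    """Count runs of `size` consecutive words that end with `word`."""
    combos = Counter()
    for _level, tokens in books:
        for i in range(size - 1, len(tokens)):
            if tokens[i] == word:
                combos[" ".join(tokens[i - size + 1 : i + 1])] += 1
    return combos

level_counts = frequencies_by_level(BOOKS)
bach_combos = combinations_ending_with(BOOKS, "bach")

print(level_counts[1].most_common(3))
print(bach_combos.most_common())
```

Because every word keeps the level of its book, the same pass over the text can feed both a per-level frequency list and a table of common combinations.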

But Egni is more than just a plaything for linguists. Amongst its practical uses are:

* Identifying a basic first 100 words and first 200 words list for first steps in reading

* Helping authors and translators of children's books to target their work at particular age groups

* Indicating the reading age and difficulty level of fictional material

* Helping to establish appropriate readability levels for Welsh-medium textbooks in different subjects

* Providing guidelines for the creation of Welsh reading tests which reflect the use of the language in real books.

Teachers will surely discover other uses for this remarkable resource, to which further books will be added as they are published. For more information visit www.egni.org.

Robat Powell is head of NFER's Welsh Unit
