Preparing reading schemes

By Peter Viney, author of Fast Track to Reading and several of the Oracle Readers. Peter is also the Series Editor for the Oracle Readers.

When you set up a graded reading scheme at any institution, you will be using books from a number of different schemes from different publishers, and the primary task is to impose your own ‘super grading scheme’ which ties all the different series together related to your teaching levels. We used to do this by putting coloured stickers on the books to create our own six levels. I mentioned this recently, and that it took some time to balance the levels between different publishers’ schemes, and a teacher said, ‘Surely that’s easy. All the 500 word lists must be the same nowadays. Surely you just take the first five hundred words of the British National Corpus for your 500 list, the first one thousand for your 1000 list and so on.’

Illustration from the Oracle Reader, The Case of the Dead Batsman
Not so. In practice, each publisher guards their grading schemes jealously. In the 1970s and 1980s, it was common for publishers to produce useful handbooks, listing the contents of word counts for different levels, and also listing the structural grading for each level, as well as listing other considerations such as sentence length, number of sub-clauses allowed and so on. Nowadays, the tendency is to say ‘This is a 500 word count book. It’s Council of Europe A1. It contains 4,627 separate words. It’s graded structurally, but we’re not going to tell you how, or what the 500 words are. Trust us.’

Such information is useless. Council of Europe A1 covers at least three levels in most school courses, and the only reason they tell us that it has 4,627 words is that the word processor tells them that at the push of a button.

The list of the most frequent 500 words in the British corpus, or the American corpus is of interest. However, it cannot dictate the contents of a wordlist for a graded reading scheme. Any scheme will also need to be structurally graded in some way. Teachers might argue for weeks on the best order of presentation for various structures, but the vast majority will agree that you teach the present tenses first, before past tenses. You’ll cover the past simple before teaching the present perfect, which will come before the past perfect. If we tried to define it in any more detail than that we would run into arguments and opinions. Broadly, there will be a consensus on that progression. When you look at a frequency list, you will find that words like went, would, should, gone, score highly for frequency. However, you could not integrate them into a reader for students who had not yet reached the past simple in their studies. Complexity will outweigh frequency.
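The idea that complexity outweighs frequency can be put as a small sketch. Everything in it is an assumption for illustration only: the frequency ranks are invented, the four stages simply follow the broad progression described above, and `words_for_stage` is a hypothetical helper, not part of any publisher's scheme.

```python
# Broad teaching order described in the text (present before past simple, etc.).
STAGES = ["present", "past simple", "present perfect", "past perfect"]

# Invented sample data: (word, frequency rank, stage the word requires).
CANDIDATES = [
    ("go", 1, "present"),
    ("went", 2, "past simple"),
    ("gone", 3, "present perfect"),
    ("like", 4, "present"),
]

def words_for_stage(candidates, stage_reached):
    """Return candidate words, most frequent first, that need no structure
    beyond the stage the students have reached."""
    limit = STAGES.index(stage_reached)
    ordered = sorted(candidates, key=lambda c: c[1])
    return [word for word, _rank, stage in ordered
            if STAGES.index(stage) <= limit]

print(words_for_stage(CANDIDATES, "present"))      # ['go', 'like']
print(words_for_stage(CANDIDATES, "past simple"))  # ['go', 'went', 'like']
```

However high went ranks in this made-up table, it stays out of the list until the students have reached the past simple.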

A frequency list will indicate that like is highly frequent in spoken English, and in this current use: My new laptop’s like really cool. It’s got like a small kinda screen, and a sort of apple thingy in the lid that like kinda lights up when you, erm, like, open it. It’s a different use than the grammar book I like tea, but I don’t like coffee. If we want to replicate modern spoken English, students need to be able to understand it receptively, or rather to understand that like is a meaningless noise like um or er. However, students aren’t native speakers, and there’s no virtue in teaching them how to be inarticulate in a foreign language, just because many native speakers are inarticulate. It’s an important receptive point, but of no productive value. Frequency among native speakers would not dictate its place in a scheme.

Similarly, the fifty most-used words on Google include contact, copyright, privacy and rights. On the surface, contact looks useful, but it’s there because virtually every website has a box with Contact us (which promotes us to 20th most frequent word). Copyright, privacy, and (digital) rights are all there in an increasingly futile bid to preserve copyright.

Graded readers (unless written as cartoon strips, which is an effective idea at lower levels) will require narrative language. Connecting devices are important, so connectives such as because, but, before and then will be included even in the earlier levels.

Word counts

How are wordlists prepared, and how do they relate to your teaching? The Extensive Reading Foundation (ERF)  has a grading scale to help teachers equate various word counts with levels.

The ERF website also contains a bibliography of articles on extensive reading, and a list of the winners and nominees in the annual Language Learner Literature Awards. If you’re starting a library, these lists will aid selection. An international jury selects the winning books in five categories, taking into account the internet comments of teachers and students around the world. You and your students can comment on the 2010 nominations from April.


When we were working on our own word lists for Streamline Graded Readers, which became Storylines and has now been adapted extensively into Garnet Oracle Readers, we had all the reference books and frequency counts that we could lay our hands on. A General Service List of English Words, The Cambridge English Lexicon, The Leuven English Teaching Vocabulary List, The Threshold Level, English Grammatical Structure (which related sets of words to structures for creating exercises), the defining vocabularies for various learners’ dictionaries, and just about every other graded reading scheme from L.A. Hill’s ancient lists for OUP to the Ladybird list for children. We had the vocabulary indexes (or indices, which is a word fast going out of use!) for the coursebooks themselves, as well as those for other coursebooks. We had about ten ELT dictionaries, three Thesauruses (is that the plural of Thesaurus? I don’t know), and a list of over 200 international words we had compiled while we were writing Streamline Departures and Connections. It has been stated that the average beginner ‘knows’ or can readily guess around 200 words which are used in English.

The deeper we looked into vocabulary lists, the more holes we found. The fun part of working on graded readers is writing the stories, not the wordlists, but for the series editor, that has to be the starting point.

The first issue is that these wordlists were ancient, even twenty years ago. Many of them were evolved in some way from A General Service List of English Words by Michael West, published originally sixty-six years ago in 1947, and revised many times after that. West’s pioneering work gave a list of 2000 words with notes on different uses, and the comparative frequency of each use. The samples were written English and the list showed its age. For example, it held that seldom was more frequent than rarely which was more frequent than hardly ever. Everybody we asked rated them in the reverse order.

We wanted to create a word list that combined frequency with narrative usefulness, a word list that was clearly related to the structure list, and a word list that would work in a contemporary setting. Everybody who had compiled a word list or a frequency count seemed to have done some rather strange things.

L.A. Hill created some of the most popular graded reading series of all time for OUP, which were still heavily used in the 1960s and 1970s. Elementary Stories for Reproduction was the best-known. So, what were words like lion, mouse and donkey doing in a 1000 word count?  Hill was fond of rewriting Nasreddin stories, and donkey is a pretty useful word in the circumstances. There is a virtue in using already known, classic stories, and if you follow that route, words which are not especially useful in everyday life in the 21st century suddenly become important.  Classic stories need swords, helmets, spears, axes, bows, arrows, donkeys, frogs, foxes, lions, horses, dragons, kings, queens, princes, princesses, beggars, emperors, caliphs, wizards, genies, ghosts, palaces, castles, fishermen and fishing nets. Most of these are easy to incorporate in illustrations, and most graded readers have illustrations.

The word list for Longman Structural Readers thirty years ago was far more mundane and included chalk, root, field, grass and supper at level 1. Inexplicably, pine (a kind of tree) was included at level 3, supermarket wasn’t in at all, computer was level 5, police came way down in level 4, and television was rejected from level 1. (It joined leaf in level 2.) There must have been a feeling that classroom language was important (or already known), so that words like teacher, blackboard, board, chalk were put into the first level. Age was not an excuse. In the mid-70s when it was at its height of popularity, police and television would be level 1 for most schemes. The series went out-of-print with select titles being rewritten and re-graded for Penguin Readers (which had their own more modern lists). Note that a somewhat-creaky wordlist doesn’t prevent authors from writing good stories.

The Cambridge English Lexicon by Roland Hindmarsh, published in 1980, was very much an update of West. Hindmarsh had 4500 words divided into six levels. He had not included across, in spite of, week or video. The list had grocer at level 1, baker at level 2, but supermarket was held back to Level 5, which was already way out of date in 1980, let alone thirty-four years on. It was hard to see why bite or board had been included in level 1. We couldn’t work out why menu should be level 4, and soup level 2.

Why did the 2000 word defining vocabulary for the Longman dictionaries contain words like donkey and Hinduism in the first edition (1978), but not in the later editions? Defining vocabularies are somewhat different. Every word in a learners’ dictionary is either explained from a list of 2000 words, or later with OUP, 3000 words, or illustrated. Defining vocabularies need low-frequency words like root to explain other words. A potato/turnip/carrot is a root vegetable. It’s hard to see how useful donkey might be as a defining word. The only word that comes to mind is mule (a mule is a cross between a horse and a donkey), or possibly stubborn as in Donkeys are stubborn animals. Lists go out of date. A frequent word in lists is secretary, a job which is never advertised nowadays. It was changed to personal assistant many years ago, except for the exalted ministerial posts of Home Secretary, Foreign Secretary and Secretary to the Treasury in the highest echelons of government.

Secretary was also held to be sexist, because a woman might be called a secretary, whereas a man doing much the same job might be called an assistant manager. This is culture-bound in English speaking countries. In many parts of the world, the majority of secretaries are male and are called secretaries too. More to the point is that computers have meant that most people have to organize their own jobs and type their own communications, rendering the job of someone who organized things for someone else redundant. Gender-marked words for jobs are eliminated from modern lists. Waiter and actor are the words for both males and females, and waitress and actress are no longer used. It’s certainly true that I have only ever heard female actors use the word actor to describe themselves in the last twenty years, though as Stephen Fry said, that has never stopped a female actor happily accepting the Best Actress award in the Golden Globes or the Oscars. Waitress is one that persists in spite of equal opportunities job descriptions rejecting it. The mental picture of a waiter is still male to most people. I prefer the American term server for a non-gender marked version, but it still sounds odd in a British context.

In 1989, Dee Reid’s Word for Word analysed the top 2000 words used by native-speaker 7 to 8 year olds. This takes a different angle, being based on ‘pupil use’ rather than frequency outside the classroom, and it threw up several surprises. Words like giant, dinosaur, monster and bubbles appeared, which had not turned up in any of the earlier lists. As we saw from L.A. Hill’s lists, animals are important in children’s stories, hence all those lions, mice and donkeys. Nowadays, the taste in animals is more exotic, so we have dinosaurs, pandas and (since Jurassic Park) raptors. Disney’s massive marketing campaign for ‘Princess’ toys and clothes has made princess a major word for small kids. Most two year olds know it.

When you compare the lists for various schemes, some are more rigid than others. For example, Storylines had foot and ball in the early lists, and so football could then be assumed. In some schemes, football is counted as an extra word. In both Storylines and Bookworms the prefix grand- was counted as one word, which allowed all the compounds with other words in the list: grandfather, grandchild, grandmother, grandson etc. In Penguin’s list, each was a separate word. That means that some 500 word lists are easier than others, as they have fewer truly different items.

This impacts on structure. For a zero beginner, am, is, are, was, were, be, being, been are seen as eight separate words. Teachers will say, no, according to structural complexity there are three items: 1) am/is/are 2) was/were 3) be/being/been. Some word counts will say, no, these are all aspects of one word, be. This is what is meant by a headword in dictionaries: it is the word that heads an entry, and be is a headword. When word counts are discussed there is often confusion as to whether we are discussing a 500 word count or a 500 headword count. Be is the most awkward example. If we take close, it’s apparent that close/closes/closing and closed are all examples of one headword. With eat/eats/eating/ate/eaten it is less obvious.

Most reading schemes are fuzzy on the distinction between a word and a headword, and for sound practical reasons. All agree that the contracted negatives don’t count as separate words, so that is includes isn’t (as long as not is in the list … but it will be. Not appears in primary 50-word counts.) Most accept that has, have, hasn’t, haven’t can be counted as one word, but some declare that had is additional. Another area for discussion is irregular past forms and past participles. Are they separate words? Some will say ‘yes’, others ‘no’ and some ‘sometimes’. The argument would be that built can be deduced from build, but that the highly frequent went and saw are sufficiently far from the root word to be counted separately.
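The gap between a word count and a headword count is easy to see in a rough sketch. The mapping table and the function name `count_forms_and_headwords` are hypothetical, made up for this example, and the `separate` option stands in for the schemes that count some irregular past forms on their own.

```python
# Hypothetical table mapping word forms to headwords. A real scheme would
# have its own, far larger table and its own rules.
FORM_TO_HEADWORD = {
    "am": "be", "is": "be", "are": "be", "was": "be", "were": "be",
    "be": "be", "being": "be", "been": "be",
    "close": "close", "closes": "close", "closing": "close", "closed": "close",
    "eat": "eat", "eats": "eat", "eating": "eat", "ate": "eat", "eaten": "eat",
}

def count_forms_and_headwords(forms, separate=frozenset()):
    """Return (distinct word forms, distinct headwords).

    Any form listed in `separate` is counted as a headword in its own
    right, the way some schemes treat irregular pasts.
    """
    distinct_forms = {form.lower() for form in forms}
    headwords = set()
    for form in distinct_forms:
        if form in separate:
            headwords.add(form)
        else:
            # Forms missing from the table count as their own headword.
            headwords.add(FORM_TO_HEADWORD.get(form, form))
    return len(distinct_forms), len(headwords)

be_forms = ["am", "is", "are", "was", "were", "be", "being", "been"]
eat_forms = ["eat", "eats", "eating", "ate", "eaten"]

print(count_forms_and_headwords(be_forms))                              # (8, 1)
print(count_forms_and_headwords(eat_forms))                             # (5, 1)
print(count_forms_and_headwords(eat_forms, separate={"ate", "eaten"}))  # (5, 3)
```

The same five forms of eat cost a scheme one slot or three, depending on a decision the publisher rarely tells you about.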

Words may appear twice or three times, counted separately. So orange may appear twice, counted as orange: noun and orange: adjective, or in everyday terms, orange: fruit and orange: colour. Pound might appear as pound, noun, money then pound, noun, weight and pound, verb (He pounded his fist on the door). The fourth use pound, noun as in The horses were kept in a pound and its related verb form The goods were impounded by customs officials are probably outside any scheme. In editing, we often noticed authors slipping the obscurer meaning of a word into stories. Only the defined meanings in the wordlist count.

An area where frequency falls down is efficiency. It would be inefficient to include both close and shut in a low-level wordlist. Both are frequent, but the author of a graded reader will be able to compose the story using either one. Having both is a waste of a word. By the time you get to a 1200-word list, you would have both, but below that one of the two will suffice. Similarly, a scheme won’t need to include usually, normally and generally or both hit and beat, or chair and seat. At the earliest levels, the choice would be made on the grounds of cover, that is, in how many situations you could use the word. Therefore seat has greater cover than chair. You have seats in planes, trains, living rooms, kitchens, waiting rooms, parks etc. Chair is more limited, and you can write seat instead of chair in most stories at the lower levels. If the author prefers chair (maybe the story involves an electric chair or a sedan chair (!) or is about a furniture store) then it becomes one of the additional words. Most schemes like to have both pen and pencil in level one. Think. If a character needs to write something in a story, either will suffice. I’d choose pen, then if the story really needs a pencil, it’s an extra word, though given the love of primary courses for illustrating classroom stationery, I doubt that many students don’t know it.

Another pitfall is the tendency to pair words. Push and pull, yes, fine. But do you need every word in a set? Knife, fork, and spoon like to travel in a group of three, but in terms of both frequency and narrative usefulness, knife is by far the most important, then spoon weighs in as much more useful than fork. I cast a retrospective eye on my old Storylines word list when I compiled the Garnet Oracle Readers list. There are several words there that I don’t think ever got used in our sixteen readers. The pair king and queen stands out. Horse was only used once. It’s the kind of word you illustrate if you need it. Prison appeared in stories a couple of times, but was not valuable enough to include as part of a 750 word list. It’s in other schemes too. Hotel was in there, but it’s so international that its space could be taken more usefully by a word like maybe.

I wonder how often schemes compare their vocabulary lists with the books that have already been published in the scheme? When I revised Streamline Graded Readers into Storylines then Garnet Oracle, I took into account words which had been needed in glossaries more than a couple of times (add them), as well as words that apparently hadn’t been needed at all (take them out). An example is cat. Most schemes have dog and cat at an early level. A retrospective view showed that dog had been used several times, but cat hadn’t. If you need a cat in a story, there’s no problem, but is it needed in the word count? Here we get into defining vocabulary for glossaries, where ‘a big cat’ can be used to explain lion, tiger, leopard. It doesn’t differentiate them though, and I can’t think that a story with a plot hinge on these animals would fail to illustrate them.

When you’re trimming vocabulary lists to hit a magic round number (no one advertises a 407 word count), another factor is how specific a word is. Take words like bridge or pilot. Bridge or pilot is either going to be in a particular story or not. This means glossary, not basic word list. That leads me to eliminate specific fruits (apple and banana are in many lists). I can’t think that in an age of iPods and iPads there are many learners out there unaware of Apple and its logo. While apple was not international, Apple has made it so. On the other hand, I’d be wary of including an iPod or iPad in a story, international as they might be. You have to think ten years ahead, as graded readers live long lives in dusty school cupboards. It wasn’t so long ago that Walkman was an important international word, and cassette was in the original Streamline Graded Readers earliest level. Will we really be using iPods in ten years’ time?

Schemes vary in the rigidity of application of their word lists. Most schemes have a set number of extra words which are allowed in addition to the word list for a particular story. This is vital, as all stories will need some specific vocabulary items to propel them. Thirty was a common number, though Penguin at lower levels used to allow twenty. So, the authors of the books in a scheme could use thirty extra words in their story. The extra words were usually listed at the back of each reader. They might be glossed (defined) or illustrated. When we were editing we had a thirty-extra-word limit. We didn’t worry if authors used 28 extra words or 32 extra words. If it was 35, they would be asked to reduce the number. If it was only twenty, they would be asked to introduce more new words as I believe that extensive reading should be expanding vocabulary. The thirty new words had to be glossed, in common with most systems. They also had to be used at least three times each in the story so that students had a chance to deduce meaning. Authors were free to use additional words which are considered to be international – words like taxi, jet, internet, hotel, football. International words don’t have to be of English origin either, just international: pizza, spaghetti, kung-fu, karate, algebra, café. They also had lists of numbers, days and dates, titles, countries and nationalities, weights and measures and abbreviations for each level.
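For anyone who wants to run this kind of check on a manuscript, here is a minimal sketch of the editing rules just described. It is not the tool we used. The function name `check_extra_words`, the tokenising pattern and the tolerance of two words either side of the target are all assumptions (the text only says 28 or 32 was acceptable, 35 too many and twenty too few), and the sketch matches exact word forms, so a real check would first reduce each form to its headword.

```python
import re
from collections import Counter

def check_extra_words(story, word_list, international=(),
                      target=30, tolerance=2, min_uses=3):
    """Report the words in `story` that fall outside the level word list.

    `tolerance` is an assumed margin around the target number of extra
    words. `min_uses` is the three-occurrence rule for each extra word.
    """
    # Crude tokeniser: runs of letters, allowing internal apostrophes/hyphens.
    tokens = re.findall(r"[a-z]+(?:['’-][a-z]+)*", story.lower())
    allowed = {w.lower() for w in word_list} | {w.lower() for w in international}
    extra = Counter(t for t in tokens if t not in allowed)

    if len(extra) > target + tolerance:
        verdict = "reduce"        # too many extra words
    elif len(extra) < target - tolerance:
        verdict = "add more"      # room to stretch vocabulary further
    else:
        verdict = "ok"

    underused = sorted(w for w, n in extra.items() if n < min_uses)
    return {"extra": sorted(extra), "verdict": verdict, "underused": underused}

# Tiny made-up example with the target cut down to two extra words.
story = ("The pilot saw a bridge. The pilot liked the bridge. "
         "The pilot took a taxi.")
result = check_extra_words(story,
                           word_list=["the", "saw", "a", "liked", "took"],
                           international=["taxi"],
                           target=2, tolerance=0)
print(result)
# {'extra': ['bridge', 'pilot'], 'verdict': 'ok', 'underused': ['bridge']}
```

Here bridge is flagged because it appears only twice, so the author would be asked to work it into the story once more.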

In some schemes words are glossed on the page, either in footnotes or in a sidebar. I disagree strongly with this way of doing things. We learn new words by seeing them in context several times and guessing the meaning. Learners should get into the habit of reading past unfamiliar words then reflecting whether they needed to know or were able to guess before resorting to a glossary or dictionary. Glosses on the page encourage the habit of stopping every time you see a problem word before reading on. Illustration is a vital part of graded reading too. Illustration can ‘explain’ vocabulary instantly, as well as creating mood and atmosphere. See Oracle Readers Illustration for example.

In the last ten years most schemes have become more relaxed about the number of extra words. Schemes which used to restrict themselves strictly to twenty or thirty words now allow forty or more. There is also a widening gap between ‘international publishers’ (based in the UK or USA) and publishers from Continental European countries, with readers which are initially designed for use within their own countries of origin. Some excellent and inventive readers are appearing from European-based publishers, but a universal characteristic is having a far higher number of extra words. I read many graded readers a year, and have seen readers with more than 100 additional words, which makes a mockery of the ‘500 word’ list they supposedly base the story on.  That’s a 20% increase, which means any page will have too many unfamiliar words. That changes the student approach from extensive reading (which should be done reasonably fast) to intensive reading.
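The arithmetic behind that 20% figure is worth setting beside the older limits. This is only a back-of-envelope sketch, and `extra_word_ratio` is a made-up helper name.

```python
def extra_word_ratio(extra_words, list_size):
    """Extra words as a percentage of the size of the level word list."""
    return 100 * extra_words / list_size

for extra in (20, 30, 40, 100):
    ratio = extra_word_ratio(extra, 500)
    print(f"{extra} extra words on a 500-word list: {ratio:.0f}%")
# 20 extra words on a 500-word list: 4%
# 30 extra words on a 500-word list: 6%
# 40 extra words on a 500-word list: 8%
# 100 extra words on a 500-word list: 20%
```

Twenty or thirty extra words add four to six per cent to a 500-word list. A hundred add a fifth.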

In practice, these readers will work well within the particular country of origin, because the authors will know which loan words from English are popular, and which words are very similar in (say) German and English, or Italian and English. A major criterion in editing the stories will be ‘guessability’. Teachers of French, Spanish and German always worry about so-called ‘false friends’, which are words that look the same as English words, but which have a different meaning. There are a few, but the number of ‘real friends’ is vastly greater, enabling a writer to predict which words a student will readily guess. However, when these graded readers are transferred to the Middle East or East Asia, they will be more difficult than graded readers which adhere more carefully to their lists. A good rule is to assume they’re a full level higher than they claim to be in non-Romance and non-Germanic language situations, where the guessing rate drops dramatically.

My interest in graded reading schemes dates back to the late 1970s, when we organized a library system for our own students. We concentrated on getting students to read for pleasure. It’s apparent that if students are interested in a title from a level higher, they will be motivated to read it, so a wide range of books is necessary as well as a flexible approach. Levels are a recommendation, not a barrier.

One of the most interesting results of our library was that the less we checked on reading, the more books the students borrowed. Eventually, we found that if the school administration issued books instead of the teacher, the students borrowed more books again. Some of our students on intensive courses were reading five or six graded readers a week. It would be nice to produce graphs and charts showing the effect on their English, but we didn’t need to. We knew as teachers that reading – and reading strictly for pleasure – had an effect on the student’s performance in all the skills. The greatest compliment the system had was when students told us they were beginning to read more in their own language as well.

This was written for READ, the magazine of the TESOL Arabia SIG and published in Issue 10 for the TESOL Arabia conference in Dubai. This unedited version was originally published on Peter Viney’s blog:

