Corpora in Adult Ed

What Is a Corpus?

Corpse, marine corps, corporation, and corpulent all derive from the Latin word corpus, meaning body. That Latin word also exists, intact, in English, but rather than an anatomical body, it refers to a body of language. A corpus is a large collection of language, traditionally written, though corpora (the Latinate plural) of spoken language are now available as well.

Corpora in Language Teaching

The big benefit of using a corpus is that it’s data-driven, and that data comes from actual language usage. It’s pretty much descriptivist heaven.

When a professor was first explaining to me the value of corpus data, he used this example: if you look up the phrase par for the course, you’ll find a definition like “what is normal or expected in any given circumstances.” But if students depend only on definitions like that, from textbooks or dictionaries or teachers, they’re likely to miss some crucial information. Though that definition may be accurate, a quick corpus search reveals that the phrase is almost always used with a negative connotation, as in, “These tantrums are par for the course.”

As corpora have become more readily available and more representative of spoken, day-to-day language, they have become valuable tools for those in the field of TESOL. Most often, it’s researchers and materials developers, but there are some classroom applications for the corpus as well. Let’s look at the basics of how to use a corpus and check out a couple of introductory techniques for incorporating corpus data into your classroom.

Using a Corpus

When you use a corpus, you’re generally performing a search, just like you would in Google. The difference is that when you Google “kitten in a tree” you’re most likely looking for pictures of kittens in trees or information for getting kittens out of trees. If you search for the phrase “kitten in a tree” in a corpus, what you’re looking for is instances of that actual phrase in use. The language is your end goal.

The Corpus of Contemporary American English (COCA) is my go-to. Search a phrase just like you would anywhere else:

[Screenshot: the COCA search box]

Your results will be every instance of the word tree found in the corpus. You could do the same for a string of words.

[Screenshot: COCA search results]

Along the left you have the year, type of source, and specific source, and then along the right is the actual context in which the word was found. This isn’t terribly helpful yet, though.
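If you’re curious what this kind of search is doing behind the scenes, here is a minimal Python sketch of a keyword-in-context listing. It is only an illustration, not how COCA itself is built: the function name kwic and the three-sentence sample text are invented for this example.

```python
import re

def kwic(text, phrase, width=30):
    """Return one keyword-in-context line per occurrence of phrase in text."""
    lines = []
    # Match the phrase as whole words, ignoring case.
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        # Keep up to `width` characters of context on each side of the hit.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines

# Hypothetical mini-corpus, made up for this example.
sample = (
    "The kitten climbed the tree. We found a kitten in a tree near the school. "
    "Nobody expected a kitten in a tree that tall."
)

for line in kwic(sample, "kitten in a tree"):
    print(line)
```

Searching the same sample for the single word tree would return more lines, since every instance counts, just as it does in a real corpus.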

Let’s say I’m an English learner, struggling with prepositions. I want to know which prepositions commonly precede the word table. Let’s select the Collocates tab:

[Screenshot: the Collocates tab in COCA]

We’re also going to select “prep.ALL” from the [POS] dropdown next to “Collocates.” What this means is we’re searching for the prepositions that most commonly occur with the word table.

The scale of numbers below tells the engine where to search. By selecting the 2 on the left, I’m searching only for prepositions that occur one or two words before the word table. Any prepositions outside of that range or after table will be ignored.

Here are the results:

[Screenshot: collocate results for table]

The frequency at the right tells us that on is by far the most common, with 12,830 hits, and in and at come in second and third.
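The collocate search can be sketched in a few lines of Python as well. This is a rough approximation under stated assumptions: the short preposition list, the function name left_collocates, and the sample sentences are all invented here, whereas COCA works from part-of-speech tags over a very large body of text.

```python
import re
from collections import Counter

# A small, non-exhaustive preposition list assumed for this sketch;
# a real corpus tool uses part-of-speech tagging instead.
PREPOSITIONS = {"on", "in", "at", "under", "to", "from", "around", "by"}

def left_collocates(text, node, window=2, candidates=PREPOSITIONS):
    """Count candidate words found up to `window` words before `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            # Look only at the words immediately to the left of the node word.
            for neighbor in tokens[max(0, i - window):i]:
                if neighbor in candidates:
                    counts[neighbor] += 1
    return counts

# Hypothetical sample sentences, made up for this example.
sample = (
    "Put the keys on the table. She sat at the table and left a note on the table. "
    "There is a drawer in the table. The table by the door is new."
)

print(left_collocates(sample, "table").most_common())
```

Note that by in the last sample sentence is not counted, because it comes after table rather than within two words before it.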

That’s just a very brief primer, but corpora are extremely powerful tools for getting loads of language data. There are some tutorial videos out there to get you more familiar with using corpora.

Introducing ELs to the Corpus

Beginners

With beginners, I recommend doing the work for them. When presenting students with new vocabulary words, print out the results you get, and help them to notice important patterns related to syntax and collocation.

Intermediate

As students progress, show them how the corpus works, perhaps using an LCD projector while you search, narrating the process as you go.

Advanced

Once students can do some limited searching on their own, give them assignments that they can use the corpus to complete. For instance, design a cloze activity based on simple corpus searches that you have performed.
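If you save a handful of corpus hits as plain sentences, a few lines of Python can turn them into cloze items. This is just one possible sketch: the function name make_cloze and the example sentences are invented for illustration and are not real corpus output.

```python
import re

def make_cloze(sentences, target, blank="_____"):
    """Blank out every whole-word occurrence of target in each sentence."""
    pattern = re.compile(r"\b" + re.escape(target) + r"\b", re.IGNORECASE)
    return [pattern.sub(blank, sentence) for sentence in sentences]

# Hypothetical sentences standing in for saved corpus results.
hits = [
    "She left her keys on the table.",
    "Dinner is on the table.",
]

for item in make_cloze(hits, "on"):
    print(item)
```

Matching on whole words keeps the blank from landing inside longer words that happen to contain the target.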

About Robert Sheppard

Over the past 10 years, Rob has explored a variety of roles and contexts in the field. These include the cram-school culture of Taiwan and Korea; IEPs in Boston focused on academic English; advanced conversation and TOEFL prep taught via Skype to students in Japan; and nonprofit, community English programs for immigrants to Greater Boston. He currently serves as sr. director of adult programs at Quincy Asian Resources, a member of the community advisory council at First Literacy, and a curriculum consultant at Boston Global Institute. He has a master’s degree in TESOL from The New School, and his areas of interest include adult ed, pronunciation and grammar instruction, curriculum development, and assessment.

This entry was posted in TESOL Blog.

2 Responses to Corpora in Adult Ed

  1. mura nava says:

    Hi thanks this is a nice primer on corpora; your readers may be interested in the Google Plus Corpus Linguistics (CL) community which shares all things CL related – http://bit.ly/CLcomm
    Ta
    Mura

  2. Ra'anan says:

    Thanks for reaching, I like your idea of printing out to show patterns for a big picture.