Not just a linguistic resource but a unique record of humanity

Robbie Love is a PhD student at the ESRC Centre for Corpus Approaches to Social Science (CASS) at Lancaster University, where he spent four years working on the Spoken British National Corpus 2014 project.


Harry Strawson is a writer living in London and contributed recordings to the Spoken British National Corpus 2014.

Here Robbie and Harry share two different perspectives on the Spoken British National Corpus project ahead of its release next week.

Every day billions of words are uttered in hundreds of languages all over the world. For corpus linguists, that is, people who study the form, use and function of language using specialised computer software, speech is like the golden snitch in a game of Quidditch. It appears to be everywhere around you and yet it is incredibly difficult to capture.

Many examples of writing, especially those found online, are already in a format that can be read on a computer screen. Think e-books, online news articles and tweets. Speech, by contrast, is rarely captured in a permanent form – especially private conversations among family and friends. Words are uttered and then they disappear, leaving no trace.

The result is that linguistic research is often based upon examples of writing and not speech. Transcripts of spoken interactions are rare.

Enter the Spoken British National Corpus 2014 – 11.5 million words of informal British English chit chat collected between 2012 and 2016, fully transcribed for the benefit of linguists, educators and social scientists. It’s a collaborative effort between linguists at the ESRC Centre for Corpus Approaches to Social Science (CASS) at Lancaster University and English language teaching experts at Cambridge University Press – the first of its kind for over two decades.

The contributor

Harry Strawson is one of the Spoken BNC2014 contributors. He is a member of the public paid to use his smartphone to make recordings of conversations with his family and friends.

I began submitting conversations to the corpus in 2015. For over a year I recorded industriously. It was only after I had amassed hours of conversation that I really considered what the recordings might be used for. It was difficult to imagine that the humdrum of my family and friends could really interest anyone else. I was curious about the researchers. I realised that an attentive researcher could end up knowing a great deal about my life: a silent, invisible observer hanging on my every word. Occasionally I had a sense that some kind of murky relationship existed between me and the researcher – but I was always aware that it was not me per se that the researchers were interested in. They were interested in what my conversations might reveal about the state of spoken British English today.

I find it strange and exciting to know that all those hours of conversation I recorded will remain as sociolinguistic data in the years to come.


The researcher

Robbie Love is a researcher for the Spoken BNC2014, and his PhD thesis about the project is funded by ESRC.

I was involved with every stage of the compilation of the Spoken BNC2014, including its design, promotion, transcription, data processing and analysis.

As recordings came in from contributors – including Harry – they were transcribed by a team at Cambridge University Press. Contrary to what you might expect, this was the only time anyone would listen to the recordings from start to finish. The transcribers were concentrating on typing up what was said as accurately as possible. Some conversations included a dozen speakers, so this was a difficult job.

When I received the contributions they had been transcribed into Word documents. I wasn’t interested in the audio recordings, but rather checking for errors in the transcription and converting these into a special format for corpus analysis. I wouldn’t even read through the transcripts end to end. As hard as it was, I deliberately detached myself from any individual’s story, for fear of spending every day just reading the conversations and not getting any work done.

1,251 transcripts feature in the corpus. They will be used to better understand the changing nature of spoken British English and help teach learners of English.

Sometimes I sit and read through the list of conversations: the newlywed couple reminiscing about their recent honeymoon, the father and daughter chatting in the car, the grandparents visiting family for the day. I realise that what we have gathered is not just a linguistic resource but a unique and permanent record of humanity. I am lucky to have had a hand in its creation.

The Spoken BNC2014 will be freely available to search from 25 September at

You can follow Robbie on Twitter @lovermob. The British National Corpus is on Twitter @BNC_2014
