Perdida

British National Corpus

The British National Corpus (BNC) is a 100-million-word text corpus of samples of written and spoken English from a wide range of sources. It was compiled as a general corpus (collection of texts) in the field of corpus linguistics. The corpus covers British English of the late twentieth century from a wide variety of genres with the intention that it be a representative sample of spoken and written British English of that time.

The 10-million-word spoken component has two parts. The demographic part contains transcriptions of spontaneous natural conversations made by members of the public. The context-governed part contains transcriptions of recordings made at specific types of meeting and event. All the original recordings transcribed for inclusion in the BNC have been deposited at the British Library Sound Archive.

The corpus is marked up following the recommendations of the Text Encoding Initiative and includes full linguistic annotation and contextual information. The most recent edition, from March 2007, is distributed in XML format along with the Xaira software. It is freely available under a licence and is very widely distributed.
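Because the corpus is distributed as XML with linguistic annotation, the word-level markup can be read with any standard XML parser. The following is a minimal sketch, not the official way to process the BNC. The sample sentence is made up, and the element and attribute names (`w`, `c`, `c5`, `hw`, `pos`) are assumptions about the markup that should be checked against the corpus documentation.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical fragment written in the style of the BNC XML edition.
# The attribute names (c5, hw, pos) are assumptions, not copied from a corpus file.
SAMPLE = (
    '<s n="1">'
    '<w c5="AT0" hw="the" pos="ART">The </w>'
    '<w c5="NN1" hw="corpus" pos="SUBST">corpus </w>'
    '<w c5="VBZ" hw="be" pos="VERB">is </w>'
    '<w c5="AJ0" hw="large" pos="ADJ">large</w>'
    '<c c5="PUN">.</c>'
    '</s>'
)

def tagged_words(xml_text):
    """Return (word form, headword, part of speech) for each <w> element."""
    root = ET.fromstring(xml_text)
    return [
        ((w.text or "").strip(), w.get("hw"), w.get("pos"))
        for w in root.iter("w")
    ]

words = tagged_words(SAMPLE)
# Count how often each part-of-speech label occurs in the fragment.
pos_counts = Counter(pos for _, _, pos in words)
print(words)
print(pos_counts)
```

The same function could be pointed at the text of a real corpus file instead of `SAMPLE`, provided the markup matches these assumptions.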

Uses of the BNC

The purpose of a language corpus is to provide language workers with evidence of how language is really used, evidence that can then be used to inform and substantiate individual theories about what words might or should mean. Traditional grammars and dictionaries tell us what a word ought to mean, but only experience can tell us what a word is used to mean. This is why dictionary publishers, grammar writers, language teachers, and developers of natural language processing software alike have been turning to corpus evidence as a means of extending and organizing that experience.
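One common way to look at this kind of evidence is a concordance, or keyword-in-context listing, which shows every occurrence of a word with a few words on either side. Here is a minimal sketch, assuming plain text split on whitespace rather than the BNC's own format. The function name `kwic` and the example sentences are made up for illustration.

```python
def kwic(tokens, keyword, width=3):
    """Return keyword-in-context lines with `width` tokens on each side of a hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

# Made-up sentences for illustration; not taken from the BNC.
text = "She made a decision quickly . He took a decision after lunch ."
hits = kwic(text.split(), "decision")
for line in hits:
    print(line)
```

Reading the hits side by side shows which verbs actually accompany the word, which is the sort of usage evidence a dictionary or grammar writer would look for.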

With the development of computing technology able to store and handle massive amounts of linguistic evidence, it has become possible to base linguistic judgment on something far greater and far more varied than any one individual’s personal experience or intuitions. The British National Corpus (BNC) was created in order to offer that possibility to the widest variety of researchers, scholars, teachers, and language enthusiasts.

Ultimately, its use is limited only by our imagination; if you have any need for up to 100 million words of modern British English, you can make use of the British National Corpus.

The main uses of the corpus are as follows:

  • Reference Book Publishing

Dictionaries, grammar books, teaching materials, usage guides, thesauri. Increasingly, publishers are referring to the use they make of corpus facilities: it’s important to know how well their corpora are planned and constructed.

  • Linguistic Research

Raw data for studying lexis, syntax, morphology, semantics, discourse analysis, stylistics, sociolinguistics…

  • Artificial Intelligence

Extensive data test bed for program development.

  • Natural Language Processing

Taggers, parsers, natural language understanding programs, spell checking word lists…

  • English Language Teaching

Syllabus and materials design, classroom reference, independent learner research.

