The British National Corpus (BNC) is a 100-million-word text corpus of samples of written and spoken English from a wide range of sources. It was compiled as a general corpus (collection of texts) in the field of corpus linguistics. The corpus covers British English of the late twentieth century from a wide variety of genres with the intention that it be a representative sample of spoken and written British English of that time.
The 10-million-word spoken corpus has two parts: a demographic part, containing transcriptions of spontaneous natural conversations made by members of the public, and a context-governed part, containing transcriptions of recordings made at specific types of meetings and events. All the original recordings transcribed for inclusion in the BNC have been deposited at the British Library Sound Archive.
The corpus is marked up following the recommendations of the Text Encoding Initiative and includes full linguistic annotation and contextual information. The most recent edition, from March 2007, is distributed in XML format along with the Xaira software. It is freely available under a licence and is very widely distributed.
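To make the idea of XML markup with linguistic annotation concrete, here is a minimal sketch in Python of reading word-level annotation from a small fragment. The fragment is hypothetical and written for illustration; it is not an excerpt from the corpus. The element and attribute names (`s`, `w`, `c`, `c5`, `hw`, `pos`) are assumptions modelled on the style of the XML edition and should be checked against the corpus documentation before use on real files.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sentence in TEI-style markup. The attribute names are assumed:
# c5 = detailed word-class tag, hw = headword, pos = simplified word class.
SAMPLE = """
<s n="1">
  <w c5="AT0" hw="the" pos="ART">The </w>
  <w c5="NN1" hw="corpus" pos="SUBST">corpus </w>
  <w c5="VVZ" hw="contain" pos="VERB">contains </w>
  <w c5="NN2" hw="word" pos="SUBST">words</w>
  <c c5="PUN">.</c>
</s>
"""


def read_tokens(xml_text):
    """Return (word form, headword, word class) for every <w> element."""
    root = ET.fromstring(xml_text)
    return [
        (w.text.strip(), w.get("hw"), w.get("pos"))
        for w in root.iter("w")
        if w.text
    ]


tokens = read_tokens(SAMPLE)

# Tally how often each simplified word class occurs in the fragment.
pos_counts = Counter(pos for _, _, pos in tokens)

for form, headword, pos in tokens:
    print(f"{form}\t{headword}\t{pos}")
```

The same loop could be pointed at a full corpus file by replacing `ET.fromstring` with `ET.parse(path).getroot()`, assuming the file uses the element names shown here.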
The purpose of a language corpus is to provide language workers with evidence of how language is really used, evidence that can then be used to inform and substantiate individual theories about what words might or should mean. Traditional grammars and dictionaries tell us what a word ought to mean, but only experience can tell us what a word is used to mean. This is why dictionary publishers, grammar writers, language teachers, and developers of natural language processing software alike have been turning to corpus evidence as a means of extending and organizing that experience.
With the development of computing technology able to store and handle massive amounts of linguistic evidence, it has become possible to base linguistic judgment on something far greater and far more varied than any one individual’s personal experience or intuitions. The British National Corpus (BNC) was created in order to offer that possibility to the widest variety of researchers, scholars, teachers, and language enthusiasts.
Ultimately, its use is limited only by our imagination; if you have any need for up to 100 million words of modern British English, you can make use of the British National Corpus.
The main uses of the corpus are as follows:
Dictionaries, grammar books, teaching materials, usage guides, thesauri. Increasingly, publishers are referring to the use they make of corpus facilities: it’s important to know how well their corpora are planned and constructed.
Linguistic Research
Extensive data test bed for program development.
Taggers, parsers, natural language understanding programs, spell checking word lists…
Syllabus and materials design, classroom reference, independent learner research.