I came across something interesting about the BNC.
The British National Corpus has a series of products. One important one is BNC Baby. It consists of four one-million-word samples, each compiled as an example of a particular genre: fiction, newspapers, academic writing and spoken conversation. The texts have the same annotation as the full corpus (part of speech, metadata, etc.).