A collection of Chinese corpora and frequency lists, including frequencies in LCMC.
Note that the tokenisation of texts into words follows the rules used in each corpus. The results of tokenisation are sometimes incompatible across corpora, and some "words" in the frequency list of the Internet corpus may be parts of "real" Chinese words.

Learners of Chinese frequently ask about the frequency of individual characters, as this helps to order them in a reasonable sequence for learning. Numerous lists of common characters are available in various dictionaries (Oxford Dictionary, Wenlin or various online sources). These lists are often treated as absolute, although they obviously depend on the corpus they were derived from (the list in the Oxford Dictionary, for example, is skewed towards newspaper texts). The Chinese Internet corpus is a snapshot of the Chinese Web from 2005, so the frequency list of characters derived from it might be more general (though still not ideal). The list of characters is available from here. The first column is the rank; the second is the frequency, normalised per million characters.

The three corpora listed above are:
Chinese Internet Corpus, 280 million words (tokens). This corpus was compiled by Serge Sharoff from the Internet in February 2005, along with other Internet corpora (for English, German and Russian).
The Lancaster Corpus of Mandarin Chinese (LCMC), created by Richard Xiao and Tony McEnery.
Chinese Business Corpus, 30 million words (tokens). This corpus was compiled by Serge Sharoff from the Internet in 2008, along with other business corpora (for English and Russian).
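For readers who want to build a comparable character list from their own texts, the ranking and per-million normalisation described for the character list can be sketched roughly as follows. This is an illustration only, not the procedure used to produce the lists here: the function name, the restriction to the basic CJK Unified Ideographs range, the tie-breaking order and the third (character) column are all assumptions.

```python
from collections import Counter


def char_frequencies_per_million(text):
    """Return rows of (rank, frequency per million characters, character).

    Hypothetical helper: it counts only code points in the basic CJK
    Unified Ideographs block (U+4E00..U+9FFF), which is an assumption
    about what counts as a "character".
    """
    chars = [c for c in text if "\u4e00" <= c <= "\u9fff"]
    total = len(chars)
    if total == 0:
        return []
    counts = Counter(chars)
    # Sort by descending count; break ties by code point so the
    # ranking is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        (rank, count * 1_000_000 / total, ch)
        for rank, (ch, count) in enumerate(ordered, start=1)
    ]


if __name__ == "__main__":
    sample = "我们的中文语料库,我们的词表"
    for rank, freq, ch in char_frequencies_per_million(sample)[:3]:
        print(rank, round(freq, 2), ch)
```

Because the result depends entirely on the input text, a list produced this way is only as representative as the corpus behind it, which is the same caveat that applies to the dictionary lists mentioned above.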