Friday, April 25, 2014

A table of the 5000 most frequent Thai words from the Thai National Corpus

I just stumbled upon this.

The Thai National Corpus (TNC) has a spreadsheet file listing the 5000 most frequent 'Thai' words found in its sampled texts, downloadable here.

I noticed that some of the entries are transliterated words from other languages. Some are not words at all but symbols specific to Thai writing, such as ' ๆ ', a stylized Thai numeral 2 ( ' ๒ ' ) that is used as a 'say the word twice' mark, and ' ฯ ', which denotes that a word (usually a noun) has been truncated from its full name. Others are just numbers, such as 1, 2, 3 and their corresponding Thai numerals, which are listed separately.
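If you want to separate those non-word entries from the real words, here is a rough Python sketch. It assumes you have already pulled the entries out of the spreadsheet into a plain list of strings; I am not assuming anything about the file's actual column layout, and the `sample` list below is made up for illustration. The function name `classify` is my own. Note that it only catches the two symbols and the digit-only entries; it cannot tell transliterated foreign words from native ones.

```python
import unicodedata

REPEAT_MARK = "\u0e46"  # ๆ, the 'say the word twice' mark
ABBREV_MARK = "\u0e2f"  # ฯ, the truncation mark


def classify(entry: str) -> str:
    """Label an entry as 'symbol', 'number', 'word', or 'empty'."""
    text = entry.strip()
    if not text:
        return "empty"
    if text in (REPEAT_MARK, ABBREV_MARK):
        return "symbol"
    # Unicode category 'Nd' covers both 0-9 and the Thai digits ๐-๙.
    if all(unicodedata.category(ch) == "Nd" for ch in text):
        return "number"
    return "word"


# Hypothetical entries, not copied from the TNC file.
sample = ["ที่", "ๆ", "1", "๑", "ฯ", "การ"]
words_only = [e for e in sample if classify(e) == "word"]
print(words_only)
```

Anything labelled 'word' still needs a manual look if you care about loanwords.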

