As a small private research project, I calculated the frequency of words and expressions. Here are the results.

AA.txt (4177 entries) - one-word list

BB.txt (4278 entries) - list of expressions of two or more words

CC.txt (3667 entries) - list of expressions of three or more words

The English text was collected at random from various sources. It is not very balanced and not very clean, but it is sufficient for my research (430,000 space-separated tokens).
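
For anyone who wants to try something similar, here is a minimal sketch of how such counts could be produced. It is not necessarily the method used for the lists above: the function name `ngram_counts`, the lowercasing step, and the sample sentence are all assumptions for illustration. It splits the text on whitespace and counts every run of n consecutive tokens.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count every run of n consecutive whitespace-separated tokens.

    Hypothetical helper for illustration only. Lowercasing is an
    assumption; the original lists may have been case-sensitive.
    """
    tokens = text.lower().split()
    # Join each window of n tokens into one space-separated expression.
    grams = (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(grams)


# Made-up sample text, not taken from the actual corpus.
sample = "the cat sat on the mat and the cat slept"
words = ngram_counts(sample, 1)
pairs = ngram_counts(sample, 2)

print(words.most_common(2))  # [('the', 3), ('cat', 2)]
print(pairs.most_common(1))  # [('the cat', 2)]
```

A list of "two or more" words could then be built by merging the counters for n = 2, 3, and so on, and keeping only expressions above some frequency threshold.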

Just for fun.

Posted by nomota multilingual

