CSC111 Homework 12
Revision as of 14:59, 21 April 2010
Using bigrams to identify language
Instead of the single-word frequencies we used in Lab 12, this time we'll use the frequencies of blocks of two characters, also called bigrams.
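As a rough illustration of what counting bigrams involves, here is a minimal Python sketch. The function name `bigram_frequencies` is hypothetical. It assumes the text is lowercased, non-letter characters are dropped, and bigrams are taken only inside each word, never across a space. The tabulated English frequencies may have been computed under different rules, so treat this as one reasonable convention rather than the required one.

```python
from collections import Counter


def bigram_frequencies(text):
    """Return a dict mapping each two-letter bigram to its percentage
    of all bigrams found in text.

    Assumed convention (not from the assignment): the text is lowercased
    and split on whitespace, non-letters are dropped from each word, and
    bigrams are counted only inside a word, never across a space.
    """
    counts = Counter()
    for word in text.lower().split():
        # Keep only the letters of this word.
        letters = [c for c in word if c.isalpha()]
        # Slide a window of width 2 over the letters.
        for i in range(len(letters) - 1):
            counts[letters[i] + letters[i + 1]] += 1
    total = sum(counts.values())
    if total == 0:
        # No word had two or more letters, so there are no bigrams.
        return {}
    # Convert raw counts to percentages of all bigrams seen.
    return {bg: 100.0 * n / total for bg, n in counts.items()}
```

For example, "The theater then" yields 11 bigrams, 3 of which are "th", so `bigram_frequencies("The theater then")["th"]` is 300/11, about 27.3%.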
The frequencies of English bigrams are tabulated in Wikipedia:
th 1.52%    en 0.55%    ng 0.18%
he 1.28%    ed 0.53%    of 0.16%
in 0.94%    to 0.52%    al 0.09%
er 0.94%    it 0.50%    de 0.09%
an 0.82%    ou 0.50%    se 0.08%
re 0.68%    ea 0.47%    le 0.08%
nd 0.63%    hi 0.46%    sa 0.06%
at 0.59%    is 0.46%    si 0.05%
on 0.57%    or 0.43%    ar 0.04%
nt 0.56%    ti 0.34%    ve 0.04%
ha 0.56%    as 0.33%    ra 0.04%
es 0.56%    te 0.27%    ld 0.02%
st 0.55%    et 0.19%    ur 0.02%
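One possible way to use the table is to measure how closely a text's bigram percentages match the English ones. The sketch below is not necessarily the method this homework expects: it transcribes the table into a dictionary (`ENGLISH_BIGRAMS`, a hypothetical name) and scores a text with cosine similarity, assuming the text's frequencies are given as a dict mapping each bigram to its percentage. Cosine similarity was chosen because it ignores overall scale, so it does not matter whether the text's percentages were computed over all bigrams or over a different subset than the table's.

```python
import math

# Percentages for the 39 English bigrams listed in the table.
ENGLISH_BIGRAMS = {
    "th": 1.52, "he": 1.28, "in": 0.94, "er": 0.94, "an": 0.82,
    "re": 0.68, "nd": 0.63, "at": 0.59, "on": 0.57, "nt": 0.56,
    "ha": 0.56, "es": 0.56, "st": 0.55, "en": 0.55, "ed": 0.53,
    "to": 0.52, "it": 0.50, "ou": 0.50, "ea": 0.47, "hi": 0.46,
    "is": 0.46, "or": 0.43, "ti": 0.34, "as": 0.33, "te": 0.27,
    "et": 0.19, "ng": 0.18, "of": 0.16, "al": 0.09, "de": 0.09,
    "se": 0.08, "le": 0.08, "sa": 0.06, "si": 0.05, "ar": 0.04,
    "ve": 0.04, "ra": 0.04, "ld": 0.02, "ur": 0.02,
}


def english_similarity(freqs):
    """Cosine similarity between freqs (bigram -> percentage) and the
    English reference, restricted to the 39 bigrams in the table.

    Returns 1.0 when the text has the same relative proportions as the
    table, and values near 0.0 when it shares few of those bigrams.
    Bigrams missing from freqs count as 0%.
    """
    # Build two vectors over the same bigrams, in the same order.
    text_vec = [freqs.get(bg, 0.0) for bg in ENGLISH_BIGRAMS]
    ref_vec = list(ENGLISH_BIGRAMS.values())
    dot = sum(a * b for a, b in zip(text_vec, ref_vec))
    norm_text = math.sqrt(sum(a * a for a in text_vec))
    norm_ref = math.sqrt(sum(b * b for b in ref_vec))
    if norm_text == 0:
        # The text contains none of the tabulated bigrams.
        return 0.0
    return dot / (norm_text * norm_ref)
```

To guess the language of a text, one could compute a score like this against a reference table for each candidate language and pick the highest. Only the English table is given here, so tables for other languages would have to be supplied separately.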
References