Difference between revisions of "CSC111 Homework 12"


Revision as of 14:59, 21 April 2010

Using bigrams to identify Language

Instead of the single-word frequencies we used in Lab 12, this time we'll use the frequency of blocks of 2 characters, also called bigrams.

The frequency of bigrams for the English language is tabulated in Wikipedia (http://en.wikipedia.org/wiki/Bigram):


th 1.52%       en 0.55%       ng 0.18%
he 1.28%       ed 0.53%       of 0.16%
in 0.94%       to 0.52%       al 0.09%
er 0.94%       it 0.50%       de 0.09%
an 0.82%       ou 0.50%       se 0.08%
re 0.68%       ea 0.47%       le 0.08%
nd 0.63%       hi 0.46%       sa 0.06%
at 0.59%       is 0.46%       si 0.05%
on 0.57%       or 0.43%       ar 0.04%
nt 0.56%       ti 0.34%       ve 0.04%
ha 0.56%       as 0.33%       ra 0.04%
es 0.56%       te 0.27%       ld 0.02%
st 0.55%       et 0.19%       ur 0.02%
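
As a rough illustration of how percentages like the ones above could be computed for a piece of text, here is a minimal Python sketch. The function name bigram_frequencies is hypothetical, and the sketch assumes that only letters count and that a bigram never spans two words; the Wikipedia table may have been built with different rules.

```python
from collections import Counter


def bigram_frequencies(text):
    """Return a dict mapping each letter bigram in text to its
    percentage of all bigrams found (hypothetical helper)."""
    counts = Counter()
    for word in text.lower().split():
        # Assumption: keep letters only, so punctuation and digits
        # do not create spurious bigrams.
        letters = "".join(ch for ch in word if ch.isalpha())
        # Slide a 2-character window across the word.
        for i in range(len(letters) - 1):
            counts[letters[i:i + 2]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bigram: 100.0 * n / total for bigram, n in counts.items()}


freqs = bigram_frequencies("the cat sat on the mat")
for bigram in sorted(freqs):
    print(bigram, round(freqs[bigram], 2))
```

On a text this short the percentages are much larger than those in the table; a long sample of English would be needed before the values start to resemble the tabulated ones.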


References