Letter frequencies

The frequency of letters in text has often been studied for use in cryptography, and in frequency analysis in particular. No exact figures are possible, as each person writes slightly differently; however, an approximate ordering of English letters by frequency of use is ETAOIN SHRDL UCMFG YPWBV KXJQZ.
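As an illustration, an ordering of this kind could be derived from any sample text roughly as follows. This is a minimal sketch, not the method behind the figures in this article: the function names `letter_frequencies` and `frequency_order` are hypothetical, only the 26 unaccented letters are counted, and case is ignored.

```python
from collections import Counter


def letter_frequencies(text):
    """Return {letter: relative frequency} for the letters a-z found in text."""
    # Assumption: case is ignored and anything outside a-z is skipped.
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    return {letter: count / total for letter, count in counts.items()}


def frequency_order(text):
    """Return the letters of text, most frequent first, as an upper-case string."""
    freqs = letter_frequencies(text)
    # Ties are broken alphabetically so the result is deterministic.
    return "".join(sorted(freqs, key=lambda l: (-freqs[l], l))).upper()


if __name__ == "__main__":
    print(frequency_order("Hello, World!"))
```

The result depends entirely on the sample: a short or specialised text will generally give an ordering that differs from the one quoted above.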

An analysis based on all the words in the Cambridge Encyclopedia gave a letter frequency list quite unlike the ordering found in most lists. From most common to least common, it gave EATIN ORSLH DCMUF PGBYW VKXJZQ. Note that more As appeared than Ts. The author stated that the variance from standard lists could be due to the many foreign words often repeated within articles. Note, too, that X is more frequent in this work than J.

This raises an interesting point. Letter frequencies, like word frequencies, vary by both writer and subject. One cannot write about x-rays without using X frequently, and a writer whose keyboard has a broken key will not use that letter at all. Letter, digraph, trigraph and word frequencies can be used as evidence for or against the authorship of long texts, as can measures such as average word and sentence length. Everyone writes differently: Hemingway is not Faulkner. A precise average could only be obtained by analyzing a huge mass of varied input, for example usage in a number of different chat rooms or in a large body of e-mail.
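One simple way to see how much two texts differ is to compare their letter-frequency profiles. The sketch below is only an illustration under stated assumptions: the names `profile` and `profile_distance` are hypothetical, and the measure used (the sum of absolute differences between the two profiles) is just one of many possible choices, not an established authorship test.

```python
import string
from collections import Counter


def profile(text):
    """Return the relative frequency of each letter a-z in text (0.0 if absent)."""
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    counts = Counter(letters)
    total = len(letters) or 1  # avoid division by zero for texts without letters
    return {l: counts[l] / total for l in string.ascii_lowercase}


def profile_distance(text_a, text_b):
    """Sum of absolute frequency differences: 0.0 (identical) up to 2.0 (disjoint)."""
    pa, pb = profile(text_a), profile(text_b)
    return sum(abs(pa[l] - pb[l]) for l in string.ascii_lowercase)
```

A small distance between two long texts shows only that their letter usage is similar; subject matter can shift the profile as much as the writer does.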


Relative frequencies of letters

[Figure: Relative frequencies of letters in text.]
[Figure: Relative frequencies ordered by frequency.]
By letter              By frequency
Letter  Frequency      Letter  Frequency
a       0.08167        e       0.12702
b       0.01492        t       0.09056
c       0.02782        a       0.08167
d       0.04253        o       0.07507
e       0.12702        i       0.06966
f       0.02228        n       0.06749
g       0.02015        s       0.06327
h       0.06094        h       0.06094
i       0.06966        r       0.05987
j       0.00153        d       0.04253
k       0.00772        l       0.04025
l       0.04025        c       0.02782
m       0.02406        u       0.02758
n       0.06749        m       0.02406
o       0.07507        w       0.02360
p       0.01929        f       0.02228
q       0.00095        g       0.02015
r       0.05987        y       0.01974
s       0.06327        p       0.01929
t       0.09056        b       0.01492
u       0.02758        v       0.00978
v       0.00978        k       0.00772
w       0.02360        j       0.00153
x       0.00150        x       0.00150
y       0.01974        q       0.00095
z       0.00074        z       0.00074

Top 10 beginning of word letters

Letter  Frequency
t       0.1594
a       0.155
i       0.0823
s       0.0775
o       0.0712
c       0.0597
m       0.0426
f       0.0408
p       0.040
w       0.0382

Top 10 end of word letters

Letter  Frequency
e       0.1917
s       0.1435
d       0.0923
t       0.0864
n       0.0786
y       0.0730
r       0.0693
o       0.0467
l       0.0456
f       0.0408
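Tables of first and last letters like the two above could be produced from a sample text along the following lines. This is a rough sketch under assumptions not taken from the source: the function name `edge_letter_frequencies` is hypothetical, a word is taken to be any unbroken run of the letters a-z, and each frequency is the share of all words that begin (or end) with the given letter.

```python
import re
from collections import Counter


def edge_letter_frequencies(text, top=10):
    """Return (first_letters, last_letters), each a list of (letter, frequency)."""
    # Simplification: apostrophes and hyphens split words, so "don't" -> "don", "t".
    words = re.findall(r"[a-z]+", text.lower())
    n = len(words)
    first = Counter(word[0] for word in words)
    last = Counter(word[-1] for word in words)

    def ranked(counter):
        # most_common() sorts by descending count; empty input yields an empty list.
        return [(letter, count / n) for letter, count in counter.most_common(top)]

    return ranked(first), ranked(last)
```

For example, `edge_letter_frequencies("the cat sat")` reports that two of the three words end in "t" and one ends in "e".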

Most common digrams (in order)

th, he, in, en, nt, re, er, an, ti, es, on, at, se, nd, or, ar, al, te, co, de, to, ra, et, ed, it, sa, em, ro.

Most common trigrams (in order)

the, and, tha, ent, ing, ion, tio, for, nde, has, nce, edt, tis, oft, sth, men
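Rankings of digrams and trigrams can be obtained by sliding a window over the letters of a text. The sketch below assumes that spaces and punctuation are removed before counting, which would be consistent with entries such as "edt", "oft" and "sth" in the list above, although the source does not state how its lists were compiled. The function name `top_ngrams` is hypothetical.

```python
from collections import Counter


def top_ngrams(text, n, top=10):
    """Return the `top` most common runs of n consecutive letters in text."""
    # Assumption: only a-z are kept, so n-grams may span word boundaries.
    letters = "".join(ch for ch in text.lower() if "a" <= ch <= "z")
    grams = Counter(letters[i:i + n] for i in range(len(letters) - n + 1))
    return [gram for gram, _ in grams.most_common(top)]


if __name__ == "__main__":
    sample = "The cat sat on the mat and then the other cat sat too."
    print(top_ngrams(sample, 2, top=5))  # digrams
    print(top_ngrams(sample, 3, top=5))  # trigrams
```

To count only n-grams inside words, the text could instead be split into words first and each word processed separately.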
