What We Wiki
Methodology
The article What We Wiki is based on a statistical analysis of all the pages of the English Wikipedia as captured on 04/02/07 and available here.
A computer program scans the whole collection of Wikipedia pages to find the ten most frequent words and the ten most frequent double-words on each page. We refer to these words and double-words as concepts. Each concept is associated with a counter that records the number of wiki pages in which it ranks among those ten. The 200 concepts with the highest page counts are shown in the table below. For example, United States with a count of 125449 indicates that this concept was one of the ten most frequent double-words in 125,449 pages. Note that United and States as individual words have different counts, because on a given page they may not make the ten most frequent words while United States still makes the ten most frequent double-words.
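The counting procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program actually used for the analysis: the function names `top_concepts` and `page_counts`, the tokenizer (lowercased runs of letters and digits), and the tie-breaking behavior are all hypothetical, and stop-word filtering is left out for brevity.

```python
import re
from collections import Counter


def top_concepts(text, n=10):
    """Return the n most frequent words and n most frequent double-words of one page."""
    # Assumed tokenizer: lowercase runs of letters and digits.
    words = re.findall(r"[a-z0-9]+", text.lower())
    # A double-word is a pair of adjacent tokens.
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    top_words = {w for w, _ in Counter(words).most_common(n)}
    top_pairs = {p for p, _ in Counter(pairs).most_common(n)}
    return top_words | top_pairs


def page_counts(pages, n=10):
    """Count, for every concept, the number of pages on which it ranks in the top n."""
    counts = Counter()
    for text in pages:
        # Each page contributes at most 1 to a concept's counter.
        counts.update(top_concepts(text, n))
    return counts


if __name__ == "__main__":
    pages = [
        "album band album film album united states united states",
        "united states census united states united",
    ]
    counts = page_counts(pages, n=1)
    # "united states" is the top double-word on both pages,
    # but "united" is the top single word on only one of them.
    print(counts["united states"], counts["united"], counts["states"])
```

With `n=1` the toy example shows the effect mentioned in the note: a double-word can have a higher page count than either of its component words. Under these assumptions, a ranking like the one in the table would correspond to `page_counts(pages).most_common(200)`.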
Stop Words
The following stop words are filtered out before the statistics are collected:
a about above across after afterwards again against all almost
alone along already also although always am among amongst amoungst amount
an and another any anyhow anyone anything anyway anywhere are around as
at
back be became because become becomes becoming been before beforehand
behind being below beside besides between beyond bill both bottom but by
call can cannot can’t co computer con could couldn’t cry
de describe detail do done down due during
each eg eight either eleven else elsewhere empty enough etc even ever
every everyone everything everywhere except
few fifteen fify fill find fire first five for former formerly forty found four
from front full further
get give go
had has hasn’t have he hence her here hereafter hereby herein hereupon hers
herself him himself his how however hundred
i ie if in inc indeed interest into is it its itself
keep
last latter latterly least less like ltd
made many may me meanwhile might mill mine more moreover most mostly
move much must my myself
name namely neither never nevertheless next nine no nobody none noone
nor not nothing now nowhere
of off often on once one only onto or other others otherwise our ours
ourselves out over own
part per perhaps please put
rather re
s same see seem seemed seeming seems serious several she should
show side since sincere six sixty so some somehow someone something
sometime sometimes somewhere still such system
take ten than that the their them themselves then thence there
thereafter thereby therefore therein thereupon these they thick
thin third this those though three through throughout thru thus
to together too top toward towards twelve twenty two
un under until up upon us
very via
was we well were what whatever when whence whenever where
whereafter whereas whereby wherein whereupon wherever whether
which while whither who whoever whole whom whose why will with
within without would yet you your yours yourself yourselves
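As a rough illustration of the filtering step, the sketch below removes stop words from a token stream and then forms double-words from what remains. It assumes that double-words are built after filtering, which the text does not state explicitly, although table entries such as Age 18 are consistent with it. The names `STOP_WORDS`, `filter_tokens`, and `double_words` are hypothetical, and `STOP_WORDS` holds only a small excerpt of the list above.

```python
import re

# Small excerpt of the stop-word list above (assumption: the full list
# would be loaded into the same kind of set).
STOP_WORDS = {"a", "the", "of", "under", "and", "in", "was", "with"}


def filter_tokens(text, stop_words=STOP_WORDS):
    """Tokenize a page and drop every token that is a stop word."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in stop_words]


def double_words(tokens):
    """Pair each remaining token with its successor."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


if __name__ == "__main__":
    tokens = filter_tokens("Under the age of 18 and in the city")
    print(tokens)                # ['age', '18', 'city']
    print(double_words(tokens))  # ['age 18', '18 city']
```

Using a set for the stop words keeps each membership test constant-time, which matters when every token of every page has to be checked.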
References and Bibliography
- Stacy Schiff, “Know It All: Can Wikipedia Conquer Expertise?”, The New Yorker, July 31, 2006.
- Martin Hepp, Daniel Bachlechner, and Katharina Siorpaes, “Harvesting Wiki Consensus - Using Wikipedia Entries as Ontology Elements,” citeseer.ist.psu.edu/747700.html, 2006.
- Don Tapscott and Anthony Williams, Wikinomics: How Mass Collaboration Changes Everything, Portfolio, 2006.
- Michael Strube and Simone Paolo Ponzetto, “WikiRelate! Computing Semantic Relatedness Using Wikipedia,” in Proceedings of the 21st National Conference on Artificial Intelligence, Boston, Mass., 16-20 July, 2006, pp. 1419-1424.
Word and Double-Word Ranking
The table below shows the ranking of the most popular words and double-words.
To search the actual database of concepts, click here.
Rank | Word/Double-Word | Number of Pages |
---|---|---|
1 | United States | 125449 |
2 | 2 | 109068 |
3 | 1 | 103454 |
4 | Talk | 98000 |
5 | New | 92490 |
6 | 2005 | 83569 |
7 | Album | 73300 |
8 | 0 | 70149 |
9 | 3 | 67870 |
10 | Film | 61778 |
11 | City | 58813 |
12 | New York | 55902 |
13 | Band | 51047 |
14 | Image | 50318 |
15 | 4 | 49927 |
16 | School | 49450 |
17 | County | 49241 |
18 | United | 48213 |
19 | Style | 46666 |
20 | Music | 46057 |
21 | University | 42003 |
22 | Age | 41681 |
23 | Census | 41586 |
24 | States | 40602 |
25 | 5 | 40533 |
26 | American | 39945 |
27 | Football | 39527 |
28 | 18 | 37817 |
29 | 2007 | 36347 |
30 | Game | 35928 |
31 | 2004 | 35882 |
32 | State | 34716 |
33 | Median Income | 34643 |
34 | Series | 33914 |
35 | Party | 33433 |
36 | District | 33239 |
37 | Town | 31940 |
38 | Population | 31791 |
39 | National | 31764 |
40 | South | 30626 |
41 | User | 30547 |
42 | World | 30153 |
43 | Company | 29519 |
44 | Team | 29233 |
45 | Song | 29051 |
46 | John | 28856 |
47 | Station | 28547 |
48 | Area | 28523 |
49 | War | 27221 |
50 | 6 | 27143 |
51 | Language | 26807 |
52 | States Census | 26458 |
53 | British | 26081 |
54 | Debate | 25157 |
55 | Background | 24987 |
56 | High School | 24678 |
57 | List | 24555 |
58 | Book | 24293 |
59 | Width | 24010 |
60 | North | 23614 |
61 | River | 23500 |
62 | League | 23425 |
63 | High | 23382 |
64 | 7 | 23276 |
65 | College | 23259 |
66 | York | 22873 |
67 | Fair Use | 22685 |
68 | World War | 21895 |
69 | Left | 21819 |
70 | Fair | 21744 |
71 | Australia | 21593 |
72 | Village | 21553 |
73 | Church | 21445 |
74 | Club | 20324 |
75 | Group | 20061 |
76 | 2003 | 20014 |
77 | Born | 20011 |
78 | January | 20006 |
79 | August | 19547 |
80 | Episode | 19495 |
81 | English | 19464 |
82 | March | 19461 |
83 | December | 19458 |
84 | Time | 19415 |
85 | 8 | 19252 |
86 | King | 19222 |
87 | 2007 Utc | 19187 |
88 | House | 19171 |
89 | California | 19147 |
90 | People | 19094 |
91 | Class | 19015 |
92 | United Kingdom | 18829 |
93 | General | 18627 |
94 | 0 | 18270 |
95 | Family | 18132 |
96 | 10 | 18087 |
97 | Century | 18068 |
98 | Canada | 17889 |
99 | Season | 17848 |
100 | 2002 | 17768 |
101 | Island | 17585 |
102 | Park | 17584 |
103 | Election | 17385 |
104 | India | 17377 |
105 | Copyright | 17321 |
106 | Age 18 | 17224 |
107 | July | 17200 |
108 | 65 Years | 17101 |
109 | 2000 | 17075 |
110 | Live | 17042 |
111 | Line | 17034 |
112 | 2001 | 16904 |
113 | February | 16900 |
114 | 9 | 16806 |
115 | Air | 16334 |
116 | October | 16169 |
117 | Government | 15977 |
118 | French | 15864 |
119 | Known | 15729 |
120 | East | 15574 |
121 | Television | 15552 |
122 | Com | 15443 |
123 | West | 15420 |
124 | Text | 15399 |
125 | Art | 15368 |
126 | London | 15350 |
127 | Black | 15315 |
128 | F | 15314 |
129 | September | 15169 |
130 | Radio | 15164 |
131 | Border | 15127 |
132 | Tv | 15055 |
133 | France | 15020 |
134 | November | 14952 |
135 | International | 14885 |
136 | Army | 14857 |
137 | April | 14634 |
138 | D | 14438 |
139 | Nbsp | 14313 |
140 | Isbn 0 | 14216 |
141 | Australian | 14192 |
142 | Color | 14180 |
143 | Articles | 14120 |
144 | German | 14040 |
145 | African American | 14026 |
146 | Font | 13918 |
147 | B | 13902 |
148 | Right | 13900 |
149 | Appropriate | 13777 |
150 | Award | 13755 |
151 | La | 13728 |
152 | History | 13666 |
153 | Character | 13574 |
154 | England | 13540 |
155 | Games | 13463 |
156 | Railway | 13459 |
157 | St | 13426 |
158 | Day | 13364 |
159 | Life | 13354 |
160 | Battle | 13347 |
161 | New Zealand | 13225 |
162 | Released | 13133 |
163 | 12 | 13090 |
164 | Best | 13059 |
165 | Road | 13047 |
166 | Guitar | 13020 |
167 | President | 13019 |
168 | Player | 13017 |
169 | Tv Series | 13002 |
170 | Played | 12960 |
171 | Species | 12876 |
172 | Law | 12800 |
173 | Work | 12772 |
174 | Japan | 12694 |
175 | Size | 12669 |
176 | 1999 | 12515 |
177 | Ii | 12405 |
178 | June | 12336 |
179 | 11 | 12321 |
180 | Rock | 12236 |
181 | American U | 12218 |
182 | Lake | 12038 |
183 | Single | 11944 |
184 | Cup | 11908 |
185 | Wikipedia | 11904 |
186 | 20 | 11766 |
187 | San | 11721 |
188 | Year | 11680 |
189 | Los Angeles | 11619 |
190 | Building | 11544 |
191 | Community | 11495 |
192 | Comments | 11422 |
193 | Uk | 11341 |
194 | Notable | 11338 |
195 | Built | 11322 |
196 | Canadian | 11301 |
197 | Route | 11199 |
198 | Www | 11143 |
199 | Township | 11072 |
200 | Located | 11059 |
201 | Students | 11028 |
202 | Public | 10978 |
203 | 100 Females | 9966 |
204 | Prime Minister | 9132 |
205 | New Jersey | 9113 |
206 | 2004 Utc | 8431 |
207 | General Election | 8317 |
208 | War Ii | 8274 |
209 | Jul-06 | 8161 |
210 | South Wales | 7920 |
211 | Aug-06 | 7865 |
212 | Summer Olympics | 7708 |
213 | Jan-07 | 7686 |
214 | San Francisco | 7565 |
215 | Sep-06 | 7477 |
216 | Jan-06 | 7390 |
217 | Jun-06 | 7374 |
218 | Mar-07 | 7345 |
219 | New South | 7307 |
220 | National Football | 7249 |
221 | World Cup | 7223 |
222 | Football Team | 7190 |
223 | Feb-07 | 7180 |
224 | Dec-05 | 7172 |
225 | Dec-06 | 7145 |
226 | Railway Station | 7131 |
227 | Oct-06 | 7069 |
228 | North Carolina | 7027 |
229 | Civil War | 6954 |
230 | Nov-06 | 6862 |
231 | South Africa | 6594 |
232 | Democratic Party | 6160 |
233 | Air Force | 6156 |
234 | Hong Kong | 6132 |
235 | Mar-06 | 5917 |
236 | Apr-06 | 5853 |
237 | Race United | 5804 |
238 | Feb-06 | 5704 |
239 | 0 0 | 5598 |
240 | State University | 5594 |
241 | Video Game | 5587 |
242 | Supreme Court | 5514 |
243 | British Columbia | 5030 |
244 | Style Color | 4988 |
245 | Hip Hop | 4861 |
246 | University Press | 4818 |
247 | Major League | 4814 |
248 | Science Fiction | 4612 |
249 | World Championship | 4611 |
250 | Parliament Constituency | 4605 |
251 | Football League | 4544 |
252 | Roman Catholic | 4432 |
253 | Jul-05 | 4413 |
254 | Soviet Union | 4362 |
255 | Labour Party | 4213 |
256 | Western Australia | 4191 |
257 | Liberal Party | 4123 |
258 | Conservative Party | 4109 |
259 | Native American | 3878 |
260 | San Diego | 3829 |
261 | North America | 3722 |
262 | Star Trek | 3670 |
263 | Northern Ireland | 3613 |
264 | School District | 3609 |
265 | National Park | 3508 |
266 | Grand Prix | 3435 |
267 | Republican Party | 3418 |
268 | State Route | 3398 |
269 | Vice President | 3372 |
270 | League Baseball | 3288 |