The Things They Carried
Revision as of 22:14, 2 October 2015
--D. Thiebaut (talk) 22:39, 2 October 2015 (EDT)
Source
Python Program for Word Frequencies
<source lang="python">
# Compute the 100 most frequent words in a document.
import string

doc = "TheThingsTheyCarried.txt"

# common English words to ignore, plus a few contractions that lose
# their apostrophe when punctuation is stripped (dont, didnt, im)
stopwords = """
a about above across after afterwards again against all almost alone along
already also although always am among amongst amoungst amount an and another
any anyhow anyone anything anyway anywhere are around as at back be became
because become becomes becoming been before beforehand behind being below
beside besides between beyond bill both bottom but by call can cannot cant
co computer con could couldnt cry de describe detail do done down due during
each eg eight either eleven else elsewhere empty enough etc even ever every
everyone everything everywhere except few fifteen fifty fill find fire first
five for former formerly forty found four from front full further get give
go had has hasnt have he hence her here hereafter hereby herein hereupon
hers herself him himself his how however hundred i ie if in inc indeed
interest into is it its itself keep last latter latterly least less ltd
made many may me meanwhile might mill mine more moreover most mostly move
much must my myself name namely neither never nevertheless next nine no
nobody none noone nor not nothing now nowhere of off often on once one only
onto or other others otherwise our ours ourselves out over own part per
perhaps please put rather re same see seem seemed seeming seems serious
several she should show side since sincere six sixty so some somehow someone
something sometime sometimes somewhere still such system take ten than that
the their them themselves then thence there thereafter thereby therefore
therein thereupon these they thick thin third this those though three
through throughout thru thus to together too top toward towards twelve
twenty two un under until up upon us very via was we well were what whatever
when whence whenever where whereafter whereas whereby wherein whereupon
wherever whether which while whither who whoever whole whom whose why will
with within without would yet you your yours yourself yourselves dont got
just did didnt im
"""

def displayStops():
    # print the stop words, wrapping at roughly 60 characters per line
    s = ""
    for a in stopwords.split():
        s = s + a + " "
        if len( s ) > 60:
            print( s )
            s = ""
    print( s )

def main():
    # read the whole document
    with open( doc, "r" ) as f:
        text = f.read()

    # a set makes the membership test in the loop below fast
    stopSet = set( stopwords.lower().split() )

    # remove ASCII punctuation (so "I'd" becomes "id" once lowercased)
    exclude = set( string.punctuation )
    text = ''.join( ch for ch in text if ch not in exclude )

    # count every word that is not a stop word
    dico = {}
    for word in text.lower().split():
        if word in stopSet:
            continue
        dico[word] = dico.get( word, 0 ) + 1

    # sort (count, word) pairs, most frequent first
    pairs = [ (dico[key], key) for key in dico ]
    pairs.sort( reverse=True )

    # print the 100 most frequent words with their counts
    for n, k in pairs[0:100]:
        print( k, n )

#displayStops()
main()
</source>
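The counting step of the program above can also be written with the standard library's `collections.Counter`. The following is a minimal sketch under stated assumptions, not the author's code: the function name `top_words`, its parameters, and the sample sentence are hypothetical and chosen only for illustration. Ties are ordered by count and then by word, both descending, as in the tuple sort of the program.

```python
import string
from collections import Counter

def top_words(text, stopwords, n=100):
    """Return the n most frequent non-stop words as (word, count) pairs.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Remove ASCII punctuation, then lowercase and split on whitespace.
    table = str.maketrans("", "", string.punctuation)
    words = text.translate(table).lower().split()

    # Count every word that is not a stop word.
    counts = Counter(w for w in words if w not in stopwords)

    # Sort by (count, word), both descending.
    ranked = sorted(counts.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    return ranked[:n]

# Made-up sample text and stop words, for illustration only.
sample = "The war was the war. They carried things; they carried the war."
result = top_words(sample, {"the", "was", "they"}, n=3)
print(result)
```

To apply it to the novel, one would pass the contents of the text file and the stop-word set in place of the sample values.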
100 Most Frequent Words
<source lang="text">
said 413
like 224
war 181
carried 156
man 148
rat 147
things 146
night 136
time 119
right 114
way 112
kiowa 109
eyes 109
away 105
sanders 104
old 101
know 98
field 97
head 96
remember 94
say 92
dead 90
id 86
took 85
tell 84
story 82
little 82
felt 80
maybe 77
went 76
later 73
cross 73
azar 71
himself 70
kept 69
dark 69
looked 68
long 68
hed 68
real 66
mitchell 66
bowker 65
thought 64
hard 64
came 63
river 62
thing 61
norman 61
good 61
told 60
thats 60
make 60
feel 60
stories 58
water 57
new 55
body 55
years 53
wanted 53
think 53
place 53
look 53
lieutenant 53
day 53
tried 52
guys 52
young 51
morning 51
men 51
knew 51
jimmy 51
love 50
kiley 50
inside 50
want 49
sound 48
fossie 48
bad 48
myself 47
guy 47
dobbins 47
come 47
true 46
face 46
moved 45
sure 44
life 44
hands 44
white 43
rain 43
talk 42
stood 42
shit 42
mary 42
lake 42
kind 42
high 42
gone 42
vietnam 41
linda 41
</source>
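Words with equal counts appear in reverse alphabetical order in this list (kiowa before eyes at 109, for example). That ordering is consistent with the program's descending sort of (count, word) tuples, where the word breaks ties. A minimal sketch, using three pairs copied from the list:

```python
# (count, word) pairs copied from the list above
pairs = [(109, "eyes"), (105, "away"), (109, "kiowa")]

# Tuples compare element by element: count first, then word.
pairs.sort()
pairs.reverse()

ordered = [f"{word} {count}" for count, word in pairs]
print("\n".join(ordered))
```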
Word Cloud