The Things They Carried

From dftwiki3

--D. Thiebaut 22:39, 2 October 2015 (EDT)


Source


This section is only visible to computers located at Smith College

Python Program for Word Frequencies


# compute the top 100 most frequent words in a document (stop words excluded)
import string


doc = "TheThingsTheyCarried.txt"
stopwords = """a about above across after afterwards again against all almost 
alone along already also although always am among amongst amoungst 
amount an and another any anyhow anyone anything anyway anywhere 
are around as at back be became because become becomes becoming 
been before beforehand behind being below beside besides between 
beyond bill both bottom but by call can cannot cant co computer 
con could couldnt cry de describe detail do done down due during 
each eg eight either eleven else elsewhere empty enough etc even 
ever every everyone everything everywhere except few fifteen 
fify fill find fire first five for former formerly forty found 
four from front full further get give go had has hasnt have he 
hence her here hereafter hereby herein hereupon hers herse" him 
himse" his how however hundred i ie if in inc indeed interest 
into is it its itse" keep last latter latterly least less ltd 
made many may me meanwhile might mill mine more moreover most 
mostly move much must my myse" name namely neither never nevertheless 
next nine no nobody none noone nor not nothing now nowhere of 
off often on once one only onto or other others otherwise our 
ours ourselves out over own part per perhaps please put rather 
re same see seem seemed seeming seems serious several she should 
show side since sincere six sixty so some somehow someone something 
sometime sometimes somewhere still such system take ten than 
that the their them themselves then thence there thereafter thereby 
therefore therein thereupon these they thick thin third this 
those though three through throughout thru thus to together too 
top toward towards twelve twenty two un under until up upon us 
very via was we well were what whatever when whence whenever 
where whereafter whereas whereby wherein whereupon wherever whether 
which while whither who whoever whole whom whose why will with 
within without would yet you your yours yourself yourselves dont 
got just did didnt im
"""
# debugging aid: print the stop words, wrapping lines at roughly 60 characters
def displayStops():
    s = ""
    for a in stopwords.split():
        s = s+a+" "
        if len( s )>60:
            print(s)
            s = ""
    print( s )


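def main():
    # read the whole text of the book (the file is closed automatically)
    with open( doc, "r" ) as f:
        text = f.read()

    # build the stop-word set in a local variable so that the global
    # string stays intact and displayStops() can still be called
    stops = set( stopwords.lower().split() )

    # delete all punctuation (this also turns "didn't" into "didnt")
    exclude = set( string.punctuation )
    text = ''.join( ch for ch in text if ch not in exclude )

    # count how many times each word that is not a stop word occurs
    dico = {}
    for word in text.lower().split():
        if word in stops:
            continue
        dico[word] = dico.get( word, 0 ) + 1

    # sort (count, word) pairs from most to least frequent
    pairs = [ (count, word) for word, count in dico.items() ]
    pairs.sort( reverse=True )
    words = [ word for (count, word) in pairs ]
    print( "\n".join( words[0:100] ) )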

#displayStops()
main()
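The counting and ranking steps in main() can also be written more compactly with the standard library's collections.Counter. This is a minimal sketch, not the original program: the function name top_words() and the sample sentence are hypothetical, and it assumes the same pipeline as main() (strip punctuation, lower-case, split on whitespace, drop stop words).

```python
import string
from collections import Counter

def top_words(text, stops, n=100):
    """Return the n most frequent words in text that are not in stops."""
    # str.maketrans with a third argument builds a table that deletes
    # every punctuation character
    table = str.maketrans("", "", string.punctuation)
    words = text.translate(table).lower().split()
    # count every word that is not a stop word
    counts = Counter(w for w in words if w not in stops)
    # most_common() sorts by count; equal counts keep first-seen order
    return [word for word, count in counts.most_common(n)]

sample = "The rat said it. Rat said: the rat, the war!"
stops = {"the", "it"}
print(top_words(sample, stops, 3))  # ['rat', 'said', 'war']
```

One difference to keep in mind: main() sorts (count, word) tuples in descending order, so words with equal counts come out in reverse alphabetical order, while most_common() keeps them in the order they first appear. The last few entries of a top-100 list may therefore differ between the two versions.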


100 Most Frequent Words


said
like
war
carried
man
rat
things
night
time
right
way
kiowa
eyes
away
sanders
old
know
field
head
remember
say
dead
id
took
tell
story
little
felt
maybe
went
wasnt
later
cross
azar
himself
kept
dark
looked
long
hed
real
mitchell
bowker
thought
hard
came
river
thing
norman
good
told
thats
make
feel
stories
water
new
body
years
wanted
think
place
look
lieutenant
day
tried
guys
young
morning
men
knew
jimmy
love
kiley
inside
want
sound
fossie
bad
myself
guy
dobbins
come
true
face
moved
sure
life
hands
white
rain
talk
stood
shit
mary
lake
kind
high
gone
vietnam
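Several entries in the list above, such as id, hed, wasnt and thats, are most likely the contractions I'd, he'd, wasn't and that's: the program deletes apostrophes before it checks the stop-word list, which is presumably why dont, didnt and im had to be added to that list in their apostrophe-free form. A minimal sketch of an alternative tokenizer that keeps internal apostrophes, so contractions can be listed as stop words directly. The names CONTRACTIONS, WORD_RE, tokenize() and count_words() are hypothetical, and mapping curly apostrophes (U+2019) to straight ones assumes the text file, converted from the PDF, may contain them.

```python
import re
from collections import Counter

# Hypothetical stop words spelled WITH apostrophes (assumption: these are
# not part of the original stop-word list).
CONTRACTIONS = {"i'd", "he'd", "wasn't", "that's", "don't", "didn't", "i'm"}

# A word is a run of letters, optionally followed by apostrophe + letters,
# so "didn't" stays one token instead of becoming "didnt".
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")

def tokenize(text):
    # lower-case and map curly apostrophes to straight ones before matching
    text = text.lower().replace("\u2019", "'")
    return WORD_RE.findall(text)

def count_words(text, stops):
    # count every token that is not a stop word
    return Counter(w for w in tokenize(text) if w not in stops)

sample = "He\u2019d said it. I'd said it, but he wasn't there."
print(tokenize(sample))
counts = count_words(sample, CONTRACTIONS | {"it", "but", "there", "he"})
print(counts.most_common())  # [('said', 2)]
```

Possessives such as "rat's" also stay single tokens with this pattern, so they would be counted separately from "rat", much as the original program counts "rats" separately.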