The Things They Carried


Revision as of 07:57, 3 October 2015

--D. Thiebaut (talk) 22:39, 2 October 2015 (EDT)



OBrienTheThingsTheyCarried.png



Source


This section is only visible to computers located at Smith College


Stats


  • 4668 lines
  • 66025 words
  • 382094 characters
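The page does not say how these three figures were obtained. As a minimal sketch, assuming words are whitespace-separated tokens and characters include spaces and newlines, they could be computed as follows. The function name `text_stats` and the sample string are hypothetical and exist only for illustration.

```python
def text_stats(text):
    """Return (lines, words, characters) for a string.

    Assumptions (not stated on the page): a "word" is any
    whitespace-separated token, and the character count
    includes spaces and newlines.
    """
    lines = len(text.splitlines())
    words = len(text.split())
    chars = len(text)
    return lines, words, chars


# hypothetical sample text, used only to show the call
sample = "one two three\nfour five\n"
print(text_stats(sample))
```

To apply it to the book, read the file first, for example `text_stats(open("TheThingsTheyCarried.txt").read())`. Other tools may count lines or characters slightly differently, so the result need not match the figures above exactly.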


Python Program for Word Frequencies


# compute the top 100 most frequent words in the document
import string


doc = "TheThingsTheyCarried.txt"
stopwords = """a about above across after afterwards again against all almost 
alone along already also although always am among amongst amoungst 
amount an and another any anyhow anyone anything anyway anywhere 
are around as at back be became because become becomes becoming 
been before beforehand behind being below beside besides between 
beyond bill both bottom but by call can cannot cant co computer 
con could couldnt cry de describe detail do done down due during 
each eg eight either eleven else elsewhere empty enough etc even 
ever every everyone everything everywhere except few fifteen 
fify fill find fire first five for former formerly forty found 
four from front full further get give go had has hasnt have he 
hence her here hereafter hereby herein hereupon hers herself him 
himself his how however hundred i ie if in inc indeed interest 
into is it its itself keep last latter latterly least less ltd 
made many may me meanwhile might mill mine more moreover most 
mostly move much must my myself name namely neither never nevertheless 
next nine no nobody none noone nor not nothing now nowhere of 
off often on once one only onto or other others otherwise our 
ours ourselves out over own part per perhaps please put rather 
re same see seem seemed seeming seems serious several she should 
show side since sincere six sixty so some somehow someone something 
sometime sometimes somewhere still such system take ten than 
that the their them themselves then thence there thereafter thereby 
therefore therein thereupon these they thick thin third this 
those though three through throughout thru thus to together too 
top toward towards twelve twenty two un under until up upon us 
very via was we well were what whatever when whence whenever 
where whereafter whereas whereby wherein whereupon wherever whether 
which while whither who whoever whole whom whose why will with 
within without would yet you your yours yourself yourselves dont 
got just did didnt im
"""
def displayStops():
    s = ""
    for a in stopwords.split():
        s = s+a+" "
        if len( s )>60:
            print(s)
            s = ""
    print( s )


def main():
    # read the whole document; the with-block closes the file
    with open( doc, "r" ) as f:
        text = f.read()

    # turn the stopword string into a set for fast lookups
    stops = set( stopwords.lower().split() )

    # strip punctuation, so "don't" becomes "dont" as in the stopword list
    exclude = set( string.punctuation )
    text = ''.join( ch for ch in text if ch not in exclude )

    # count every word that is not a stopword
    dico = {}
    for word in text.lower().split():
        if word in stops:
            continue
        dico[word] = dico.get( word, 0 ) + 1

    # sort (count, word) pairs from most to least frequent
    pairs = sorted( ( (n, k) for (k, n) in dico.items() ), reverse=True )
    words = [k for (n, k) in pairs]
    print( "\n".join( words[0:100] ) )

#displayStops()
main()


100 Most Frequent Words


  1 said                  413
  2 like                  224
  3 war                   181
  4 carried               156
  5 man                   148
  6 rat                   147
  7 things                146
  8 night                 136
  9 time                  119
 10 right                 114
 11 way                   112
 12 kiowa                 109
 13 eyes                  109
 14 away                  105
 15 sanders               104
 16 old                   101
 17 know                   98
 18 field                  97
 19 head                   96
 20 remember               94
 21 say                    92
 22 dead                   90
 23 id                     86
 24 took                   85
 25 tell                   84
 26 story                  82
 27 little                 82
 28 felt                   80
 29 maybe                  77
 30 went                   76
 31 later                  73
 32 cross                  73
 33 azar                   71
 34 himself                70
 35 kept                   69
 36 dark                   69
 37 looked                 68
 38 long                   68
 39 hed                    68
 40 real                   66
 41 mitchell               66
 42 bowker                 65
 43 thought                64
 44 hard                   64
 45 came                   63
 46 river                  62
 47 thing                  61
 48 norman                 61
 49 good                   61
 50 told                   60
 51 thats                  60
 52 make                   60
 53 feel                   60
 54 stories                58
 55 water                  57
 56 new                    55
 57 body                   55
 58 years                  53
 59 wanted                 53
 60 think                  53
 61 place                  53
 62 look                   53
 63 lieutenant             53
 64 day                    53
 65 tried                  52
 66 guys                   52
 67 young                  51
 68 morning                51
 69 men                    51
 70 knew                   51
 71 jimmy                  51
 72 love                   50
 73 kiley                  50
 74 inside                 50
 75 want                   49
 76 sound                  48
 77 fossie                 48
 78 bad                    48
 79 myself                 47
 80 guy                    47
 81 dobbins                47
 82 come                   47
 83 true                   46
 84 face                   46
 85 moved                  45
 86 sure                   44
 87 life                   44
 88 hands                  44
 89 white                  43
 90 rain                   43
 91 talk                   42
 92 stood                  42
 93 shit                   42
 94 mary                   42
 95 lake                   42
 96 kind                   42
 97 high                   42
 98 gone                   42
 99 vietnam                41
100 linda                  41
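The program above prints only the words, while this listing also shows a rank and a count. The sketch below is one way such a table could be produced. It is an assumption about the formatting, not the author's code. The names `top_words` and `format_table`, the column widths, and the sample text are all hypothetical.

```python
import string


def top_words(text, stopwords, n=100):
    """Return the n most frequent non-stopword words as (word, count) pairs."""
    exclude = set(string.punctuation)
    cleaned = "".join(ch for ch in text if ch not in exclude)
    counts = {}
    for word in cleaned.lower().split():
        if word not in stopwords:
            counts[word] = counts.get(word, 0) + 1
    # sort on (count, word) in descending order, so words with equal
    # counts come out in reverse alphabetical order
    pairs = sorted(((c, w) for w, c in counts.items()), reverse=True)
    return [(w, c) for c, w in pairs[:n]]


def format_table(ranked):
    """Format (word, count) pairs as 'rank word count' rows (assumed widths)."""
    return "\n".join(
        f"{i:3d} {w:<20}{c:5d}" for i, (w, c) in enumerate(ranked, start=1)
    )


# hypothetical sample text and stopword set, used only to show the call
sample = "red blue red green blue red and the green"
print(format_table(top_words(sample, {"and", "the"}, n=3)))
```

Sorting on `(count, word)` tuples in reverse keeps the same tie order as the `sort()` and `reverse()` calls in the program above.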


Word Cloud


100 Most Frequent Words


ThingsTheyCarried.png


10 Most Frequent Words


ThingsTheyCarried10.png


Most Frequent 2-Grams


  • List of 2-grams with a frequency count of 20 or more.


 65: mitchell - sanders
 51: norman - bowker
 49: rat - kiley
 43: jimmy - cross
 40: things - carried
 34: mary - anne
 32: henry - dobbins
 31: rat - said
 29: shook - head
 28: dave - jensen
 25: old - man
 25: lieutenant - cross
 24: war - story
 24: ted - lavender
 24: sanders - said
 22: lieutenant - jimmy
 21: years - old
 21: curt - lemon
 20: true - war
 20: mark - fossie
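The page does not include the code used for the 2-grams. Pairs such as "things - carried" suggest that stopwords were removed before adjacent words were paired, but that is an assumption. The sketch below follows that assumption. The function name `frequent_bigrams`, its parameters, and the sample text are hypothetical.

```python
import string
from collections import Counter


def frequent_bigrams(text, stopwords, min_count=20):
    """Return (count, (word1, word2)) pairs with count >= min_count.

    Assumption: punctuation and stopwords are removed first, and a
    2-gram is any two words that are adjacent in what remains.
    """
    exclude = set(string.punctuation)
    cleaned = "".join(ch for ch in text if ch not in exclude)
    words = [w for w in cleaned.lower().split() if w not in stopwords]
    # pair each word with its successor and count the pairs
    counts = Counter(zip(words, words[1:]))
    frequent = [(c, pair) for pair, c in counts.items() if c >= min_count]
    # most frequent first; ties in reverse alphabetical order
    return sorted(frequent, reverse=True)


# hypothetical sample text and stopword set, used only to show the call
sample = "old man and the old man"
for count, (first, second) in frequent_bigrams(sample, {"and", "the"}, min_count=2):
    print(f"{count:3d}: {first} - {second}")
```

Because punctuation is stripped before pairing, this sketch also counts pairs that span a sentence boundary. Splitting the text into sentences first would avoid that, at the cost of a little more code.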