CSC111 Homework 8 2011


--D. Thiebaut 20:23, 8 November 2011 (EST)



This assignment is due on 11/15/11 at midnight. It will be easiest to work on this assignment on beowulf rather than on your laptop or desktop.


Problem: Working with Files

  • Get a copy of the file 1000best.html with the getcopy command:
  getcopy 1000best.html
  • Check the contents of your directory with ls: you should see the new file. It is fairly large, and contains roughly 320,000 characters.

Your assignment

  • Write a program called hw8.py that will
    • ask the user for the year she was born in (or some other year),
    • open the html file, read it, and print out on the screen all the movies that came out in that year, sorted in alphabetical order.
    • store this sorted list to a text file, called movies_nnnn.txt, where nnnn will be the year selected by the user.
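
The sorting and saving steps can be sketched as follows. This is only a minimal sketch under stated assumptions: the helper name save_movies is hypothetical, and the titles are assumed to have been extracted from the html file already (three sample titles stand in for them here).

```python
def save_movies(titles, year):
    """Hypothetical helper: sort the titles alphabetically and write them,
    one per line, to movies_<year>.txt. Returns the sorted list and file name."""
    sorted_titles = sorted(titles)
    file_name = "movies_" + str(year) + ".txt"

    # "w" creates the file, or overwrites it if it already exists
    with open(file_name, "w") as out_file:
        for title in sorted_titles:
            out_file.write(title + "\n")

    return sorted_titles, file_name


# sample titles only; a real run would use the titles found in the html file
sorted_titles, file_name = save_movies(["King Kong", "Duck Soup", "Cavalcade"], 1933)
print("Saving movies to file", file_name)
```

Building the file name from the year with string concatenation keeps the name in the movies_nnnn.txt form required above.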

Example

  • In the run below, the user types the command that starts the program, then the year 1933 at the prompt:
python3.2 hw8.py
 
Selecting movies for which year? 1933
 
Movies that came out in 1933:

Cavalcade  
Dinner at Eight 
Duck Soup 
King Kong 
Little Women 
State Fair 
The Private Life of Henry VIII 
Zero for Conduct 

Saving movies to file movies_1933.txt
  • At the end of the program, a new file named movies_1933.txt will be in the current directory.

Helpful Hints

HTML format

  • If you look at the raw html code of the page, you will see that it is filled with html tags. The list of movies is embedded in this markup and is not easy to find. Here is the beginning of the list, as it appears in the file:
<td><a href="http://movies.nytimes.com/movie/358/A-Nous-la-Liberte/overview">A Nous la Liberte (1932)</a></td>
<td class="smartlink"><a href="#" bluekey="Up4yHhM70NjGxfSdSLf9%22oMVD7MemyJIdFp322ed7EZAz4JqryOohTYaqwjSxfhSx5w8Mb"></a></td>
</tr>
<tr>
<td><a href="http://movies.nytimes.com/movie/265451/About-Schmidt/overview">About Schmidt (2002)</a></td>
<td class="smartlink"><a href="#" bluekey="Up4yHhM70NjGxfSdSLf9%22oMVD7MemyJIsQC3SYeixIJyRSZthWCkF5hvd6PUcWfJl5w46Tl"></a></td>
</tr>
  • Your code will have to find the movies by locking on to the different tags and extracting each movie title from the surrounding html code.
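
One possible way to lock on to the tags of a single line is sketched below. It is not the required solution, and it rests on an assumption drawn only from the two sample lines above: that every movie link ends with overview"> just before the title. The function name extract_movie is hypothetical.

```python
line = ('<td><a href="http://movies.nytimes.com/movie/358/'
        'A-Nous-la-Liberte/overview">A Nous la Liberte (1932)</a></td>')


def extract_movie(line):
    """Return (title, year) as strings, or None if the line holds no movie."""
    marker = 'overview">'            # assumption: closes every movie link tag
    index = line.find(marker)
    if index == -1:
        return None

    # the title starts just past the marker and stops at the closing </a>
    start = index + len(marker)
    end = line.find("</a>", start)
    if end == -1:
        return None
    text = line[start:end]           # e.g. "A Nous la Liberte (1932)"

    # the year sits inside the last pair of parentheses
    paren = text.rfind("(")
    if paren == -1:
        return None
    title = text[:paren].strip()
    year = text[paren + 1 : paren + 5]
    return title, year


print(extract_movie(line))
```

The smartlink lines do not contain the marker, so the function returns None for them and they can be skipped.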

The String find() method

  • The find() method is described fully in the Python documentation: http://docs.python.org/py3k/library/stdtypes.html#string-methods
  • The find() method returns the index of the first match, or -1 if the string is not found. It accepts an optional second argument: the location in the string where the search must start. This is useful if you want to start searching not from the beginning of the string, but from a different place.
  • Example


text = """
age: 35    value: 345,
age: 77    value:   1,
age: 23    value:  16,
age:  3    value:  -1"""

# display all the ages
start = 0

while True: # we create an endless loop
    index = text.find( "age:", start )

    # found a new "age:" string?
    if index == -1:
        # not found
        break
    
    # we advance start to just past where we found the "age:" string
    # and we take a slice of 4 characters that will contain the age.
    start = index + len( "age:" )
    print( text[ start: start+4 ] )


  • Figure out how the code works. It should help you solve this homework...