--D. Thiebaut 20:23, 8 November 2011 (EST)

This assignment is due on 11/15/11 evening, at midnight. It will be easiest for you to work on this assignment on beowulf, and not on your laptop/desktop.

Problem: Working with Files

Check this page out: http://www.nytimes.com/ref/movies/1000best.html
(If this page is not available, you can see a cached copy of it on beowulf: http://maven.smith.edu/~111a/1000best.html )
It contains, according to the New York Times, the 1000 best movies ever made.
The whole Web page is available for you to download to your 111a-xx account. All you need to do is login to your 111a-xx account, and type the following command at the prompt:

  getcopy 1000best.html

Check the contents of your directory with ls: you should see the new file. It is fairly large, and contains roughly 320,000 characters.

Your assignment

Write a program called hw8.py that will
- ask the user for the year she was born in (or some other year),
- open the html file, read it, and output all the movies that came out in that year, sorted in alphabetical order.
- store this list to a text file, called movies_nnnn.txt, where nnnn will be the year selected by the user.

Example

The user input is underlined:

python3.2 hw8.py
 
Selecting movies for which year? 1933
 
Movies that came out in 1933:

Cavalcade  
Dinner at Eight 
Duck Soup 
King Kong 
Little Women 
State Fair 
The Private Life of Henry VIII 
Zero for Conduct 

Saving movies to file movies_1933.txt

At the end of the program a new file will be in the current directory, and its name will be movies_1933.txt

Helpful Hints

HTML format

You will find out by looking at the raw html code of the page that it is filled with html tags. The list of movies is embedded is not easy to find. Here is the beginning of the list, as it appears in the file:

<td><a href="http://movies.nytimes.com/movie/358/A-Nous-la-Liberte/overview">A Nous la Liberte (1932)</a></td>
<td class="smartlink"><a href="#" bluekey="Up4yHhM70NjGxfSdSLf9%22oMVD7MemyJIdFp322ed7EZAz4JqryOohTYaqwjSxfhSx5w8Mb"></a></td>
</tr>
<tr>
<td><a href="http://movies.nytimes.com/movie/265451/About-Schmidt/overview">About Schmidt (2002)</a></td>
<td class="smartlink"><a href="#" bluekey="Up4yHhM70NjGxfSdSLf9%22oMVD7MemyJIsQC3SYeixIJyRSZthWCkF5hvd6PUcWfJl5w46Tl"></a></td>
</tr>

your code will have to find the movies by locking on the different tags, and extracting the movie title from the surrounding html code.

The String find() method

the find() method is described fully in the python doc: http://docs.python.org/py3k/library/stdtypes.html#string-methods
The find() method accepts a second argument, which is optional. This argument is the location in the string where the searching must start. This is useful if you want to start searching not from the beginning of the string, but from a different place.
Example

text = """
age: 35    value: 345,
age: 77    value:   1,
age: 23    value:  16,
age:  3    value:  -1"""

# display all the ages
start = 0

while True: # we create an endless loop
    index = text.find( "age:", start )

    # found a new "age:" string?
    if index == -1:
        # not found
        break
    
    # we advance start to just past where we found the "age:" string
    # and we take a slice of 4 characters that will contain the age.
    start = index + len( "age:" )
    print( text[ start: start+4 ] )

Figure out how the code works. It should help you solve this homework...

CSC111 Homework 8 2011

Contents

Problem: Working with Files

Your assignment

Example

Helpful Hints

HTML format

The String find() method

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools