Difference between revisions of "CSC111 Make-up Homework 8 2011"

From dftwiki3
Jump to: navigation, search
(Tricky NYT)
(Your Assignment)
Line 33: Line 33:
 
* Your program cannot crash.  If the input file does not exist, your program will just stop after telling the user that the file is missing.  If  the user enters for the year "two thousand," "2oo1," 20001", or just nothing, your program should ''keep on asking the user for a year'' until the user gets it right.  Then the program displays the titles for that year.
 
* Your program cannot crash.  If the input file does not exist, your program will just stop after telling the user that the file is missing.  If  the user enters for the year "two thousand," "2oo1," 20001", or just nothing, your program should ''keep on asking the user for a year'' until the user gets it right.  Then the program displays the titles for that year.
 
* '''New''': your program will figure out what html code sandwiches the titles, and will use these strings to extract all the title.  The trick for this is to assume that a particular movie is ''so good'', that it will always be in the top 1000 movies.  Then your program can look for that movie, find the sandwiching tags, then use these to extract all the movies.
 
* '''New''': your program will figure out what html code sandwiches the titles, and will use these strings to extract all the title.  The trick for this is to assume that a particular movie is ''so good'', that it will always be in the top 1000 movies.  Then your program can look for that movie, find the sandwiching tags, then use these to extract all the movies.
:We'll assume that that best movies of all time is ''Cat on a Hot Tin Roof (1958)'' (but you can write your program using another one as the title of the best movies of all times).
+
:We'll assume that this best movies of all time is ''Cat on a Hot Tin Roof (1958)'' (but you can write your program using another one as the title of the best movies of all times).
  
 
=Submission=
 
=Submission=

Revision as of 12:46, 22 November 2011

--D. Thiebaut 12:30, 22 November 2011 (EST)


This assignment is optional. If you decide to do it, you can work on it and submit it any time until the last day of class. I will use the highest of the grades you got for Homework 8 and this one as the grade for Homework 8.

This homework has to be done individually, not in pair programming mode.

Its due date is 4:00 p.m. on the last day of class this semester.

Tricky NYT

  • The setup is the same as for Homework 8.
  • However, the NYT, knowing that people have a tendency to lift their Web pages and process them to extract information from them, has decided to regularly change the HTML tags they use to sandwich the movie titles. So, for example, one month the movie titles may appear in the HTML page as follows:
...
<td><a href="http://movies.nytimes.com/movie/358/A-Nous-la-Liberte/overview">A Nous la Liberte (1932)</a></td>
...
and the next month the information may be coded as follows:
...
<td><div number="358" section="title">A Nous la Liberte (1932)</div></td>
...
  • Furthermore, new movies are added to the list regularly, so we can't just create a simple text file with the 1000 titles and use it as the source of information, as the list evolves with time.

Your Assignment

  • Your assignment is the same as for Homework 8. You have a file containing all the titles, coded in HTML, and you ask the user for a year, then print all the titles for that year.
  • Your program cannot crash. If the input file does not exist, your program will just stop after telling the user that the file is missing. If the user enters for the year "two thousand," "2oo1," 20001", or just nothing, your program should keep on asking the user for a year until the user gets it right. Then the program displays the titles for that year.
  • New: your program will figure out what html code sandwiches the titles, and will use these strings to extract all the title. The trick for this is to assume that a particular movie is so good, that it will always be in the top 1000 movies. Then your program can look for that movie, find the sandwiching tags, then use these to extract all the movies.
We'll assume that this best movies of all time is Cat on a Hot Tin Roof (1958) (but you can write your program using another one as the title of the best movies of all times).

Submission

  • Name your program makeup8.py, and submit it as follows:
 rsubmit hw8 makeup8.py