Difference between revisions of "CSC352 Homework 2"

From dftwiki3
Jump to: navigation, search
(Problem #2)
(Problem #2)
Line 38: Line 38:
 
First, Google.  Or Yahoo.  The problem is that if we start writing/debugging a program that probes Google or Yahoo constantly, they will soon discover the non-human nature of the probe and will block the IP address of where the requests come from.  So, instead, I have created a mini repository of text files, about 60,000 of them, taken from public Web sites, and I indexed all of them with [http://swish-e.org swish-e], and put them on one of my computers.
 
First, Google.  Or Yahoo.  The problem is that if we start writing/debugging a program that probes Google or Yahoo constantly, they will soon discover the non-human nature of the probe and will block the IP address of where the requests come from.  So, instead, I have created a mini repository of text files, about 60,000 of them, taken from public Web sites, and I indexed all of them with [http://swish-e.org swish-e], and put them on one of my computers.
  
 +
<p>
 
<onlysmith>
 
<onlysmith>
 
     http://xgridmac.dyndns.org/~thiebaut/swish-e/swishe.php?search=valentine
 
     http://xgridmac.dyndns.org/~thiebaut/swish-e/swishe.php?search=valentine
 
</onlysmith>
 
</onlysmith>
 +
<p>
  
 
There is no Web page with a form to allow you to search.  Instead you put the word or words of interest at the end of the URL, plug the URL into your browser, and ''voila''!  Swish-e will return to you a list of 20 (this can be changed if we want) files that it has indexed that best match the search terms.   
 
There is no Web page with a form to allow you to search.  Instead you put the word or words of interest at the end of the URL, plug the URL into your browser, and ''voila''!  Swish-e will return to you a list of 20 (this can be changed if we want) files that it has indexed that best match the search terms.   

Revision as of 18:27, 11 February 2010

This assignment is due on 2/18/10.




Problem #1

Take the multiprocessing version of the N-Queens program we saw in class and fix it. It is currently inefficient, as we saw in class. In particular, it does not need the foundOne boolean, and it has to wait for all the child processes to finish before it can finish executing. Instead, change the code so that all the processes are "told" to stop when one solution is found. This way the parent process will not have to wait around.

Once your program works, measure its best of four execution times and plot it for N=10 to 25. In other words, run it 4 times for N=10, and keep the best time. 4 times for N=11 and keep the best time, etc. Plot the best times obtained.

You may find the following page useful for generating the list of best times.

Compare these execution times to those you obtained for your improved threaded versions.

Put the listing and your comments in a pdf and submit it in your CSC352 account as follows:

   submit hw2 hw2a.pdf







Problem #2

This problem will get you started on Project #1 for this class.

The main idea is that every so often, you might be in a situation where given some information, you would like to have available the most pertinent Web pages dealing with that information. I don't mean that you just go to Google and search for some keyword of interest. For example, recently the blogs and news outlets started talking about Google Buzz, an attempt by Google to enter the world of social networking. Wouldn't it be nice if we had a system that would check the most frequent words appearing in some technology-related blogs, and collect automatically the Web pages in Google that talk mostly of Google Buzz, and make a list of them?

That's what we are going to work on. <P> But we'll start small. <P> First, Google. Or Yahoo. The problem is that if we start writing/debugging a program that probes Google or Yahoo constantly, they will soon discover the non-human nature of the probe and will block the IP address of where the requests come from. So, instead, I have created a mini repository of text files, about 60,000 of them, taken from public Web sites, and I indexed all of them with swish-e, and put them on one of my computers. <p>

This section is only visible to computers located at Smith College

<p>

There is no Web page with a form to allow you to search. Instead you put the word or words of interest at the end of the URL, plug the URL into your browser, and voila! Swish-e will return to you a list of 20 (this can be changed if we want) files that it has indexed that best match the search terms.