Difference between revisions of "CSC352 Homework 4 2013"


Revision as of 14:42, 21 October 2013

--D. Thiebaut (talk) 14:19, 21 October 2013 (EDT)




This homework deals with writing code in C and becoming proficient at processing text in C, which requires some pointer operations. It also contributes important functionality to our project. The homework is due on the 31st of Oct., at 11:59 p.m. You may work in pairs on this homework, or by yourself.



Computing Image-Access Statistics


First, you should take a look at the information on the Class Wiki (http://cs.smith.edu/classwiki/index.php/CSC352_Project:_Image_Repository_2013) about the image repository and how to gather the important files for our project.

Of interest to us are the files that contain access statistics for all pages and files in the Wikimedia projects. They are available for free download from dumps.wikimedia.org. Note that Wikipedia is just one of the projects supported by Wikimedia, the foundation that hosts Wikipedia. There are many other projects. This is important because the statistics files contain access counts for many projects, not just Wikipedia.
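Because each statistics file mixes entries from many projects, a program that reads one has to separate the project code from the rest of each line. The following is a minimal sketch of that step in C, not the assignment's required solution. It assumes each line holds four space-separated fields (project code, page title, request count, byte count); this layout is not stated above and should be checked against a downloaded file. The names PageCount, parse_pagecount_line, and is_project are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* One parsed record. The struct and its field names are hypothetical. */
typedef struct {
    char *project;    /* project code, e.g. "en" (points into the line buffer) */
    char *title;      /* page or file title (points into the line buffer) */
    long count;       /* number of requests */
    long long bytes;  /* number of bytes returned */
} PageCount;

/*
 * Splits `line` in place into four space-separated fields (assumed layout).
 * Separators and the trailing newline are overwritten with '\0', so the
 * caller must pass a writable buffer. Anything after the fourth field is
 * ignored. Returns 1 on success, 0 if the line is malformed.
 */
int parse_pagecount_line(char *line, PageCount *out)
{
    char *fields[4];
    char *p = line;
    char *end;
    int n = 0;

    while (n < 4) {
        while (*p == ' ')                 /* skip separators */
            p++;
        if (*p == '\0' || *p == '\n')     /* ran out of fields */
            break;
        fields[n++] = p;                  /* remember where the field starts */
        while (*p != '\0' && *p != ' ' && *p != '\n')
            p++;                          /* walk to the end of the field */
        if (*p != '\0')
            *p++ = '\0';                  /* terminate the field in place */
    }
    if (n != 4)
        return 0;

    out->project = fields[0];
    out->title = fields[1];

    out->count = strtol(fields[2], &end, 10);
    if (end == fields[2] || *end != '\0') /* count must be all digits */
        return 0;

    out->bytes = strtoll(fields[3], &end, 10);
    if (end == fields[3] || *end != '\0') /* byte count must be all digits */
        return 0;

    return 1;
}

/* Returns 1 if the record belongs to the project with the given code. */
int is_project(const PageCount *pc, const char *code)
{
    return strcmp(pc->project, code) == 0;
}
```

A caller would read each line into a writable buffer with fgets, pass it to parse_pagecount_line, and keep only the records for which is_project returns 1 for the project code of interest. Splitting in place avoids copying the fields, at the cost of modifying the input buffer.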

To get a feel for what they contain, download a couple of them.

  • Log in to beowulf or beowulf2 (or work on your Mac)
  • Type the following commands (the first four lines are user input; the rest is output from wget):
 cd
 mkdir temp
 cd temp
 wget http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-01/pagecounts-20130101-000000.gz
 --2013-10-21 14:41:02--  http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-01/pagecounts-20130101-000000.gz
 Resolving dumps.wikimedia.org... 208.80.152.185
 Connecting to dumps.wikimedia.org|208.80.152.185|:80... connected.
 HTTP request sent, awaiting response... 200 OK
 Length: 80093452 (76M) [application/x-gzip]
 Saving to: `pagecounts-20130101-000000.gz'

 100%  [=========================...=============================================>] 80,093,452  6.89M/s   in 8.9s    
 
 2013-10-21 14:41:12 (8.59 MB/s) - `pagecounts-20130101-000000.gz' saved [80093452/80093452]