CSC352 Homework 4 2013
Revision as of 14:42, 21 October 2013
--D. Thiebaut (talk) 14:19, 21 October 2013 (EDT)
This homework deals with writing code in C and becoming proficient at processing text in C, which requires some pointer operations. It also contributes some important functionality to our project.
The homework is due on October 31, at 11:59 p.m. You may work in pairs on this homework, or by yourself.
Computing Image-Access Statistics
First, take a look at the information in the Class Wiki (http://cs.smith.edu/classwiki/index.php/CSC352_Project:_Image_Repository_2013) on the image repository and on how to gather important files for us.
Of interest to us are the files that contain access statistics for all pages and files in the Wikimedia projects. They are available for free download from dumps.wikimedia.org (http://dumps.wikimedia.org/). Note that Wikipedia is just one of the projects supported by Wikimedia, the foundation that hosts Wikipedia. There are many other projects. This is important because the statistics files contain access counts for many projects, not just Wikipedia.
To get a feel for what they contain, download a couple of them.
- Log in to beowulf or beowulf2 (or work on your Mac).
- Type the following commands. The first four lines are user input; the rest is the output of wget:

    cd
    mkdir temp
    cd temp
    wget http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-01/pagecounts-20130101-000000.gz
    --2013-10-21 14:41:02--  http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-01/pagecounts-20130101-000000.gz
    Resolving dumps.wikimedia.org... 208.80.152.185
    Connecting to dumps.wikimedia.org|208.80.152.185|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 80093452 (76M) [application/x-gzip]
    Saving to: `pagecounts-20130101-000000.gz'

    100% [=========================...=============================================>] 80,093,452  6.89M/s   in 8.9s

    2013-10-21 14:41:12 (8.59 MB/s) - `pagecounts-20130101-000000.gz' saved [80093452/80093452]