Difference between revisions of "CSC352 Notes 2013"
m (Thiebaut moved page CSC352 Notes to CSC352 Notes 2013) |
Line 126: | Line 126: | ||
* Total size should be 2.310 TB. | * Total size should be 2.310 TB. | ||
+ | |||
+ | =Download the page statistics= | ||
+ | |||
+ | ==Links of Interest== | ||
+ | * http://stats.grok.se/ | ||
+ | * http://stats.grok.se/about | ||
+ | * http://dom.as/ | ||
+ | * http://dumps.wikimedia.org/other/pagecounts-raw/ | ||
+ | * http://dumps.wikimedia.org/other/pagecounts-raw/2013/ | ||
+ | * Started downloading all files from the above link to hadoop0:/media/dominique/3TB/mediawiki/statistics/ | ||
+ | * wgetStats.sh | ||
+ | #!/bin/bash | ||
+ | wget http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-01/pagecounts-20130101-000000.gz | ||
+ | wget http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-01/pagecounts-20130101-010000.gz | ||
+ | wget http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-01/pagecounts-20130101-020001.gz | ||
+ | ... | ||
+ | wget http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-01/projectcounts-20130131-210000 | ||
+ | wget http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-01/projectcounts-20130131-220000 | ||
+ | wget http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-01/projectcounts-20130131-230000 | ||
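The file names in wgetStats.sh do not follow a fully regular pattern (note the 020001 timestamp), so generating them in a loop is unreliable. A minimal sketch of an alternative is to read the month's index page and fetch whatever it links to. This assumes the index page lists each file with a relative href="..." attribute; the function names month_url, extract_files, and download_month are hypothetical and not part of the original script.

```shell
#!/bin/bash
# Sketch only: fetch every pagecounts/projectcounts file listed for one month.
# Assumption: the index page links each file with a relative href="..." attribute.

BASE_URL="http://dumps.wikimedia.org/other/pagecounts-raw"

# Print the directory URL for a given year and month (e.g. 2013 01).
month_url() {
    printf '%s/%s/%s-%s/\n' "$BASE_URL" "$1" "$1" "$2"
}

# Read index HTML on stdin; print the unique pagecounts/projectcounts
# file names found in href attributes, one per line.
extract_files() {
    grep -oE 'href="(pagecounts|projectcounts)-[0-9]{8}-[0-9]{6}(\.gz)?"' \
        | sed -E 's/^href="//; s/"$//' \
        | sort -u
}

# Download every file listed for the month into the current directory.
# wget -c resumes a partially downloaded file instead of starting over.
download_month() {
    url="$(month_url "$1" "$2")"
    wget -q -O - "$url" | extract_files | while read -r name; do
        wget -c "${url}${name}"
    done
}

# Example invocation (commented out so that sourcing the file downloads nothing):
# download_month 2013 01
```

Running download_month 2013 01 from the statistics directory would then stand in for the long list of wget lines, provided the index page has the assumed layout.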
---- | ---- |