<onlydft>
<br />
<center>
<font color="red">
'''See [[CSC352 2017 DT's Notes| this page]] for 2017 updated notes'''
</font>
</center>
<br />
__TOC__
<br />
=Resources 2013=
==Rocco's Presentation 10/10/13==
* libguides.smith.edu/content.php?pid=510405
* Ideas for the paper:
** start gathering the thread: collage, packing, parallel image processing.
** approaches.
** intro: what has already been done in the field.
* Citation database: Web of Science.
* RefWorks & Zotero can help maintain citations.
* 5-College catalogs.
* WorldCat is the world catalog for books.
* Web of Science: gives information on references, and also shows who is publishing in the field and which institutions are publishing in a given area.
* Discover searches other databases.
* Library Guide (Albany), a super guide for libraries.
* [http://VideoLectures.net videolectures.net]

<br />
==Hadoop==
<br />
* [[CSC352_MapReduce/Hadoop_Class_Notes | DT's Class notes on Hadoop/MapReduce]]
* [http://www.umiacs.umd.edu/~jimmylin/cloud-2008-Fall/index.html Cloud Computing notes from UMD] (2008, old)
==On-Line==
* [http://cs.smith.edu/dftwiki/images/BerkeleyBootCampAug2013_DFTNotes.pdf My notes from the Berkeley Boot-Camp on-line Workshop], Aug 2013.
=Update 2015: Downloading images to Hadoop0=
<br />
* Rsync from http://ftpmirror.your.org/pub/wikimedia/images/wikipedia/en/xxx where xxx is 0, 1, 2, ... 9, a, b, c, d, e, f.
<br />
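The loop over the sixteen shards could look like the sketch below. This is only a sketch: it assumes the mirror exposes the same path over rsync as over HTTP, and the local destination directory is made up for the example.

 #! /bin/bash
 # Sketch: pull each English-Wikipedia image shard (0-9, a-f) from the mirror.
 # The rsync path and the local destination below are assumptions.
 DEST=/media/dominique/3TB/mediawiki/images
 for d in 0 1 2 3 4 5 6 7 8 9 a b c d e f ; do
     mkdir -p $DEST/$d
     rsync -av --progress \
         rsync://ftpmirror.your.org/pub/wikimedia/images/wikipedia/en/$d/ \
         $DEST/$d/
 done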
=Downloading All Wikipedia Images=
* From [http://en.wikipedia.org/wiki/Wikipedia:Database_download#English-language_Wikipedia http://en.wikipedia.org/wiki/Wikipedia:Database_download#English-language_Wikipedia]
* Total size should be 2.310 TB.
=Download the page statistics=

==Links of Interest==
* http://stats.grok.se/
* http://stats.grok.se/about
* http://dom.as/
* http://dumps.wikimedia.org/other/pagecounts-raw/
* http://dumps.wikimedia.org/other/pagecounts-raw/2013/
* Started downloading all the files from the link above to hadoop0:/media/dominique/3TB/mediawiki/statistics/
* wgetStats.sh:
 #! /bin/bash
 # Fetch the hourly pagecounts and projectcounts files for January 2013,
 # one wget call per file (the full list is elided here).
 wget http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-01/pagecounts-20130101-000000.gz
 wget http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-01/pagecounts-20130101-010000.gz
 wget http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-01/pagecounts-20130101-020001.gz
 ...
 wget http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-01/projectcounts-20130131-210000
 wget http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-01/projectcounts-20130131-220000
 wget http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-01/projectcounts-20130131-230000
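Rather than listing every hour by hand, the same URLs could be generated by a loop. A minimal sketch, assuming the regular pagecounts-YYYYMMDD-HH0000.gz naming; a few files drift by a second or two (e.g. the 020001 file above), and the projectcounts files follow the same pattern without the .gz extension, so a robust version would work from the directory index instead.

 #! /bin/bash
 # Sketch: loop over every day and hour of January 2013 and fetch the
 # corresponding hourly pagecounts file; -c resumes interrupted downloads.
 BASE=http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-01
 for day in $(seq -w 1 31); do
     for hour in $(seq -w 0 23); do
         wget -c $BASE/pagecounts-201301${day}-${hour}0000.gz
     done
 done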
----