<onlydft>
<br />
<center>
<font color="red">
'''See [[CSC352 2017 DT's Notes| this page]] for 2017 updated notes'''
</font>
</center>
<br />
__TOC__
<br />
=Resources 2013=

==Rocco's Presentation 10/10/13==

* libguides.smith.edu/content.php?pid=510405
* Idea:
** For the paper, start by gathering the thread: collage, packing, and parallel image processing.
** Approaches.
** Intro: what has been done in the field.
* Citation database: Web of Science.
* RefWorks and Zotero can help maintain citations.
* 5-College catalogs.
* WorldCat is the world catalog of books.
* Web of Science: provides information on references, and also shows who is publishing in the field and which institutions are publishing in a given area.
* Discover searches other databases.
* Library Guide (Albany): a super guide for libraries.
* [http://VideoLectures.net videolectures.net]

<br />
==Hadoop==
<br />
* [[CSC352_MapReduce/Hadoop_Class_Notes | DT's Class notes on Hadoop/MapReduce]]
* [http://www.umiacs.umd.edu/~jimmylin/cloud-2008-Fall/index.html Cloud Computing notes from UMD] (2008, old)
  
==On-Line==
 
* [http://cs.smith.edu/dftwiki/images/BerkeleyBootCampAug2013_DFTNotes.pdf My notes from the Berkeley Boot-Camp on-line Workshop], Aug 2013.

=Update 2015: Downloading images to Hadoop0=
<br />
* Rsync from http://ftpmirror.your.org/pub/wikimedia/images/wikipedia/en/xxx, where xxx is 0, 1, 2, ... 9, a, b, c, d, e, or f. A possible loop is sketched below.
<br />
 
=Downloading All Wikipedia Images=

* From [http://en.wikipedia.org/wiki/Wikipedia:Database_download#English-language_Wikipedia http://en.wikipedia.org/wiki/Wikipedia:Database_download#English-language_Wikipedia]

* Total size should be 2.310 TB.
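
A quick way to check the download against that figure once it finishes (the directory below is an assumption; point it at wherever the images actually land):

 # Report the total size of the downloaded image tree; it should
 # approach 2.310 TB when the download is complete.
 du -sh /media/dominique/3TB/mediawiki/images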

=Download the page statistics=

==Links of Interest==

* http://stats.grok.se/
* http://stats.grok.se/about
* http://dom.as/
* http://dumps.wikimedia.org/other/pagecounts-raw/
* http://dumps.wikimedia.org/other/pagecounts-raw/2013/
* Started downloading all the files from the link above to hadoop0:/media/dominique/3TB/mediawiki/statistics/
* wgetStats.sh (excerpt; a generator sketch follows the script):
 +
 #! /bin/bash
 wget http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-01/pagecounts-20130101-000000.gz
 wget http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-01/pagecounts-20130101-010000.gz
 wget http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-01/pagecounts-20130101-020001.gz
 ...
 wget http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-01/projectcounts-20130131-210000
 wget http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-01/projectcounts-20130131-220000
 wget http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-01/projectcounts-20130131-230000
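
Rather than hard-coding one wget per hour, the same list can be generated with a loop. A sketch for January 2013 follows; note that a few dump files carry an off-by-a-few-seconds timestamp (e.g. <code>pagecounts-20130101-020001.gz</code> above), so some generated names may need to be corrected against the directory listing:

 #! /bin/bash
 # Sketch: fetch the hourly pagecounts and projectcounts files for 2013-01.
 BASE=http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-01
 for day in $(seq -w 1 31) ; do
     for hour in $(seq -w 0 23) ; do
         wget -c $BASE/pagecounts-201301$day-${hour}0000.gz
         wget -c $BASE/projectcounts-201301$day-${hour}0000
     done
 done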
  
 
----