Difference between revisions of "CSC352 Notes 2013"

From dftwiki3
Line 44:
 
=Downloading All Wikipedia Images=
 
* [http://wikimedia.wansec.com/other/pagecounts-raw/ Page View Statistics for Wikimedia projects]
 
 +
* The main information about the dumps and the format is here: https://wikitech.wikimedia.org/wiki/Dumps/media
 +
:::''Tarballs are generated on a server provided by Your.org and made available from that mirror. The rsynced copy of the media itself and an rsynced copy of the above files (image/imagelinks/redirs info) is used as input to createmediatarballs.py to create two series of tarballs per wiki, one containing all locally uploaded media and the other containing all media uploaded to commons and used on the wiki.<br />One series of tarballs (with names looking like, e.g., enwiki-20120430-remote-media-1.tar, enwiki-20120430-remote-media-2.tar, and so on for remote media, and enwiki-20120430-local-media-1.tar, enwiki-20120430-local-media-2.tar and so on for local media), should contain all media for a given project. We bundle up the media into tarballs of 100k files per tarball for convenience of the downloader.<br />''
 +
 +
** Dumps are here: ftp://ftpmirror.your.org/pub/wikimedia/imagedumps/tarballs/fulls/
 +
** The size of all the media for 20121201 is 172 GB for the local dumps and 2.153 TB for the remote dumps, about 2.3 TB in total.
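The naming scheme and the size arithmetic above can be sketched in a few lines of Python. This is only an illustration, not the createmediatarballs.py code: the helper names (tarball_names, tarballs_needed, total_tb) are made up for these notes, the file count passed to tarballs_needed is hypothetical, and treating 1 TB as 1000 GB is an assumption (with 1024 the total still rounds to 2.3 TB).

```python
# Sketch only: helper names are hypothetical and not part of any Wikimedia tool.

def tarball_names(wiki, date, kind, count):
    """Build names following the quoted pattern <wiki>-<date>-<kind>-media-<n>.tar."""
    if kind not in ("local", "remote"):
        raise ValueError("kind must be 'local' or 'remote'")
    # Tarballs in a series are numbered starting at 1.
    return [f"{wiki}-{date}-{kind}-media-{n}.tar" for n in range(1, count + 1)]


def tarballs_needed(file_count, files_per_tarball=100_000):
    """Number of tarballs for file_count files at 100k files per tarball."""
    # Ceiling division without importing math.
    return -(-file_count // files_per_tarball)


def total_tb(local_gb, remote_tb, gb_per_tb=1000):
    """Combined size in TB; gb_per_tb=1000 is an assumption about the units."""
    return local_gb / gb_per_tb + remote_tb


if __name__ == "__main__":
    # Names in the style quoted from the dumps documentation.
    for name in tarball_names("enwiki", "20120430", "remote", 2):
        print(name)
    # 250,000 is a made-up file count, used only to show the rounding up.
    print(tarballs_needed(250_000))
    # Sizes reported above for the 20121201 dumps.
    print(f"{total_tb(172, 2.153):.1f} TB")
```

Passing "local" instead of "remote" gives the names of the local-media series; the real number of tarballs in each series has to be read from the FTP directory listing.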
 +
 
----
 
  

Revision as of 11:53, 14 August 2013


...