Difference between revisions of "CSC352 Notes 2013"
Line 43: | Line 43: | ||
=Downloading All Wikipedia Images= | =Downloading All Wikipedia Images= | ||
− | * From http://en.wikipedia.org/wiki/Wikipedia:Database_download#English-language_Wikipedia | + | * From [http://en.wikipedia.org/wiki/Wikipedia:Database_download#English-language_Wikipedia http://en.wikipedia.org/wiki/Wikipedia:Database_download#English-language_Wikipedia] |
:::''Where are images and uploaded files<br /><br />Images and other uploaded media are available from mirrors in addition to being served directly from Wikimedia servers. Bulk download is currently (as of September 2012) available from mirrors but not offered directly from Wikimedia servers. See the list of current mirrors.<br /><br />Unlike most article text, images are not necessarily licensed under the GFDL & CC-BY-SA-3.0. They may be under one of many free licenses, in the public domain, believed to be fair use, or even copyright infringements (which should be deleted). In particular, use of fair use images outside the context of Wikipedia or similar works may be illegal. Images under most licenses require a credit, and possibly other attached copyright information. This information is included in image description pages, which are part of the text dumps available from dumps.wikimedia.org. In conclusion, download these images at your own risk (Legal)'' | :::''Where are images and uploaded files<br /><br />Images and other uploaded media are available from mirrors in addition to being served directly from Wikimedia servers. Bulk download is currently (as of September 2012) available from mirrors but not offered directly from Wikimedia servers. See the list of current mirrors.<br /><br />Unlike most article text, images are not necessarily licensed under the GFDL & CC-BY-SA-3.0. They may be under one of many free licenses, in the public domain, believed to be fair use, or even copyright infringements (which should be deleted). In particular, use of fair use images outside the context of Wikipedia or similar works may be illegal. Images under most licenses require a credit, and possibly other attached copyright information. This information is included in image description pages, which are part of the text dumps available from dumps.wikimedia.org. In conclusion, download these images at your own risk (Legal)'' | ||
− | * [http://wikimedia.wansec.com/other/pagecounts-raw/ Page View Statistics for Wikimedia projects] | + | * [http://wikimedia.wansec.com/other/pagecounts-raw/ Page View Statistics for Wikimedia projects] at |
− | * The main information about the dumps and the format is here: https://wikitech.wikimedia.org/wiki/Dumps/media | + | wikimedia.wansec.com/other/pagecounts-raw/ |
+ | * The main information about the dumps and the format is here: [https://wikitech.wikimedia.org/wiki/Dumps/media https://wikitech.wikimedia.org/wiki/Dumps/media] | ||
:::''Tarballs are generated on a server provided by Your.org and made available from that mirror. The rsynced copy of the media itself and an rsynced copy of the above files (image/imagelinks/redirs info) is used as input to createmediatarballs.py to create two series of tarballs per wiki, one containing all locally uploaded media and the other containing all media uploaded to commons and used on the wiki.<br />One series of tarballs (with names looking like, e.g., enwiki-20120430-remote-media-1.tar, enwiki-20120430-remote-media-2.tar, and so on for remote media, and enwiki-20120430-local-media-1.tar, enwiki-20120430-local-media-2.tar and so on for local media), should contain all media for a given project. We bundle up the media into tarballs of 100k files per tarball for convenience of the downloader.<br />'' | :::''Tarballs are generated on a server provided by Your.org and made available from that mirror. The rsynced copy of the media itself and an rsynced copy of the above files (image/imagelinks/redirs info) is used as input to createmediatarballs.py to create two series of tarballs per wiki, one containing all locally uploaded media and the other containing all media uploaded to commons and used on the wiki.<br />One series of tarballs (with names looking like, e.g., enwiki-20120430-remote-media-1.tar, enwiki-20120430-remote-media-2.tar, and so on for remote media, and enwiki-20120430-local-media-1.tar, enwiki-20120430-local-media-2.tar and so on for local media), should contain all media for a given project. We bundle up the media into tarballs of 100k files per tarball for convenience of the downloader.<br />'' | ||
− | ** Dumps are here: ftp://ftpmirror.your.org/pub/wikimedia/imagedumps/tarballs/fulls/ | + | ** Dumps are here: [ftp://ftpmirror.your.org/pub/wikimedia/imagedumps/tarballs/fulls/ ftp://ftpmirror.your.org/pub/wikimedia/imagedumps/tarballs/fulls/] |
** The size of all the all the media media for 20121201 is 172 GB for the local dumps, and 2.153 TB for the remote dumps. Total = 2.3 TB. | ** The size of all the all the media media for 20121201 is 172 GB for the local dumps, and 2.153 TB for the remote dumps. Total = 2.3 TB. | ||