Difference between revisions of "Data Visualization"

 
http://www.charlierose.com/shows/2008/05/07/1/design-and-the-elastic-mind
 
  
== Final Version ==
 
 
<small>May 3rd, 2008 by Allie</small>
 
 
So the final version of this project is ''almost'' as interactive as it is meant to be. Due to some unforeseen complications I wasn't able to implement the functionality that would allow a user to submit the name of a Wikipedia page and then have the data about that page displayed. Currently the visualization statically displays the revision information about the page titled "Diebold." The information about the revisions on the Diebold page is retrieved from a MySQL database and displayed.
 
 
[[Image:viewwikiedits.png]]
 
 
The source code is linked at the bottom of the page.
 
 
[http://maven.smith.edu/~thiebaut/IS_blog/abellew/FinalVersion Final Version]
 
 
== Progress on 4/22! To do for last meeting 4/29... ==


<small>April 22nd, 2008 by admin</small>


A good reference for counting quantities in MySQL databases can be found here: [http://dev.mysql.com/doc/refman/5.0/en/counting-rows.html http://dev.mysql.com/doc/refman/5.0/en/counting-rows.html]

Here are some queries that can be useful for getting information about Wikipedia pages and their contributors:
 
 
* get the page Id of a page with a particular title:

 select pageId, title from pages where title like 'Maria Callas';
 select pageId, title from pages where title like 'king kong%';

* Use the % sign sparingly, otherwise it will match a lot of stuff we may not be interested in.

* get all the contributors that edited a page with a given ''pageId'':

 select Id, revisionId, pageId, contributorId, comment
 from revisions where pageId = 10;

* get the number of revisions made by each contributor on a given page with Id ''pageId'' (here 1000):

 select `contributorId`, count(*) from `revisions`
 where `pageId`=1000 group by `contributorId`;
 
* get the number of revisions made by each contributor on the page with title "''Maria Callas''":

 select contributorId, count(*) from revisions
 where pageId = ( select pageId from pages where title like 'Maria Callas' limit 1 )
 group by contributorId;

Note that the '''''limit 1''''' forces the subquery to return only one pageId. If one wants to catch all the contributors to all the pages whose titles start with ''Maria Callas'', then we can try something like this:

 select contributorId, count(*) from revisions
 where pageId in ( select pageId from pages where title like 'Maria Callas%' )
 group by contributorId;

Notice the introduction of the keyword '''in''' and the '''%-sign''', and the removal of ''limit 1''.
 
* '''Be careful, though, this query took several minutes to execute on a 3 GHz Pentium server!''' This is because there are many pages (20+) with ''Maria Callas'' in their title, and for each one we get a list of contributors, then merge all the data together... But it would take longer to do this in PHP or Processing, so it pays to make the MySQL server do the work. (One way to make the work go faster is to cleverly index the database... I'll check on whether the indexing of the data can be improved at some point...)
 
* Finally, we can ''sort the results'' by count, so that the most prolific contributors are listed first. This way we can pick only the top ''N'' contributors, or only those who contributed more than ''R'' revisions.
 
 select contributorId, count(*) as theCount from revisions
 where pageId = ( select pageId from pages where title like 'Maria Callas' limit 1 )
 group by contributorId order by theCount desc;
 
Note that we give a temporary name to the result of count(*), ''theCount'', so that we can specify what to sort the returned result on.
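
The sorted query above can be tried end to end without the MySQL server. What follows is only a sketch: it uses Python's built-in sqlite3 module and a handful of made-up rows as a stand-in for the real database. The table and column names follow the queries above; the function names, the index name and the sample data are assumptions made for illustration.

```python
import sqlite3


def make_sample_db():
    """Build a tiny in-memory database (made-up rows, not real Wikipedia data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE pages (pageId INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE revisions (
            Id INTEGER PRIMARY KEY,
            revisionId INTEGER,
            pageId INTEGER,
            contributorId INTEGER,
            comment TEXT
        );
        -- An index on pageId is one way to speed up per-page lookups.
        CREATE INDEX idx_revisions_page ON revisions (pageId);
        """
    )
    conn.execute("INSERT INTO pages VALUES (10, 'Maria Callas')")
    rows = [
        (1, 101, 10, 7, "typo"),
        (2, 102, 10, 7, "dates"),
        (3, 103, 10, 9, "photo"),
        (4, 104, 10, 7, "links"),
        (5, 105, 10, 3, "revert"),
        (6, 106, 10, 9, "cleanup"),
    ]
    conn.executemany("INSERT INTO revisions VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def top_contributors(conn, title, n):
    """Return up to n (contributorId, count) pairs, most prolific first."""
    query = """
        SELECT contributorId, COUNT(*) AS theCount
        FROM revisions
        WHERE pageId = (SELECT pageId FROM pages WHERE title LIKE ? LIMIT 1)
        GROUP BY contributorId
        ORDER BY theCount DESC, contributorId ASC
        LIMIT ?
    """
    return conn.execute(query, (title, n)).fetchall()
```

Calling top_contributors(make_sample_db(), "Maria Callas", 2) keeps only the two most active contributors in the sample. The title is passed as a query parameter rather than pasted into the SQL string, which matters once the title comes from a web form.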
 
 
== To Do for next week: ==
 
 
# Start from a string containing the name of a wikipage, say "Maria Callas", and create a graph with the rectangle in the middle showing the page name, and circles exploding around showing the contributors. Put the contributor Id in the circle. Make the size of the circle or the size of the link proportional to the number of contributions.
 
# Create a table on the side of the graph with statistics about the page. For example, the page title, the total number of revisions, the total number of contributors, and maybe the top contributor.
 
# Create a mouse-over or a mouse-click event that will display in the status box information about what the mouse is pointing to.
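
The geometry behind items 1 and 2 can be worked out before any drawing code is written. The sketch below is in Python rather than Processing, and every name and default value in it (star_layout, page_stats, the 500 x 500 window implied by the center point, the radius limits) is an assumption chosen for illustration. It places one circle per contributor on a ring around the page and lets the radius grow linearly with the number of revisions, starting from a minimum size so that small contributors stay visible.

```python
import math


def star_layout(counts, center=(250.0, 250.0), ring_radius=180.0,
                min_r=10.0, max_r=40.0):
    """Place one circle per contributor on a ring around the page rectangle.

    counts maps contributorId -> number of revisions. Returns a list of
    (contributorId, x, y, radius) tuples, most active contributor first.
    """
    if not counts:
        return []
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    biggest = max(ordered[0][1], 1)
    layout = []
    for i, (cid, n) in enumerate(ordered):
        angle = 2.0 * math.pi * i / len(ordered)
        x = center[0] + ring_radius * math.cos(angle)
        y = center[1] + ring_radius * math.sin(angle)
        # Radius grows linearly with the revision count, up to max_r.
        radius = min_r + (max_r - min_r) * n / biggest
        layout.append((cid, x, y, radius))
    return layout


def page_stats(title, counts):
    """Collect the numbers for the statistics table next to the graph."""
    top = max(counts, key=counts.get) if counts else None
    return {
        "title": title,
        "revisions": sum(counts.values()),
        "contributors": len(counts),
        "top": top,
    }
```

The counts dictionary is exactly what the "group by contributorId" queries return, so the same data can feed both the graph and the table.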
 
 
== To Do for 4/22/08 ==
 
 
<small>April 21st, 2008 by admin</small>
 
 
I should have posted it last week, but working from memory, this is what I remember us agreeing upon.
 
 
* We want something that may not have the bells and whistles, but that can grab information (a Wikipedia page and its contributors) from a MySQL database
 
* Display the page at the center of a Processing graph
 
* Display the contributors as circles around the page
 
* Show a measure of the amount of contribution from a contributor to the page (number of lines of edits, for example, or number of times contributor modified the page)
 
* Have some labeling system so that we can find out what the title of the page is, and who the contributors are. It might be too confusing to have the names inside the circles, so an alternative could be to have numbers in each circle and a table on the side indicating what name is associated with each number.
 
* Have a clickable map, so that clicking on a contributor could trigger some action such as going to the database and fetching more information, such as all the pages that have been contributed to by this person.
 
  
 
==Tree-maps: another interesting visual display of information ==
 
 
[[Image:harddisk_treemap.png]]
 
  
== Development Paused for the Week ==
 
 
<small>April 7th, 2008 by Allie</small>
 
 
So while looking at my calendar for the week I realized that this weekend is Collaborations! So instead of doing development I made the poster for Collaborations and had the opportunity to reflect on the different aspects of the project. More and more I&#8217;m surprised that there isn&#8217;t a large presence of research being done on the subject, especially by Google.
 
 
So here is the current poster, which is in need of editing before it goes to the printer.
 
 
[http://maven.smith.edu/~abellew/4.8.2008/wikipediaPoster.pdf Collaborations Poster]
 
  
 
==  Interesting word chart ==
 
  
 
[[Image:obamaclinton.png]]
 
 
== New version of exploding circles ==
 
 
<small>April 1st, 2008 by admin</small>
 
 
Just finished going over the code with Allie, and we got a nice smooth display.
 
 
Here&#8217;s the [http://cs.smith.edu/~thiebaut/IS_blog/abellew/05.01.2008 link]
 
 
Ideas for what to implement for next week:
 
 
# Make the dimensions of the display constant and make everything else depend on them. We might want to keep in mind that with some graphs, there might be so many circles that scaling might become important. Check the Processing documentation for how scaling can be done. (Scaling means that the window has a geometry of, say, 500 x 500, but that we are actually using a mathematical world that might be 1000×1000)
 
# Put all the properties of the circles in arrays. These arrays eventually will be filled by a query to the database. But right now we might want to have an array of strings which will be shown inside the circles, and an array of numbers defining the connectivity of the circle (wikipedia contributor) to the center square (wikipedia page). You might want to use this number to define the color of the circles, the size of the circles, or the width of the edges (or a combination of them).
 
# Look at ways to make the circles or the square clickable, so that clicking on one will cause the browser to present new information.
 
# Look at ways to show information on mouse-over events. If the mouse moves over a circle, it would be nice to have a box give more information about this circle.
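
Items 1, 3 and 4 are connected: once the drawing lives in a mathematical world larger than the window, the mouse position has to be converted back to world units before we can tell which circle it is over. Here is a hedged sketch of that logic in Python, using the 500 x 500 window and 1000 x 1000 world from the example above. The function names and the (label, x, y, radius) tuple layout are assumptions, not existing code from the applet.

```python
import math


def window_to_world(px, py, window=(500, 500), world=(1000.0, 1000.0)):
    """Convert a mouse position in window pixels to world coordinates."""
    return px * world[0] / window[0], py * world[1] / window[1]


def world_to_window(wx, wy, window=(500, 500), world=(1000.0, 1000.0)):
    """Convert world coordinates to window pixels for drawing."""
    return wx * window[0] / world[0], wy * window[1] / world[1]


def circle_under_mouse(px, py, circles, window=(500, 500),
                       world=(1000.0, 1000.0)):
    """Return the label of the topmost circle containing the mouse, or None.

    circles is a list of (label, x, y, radius) in world units, drawn in
    list order, so later entries are on top.
    """
    wx, wy = window_to_world(px, py, window, world)
    for label, x, y, radius in reversed(circles):
        if math.hypot(wx - x, wy - y) <= radius:
            return label
    return None
```

The same test serves both events: call it on every mouse move to decide what the information box shows, and on a click to decide which page to load.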
 
 
==  Geometry is complicated&#8230;  ==
 
 
<small>March 31st, 2008 by Allie</small>
 
 
Here are my three latest trials. I was able to implement the circles having text on them, but for some reason the text would not show within the web browser. It has to do with how Processing renders text and I'm not sure how to get around it.
 
 
4 Circles around a Square: [http://www.cs.smith.edu/~abellew/4.1.2008/4/ Link 1]
 
 
12 Circles around a Square: [http://www.cs.smith.edu/~abellew/4.1.2008/12/ Link 2]
 
 
24 Circles around a Square: [http://www.cs.smith.edu/~abellew/4.1.2008/24/ Link 3]
 
  
 
==  Some Graphviz Examples  ==
 
  
 
[[Image:graphviz_flickr.jpg]]
 
 
==  To do for 4/1/08  ==
 
 
<small>March 25th, 2008 by admin</small>
 
 
Good job on today&#8217;s applets.
 
 
For next week, here is what we are shooting for:
 
 
# square in the middle
 
# 10 circles distributed on a wheel around the center square. They may start all overlapping over the middle square, and quickly move out on the spokes of the wheel to settle at safe distances from the center square and from each other
 
# the circles have random dimensions
 
# the circles are connected to the center square by edges
 
# for speed, we may want to redraw the circles in gray before redrawing them in white to prevent filling the whole background every time.
 
 
==  Fun with Applets ==
 
 
<small>March 24th, 2008 by Allie</small>
 
 
So spring break is over, my flu isn't yet, but I was still able to get all our goals accomplished! There are two examples of applets. The first is with static, overlapping circles and the second is with moving circles. An unexpected side effect of moving the circles to avoid collisions is that the old circles aren't erased; a new one is just drawn on top for every frame. I'm not sure how to fix that just yet. Also, after reading the collision avoidance section I decided to write all my own code. Since I'm currently taking about 120mg of Sudafed, there is something wrong with my algorithm that makes the circles lock in perpetual motion if they sit right on top of each other. It's probably an easy fix though.
 
 
I would have gone ahead with drawing lines and labels but I wanted to learn how to un-draw the objects first before adding even more visual clutter to the window.
 
 
First trial with multiple circles: http://www.cs.smith.edu/~abellew/multipleCircles/
 
 
Second trial with moving circles: http://www.cs.smith.edu/~abellew/movingCircles/
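

For reference, here is one possible way to push two overlapping circles apart, written as a Python sketch rather than as the applet's actual code. It is not the algorithm used in the trials above. The function names, the list layout [x, y, radius] and the step size are assumptions. The case of two circles sitting exactly on top of each other is handled by picking an arbitrary direction; whether that is what causes the lock-up in the applet is only a guess.

```python
import math


def separate(c1, c2, step=1.0):
    """Move two circles one step apart if they overlap.

    Each circle is a mutable list [x, y, radius]. Returns True if the
    circles overlapped at the start of the call, False otherwise.
    """
    dx, dy = c2[0] - c1[0], c2[1] - c1[1]
    dist = math.hypot(dx, dy)
    if dist >= c1[2] + c2[2]:
        return False
    if dist == 0.0:
        # Coincident centers give no direction to push along, so pick one.
        dx, dy, dist = 1.0, 0.0, 1.0
    ux, uy = dx / dist, dy / dist
    # Each circle moves half a step, in opposite directions.
    c1[0] -= ux * step / 2
    c1[1] -= uy * step / 2
    c2[0] += ux * step / 2
    c2[1] += uy * step / 2
    return True


def settle(c1, c2, step=1.0, max_frames=10000):
    """Call separate() once per frame until the circles stop overlapping."""
    frames = 0
    while frames < max_frames and separate(c1, c2, step):
        frames += 1
    return frames
```

In a Processing sketch the body of separate() would run once per draw() call, with background() called first so that the previous frame is erased.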
 
 
== To do for 3/25/08  ==
 
 
<small>March 11th, 2008 by admin</small>
 
 
1) study if we can put two circles (ellipses) on a plane, overlapping, and have them move away from each other until they do not overlap. Investigate how much programming is involved, and whether the movement can be automatically controlled by Processing
 
 
2) Create a &#8220;daisy&#8221; diagram of a few nodes, with one node in the center of the star, and several nodes around. Each node should have a label, and the nodes are connected to the center node with links/edges of varying width.
 
 
3) Figure out how to generate an applet from a Processing program.
 
 
==  A Processing Program  ==
 
 
<small>March 11th, 2008 by admin</small>
 
<code><pre>
// ellipses on springs
int ellipses = 5;
float[] x = new float[ellipses];
float[] y = new float[ellipses];
float[] w = new float[ellipses];
float[] h = new float[ellipses];
float[] angle = new float[ellipses];
float[] frequency = new float[ellipses];
float[] amplitude = new float[ellipses];
float[] strokeWt = new float[ellipses];
float[] damping = new float[ellipses];
int springSegments = 24;
int springWidth = 8;

void setup() {
  size(600, 400);
  frameRate(30);
  smooth();
  fill(0);
  setSpring();
}

void draw() {
  background(255);
  for (int i=0; i<ellipses; i++) {
    createSpring(x[i], y[i], w[i], h[i], strokeWt[i]);
    noStroke();
    fill(0);
    // draw ellipses
    ellipse(x[i], y[i], 50, 50);
    // spring behavior
    y[i] = y[i]+cos(radians(angle[i]))*amplitude[i];
    angle[i]+=frequency[i];
    amplitude[i]*=damping[i];
  }
  // press the mouse to reset
  if (mousePressed) {
    setSpring();
  }
}

void setSpring() {
  for (int i=0; i<ellipses; i++) {
    // size approximates mass
    w[i] = random(20, 70);
    h[i] = w[i];
    // stroke weight approximates
    // spring strength (resistance)
    strokeWt[i] = random(1, 4);
    x[i] = ((width/(ellipses+1))*i)+width/(ellipses+1)-w[i]/2.0;
    y[i] = (w[i]*3)/strokeWt[i];
    angle[i] = 0;
    // spring speed
    frequency[i] = strokeWt[i]*4;
    // amplitude based on mass/spring strength
    amplitude[i] = (w[i]*1.5)/strokeWt[i];
    // calculate damping based on stroke weight
    // simulates resistance of spring thickness
    switch(round(strokeWt[i])) {
      case 1:
        damping[i] = .99;
        break;
      case 2:
        damping[i] = .98;
        break;
      case 3:
        damping[i] = .97;
        break;
      case 4:
        damping[i] = .96;
        break;
    }
  }
}

// plot spring as a zigzag of short line segments
void createSpring(float x, float y, float w, float h, float strokeWt) {
  stroke(50);
  strokeWeight(strokeWt);
  for (int i=0; i<springSegments; i++) {
    // for spring end segment: come back to the center line
    if (i==springSegments-1) {
      line(x+w/2+springWidth, (y/springSegments)*i, x+w/2, (y/springSegments)*(i+1));
    }
    else {
      // alternate spring bend left/right
      if (i%2==0) {
        // even segment: from the left edge to the right edge
        line(x+w/2-springWidth, (y/springSegments)*i, x+w/2+springWidth, (y/springSegments)*(i+1));
      }
      else {
        // odd segment: from the right edge back to the left edge
        line(x+w/2+springWidth, (y/springSegments)*i, x+w/2-springWidth, (y/springSegments)*(i+1));
      }
    }
  }
}
</pre></code>
 
 
The resulting applet: [processing/index.html applet]
 
 
==  To do list for 3/11/08  ==
 
 
<small>March 4th, 2008 by admin</small>
 
 
We are following the Processing path.
 
 
From Allie&#8217;s exploration of Processing, it seems that we can represent a star graph in 3-D with springs linking the outside nodes to the node at the center of the star, and use non-collision attributes of the nodes to make sure they do not overlap in space. It seems that using a &#8220;force field&#8221; around the nodes would force them to be some distance away from each other in a pleasing way.
 
 
The different ideas we discussed:
 
 
* on a mouse-over event over a node, a box opens up with information about the node visited, and a link that can bring up a new page, or a new graph
 
* we can use the mouse to &#8216;move&#8217; the graph around and see what is &#8220;behind&#8221;
 
* we could have a series of checkboxes, or text input boxes that would allow for interesting filtering of the data:
 
** We can block all the contributors belonging to the same IP group together (all the contributors working at MS, for example), in one big node
 
** we can color-tag all the contributors that have a particular status: working at a given company, having contributed in the last week/month, having contributed to other pages
 
* We could also organize the nodes on some kind of geodesic space around the center node
 
 
==  Processing + PHP  ==
 
 
<small>February 25th, 2008 by Allie</small>
 
 
So I did a search for &#8220;php&#8221; within the http://processing.org domain and received a number of interesting results, the ''most'' interesting being this forum post about a trick to using php requests for MySQL data.
 
 
[http://processing.org/discourse/yabb_beta/YaBB.cgi?board=Integrate;action=display;num=1133759630 forum post 1]
 
 
This post is also interesting (notably Reply #6):
 
 
[http://processing.org/discourse/yabb/YaBB.cgi?board=Integrate;action=display;num=1106908696 forum post 2]
 
 
I think Processing is just the visual framework that will work for this project, and now that I know it's possible to connect it to MySQL through PHP, we can move ahead! This is very exciting indeed, and since it's already being used in the department it's a great tool to perpetuate.
 
  
 
==  Showing the time variation of various quantities ==
 
 
Check http://www.processing.org for more info and examples.
 
  
==  Something new to explore!  ==
 
 
<small>February 21st, 2008 by admin</small>
 
 
Just attended a talk today by Ben Fry in the Art Department. Super stuff. Wish you had seen it, Allie.
 
 
Ben is the co-author of a language called "processing" (http://processing.org). He showed some very interesting animations and 3-D graphs that seem perfect for what we want to do. The graphics look pretty spectacular. I am just not sure how much coding is behind all the examples.
 
 
I would like to change the &#8220;todo&#8221; list for this coming Tuesday and have you explore Processing instead. By the way, Processing is the language used in CSC106, taught by Eitan and Thomas, so several seniors including Jordan and Stephanie might be good people to brainstorm with&#8230;
 
 
Happy exploring!!!
 
 
==  To do for 2/26/08  ==
 
 
<small>February 19th, 2008 by admin</small>
 
 
# Modify the current display page and remove the frames
 
# Explore how to get better interaction from the SVG file when we click on nodes
 
# Aim for our next goal: a form with a box at the top where we can enter keywords (Hillary Clinton, say), and a submit button. Clicking on the button triggers a php program that generates a star graph with Hillary Clinton in the middle and the 10, 20, N most active contributors all around. Somehow we need to convey the scale of the number of edits. Probably a scale on the side (which we won&#8217;t worry about making user-modifiable right now), and links of varying width or color (or both) linking the contributors to the center node. The number 10, 20 or N might also be numbers in a text-entry box of the form.
 
# Clicking on a contributor node will bring up a new graph with this contributor in the center of a star, and the top 10, 20, N pages it has contributed to all around. We&#8217;ll probably want to see the &#8220;Hillary Clinton&#8221; page as part of these 10, 20, N pages, even if its ranking does not place it in this group.
 
# Explore stored procedures under MySQL 5.0... Also, do not hesitate to make MySQL do the dirty work, for example counting how many times contributor ''N'' edited page ''P'':
 
 
 
 
select '''count(''' `contributorId` ''')''' from `tablex`
 
where `contributorId`=''N'' and `pageId`=''P''
 
 
== How to generate PNG and SVG images of graphs in Php using Neato or Dot  ==
 
 
<small>February 19th, 2008 by admin</small>
 
 
This does generate PNG and SVG files for directed and undirected graphs.
 
 
First, you must log in to tango.csc.smith.edu, as Graphviz is only installed on tango, and not on beowulf. Use your regular Linux login information to connect.


Next, cd to '''/var/www/html/abellew/''' to make it your working directory, and create the two subdirectories specified below.


Create a subdirectory in your working directory called '''Image'''. Make sure it is world-readable. This directory will hold a PHP file containing a PHP class that takes care of generating png images from various GraphViz commands.
 
 
Create another subdirectory called '''images''' that is world readable '''and <u>writable</u>'''. This is where the Php class will store the png and svg image files.
 
 
Copy the file [http://tango.csc.smith.edu/thiebaut/Image/GraphViz.txt GraphViz.php] into the Image subdirectory. It&#8217;s the Php class we&#8217;re going to use. Make sure it is world-readable.
 
 
Create a [http://tango.csc.smith.edu/thiebaut/graphviz.php test file] called '''graphviztest.php''' in your working directory (of which Image and images are subdirectories). Make sure this file is world-readable as well.
 
 
Point your browser to this new address:
 
 
      http://tango.csc.smith.edu/abellew/graphviztest.php
 
 
and verify that you get a page with two pictures of the same graph. The top image is in png, the bottom one in svg. Verify that you can click on the nodes of the svg image and follow links (although they have the bad problem of opening up in the embedded frame, not the whole browser window...) [[Image:icon_sad.gif]]
 
 
==  Example Php program to access wikipedia history  ==
 
 
<small>February 19th, 2008 by admin</small>
 
 
The following [[IS_getWikiInfo | page]] contains PHP code to retrieve page Ids from the Wikipedia history database, along with the contributors to a page with a given Id. (The page will take several seconds to load, as the query searches for keywords in the 11 million pages in the database.)


Note that the current version retrieves only the contributors for one page, but with little effort we can change the query to retrieve the contributors to a list of several page Ids.
 
 
In order to run the program you must have a copy of the accessvars.php file, shown below:
 
 
 <?php
 //--------------------------------------------------------------
 // MySql variables
 //--------------------------------------------------------------
 $params = array( 'host'     => "tango.csc.smith.edu",
                  'database' => "enwikihistory2",
                  'table'    => "pages",
                  'user'     => "yourmysqlloginname",
                  'passwd'   => "yourmysqlpassword" );
 ?>
 
 
==  An Interactive Page  ==
 
 
<small>February 19th, 2008 by Allie</small>
 
[[Image:SvgExample.svg | left]]
 
 
This week I was able to make a new page with the three specified frames showing the title of the wiki page in the top, the svg in the middle and potential contributor information in the bottom. I was able to give the nodes the appropriate links except that because of the frame structure the links target the frame in which they are clicked instead of a different frame or new window.
 
 
I thought about the different ways to create a scale and these were my thoughts:
 
 
* There are two ways to go about the scale. The first way is including the scale in the svg and the other is to have the scale live in another part of the webpage.
 
* If the scale should be part of the svg I&#8217;m not sure how to implement it using the DOT language.
 
* If the scale should be in another part of the webpage then what kind of control should it be? Also, how is it going to communicate with the page creating/running the svg, via PHP, JavaScript or maybe something else?
 
 
So I didn't complete the scale part of this week's tasks because I bogged myself down with the greater end goals. Also, getting the correct scale requires a complex SQL SELECT statement and some post-processing. I looked up some static information in the database and wrote the topmost frame to look up some revision information similar to what will be needed for the scale. So while I didn't get a scale, this is a foundation for what will need to be decided/done in the future.
 
 
Also in order to keep as many versions of this project as might be helpful I am storing each week&#8217;s work in a separate folder denoted by month.day.year so this week&#8217;s new work (any pages/code that I edited) is stored here (click on &#8220;viz.html&#8221;):
 
http://www.cs.smith.edu/~abellew/2.18.2008
 
  
 
==  Another network navigation site  ==
 
  
 
Can be found here: [http://abeautifulwww.com/2007/02/10/35-great-visualizations/ abeautifulwww.com]
 
 
 
 
==  To do for 2/19/08 ==
 
 
<small>February 12th, 2008 by admin</small>
 
 
# Make SVG '''clickable '''with nodes pointing to other pages
 
# Put SVG in a '''system of 3 frames''', with middle frame showing the svg graph, top slim frame showing a title, and bottom slim frame showing an image from the wikipage (static image for right now). When user clicks on a node, information about this node shows up in bottom frame
 
# Question: is SVG the only format that allows for interaction with user (clickable)?
 
# Investigate creating a '''scale '''to show what information is represented by the link. If using color, then need a color scale. If using line width as measure of # of contributions, then show a scale with 4 or 5 thicknesses and what # they represent. Use a linear map between # of contributions and thickness for right now.
 
# Watch '''Tamara Munzner'''&#8217;s video, and explore her web site and her group&#8217;s web site
 
# Semester-long project: look for turn-key software systems for displaying graphs.
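
The linear map in item 4 is short enough to write down. The sketch below is in Python and is only an illustration: the function names, the pen widths (1 to 9) and the choice of five legend entries are assumptions, and the legend needs at least two levels.

```python
def linear_map(value, in_min, in_max, out_min, out_max):
    """Map value linearly from [in_min, in_max] to [out_min, out_max]."""
    if in_max == in_min:
        return out_min
    t = (value - in_min) / (in_max - in_min)
    return out_min + t * (out_max - out_min)


def thickness_scale(max_count, levels=5, min_width=1.0, max_width=9.0):
    """Return (count, width) pairs for a legend with `levels` entries.

    Counts are spread evenly between 1 and max_count; each width is the
    line thickness used for a link carrying that many contributions.
    """
    legend = []
    for i in range(levels):
        count = 1 + round(i * (max_count - 1) / (levels - 1))
        width = linear_map(count, 1, max_count, min_width, max_width)
        legend.append((count, width))
    return legend
```

For a page whose top contributor made 101 edits, thickness_scale(101) gives the five thicknesses to draw in the scale, and linear_map gives the width of any individual link.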
 
  
 
    
 
    
  
 
   
 
   
== Progress on project! ==
 
 
<small>February 11th, 2008 by Allie</small>
 
 
So far I have created a haphazard couple of Python objects to parse a text file with data in it. I definitely spent more time this week on function than on commenting, but I am guessing most of my work so far will be changed/adjusted as the programming language changes (from Python to PHP) and as the complexity of my task increases. Here is a link to all the files that I worked on/created:
 
 
http://maven.smith.edu/~thiebaut/IS_blog/abellew/2.18.2008/
 
 
[[Image:neatoTrial.svg]]


HTML code to embed the SVG in an HTML page:


<code><pre>
<embed src="http://www.cs.smith.edu/~abellew/2.12.2008/neatoTrial.svg"
       width="600" height="1000" type="image/svg+xml"
       pluginspage="http://www.adobe.com/svg/viewer/install/" />
</pre></code>
 
 
==  Graph Visualization Specifics  ==
 
 
<small>February 11th, 2008 by Allie</small>
 
 
The graph visualizer in Silverlight supports physics algorithms that display the graph in the most rigid fashion, but we are not interested in rigidity. We want the best way to represent the information visually for extrapolation, and Graphviz is the best option for accomplishing that goal.


For the actual graph: in order to quickly and easily identify which Wikipedia users contributed most to an article, I would like to use a color bar from blue (less contribution) to red (more contribution) on the line connecting the user to an article. Using a color scheme like this instead of varying degrees of line thickness is more intuitive for detecting levels of activity.
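

A blue-to-red bar like this comes down to interpolating between two colors. Here is a minimal Python sketch, assuming a straight linear blend in RGB and a scale that starts at one contribution. The function name and those two choices are assumptions, not something decided in the project. The "#rrggbb" strings it returns are a format Graphviz accepts for its color attribute.

```python
def contribution_color(count, max_count):
    """Return an '#rrggbb' color between blue (few edits) and red (many)."""
    if max_count <= 1:
        t = 1.0
    else:
        # Clamp the count to [1, max_count], then scale it to [0, 1].
        clamped = min(max(count, 1), max_count)
        t = (clamped - 1) / (max_count - 1)
    red = round(255 * t)
    blue = round(255 * (1 - t))
    return "#{:02x}00{:02x}".format(red, blue)
```

A contributor with a single edit gets pure blue, the top contributor gets pure red, and everyone else lands on a purple in between.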
 
  
 
==  Neat graphical representation of activity in Wikipedia ==
 
 
# Another interesting [http://services.alphaworks.ibm.com/manyeyes/view/SmAgULsOtha60LnP8mRuL2- plot] by ManyEyes
 
  
==  To do for 2/12/08  ==
 
  
<small>February 5th, 2008 by admin</small>
 
 
Create a text file with the following entries
 
 
# title of the page (this will be in a circle in the middle of the graph)
 
# the link to the page (this will be called when we click on the circle)
 
# a collection of 20 triplets (3-line blocks)
 
 
<blockquote>
 
 
# contributor name
 
# <nowiki># of contributions to the current page</nowiki>
 
# link to the contributor (this will be a link to a php page which will get the Id of the contributor)
 
 
</blockquote>
 
 
Generate from this a dot file
 
 
process the dot file to get an svg file
 
 
put the svg file on your web site
 
 
install the svg plugin for your browser
 
 
display the graph
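

The first two steps above (text file in, dot file out) could look roughly like the following Python sketch. It assumes the file layout described in the list: a title line, a link line, then 3-line blocks of name, count and link. The function names and the graph name wikipage are made up for illustration, while URL and label are standard Graphviz attributes.

```python
def parse_page_file(text):
    """Parse the text format above into (title, link, contributors)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    title, link, rest = lines[0], lines[1], lines[2:]
    if len(rest) % 3 != 0:
        raise ValueError("contributor entries must come in 3-line blocks")
    contributors = []
    for i in range(0, len(rest), 3):
        contributors.append((rest[i], int(rest[i + 1]), rest[i + 2]))
    return title, link, contributors


def quote(s):
    """Wrap a string in double quotes so it is a valid DOT identifier."""
    return '"' + s.replace('"', '\\"') + '"'


def to_dot(title, link, contributors):
    """Build an undirected star graph in the DOT language."""
    out = ["graph wikipage {"]
    out.append("  {} [shape=circle, URL={}];".format(quote(title), quote(link)))
    for name, count, url in contributors:
        # One node per contributor, linked to the page in the middle.
        out.append("  {} [URL={}];".format(quote(name), quote(url)))
        out.append("  {} -- {} [label={}];".format(
            quote(title), quote(name), quote(str(count))))
    out.append("}")
    return "\n".join(out)
```

Saving the result of to_dot() to a file such as graph.dot and running "neato -Tsvg graph.dot -o graph.svg" then produces the svg file for the web site.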
 
  
 
==  Visual Representation Options  ==
 
 
http://silverlight.net/showcase/default.aspx
 
  
==  On Borges and Wikipedia  ==
 
 
<small>January 29th, 2008 by admin</small>
 
 
<span class="italic">In 1940, Borges wrote:</span>
 
 
<blockquote>
 
 
<span class="italic">Who, singular or plural, invented Tlön? The plural is, I suppose, inevitable, since the hypothesis of a single inventor — some infinite Leibniz working in obscurity and self-effacement — has been unanimously discarded. It is conjectured that this ‘brave new world’ is the work of a secret society of astronomers, biologists, engineers, metaphysicians, poets, chemists, algebrists, moralists, painters, geometers, &#8230; guided and directed by some shadowy man of genius. There are many men adept in those diverse disciplines, but few capable of imagination — fewer still capable of subordinating imagination to a rigorous and systematic plan. The plan is so vast that the contribution of each writer is infinitesimal.</span>
 
 
</blockquote>
 
 
Not too bad a description of Wikipedia! More on this in a 01/06/08 NYT article: [http://www.nytimes.com/2008/01/06/books/06cohenintro.html?_r=2&ex=1357275600&en=60e1cdf8ddf72c63&ei=5088&partner=rssnyt&emc=rss&oref=slogin&oref=slogin Borges and the Foreseeable Future].
 
  
 
== Animation of the history of a wikipedia page ==
 
Select "Load Map"/"Popular" and pick an entry. You will see a network of connections appearing. The network shows the people that belong to different boards of companies. As you move your mouse over some of the entries you are given a menu to search or delete the item. Also very neat, you can click on an item and move it around with the mouse while retaining the existing connections.
  
I would like you to develop a '''program''' (web based) that would take data from a mysql '''database''' and display the '''graph''' of the relations existing between the data. TheyRule shows how somebody decided to do it (and very nicely at that), but other options exist.
   
 
 
The data I have is a huge collection of the edits that have been done to the English Wikipedia pages. I have a database with 3 tables, and you can access them by going to this URL: [http://maven.smith.edu/~thiebaut/wikihistory/menu.php http://cs.smith.edu/~thiebaut/wikihistory].
 
 
 
Click on the '''pages''' table. It contains the title of ALL the wikipedia pages and their Id, as generated by wikipedia.
 
 
 
Click on the '''contributors''' table. It contains the list of all the contributors who have edited a page in Wikipedia. Three different pieces of information can define a contributor: a name, an Id, or an IP address. Unfortunately, Wikipedia doesn't force contributors to enter their name (in fact, some contributors are computer programs, which do not have names), so contributors may have no information recorded for up to two of these fields.
 
 
 
Click on the '''revisions''' table. It contains all the revisions performed. Each revision (or edit) is identified by an '''Id''' number (which I generate when populating this table), a '''revisionId''' which is the Id the revision has in the wikipedia database, the Id of the page on which this revision was done '''(pageId),''' the Id of the contributor '''(contributorId)''' who made the edit (this is the same contributor field used in the contributors table), a '''comment '''indicating what the edit was about, a '''textlength''' field which indicates how many characters were in the edit, and finally the '''date''' the edit was done.
 
 
 
I would like to be able to have a web page where I could enter the Id of a few pages (for example the pages corresponding to the current presidential candidates), and have a webpage showing the graph of all the contributors to the different pages, and whether a given contributor contributed to more than just one page, similar to the way "TheyRule" works.
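


The "contributed to more than one page" part of this request is mostly a database question. The Python sketch below shows one way it might be answered, again with the built-in sqlite3 module and invented rows standing in for the real MySQL data. The revisions columns follow the description above; the function names and the sample rows are assumptions.

```python
import sqlite3


def demo_db():
    """Create an in-memory revisions table with a few made-up edits."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE revisions (Id INTEGER PRIMARY KEY, revisionId INTEGER, "
        "pageId INTEGER, contributorId INTEGER, comment TEXT, "
        "textlength INTEGER, date TEXT)"
    )
    edits = [(1, 7), (1, 7), (2, 7), (2, 9), (3, 9), (1, 5)]
    conn.executemany(
        "INSERT INTO revisions (pageId, contributorId) VALUES (?, ?)", edits
    )
    return conn


def shared_contributors(conn, page_ids):
    """Return {contributorId: sorted pageIds} for the given pages, keeping
    only the contributors who edited more than one of them."""
    if not page_ids:
        return {}
    marks = ",".join("?" for _ in page_ids)
    rows = conn.execute(
        "SELECT DISTINCT contributorId, pageId FROM revisions "
        "WHERE pageId IN ({}) ORDER BY contributorId, pageId".format(marks),
        list(page_ids),
    ).fetchall()
    edited = {}
    for cid, pid in rows:
        edited.setdefault(cid, []).append(pid)
    return {cid: pids for cid, pids in edited.items() if len(pids) > 1}
```

Each entry in the result is a contributor node with an edge to every page in its list, which is the kind of cross-link that makes the TheyRule graphs interesting.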
 
 
 
A nice collection of software tools that should probably be used is the [http://www.graphviz.org/ Graphviz] set. Go take a look at it. It generates SVG graphs, and the major browsers have plugins to visualize SVG graphs.
 
 
 
== Welcome  ==
 
 
 
<small>January 28th, 2008 by admin</small>
 
 
 
Welcome to DT&#8217;s Independent study blog.
 
 
 
<hr>
 
  
 
<i>This mediawiki page was generated by [http://www.ebruni.it/software/os/i_love_wiki/index.mpl  I love wiki], an HTML to wiki syntax converter that took the html version of the blog and translated it into wiki syntax</i>

Revision as of 09:24, 13 October 2008

Introduction

This page originated as a Wordpress Blog documenting the progress on an Independent Study by Allison Bellew in Spring 08. Allie's work is currently continued by Christine Grascia.

The collection of posts is organized from the most recent (at the top) to the oldest (at the bottom). It might make better sense, then, to start at the bottom of the page!
Additions are continuously made, documenting interesting discoveries regarding visual displays of information.
While this page is not available for anonymous edits, feel free to send comments, suggestions and/or discoveries to thiebaut@cs.smith.edu.


