CSC400 Independent Study Allie Bellew

From dftwiki3
Revision as of 09:18, 13 October 2008 by Thiebaut

This page originated as a Wordpress Blog documenting the progress on an Independent Study by Allison Bellew in Spring 08. Allie's work is currently continued by Christine Grascia. The collection of posts is organized from the most recent (at the top) to the oldest (at the bottom). It might make better sense, then, to start at the bottom of the page! Additions are continuously made, documenting interesting discoveries regarding visual displays of information. While this page is not available for anonymous edits, feel free to send comments, suggestions and/or discoveries to thiebaut@cs.smith.edu.

Final Version

May 3rd, 2008 by Allie

So the final version of this project is almost as interactive as it is meant to be. Due to some unforeseen complications I wasn’t able to implement the functionality that would allow a user to submit the name of a wikipedia page and then have the data about that page displayed. Currently the visualization statically displays the revision information about the page titled “Diebold.” The information about the revisions on the Diebold page is retrieved from a MySQL database and displayed.

Viewwikiedits.png

The source code is linked on the page at the bottom.

Final Version

Progress on 4/22! To do for last meeting 4/29…

April 22nd, 2008 by admin

A good reference for counting quantities in mysql databases can be found here:
http://dev.mysql.com/doc/refman/5.0/en/counting-rows.html

Here are some queries that can be useful for getting information about wikipedia pages and their contributors:

  • get the page Id of a page with a particular title:
select pageId, title from pages where title like 'Maria Callas';
select pageId, title from pages where title like 'king kong%';
  • Use the % sign sparingly, otherwise it will match a lot of stuff we may not be interested in.
  • get all the contributors that edited a page with a given PageId:
select Id, revisionId, pageId, contributorId, comment
from revisions where pageId = 10;
  • get the number of revisions made by each contributor on a given page with Id PageId (here 1000):
select `contributorId`, count(*) from `revisions` where
         `PageId`=1000 group by `contributorId`;
  • get the number of revisions made by each contributor on the page with title “Maria Callas“:
select contributorId, count(*) from revisions where
PageId=( select pageId from pages where title like 'Maria Callas'
limit 1 ) group by contributorId;

Note that the limit 1 forces the subquery to return only 1 page-Id. If one
wants to catch all the contributors to all the pages that start with
Maria Callas, then we can try something like this:

select contributorId, count(*) from revisions where PageId in
( select pageId from pages where title like 'Maria Callas%' )
group by contributorId;

Notice the introduction of the keyword in and the %-sign, and the removal of limit 1.

  • Be careful, though, this query took several minutes to execute on a 3 GHz Pentium server! This is because there are many pages (20+) with Maria Callas in their title, and for each one we get a list of contributors, then merge all the data together… But it would take longer to do this in Php or Processing, so it pays to make the mysql server do the work. (One way to make the work go faster is to cleverly index the database… I’ll check on whether the indexing of the data can be improved at some point…)
  • Finally, we can sort the results by count, so that the most prolific contributors are listed first. This way we can pick only the top N contributors, or only those who contributed more than R revisions.
select contributorId, count(*) as theCount from
         revisions where PageId = ( select pageId from pages where title like
         'Maria Callas' limit 1 ) group by contributorId order by theCount desc;

Note that we give a temporary name to the result of count(*), theCount, so that we can specify what to sort the returned result on.
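
The queries above can be tried out on a toy dataset without touching the full Wikipedia database. Here is a minimal sketch, assuming Python's built-in sqlite3 module in place of MySQL and a handful of made-up rows. The table and column names (pages, revisions, pageId, contributorId, title) follow the queries above; make_toy_db, top_contributors and the sample data are hypothetical and only there for illustration.

```python
import sqlite3

def make_toy_db():
    """Build an in-memory database with made-up rows (not real Wikipedia data)."""
    db = sqlite3.connect(":memory:")
    db.execute("create table pages (pageId integer, title text)")
    db.execute("create table revisions "
               "(Id integer, pageId integer, contributorId integer)")
    db.executemany("insert into pages values (?, ?)",
                   [(10, "Maria Callas"), (11, "King Kong")])
    db.executemany("insert into revisions values (?, ?, ?)",
                   [(1, 10, 7), (2, 10, 7), (3, 10, 8), (4, 10, 7),
                    (5, 10, 9), (6, 10, 9), (7, 11, 7)])
    return db

def top_contributors(db, title, n):
    """Return up to n (contributorId, count) pairs for one page, most active first."""
    query = """
        select contributorId, count(*) as theCount
        from revisions
        where pageId = (select pageId from pages where title like ? limit 1)
        group by contributorId
        order by theCount desc, contributorId
        limit ?"""
    return db.execute(query, (title, n)).fetchall()

db = make_toy_db()
print(top_contributors(db, "Maria Callas", 2))  # prints [(7, 3), (9, 2)]
```

Passing the title and N as query parameters (the ? placeholders) rather than pasting them into the query string also keeps quotes in page titles from breaking the query.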

To Do for next week:

  1. Start from a string containing the name of a wikipage, say “Maria Callas”, and create a graph with the rectangle in the middle showing the page name, and circles exploding around showing the contributors. Put the contributor Id in the circle. Make the size of the circle or the size of the link proportional to the number of contributions
  2. Create a table on the side of the graph with statistics about the page. For example, the page title, the total number of revisions, the total number of contributors, and maybe the top contributor.
  3. Create a mouse-over or a mouse-click event that will display in the status box information about what the mouse is pointing to.
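
For item 1, the geometry can be worked out independently of the drawing code. The following is a rough sketch in Python rather than Processing, under the assumption that the query results arrive as a mapping from contributor Id to revision count. The name star_layout and all of its default values (window center, spoke length, diameter range) are hypothetical choices, not something we have settled on.

```python
import math

def star_layout(counts, center=(250.0, 250.0), spoke=180.0,
                min_diameter=10.0, max_diameter=60.0):
    """Place one circle per contributor evenly around the center.

    counts maps contributorId -> number of revisions. Returns a list of
    (contributorId, x, y, diameter), with the diameter scaled linearly
    between min_diameter and max_diameter.
    """
    if not counts:
        return []
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    layout = []
    for k, (cid, c) in enumerate(sorted(counts.items())):
        angle = 2.0 * math.pi * k / len(counts)
        x = center[0] + spoke * math.cos(angle)
        y = center[1] + spoke * math.sin(angle)
        # When every contributor has the same count, use the midpoint size.
        t = (c - lo) / span if span else 0.5
        diameter = min_diameter + t * (max_diameter - min_diameter)
        layout.append((cid, x, y, diameter))
    return layout
```

Each returned tuple carries everything a drawing loop needs: the label to put in the circle, where to draw it, and how big to make it.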

To Do for 4/22/08

April 21st, 2008 by admin

I should have posted it last week, but working from memory, this is what I remember us agreeing upon.

  • We want something that may not have the bells and whistles, but that can grab information (a wikipedia page and its contributors) from a mySql database
  • Display the page at the center of a Processing graph
  • Display the contributors as circles around the page
  • Show a measure of the amount of contribution from a contributor to the page (number of lines of edits, for example, or number of times contributor modified the page)
  • Have some labeling system so that we can find out what the title of the page is, and who the contributors are. It might be too confusing to have the names inside the circles, so an alternative could be to have numbers in each circle and a table on the side indicating what name is associated with each number.
  • Have a clickable map, so that clicking on a contributor could trigger some action such as going to the database and fetching more information, such as all the pages that have been contributed to by this person.

Tree-maps: another interesting visual display of information

April 10th, 2008 by admin

From http://lifehacker.com/software/disk-space/geek-to-live–visualize-your-hard-drive-usage-219058.php

Harddisk treemap.png

Development Paused for the Week

April 7th, 2008 by Allie

So while looking at my calendar for the week I realized that this weekend is Collaborations! So instead of doing development I made the poster for Collaborations and had the opportunity to reflect on the different aspects of the project. More and more I’m surprised that there isn’t a large presence of research being done on the subject, especially by Google.

So here is the current poster, which is in need of editing before it goes to the printer.

Collaborations Poster

Interesting word chart

April 1st, 2008 by admin

http://www.neoformix.com/2008/ObamaClintonSpeechContrast.html

Interesting comparison of two speeches…

Obamaclinton.png

New version of exploding circles

April 1st, 2008 by admin

Just finished going over the code with Allie, and we got a nice smooth display.

Here’s the link

Ideas for what to implement for next week:

  1. Make the dimensions of the display constant and have everything else depend on them. We might want to keep in mind that with some graphs, there might be so many circles that scaling might become important. Check the Processing documentation for how scaling can be done. (Scaling means that the window has a geometry of, say, 500 x 500, but that we are actually using a mathematical world that might be 1000×1000)
  2. Put all the properties of the circles in arrays. These arrays eventually will be filled by a query to the database. But right now we might want to have an array of strings which will be shown inside the circles, and an array of numbers defining the connectivity of the circle (wikipedia contributor) to the center square (wikipedia page). You might want to use this number to define the color of the circles, the size of the circles, or the width of the edges (or a combination of them).
  3. Look at ways to make the circles or the square clickable, so that clicking on one will cause the browser to present new information.
  4. Look at ways to show information on mouse-over events. If the mouse moves over a circle, it would be nice to have a box give more information about this circle.
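
Items 1, 3 and 4 come down to two small calculations: mapping world coordinates to window pixels, and deciding which circle the mouse is over. Here is a minimal sketch in Python, assuming circles are stored as (label, x, y, diameter) tuples in window coordinates. The function names and the tuple layout are made up for illustration; in Processing the same test would use the mouse position that the sketch already has access to.

```python
import math

def world_to_window(x, y, world=1000.0, window=500.0):
    """Map a point in a world x world coordinate space to window pixels."""
    s = window / world
    return x * s, y * s

def circle_under_mouse(mx, my, circles):
    """Return the label of the first circle containing (mx, my), else None.

    circles is a list of (label, cx, cy, diameter) in the same
    coordinates as the mouse position.
    """
    for label, cx, cy, diameter in circles:
        if math.hypot(mx - cx, my - cy) <= diameter / 2.0:
            return label
    return None
```

The same circle_under_mouse test serves both the click handler and the mouse-over box; only the action taken with the returned label differs.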

Geometry is complicated…

March 31st, 2008 by Allie

Here are my three latest trials. I was able to implement the circles having text on them but for some reason the text would not show within the web browser. It has to do with how processing renders text and I’m not sure how to get around it.

4 Circles around a Square: Link 1

12 Circles around a Square: Link 2

24 Circles around a Square: Link 3

Some Graphviz Examples

March 30th, 2008 by admin

Just found this while looking for ways to represent the CS curriculum as a graph. I think our direction using Processing is good, and I don’t want to go back to Graphviz, but looking at ways people are using graphing packages to show relationships is interesting, no matter what package they use.

http://www.flickr.com/search/?q=graphviz&w=all&s=int

Graphviz flickr.jpg

To do for 4/1/08

March 25th, 2008 by admin

Good job on today’s applets.

For next week, here is what we are shooting for:

  1. square in the middle
  2. 10 circles distributed on a wheel around the center square. They may start all overlapping over the middle square, and quickly move out on the spokes of the wheel to settle at safe distances from the center square and from each other
  3. the circles have random dimensions
  4. the circles are connected to the center square by edges
  5. for speed, we may want to redraw the circles in gray before redrawing them in white to prevent filling the whole background every time.
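
How far out the circles have to travel on their spokes before they are "safe" can be computed ahead of time. This is a sketch in Python under simplifying assumptions of my own: the circles are evenly spaced on the wheel, the largest circle decides the spacing, and the center square is treated as its circumscribed circle. The names safe_spoke_length and step_outward and the gap parameter are hypothetical.

```python
import math

def safe_spoke_length(diameters, square_side, gap=5.0):
    """Distance from the center at which no circle overlaps another
    circle or the center square (conservative estimate)."""
    n = len(diameters)
    if n == 0:
        return 0.0
    r_max = max(diameters) / 2.0
    # Clear the square: treat it as its circumscribed circle.
    d = square_side * math.sqrt(2.0) / 2.0 + r_max + gap
    if n > 1:
        # Neighbouring centers on the wheel are 2*d*sin(pi/n) apart,
        # and that must be at least two maximum radii plus the gap.
        d = max(d, (2.0 * r_max + gap) / (2.0 * math.sin(math.pi / n)))
    return d

def step_outward(current, target, speed=4.0):
    """Advance one animation frame along the spoke without overshooting."""
    return min(current + speed, target)
```

Calling step_outward once per frame for each circle, starting from 0, gives the "start overlapping and move out" effect, and every circle stops at the same safe distance.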

Fun with Applets

March 24th, 2008 by Allie

So spring break is over, my flu isn’t yet, but I was still able to get all our goals accomplished! There are two examples of applets. The first is with static, overlapping circles and the second is with moving circles. An unexpected side effect of moving the circles to avoid collisions is that the circles aren’t redrawn; a new one is just drawn for every frame. I’m not sure how to fix that just yet. Also, after reading the collision avoidance section I just decided to write all my own code. Since I’m currently taking about 120mg of Sudafed, there is something wrong with my algorithm that makes the circles lock in perpetual motion if they sit right on top of each other. It’s probably an easy fix though.

I would have gone ahead with drawing lines and labels but I wanted to learn how to un-draw the objects first before adding even more visual clutter to the window.

First trial with multiple circles: http://www.cs.smith.edu/~abellew/multipleCircles/

Second trial with moving circles: http://www.cs.smith.edu/~abellew/movingCircles/

To do for 3/25/08

March 11th, 2008 by admin

1) study if we can put two circles (ellipses) on a plane, overlapping, and have them move away from each other until they do not overlap. Investigate how much programming is involved, and whether the movement can be automatically controlled by Processing

2) Create a “daisy” diagram of a few nodes, with one node in the center of the star, and several nodes around. Each node should have a label, and the nodes are connected to the center node with links/edges of varying width.

3) Figure out how to generate an applet from a Processing program.
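
For item 1, one possible approach is to push the two circles apart along the line joining their centers, a little on every frame, until they no longer overlap. The sketch below is in Python, not Processing, and is only one way of doing it. The names separate and settle are hypothetical, and the choice to push along the x axis when both centers coincide is an arbitrary assumption made so that the direction is always defined.

```python
import math

def separate(c1, c2, step=1.0):
    """Move two circles one step apart if they overlap.

    Each circle is an (x, y, radius) tuple. Returns the two circles,
    unchanged when they do not overlap.
    """
    x1, y1, r1 = c1
    x2, y2, r2 = c2
    dx, dy = x2 - x1, y2 - y1
    dist = math.hypot(dx, dy)
    if dist >= r1 + r2:
        return c1, c2  # no overlap, nothing to do
    if dist == 0.0:
        # Coincident centers have no direction: pick one arbitrarily.
        dx, dy, dist = 1.0, 0.0, 1.0
    ux, uy = dx / dist, dy / dist
    half = step / 2.0
    return ((x1 - ux * half, y1 - uy * half, r1),
            (x2 + ux * half, y2 + uy * half, r2))

def settle(c1, c2, step=1.0, max_steps=10000):
    """Repeat separate() until the circles stop moving (or max_steps)."""
    for _ in range(max_steps):
        n1, n2 = separate(c1, c2, step)
        if (n1, n2) == (c1, c2):
            break
        c1, c2 = n1, n2
    return c1, c2
```

In an animation, separate would be called once per frame so the motion is visible; settle is just a convenient way to check where the circles end up.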

A Processing Program

March 11th, 2008 by admin
// ellipses on springs
int ellipses = 5;
float[]x = new float[ellipses];
float[]y = new float[ellipses];
float[]w = new float[ellipses];
float[]h = new float[ellipses];
float[]angle = new float[ellipses];
float[]frequency = new float[ellipses];
float[]amplitude = new float[ellipses];
float[]strokeWt = new float[ellipses];
float[]damping = new float[ellipses];
int springSegments = 24;
int springWidth = 8;

void setup() {
  size(600, 400);
  frameRate(30);
  smooth();
  fill(0);
  setSpring();
}

void draw() {
  background(255);
  for (int i=0; i<ellipses; i++) {
      createSpring(x[i], y[i], w[i], h[i], strokeWt[i]);
      noStroke();
      fill(0);
      // draw ellipses
      ellipse(x[i], y[i], 50, 50);
      // spring behavior
      y[i] = y[i]+cos(radians(angle[i]))*amplitude[i];
      angle[i]+=frequency[i];
      amplitude[i]*=damping[i];
   }
  // press the mouse to reset
  if (mousePressed) {
     setSpring();
  }
}

void setSpring() {
  for (int i=0; i<ellipses; i++) {
    // size approximates mass
    w[i] = random(20, 70);
    h[i] = w[i];
    // stroke weight approximates
    // spring strength (resistance)
    strokeWt[i] = random(1, 4);
    x[i] = ((width/(ellipses+1))*i)+width/(ellipses+1)-w[i]/2.0;
    y[i] = (w[i]*3)/strokeWt[i];
    angle[i] = 0;
    // spring speed
    frequency[i] = strokeWt[i]*4;
    // amplitude based on mass/spring strength
    amplitude[i] = (w[i]*1.5)/strokeWt[i];
    // calculate damping based on stroke weight
    // simulates resistance of spring thickness
    switch(round(strokeWt[i])) {
      case 1:
        damping[i] = .99;
        break;
      case 2:
        damping[i] = .98;
        break;
      case 3:
        damping[i] = .97;
        break;
      case 4:
        damping[i] = .96;
        break;
     }
   }
}

// plot spring
void createSpring(float x, float y, float w, float h, float strokeWt) {
   stroke(50);
   strokeWeight(strokeWt);
   for (int i=0; i<springSegments; i++) {
     // for spring end segment
     if (i==springSegments-1) {
        line(x+w/2+springWidth, (y/springSegments)*i, x+w/2, (y/springSegments)*(i+1));
     }
     else {
       // alternate spring bend left/right
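       if (i%2==0) {
          // even segment: bend from left to right
          line(x+w/2-springWidth, (y/springSegments)*i, x+w/2+springWidth, (y/springSegments)*(i+1));
       }
       else {
          // odd segment: bend from right back to left
          line(x+w/2+springWidth, (y/springSegments)*i, x+w/2-springWidth, (y/springSegments)*(i+1));
       }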
     }
  }
}

The resulting applet: processing/index.html

To do list for 3/11/08

March 4th, 2008 by admin

We are following the Processing path.

From Allie’s exploration of Processing, it seems that we can represent a star graph in 3-D with springs linking the outside nodes to the node at the center of the star, and use non-collision attributes of the nodes to make sure they do not overlap in space. It seems that using a “force field” around the nodes would force them to be some distance away from each other in a pleasing way.

The different ideas we discussed:

  • on a mouse-over event over a node, a box opens up with information about the node visited, and a link that can bring up a new page, or a new graph
  • we can use the mouse to ‘move’ the graph around and see what is “behind”
  • we could have a series of checkboxes, or text input boxes that would allow for interesting filtering of the data:
    • We can block all the contributors belonging to the same IP group together (all the contributors working at MS, for example), in one big node
    • we can color-tag all the contributors that have a particular status: working at a given company, having contributed in the last week/month, having contributed to other pages
  • We could also organize the nodes on some kind of geodesic space around the center node
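
As an illustration of the "same IP group" filter, here is a small sketch in Python. It assumes that an "IP group" can be approximated by a shared network prefix (a /16 by default) and that each contributor has an IPv4 address recorded; both are assumptions of mine, and real company address ranges would need a proper lookup. The function name group_by_network is hypothetical.

```python
import ipaddress
from collections import defaultdict

def group_by_network(contributor_ips, prefix_len=16):
    """Group contributor Ids by the network their IP address falls in.

    contributor_ips maps contributorId -> dotted IPv4 string.
    Returns {network string: sorted list of contributorIds}.
    """
    groups = defaultdict(list)
    for cid, ip in contributor_ips.items():
        # strict=False lets us pass a host address and get its network.
        net = ipaddress.ip_network("%s/%d" % (ip, prefix_len), strict=False)
        groups[str(net)].append(cid)
    return {net: sorted(ids) for net, ids in groups.items()}
```

Each key of the result would then become one big node in the graph, with the listed contributors folded into it.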

Processing + PHP

February 25th, 2008 by Allie

So I did a search for “php” within the http://processing.org domain and received a number of interesting results, the most interesting being this forum post about a trick to using php requests for MySQL data.

forum post 1

This post is also interesting (notably Reply #6):

forum post 2

I think Processing is just the visual framework that will work for this project, and now that I know it’s possible to connect it to MySQL through PHP, we can move ahead! This is very exciting indeed, and since it’s already being used in the department it’s a great tool to perpetuate.

Showing the time variation of various quantities

February 24th, 2008 by admin

Today’s NYT (2/24/08) shows an interesting graph of the money made by different movies in 2007. It’s an interesting way to show time-variation of several tens of quantities.

The graph is interactive: as the mouse is moved over the different movies, some information is displayed, as well as the length of their run. http://www.nytimes.com/
Ebbflow.jpg

Interesting links related to Processing

February 22nd, 2008 by admin
Yahoo Burst
Similarity
Valence

(Note: I will keep adding more links as time goes on, so please keep checking this post often.)




Click here to see an applet in action


Check http://www.processing.org for more info and examples.

Something new to explore!

February 21st, 2008 by admin

Just attended a talk today by Ben Fry in the Art Department. Super stuff. Wish you had seen it, Allie.

Ben is the co-author of a language called “processing” (http://processing.org). He showed some very interesting animation and 3-D graphs that seem perfect for what we want to do. The graphics look pretty spectacular. I am just not sure how much coding is behind all the examples.

I would like to change the “todo” list for this coming Tuesday and have you explore Processing instead. By the way, Processing is the language used in CSC106, taught by Eitan and Thomas, so several seniors including Jordan and Stephanie might be good people to brainstorm with…

Happy exploring!!!

To do for 2/26/08

February 19th, 2008 by admin
  1. Modify the current display page and remove the frames
  2. Explore how to get better interaction from the SVG file when we click on nodes
  3. Aim for our next goal: a form with a box at the top where we can enter keywords (Hillary Clinton, say), and a submit button. Clicking on the button triggers a php program that generates a star graph with Hillary Clinton in the middle and the 10, 20, N most active contributors all around. Somehow we need to convey the scale of the number of edits. Probably a scale on the side (which we won’t worry about making user-modifiable right now), and links of varying width or color (or both) linking the contributors to the center node. The number 10, 20 or N might also be numbers in a text-entry box of the form.
  4. Clicking on a contributor node will bring up a new graph with this contributor in the center of a star, and the top 10, 20, N pages it has contributed to all around. We’ll probably want to see the “Hillary Clinton” page as part of these 10, 20, N pages, even if its ranking does not place it in this group.
  5. Explore stored procedures under mySql 5.0… Also, do not hesitate to make mySql do the dirty work, i.e. counting the number of contributors:


select count( `contributorId` ) from `tablex` 
where `contributorId`=N and `pageId`=P

How to generate PNG and SVG images of graphs in Php using Neato or Dot

February 19th, 2008 by admin

This does generate PNG and SVG files for directed and undirected graphs.

First, you must login to tango.csc.smith.edu, as Graphviz is only installed on tango, and not on beowulf. Use your regular Linux login information to connect.

Next, cd to /var/www/html/abellew/ to make it your working directory, and create the two subdirectories specified below.

Create a subdirectory in your working directory called Image. Make sure it is world-readable. This directory will contain a Php file containing a php class that takes care of generating png images from various GraphViz commands.

Create another subdirectory called images that is world readable and writable. This is where the Php class will store the png and svg image files.

Copy the file GraphViz.php into the Image subdirectory. It’s the Php class we’re going to use. Make sure it is world-readable.

Create a test file called graphviztest.php in your working directory (of which Image and images are subdirectories). Make sure this file is world-readable as well.

Point your browser to this new address:

     http://tango.csc.smith.edu/abellew/graphviztest.php

and verify that you get a page with two pictures of the same graph. The top image is in png, the bottom one in svg. Verify that you can click on the nodes of the svg image and follow links (although they have the bad problem of opening up in the embedded frame, not the whole browser window).

Example Php program to access wikipedia history

February 19th, 2008 by admin

The following page contains Php code to retrieve page Ids from the wikipedia history database, along with the contributors to a page with a given Id. (The page will take several seconds to load, as the query searches for keywords in the 11 million pages in the database.)

Note that the current version retrieves only the contributors for 1 page, but that with little effort we can change the query to retrieve the contributors to a list of several page Ids.

In order to run the program you must have a copy of the accessvars.php file, shown below:

<?php
//--------------------------------------------------------------
// MySql variables
//--------------------------------------------------------------
$params = array( 'host'     => "tango.csc.smith.edu",
       'database' => "enwikihistory2",
       'table'    => "pages",
       'user'     => "yourmysqlloginname",
       'passwd'   => "yourmysqlpassword" );
?>

An Interactive Page

February 19th, 2008 by Allie
SvgExample.svg

This week I was able to make a new page with the three specified frames showing the title of the wiki page in the top, the svg in the middle and potential contributor information in the bottom. I was able to give the nodes the appropriate links except that because of the frame structure the links target the frame in which they are clicked instead of a different frame or new window.

I thought about the different ways to create a scale and these were my thoughts:

  • There are two ways to go about the scale. The first way is including the scale in the svg and the other is to have the scale live in another part of the webpage.
  • If the scale should be part of the svg I’m not sure how to implement it using the DOT language.
  • If the scale should be in another part of the webpage then what kind of control should it be? Also, how is it going to communicate with the page creating/running the svg, via PHP, JavaScript or maybe something else?

So I didn’t complete the scale part of this week’s tasks because I bogged myself down with the greater end goals. Also, getting the correct scale requires a complex SQL Select statement and some post-processing. I looked up some static information in the database and wrote the topmost frame to display some revision information similar to what will be needed for the scale. So while I didn’t get a scale, this is a foundation for what will need to be decided/done in the future.

Also in order to keep as many versions of this project as might be helpful I am storing each week’s work in a separate folder denoted by month.day.year so this week’s new work (any pages/code that I edited) is stored here (click on “viz.html”): http://www.cs.smith.edu/~abellew/2.18.2008

Another network navigation site

February 17th, 2008 by admin

http://www.tinrocket.com/


Grabbed.jpg

Graphing the history of a wikipedia page

February 12th, 2008 by admin
Discover mag.jpg

Generated by Martin Wattenberg and
described in “Studying Cooperation and Conflict between Authors
with history flow Visualizations”, 2004 (link).

35 Great Visualizations

February 12th, 2008 by admin

Can be found here: abeautifulwww.com


To do for 2/19/08

February 12th, 2008 by admin
  1. Make SVG clickable with nodes pointing to other pages
  2. Put SVG in a system of 3 frames, with middle frame showing the svg graph, top slim frame showing a title, and bottom slim frame showing an image from the wikipage (static image for right now). When user clicks on a node, information about this node shows up in bottom frame
  3. Question: is SVG the only format that allows for interaction with user (clickable)?
  4. Investigate creating a scale to show what information is represented by the link. If using color, then need a color scale. If using line width as measure of # of contributions, then show a scale with 4 or 5 thicknesses and what # they represent. Use a linear map between # of contributions and thickness for right now.
  5. Watch Tamara Munzner’s video, and explore her web site and her group’s web site
  6. Semester-long project: look for turn-key software systems for displaying graphs.
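
For item 4, the linear map and its legend are easy to prototype before deciding how to draw them. Here is a minimal sketch in Python, assuming contribution counts start at 1 and that widths between 1 and 9 pixels look reasonable. The function names and the width range are hypothetical.

```python
def edge_width(count, max_count, min_width=1.0, max_width=9.0):
    """Linear map from a contribution count (1..max_count) to a line width."""
    if max_count <= 1:
        return min_width
    t = (count - 1) / (max_count - 1)
    return min_width + t * (max_width - min_width)

def width_legend(max_count, levels=5):
    """Return (count, width) pairs for evenly spaced sample counts,
    suitable for drawing a scale with a handful of thicknesses."""
    if max_count <= 1 or levels < 2:
        return [(1, edge_width(1, max_count))]
    samples = [1 + round(k * (max_count - 1) / (levels - 1))
               for k in range(levels)]
    return [(c, edge_width(c, max_count)) for c in samples]
```

For example, with a top contributor at 101 edits, width_legend(101) gives five sample counts (1, 26, 51, 76, 101) with widths 1, 3, 5, 7 and 9.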


Competition on Visual Network Dynamics

February 12th, 2008 by admin

Competition on visualizing network dynamics

2007, Queens, NY

Some interesting designs for representing large networks.


Must-watch video!

February 12th, 2008 by admin

Tamara Munzner of U. British Columbia presents a talk at Google titled 15 Views of a Node Link Graph: An Information Visualization Portfolio


http://video.google.com/videoplay?docid=-6229232330597040086&q=type%3Agoogle+engEDU

It’s one-hour long, but worth it. It would be nice to see if some of the software she demonstrates for exploring graphs is available…

Tamara’s Web site and group’s site have good information.


Progress on project!

February 11th, 2008 by Allie

So far I have created a haphazard couple of Python objects to parse a text file with data in it. I definitely spent more time this week on function rather than commenting but I am guessing most of my work so far will be changed/adjusted as the programming language changes (from Python to PHP) and as the complexity of my task increases. Here is a link to all the files that I worked on/created:

http://maven.smith.edu/~thiebaut/IS_blog/abellew/2.18.2008/


Html code to embed svg in html:


NeatoTrial.svg

<embed src="http://www.cs.smith.edu/~abellew/2.12.2008/neatoTrial.svg"
width="600" height="1000" type="image/svg+xml"
pluginspage="http://www.adobe.com/svg/viewer/install/" />

Graph Visualization Specifics

February 11th, 2008 by Allie

The graph visualizer in Silverlight supports physics algorithms that display the graph in the most rigid fashion, but we are not interested in rigidity; we want the best way to visually represent the information for extrapolation. Graphviz is the best option for accomplishing that goal.

For the actual graph.. in order to quickly and easily identify which Wikipedia users contributed most to an article I would like to use a color bar from blue (less contribution) to red (more contribution) on the line connecting the user to an article. Using a color scheme like this instead of varying degrees of line thickness is more intuitive for detecting levels of activity.
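
To make the color bar idea concrete, here is a small sketch in Python that turns a contribution count into a hex color between blue and red. It assumes a straight linear blend in RGB and a "#rrggbb" string as the output format; the function name contribution_color is hypothetical, and a different blend (for example through purple or via hue) might look better in practice.

```python
def contribution_color(count, max_count):
    """Blend linearly from blue (#0000ff, least) to red (#ff0000, most)."""
    if max_count <= 0:
        t = 0.0
    else:
        # Clamp to [0, 1] so out-of-range counts still give a valid color.
        t = min(max(count / max_count, 0.0), 1.0)
    red = int(255 * t + 0.5)
    blue = 255 - red
    return "#{:02x}00{:02x}".format(red, blue)
```

The resulting string could be written as the color attribute of the edge connecting a user to the article.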

Neat graphical representation of activity in Wikipedia

February 11th, 2008 by admin
Windowslivewritervisualizingthepowerstruggleinwikipedia-f7c7wikivislowres74.jpg

Click here for full size image.
Very interesting and artistic way to depict activity in the wikipedia pages.
For more information, check Bruce Herr’s http://abeautifulwww.com/2007/05/20/visualizing-the-power-struggle-in-wikipedia/
or “Visualizing the ‘Power Struggle’ in Wikipedia”

A nicer web-2.0 type graph where the user can zoom in and out can be found here:

http://scimaps.org/maps/wikipedia/

Another nice image representing graphically the geography and activity by domain name


Cctld 1200.jpg

Interesting visualization packages

February 5th, 2008 by admin
  1. ManyEyes by IBM: Link
  2. Another interesting plot by ManyEyes

To do for 2/12/08

February 5th, 2008 by admin

Create a text file with the following entries

  1. title of the page (this will be in a circle in the middle of the graph)
  2. the link to the page (this will be called when we click on the circle)
  3. a collection of 20 triplets (3-line blocks)
     1. contributor name
     2. # of contributions to the current page
     3. link to the contributor (this will be a link to a php page which will get the Id of the contributor)

Generate from this a dot file

process the dot file to get an svg file

put the svg file on your web site

install the svg plugin for your browser

display the graph
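
The first two steps (text file to dot file) might look something like the sketch below. It is written in Python and assumes the file layout described above: one line for the title, one for the link, then 3-line blocks of name, count and link. The function names, the graph name wikipage and the decision to show the count as an edge label are all my own choices for illustration.

```python
def parse_page_file(text):
    """Parse the text file contents into (title, link, contributors),
    where contributors is a list of (name, count, link) triplets."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    title, link, rest = lines[0], lines[1], lines[2:]
    if len(rest) % 3 != 0:
        raise ValueError("contributor entries must come in 3-line blocks")
    contributors = [(rest[i], int(rest[i + 1]), rest[i + 2])
                    for i in range(0, len(rest), 3)]
    return title, link, contributors

def quote(s):
    """Wrap a string in double quotes for DOT, escaping embedded quotes."""
    return '"' + s.replace('"', '\\"') + '"'

def to_dot(title, link, contributors):
    """Build an undirected star graph in DOT with clickable nodes."""
    out = ["graph wikipage {",
           "  %s [shape=circle, URL=%s];" % (quote(title), quote(link))]
    for name, count, url in contributors:
        out.append("  %s [URL=%s];" % (quote(name), quote(url)))
        out.append("  %s -- %s [label=%s];"
                   % (quote(title), quote(name), quote(str(count))))
    out.append("}")
    return "\n".join(out)
```

Once the output is saved to a file, say page.dot, a command such as neato -Tsvg page.dot -o page.svg should produce the svg file.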

Visual Representation Options

January 29th, 2008 by Allie

So here is a link to Microsoft Silverlight’s “Showcase” page where some Silverlight applications are available for demo. I don’t want to initially create a large web application but Silverlight graphics can be inserted inline with HTML code easily.

http://silverlight.net/showcase/default.aspx

On Borges and Wikipedia

January 29th, 2008 by admin

In 1940, Borges wrote:

Who, singular or plural, invented Tlön? The plural is, I suppose, inevitable, since the hypothesis of a single inventor — some infinite Leibniz working in obscurity and self-effacement — has been unanimously discarded. It is conjectured that this ‘brave new world’ is the work of a secret society of astronomers, biologists, engineers, metaphysicians, poets, chemists, algebrists, moralists, painters, geometers, … guided and directed by some shadowy man of genius. There are many men adept in those diverse disciplines, but few capable of imagination — fewer still capable of subordinating imagination to a rigorous and systematic plan. The plan is so vast that the contribution of each writer is infinitesimal.

Not too bad a description of Wikipedia!
More on this in a 01/06/08 NYT article, Borges and the Foreseeable Future.

Animation of the history of a wikipedia page

January 29th, 2008 by admin

Here is a cool link to a page showing an animation of the life of a wikipedia page. This is done by Jon Udell.

They Rule & Wikipedia

January 28th, 2008 by admin

Here’s a way to get started with the idea.

First go to the site TheyRule.net and play with the system.

Theyrule2.png

Select “Load Map”/”Popular” and pick an entry. You will see a network of connections appearing. The network shows the people that belong to different boards of companies. As you move your mouse over some of the entries you are given a menu to search or delete the item. Also very neat, you can click on an item and move it around with the mouse while retaining the existing connections.

I would like you to develop a program (web based) that would take data from a mysql database and display the graph of the relations existing between the data. TheyRule shows how somebody decided to do it (and very nicely at that), but other options exist.

The data I have is a huge collection of the edits that have been done to the english wikipedia pages. I have a database with 3 tables, and you can access them by going to this URL: http://cs.smith.edu/~thiebaut/wikihistory.

Click on the pages table. It contains the title of ALL the wikipedia pages and their Id, as generated by wikipedia.

Click on the contributors table. It contains the list of all the contributors who have edited a page in wikipedia. 3 different pieces of information can define a contributor: a name, an Id, or an IP address. Unfortunately, wikipedia doesn’t force contributors to enter their name (in fact, some contributors are computer programs, which do not have names), so contributors may or may not have any information recorded for up to two of these fields.

Click on the revisions table. It contains all the revisions performed. Each revision (or edit) is identified by an Id number (which I generate when populating this table), a revisionId which is the Id the revision has in the wikipedia database, the Id of the page on which this revision was done (pageId), the Id of the contributor (contributorId) who made the edit (this is the same contributor field used in the contributors table), a comment indicating what the edit was about, a textlength field which indicates how many characters were in the edit, and finally the date the edit was done.

I would like to be able to have a web page where I could enter the Id of a few pages (for example the pages corresponding to the current presidential candidates), and have a webpage showing the graph of all the contributors to the different pages, and whether a given contributor contributed to more than just one page, similarly to the way “TheyRule” works.
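
To give an idea of the data behind such a graph, here is a short sketch in Python that finds the contributors shared between several pages. It assumes the rows of the revisions table have already been fetched as (pageId, contributorId) pairs; the function name shared_contributors and the sample Ids in the usage note are made up.

```python
from collections import defaultdict

def shared_contributors(revisions, page_ids):
    """Find contributors who edited more than one of the given pages.

    revisions is an iterable of (pageId, contributorId) pairs.
    Returns {contributorId: sorted list of pageIds they edited}.
    """
    wanted = set(page_ids)
    pages_by_contributor = defaultdict(set)
    for page_id, contributor_id in revisions:
        if page_id in wanted:
            pages_by_contributor[contributor_id].add(page_id)
    return {cid: sorted(pids)
            for cid, pids in pages_by_contributor.items()
            if len(pids) > 1}
```

For instance, with rows [(10, 7), (11, 7), (10, 8)] and pages [10, 11], only contributor 7 is returned, linked to both pages. Each entry would become a node with edges to every page it lists.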

A nice collection of software tools that should probably be used is the Graphviz set. Go take a look at it. It generates SVG graphs, and the major browsers have plugins to visualize SVG graphs.

Welcome

January 28th, 2008 by admin

Welcome to DT’s Independent study blog.


This mediawiki page was generated by I love wiki, an HTML to wiki syntax converter that took the html version of the blog and translated it into wiki syntax.