CSC220 Lab 6 MySQL -- 2010

From dftwiki3
Jump to: navigation, search

--D. Thiebaut 01:54, 20 October 2010 (UTC)




This lab deals with indexes, and how they affect operations on tables.

Create a New Table

  • Use PhpMyAdmin on hadoop110 or hadoop101 to create a new table. Call it lab6
    • Create a integer field, auto increment, primary index, called Id
    • Create a varchar field, with 40 characters, utf8-general-ci for collation, called Name



CSC220Lab6Table.png



  • Create the following php program in your main directory (no need for it to be in your public_html directory)
<?php

// fillTable.php
// D. Thiebaut
// fills a table with $N random strings
// The table name is $tableName
//

//----------------------------------------------------------
// Globals.  Enter your user information here
//----------------------------------------------------------
$user        ="220a-xx";   // EDIT THIS LINE!
$passwd      = "xxxx";       // EDIT THIS LINE TOO!
$host        ="xgridmac.dyndns.org";
$database    = "220a-xx";   // EDIT THIS LINE AS WELL!

$N = 500000;
$tableName = "lab6";


//----------------------------------------------------------
// create a random word of 15 characters
//----------------------------------------------------------
function randomWord() {
  $validchars = "abcdfghjkmnpqrstvwxyz";
  $word  = "";
  for ( $i=0; $i<15; $i++ ) 
     $word .= substr( $validchars, rand(0, strlen($validchars)-1), 1 );
  return $word;
}

//----------------------------------------------------------
// Main Program
//----------------------------------------------------------
//--- connect to DB ---
mysql_connect( $host, $user, $passwd ) or die( "Unable to connect to db\n\n" );
print "==> Connected to database\n";

//--- select database ---
$result = mysql_select_db( $database );

//--- Insert  $N items in table---
$magic = rand( 100, $N/3 );
for ( $i=0; $i < $N; $i++ ) {
  $qty  = rand( ); // random number
  $name = randomWord(); //. strval( rand( ) );
  if ( $i==$magic ) $temp = $name;
  if ( $i==$magic+10 ) $name = $temp;
  $query = "INSERT INTO  `$tableName` (`Name`)  VALUES ( '$name' )";
  $result = mysql_query( $query );

  //--- give feedback to the user ---
  ( $i % 1000 != 0 )  or print $i . "\n";
}

//--- done.  close connection ---
mysql_close();

?>

  • Edit the program to reflect your user name and password.
  • Run the program:
  php fillTable.php
  • verify that it outputs the number of items it is inserting in the table as it goes along. Depending on the time of day, and how many people are inserting in the database, this may take a minute or two.

Searching the Table

  • For each question below, generate a query that outputs the answer, and record each query's execution time. Note: if phpmyadmin does not report the execution time of a query, try prefixing it with set profiling=1; in the query window. For example:
         set profiling=1;
         select count(*) from `users`;
Question 1
What are the first ten names stored in the table, when ordered in ascending alphabetical order. How long does it take to get the answer?


Question 2
Display all the names that start with the letter b. Again, measure the execution time.


Question 3
How many different names start with the letter b


Question 4
Are all the names unique?



Building An Index

  • An index is normally defined at the time a table is created. You can also create an index after the fact, as we'll do now. Use PhpMyAdmin, select the lab6 table, and display its structure.
  • Locate the Create an Index box, and click Go


CSC220CreateAnIndex.png



  • Create an index named nameIndex, (not unique), on the field Name (no size specified, so that it uses the full size of Name).


CSC220CreateAnIndex2.png


  • Check that the structure of the table now reflects the new index.

Faster Queries?

  • Run the same four queries you generated for the four questions above, and measure their execution time. How much faster, on the average are your queries executing now?

Something New

  • Try the following query on your table
select  *, left( `Name`, 1 ) as `first` from `lab6`;


What is the left function computing?
  • Use this new knowledge to show how many different names start with 'a', with 'b', with 'c'... and with 'z'. In other words, generate the list of 26 records where one field is a letter of the alphabet, and the other the number of `Names` in `lab6` that start with that letter.

Challenge of the Day

  • At least two `Names` are indentical in the table `lab6`. What are the names, and with what `Id` are they associated?


Solutions

  • Note 1: My table only had 5000 entries
  • Note 2: you probably experienced long delays (2 seconds or so) for the SELECT COUNT DISTINCT query. This is apparently a well documented feature (read bug) of mysql. See for example http://bugs.mysql.com/bug.php?id=21849
Question 1
SELECT `Name` FROM `lab6` order by `Name` asc limit 10
Question 2
SELECT Name FROM `lab6` where left( `Name`, 1)='b'
Question 3
SELECT Count( Name ) FROM `lab6` where left( `Name`, 1)='b'
Question 4
Select count( Name ) from `lab6`
-- output = 5000
Select count( distinct Name ) from `lab6` 
-- output = 4999

-- so the answer is No!
Something New
SELECT left( `Name`,1), Count( Name ) FROM `lab6` group by left( `Name`, 1)

-- 
-- left( `Name`,1)	Count( Name )
-- a	254
-- b	246
-- c	235
-- d	250
-- f	243
-- g	229
-- h	247
-- j	218
-- k	252
-- m	232
-- n	221
-- p	230
-- q	222
-- r	235
-- s	237
-- t	266
-- v	233
-- w	236
-- x	246
-- y	228
-- z	240
Challenge of the Day
select t1.Id, t1.Name 
       from lab6 as t1 join lab6 as t2 
       on t1.name=t2.name and t1.Id!=t2.Id 

--			700	gmzjhwxfbkfygmj
--			710	gmzjhwxfbkfygmj
--
-- The php program that loads up the data in the table 
-- picks a name at random and inserts it twice in the table, at indexes
-- that are 10 apart.   So while it is impossible to predict what two 
-- items will be replicated in your table, you can be sure that the difference
-- between their indexes will be 10...