CSC220 Homework 5 Solution 2010
--D. Thiebaut 21:53, 24 October 2010 (UTC)
Problem #1
Questions 1-9
- Main answers from Alex, with contributions from Aigerim, Katie and Millie.
-- hw5b.sql
-- Alex Cheng (220a-ag)
-- 10/20/10
-- Answers to Homework 5, Part A
-- Question 1: returns all names and ids of records where name contains "processor"
SELECT * FROM `Names` WHERE `Name` LIKE "%processor%";
-- Question 2: returns all ids and names of records that contains "compute"
SELECT * FROM `Names` WHERE `Name` LIKE "%compute%" ORDER BY `Name`;
-- Question 3: same as #2 but returns last 10 entries of alphabetically sorted list
SELECT * FROM `Names` WHERE `Name` LIKE "%compute%" ORDER BY `Name` LIMIT 10;
-- Question 4: same as #2 but returns last 10 entries of alphabetically sorted list
SELECT * FROM `Names` WHERE `Name` LIKE "%compute%" ORDER BY `Name` DESC LIMIT 10;
-- Question 5: returns total number of records in table Names
SELECT COUNT(*) FROM `Names`;
-- Question 6: returns total number of records in table Connections2 whose Id1 field is 470
SELECT COUNT(*) FROM `Connections2` WHERE `Id1`="470";
-- Question 7: returns ids of all entries that are connected to entry ASCII (this solution from Aigerim)
SELECT DISTINCT `Id2` FROM `Connections2` WHERE `Id1` IN (
`Id` from `Names` WHERE `Name`='ASCII' )
-- Question 8: returns names of entries associated with ids from #7 (this solution from Katie + Millie)
SELECT DISTINCT `Name` FROM `Names` WHERE `Id` in
(SELECT `Id2` FROM `Connections2` WHERE `Id1` in
(SELECT `Id` FROM `Names` WHERE `Name`="ASCII"));
-- Question 9: same as #8 but returns last 5 entries when list is alpha ordered
SELECT `Name` FROM `Names` WHERE `Id` in (SELECT `Id2` FROM `Connections2`
WHERE `Id1` in
(SELECT `Id` FROM `Names` WHERE `Name`="ASCII"))
ORDER BY `Name` DESC LIMIT 5;
Question 10
Option 1
Here's an ingenious solution by Betsy. It shows some good research, but it is not full-proof, as by some strange fluke, there could be duplicate entries that are such that the sums might be the same, while the graph is still directed. But it's a very nice example of the use of variables... Nice job!
-- Question 10
-- Generates a query that returns @a = the sum of the column Id1, @b = the sum of the
-- column Id2 and @d = the difference of their sums.@d tells us whether Connections2
-- is direct or not. If Connections2 is direct then Id1 and Id2 should have exactly the
-- same entries. So if we find the sum of the entries of each column, they should be the
-- same number. We can verify this by taking their difference.
-- Thus if @d == 0, Connections2 is direct. Otherwise, it is not.
SET @a := (SELECT SUM(`Id1`) FROM `Connections2`),
@b := (SELECT SUM(`Id2`) FROM `Connections2`);
SELECT @a, @b, @d := @a-@b;
Option 2
- To answer this question it is easier to work with a smaller graph, figure out the queries, and then apply them to the Connections2 table. Here I am using a different database, and where I create a new play table also called Connections2. It will simplify applying the queries I develop here to the original Connections2 table.
- Here's the graph I'm using:
(1) <-----> (2) <-----> (3)
|
|
V
(4) -----> (5)
-- Notice that the graph is '''directed''', as there is an edge from 2 to 4,
-- and from 4 to 5, but no edges from 4 to 2 or from 5 to 4.
-- Let's see how we can figure out the directed property of this graph.
-- There are probably some more sophisticated and or simpler ways to perform this, but this works! :-)
CREATE TABLE IF NOT EXISTS `Connections2` (
`id1` int(1) NOT NULL,
`id2` int(1) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
INSERT INTO `Connections2` (`id1`, `id2`) VALUES
(1, 2),
(2, 1),
(2, 3),
(3, 2),
(2, 4),
(4, 5),
(2, 1),
(2, 3);
-- Notice that I have created '''duplicate''' edges (the last two),
-- to emulate the situation in the homework, where the original Connections2 table
-- also contains duplicates.
-- let's figure out how many rows we have:
SELECT count( * ) FROM `Connections2`;
-- output = 8
-- how many of these are unique?
SELECT count( distinct Id1, Id2) FROM `Connections2`;
-- output = 6 , so we have 6 unique edges in our graph
-- if this were an odd number we'd know for sure that the
-- graph is directed, as it would mean we'd have one edge
-- that does not have an inverse. Here we could have two
-- such edges, in which case we still don't know
-- Let's create a temporary table that is like connections2
-- but without duplicates
CREATE TABLE tempConn SELECT DISTINCT * FROM Connections2;
-- contents is
-- 1 2
-- 2 1
-- 2 3
-- 3 2
-- 2 4
-- 4 5
-- create another version of tempConn, i.e.
CREATE TABLE tempConn2 SELECT * FROM tempConn;
-- contents is
-- 1 2
-- 2 1
-- 2 3
-- 3 2
-- 2 4
-- 4 5
-- Join the two tables on rows where Id1, Id2 of one table is equal
-- to the Id2, Id1 of the other. In other words, join together the
-- edges that are going from Node A to Node B, and from Node B to Node A
select tempConn.Id1, tempConn.Id2, tempConn2.Id1 as Id1b, tempConn2.Id2 as Id2b
from tempConn join tempConn2
on tempConn.Id2 = tempConn2.Id1 and tempConn.Id1= tempConn2.Id2;
-- output
-- 2 1 1 2
-- 1 2 2 1
-- 3 2 2 3
-- 2 3 3 2
-- Now do the same thing with an outer join: If there are edges that
-- do not have reciprocating edges, then NULLs will show up in the table
-- Let's try a left join:
select tempConn.Id1, tempConn.Id2, tempConn2.Id1 as Id1b, tempConn2.Id2 as Id2b
from tempConn left join tempConn2
on tempConn.Id2 = tempConn2.Id1 and tempConn.Id1= tempConn2.Id2;
-- output
-- 2 1 1 2
-- 1 2 2 1
-- 3 2 2 3
-- 2 3 3 2
-- 2 4 NULL NULL
-- 4 5 NULL NULL
-- Ah! There are some edges that do not have matching reciprocal edges.
-- The graph IS therefore a directed graph!
-- We can delete the extra tables
DROP TABLE tempConn;
DROP TABLE tempConn2;