Difference between revisions of "CSC231 Final Exam Solutoins 2012"

From dftwiki3
Jump to: navigation, search
 
(4 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
--[[User:Thiebaut|D. Thiebaut]] 14:37, 20 December 2012 (EST)
 
--[[User:Thiebaut|D. Thiebaut]] 14:37, 20 December 2012 (EST)
 
----
 
----
<onlysmith>
+
<onlydft>
  
 
This is just a sketch of the solution for the exam, but enough to give you an idea of what you had to do.
 
This is just a sketch of the solution for the exam, but enough to give you an idea of what you had to do.
Line 69: Line 69:
 
  ratio = 1610 to 817 = 50.745342%
 
  ratio = 1610 to 817 = 50.745342%
  
For the bound on the compression ratio. i.e. the best possible compression, that would be 255 identical emoticons.  Let's see what that would give if the emoticons are n character long, excluding ''('' and '')'':
+
For the bound on the compression ratio. i.e. the best possible compression, see what happens with a string of 1 icon: (*), compressed as 1 1 *, or 3 bytes.  So there's no compression since we go from a 3-byte icon to 3 bytes of compressed storage.  Note that this is also true of icons of the form (xx), since they'd get compressed to 1 2 xx which is also 4 bytes.
 +
For a string of 2 icons, we get (*)(*), which compresses to 2 1 *, so 4 uncompressed bytes to 3 compressed bytes.  A little better.  So the more repeated icons we have, the better the ratio.  The most repetition we can compress is 255 icons, since that's the largest positive unsigned number we can store in a byte.
 +
 
 +
So the best compressed ratio would be 255 identical emoticons.  Let's see what that would give if the emoticons are n character long, excluding ''('' and '')'':
  
 
* 255 times (''n''+2) characters, as in (xxx):  255 * (''n''+2)
 
* 255 times (''n''+2) characters, as in (xxx):  255 * (''n''+2)
Line 509: Line 512:
  
 
Nobody saw that you could use a look-up table for this question!  Would have made your program so much simpler!!!
 
Nobody saw that you could use a look-up table for this question!  Would have made your program so much simpler!!!
 +
 +
table      equ    $
 +
            db    'A','A','A','A', 0x00
 +
            db    'A','A','A','C', 0x01
 +
            ''etc...''
 +
 +
You look at all patterns of 4 'A', 'C', 'G', and 'T' characters and figure out their compressed hexadecimal equivalent (a quick Python hack can give you that), and you will get 256 patterns.  Sort them in the right order, and your program simply takes groups of 4 characters from the string, puts them in eax, scans the table (every 5 bytes) for 32-bit values equal to eax, and when found, takes the 5th byte and prints it in hex. 
 +
 +
To make sure the program deals correctly with strings whose length is not a multiple of 4 bytes, then always add "AAA" at the end of the DNA string and that should do the trick!
  
 
But the great majority got at least 90% of the binary pattern to print out... Good!
 
But the great majority got at least 90% of the binary pattern to print out... Good!
</onlysmith>
+
</onlydft>
  
 
<br />
 
<br />

Latest revision as of 07:14, 11 December 2014

--D. Thiebaut 14:37, 20 December 2012 (EST)



...