Difference between revisions of "CSC231 Final Exam Solutoins 2012"

From dftwiki3
Jump to: navigation, search
 
(6 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
--[[User:Thiebaut|D. Thiebaut]] 14:37, 20 December 2012 (EST)
 
--[[User:Thiebaut|D. Thiebaut]] 14:37, 20 December 2012 (EST)
 
----
 
----
<onlysmith>
+
<onlydft>
  
 
This is just a sketch of the solution for the exam, but enough to give you an idea of what you had to do.
 
This is just a sketch of the solution for the exam, but enough to give you an idea of what you had to do.
Line 69: Line 69:
 
  ratio = 1610 to 817 = 50.745342%
 
  ratio = 1610 to 817 = 50.745342%
  
For the bound on the compression ratio. i.e. the best possible compression, that would be 255 identical emoticons.  Let's see what that would give if the emoticons are n character long, excluding ''('' and '')'':
+
For the bound on the compression ratio. i.e. the best possible compression, see what happens with a string of 1 icon: (*), compressed as 1 1 *, or 3 bytes.  So there's no compression since we go from a 3-byte icon to 3 bytes of compressed storage.  Note that this is also true of icons of the form (xx), since they'd get compressed to 1 2 xx which is also 4 bytes.
 +
For a string of 2 icons, we get (*)(*), which compresses to 2 1 *, so 4 uncompressed bytes to 3 compressed bytes.  A little better.  So the more repeated icons we have, the better the ratio.  The most repetition we can compress is 255 icons, since that's the largest positive unsigned number we can store in a byte.
 +
 
 +
So the best compressed ratio would be 255 identical emoticons.  Let's see what that would give if the emoticons are n character long, excluding ''('' and '')'':
  
 
* 255 times (''n''+2) characters, as in (xxx):  255 * (''n''+2)
 
* 255 times (''n''+2) characters, as in (xxx):  255 * (''n''+2)
Line 118: Line 121:
 
;; /////////////////////////////////////
 
;; /////////////////////////////////////
  
;; I wrote a python program to count the number of characters in the output: the answer I got was 1606 (I deleted the extra spaces between
+
;;;;;; (note from DT: see answers above code...)
; the dance emoticons on the 10th line. I wrote another program to count the number of bytes in the data section. For this input, I got 818
 
; bytes of memory storage. Storing 1606 characters would require 1606 bytes, but using this compression algorithm the data section of this
 
; program only requiers 808 bytes.  Thus, for this input, the compression factor is 818 / 1616, about %50.
 
; In other words, the data is stored in half the space that would be required to include the full ouput.
 
;
 
; The largest output we can store in 3 bytes is a one character emoticon printed 254 times. This data would require
 
; 3 bytes to store: 254,3,'d'. Thus, the lowest possible value is 3 / (254  * 3) = 0.0039.
 
; Regardless of how many lines of data there are, if they all adhere to these numbers the total compression ratio will always be 0.0039.
 
; I believe that the length of the string is irrelevent. If say, the emoticon text (i.e. 'beer' or 'dance') is very long, say 254
 
; characters, then we have the equation (2 + 252) / (254 * (252 + 2)) = .00393 (about the same). The numerator must increase because we have to spend
 
; 254 bytes to store the 254 character emoticon. So I believe that .0039 is the best compression ratio we can get with this algorithm.
 
; The number 254 is used because bytes can only store the numbers 0-254. The length of string is limited because the length must be stored
 
; in the second byte of our encoding.
 
 
 
  
  
Line 506: Line 495:
 
<br />
 
<br />
  
 +
=Problem 2=
 +
==Question 1==
 +
The problem is that FP numbers have to be denormalized before being added, and the smallest number in magnitude must have its mantissa shifted right until its exponent matches the exponent of the largest number, in magnitude.
 +
 +
It is possible that the result is equal to the largest number if the smallest number is shifted so much that it can't affect the least significant bit of the mantissa of the larger number.
 +
 +
==Question 2==
 +
The rule should be to sort the numbers in increasing order, and start adding with the smallest numbers first.  This can be seen as the better solution if the list of numbers is  X Y Y Y Y Y Y, where the number Y is smaller than the least significant bit of the mantissa of X.  In this case X + Y will be equal to X, and so adding X to any large sequence of Ys will yield X.
 +
 +
If, however, adding Y to itself many times generates a sum that becomes larger than the least significant bit of the mantissa of X, then adding all the Ys first then adding that to X will yield a more "correct" result.
 +
 +
==Question 3==
 +
Check the exponents of the numbers and see if the difference in exponents between the largest and the smallest exponent is larger than the number of bits in the mantissa.  If not, then add up, otherwise sort them first.
 +
 +
=Problem 3=
 +
 +
Nobody saw that you could use a look-up table for this question!  Would have made your program so much simpler!!!
 +
 +
table      equ    $
 +
            db    'A','A','A','A', 0x00
 +
            db    'A','A','A','C', 0x01
 +
            ''etc...''
 +
 +
You look at all patterns of 4 'A', 'C', 'G', and 'T' characters and figure out their compressed hexadecimal equivalent (a quick Python hack can give you that), and you will get 256 patterns.  Sort them in the right order, and your program simply takes groups of 4 characters from the string, puts them in eax, scans the table (every 5 bytes) for 32-bit values equal to eax, and when found, takes the 5th byte and prints it in hex. 
 +
 +
To make sure the program deals correctly with strings whose length is not a multiple of 4 bytes, then always add "AAA" at the end of the DNA string and that should do the trick!
  
</onlysmith>
+
But the great majority got at least 90% of the binary pattern to print out... Good!
 +
</onlydft>
  
 
<br />
 
<br />

Latest revision as of 07:14, 11 December 2014

--D. Thiebaut 14:37, 20 December 2012 (EST)



...