Difference between revisions of "CSC231 Analysis of the Towers of Hanoi"
(→Instruction Count) |
(→Instruction Count) |
||
Line 346: | Line 346: | ||
::''MoveDisks is called to move 4 disks from A to B, using C as an extra peg. To do so it calls itself once to move 3 disks from A to C, to put them "out of the way", and uses B as a temporary peg to do so. Then it moves 1 disk from A to B. Then it has to move the 3 disks that were on C back to B where the belong, and it uses A as an extra peg in the process.'' | ::''MoveDisks is called to move 4 disks from A to B, using C as an extra peg. To do so it calls itself once to move 3 disks from A to C, to put them "out of the way", and uses B as a temporary peg to do so. Then it moves 1 disk from A to B. Then it has to move the 3 disks that were on C back to B where the belong, and it uses A as an extra peg in the process.'' | ||
− | == | + | ==Estimating the Number of Instructions Executed== |
Let's figure out how many instructions are executed by the different parts: | Let's figure out how many instructions are executed by the different parts: | ||
Line 417: | Line 417: | ||
+ | ==Processor Approximation== | ||
+ | We are now going to make an approximation about the Pentium, that it executes '''1 instruction every cycle'''. Another performance measured by computer architects is CPI, or ''cycle per instruction'', which in this case would also be equal to 1. | ||
+ | Such an assumption is pretty safe with the current modern Pentium processors. | ||
+ | |||
+ | ==Speed of our Machine== | ||
+ | |||
+ | To find the speed at which the processor on our machine (beowulf) is operating, we ask Linux: | ||
+ | |||
+ | '''cat /proc/cpuinfo ''' | ||
+ | processor : 0 | ||
+ | vendor_id : AuthenticAMD | ||
+ | cpu family : 15 | ||
+ | model : 65 | ||
+ | model name : Dual-Core AMD Opteron(tm) Processor 2222 SE | ||
+ | stepping : 3 | ||
+ | microcode : 0x6d | ||
+ | cpu MHz : '''2992.478''' | ||
+ | cache size : 1024 KB | ||
+ | physical id : 0 | ||
+ | ... | ||
+ | |||
+ | Beowulf's processor operates at 2.99 GHz. Its cycle time is thus 1/2.99E9 seconds. | ||
<br /> | <br /> | ||
+ | |||
+ | ==Modeling with Excel== | ||
+ | |||
+ | We now have all the data to predict the approximate execution time of our program for different values of ''N''. We have created an eXcel table with the equation for the number of instructions as a function of N and multiplied by the cycle time: | ||
+ | |||
+ | <center>[[Image:HanoiPredictingExecutionTime.png|500px]]</center> | ||
<br /> | <br /> |
Revision as of 21:41, 25 November 2012
--D. Thiebaut 16:24, 25 November 2012 (EST)
Contents
A High-Level Version in Python
# hanoi.py (Python V 2.7)
# D. Thiebaut
# implements the game of hanoi in python.
# uses recursion.
#
# to run, type
# python hanoi.py
# at the command line
def moveDisk( source, dest, extra, n ):
if n==1:
print "move disk from %s to %s" % ( source, dest )
return
# more than 1 disk...
moveDisk( source, extra, dest, n-1)
print "move disk from %s to %s" % ( source, dest )
moveDisk( extra, dest, source, n-1)
def main():
while True:
n = input( "# of disks? " )
moveDisk( "A", "B", "C", n )
main()
Top Down Approach: Version 1
Source
In this version all we do is simply set a main program that calls the function moveDisks and passes it
- the number of disk to move (in al)
- the source peg (in bl)
- the destination peg (in cl)
- and the extra or temporary peg (in dl)
The moveDisks function prints a single move, testing the printMove function that prints "X Y" on a line by itself, indicating that a disk must be move from X to Y.
;;; hanoi1.asm
;;; D.T.
;;; Towers of Hanoi, top down...
;;; assumptions:
;;; al = n Number of disk
;;; bl = Source character identifying the source
;;; cl = Dest character identifying the destination
;;; dl = Temp character identifying the extra peg
section .data
moveStr db "X --> X", 10
moveStrLen equ $-moveStr
source equ moveStr + 0
dest equ moveStr + 6
section .text
global _start
;;; ------------------------------------------------------------
;;; M A I N
;;; ------------------------------------------------------------
_start: mov al, 4 ; move 4 disks
mov bl, 'A' ; from A
mov cl, 'B' ; to B
mov dl, 'C' ; using C as extra
call moveDisks
;;; return to OS
mov eax, 1
mov ebx, 0
int 0x80
;;; ------------------------------------------------------------
;;; moveDisks:
;;; assumptions:
;;; al = n Number of disk
;;; bl = Source character identifying the source
;;; cl = Dest character identifying the destination
;;; dl = Temp character identifying the extra peg
;;; Does not return anything
;;; does not modify registers
;;; ------------------------------------------------------------
moveDisks: push ax
push bx
push cx
push dx
mov byte[source], bl
mov byte[dest], cl
call printMove
pop dx
pop cx
pop bx
pop ax
ret
;;; ------------------------------------------------------------
;;; printMove
;;; Prints the string moveStr on the screen
;;; Does not alter any register
;;; ------------------------------------------------------------
printMove: pushad
mov eax, 4
mov ebx, 1
mov ecx, moveStr
mov edx, moveStrLen
int 0x80
popad
ret
Output
[231a@beowulf ~/handout]$ nasmld hanoi1 A --> B
Top Down Version 2
In this version we fully flesh out the recursive function. It would be hard to go half way with the implementation of moveDisks.
Source
;;; hanoi2.asm
;;; D.T.
;;; Towers of Hanoi, top down...
;;; assumptions:
;;; al = n Number of disk
;;; bl = Source character identifying the source
;;; cl = Dest character identifying the destination
;;; dl = Temp character identifying the extra peg
%define N 6 ; number of disks to move
section .data
moveStr db "X X", 10
moveStrLen equ $-moveStr
source equ moveStr + 0
dest equ moveStr + 2
section .text
global _start
;;; ------------------------------------------------------------
;;; M A I N
;;; ------------------------------------------------------------
_start: mov al, N ; move 4 disks
mov bl, 'A' ; from A
mov cl, 'B' ; to B
mov dl, 'C' ; using C as extra
call moveDisks
;;; return to OS
mov eax, 1
mov ebx, 0
int 0x80
;;; ------------------------------------------------------------
;;; moveDisks:
;;; assumptions:
;;; al = n Number of disk
;;; bl = Source character identifying the source
;;; cl = Dest character identifying the destination
;;; dl = Temp character identifying the extra peg
;;; Does not return anything
;;; does not modify registers
;;; ------------------------------------------------------------
moveDisks: push ax
push bx
push cx
push dx
cmp al, 1
ja .recurse
;;; only 1 to move: move from source to dest!
mov byte[source], bl
mov byte[dest], cl
call printMove
jmp .return
.recurse:
;;; move n-1 disk from source to temp
dec al ; al <-- n-1
xchg cl, dl ; swap dest and temp
call moveDisks
xchg cl, dl ; swap back dest and temp
;;; move 1 from source to dest
push ax
mov al, 1
mov byte[source], bl
mov byte[dest], cl
call printMove
pop ax
;;; move n-1 disk from temp to dest
xchg bl, dl ; make temp the source
call moveDisks
.return: pop dx
pop cx
pop bx
pop ax
ret
;;; ------------------------------------------------------------
;;; printMove
;;; Prints the string moveStr on the screen
;;; Does not alter any of the registers
;;; ------------------------------------------------------------
printMove: pushad
mov eax, 4
mov ebx, 1
mov ecx, moveStr
mov edx, moveStrLen
;;; int 0x80
popad
ret
Note here that we are using a macro, N, defined by the %define statement. This method of defining a constant is interesting as it allows us to modify the constant at assembly time.
Here's how.
- comment out the %define statement in the program (this will result in an undefined variable if we assemble as usual)
- when assembling the program, provide a value for the macro N as follows:
nasm -f elf -F stabs -dN=4 hanoi2.asm ; ld -melf_i386 -o hanoi2 hanoi2.o ; ./hanoi2
- The result will be that your program will be assembled just as if you had included the line
%define N 4
- inside your program.
The section below shows how to make the program run for several values of N:
Output
[231a@beowulf ~/handout]$ nasm -f elf -dN=2 -F stabs hanoi2.asm ; ld -melf_i386 -o hanoi2 hanoi2.o ; ./hanoi2
A C
A B
C B
[231a@beowulf ~/handout]$ nasm -f elf -dN=3 -F stabs hanoi2.asm ; ld -melf_i386 -o hanoi2 hanoi2.o ; ./hanoi2
A B
A C
B C
A B
C A
C B
A B
[231a@beowulf ~/handout]$ nasm -f elf -dN=4 -F stabs hanoi2.asm ; ld -melf_i386 -o hanoi2 hanoi2.o ; ./hanoi2
A C
A B
C B
A C
B A
B C
A C
A B
C B
C A
B A
C B
A C
A B
C B
Trick
If we want to count the number of moves for a given number of disk, we can pipe the output of the ./hanoi2 program in the command wc, which stands for word-count. wc outputs the number of lines, number of words, and number of characters in the text given to it.
Below are the same commands as above, but filtered by wc:
[231a@beowulf ~/handout]$ nasm -f elf -dN=2 -F stabs hanoi2.asm ; ld -melf_i386 -o hanoi2 hanoi2.o ; ./hanoi2 | wc
3 6 12
[231a@beowulf ~/handout]$ nasm -f elf -dN=3 -F stabs hanoi2.asm ; ld -melf_i386 -o hanoi2 hanoi2.o ; ./hanoi2 | wc
7 14 28
[231a@beowulf ~/handout]$ nasm -f elf -dN=4 -F stabs hanoi2.asm ; ld -melf_i386 -o hanoi2 hanoi2.o ; ./hanoi2 | wc
15 30 60
We see that the number of moves for 2 disks is 3, for 3 disks it's 7, for 4 disks it's 15, for n disks, it's 2n-1.
Estimating the Execution time
Call Tree of MoveDisks( 4, 'A', 'B', 'C' )
N, Source, Dest, Extra
4,A,B,C
|
+---------------------------+-------------------------------+
| | |
3,A,C,B 1,A,B,C 3,C,B,A
| |
+-------------+------------+ +------------+-------------+
| | | | | |
2,A,B,C 1,A,C,B 2,B,C,A 2,C,A,B 1,C,B,A 2,A,B,C
| | | |
+--------+--------+ +--------+--------+ +--------+--------+ +--------+--------+
| | | | | | | | | | | |
1,A,C,B 1,A,B,C 1,C,B,A 1,B,A,C 1,B,C,A 1,A,C,B 1,C,B,A 1,C,A,B 1,B,A,C 1,A,C,B 1,A,B,C 1,C,B,A
| | | | | | | |
1,A,C,B 1,C,B,A 1,B,A,C 1,A,C,B 1,C,B,A 1,B,A,C 1,A,C,B 1,C,B,A
Let's look at the top two levels of the tree:
4,A,B,C
|
+---------------------------+-------------------------------+
| | |
3,A,C,B 1,A,B,C 3,C,B,A
| |
We read it as follows.
- MoveDisks is called to move 4 disks from A to B, using C as an extra peg. To do so it calls itself once to move 3 disks from A to C, to put them "out of the way", and uses B as a temporary peg to do so. Then it moves 1 disk from A to B. Then it has to move the 3 disks that were on C back to B where the belong, and it uses A as an extra peg in the process.
Estimating the Number of Instructions Executed
Let's figure out how many instructions are executed by the different parts:
Tree part | Number of instructions |
---|---|
4,A,B,C |
Start function: 4 pushes + 1 cmp + 1 jump = 6 |
3,A,C,B |
Start recursion: 1 dec, 2 xchng, 1 call = 4 |
1,A,B,C |
Print 1 move: 6 instructions + printMove code = 6 + 7 + int 80 |
3,C,B,A |
Start 2nd recursion: 1 xchang, 1 call, 4 pops, 1 ret = 7 |
We can see that we can figure out the number of instructions accurately except for the int 0x80 instruction which is a total unknown. We have no idea how many instructions Linux is executing to print the string on the screen. So let's just comment out this instruction and see what we get.
N, Source, Dest, Extra 4,A,B,C |6 +---------------------------+-------------------------------+ |4 |13 |7 3,A,C,B 1,A,B,C 3,C,B,A |6 |6 +-------------+------------+ +------------+-------------+ |4 |13 |7 |4 |13 |7 2,A,B,C 1,A,C,B 2,B,C,A 2,C,A,B 1,C,B,A 2,A,B,C |6 |6 |6 |6 +--------+--------+ +--------+--------+ +--------+--------+ +--------+--------+ |4 |13 |7 |4 |13 |7 |4 |13 |7 |4 |13 |7 1,A,C,B 1,A,B,C 1,C,B,A 1,B,A,C 1,B,C,A 1,A,C,B 1,C,B,A 1,C,A,B 1,B,A,C 1,A,C,B 1,A,B,C 1,C,B,A |6 |6 |6 |6 |6 |6 |6 |6 1,A,C,B 1,C,B,A 1,B,A,C 1,A,C,B 1,C,B,A 1,B,A,C 1,A,C,B 1,C,B,A 9+7 9+7 9+7 9+7 9+7 9+7 9+7 9+7
Which simplifies to this:
+--------------( )--------------+ (6+4+13+7)x1 | | | +------( )------+ +------( )------+ (6+4+13+7)x2 | | | | | | +--( )--+ +--( )--+ +--( )--+ +--( )--+ (6+4+13+7)x4 | | | | | | | | | | | | ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) (6+9+7)x8 | | | | | | | |
The total number of instructions is approximately (6+4+13+7) x ( 1 + 2 + 4 + 8 ) = (6+4+13+7) x 15 = (30) x ( 24 - 1 )
The tree above is for N=4 disks. The expression above simplifies to 30 x ( 2N - 1 ) ≈ 30 x 2N.
Processor Approximation
We are now going to make an approximation about the Pentium, that it executes 1 instruction every cycle. Another performance measured by computer architects is CPI, or cycle per instruction, which in this case would also be equal to 1. Such an assumption is pretty safe with the current modern Pentium processors.
Speed of our Machine
To find the speed at which the processor on our machine (beowulf) is operating, we ask Linux:
cat /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 15 model : 65 model name : Dual-Core AMD Opteron(tm) Processor 2222 SE stepping : 3 microcode : 0x6d cpu MHz : 2992.478 cache size : 1024 KB physical id : 0 ...
Beowulf's processor operates at 2.99 GHz. Its cycle time is thus 1/2.99E9 seconds.
Modeling with Excel
We now have all the data to predict the approximate execution time of our program for different values of N. We have created an eXcel table with the equation for the number of instructions as a function of N and multiplied by the cycle time: