Python is an object-oriented,
interpreted language. Internally to the Python interpreter, a
Python program is converted to bytecode and interpreted using a
virtual machine. Most modern programming languages have support for
high-level abstractions while the instructions of a virtual machine
are closer to the machine language instructions supported by
hardware architectures, making the interpretation of bytecode
easier than interpretation of the original source program. The
advantage of virtual machine implementations results from dividing
the mapping from high-level abstractions to low-level machine
instructions into two parts: high-level abstractions to bytecode
and bytecode to machine instructions.
While bytecode is a higher level
abstraction than machine language, it is not greatly so. As
programmers, if we understand how the underlying machine executes
our programs, we better equip ourselves to make good choices about
how we program. Just as importantly, having an understanding of how
programs are executed can help us diagnose problems when things go
wrong.
This chapter introduces assembly
language programming in the bytecode language of the Python virtual
machine. The Python virtual machine is an internal component of the
Python interpreter and is not available to use directly. Instead, a
bytecode interpreter called JCoCo has been developed that mimics a
subset of the behavior of the Python 3.2 virtual machine. Instead
of writing bytecode files directly, JCoCo supports a Python virtual
machine assembly language.
While learning assembly language, we’ll
limit ourselves to a subset of Python. JCoCo supports boolean
values, integers, strings, floats, tuples, lists, and dictionaries.
It supports class and function definitions and function calls. It
also supports most of the instructions of the Python virtual
machine including support for conditional execution, iteration, and
exception handling. It does not support importing modules or module
level code. JCoCo differs from Python by requiring a main function
where execution of a JCoCo assembled program begins.
To run an assembly language program it
must first be assembled, then it can be executed. The JCoCo virtual
machine includes the assembler so assembly isn’t a separate step.
An assembly language programmer writes a program in the JCoCo
assembly language format, providing it to JCoCo, which then
assembles and interprets the program.
The main difference between JCoCo
assembly language and bytecode is the presence of labels in the
assembly language format. Labels are the targets of instructions
that change the normal sequence of execution of instructions.
Instructions like branch and jump instructions are much easier to
decipher if it says “jump to loop1” rather than “jump to address
63”. Of course, bytecode instructions are encoded as numbers
themselves, so the assembler translates “jump to loop1” to
something like “48 63” which of course would require a manual to
decipher.
Learning to program in assembly isn’t
all that hard once you learn how constructs like while loops, for
loops, if-then statements, function definitions, and function calls
are implemented in assembly language. String and list manipulation
is another skill that helps if you have examples to follow. A
disassembler is a tool that
will take a machine language program and produce an assembly
language version of it. Python includes a module called
dis that includes a
disassembler. When you write a Python program it is parsed and
converted to bytecode when read by the interpreter. The
dis module disassembler
produces an assembly language program from this bytecode. JCoCo
includes its own disassembler which uses the Python dis module and produces output suitable
for the JCoCo virtual machine.
The existence of the disassembler for
JCoCo means that learning assembly language is as easy as writing a
Python program and running it through the disassembler to see how
it is implemented in assembly language. That means you can discover
how Python is implemented while learning assembly language! Because
Python’s virtual machine is not guaranteed to be backwards
compatible, you must use Python 3.2 when disassembling programs so
make sure that version 3.2 is installed on your system. To test
this you can try typing “python3.2” in a terminal window in your
favorite operating system. If it says command not found, you likely
don’t have Python 3.2 installed. In that case you can download it
from http://python.org or directly from
the JCoCo website at http://cs.luther.edu/~leekent/JCoCo.
The rest of this chapter introduces you to assembly language
programming using the JCoCo virtual machine.
You can download the full binary
implementation of the JCoCo virtual machine by going to
http://cs.luther.edu/~leekent/JCoCo.
Download the zip file containing a coco shell script which runs the
Java Virtual Machine on the JCoCo jar file. You can also go to
github and get the source code for the reduced functionality JCoCo
project at http://github.com/kentdlee/JCoCo.
3.1 Overview of the JCoCo VM
JCoCo, like Python, is a virtual
machine, or interpreter, for bytecode instructions. JCoCo is
written in Java using object-oriented principles and does not store
its instructions in actual bytecode format. Instead, it reads an
assembly language file and assembles it building an internal
representation of the program as a sequence of functions each with
their own sequence of bytecode instructions. CoCo is another
implementation of this virtual machine, implemented in C++. You can
find documentation on the C++ version at http://cs.luther.edu/~leekent/CoCo.
JCoCo is backwards compatible with CoCo, but JCoCo does provide
some additional functionality including the ability to define
classes, create objects, and utilize single inheritance which are
not used extensively in this text. Additionally, JCoCo provides an
interactive command-line debugger that can be used for debugging
JCoCo assembly language programs.
Most of the material presented in this
chapter is true of either JCoCo or CoCo. Chap. 6 again revisits JCoCo as a target
language for Small, but
either JCoCo or CoCo will work as the Small compiler target.
A JCoCo program, like programs in
other programming languages, utilizes a run-time stack to store information
about each function called while the program is executing. Each
function call in a JCoCo program results in a new stack frame
object being created and pushed onto the run-time stack. When a
function returns, its corresponding stack frame is popped from the
run-time stack and discarded. Figure 3.1 depicts four active
function calls. Function A called function B, which called function
C, which called function D before any of the functions returned.
The top of the stack is at the top of Fig. 3.1. Each stack frame
contains all local variables that are defined in the function. Each
stack frame also contains two additional stacks, an operand stack and a block stack.

Fig.
3.1
The JCoCo virtual machine
JCoCo, like the
Python virtual machine, is a stack based architecture. This means
that operands for instructions are pushed onto an operand stack. Virtual machine
instructions pop their operands from the operand stack, do their
intended operation, and push their results onto the operand stack.
Most CPUs are not stack based. Instead they have general purpose
registers for holding intermediate results. Stack based
architectures manage the set of intermediate results as a stack
rather than forcing the programmer to keep track of which registers
hold which results. The stack abstraction makes the life of an
assembly language programmer a little easier. The operand stack is
used by the virtual machine to store all intermediate results of
instruction execution. This style of computation has been in use a
long time, from Hewlett Packard mainframe computers of the 1960’s
through the 1980’s to calculators still made by Hewlett Packard
today. The Java Virtual Machine, or JVM, is another example of a
stack machine.
The other stack
utilized by JCoCo is a block
stack. The block stack keeps track of exit points for blocks
of code within a JCoCo function. When a loop is entered, the exit
address of the loop is pushed onto the block stack. The
instructions of each function are at zero-based offsets from the
beginning of the function, so we can think of each function having
its own instruction address space starting at 0. By storing each
loop’s exit point address on the block stack, if a break
instruction is executed inside a loop, the exit point of the loop
can be found and the execution of the break instruction will jump
to that address. Exception handlers also push the address of the
handler onto the block stack. If an exception occurs, execution
jumps to the exception handler by popping the address from the
block stack. When a loop or try block is exited, the corresponding
block stack address is popped from the block stack.
A program counter, or PC, is responsible
for holding the address of the next instruction to be executed. The
machine proceeds by fetching an instruction from the code,
incrementing the PC, and executing the fetched instruction.
Execution proceeds this way until a RETURN_VALUE instruction is
executed or an exception occurs. When a function call is executed,
the current program counter is stored in the stack frame until the
called function returns, when the PC is restored to the next
instruction in the current stack frame. This is depicted in
Fig. 3.1
with the arrows from the stack frames to the code of their
corresponding functions.
When an exception occurs, if no
matching exception handler is found, execution of the function
terminates and control is passed to the previously called function
where the exception continues to propagate back until a matching
exception handler is found. If no matching handler is found, the
complete traceback of the exception is printed. If no exception
occurs during the running of a program, execution terminates when
the main function executes the RETURN_VALUE instruction.
The specification
for JCoCo, including all instructions, global functions, and the
complete assembly language BNF supported by JCoCo can be found in
Appendix A. The rest of
this chapter examines various Python language constructs and the
corresponding assembly language that implement these constructs.
JCoCo assembly language can be learned by examining Python code and
learning how it is implemented in assembly language. The rest of
this chapter proceeds in this fashion.
3.2 Getting Started
JCoCo includes a disassembler that
works with Python 3.2 to disassemble Python programs into JCoCo
assembly language programs, providing a great way to learn assembly
language programming using the JCoCo virtual machine. Consider the
following Python program that adds 5 and 6 together and prints the
sum to the screen.

Running this with python 3.2 as
follows produces this output. Note that the 1 argument is required
to get assembly output because of the code on lines 10-13 of the
Python program.
The disassembler prints the assembly language program to standard
output, which is usually the screen. The second run of the
addtwo.py program redirects the standard output to a file called
addtwo.casm. The casm is
the extension chosen for JCoCo assembly language files and stands
for CoCo Assembly. This CASM file holds all the lines between the
two MyComputer prompts above. To run this program you can invoke
the JCoCo virtual machine as shown here.
The first run invokes coco
which assembles the program producing the assembled output and then
runs the program producing the 11 that appears below the assembled
output. The assembled output is shown because the -v option was used when invoking
coco. The assembled output
is printed to a stream called standard error which is separate from
the standard output stream
where the 11 is printed. To only print the exact output of the
program, the -v option can
be omitted.


In this JCoCo program there is one
function called main. The assembly indicates main has 0 formal
parameters. Constants that are used in the code include None, 5,
and 6. There are three local variables in the function:
x, y, and z. The global print function is called and so is in
the list of globals. Every function in JCoCo has these categories
of identifiers and values within each defined function. Sometimes
one or more of these categories may be empty and can be omitted in
that case.
The instructions follow the
begin keyword and preceed
the end keyword. LOAD_CONST
loads the constant value at its index (zero based and 1 in this
case) into the constants onto the operand stack. JCoCo is a stack
machine and therefore all operations are performed with operands
pushed and popped from the operand stack.
The STORE_FAST instruction stores a
value in the locals list, in this case at offset 0, the location of
x. LOAD_FAST does the opposite of STORE_FAST, pushing a value on
the operand stack from the locals list of variables. BINARY_ADD
pops two operands from the stack and adds them together, pushing
the result. CALL_FUNCTION pops the number of arguments specified in
the instruction (1 in this case) and then pops the function from
the stack. Finally, it calls the popped function with the popped
arguments. The result of the function call is left on the top of
the operand stack. In the case of the print function, None is returned and left on the stack.
The POP_TOP instruction pops the None from the stack and discards it
only to have the main function push a None on the stack just before
returning. RETURN_VALUE pops the top argument from the operand
stack and returns that value to the calling function. Since main
was the only function called, returning from it ends the coco
interpretation of the program.
To run this code, you must have the
coco executable somewhere in your path. Then you can execute the
following code to try it out.

3.3 Input/Output
JCoCo provides one built-in function
to read input from the keyboard and several functions for writing
output to this screen or standard output. The following program
demonstrates getting input from the keyboard and printing to
standard output.

In the Python code in
Sect. 3.3, the input function is called. Calling input
requires a string prompt and returns a string of the input that was
entered. Calling the int
function on a string, as is done in the line that gets the
age from the user, returns
the integer representation of the string’s value. Finally, the
print function takes a
random number of arguments, converts each to a string using the
_ _str_ _ magic method, and prints each string separated
by spaces. The first argument to print in the code of
Sect. 3.3
is the result of concatenating name and the string”, a year from now
you will be”. String concatenation was used because there shouldn’t
be a space between the name
value and the comma.

Fig.
3.2
JCoCo I/O
The assembly language that implements
the program in Sect. 3.3 is given in Fig. 3.2. Notice that built-in
functions like input,
int, and print are declared under the
Globals list. The
name and age variables are the locals.
Line 9 pushes the input function onto the operand stack.
Line 10 pushes the string prompt for input. Line 11 calls the input function with the one allowed
argument given to it. The 1
in line 11 is the number of arguments. When the input function returns it leaves string
entered by the user on the operand stack. Line 12 stores that
string in the name location
in the locals.
Line 13 prepares to convert the next
input to an integer by first pushing the int function on the operand stack. Then
line 14 loads the input
function. Line 15 loads the prompt like line 10 did previously.
Line 16 calls the input function. The result is immediately passed
to the int function by
calling it on line 17. The int function leaves an integer on the
top of the operand stack and line 18 stores that in the
age variable
location.
The next part of the program prints
the output. To prepare for calling the print function, the arguments must be
evaluated first, then print
can be called. Line 19 pushes the print function onto the stack but does
not call print. There are
three arguments to the print function. The first argument is
the result of concatenating two strings together. Line 20 pushes
the name variable’s value
on the stack. Line 21 pushes the string”, a year from now you will
be” onto the stack. Line 22 calls the _ _add_ _ magic
method to concatenate the two strings. The BINARY_ADD instruction
pops two operands from the stack, calls the _ _add_ _
method on the first object popped with the second object as the
argument which is described in more detail in Appendix A.
Lines 23–25 add together age and 1 to get the correct age value to pass
to print. Line 26 pushes
the last string constant on the operand stack and line 27 finally
calls the print function
leaving None on the operand
stack afterwards. Line 28 pops the None value and immediately None is pushed back on the stack in
line 29 because the main
function returns None in
this case, which is returned in line 30, ending the iotest.casm
program’s execution.
A few important things to learn from
this section:
-
Getting input and producing output rely on the built-in functions input and print.
-
Before a function can be called, it must be pushed on the operand stack. All required arguments to the function must also be pushed onto the stack on top of the function to be called.
-
Finally, when a function returns, it leaves its return value on the operand stack.
3.4 If-Then-Else Statements
Programming languages must be able to
execute code based on conditions, either externally provided via
input or computed from other values as the program executes.
If-then statements are one means of executing code conditionally.
The code provided here isolates just an if-then statement to show
how it is implemented in JCoCo assembly.

Disassembling this Python code results
in the code in Fig. 3.3. There are new instructions in
Fig. 3.3
that haven’t been encountered until now, but just as importantly,
there are labels in this code. A label provides a symbolic target
to jump to in the code. Labels, like label00 and label01, are defined by writing them
before an instruction and are terminated with a colon. A label to
the right of an instruction is a target for that instruction.
Labels are a convenience in all assembly languages. They let
assembly language programmers think of jumping to a target in a
program, rather than changing the contents of the PC register,
which is what actually happens. When a program is executed using
JCoCo the labels disappear because JCoCo assembles the code,
replacing the labels with the actual PC target addresses. The JCoCo
code in Fig. 3.4 shows the JCoCo code after it has been
assembled. The assembled code is printed by coco when the program is executed.

Fig.
3.3
If-Then-Else assembly

Fig.
3.4
Assembled code
The first instruction, the
LOAD_CONST, is at offset 0
in the code. The instructions of each function are at zero-based
offsets from the beginning of the function, so we can think of each
function as having its own address space starting at zero. In the
code in Figs. 3.3 and 3.4 the line number of the first instruction is
6, so 6 can be subtracted from the line numbers to determine any
instruction’s address within the function and 6 can be added to any
target to determine the line number of the target location. In
Fig. 3.4
the target of line 13 is 11 which corresponds to line 17. Looking
at Fig. 3.3 this corresponds to the line where
label00 is defined.
Likewise, the target of the JUMP_FORWARD instruction in
Fig. 3.4
is label01 which is defined
on line 19. Subtracting 6, we expect to see 13 as the target PC
address in the assembled code of Fig. 3.4.
Consulting the JCoCo BNF in
Appendix A, there can be
multiple labels on one instruction. In addition, instruction
addresses have nothing to do with which line they are on. That only
appears to be the case in Fig. 3.4 because the
instructions are on consecutive lines. But, adding blank lines to
the program would do nothing to change the instruction addresses.
So, we could have a program like this where one instruction has two
labels. These three instructions would be at three addresses within
the program even though there are four lines in the code.

Labels can be composed of any sequence
of letters, digits, underscores, or the @ character, but must start
with a letter, underscore, or the @ character. They can be any
number of characters long.
In Fig. 3.3, lines 6–11 load the
two values to be compared on the stack. The COMPARE_OP instruction on line 12 has
an argument of 4. Consulting the COMPARE_OP instruction in Appendix A reveals that a 4 corresponds
to a greater than
comparison. The comparison is done by calling the _ _gt_ _ magic method on the
second item from the top of the operand stack and passing it the
top of the operand stack. The two operands are popped by the
COMPARE_OP instruction and
a boolean value, either True or False, is pushed on the operand stack
as the result.
The next instruction jumps to the
target location if the value left on the operand stack was
False. Either way, the
POP_JUMP_IF_FALSE
instruction pops the top value from the operand stack.
Take note of line 16 in
Fig. 3.3.
In assembly there is nothing like an if-then-else statement. Line
15 is the end of the code that implements the then part of the statement. Without
line 16, JCoCo would continue executing and would go right into the
else part of the statement.
The JUMP_FORWARD
instruction is necessary to jump past the else part of the code if the
then part was executed.
Line 17 begins the else
code and line 18 is the last instruction of the if-then-else
statement. The label definition for label01 is still part of the
if-then-else statement, but labels the instruction immediately
following the if-then-else statement.
Practice 3.2
Without touching the code that
compares the two values, the assembly in Fig. 3.4 can be optimized to
remove at least three instructions. Rewrite the code to remove at
least three instructions from this code. With a little more work,
five instructions could be removed.
You
can check your answer(s) in Section 3.17.2.
3.4.1 If-Then Statements
Frequently if-then statements are
written without an else clause. For instance, this program prints
x if x is greater than y. In either case y is printed.


Fig.
3.5
If-Then assembly
Disassembling this code produces the
program in Fig. 3.5. The code is very similar to the code
presented in Fig. 3.3. Line 13 once again jumps past the
then part of the program.
Lines 14–17 contain the then code. Interestingly, line 18 jumps
forward to line 19. Comparing this to the code in
Fig. 3.3
where the jump forward jumps past the else part, the same happens in
Fig. 3.5
except that there is no else part of the statement.
Some assembly languages do not have an
equivalent to POP_JUMP_IF_FALSE. Instead, only an
equivalent to POP_JUMP_IF_TRUE is available. In that
case, the opposite of the condition can be tested and the jump will
be executed if the opposite is true, skipping over the then part. For instance, if testing for
greater than is the intent
of the code, less than or equal
to can be tested to jump around the then part of an if-then-else
statement.
Whether testing the original condition
or the opposite, clearly the JUMP_FORWARD is not needed in the code
in Fig. 3.5. As was seen in practice 3.1, the Python
compiler generated a wasteful instruction. It isn’t wrong to jump
forward, it’s just not needed. The convenience of writing in a
language like Python far outweighs the inconvenience of writing in
a language like JCoCo assembly language, so an extra instruction
now and then is not that big a deal. In this case though, the
Python compiler could be written in such a way as to recognize when
the extra instruction is not needed.
3.5 While Loops
Consider this code which computes the
Fibonacci number for the value stored in the variable f. The sequence of Fibonacci numbers
are computed by adding the previous two numbers in the sequence
together to get the next number. The sequence consists of 1, 1, 2,
3, 5, 8, 13, 21, and so on, the eighth element of the sequence
being 21.


Fig.
3.6
While loop assembly
The JCoCo assembly for this program
implements the while loop of the Python program using JUMP_ABSOLUTE and POP_JUMP_IF_FALSE instructions. Prior
to the loop, the SETUP_LOOP
instruction’s purpose is not readily apparent. In Python a loop may
be exited using a break
instruction. Using break
inside a loop is not a recommended programming style. A
break is never needed. It
is sometimes used as a convenience. To handle the break instruction when it is executed
there must be some knowledge about where the loop ends. In the code
in Fig. 3.6 the first instruction after the loop is on
line 33, where label02 is
defined. The SETUP_LOOP
instruction pushes the address of that instruction on the block
stack. If a break instruction is executed, the block stack is
popped and the PC is set to
the popped instruction address.
Lines 15–18 of Fig. 3.6 implement the
comparison of n< f
similarly to the way if-then-else comparisons are performed. The
first line of this code is labeled with label00 because the end of the loop
jumps back there to see if another iteration should be performed. A
while loop continues executing until the condition evaluates to
False so the POP_JUMP_IF_FALSE instruction jumps to
label01 when the loop
terminates.
The instruction at label01 labels the POP_BLOCK instruction. This instruction
is needed if the loop exits normally, not as the result of a break
statement. The block stack is popped, removing the loop exit point
from it. When exiting as a result of a break, execution jumps to
the instruction at line 33, skipping the POP_BLOCK instruction since the break
statement already popped the block stack.
An important
thing to notice is that a while loop and an if-then-else statement
are implemented using the same instructions. There is no special
loop instruction in
assembly language. The overall flow of a while loop is a test
before the body of the loop corresponding to the while loop
condition. If the loop condition is not met, execution jumps to the
next instruction after the loop. After the body of the loop a jump
returns execution to the while loop condition code to check if
another iteration of the body will be performed. This idiom, or
pattern of instructions, is used to implement loops and similar
patterns are used for loops in other assembly languages as
well.
Practice 3.4
Write a short program that tests the
use of the BREAK_LOOP
instruction. You don’t have to write a while loop to test this.
Simply write some code that uses a BREAK_LOOP and prints something to the
screen to verify that it worked.
You
can check your answer(s) in Section 3.17.4.
3.6 Exception Handling
Exception handling occurs in Python
within a try-except statement. Statements within the try block are executed and if an
exception occurs execution jumps to the except block of statements. If main
were called on the Python program given here, any error condition
would send it to the except block which simply prints the exception
in this case. The except block is only executed if there is an
error in the try block. Errors that could occur in this program
would be a conversion error for either of the two floating point
number conversions or a division by zero error. The code catches an
exception if a zero is entered for the second value.

Implementing exception handling in
JCoCo is similar in some ways to implementing the BREAK_LOOP instruction. The difference
is that the exception causes the program to jump from one place to
the next instead of the BREAK_LOOP instruction. Both exception
handling and the break instruction make use of the block stack.
When a loop is entered, the SETUP_LOOP instruction pushes the exit
point of the loop onto the block stack; the exit point being an
integer referring to the address of the first instruction after the
loop.
To distinguish between loop exit
points and exception handling, the SETUP_EXCEPT instruction pushes the
negative of the except handler’s address (i.e. –1*address). So a
negative number on the block stack refers to an exception handler
while a positive value refers to a loop exit point. In the code in
Fig. 3.7
the exception handler’s code begins at label00.

Fig.
3.7
Exception handling assembly
The try block code begins on line 7 with
the SETUP_EXCEPT. This
pushes the handler’s address for label00 on the block stack which
corresponds to a –27. Execution proceeds by getting input from the
user, converting the input to floats, doing the division, and
printing the result. The print completes on line 24 where
None, which is returned by
print, is popped from the
operand stack.
If execution makes it to the end of
the try block, then no
exception occurred and line 25 pops the –27 from the block stack,
ending the try block. Line
26 jumps past the end of the except block.
If an exception
occurs, three things are pushed onto the operand stack before any
handling of the exception occurs. The traceback is pushed first. The
traceback is a copy of the run-time stack containing each function
call and the stored PC of all pending functions including the
current function’s stack frame and PC. Above the traceback there
are two copies of the exception object pushed on the operand stack
when an exception occurs.
If an exception occurs in the
try block, JCoCo consults
the block stack and pops values until a negative address is found
corresponding to some except block. Multiple try-except
statements may be nested, so it is possible that the block stack
will contain more than one negative address. When a negative
address is found, the PC is
set to its positive value causing execution to jump to the
except block. In
Fig. 3.7,
that’s line 27. The traceback and two copies of the exception are
pushed onto the stack prior to line 27 being executed.
Why are three
objects pushed on the operand stack when an exception occurs?
Python’s RAISE_VARARGS
instruction describes the contents of the operand stack as TOS2
containing the traceback, TOS1 the parameter, and TOS the exception
object. In the JCoCo implementation the parameter to an exception
can be retrieved by converting the exception to a string, so the
object at TOS1 is simply the exception again. For the sake of
compatibility with the Python disassembler JCoCo pushes three
operands pushed onto the operand stack when an exception is
raised.
Exception handlers in Python may be
written to match only certain types of exceptions. For instance, in
Python a division by zero exception is different than a float
conversion error. The JCoCo virtual machine currently only has one
type of exception, called Exception. It is possible to extend
JCoCo to support other types of exceptions, but currently there is
only one type of exception object that can be created. The argument
to the exception object can be anything that is desired. The
program in Fig. 3.7 is written to catch any type of exception,
but it could be written to catch only a certain type of exception.
Line 27 duplicates the exception object on the top of the operand
stack. Line 35 loads a global Exception object onto the stack. The
COMPARE_OP 10 instruction
compares the exception using the exception match comparison which
calls the _ _excmatch_ _ magic method
to see if there is a match between the thrown exception and the
specified pattern. If there is not a match, line 30 jumps to the
end of the except block. The END_FINALLY instruction on line 47
detects if the exception was handled and if not, it re-throws the
exception for some outer exception handling block.
If the exception
was a match, execution of the handler code commences as it does on
line 31 of the program. The top of the operand stack contains the
extra exception object so it is thrown away by line 31. Line 32
takes the remaining exception object and makes the ex reference point to it. Line 33 pops
the traceback from the operand stack.
Should an exception occur while
executing an exception handler, then JCoCo must clean up from the
exception. Line 34 executes the SETUP_FINALLY instruction to push
another block stack record to keep track of the end of the
exception handler. Lines 35–38 print the exception named
ex in the code.
Line 39 pops the exit address that was
pushed by the SETUP_FINALLY
instruction. The POP_EXCEPT
instruction on line 40 then pops the block stack address for the
exception handler exit address. Line 41 pushes a None on the operand stack.
Line 42 is either the next instruction
executed or it is jumped to as a result of an exception while
executing the handler code for the previous exception. Either way,
the ex variable is made to
refer to None. The
DELETE_FAST instruction
doesn’t appear to do much in this code. It is generated by the
disassembler, but appears to delete None which doesn’t seem to need to be
done.
The last instruction of the handler
code, the END_FINALLY
instruction checks to see if the exception was handled. In this
case, it was handled and the instruction does nothing. If execution
jumps to line 47 then the exception handler did not match the
raised exception and therefore the exception is re-raised. Line 48
wraps up by setting up to return None from the main function.
Practice 3.5
Write a short program that tests
creating an exception, raising it, and printing the handled
exception. Write this as a JCoCo program without using the
disassembler.
You
can check your answer(s) in Section 3.17.5.
3.7 List Constants
Building a compound value like a list
is not too hard. To build a list constant using JCoCo you push the
elements of the list on the operand stack in the order you want
them to appear in the list. Then you call the BUILD_LIST instruction. The argument to
the instruction specifies the length of the list. This code builds
a list and prints it to the screen.

Fig.
3.8
Assembly for building a list

The assembly language program in
Fig. 3.8
builds a list with two elements: [‘hello’, ‘world’]. Lines 6 and 7
push the two strings on the operand stack. Line 8 pops the two
operands from the stack, builds the list object, and pushes the
resulting list on the operand stack. Python defines the
_ _str_ _ magic
method for built-in type of value, which is called on the list on
line 12.
If you run this program using the
JCoCo interpreter you will notice that [‘hello’, ‘world’] is not
printed to the screen. Instead, [hello, world] is printed. This is
because currently the _ _str_ _ method is called on
each element of the list to convert it to a string for printing.
This is not the correct method to call. Instead, the _ _repr_ _ magic method
should be called which returns a printable representation of the
value retaining any type information. In the next chapter there
will be an opportunity to fix this.
3.8 Calling a Method
Calling functions like print and input was relatively simple. Push the
function name followed by the arguments to the function on the
operand stack. Then, call the function with the CALL_FUNCTION instruction. But, how
about methods? How does a method like split get called on a string? Here is a
program that demonstrates how to call split in Python.

Fig.
3.9
Assembly for Calling a method

Line 6 of the assembly language code
in Fig. 3.9 prepares to call the input function by loading the name
input onto the operand
stack. Line 7 loads the argument to input, the prompt string. Line 8 calls
the input function leaving the entered text on the operand stack.
Calling split is done
similarly.
In this Python code the syntax of
calling input and
split is quite different.
Python sees the difference and uses the LOAD_ATTR instruction in the assembly
language instructions to get the split attribute of the object referred
to by s. Line 10 loads the
object referred to by s on
the stack. Then line 11 finds the split attribute of that object. Each
object in JCoCo and Python contains a dictionary of all the
object’s attributes. This LOAD_ATTR instruction examines the
dictionary and with the key found in the globals list at the
operands index. It then loads that attribute onto the operand
stack. The CALL_FUNCTION
instruction then calls the method that was located with the
LOAD_ATTR
instruction.
The STORE_ATTR instruction stores an
attribute in an object in much the same way that an attribute is
loaded. JCoCo does not presently support the STORE_ATTR instruction but could with
relatively little effort. The ability to load and store object
attributes means that JCoCo could be used to implement an
object-oriented language. This makes sense since Python is an
object-oriented language.
Practice 3.6
Normally, if you want to add to
numbers together in Python, like 5 and 6, you write 5+6. This
corresponds to using the BINARY_ADD instruction in JCoCo which
in turn calls the magic method _ _add_ _ with the method
call 5._ _add_ _(6). Write a short JCoCo program where
you add two integers together without using the BINARY_ADD instruction. Print the
result to the screen.
You
can check your answer(s) in Section 3.17.6.
3.9 Iterating Over a List
Iterating through a sequence of any
sort in JCoCo requires an iterator. There are iterator objects for
every type of sequence: lists, tuples, strings, and other types of
sequences that have yet to be introduced. Here is a Python program
that splits a string into a list of strings and iterates over the
list.


Fig.
3.10
List iteration assembly
Lines 6–8 of the assembly code in
Fig. 3.10 gets an input string from the user,
leaving it on the operand stack. Line 9 stores this in the variable
x. Lines 10–12 call the
split method on this
string, leaving a list object on the top of the operand stack. The
list contains the list of space separated strings from the original
string in x. Line 13 stores
this list in the variable lst.
Line 14 sets up the exit point of a
loop as was covered earlier in this chapter. Line 15 loads the
lst variable onto the
operand stack. The GET_ITER
instruction creates an iterator with the top of the operand stack.
The lst is popped from the
operand stack during this instruction and the resulting iterator is
pushed onto the stack.
An iterator has a _ _next_ _ magic method that
is called by the FOR_ITER
instruction. When FOR_ITER
executes the iterator is popped from the stack, _ _next_ _ is called on it,
and the iterator and the next value from the sequence are pushed
onto the operand stack. The iterator is left below the next value
in the sequence at TOS1. When _ _next_ _ is called on the
iterator and there are no more elements left in the sequence, the
PC is set to the label of the FOR_ITER instruction, ending the
loop.
When the loop is finished the block
stack is popped to clean up from the loop. Line 25 loads the
None on the stack before
returning from the main
function.
Practice 3.7
Write a JCoCo program that gets a
string from the user and iterates over the characters of the
string, printing them to the screen.
You
can check your answer(s) in Section 3.17.7.
3.10 Range Objects and Lazy Evaluation
Indexing into a sequence is another
way to iterate in a program. When you index into a list, you use a
subscript to retrieve an element of the list. Generally, indices
are zero-based. So the first element of a sequence is at index 0,
the second at index 1, and so on.
There are two
versions of Python in use today. Version 2, while older is still
widely used because there are many Python programs that were
written using it and there is a cost to converting them to use
Python 3. Python 3 was created so new features could be added that
might be incompatible with the older version. One difference was in
the range function. In Python 2, the range function generated a
list of integers of the specified size and values. This is
inefficient because some ranges might consist of millions of
integers. A million integers takes up a lot of space in memory and
takes some time to generate. In addition, depending on how code is
written, not all the integers in a range may be needed. These
problems are a result of eager
evaluation of the range function. Eager evaluation is when
an entire sequence is generated before any element of the sequence
will actually be used. In Python 2 the entire list of integers is
created as soon as the range function is called even though the
code can only use one integer at a time.
Python 3 has dealt with the
eager evaluation of the
range function by defining a range object that is lazily evaluated. This means that when
you call the range function to generate a million integers, you
don’t get any of them right away. Instead, you get a range object. From the range object you
can access an iterator. When _ _next_ _ is called on an
iterator you get the next item in the sequence. When _ _next_ _ is called on a
range object iterator you get the next integer in the range’s
sequence. Lazy evaluation
is when the next value in a sequence is generated only when it is
ready to be used and not before. This code creates a range object.
The range object is designed to provide lazy evaluation of integer
sequences.

This Python code uses indices to
iterate backwards through a list. In this case an iterator over the
range object yields a descending list of integers which are the
indices into the list of values entered by the user. If the use
enters four space separated values, then the range object will
yield the sequence [3, 2, 1, 0]. The first argument to range is the
start value, the second is one past the stop value, and the third
argument is the increment. So the sequence in the Python code in
Sect. 3.10 is a descending sequence that goes down
one integer at a time from the length of the list minus one to
zero.
The JCoCo assembly code in
Fig. 3.11 implements this same program. Lines 15–23
set up for calling the range function with the three integer
values. Lines 15–20 call the len function to get the length of the
list and subtract one. Lines 21 and 22 put two –1 values on the
operand stack. Line 23 calls the range function which creates and
pushes a range object onto the operand stack as its result.

Fig.
3.11
Range assembly
Line 24 creates an iterator for the
range object. As described in the last section, the FOR_ITER instruction calls the
_ _next_ _ magic
method on the iterator to get the next integer in the range’s
sequence. The lazy evaluation occurs because an iterator keeps
track of which integer is the next value in the sequence. Line 26
stores the next integer in the variable i.
The BINARY_SUBSCR instruction is an
instruction that has not been encountered yet in this chapter. Line
28 loads the list called lst onto the operand stack. Line 29
loads the value of i onto
the operand stack. The BINARY_SUBSCR instruction indexes into
lst at position
i and pushes the value
found at that position onto the operand stack. That value is
printed by the print function call on line 31 of the program.
Lazy evaluation is an important
programming language concept. If you ever find yourself writing
code that must generate a predictable sequence of values you
probably want to generate that sequence lazily. Iterators, like
range iterators, are the means by which we can lazily access a
sequence of values and range objects define a sequence of integers
without eagerly generating all of them.
3.11 Functions and Closures
Up to this point in the chapter all
the example programs have been defined in a main function. JCoCo supports the
definition of multiple functions and even nested functions. Here is
a Python program that demonstrates how to write nested functions in
the Python programming language. The main function calls the function named
f which returns the
function g nested inside
the f function. The
g function returns
x. This program
demonstrates nested functions in JCoCo along with how to build a
closure.

Fig.
3.12
Nested functions assembly

Notice the Python code in section
3.11 calls
the disassembler on the top-level function main. It is not called on f or g because they are nested inside
main and the disassembler
automatically disassembles any nested functions of a disassembled
function.
The format of the corresponding JCoCo
program in Fig. 3.12 is worth noting as well. The top level
main function is defined
along the left hand side. Indentation has no effect on JCoCo but
visually you see that f is
nested inside main. The
function g is nested inside
f because it appears
immediately after the first line of the definition of f on line 3. The rest of the definition
of f starts again on line
10 and extends to line 21. The definition of g starts on line 3 and extends to line
9.
The number of arguments for each
function is given by the integer after the slash. The f/1 indicates that f expects one argument. The
main and g functions expect zero arguments.
These values are used during a function call to verify that the
function is called with the required number of arguments.
Examine the Python code in section
3.11
carefully. The main
function calls the function f which returns the function
g. Notice that f returns g, it does not call g. In the print statement of
main the function
f is called, passing 3 to
the function that returns g. The extra set of parens after the
function call f(3) calls
g. This is a valid Python
program, but not a common one. The question is: What does the
program print? There are two possible choices it seems: either 10
or 3. Which seems more likely?
On the one hand, g is being called from the main function where x is equal to 10. If the program
printed 10, we would say that Python is a dynamically scoped
language, meaning that the function executes in the environment in
which it is called. Since g
is called from main the
value of x is 10 and in a
dynamically scoped language 10 would be printed. The word
dynamic is used because if
g were called in another
environment it may return something completely different. We can
only determine what g will
return by tracing the execution of the program to the point where
g is called.
On the other hand, g was defined in the scope of an
x whose value was 3. In
that case, the environment in which g executes is the environment provided
by f. If 3 is printed then
Python is a statically scoped language meaning that we need only
understand what the environment contained when g was defined, not when it was called.
In a statically scoped language this specific instance of
g will return the same
value each and every time it is called, not matter where it is
called in the program. The value of x is determined when g is defined.
Dynamically scoped languages are
rare. Lisp, when it was first defined, was dynamically scoped.
McCarthy quickly corrected that and made Lisp a statically scoped
language. It is interesting to note that Emacs Lisp is dynamically
scoped. Python is statically scoped as are most modern programming
languages.
To execute functions in a statically
scoped language, two pieces are needed when a function may return
another function. To execute g not only is the code for g required, but so also is the
environment in which this instance of g was defined. A closure is formed. A closure is the
environment in which a function is defined and the code for the
function itself. This closure is what is called when the function
g is finally called in
main.

Fig.
3.13
Execution of nested.casm
Take a look at the JCoCo code for
this program in Fig. 3.12. Line 14 begins creating a new closure
object in the body of function f by loading the cell variable named
x onto the stack. A cell
variable is an indirect reference to a value. Figure 3.13 depicts what is
happening in the program just before the x is returned in the function
g. A variable in Python,
like Java and many other languages, is actually a reference that points to a value. Values exist on the heap and are
created dynamically as the program executes. When a variable is
assigned to a new value, the variables reference is made to point
to a new value on the heap. The space for values on the heap that
are no longer needed is reclaimed by a garbage collector that frees space on
the heap so it can be re-used. In Fig. 3.13 there are three
values on the heap, a 10, a 3, and one other value called a cell in
JCoCo and the Python virtual machine.
Because the function g needs access to the variable
x outside the function
f, the 3 is indirectly referenced through a
cell variable. The LOAD_CLOSURE instruction pushes that
cell variable onto the stack to be used in the closure. Since only
one value is needed from the environment, the next instruction on
line 15 builds a tuple of all the values needed from the
environment. Line 16 loads the code for g onto the stack. Line 17 forms the
closure by popping the function and the environment from the stack
and building a closure
object.
The variable x is a local variable for the function
f. But, because
x is referenced in
g and g is nested inside f, the variable x is also listed as a cell variable in
f. A cell variable is an
indirect reference to a value. This means there is one extra step
to finding the value that x
refers to. We must go through the cell to get to the 3.
The LOAD_DEREF instruction on line 7 is
new. A LOAD_DEREF loads the
value that is referenced by the reference pointed to in the list of
cellvars. So, this instructions pushes the 3 onto the operand
stack. Finally, line 35 calls the closure consisting of the
function and its data.
In the function g the freevars refer to the tuple of
references in the closure that was just called, so the first
instruction, the LOAD_DEREF, loads the 3 onto the
operand stack. Figure 3.13 depicts this state right before the
RETURN_VALUE instruction is
executed.
To finish up the execution of this
program a 3 is returned from the call to g and its frame is popped from the
run-time stack. Control returns to main where the 3 is printed. After
returning from main its
frame is also popped from the run-time stack which ends the
program.
Practice 3.8
The program in Fig. 3.12 would work just fine
without the cell. The variable x could refer directly to the 3 in both
the f and g functions without any ramifications.
Yet, a cell variable is needed in some circumstances. Can you come
up with an example where a cell variable is absolutely
needed?
You
can check your answer(s) in Section 3.17.8.
3.12 Recursion
Functions in JCoCo can call
themselves. A function that calls itself is a recursive function.
Recursive functions are studied in some detail in
Chap. 5 of this text. Learning to write
recursive functions well is not hard if you follow some basic
rules. The mechanics of writing a recursive function include
providing a base case that comes first in the function. Then, the
solution to the problem you are solving must be solved by calling
the same function on some smaller piece of data while using that
result to construct a solution to the bigger problem.
Consider the factorial definition.
Factorial of zero, written 0!, is defined to be 1. This is the base
case. For integer n greater
than 0, n! = n*(n–1)!. This
is a recursive definition because factorial is defined in terms of
itself. It is called on something smaller, meaning n–1 which is closer to the base case,
and the result is used in computing n!. Here is a Python program that
computes 5!.

The JCoCo implementation of this
program is given in Fig. 3.14. The program begins in main by loading 5
on the operand stack and calling the factorial function. The result is
printed to the screen with the print function.

Fig.
3.14
Recursion assembly
Calling factorial jumps to the first
instruction of the function where n is loaded onto the operand stack,
which in this case is 5. Lines 7–8 compare n to 0 and if the two values are equal,
1 is returned. Notice that the RETURN_VALUE instruction appears in the
middle of the factorial function in this case. A return instruction
doesn’t have to appear at the end of a function. It can appear
anywhere it makes sense and in this case, it makes sense to return
from the base case as soon as soon as possible.
The code from label00 forward is the recursive case
since otherwise we would have returned already. The code subtracts
one from n and calls
factorial with that new, smaller value. Notice that the recursive
function call is identical to any other function call. Finally,
after the function call the result of the call is on the operand
stack and it is multiplied by n to get n! which is returned.
Because this is a recursive function,
the preceding two paragraphs are repeated 5 more times, each time
reducing n by 1. The
program continues to count down until 1 is returned for the
factorial of 0. At its deepest, there are 7 stack frames on the
run-time stack for this program: one for the main function, and six more for the
recursive factorial
function calls. The run-time stack grows to 7 stack frames deep
when the base case is executed and then shrinks again as the
recursion unwinds. Finally, when the program returns to
main, 120 is printed to the
screen.
3.13 Support for Classes and Objects
In Python a class definition consists
of a class declaration. The class declaration contains a magic
method named _ _init_ _ which is
responsible for initializing any created object. On line 14 of this
Python program an object is instantiated. Python automatically
calls the constructor to initialize the space allocated by Python
for the object.

A class is a collection of functions
that all operate on some given grouping of data. For instance, the
class Dog contains a
function called eat and
another called speak. Both
the eat and speak functions operate on objects of
type Dog.
The functions of a class definition
are often called methods to differentiate them from stand-alone
functions. Methods are provided a reference to the current object which is the collection
of data on which a method operates. The reference self is used to reference the current
object and is always the first parameter to a method in Python.
This description of methods isn’t entirely accurate when
considering the Python virtual machine.
In the Python virtual machine
methods are created from
the class’ functions when
an object is instantiated. Consider the assembly program in
Fig. 3.15 which demonstrates the instantiation of a
class called Dog. In line
36 of the main function the class called Dog is called. Calling a class in Python means
executing the code that allocates a Dog object in memory. This work is
handled by the virtual machine. The assembly language program does
not directly allocate the space for the object through any
instruction. It is accomplished by calling the class.

Fig.
3.15
The dog class
All objects in Python consist of a
dictionary that stores the attributes of the object. When the
Dog class is called, the
dictionary in the Dog
object is initialized with the methods Dog object. The methods of the
Dog object are essentially
the functions of the Dog
class. The difference between a method and a function is the
self parameter. A method
provides the self argument
to its function by providing a reference to the current object as
the first parameter. The method is a wrapper for the
function. After the methods
are stored in the object’s dictionary, the _ _init_ _ method is called
to perform any further initialization of the object.
To get a better understanding of the
difference between methods and functions, consider the code that
calls the eat method. When
the eat method is called on
the Dog object, the method
provides a reference to the current Dog object mesa as the first parameter before
calling the Dog class’
eat function. This is
appearent starting on line 38 in Fig. 3.15. Line 38 loads the
the reference for mesa onto
the operand stack. Then line 39 looks up the eat method in the object’s dictionary,
leaving the method on top of the operand stack, but not the
reference to the Dog object. Also, notice that no arguments are
loaded on top of the method. Yet, the eat function has one parameter, self.
When the eat method is
called on line 40 the virtual machine loads the self parameter
before calling the eat
function. The distinction between methods and functions is revisted
again in the next chapter.
Practice 3.10
In this section
it was stated that every object consists of a dictionary which
holds the attributes of the object. What is stored in the
dictionary of the object that mesa refers to in this section?
You
can check your answer(s) in Section 3.17.10.
3.13.1 Inheritance
In object-oriented programming,
inheritance comes into play when one class inherits from another.
Inheritance is useful for polymorphism and code re-use. When
programming using Python, polymorphism happens without inheritance
because Python is a dynamically typed language, meaning that all
method calls are looked up at run-time as was seen in the last
section when the LOAD_ATTR
instruction was executed. The LOAD_ATTR instruction looks up a method
by name in the object’s dictionary. The run-time look up of methods
by name creates the polymorphic behavior of Python. The only
purpose of inheritance in Python is code re-use. The next chapter
will have more on polymorphism and how it applies to Java
programming where inheritance is needed to implement
polymorphism.
Consider the Python program in this
section. Again there is a Dog class that this time inherits from
an Animal class. The
Animal class defines an
eat method which is re-used
by the Dog class. The
Animal constructor also
contains code that is re-used by the Dog class. But, the Dog class defines its own speak method, overriding the
speak method in the
Animal class.
The inheritance of Animal contributing to the Dog class is indicated by writing
Dog(Animal) on line 12. The
call to super on line 14
returns an instance of a super class object which can be used to
reference the super class, in this case the Animal class. Python programs can use
multiple inheritance. This is not true in JCoCo. Only single
inheritance is currently supported.

Practice 3.11
Code was omitted in
Figs. 3.16 and 3.17 for brevity in the chapter. Pick a method
and complete the assembly code according to the original Python
code from which it is derived.
You
can check your answer(s) in Section 3.17.11.

Fig.
3.16
Inheritance in JCoCo - part 1

Fig.
3.17
Inheritance in JCoCo - part 2
3.13.2 Dynamically Created Classes
The previous section demonstrates
declaring a class and creating objects in the JCoCo assembly
language. It is also possible to create a class dynamically, at
run-time. This also makes it possible to define class variables if
desired. A class variable is a variable assigned to the class
instead of instances of the class (i.e. objects). Consider this
Python program where dogNumber is a class variable that can be used
to count the number of instances of an object that have been
created. Building this class requires that extra code be executed
when building the class. The class variable must be initialized to
0.

The assembly code for this program
looks a bit different than the previous section. Instead of seeing
a Dog class, there is a
Dog function. The
Dog function becomes the
Dog class as a result of
its execution. To explain what happens, let’s start with
Fig. 3.19 which contains the code for the main
function. Line 55 of this code contains the LOAD_BUILD_CLASS instruction. This
instruction loads a built-in function which builds a class from two
arguments. A closure is one argument. As was stated in
Sect. 3.11, a closure is code and an environment. We
will visit this again in chapters 4 and 6. A closure is both code
and the environment (i.e. the collection of variables) in which the
code should be executed. The closure in this example is responsible
for building the class’ contents including its class variable and
the methods of the class, which if you recall from earlier in this
section are really functions until an object is instantiated.
The other
argument to the built-in class builder function is the name of the
class. Line 56 creates a closure. Line 57 buildes a tuple from the
closure. Line 58 loads the code for the class initialization (i.e.
the code for the Dog
function). Line 59 builds the closure with the tuple and the code.
Line 60 loads the name of the class onto the operand stack. Line 61
then calls the built-in class builder function passing to it the
closure and the name of the class.
The built-in class builder function
then does some housekeeping by creating a class instance, naming it
Dog since that was passed
as the name of the class, and calls the Dog function to complete the class
instantiation. This takes us to the code on line 33 in
Fig. 3.18.

Fig.
3.18
Dynamically created class - part 1
The STORE_LOCALS instruction on line 34
deserves some explanation. The local variables of the function
Dog are a dictionary or map
from strings (i.e. the names of the variables) to their values.
When the built-in class builder function calls the Dog function to complete the
construction of the class, it passes the dictionary for the class
into the function. This dictionary is the dictionary of the class
Dog and by executing the
STORE_LOCALS instruction
the dictionary also becomes the dictionary of locals for the
Dog function. So, anything
that is stored in the local variables will then be stored in the
class instance. This sharing of the local variables dictionary and
the class dictionary simplifies the class construction by making
any variables stored in the Dog function also named variables,
including named methods, in the class instance.
Line 35 of Fig. 3.18 stores Dog as the module name. The
_ _name_ _
attribute is already set to Dog by the built-in class builder
function. So line 36 gives the same name to the _ _module_ _ attribute.
Lines 37 and 38 initialize the class
variable dogNumber to 0.
Line 39 begins the work of adding the functions into the class
instance, which will be instantiated to methods when a Dog instance is created. The first
function to be stored is the _ _init_ _ constructor which
happens on lines 39–43. Lines 44–46 store the speak function in the class. The reason
the constructor takes a bit more work is because it references and
increments the class variable dogNumber and because of this the
constructor needs both the code and the environment to executed
correctly. The speak method
does not reference any class variables so no environment is needed.
The MAKE_FUNCTION
instruction builds a closure with an empty environment.

Fig.
3.19
Dynamically created class - part 2
While classes can be built either
dynamically (i.e. at run-time) or statically using the assembly
language syntax, the disassembler will use dynamic allocation when
the environment is used in one of the instance methods of the
class. Using the environment requires a closure and closures can be
constructed during dynamic allocation of the class.
Practice 3.12
In some detail, describe the contents
of the operand stack before and after the built-in class builder
function is called to create a class instance.
You
can check your answer(s) in Section 3.17.12.
3.14 Chapter Summary
An understanding of assembly language
is key to learning how higher level programming languages work.
This chapter introduced assembly language programming through a
series of examples, drawing parallels between Python and Python
virtual machine or JCoCo instructions. The use of a disassembler
was key to gaining this insight and is a great tool to be able to
use with any platform.
Most of the key constructs of
programming languages were presented as both Python programs and
JCoCo programs. The chapter concluded by covering classes,
inheritance, and dynamic class creation.
The assembly language covered in this
chapter comes up again in Chaps. 4 and 6. Chapter 4 covers the implementation of the
JCoCo virtual machine and Chap. 6 implements a high-level functional
language compiler that produces JCoCo assembly language
programs.
JCoCo is an assembler/virtual machine
for Python virtual machine instructions. Of course, there are other
assembly languages. MIPS is a CPU architecture that has wide
support for writing assembly language programs including a MIPS
simulator called SPIM. In fact, assemblers are available for pretty
much any hardware/operating system combination in use today.
Intel/Linux, Intel/Windows, Intel/Mac OS X all support assembly
language programming. The Java Virtual Machine can be programmed
with the instructions of the JVM using a java assembler called
Jasmin. Assembly language is the fundamental language that all
higher level programming languages use in their
implementations.
3.15 Review Questions
- 1.
How do the Python virtual machine and JCoCo differ? Name three differences between the two implementations.
- 2.
What is a disassembler?
- 3.
What is an assembler?
- 4.
What is a stack frame? Where are they stored? What goes inside a stack frame?
- 5.
What is the purpose of the block stack and where is it stored?
- 6.
What is the purpose of the Program Counter?
- 7.
Name an instruction that is responsible for creating a list object and describe how it works.
- 8.
Describe the execution of the STORE_FAST and LOAD_FAST instructions.
- 9.
How can JCoCo read a line of input from the keyboard?
- 10.
What is the difference between a disassembled Python program and an assembled JCoCo program? Provide a short example and point out the differences.
- 11.
When a Python while loop is implemented in JCoCo, what is the last instruction of the loop and what is its purpose?
- 12.
What do exception handling and loops have in common in the JCoCo implementation?
- 13.
What is lazy evaluation and why is it important to Python and JCoCo?
- 14.
What is a closure and why are closures needed?
- 15.
How do you create an instance of a class in JCoCo? What instructions must be executed to create objects?
- 16.
Write a class, using JCoCo, and create some instances of the class.
3.16 Exercises
- 1.
Consulting the JCoCo assembly language program in the solution to exercise 3.2, provide the contents of the operand stack after each instruction is executed.
- 2.
Write a JCoCo program which reads an integer from the user and then creates a list of all the even numbers from 0 up to and including that integer. The program should conclude printing the list to the screen. Test your program with JCoCo to be sure it works. Do this with as few instructions as possible.
- 3.
With as few instructions as possible add some exception handling to the previous exercise to print “You didn’t enter an integer!” if the user fails to enter an integer in their program.
- 4.
In as few instructions as possible write a JCoCo program that computes the sum of the first n integers where the non-negative n is read from the user.
- 5.
Write a recursive JCoCo program that adds up the first n integers where n is read from the user. Remember, there must be a base case that comes first in this function and the recursive case must be called on something smaller which is used in computing the solution to the whole problem.
- 6.
Write a Rational class that can be used to add and multiply fractions together. A Rational number has an integer numerator and denominator. The _ _add_ _ method is needed to add together Rationals. The _ _mul_ _ method is for multiplication. To get fractions in reduced format you may want to find the greatest common divisor of the numerator and the denominator when creating a Rational number. Write this code in Python first, then disassemble it to get started with this assignment. You may wish to write the greatest common divisor function gcd as part of the class although no self parameter is needed for this function. The greatest common divisor of two integers,
and
, can be defined recursively. If
is zero then
is the greatest common divisor. Otherwise, the greatest common divisor of
and
is equal to the greatest common divisor of
and the remainder
divided by
. Write a recursive function called gcd to determine the greatest common divisor of
and
.
Disassemble the program and then look for ways of shortening up the JCoCo assembly language program. Use the following main program in your code.
3.17 Solutions to Practice Problems
These are solutions to the practice
problems. You should only consult these answers after you have
tried each of them for yourself first. Practice problems are meant
to help reinforce the material you have just read so make use of
them.
3.17.1 Solution to Practice Problem 3.1
The assembly code in
Fig. 3.2
blindly pops the None at
the end and then pushes None again before returning from main.
This can be eliminated resulting in two fewer instructions. This
would also mean that None
is not needed in the constants, but this was not eliminated below.

3.17.2 Solution to Practice Problem 3.2
As in practice 3.1 the POP_TOP and LOAD_CONST from the end can be
eliminated. In the if-then-else code both the then part and the else part execute exactly the same
STORE_FAST instruction.
That can be moved after the if-then-else code and written just
once, resulting in one less instruction and three less overall.
Furthermore, if we move the LOAD_GLOBAL for the call to
print before the
if-then-else statement, we can avoid storing the maximum value in
z at all and just leave the
result on the top of the operand stack: either x or y. By leaving the bigger of
x or y on the top of the stack, the call to
print will print the
correct value. This eliminates five instructions from the original
code.

It is worth noting that the code
above is exactly the disassembled code from this Python program.

When main is called, this code prints
the result of a conditional
expression. The if-then-else expression inside the print
statement is different than an if-then-else statement. An
if-then-else statement updates a variable or has some other
side-effect. An if-then-else expression, or conditional expression as it is called
in Python documentation, yields a value: either the then value or the else value. In the assembly language
code we see that the yielded value is passed to the print function
as its argument.
3.17.3 Solution to Practice Problem 3.3

3.17.4 Solution to Practice Problem 3.4
The following code behaves
differently if the BREAK_LOOP instruction is removed from
the program.

3.17.5 Solution to Practice Problem 3.5
This is the hello world program with
exception handling used to raise and catch an exception. This
solution does not include code for finally handling in case an exception
happened while handling the exception. It also assumes the
exception will match when thrown since JCoCo only supports one type
of exception.

3.17.6 Solution to Practice Problem 3.6
This program adds 5 and 6 together
using the _ _add_ _ magic method
associated with integer objects. First 5 is loaded onto the operand
stack. Then LOAD_ATTR is
used to load the _ _add_ _ of the 5 object
onto the stack. This is the function. The argument to _ _add_ _ is loaded next
which is the 6. The 6 is loaded by the LOAD_CONST instruction. Then
_ _add_ _ is
called with one argument. The 11 is left on the operand stack after
the function call. It is stored in x, the print is loaded, x is loaded onto the operand stack, and
print is called to print
the value. Since print
leaves None on the stack,
that value is returned from the main function.

3.17.7 Solution to Practice Problem 3.7

3.17.8 Solution to Practice Problem 3.8
A cell variable is needed if an inner
function makes a modification to a variable that is located in the
outer function. Consider the JCoCo program below. Without the cell
the program below would print 10 to the screen and with the cell it
prints 11. Why is that? Draw the run-time stack both ways to see
what happens with and without the cell variable.

Interestingly, this program cannot be
written in Python. The closest Python equivalent of this program is
given below. However, it is not the equivalent of the program
written above. In fact, the program below won’t even execute. There
is an error on the line x = x +
1. The problem is that as soon as Python sees x = in the function g, it decides there is another
x that is a local variable
in g. But, then
x = x + 1 results in an
error because x in
g has not yet been assigned
a value.

3.17.9 Solution to Practice Problem 3.9
A couple things to notice in
Fig. 3.20. The run-time stack contains one stack
frame for every function call to factorial. Each of the stack
frames, except the one for the main function, point at the
factorial code. While there
is only one copy of each function’s code, there may be multiple
stack frames executing the code. This happens when a function is
recursive. There also multiple n values, one for each stack frame.
Again this is expected in a recursive function.

Fig.
3.20
Execution of fact.casm
3.17.10 Solution to Practice Problem 3.10
Python is a very
transparent language. It turns out there is function called
dir that can be used to
print the attributes of an object which are the keys of its
dictionary. The dictionary maps names (i.e. strings) to the
attributes of the object. The following strings map to their
indicated values.
-
_ _init_ _ is mapped to the constructor code.
-
eat is mapped to the eat method.
-
speak is mapped to the speak method.
-
food is mapped to an integer.
-
This is all that is mapped by JCoCo. However, if you try this in Python you will discover that a number of other methods are mapped to default implementations of magic methods in Python including a hash method, comparison methods like equality (i.e. _ _eq_ _), a repr method, a str method, and a number of others.
3.17.11 Solution to Practice Problem 3.11

3.17.12 Solution to Practice Problem 3.12
To get ready to
execute the built-in class builder function the stack must contain
the following in order from the top of the stack down: The name of
the class is on the top of the operand stack. Below the name is the
closure of the class initializing function and its environment.
Below that is the built-in class builder function itself. The
CALL_FUNCTION instruction
is executed with two arguments indicated to call the class
builder.
Upon its return,
the two arguments and the class builder function have been popped
from the stack and the instance of the class is left on the operand
stack. This class instance may then be stored as a reference from
some known location, likely by a STORE_FAST instruction.