In this chapter you’ll learn about the
implementation of the JCoCo virtual machine while at the same time
you’ll be introduced to the Java and C++, two statically typed
object-oriented programming languages. The primary focus of the
chapter is on learning advanced object-oriented programming using
Java and C++. Statically typed languages, like C++ and Java, differ
from dynamically typed languages like Python in the way that type
errors are caught. When running a Python program a type error can
occur in any branch of code. One of the big problems with Python
programming is that these errors may exist until every possible
path in a Python program is executed. Testing Python code takes
considerable effort to ensure every possible path is executed.
While thorough testing is always a good idea, these type errors may
not be discovered until much later in the development cycle.

Fig.
4.1
JCoCo
Java and C++ programs are statically
typed. This means that type errors are found at compile time, when
the program is translated into executable format, without executing
the program at all. Programmers must declare the types of all
values or the compiler must be able to infer their type from the
context of expressions in the program. With the declaration of
value types in C++ and Java programs, the programmer is notified if
any operation is not allowed without executing a single line of
code. Run-time errors are still possible, but those run-time errors
are due to logic problems and not due to type errors.
The JCoCo implementation will serve as
nice examples while learning Java and C++. We’ll compare Java and
C++ when appropriate to show you the differences and similarities
between the two languages. In the interest of seeing the big
picture, we’ll start with an overview of the JCoCo implementation
as pictured in Fig. 4.1. JCoCo reads an input file which must be
formatted according to the grammar specified in Sect. A.1. Two
parts of the JCoCo implementation, the scanner and the parser are responsible for reading the
input file. The scanner is implemented as a finite state machine.
The parser is written as a top-down parser. The parser produces a
list of function and
class definitions which
make up the abstract syntax
tree definition of the program. The remainder of the JCoCo
implementation makes up the bytecode interpreter.
The bytecode interpreter evaluates the
abstract syntax tree (i.e. AST) which consists of function and class objects. The AST is interpreted
within the context of frame
objects. A frame is one activation record on the run-time stack of
the executing program. When the program begins, the interpreter
starts execution at the main function, creating a frame for it
and starting execution at the first instruction.
The examples of Java and C++ used
within this chapter will come from the implementation of the
scanner, parser, function, class, and frame classes along with
classes for types and instances of types like integers, floats,
strings, and lists.

Fig.
4.2
The JCoCo virtual machine
JCoCo is written
in Java. A similar project, CoCo, was earlier written in C++. JCoCo
is an improved, more fully developed version of CoCo, rewritten in
Java. JCoCo is a large project consisting of 56 source files and
around 8,900 lines of code. Don’t be intimidated, a lot of the code
is repetitive. With such a large program, structuring it correctly
is of the utmost importance. The JCoCo virtual machine is an
interpreter of bytecode instructions somewhat like the Java Virtual
Machine (i.e. JVM). The JCoCo interpreter reads assembly language
source files called CASM
files as you learned about in the last chapter.
Much of the design of CoCo and JCoCo is
similar. JCoCo includes support for programmer-defined classes.
CoCo does not support classes definition. Otherwise, the two
implementations are very similar. The early part of this chapter
will present examples of both the Java and C++ implementations when
appropriate to compare and contrast the two languages. Later
sections will explore the details of the Java implementation of
JCoCo.
Like other interpreters, the JCoCo/CoCo
implementation is divided into some logical components: the
scanner, parser, and bytecode interpreter. The scanner reads characters from the CASM
source file and creates objects called tokens. The last chapter had
many examples of CASM files. Tokens from a CASM file like the one
in Sect. 3.2 and pictured in
Fig. 4.2
include a Function keyword,
a colon, a main identifier,
a slash, an integer 0, another keyword Constants, another colon, a
None keyword, and so on.
These tokens are returned one at a time to the parser when the
parser requests another token.
The parser reads the tokens one at a time
from the scanner and uses them while parsing the source file
according to the grammar for JCoCo given in Appendix A. The grammar given there is
LL(1) so the parser is implemented as a recursive descent parser.
Each non-terminal of the grammar is a function in the parser. The
right hand sides of rules for each nonterminal defines the body of
each function in the parser. There will be more on this later in
the chapter. The result of parsing the source file is an Abstract
Syntax Tree, or AST. This AST is an internal representation of the
program to be interpreted.
The JCoCo bytecode interpreter is the
part of the program, given the AST, that interprets the byte code
instructions of each function. As the instructions are executed,
the virtual machine interacts with I/O devices like the keyboard
and the screen. Bytecode interpretation is the responsibility of
several parts of the JCoCo implementation as you will read later.
The last part of this chapter has a detailed explanation of the
implementation of the JCoCo virtual machine.
4.1 The Java Environment
In Chap. 1 we learned that Java originated as
part of the Green project. Today, Java is a robust language that
contains many features that make it convenient for programmers
while also being efficient and powerful. Java actually consists of
two important tools: a Java Virtual Machine (i.e. JVM) and a Java
compiler that compiles from Java source code to Java bytecode
(which is what the JVM executes).
Let’s consider the hello world
example, written in Java. The Java program is given in
Fig. 4.3.
This program must be saved in a file called HelloWorld.java. The program is
compiled and run using commands like this.


Fig.
4.3
Hello world
The javac program is the Java compiler
which translates a Java source
program, ending in an extension of .java, to a bytecode file, ending in .class. The bytecode file is a binary
file that is machine readable, but not human readable. The bytecode
file is read by the Java Virtual
Machine or JVM which
is actually the program called java or java.exe if you are running on a
Windows machine. The JVM interacts with the operating system to
execute the bytecode file.
Typically, a Java program consists of
many source code files which are compiled into many bytecode files.
When programming in Java, each source code file must be named the
same as the public class within the source code file. For instance,
the code in Fig. 4.3 must be in a file called HelloWorld.java because the public
class is called HelloWorld.
Every Java source code file can have exactly one public
class.
Writing a
non-trivial Java program involves creating many classes and
therefore many source files. When a Java project is compiled, the
Java compiler looks at the dates of all source files and all
bytecode files. Every time a file is created or modified its last
modification date is changed. This is information that is
maintained by every operating system including Mac OS X, Microsoft
Windows, and Linux. If any bytecode file is found to be older than
its corresponding source file, then the source file is recompiled
to produce a new bytecode file with a newer date than its source
code file. This mechanism of using timestamps to determine what
needs to be recompiled is called a make facility for historical reasons
which we’ll revisit in the next section.
Practice 4.1
Another program is written and
compiled. Here is the error message from the compiler. What can you
discern from the compile message? Why would this be important to
Java?

You
can check your answer(s) in Section 4.34.1.
4.2 The C++ Environment
C++ and Java share a lot of syntax.
C++ was designed first with Java starting development about
10 years later. As mentioned in chapter one, Bjarne Stroustrup
was developing C++ during the early eighties. He designed the
language to be backward compatible with C so there were some
decisions already made for him like the need for separate compilation and the presence
of a macro processor. C++
is one of the most widely used object-oriented languages today and
continues to evolve. A standards committee now oversees C++ with
regular revisions to the language like the C++11 revision which
came out in 2011 and the 2014 version which contained small changes
over the 2011 version. A new version was formally accepted late in
2017. Development of the C++ language is ongoing.
Using C/C++ for a programming project
does not come without some risks. A significant problem, perhaps
the most persistent problem over time, with C/C++ programs are
memory leaks. C/C++ programmers must be disciplined in their
allocation and deallocation of memory. It is common that programs
that run for a long time will have a memory leak that has to be
tracked down, which is a difficult task. In many languages a
garbage collector takes care of freeing memory that is no longer
needed by a program. A garbage collector cannot safely be included
as part of C and C++ programs. Both C and C++ are designed to give
the programmer maximum control. This means that more responsibility
is left to the programmer and as a result programmers need to be
very disciplined when using C/C++. There is more on this in
Sect. 4.7.

Fig.
4.4
Java compiler and virtual machine

Fig.
4.5
C++ compile
C and C++ have many uses including
operating system development, timing critical software, and
detailed hardware access. Learning to program in C++ well will take
you a long ways towards being a great programmer in any language.
This chapter won’t teach you everything you need to know to become
a C++ programmer. That could be and is the topic of many books. But
this chapter will introduce you to many of the important concepts
and skills you’ll need to become a good C++ programmer.
Like Java, C++ programs must be
compiled before you can run the them. Java programs are compiled to
Java bytecode and the bytecode is run on the JVM. C++ programs are
compiled into the machine language of the CPU that will execute
them. The operating system of the computer where a C++ program runs
is responsible for loading the executable program and getting it
ready to run but otherwise a compiled C++ program runs directly on
the CPU of the machine for which it was compiled.

Fig.
4.6
hello.cpp
Figure 4.5 depicts the
compilation process for C++ programs. Examine
Fig. 4.4
to contrast that with the execution of Java programs. The C++
environment looks more complicated. But everything in the green box
is actually accomplished using one compile command.
Figure 4.6 contains the classic hello world program written in
C++.
To run the hello world program it must
first be compiled. Figure 4.5 shows the process of compiling a C++
program like the one that appears in Fig. 4.6. First, the macro
processor reads the iostream header file and combines it
with the rest of the source file. The iostream header file does not include
the code for streams. It just declares the streams and the
operators used to write to the out stream. The combined program text
is sent to the compiler which parses the program and generates
machine language code using an assembler. The machine language code
is then linked with the iostream library to produce the
executable code. Thankfully, this whole process is encapsulated in
one command. There will be more about both the macro processor and
I/O streams in the next sections.

Fig.
4.7
Compiling hello.cpp
Executing the g++ command compiles the program as
shown in Fig. 4.7. By default g ++ produces a program called a.out. To
execute the program you type a.out and the operating system will
load and run it. The default a.out can be renamed or a different
name can be provided on the compile command.
The
option in Fig. 4.8 tells g ++ to include debugging information in
the program. The -o tells
g ++ to name the executable program
hello instead of
a.out.


Fig.
4.8
Include debug and name
To compile a C++ program you must have
a C++ compiler installed on your system. The g ++ compiler used in
Fig. 4.7
is the GNU C++ compiler. This compiler is available for Mac OS X,
Linux, and Microsoft Windows.
4.2.1 The Macro Processor
The first line of the program in
Fig. 4.6
is called a macro processor
directive. The macro processor is a part of the C++ compiler
that is responsible for pulling other files into the source program
and sometimes for some simple editing of a source file to get it
ready to be compiled. In this program the macro processor
includes another file or
library called iostream.
The iostream file is called a header file because it defines
functions and variables that exist in some other library or code on
the system where it is compiled. Header files define the interfaces
to these other libraries or code. When a header file is enclosed in
angle braces, a less than/greater than pair, it is a system
provided header file. More information on the macro processor and
header files can be found in Sect. 4.9.1.
4.2.2 The Make Tool
A file system is the software and
format that controls how files are stored on the hard disk of a
computer. All operating systems have their own file systems and
sometimes support multiple file systems. Microsoft Windows supports
NTFS and Fat32 among others. Linux support ext2, ext3, reiserfs,
and others. Mac OS X supports several file systems including HFS+.
Every one of these file systems store attributes of every file
including the date and time the file was last modified.
Make is a program that can be used to
compile programs that are composed of modules and utilize separate
compilation. C and C++ programs utilize separate compilation and
typically you write a make file to compile programs written in
these languages, or you use a tool to automatically create a make
file. Java programs are not compiled via the make program because
the make program is built into the Java compiler as was mentioned
in the previous section.
The idea is simple. Every time a
module is compiled by the C++ compiler it produces an object file.
For instance, when PyObject.cpp is compiled, the C++
compiler writes a file called PyObject.o. For each of these files the
date and time when it was last modified or created is stored with
the file. After a compile the date on PyObject.cpp is older than the date on
PyObject.o. When a
programmer changes PyObject.cpp, its date will be newer
than PyObject.o‘s
date.
Make uses this simple observation
along with make rules to execute the compile commands necessary to
make PyObject.o‘s date
newer than PyObject.cpp‘s
date. Here is a make rule for PyObject.cpp.

This rule says that PyObject.o must be newer than
PyObject.cpp and
PyObject.h. If either of
these two files are newer then make will execute the command on the
next line, which must be indented under the first line. The result
of executing this compile command is to produce a new PyObject.o file with a newer date than
either of the two source files.
To make the coco executable, all the object files
must be linked together. To link everything together the first rule
is written like this.

All 38 object
files must be listed here. This says that the date on coco, the executable program, must be
newer than the date on all its object files.
All these rules are placed in a file
called Makefile in the same
directory as the C++ source files. When make is invoked it will look for a file
named Makefile. By keeping
track of the dates, only the source files that have been updated
will get recompiled and the coco executable will get recreated by
linking together all the object files.
Writing a good Makefile is sometimes difficult and
almost always error prone, so often there is a rule in the makefile
called clean. Executing
make clean will erase all
object files so you can get a fresh compile. There are also tools
like autoreconf that will
generate a Makefile
automatically with just a few inputs. Take a look at the
rebuild script in the CoCo
distribution to see how this might be used. To use autoreconf you must have the automake
tools installed on your system. But if you do, you can execute

to build the entire CoCo Virtual
Machine. Without the automake tools you should still be able to
execute the configure and
make commands to build
CoCo.
Separate compilation in C++ programs
means that each module in the program is compiled separately. Each
object file, generated by the compilation of a module, is produced
independently of the other source files. This is important because
large C++ projects often contain hundreds of C++ source files.
Separate compilation means that only the small piece a programmer
changes needs to be recompiled if the interface (i.e. the header
file) to other modules does not change. After compiling the source
files to object files, the object files can be linked together to
form an executable program. Linking is a very fast operation
compared to compiling.
Practice 4.2
Using C++ there are no naming
requirements for modules and classes like Java. So, when class
A uses class Test both class A and class Test can be put into a file by any
name. Why is this OK for C++ programs, but not for Java
programs?
You
can check your answer(s) in Section 4.34.2.
4.3 Namespaces
Line 2 of the program in
Fig. 4.6
opens up the std, short for
standard, namespace in the program. The first two lines of this C++
program are like importing a module in Python or a package in Java.
When importing a module in Python the programmer writes an import
statement like one of these two lines.


Fig.
4.9
Namespace std
In Java the programmer would write an
import statement somewhat like this, although not exactly.

The Python equivalent of a namespace
is a module. Python modules can be imported in one of two ways,
preserving the namespace or merging it with the existing namespace.
In Java packages are the equivalent of a namespace and selected
classes and objects can be imported from a package. Namespaces are
important in C++, Python, and Java, because without them there
would be many potential name conflicts between header files and
modules that would create compile errors and prevent programs from
compiling, or in the case of Python - from running, that were
otherwise correct programs. In C++ programs, if we didn’t want to
open the std namespace we
could rewrite the program as shown in Fig. 4.9.
The safest way to program is to not
open up namespaces or merge them together. But, that is also
inconvenient since the whole name must be written each time. What
is correct for your program depends on the program being
written.
4.4 Dynamic Linking
Dynamic linking is related to
namespaces, modules, and packages. Modern programming languages
like C++ and Java are reliant on many libraries so programmers can
solve problems instead of rewriting code that is common to more
than one program. Libraries containing commonly used code are
generally available to be used by programs written in a high level
language, including C++ and Java programs. These libraries must be
linked into your program to be able to use them.
Figure 4.5 shows object files (i.e. modules) being
linked together into a C++ program.
There is a problem with the picture in
Fig. 4.5.
Early C programs could be self-contained programs that relied on
only a small number of system calls from the Unix operating system.
However, modern C, C++, Java, and almost any other modern
programming language are reliant on so many libraries that linking
all of them together would be a problem in several ways.
-
The size of the linked executable program would be huge taking up a lot of space in memory as it was executing.
-
Any change in any library would require each program that uses it to be re-linked to get the new version of the library.
-
There is no reason to have multiple copies of libraries, one for each program that uses it. This wastes space in addition to the overhead of having to manage multiple copies of libraries.
Modern languages don’t statically link all the libraries that
are required by a program. They dynamically link them. When you hear
the word dynamic you should
think run-time. Libraries
are generally linked at run-time. Software, often part of the
operating system, detects when a library is going to be used by a
program and loads it into memory and links it to the program that
requests its services as the program is executing. Microsoft
Windows calls these dynamically linked libraries DLL’s. Windows includes services that
let libraries be written so they can be dynamically linked to
programs as they execute. Mac OS X and Linux also have the ability
to dynamically link libraries. C++ programs often dynamically link
to libraries that are provided by the C++ run-time libraries and
other libraries that may be required by a program but have been
supplied with the program.
Java programs also use dynamic
linking. In fact, dynamic linking is built into the very
foundations of Java. The JVM loads bytecode files (i.e. modules) as
needed in your Java program. Java programs consist of .class files, called bytecode files,
which must be in the current working directory or in a directory on
the class path. The class path is a list of directories, or
folders, where the JVM looks for bytecode files. The class path is
recorded in an environment variable called CLASSPATH. Here is one example of a
class path.

The class path is a list of folders,
or directories, where these dynamically loaded .class files can be found. Sometimes a
whole group of classes are written to implement some library. For
instance, this class path includes mysql-connector-java-5.1.17-bin.jar.
This is actually what is called a JAR file. A JAR file stands for Java
Archive, and is a zipped up set of .class files that are stored in
compressed format. Dynamically linked libraries are so common to
Java programs that JAR files were added as a means to conveniently
group and redistribute collections of classes for Java programs.
The bytecode files found in a JAR file are organized into packages.
Importing something like

in a Java program would cause the
BufferedInputStream class
to be dynamically linked from the java.io package, which is a library
provided by the Java run-time environment.
4.5 Defining the Main Function
Lines 3–5 of the hello world program
in Fig. 4.6 define the main function. Every C++ program must
have one main function, and only one. The main function should
return an integer and it is given an integer and an array of
character arrays which are the command-line arguments. The
command-line arguments are elaborated on in more detail in the
section on arrays and pointers later in this chapter.
Every Java program must also have a
main function. However, when the program is run the class whose
main function you want to run must be specified. In this way, each
class could potentially have a main function. The main of the
specified class will be the first to run. The main function of the
Java hello world program can be found in Fig. 4.3.
Practice 4.3
Command-line arguments are typed in
after the name of the program. For instance if a program is called
grep then you might provide
command-line arguments like this.

Both C++ and Java programs can receive
command-line arguments through the main function. With C++ the
number of command-line arguments is passed as argc and the actual arguments are
passed in the array of strings (i.e. character arrays) in the
parameter named argv as
declared in Fig. 4.9. The variable argc is always at least one for C++
programs but the length of the command-line arguments String array
in Java may be zero if no command-line arguments are passed. Do you
know why?
You
can check your answer(s) in Section 4.34.3.
4.6 I/O Streams
Line 4 of the program in
Fig. 4.6
prints Hello World to the
screen. To be a little more precise, cout represents what is called a stream
in C++. You can think of a C++ stream like a real stream with water in
it. You can place things in the stream and they will be carried
downstream. To place something in a C++ stream you use the
operator. Writing


places the string “Hello World” into
the cout stream. This expression returns the cout stream. This
means that multiple
operators can be chained together. Line 4 is
like writing


The parentheses are not needed in this
example since the
is
already left-associative. But they were included so you can see
that the function call to
returns a stream which can be used in the
next
operator to the right.



There are three streams automatically
associated with programs. These three streams are associated with
any program, whether C++, Python, Java, or other language. In C++,
the first stream is called cout and by default it writes to the
screen. The cerr stream
also writes to the screen by default. The cin stream reads from the keyboard by
default. The operator for reading from a stream is the right-shift
operator, written cin>>variable where the variable will hold
the value of its type which was read from the stream. In each of
these cases these streams can be redirected to read or write to
different locations. Redirecting input and output is an operating
system feature and not really associated with a specific
programming language. You can search on the web for information
about redirecting standard output, standard error, or standard
input if you are interested in learning more about
redirection.
Java programs have the same three
streams. System.out is the
name of the standard output stream. System.err is the standard error
stream. System.in is the
input stream. Figure 4.3 demonstrates writing to standard output in a
Java program. The right-shift and left-shift operators are not used
to read from and write to streams in Java. Instead the more
traditional function call syntax is used.
4.7 Garbage Collection
Garbage
collection occurs when dynamically allocated space needs to be
returned to the pool of available space in memory. This space that
is available for run-time allocation is called the heap. Every time an object is created a
little of the heap memory of the computer must be reserved to hold
that object’s state information. When the object is no longer
needed, the space on the heap has to be freed so it can be used by
another object later.
Languages like Java and Python provide
garbage collection as part of the underlying model of computation.
They can do this because these languages are careful about how
pointers are exposed to the programmer. In fact pointers are called
references in these languages to distinguish them from pointers in
languages like C and C++. The trade-off is that these languages
take some control away from the programmer. Java, Python, and many
languages that provide garbage collection require a virtual machine
to execute their programs and the virtual machine takes care of
managing and freeing unused memory.
Garbage collection can impact the
run-time performance of a system. Languages like Java and Python
aren’t as well-suited to real-time applications where timing is
critical. In these languages garbage collection can occur at any
time. Usually, running of a program is not time critical and the
time taken for garbage collection is negligible. The advantages of
garbage collection typically far outweigh the possibility of memory
leaks, but not in timing critical applications.
The existence of a run-time system
that supports garbage collection, like the Java and Python virtual
machines, means that those programs have less access to the
underlying hardware of the machine. To safely free unused memory
any garbage collection system must restrict the use of pointers in
programs and as a result programs written in languages like Java
and Python have less access to the details of the hardware
platform. Again, this is not usually a problem for most programs,
but there are instances where direct hardware access is important.
Programs like operating systems are typically not written in Python
or Java. To avoid any misconceptions, Android applications are
written in Java, but the Android operating system itself is based
on the Linux kernel which is implemented in C.
C++ programs must manage the
allocation and freeing of heap space. But, it’s not always clear
when an object will no longer be used. A memory leak occurs when
memory never gets freed even though the C++ program is done using
it. There is extra work involved in writing C++ classes to insure
that objects get freed when they are no longer needed. In the case
of the CoCo virtual machine it is not safe to free objects once
created because there is no reference counting in CoCo to decide
when an object is no longer in use. Because objects are created and
often referenced from multiple parts of a CASM program it is safe
to simply free objects in CoCo. True garbage collection is needed
in the C++ implementation of CoCo to make it a really useful
virtual machine. As it stands, CoCo works for running short
programs, but would not be suitable for long running
applications.
For Java programs the garbage
collector is a thread that runs once in a while and checks to see
how many references are still referring to an object. If there are
no parts of the program using an object, then it can be freed. Once
in a while you might have a group of objects that are not being
used, but all appear to be using each other. In this case the
garbage collector can form a dependency graph and figure out that
the objects involved form a cycle and no other objects outside the
cycle are depending on the group of objects in the cycle. In this
case, all objects in the cycle can be freed. The existence of a
garbage collector greatly simplifies writing Java programs
including JCoCo.
4.8 Threading
In the previous
section it was mentioned that the JVM garbage collector runs in a
different thread. A thread
is a running sequence of instructions that shares access to objects
with other threads. Each thread runs largely independently from
other threads in the same program. You can think of each thread as
essentially an independent program that is working with other
threads in the same larger program to accomplish some work.
Java was built from the ground
up to be a multi-threaded programming language. Every object
in Java contains a lock that can be used to synchronize the
behavior of multiple threads. When more than one thread is running,
its work should not undo or change the work another thread is
doing. When more than one thread is running there are two issues
that need to be dealt with: synchronization of the threads, and
communication between the threads.

Fig.
4.10
The PyToken class
Locks on objects
make it possible for Java threads to both synchronize their work,
and communicate with each other in structured ways. Every object in
Java has a lock associated with it. There are also keywords in Java
like synchronized that
insure that only one method at a time may run on an object. This
text won’t teach you about Java threading, but it is an important
topic and should be studied at some point.
C++ also has support for threads
through the thread class in the standard namespace. However, C++
support of threads is quite a bit different than the Java support.
For instance, C++ does not have keywords that allow for
synchronized methods like Java. Threading in C++ requires a little
more thought and work. C++ and Java are equally powerful in their
support of multi-threaded programs, but given a choice, Java is the
language to use for multi-threaded applications.
4.9 The PyToken Class
Object-Oriented programming is all
about creating objects. Objects have state information, sometimes just
called state, and methods
that operate on that state, sometimes altering the state. If we
alter the state of an object we call it a mutable object. If we cannot alter the
object’s state once it is created, the object is called
immutable.
A class defines the state information
maintained by an object and the methods that operate on that state.
We’ll start by examining the PyToken class in Fig. 4.10 since it is a simple
immutable class. The PyToken class defines the token objects that
are returned by the scanner to the parser during parsing of a JCoCo
program. All the JCoCo code is in a package called jcoco. Line 1 declares that this class
belongs in this jcoco
package.
Lines 4–17 of Fig. 4.10 define the
TokenType enum, which is
short for enumeration. The
TokenType enumeration
defines the types of the tokens returned by the scanner. If you
recall from Chap. 3, the syntax of CASM programs is
pretty simple and these constant values are all the possible token
types. Each constant serves as a name for each token type. With
this enumeration it is possible to use these constant names in the
Java code where the type of token is needed. For instance, here is
one snippet of code from the JCoCo interpreter. Using descriptive
constant names is useful in writing self-documenting code which you
should always strive to do.

Lines 19–22 define the instance variables of the object (i.e.
the object state). Each of these variables is declared private so
that only the class’ methods may access the state information
directly. Lines 24–29 define the PyToken constructor which is
called when a PyToken object is created. Here is how a PyToken
object gets created.

Of course, the variables type, lex, line, and column would all have to have values
already and be of the proper types. For instance, the lex variable must be declared as a
String. Java is a
statically typed language, so all variables must be declared before
they can be used. In this code the variable t was declared to have PyToken
type.
There are a couple of things you can’t
see in the textbook. Because this class is defined in a package
called jcoco, the result of
compiling this class will be placed in a subdirectory named
jcoco. Packages and
subdirectories go together in Java organization of files. In
addition, this file must be named PyToken.java since it contains the
public class PyToken. This
is also part of Java’s organization of files.
4.9.1 The C++ PyToken Class
The implementation of PyToken in C++ looks a bit different
than the Java version. Java and C++ both support separate
compilation of code. Using Java each class is written in a separate
file. Each file is compiled separately by the Java compiler. When
to re-compile a Java class is decided based on dates of both the
.java and the .class files.
In C++ there is no such make mechanism
built into the compiler. Instead, the separate make tool provides this functionality
as described in Sect. 4.2.2. In addition, the interface to the class
(i.e. the declaration of the class’ instance variables and methods)
is separated from the actual code that implements the methods. So,
the PyToken class
definition is separated into two files: the PyToken header file,
named PyToken.h, and the
method implementations located in PyToken.cpp. Figure 4.11 shows the contents
of the header file declaration of the class. Only the .cpp source files are compiled using
the compiler. The header files are included in the source files for
use during compilation of the source files.

Fig.
4.11
The C++ PyToken class declaration -
PyToken.h
Much of the class declaration in the
header file looks like the Java version. The enum defined in Fig. 4.10 is implemented as
integer constants in Fig. 4.11 for no good reason. Enums are supported in C++ as well.
Line 6 provides the declaration of a destructor which in general is used by
C++ because C++ programs must free up their space since there is no
garbage collector as there is in Java. However, the destructor in
this case doesn’t really have a purpose since these tokens don’t
have pointers to other objects. A destructor is needed precisely
when an object contains pointers to other resources that must be
freed. PyToken objects do not contain any pointers to other objects
and hence the destructor has no purpose in this class.
The other difference worth noting is
the use of const after the
four methods that return values. This declares that these methods
don’t mutate the PyToken
object. They only return values from the PyToken object. The use of const exists in C++ because C++ is very
flexible in the way parameters are passed and values are returned.
Declaring that a method is const helps C++ know where the method
can be safely called and can optimize the performance of C++
programs.
The C++ implementation of PyToken is
given in Fig. 4.12. The first line includes the declaration
of the PyToken.h header
file. This is a macro processor directive to bring that source code
into this file. By doing this, the PyToken class is declared for this
source code.
The PyToken:: that you see in
Fig. 4.12 is a scope qualifier. It indicates that
while none of this code is physically written inside the
PyToken class definition,
it is meant to be a part of the PyToken class.
Lines 3–6 of Fig. 4.12 use an arrow
operator, written +->+. In Java this
is written as a dot. The arrow operator follows a pointer. In C++,
this is a pointer that
points to the current object. In Java, this is a reference and we use the dot
notation to dereference the this reference. Pointers are the
address of data in the memory of the computer. Pointers can be used
in expressions to create new pointers using pointer arithmetic. In
a programming language a pointer can point anywhere. A reference is much more controlled.
References are somewhat like pointers except that they cannot be
used in arithmetic expressions. They also don’t directly point to
locations in memory. When a reference is dereferenced using a dot,
the run-time system does the lookup in a reference table.
This difference between references and
pointers means that we can safely rely on every reference pointing
to a real object where we don’t necessarily know if a pointer is
pointing to space that might be safely freed or not since the
pointer might be the result of some pointer arithmetic. References
are safe for garbage collection. Pointers are not.

Fig.
4.12
The C++ PyToken class implementation -
PyToken.cpp
4.10 Inheritance and Polymorphism
Object-oriented programming languages
help us organize our code. One great advantage of organizing our
code around objects occurs when we are able to re-use code. Code
re-use is important so that we can write something once and forget
it while we solve other problems. But for code re-use to work we
need a way of customizing this code for our purposes.
Inheritance is the mechanism we employ
to re-use code in software we are currently writing. Polymorphism is the mechanism we employ
to customize the behavior of code we have already written. In this
section we’ll look at some C++ code to see how inheritance and
polymorphism are specified. In the next section we’ll revisit the
same code as implemented in Java.
Consider the header file for PyObject
in Fig. 4.13. The CoCo and JCoCo virtual machines work
on Python objects. Every data value in Python is an object, so this
idea of objects is very pervasive in the JCoCo/CoCo
implementations. In fact, it is so pervasive there are certain
things that every object in Python must be able to do. Certain
methods can be called on every Python object. To be able to re-use
as much code as possible it makes sense to write that common code
in one place. One such place is the PyObject class in the C++ CoCo
implementation.

Fig.
4.13
The C++ PyObject header file -
PyObject.h
Every object in Python can be
converted to a string. While the string representations vary, the
mechanism to convert an object to a string is to call the
_ _str _ _ method
on the object. This is declared on line 22 of
Fig. 4.13. Since all objects should respond to this
method, the PyObject class
defines a toString method and a _ _str _ _ which
calls the toString method. The _ _repr _ _ method
is similar to the _ _str _ _ method.
In some case the two methods return exactly the same string. But,
the _ _repr _ _
returns a string that if evaluated using the eval function, would
construct the same object. For instance, consider this code.

After evaluating this code, both
x and y refer to lists of integers, where
y is a complete copy of the
contents of the list referred to by x.
Every object in
Python responds to a number of basic method calls. CoCo does not
attempt to implement all of them. But another that it does
implement is the _ _hash _ _ method
which returns a hash value for all hashable objects in Python. This
is used when the object is used as a key in a dictionary (i.e. hash
table). Only immutable objects may be used as keys in
dictionaries.
The _ _type _ _ method
returns the type of any Python object. Every Python object has a
type. The type is returned via this method which is also an
object.
There are a few things to note in
Fig. 4.13 related to programming in C++. The first
seven lines are called macro processor directives. Any line
starting with a pound sign (i.e.
) is a macro processor directive. The
first line is an if-not-defined directive and the second
line is a define directive.
The last line of Fig. 4.13 is an endif that goes with the first line.
The pattern of ifndef,
define, endif macro processor directives is
needed because include
directives often end up with circular references where include
A includes B which may in turn includes
C which includes
A again. This kind of
circular reference is avoided by defining PYOBJECT _H _ in
Fig. 4.13. Once the PyObject.h include is included, the
PYOBJECT _H _ is
defined and if PyObject.h
is included again through a circular reference, or even through
another include including it without a circular reference, it will
not get included twice. This pattern of ifndef-define-endif is used for every
header file in C and C++ programming.

Line 10 declares a class (i.e. type)
called PyType. This is
called a forward declaration. The PyType class is used in this header
file, but PyType.h also
includes PyObject.h so the
forward declaration was necessary because of the circular
reference. Line 14 is a constructor for the PyObject class. Line 15
is the destructor declaration which again doesn’t really do
anything for this class.
The use of the keyword virtual is important. Virtual methods
are methods that are included in the virtual function table of a
C++ class. This virtual function table is how C++ implements
polymorphism. When a
virtual function is called, there is an extra lookup of the
function’s address because classes that inherit from this class may
override any of the virtual functions. For instance, it can’t be
known at compile-time which version of toString should be called, the one in
PyObject or one of the toString methods defined in a subclass
of PyObject.

Fig.
4.14
The CoCo str magic method
Examine the PyObject:: _ _str _ _
method shown in Fig. 4.14. This method calls toString and returns a new PyStr object as the result of
converting the object to a string. The subtly here is that which
toString will be called is unknown until this code actually
executes. For instance, if the current object is a PyInt then the code would be executed
from PyInt.cpp as shown in Fig. 4.15. But if a
PyList were the current
object, then the toString
method would be executed from Fig. 4.16.

Fig.
4.15
The PyInt toString method

Fig.
4.16
The PyList toString method
Both PyInt and PyList inherit from
PyObject in the C++ implementation. So, polymorphism works because
toString is declared
virtual and therefore the determination of which toString to call is made through an
extra lookup of the actual pointer to the function, in the virtual
function table, at run-time.
Line 28 declares a function, which in
this case is an overloaded left-shift operator (i.e. +<<+) that can
be used to print objects. The implementation of this overloaded
left-shift operator, from the file PyObject.cpp, relies on polymorphism to
customize its behavior, like the _ _str _ _ method
also implemented in that module. The toString to get called will depend on
which type of object this is called.

4.11 Interfaces and Adapters
In early Object-Oriented programming
languages the specification of an interface was tied directly to a
class. For instance, in the previous section we learned that
toString was tied directly
to the PyObject class and
any classes that inherited from PyObject. This works great until you
have some class that inherits from something other than
PyObject and would like to
use the polymorphism defined by the toString method. Then you have a
situation where you would like to inherit from two different
classes at the same time. C++ solves this problem with multiple
inheritance. C++ classes can inherit from more than one
class.
Java does a few things a little
different than C++. First, unlike C++, every class in Java inherits
from the Object class
either directly or indirectly. There is one class hierarchy in Java
of which every class participates. C++ has no built-in inheritance
hierarchy. Using C++, a class that does not explicitly inherit from
something does not inherit from anything. In Java a class that does
not explicitly inherit from anything inherits from Object.
Secondly, multiple inheritance is not
supported using Java, which simplifies inheritance and its
implementation. But, Java solves the whole problem of interfaces
being tied to class declarations by separating the two concepts. An
interface is a promise to support certain methods in a class.
Classes can implement as many interfaces as they wish, which is the
Java way of achieving multiple inheritance. But, interfaces are in
no way tied to the class hierarchy. What’s more, you can declare a
parameter to be of the type of an interface. Consider the code in
Fig. 4.17.

Fig.
4.17
The PyObject interface

Fig.
4.18
The PyObjectAdapter
Like the C++ version, the Java
PyObject interface declares a str method, similar in purpose to the
C++ toString method.
Declaring the str method in
the interface means that all classes that implement this interface
must implement the str
method.
While the interface declaration
separates the interface from the class hierarchy, it also does not
implement any of the code for the interface. It’s often the case
that many classes which implement an interface will have at least
some common code. Either each class must implement the same code or
the programmer may choose to use inheritance to write an adapter
class that implements the interface and provides common code to
several subclasses. This is the case in Fig. 4.18. A significant
portion of the code is omitted for brevity here.
There are several things to note in
this code. Lines 4–7 define protected variables. Protected
variables are hidden (i.e. not accessible) from any code that is
not in the same package, in this case the jcoco package. Line 10 of the code
calls this() which calls
the default constructor on line 15 so that code common to both
constructors need not be duplicated.
Line 33 uses a decorator called
@Override. This decorator
tells the compiler that the method that follows overrides or
implements a method from a base class or interface. This is useful
in case you make a spelling error or incorrectly specify a type of
parameter of a method. Spelling a name wrong, or changing the type
of a parameter will result in a method that will never get called
because polymorphism only works when the name and the types of
arguments all match. Otherwise you are just defining a different
method since Java supports parametrically overloading names,
meaning that two methods may have the same name if they have
different types of parameters. So, @Override can be useful in catching
mistakes that might otherwise be difficult to debug.
Practice 4.4
We have seen how polymorphism is
provided by the C++ and Java programming languages. Polymorphism is
also provided by Python. Yet with Python we don’t declare methods
virtual, like C++, and we don’t have an built-in class hierarchy
like Java. How does polymorphism happen in Python?
You
can check your answer(s) in Section 4.34.4.
4.12 Functions as Values
Python is a dynamically typed
language. As such, methods are looked up at run-time, not compile
time. All methods and values in Python are objects that are stored
in dictionaries within other objects so they can be looked up at
run-time. The keys in these dictionaries are strings: the names of
the values, methods, or functions.
To implement a virtual machine that
works like Python’s virtual machine, it is necessary to treat
functions as values and store them in dictionaries, or hash tables,
like Python’s virtual machine implementation. C++ supports treating
functions as values. Using C++ we can write the following.

This code comes from the PyObject class in the C++
implementation of CoCo, in the file PyObject.cpp. There is a subtle nuance
to this code. Once the _ _str _ _ method
is set in the object’s dictionary any and all subclasses that
inherit from PyObject and
override the _ _str _ _ method
will automatically get the overridden method definition. There is
no need to set " _ _str _ _" to
point to the new, overridden _ _str _ _ method.
Because the _ _str _ _ method
is declared virtual, polymorphism means it only has to be set in
the dictionary once.
SImilar code is not possible in Java
because Java does not treat functions and methods as values. But
there is a solution. A class can simulate a function or
method. In fact, Java contains support for doing just this. JCoCo
implements its own version of run-time lookup of a method or
function by using objects to simulate the functions and methods.

Fig.
4.19
The PyCallable interface
JCoCo defines an interface called
PyCallable, shown in
Fig. 4.19. This interface defines one method called
_ _call _ _. This
method takes a list of PyObject references and returns a
PyObject as its result.
This reflects the calling mechanism within Python. All Python
functions are given a list of Python objects and return a Python
object. This uniformity means that any class that implements the
PyCallable interface can be
called by calling its _ _call _ _
method. This means that functions can be treated as values in JCoCo
by encoding the functions as objects.
4.13 Anonymous Inner Classes
Interfaces in Java specify the
methods that must be supported by classes that implement them. The
PyCallable interface,
described in the last section, specifies a _ _call _ _
method. Like other interfaces, there are adapter classes that
provide some common code for classes that choose to implement the
PyCallable interface. The
simplest of these is the PyBaseCallable class which is used in
the functions implemented in the PyObjectAdapter class and the
PyCallableAdapter class to
avoid a circular reference problem within the object hierarchy.
Another class also implements the PyCallable interface, named
PyCallableAdapter, which is
used by all other implementations of PyCallable other than those created in
the PyObjectAdapter and
PyCallableAdapter
classes.
Figure 4.18 contains code on
lines 19–29 that creates an instance of the PyBaseCallable adapter.
This is an example of an anonymous inner class. Line 19 creates a
class that has no name, but inherits from PyBaseCallable and
overrides the _ _call _ _
method. There is one instance of this PyBaseCallable class created, the
instance for this _ _str _ _ method.
When an object wishes to call the _ _str _ _ method
on an object it calls the callMethod method of the PyObjectAdapter class.

Fig.
4.20
The callMethod code
The callMethod code in
Fig. 4.20 looks up the method in the object’s
dictionary and if it finds it, it calls the _ _call _ _ method
on it, which in the case of the code on lines 19–29 of
Fig. 4.18 is overridden to call the str() method on the object and return a
PyStr object with the
contents returned by the str() method.
Anonymous inner classes are very
important in Java. An inner class is any class defined within
another class. Inner classes are important because they provide a
means to implement call-backs. When an event occurs in a Java
program, like an event from a GUI program (i.e. a mouse-click) or a
message being received from the internet, if a call-back has been
registered to handle that event, then the call-back is called.
Inner classes are the perfect way to implement a call-back because
the inner class automatically has access to all the variables and
methods of the outer object.
In the case of the code in
Fig. 4.18, the inner class is anonymous, it does not
have a name. This is okay because typically a call-back has one and
only one instance created for it, for a particular outer object.
Earlier versions of Java did not have anonymous classes leaving
Java programs littered with inner classes that only ever had one
instance created. Java programmers needed a more compact, precise
syntax to manage call-backs and anonymous classes were
introduced.
The advantage of the inner class
defined on lines 19–29 of Fig. 4.18 is on line 27 where
the str() method is called
which is a member of PyObjectAdapter which is the class of
the outer object in this instance. Since this is an inner class, we
can directly call the str()
method in this call-back. Anonymous inner classes are used
extensively throughout the JCoCo implementation.
4.14 Type Casting and Generics
Line 21 of the C++ code in
Fig. 4.13 is an example of declaring a variable
dict whose type is defined
by a template. A template is how C++ programmers write generic
classes. A generic class is usually a container of some type, in
this case a hash table. This hash table maps strings to functions
that are given a vector of
PyObject pointers and
return a PyObject pointer.
The vector class is again a
template. The vector passed to these functions is a sequence of
PyObject pointers.
Generics are an important part of
object-oriented languages. Generics let programmers re-use classes,
especially classes that are designed as data structures like maps
and vectors. A map is a
data structure mapping keys
to values. The type of the
keys and values can be practically anything. So, a generic
map class provides the
ability to map any type of keys to any type of values. In Java a map is called a HashMap. In C++ there are several kinds
of map classes. Please note that in C++ the standard map class is not implemented as a hash
table. It guarantees O(log n) insert and lookup time. A hash table
guarantees an amortized complexity of O(1) insert and lookup. The
unordered _map of
C++11 is implemented as a hash table. Before C++11 this class was
not included with C++.
Java has one type hierarchy.
Everything inherits from Object either directly or indirectly. By
having one type hierarchy the creators of Java could provide
container classes for many of the data structures we need in our
programs including HashMap
and ArrayList which
provides a means for storing a list of objects. For instance, if
you needed a list of objects to be returned from a function you
might code it as shown in Fig. 4.21.

Fig.
4.21
An ArrayList example
The BodyPart function, a part of the
PyParser module, returns a
list of PyByteCode objects.
When one of these PyByteCode objects is needed, we would
be forced to write code like this to access the first PyByteCode
object in the list.

The (PyByteCode), with the parens, is
called a type cast or just
a cast. Casting is
necessary when moving down in the inheritance hierarchy. A cast is
a way to tell the Java compiler that you know the actual type of
the value while the compiler does not. There is a run-time check
that is inserted into your code. If the cast is invalid, the Java
program will throw an exception so casting is safe. It’s just not
convenient and the extra run-time check is less than desirable,
although arguably better than not checking at all.
Casting is the same in C++ and Java.
The same issue occurs in C++. When moving down the inheritance
hierarchy, a cast would be required. C++ has a datatype similar to
Java’s ArrayList, called
vector. However, C++ does
not have one super class of all other classes. So the vector datatype would be a little
harder to write without something called generics.
Moving down the inheritance hierarchy
in either Java or C++ requires the programmer to write more code.
If we could avoid moving up the hierarchy in the first place, then
moving down again would become unnecessary. This is the aim of
generics. Generics were added to Java to make moving up and down
the inheritance hierarchy, and therefore casting, unnecessary in
many circumstances. Consider the real version of the BodyPart function in
Fig. 4.22.

Fig.
4.22
An ArrayList example using generics
In this code the angle brackets
(i.e.< and>) delimit the type of the ArrayList. The ArrayList is a list of PyByteCode elements. This declares the
specific type contained in the ArrayList making the declaration of the
ArrayList generic, so that
it can be a container of any type, not just Object values. To declare the
ArrayList class to be
generic the Java creators would have changed its definition to look
like this.

The type, T, becomes a parameter to the class
declaration. A version of the class is created for each declared
version of the ArrayList. So, when the ArrayList <PyByteCode> class is specified,
a PyByteCode ArrayList
object is created.
Python, since it is dynamically
typed, does not need generics. Generics are only needed for
statically typed languages like Java and C++. In C++ generics are
called templates. A template is a parameterized class. Like Java,
the parameter to the class is a type or types. Standard template
containers in C++ include unordered _map, map, vector, list, queue, stack, deque, set, and array among others. Consider the
declaration of the unordered _map template in C++ in
Fig. 4.23. This template definition shows us that
more than one type parameter can be used for a template or a
generic in C++ or Java. In the case of the unordered _map there are five type
parameters passed to the declaration of the map.

Fig.
4.23
The unordered _map template
Diamond notation is one topic related
to generics in Java. To save even more writing, Java programmers
may use diamond notation when writing a generic object declaration.
The declaration of the variable instructions earlier in this section
could have be written as follows if we would have used diamond
notation.

You can probably see the diamond in
the code. Since PyByteCode
is already written once on this line, using Java you don’t have to
write it again. The compiler can infer the type of the ArrayList as
it is created from the type of the reference instructions pointing to it.
4.15 Auto-Boxing and Unboxing
In C++ you can declare a vector of int if you need to by writing

It is not possible to declare an
ArrayList of int in Java. Templates in C++ can take
any type as an argument to the class, even primitive types. In
Java, only classes can serve as arguments to generics. The type
int is a primitive type.
That means it is not a class. However, the creators of Java
understood this problem and provided wrapper classes for each of
the primitive types so we could declare collections of ints for instance by wrapping each
int as an Integer. So, while we can’t declare an
ArrayList of int, we can declare an ArrayList of Integer as follows.

A wrapped int is created and added to our list as
follows.

And getting it back out again
involves writing some code like this.

This is the old way of wrapping and
unwrapping integers, and other primitive types, in Java. Once
again, programmers do this often enough they wanted a more compact
way of wrapping and unwrapping primitive types. Java programmers
refer to this as boxing and
unboxing. Recent versions
of Java support auto-boxing and unboxing. So, now in Java you can
write the following.

The variable
x’s value is auto-boxed
when it is added to the list and auto-unboxed when it is extracted.
Java determines when to box and unbox primitive values based on the
type of value and the method being called. The syntax is much more
compact and it is easier to read.
C++ does not support autoboxing and
unboxing, but since you can declare template containers of
primitive types it isn’t as necessary to wrap and unwrap primitive
values when using C++.
Practice 4.5
The Java ArrayList contains two overloaded
methods called remove. One
takes an int parameter and
removes an object at the specified index in the ArrayList. The other takes an
Object as a parameter and
removes the first instance of the object from the ArrayList. Why might this pose a
problem?
You
can check your answer(s) in Section 4.34.5.
4.16 Exception Handling in Java and C++
Java and C++ can throw exceptions and
catch them as in many languages. Sometimes exceptions are thrown in
code not written by us but code we use. For instance, indexing
beyond the end of a vector. Other times we may wish to throw an
exception. In C++ literally any type of value can be thrown.
Figure 4.24 shows how an object
called a PyException is thrown using C++. This code was taken from
the PyRange.cpp module. When indexOf is called beyond the end of a
range object, CoCo throws a PyException object with a value of
stop iteration as shown in
Fig. 4.24.
Using Java, you throw values that
inherit from the class Exception. Additionally, in some cases you
must declare that a function throws an exception. For instance, in
Fig. 4.25 the indexOf method declares that it throws
PyException. It turns out
in this case that declaring that the method throws this exception
is optional because PyException inherits from RuntimeException. Exceptions that
inherit from RuntimeException don’t have to be
declared to be thrown in Java.

Fig.
4.24
Throwing an exception in C++

Fig.
4.25
Throwing an exception in Java

Fig.
4.26
Catching an exception in C++

Fig.
4.27
Catching an exception in Java
Exceptions that are thrown can be
caught and the C++ version of the exception is caught in
PyFrame.cpp in the FOR _ITER instruction. The code
for this appears in Fig. 4.26. To catch an exception it must be thrown
in a try block or in some
code called from a try block. Then the type of value caught in the
catch must match the type
of value thrown. Figure 4.27 provides the Java version of catching an
exception. There is not much difference between C++ and Java in
handling exceptions. The additional code in Fig. 4.26 on lines 14–20 are
needed because C++ does not have garbage collection while Java
does.

Fig.
4.28
Signal handling
Figures 4.24 and 4.26 demonstrate how
exception handling can be used to implement iteration within the
CoCo interpreter, while Figs. 4.25 and 4.27 provide the Java
version for JCoCo. When the end of an iteration is reached, a stop
iteration exception is thrown and when caught it signals the end of
the iteration.
Exception handling is a means of
handling conditions within a program, whether planned or unplanned.
C++ and Java programs can throw and/or catch exceptions as needed.
However some problems in C++, like division by zero errors, do not
surface as exceptions. They are signaled instead which is the topic
of the next section.
4.17 Signals
The C version of exception handling
is signal handling. C programs can generate signals, but it is more
common to put a signal handler in place to handle signals generated
by the operating system. Figure 4.28 contains an excerpt
of the code from main.cpp where a signal handler is implemented and
is installed in main.
There are several types of signals
and the code in Fig. 4.28 is written to catch all of the signals
defined in the C standard. The constant signal types are defined in
an include called signal.h.
When a signal is generated the program immediately jumps to the
signal handler passing it the signal value that was generated. The
signal handler usually is written to report some type of error and
then terminates. The signal handler presented in
Fig. 4.28 does that. It prints a traceback of the
program and then terminates.
4.18 JCoCo in Depth
The rest of the chapter will cover
only the Java implementation of JCoCo. Many similarities exist to
the C++ CoCo implementation. JCoCo is mostly a superset of the CoCo
implementation. When there are differences between the two
implementations they will be noted. Primarily JCoCo adds support
for the creation of programmer-defined classes.
4.19 The Scanner
The JCoCo virtual machine reads a
CASM file as depicted in Fig. 4.1. The virtual machine
starts by using a scanner to return the tokens of the CASM file. It
is common to implement a scanner as a finite state machine. The
finite state machine consists of states and transitions between
states depending on the characters read from the input file. The
finite state machine accepts tokens of the CASM file. The finite
state machine employed by JCoCo is depicted in
Fig. 4.29.

Fig.
4.29
The JCoCo scanner FSM
When the parser is constructed, it
first creates a scanner to read the tokens of the CASM program. The
PyScanner’s getToken method is written as a finite state machine to
get the tokens of the CASM program. Figure 4.30 contains the
getToken and putBackToken methods for the scanner.
The putBackToken method is
capable of putting back one token which is used by the parser when
it has to look ahead one token to determine its next action.

Fig.
4.30
PyScanner getToken and putBackToken
methods
The scanner reads from a stream,
which in the case of JCoCo is a PushbackInputStream so that a character
can be unread. The scanner
also keeps track of its position within the file so each token can
carry along the position where it was found in the input
file.
The start state is 0 as shown in the
Fig. 4.29. There are several things to take note of
in the finite state machine. First, identifiers are accepted by
state 1 and are limited to letters and digits where the first
character is a letter. Underscore characters and @ characters are
considered letters by the scanner so tokens like @x _1 are
recognized as identifiers by JCoCo even though that is an illegal
identifier in Python. State 2 accepts integers. State 5 accepts
floating point numbers which must have a decimal point. Scientific
floating point notation is not accepted by JCoCo.
States 6 and 7 are responsible for
recognizing strings. These states keep reading until a single or
double quote is found to end the string. However, strings cannot
have a quote or double quote in them as they are defined. For
instance, the string ‘how’s it going?’ is not allowed because there
is no escape character implemented in JCoCo and the second quote
would end the string. The entire implementation of the finite state
machine can be found in PyScanner.java with only an excerpt of the
code appearing in Fig. 4.30.
The input stream contains a method to
put back the last character which is used by the scanner code on
line 38 of the finite state machine loop in Fig. 4.30. The last character
must be put back when a token is returned because the last
character is not a part of that token. Consider state 1 for
example. The finite state machine remains in state 1 as long as the
character is still a letter or a digit. When it is neither, the
foundOne variable is set to
true, terminating the loop. But, that last character may be part of
the next token and so is put back before returning.
Just before getToken returns a token, the token to
be returned is saved. This is used by the putBackToken method. If the last token
is put back then needToken
is simply set to false. When getToken is called, lines 2–5 check to
see if needToken is false
and if so return the token that was put back by the putBackToken method. By saving the
token before it is returned the last token is always remembered in
case it needs to be returned again.
4.20 The Parser
The tokens of a CASM file are read by
the parser and parsed according to the grammar rules in
Appendix A. Each BNF
non-terminal corresponds to one function in the parser. The parser
returns an abstract syntax tree representing the CASM program. In
this implementation, the abstract syntax tree is an ArrayList of PyCode and PyClass
objects, which means the ArrayList is declared as a list of
PyObjects.
Figure 4.31 contains an outline of the parse method and some of the code
called by parse which can
be found in PyParser.java.

Fig.
4.31
PyParser.java excerpt 1
Each method of the parser corresponds
to a non-terminal of the grammar. The implementation of each method
method is determined by the right hand sides of its rules. The
entire parser implementation is in PyParser.java.
Figures 4.31 and 4.32 contain two excerpts of this code.
Examining the rules for ClassFunctionList, FunDef, and ConstPart will shed some light on the
implementations of the methods in Figs. 4.31 and 4.32.

Fig.
4.32
PyParser.java excerpt 2

Starting with the ClassFunctionList non-terminal, its
rules say that either it is empty (i.e.<null>) or it is a
ClassFunDef followed by a
ClassFunctionList. How do
we know which rule to follow? The answer can be found by looking
ahead one token. If we examine the FunDef rule, it must start with the
keyword Function and that
should be the next token to be read in the ClassFunctionList implementation unless
the next part of the program is a class definition. In that case
the ClassDef non-terminal
requires a Class keyword.
To determine what to do we get the next token in line 29 of
Fig. 4.31, put it back right away, and check to see
if it was a Function or
Class keyword. If it was
either of these, then the first rule is executed by calling
ClassFunDef followed by
ClassFunctionList. If
Function or Class is not the next token, then we
return the ArrayList passed
to the method since we follow the <null> rule.
Why is an ArrayList of PyObjects passed to the ClassFunctionList method? This
ArrayList is the abstract syntax tree “so far”, as it has been read
up to this point in the parser. The ClassFunctionList method adds to that
ArrayList if it finds another function or class definition.
The FunDef method has only one rule to
follow. It is responsible for building a PyCode object to return to
the ClassFunDef method.
When FunDef is called, we
have already checked that the first token is the keyword
Function so lines 2–5 could
be omitted. The rest of the method gets tokens, checks them to see
if they are the expected tokens, and calls other methods of the
parser to read the rest of the function definition.
The ConstPart method has two rules to
follow, like the ClassFunctionList method. Again, it
must get a token to determine which rule to follow. If the next
token is not Constants,
then the empty rule is used and the ConstPart method returns an empty
ArrayList. Otherwise, it
returns an ArrayList of the
constants used in the function. Each constant string is used to
build a PyObject value for
that constant. The ArrayList nestedCFList is passed to the
ConstPart method because a
nested class or function is itself a constant value stored in a
PyClass or PyCode object respectively. When a
constant like code(g)
appears in the list of constants it tells the parser to look up the
code for it in the list of nested classes or functions passed to
the ConstPart method.
The code excerpts in
Figs. 4.31 and 4.32 demonstrate that the functions of the
parser are straightforward implementations of the rules in the
grammar. Once in a while a lookahead token is needed to determine
which rule to follow, but otherwise the parser gets tokens when
required and calls other nonterminal methods when indicated by the
rule. The trickiest part of writing the parser is probably
determining what should be returned. This is dictated by the
information that is required in the abstract syntax tree which is
determined by the intended use of the information in the source
file.

Fig.
4.33
listiter.casm
4.21 The Assembler
Before CoCo can execute the code in a
function, all labels must be converted to target addresses in the
instructions. Labels make no sense to the bytecode interpreter.
Labels are convenient for programmers but are not for code
execution. The assembly phase looks for labels and replaces any
instruction jump label with the address to which it corresponds.
For instance, consider the CASM program in Fig. 4.33. The label00 identifies the instruction at
offset 11 in the main function. The label01 maps to offset 18 and
label02 maps to offset 19.
The instructions on line 14, 17, and 23 need to get the offset, not
the label, of their intended targets. This is the job of the
assembler.
The assembler is
simple enough to include in the parser code when the body part of a
function is parsed. There are two parts to it utilizing a
HashMap to remember and
then update the target addresses in the code.
The code for the assembler is
contained in two of the parser methods, the LabeledInstruction method and the
BodyPart method. The
grammar rules surrounding this code are provided here.

The code for LabeledInstruction adds each discovered
label to a map from labels to integer offsets. Lines 32–39 of
Fig. 4.34 do this when they discover an instruction
contains a label. If the code finds a label, then line 34 adds the
label to the map making it point to the offset, called index in the code.
Target locations are updated in the
body of the function on lines 11–19 of Fig. 4.34. If an instruction
is found that uses a label as its target, the instruction is
deleted and a new instruction with identical opcode is created with
the actual target address of the instruction.

Fig.
4.34
Assembling a program
4.22 ByteCode
A PyByteCode object is created for
each instruction found in a CASM program. The class definition,
partially defined in Fig. 4.35, shows an enum being declared with all possible
opcodes. This enum
construct in Java is convenient and powerful. An enum is actually a
class definition that declares both the name of each enumerated
value and any attributes associated with it. In the case of the
PyOpCode enum each
instruction name has associated with it the number of arguments
that will be supplied with the instruction. Each instruction either
has zero or one arguments, which appear immediately following the
instruction in a CASM file. Referring back to
Fig. 4.33 most instructions in that example have one
argument with the exception of the GET _ITER, POP _TOP, and POP _BLOCK instructions which have
zero arguments.

Fig.
4.35
Static initialization
Enumerated values are convenient
because they aid in writing self-documenting code. The enumerated
values in the program are constructed as objects, one for each
value enumerated in the declaration. They can be referred to by
their enumerated value in code. For instance, in PyFrame.java a switch statement chooses
between possible instruction values.

This code gets the opcode from the
next instruction. The switch statement is written with each
instruction opcode enumerated in the case statements. This makes
the code very clear when examining it. The behavior of the
instructions is associated with the name of each instruction.
To build the PyByteCode objects from
the CASM file it is necessary to translate the string opcodes, like
"LOAD _FAST" into
their actual opcodes, like LOAD _FAST. To accomplish this
there has to be a way to look up a string and find its
corresponding opcode. This lookup is done in O(1) time by using a
HashMap. The hash map is
created once, when the program begins. When code is executed once
and only once at program initialization, it is called static initialization. Java and C++
both support static initialization of values.
Internally to the PyByteCode class
there are two statically allocated maps that help in translating
each instruction that is read by the parser into a PyByteCode
object. The code in Fig. 4.35 appears in the PyByteCode module. The two variables
OpCodeMap and ArgMap are statically initialized and
available to all code implemented in the class. OpCodeMap is used when an opcode name
is found in a CASM file. It serves to verify it is a valid
instruction and to provide a translation to its enumerated value.
ArgMap provides a count of
the number of operands, either 0 or 1, allowed for the instruction.
For example, looking up "BINARY _ADD" as OpCodeMap["BINARY _ADD"] would
yield the enumerated value BINARY _ADD.
Static initialization of variables
can be helpful when you have one-time code that needs to be run
during program initialization. This section demonstrates how to do
this using Java. Similar code exists for C++. See the module
PyByteCode.cpp and
PyByteCode.h in the CoCo
implementation for an example of doing this using C++ for further
details. In this Java version of it the two functions that create
the maps are executed when called by the static initialization on
lines 17 and 18 of the code in Fig. 4.35.
4.23 JCoCo’s Class and Interface Type Hierarchy
The JCoCo implementation consists of
approximately fifty six classes and interfaces. Approximately fifty
six classes because JCoCo continues to grow and evolve. Class
inheritance is used for code re-use and polymorphism throughout the
implementation. Figure 4.36 provides a look at the hierarchy of
classes and interfaces. The PyBuiltIns represent a collection of
classes for all the built-in functions provided with JCoCo which
include concat, print, fprint, tprint, iter, len, open, and repr.
The fprint and tprint built-in functions are not part of Python.
The fprint function is a functional version of print that takes one
value and returns the fprint instance. The tprint built-in function
takes a tuple and prints the elements of the tuple with spaces
separating values in the tuple. PyBuildClass is similar to a
built-in function, but is only accessible via the LOAD _BUILD _CLASS
instruction.

Fig.
4.36
JCoCo type hierarchy
PyIterators represents the collection
of all iterator classes which include all the iterable values
supported by JCoCo including dictionaries, lists, files, funlists
(which are functional lists implemented as head/tail links). The
key shows that the dark grey classes are used internally in the
JCoCo implementation and are not available to CASM programs. These
classes are part of the internal implementation of JCoCo and are
not accessible to the programmer. Some of them have been described
earlier in this chapter. The PyType class is used by all but two of
the JCoCo types of values in the construction of their type
objects. The type of exception and range objects had to inherit
from the PyType class so the behavior of calling the type could be
overridden since these two types can be called to build either an
exception object or range object, respectively.
There are two interfaces and three
adapter classes. When possible, adapters were written to allow code
to be shared between multiple classes. Consider the
PyPrimitiveTypeAdapter class. This class defines the magic methods
_ _repr _ _,
_ _str _ _,
_ _hash _ _,
_ _iter _ _, and
_ _type _ _. These
methods are called by the repr, str, hash, iter, and type methods.
The PyException class is one
interesting example of the need for multiple inheritance in Java.
PyException inherits from RuntimeException. By inheriting from
RuntimeException, JCoCo exceptions don’t have to be declared to be
thrown in each and every method that either throws or calls
something that could throw an exception. Without inheriting from
RuntimeException pretty much every method in JCoCo would have to be
declared as possibly throwing a PyException which would have made
quite a mess of the code.
Since PyException inherits from
RuntimeException it cannot inherit from PyObjectAdapter. Instead,
PyException implements the PyObject interface and therefore must
re-implement the methods common to the PyObjectAdapter class. With
multiple inheritance this could have been avoided. But, this is the
only case where multiple inheritance would have been useful in this
collection of classes.
JCoCo supports a number of different
types of values, including integers, floats, dictionaries, lists,
strings, booleans, and a few others. Each of these values has a
type associated with it. Each JCoCo object has a method called
getType that returns its
type. It turns out even types are objects and they too have a
type. Calling getType on a
type returns the type named type. This has to end somewhere and it
does with the type object named type. The type of type is type. This comes from the first two
lines of the initTypes
function in the JCoCo.java source file. The type object is created
on the first line and is created with itself as its own type
identifier.

The next few sections will dive
deeper into a few of the JCoCo classes and interfaces to explore
their purpose and how they fit in to the larger implementation of
JCoCo.
Practice 4.6
The existence of
a class named PySuper
suggests that classes can be built dynamically (i.e. at run-time).
The need for PySuper stems from needing to look up the superclass
dynamically, while the program is running, because in general it
cannot be known before its use. What instruction in appendix A
is responsible for getting JCoCo ready to dynamically create a
class?
You
can check your answer(s) in Section 4.34.6.
4.24 Code
A CASM function consists of more than
just a sequence of PyByteCode objects. There is the name of the
function, the number of arguments passed to the function, the list
of constants used by the function, the local variables, the global
variable references, and any enclosed functions or classes declared
within this CASM function. All this information and more is
encapsulated within a PyCode object.

Fig.
4.37
The PyCode class instance variables
From a Java programming perspective,
this code not that unique. It is a container for all the
information that goes along with each function. The class
declaration of instance variables is given in
Fig. 4.37.
PyCode objects cannot be executed.
There is no way to run a
PyCode object. To run code you need two things: the code and the
environment in which it should be run. The environment is the
variables, functions, and other values that are already defined and
initialized before the function executes. The environment and code
are both provided to PyFunction objects when they are executed,
which is described in more detail in Sect. 4.25.
When a Python function is encoded as
a PyCode object there are two lists that are necessary. They
reflect the eventual contents of the environment. The free
variables are variables that are referred to that exist in the
environment and not in the code of the function. The other list,
the cellvars, are a list of variables that come from the
environment or are part of an inner function’s environment that may
be modified and therefore have to be indirectly referenced. This
means that we go through an extra step to reference cellvars so we
can update their values while accessing them from another
environment. See Sect. 4.25 for an example.
4.25 Functions
Each function in a CASM file is
scanned by the parser and a PyCode object is created in the
abstract syntax tree to represent the code, its name and number of
arguments along with its declaration of constants, locals,
freevars, cellvars, and globals as described in
Sect. 4.24. But, PyCode objects are not callable as
shown in Fig. 4.36. To be callable you need both the code and
an environment in which to execute the code. The environment fills
in the gaps so to speak. The freevars are not defined within the
function’s code. The freevars come from the environment.

Fig.
4.38
The PyFunction constructor and
_ _call _ _ method
The code in Fig. 4.38 provides the
constructor and the call method for the PyFunction class. The
constructor builds what is called a closure from the environment
and the code. The closure is initialized on line 8–10 where we
iterate over the free variables in the code mapping the cell
variables in the closure from their free variable names to their
cells variable values. All variables that are accessed from the
closure are accessed indirectly, through cell variables.
When a function
gets called, the _ _call _ _ method
gets called. When this occurs a new PyFrame object is created. The
PyFrame contains the program counter and space for local variables
to be stored. A PyFrame object is executed by calling the execute
method.
Practice 4.7
What are the free variables and bound
variables in this Python function?

You
can check your answer(s) in Section 4.34.7.
4.26 Classes
User-defined classes in JCoCo are
collections of PyFunction objects and nested classes. The class
contains the name of the class, (i.e. its type) its super class,
and a list of PyFunction or PyClass objects as shown in
Fig. 4.39. This reflects the implementation in
Python. For instance, while it’s not normally written this way, to
add two integers together it is possible to write this.

Fig.
4.39
The PyClass constructor and
_ _call _ _ method

This code looks up the addition magic
method in the int class and calls it passing the two arguments to
the function. While addition is a method called on an integer
object, it can also be called on the class by providing both
integers.

When a PyClass is constructed it is
passed a list of inner classes and one PyCode object for each of
its methods. The PyCode objects are used to build PyFunction
objects on lines 22–23. The PyFunction objects are placed in the
attr dictionary. The
variable attr of the class
holds attributes that are to be passed on to any instance of this
class. The class contains the functions that will become methods of
any instance of the class. For example, the _ _add _ _
function in the int class will become a method in an instance of
the int class.
4.27 Methods
Instances of a class are created by
calling their class. For instance, writing Dog(“Mesa”) creates an instance of the
Dog class. The code that is
executed when a class is called is shown in Fig. 4.39 on lines 48–54,
which calls the initInstance method on lines 36–47. In
this code all PyFunctions found in the attrs variable are passed on to the
instance of the class (i.e. the object instance) as PyMethod
objects. Then, on line 52 the object’s constructor is called to
perform any initialization of the object. Continuing our example
from the previous section, this means that like in Python, the
following code can be disassembled and executed in JCoCo.

The integer 4
had to be written int(4)
because otherwise it would look like there was a decimal point in
the number. Syntactically it would not be an integer in that case.
Of course, the more usual way to call this method is by using the
overloaded + operator.
The PyMethod class is a wrapper class
for a PyFunction. A PyFunction of a class is passed on to any
instance of that class as a PyMethod as shown in
Fig. 4.39. Calling a method on an object is usually
written as object.method(arg1,
arg2, ...). It is useful to know that JCoCo passes the list
of args in reverse order. So the last argument is first, followed
by the second to last, and so on.
When a method is called as in
object.method(args) the
first argument passed to the method is always self in Python. This self variable is the reference to the
object. Since the arguments
are passed in reverse order in JCoCo, the self argument can be added to the end
of the arguments passed to the function that the method
encapsulates. This is shown in Fig. 4.40 on line 3 where the
reference to the current object is added to the end of the
ArrayList of arguments. Note that line 8 removes self from the
args ArrayList before
returning so the calling code won’t see a modified list of
arguments.

Fig.
4.40
The PyMethod
_ _call _ _ method
Practice 4.8
Consider the code in the example
below. When a Dog object is created its _ _init _ _ method
implicitly gets called. We never explicitly call the constructor on
a Python object. Where does _ _init _ _ get
called in the JCoCo virtual machine?

You
can check your answer(s) in Section 4.34.8.
4.28 JCoCo Exceptions and Tracebacks
The execute method for PyFrame exits in one of two ways.
Either the RETURN _VALUE instruction is
executed, or an exception occurs that is not handled within this
function. If an exception occurs, execution jumps to line 18 of the
code in Fig. 4.41. All intentionally thrown exceptions
thrown by JCoCo are PyException objects, so the catch block
on line 18 will catch it.

Fig.
4.41
A synopsis of exception handling in
JCoCo
JCoCo exceptions use the exception
handling mechanism of Java to jump to the code starting on line 18
anytime a PyException is thrown in a JCoCo program, which would
be any exception intentionally thrown by a CASM program. Upon
entering the catch block on line 19 the code looks for an exception
handler that may have been put in place to handle the exception in
this PyFrame object. There may be an exception handler and there
might not. JCoCo includes a block stack used for iteration and
exception handling. The block stack records the exit points of
loops and the addresses of exception handlers. For loops, the exit
point is pushed on the block stack in case a BREAK _LOOP instruction is
executed to break out of a loop. When an exception handler is put
in place the location of the handler is indicated by a negative
value on the block stack to differentiate it from loop exit points.
Lines 13–16 show how an exception handler is set up within a frame.
The SETUP _EXCEPT
instruction pushes the exception handler address onto the block
stack.
When an exception occurs the block
stack is popped until the address of an exception handling block is
found (i.e. a negative value is popped). The address of the
exception handler is the negation of this negative value.
When an
exception occurs it can occur anywhere within the code. In
particular there could be operands left on the operand stack
because the exception occurred in the middle of some other work.
For instance, an exception might occur while preparing for a
function call and arguments may be left on the operand stack. The
PyMarker class serves to help clean up the operand stack after an
exception is caught. When an exception handler is installed with
the SETUP _EXCEPT
instruction, a PyMarker object is pushed onto the operand stack. If
a PyMarker object is popped during normal instruction execution, it
is just thrown away. If an exception occurs the exception handling
code on line 25–29 pops arguments from the operand stack until it
is emptied or until a PyMarker object is found, thus cleaning up
the operand stack.
If an exception handler was found in
the JCoCo function’s code, the exception is pushed onto the stack
to get ready to jump to the exception handler code. Lines 31–33
seem a little strange until you remember that JCoCo maintains
compatibility with Python 3.2 and disassembled Python code expects
there to be three operands pushed onto the stack when an exception
handler begins executing. Line 34 causes execution to jump to the
first instruction of the exception handler. The Python virtual
machine also assumes there is another exception handling block
pushed on the blockStack. This is not needed by JCoCo, but to
maintain compatibility line 35 pushes an entry onto the block
stack.
If no exception handler is found then
line 38 adds the current frame to the exception’s traceback. The
traceback is a list of all the PyFrame objects that are popped
until an exception handler is found. If no exception handler is
found, control returns to the main function where the traceback is
printed.
Non-JCoCo Java exceptions may also be
thrown by JCoCo code. This can happen in one of two scenarios.
JCoCo may have a bug and if so, an exception may be thrown due to
some unforeseen circumstance. The other reason a standard Java
exception may be thrown would be due to the programmer attempting
an illegal operation, like an arithmetic operation causing integer
overflow for instance. If either of these scenarios occur, then
lines 42–49 will handle those exceptions. When a non-JCoCo Java
exception occurs a PyException is created, the current frame is
added to its traceback and the PyException is thrown. If this
PyException is not caught anywhere within the CASM program then
control will return to the main function in JCoCo.java and the
exception’s traceback will be printed as the source of the
error.
4.29 Magic Methods
The JCoCo implementation of Python’s
virtual machine makes use of inheritance to reuse code in the
implementation and to create is-a relationships between the objects
manipulated by JCoCo. That type hierarchy is provided in
Fig. 4.36. All of the JCoCo datatypes implement the
PyObject interface which sits at the root of this type
hierarchy.
Through inheritance every descendent
of PyObjectAdapter contains a dictionary called dict that maps method names to methods
as shown in Fig. 4.18. In Python, when a method is called, the
method name is looked up in this dictionary to locate the code that
corresponds to the method name. JCoCo mirrors this implementation
to emulate the dynamic run-time typing behavior of Python. The
dynamic lookup of methods occurs in the callMethod method of the
PyObjectAdapter class as described in the related
Sect. 4.13.

Fig.
4.42
PyObjectAdapter’s constructor
Figure 4.42 contains four
methods that are defined on every type of value in JCoCo. For
instance, any object can be converted to a string and all objects
have a type within the hierarchy that can be retrieved. Notice that
all these methods have the same signature. Each function or method
in JCoCo (and Python) takes a list of objects as arguments and
returns an object. Every JCoCo object implements methods with this
signature and only this signature. The _ _str _ _ and
_ _type _ _
methods are called magic methods by Python developers because they
get called automatically by certain operators in Python. For
instance, converting a PyObject to a string calls the _ _str _ _ magic
method to get a string representation of the object. Calling
type on an object in a
Python program results in calling the _ _type _ _ method
to get an object’s type.
Calling repr on an object in a Python program
calls the _ _repr _ _
method. The repr function
returns a string representation of an object that, when evaluated,
would yield a copy of that same object. The hash function may also be called on any
object, but only hashable objects implement the _ _hash _ _ magic
method with something other than throwing an exception.

Fig.
4.43
PyInt’s additional magic methods
Methods are the operations that can
be performed on objects within the JCoCo type hierarchy. Magic
methods are methods that get called as a result of either a
built-in function call or some operator overloading in Python. The
_ _str _ _,
_ _type _ _,
_ _repr _ _, and
_ _hash _ _ magic
methods are added to the dictionary of methods for all objects by
the PyObjectAdapter constructor. Figure 4.42 shows the methods
that are supported by the PyObjectAdapter class for all subclasses.
But, subclasses of PyObjectAdapter may add to the map of supported
operations. For instance, the PyInt object’s constructor calls the
funs method to add a whole
host of additional supported methods to integer objects as shown in
Fig. 4.43.
The method name, called just
name in the code in
Fig. 4.20, is searched for in the dictionary. If it
is not found, an exception is thrown. Otherwise, mbr is made to point at the code of the
method (i.e. a PyCallable object). Line 6 calls the _ _call _ _ method
for this member function or method on the current object, returning
whatever is returned from the call to the caller. The use of the
dictionary maps names (i.e. strings) provided to JCoCo, to the
methods of objects, which are implemented as PyCallable objects,
within JCoCo. If the object does not have a method defined, the
callMethod code gracefully
handles this by throwing an exception which will result in a
traceback being printed of the offending CALL _FUNCTION instruction.
Within JCoCo, magic methods get
called as the result of many instructions. For instance, the
_ _add _ _ magic
method gets called on an object as the result of executing the
BINARY _ADD
instruction. The COMPARE _OP instruction calls
several different magic methods depending on the comparison operand
of the instruction. Precisely which magic method gets called for a
given instruction is detailed in the documentation provided in
Appendix A.
Practice 4.9
How can an
object or class override the default behavior of a magic method
like the _ _str _ _ method
without changing the JCoCo virtual machine itself?
You
can check your answer(s) in Section 4.34.9.
4.30 Dictionaries
Python implements a type called
dict, short for dictionary,
which is a map from keys to
values. Dictionary objects
are implemented as a hash table with O(1) get and set methods. A
dictionary is created by using the braces around an optional list
of key/value pairs. Line 2 of Fig. 4.44 shows an empty
dictionary being created. Items are put in the dictionary using
subscript notation. The key is the subscript and the value is the
assigned value at the key’s location. Lines 3–5 provide an example
of storing key/value pairs in a dictionary. Line 11 uses subscript
notation to look for a value corresponding to a key.

Fig.
4.44
dicttest.py
Dictionaries
differ from lists because the subscript can be almost any type of
value. Dictionaries are not limited to integer subscripts like
lists. There are three requirements of a dictionary key. The key
must be hashable. Hashing
refers to deriving an integer from a value, as close to unique as
possible. Keys in a dictionary must support an equality test. There
must be a way of determining if two keys are equal. Finally, keys
should not be mutable. Python lists are not suitable for keys
because they are mutable. In JCoCo strings, integers, floats,
tuples, and funlists are not mutable and therefore are suitable as
keys in a dictionary. Funlists are a JCoCo specific functional
programming list that is not a part of standard Python, but is
supplied with JCoCo. Floats are not usually used as keys due to
their being approximations of real numbers and the chance for
round-off error in calculations. In general floats should not be
compared for equality and therefore, while immutable, they are not
usually appropriate as keys in a dictionary.
The dict datatype is not included in the
JCoCo implementation available to you on Github. This section
describes the steps to add dictionaries as a case study of
extending the JCoCo virtual machine. You must have the Java
development environment installed on your computer to complete the
steps described here. When you have completed the steps in this
section the disassembled code from Fig. 4.44 will execute on
JCoCo producing output similar to that of Python.
4.30.1 Two New Classes
Two new classes are required to
support dictionaries; the PyDict class and the PyDictIterarator
class. The PyDict class resembles the PyList class in some ways.
The PyDict class must be added to the Java source code in a file
called PyDict.java. An excerpt of the PyDict.java header file is
given in Fig. 4.45.

Fig.
4.45
Outline of PyDict.java
There are several methods to be
implemented to support dictionaries. The _ _getitem _ _
method is given a key in the args vector and returns the
correponding value. The _ _setitem _ _
method maps a key to a value. The key is at index 0 in args and the value is at args[1]. The _ _len _ _ method
returns the size of the map. All of these methods use the HashMap
called map and the methods
of a HashMap are used in the implementation of the PyDict
class.
The map instance variable is central to the
PyDict implementation. A HashMap is part of the utility library of
Java. There is a subtle implementation detail with the use of
HashMap here. Java provides built-in support for hashing items. The
Object class of Java
includes a method called hashCode that is used to get a hash
value for any hashable Java value. But, in JCoCo we wish to call
the _ _hash _ _ magic
method to give the JCoCo assembly language programmer control over
how an object is hashed. The PyObjectAdapter class comes to the
rescue here by defining the hashCode method to call the
_ _hash _ _
magic method.

This means that
the JCoCo PyDict implementation can just use the HashMap as you
would in any Java program and the _ _hash _ _ magic
method will automatically get called when needed.
Many of the classes provided with
Java have hashing functions defined for them already. The PyStr
class can use the string hashing function provided by the
hashCode method of
PyObject. That hashCode
function can be called in the _ _hash _ _ magic
method of PyStr to hash the string. Every type of object that could
be used for a key in a dictionary must implement the _ _hash _ _
method.
The HashMap needs to determine if two
keys are equal as part of the hash table implementation. The
HashMap needs to know if the key it is looking up matches one that
it finds in the hash table. Again, PyObjectAdapter comes to the
rescue. The HashMap automatically calls the equals method of Java Object to
determine if the two keys are equal or not. The PyObjectAdapter
overrides the equals method
to call the _ _eq _ _ magic
method. So, the HashMap automatically makes calls to both the hash
and equals magic methods. This means that the equals magic method
must be implemented in any class that will be used as a key in a
dictionary. Of course, this is already done for the built-in types
of JCoCo.
The other class to be written is a
PyDictIterator class that implements iteration over the keys of the
dictionary. The PyListIterator can be used as an example in how to
write this iterator. Remember that to terminate the iteration, the
PYSTOPITERATION exception
must be thrown once the iterator is exhausted. Here is how
iteration is achieved over a Java HashMap object.

This code must be divided up into the
PyDictIterator implementation in a manner similar to the
PyListIterator code.
4.30.2 Two New Types
In addition to the new classes, two
new types must also be defined. The main module, JCoCo.java,
contains a function called initTypes. The dict and dict _keyiterator types should be
added as two new types to this function. To do this, two new values
for the PyTypeID enum in PyType.java must also be defined; the
PyDictType and PyDictKeyIteratorType values. This is a
relatively simple addition to the code, but must be tied together
with the implementations of the PyDict and PyDictIterator classes.
Once these types are created, don’t forget to set the instance
functions in the type objects.
4.30.3 Two New Instructions
Finally, after disassembling the code
in Fig. 4.44 a new instruction appears, the
BUILD _MAP
instruction. This instruction creates an empty dictionary and
pushes it onto the operand stack.

Fig.
4.46
Initializing a dictionary
Disassembling the code in
Fig. 4.46 yields one other instruction. The
STORE _MAP instruction
expects three operands on the stack. The TOS element is a
key, the TOS1 element is a
value and the TOS2 element
is a dictionary. The STORE _MAP instruction stores the
key/value pair in the
dictionary and leaves the dictionary on top of the operand stack
when it completes. These two new instructions are implemented in
PyFrame.java in the execute
method. You can look at other examples of instructions to see how
these two instructions should be implemented.
4.31 Chapter Summary
This chapter covered object-oriented,
imperative programming in Java with C++ covered to a lesser degree.
Advanced techniques including inheritance, polymorphism,
interfaces, generics, autoboxing and unboxing, inner classes and a
few other important topics were covered with examples coming from
the JCoCo virtual machine implementation.
Java and C++ are statically typed
languages as compared to Python, which is a dynamically typed
language. Static typing requires more work by the programmer when
writing code, but also provides some level of assurance that code
is type-correct. Python program type errors are not found until
run-time. The C++ and Java compilers catch most type errors at
compile-time.
C++ and Java share a lot of syntax,
but they are distinct languages in many ways. C++ is especially
suited to low-level and real-time implementations where performance
is critical and you need access to the underlying hardware. C++
gives you complete control of when dynamically allocated data is
freed. But with that responsibility comes the age old problem of
memory leaks. C++ programs are prone to memory leaks and the C++
CoCo implementation is full of them because it is difficult to
determine exactly when space can be freed in the virtual machine
without implementing some form of garbage collection. C++ has many
great programming features like templates, a large standard
library, and compiler support for many hardware platforms.
Java programs benefit from garbage
collection which is built into the JVM. Java also provides a
unified type hierarchy for classes with Object at the root of the
tree. C++ has no built-in class hierarchy. Java has built into it
several nice programming features like auto-boxing and unboxing,
threading support including synchronization support for all Java
objects, inner class support, and support for separating interfaces
from implementation.
Both C++ and Java serve as good
examples of statically typed, object-oriented, imperative
programming languages. They each have their benefits and
shortcomings, but for the CoCo and JCoCo virtual machines, Java is
the better-suited language providing garbage collection and a more
unified approach to handling exceptions within the virtual machine.
In the next chapter we introduce another programming paradigm while
studying another statically typed language.
4.32 Review Questions
- 1.
What does static type checking mean? Does C++ have it? Does Python have it? Does Java have it?
- 2.
What are the names and purposes of the two programs that make up the Java environment for executing programs?
- 3.
What is the number one problem that C/C++ programs must deal with? Why is this not a problem for Java and Python programs?
- 4.
What does the make tool do and how does it work for C++ programs?
- 5.
Is there an equivalent to the make tool for Java programs?
- 6.
How does the C++ compiler distinguish between macro processor directives and C/C++ statements?
- 7.
What is a namespace in C++? What is comparable to a namespace in Java? In Python?
- 8.
What is the default executable name for a compiled C++ program?
- 9.
What is separate compilation and why is it important?
- 10.
What is dynamic linking? Does it happen in C++ or in Java? Why is it important?
- 11.
Which environment has garbage collection built in, C++ or Java?
- 12.
What are the advantages of garbage collection?
- 13.
Are there any drawbacks to garbage collection?
- 14.
What is a destructor and when is it needed?
- 15.
What do you have to write to get a polymorphic method in C++?
- 16.
What is the purpose of polymorphism?
- 17.
What is the purpose of inheritance?
- 18.
How do interfaces and classes differ in Java? How are they similar? How are they different?
- 19.
What is an adapter class? Why are they useful?
- 20.
What is a callback and how are they usually implemented in Java?
- 21.
What are generics? Why are they convenient?
- 22.
What is a template? How do you declare a vector in C++?
- 23.
What is auto-boxing and unboxing?
- 24.
How is a function represented as a value in Java?
- 25.
What is an anonymous class?
- 26.
What is the type(6) in JCoCo and Python? How about the type(type(6))? How about the type(type(type(6)))? Why isn’t it interesting to go any further?
- 27.
The JCoCo scanner is based on a finite state machine. How is the finite state machine implemented? What are the major constructs used by a finite state machine?
- 28.
Does the JCoCo parser run bottom-up or top-down?
- 29.
In JCoCo how are a PyCode object and a PyFunction object related?
- 30.
What is a traceback and why is it important?
- 31.
What is the purpose of a PyMethod class?
- 32.
Arriving at hash values for hashable objects in Java is trivial. Describe how JCoCo determines hash values for objects in the implementation of PyDict objects.
4.33 Exercises
- 1.
Alter the finite state machine of PyScanner.java to allow strings to include the escape character. Any character following the backslash, or escape character, in a string should be allowed. This project can be implemented by altering the PyScanner.java class to allow the escape character to appear within a string. Hint: Two extra states may be needed to implement this code. Note that JCoCo will already allow pretty much any character, including tabs and newline characters, to be included in a string constant. The only characters that pose problems are single and double quotes. The escape character should not be included in the constant string, only the character that follows the escape character.
- 2.
Implement true division and floor division for floats in JCoCo. Write a test program to thoroughly test these new operations supported by floats. The test program and the source code are both required for the solution to this problem. You may use the disassembler to help generate your test program.
- 3.
Alter the JCoCo grammar to allow each line of a function’s code to be either a JCoCo instruction or a source code line. Any source code line should be preceded by a pound sign, a line number, and a colon followed by the text of the source code line. A source code line would reflect a line from a source language other than JCoCo which was compiled to the JCoCo assembly language. Then, when an uncaught exception occurs in the JCoCo program, the traceback should be printed along with the source code line that caused the exception. This is a challenging exercise and requires changes to the scanner, parser, internal storage of PyCode objects, and traceback handling.
- 4.
Add a dictionary object type to JCoCo by following the description at the end of this chapter. This project requires significant programming and there are pieces in the last part of the chapter that are left out. However, the provided code samples along with other similar code in the JCoCo project provides enough details to be able to complete it. When done, the successful project will be able to run the disassembled code from Figs. 4.44 and 4.46. The output should appear to be identical to the output produced by running the Python programs. However, the order of keys may be different since dictionaries are implemented with an unordered _map datatype.
- 5.
Empty type calls produce empty results in Python but not in JCoCo. For instance, when int() is called in Python, the object 0 is created. In JCoCo this produces an error. Use Python to determine what should happen for all the empty type calls that JCoCo supports. Then modify CoCo so it will behave in a similar fashion.
- 6.
Add a set datatype to JCoCo. Lookup the set datatype in Python documentation. Include support in your set datatype to support contructing a set, union, intersection, mutating union, and mutating set difference along with set cardinality, membership, and addition of an element to the set. Write a Python test program and disassemble it. Then run your test program to test your set datatype.
- 7.
Modify JCoCo to allow instructions like LOAD _FAST x in addition to LOAD _FAST 0. Currently, the LOAD _FAST and STORE _FAST instructions insist on an integer operand. If an identifier operand were provided then the identifier must exist in the sequence of LOCALS. If it does not, the parser should signal an error. Internally, the LOAD _FAST and STORE _FAST instructions should not change. The conversion from identifier to integer should happen in the parser. Convert the LOAD _GLOBAL, and LOAD _ATTR instructions to allow either an identifier or integer operand in the same manner. Do not try to modify the LOAD _CONST instruction since it would be impossible to distinguish between indices and values for constants.This project is not too hard to implement. Labels are already converted to offsets in the parser in the BodyPart method. That code has to be modified slightly to handle identifiers for things other than labels. The identifiers for the load and store instructions can be converted to integer operands in the FunDef function.
- 8.
Currently the assembler has three different load instructions including LOAD _FAST, LOAD _GLOBAL, and LOAD _DEREF that all use indices into different lists as operands. Define a new pseudo LOAD instruction that lets you specify an identifier for a value to load. For instance LOAD x would result in scanning the LOCALS list for x. If x were found in the first position of the locals list, then the LOAD x would be changed to a LOAD _FAST 0 instruction. Otherwise, if x was not in the list of locals, then the GLOBALS would be scanned next and if x were found there a LOAD _GLOBAL instruction would replace the LOAD pseudo instruction. If x was not found in the globals, then the cellvars could be scanned and finally the freevars. Create a STORE pseudo instruction as well for the STORE _FAST and STORE _DEREF instructions.Do not try to implement the pseudo instructions for any of the other load or store instructions. For instance, it would be impossible to know whether a LOAD referred to a LOAD _DEREF or a LOAD _CLOSURE if you tried to include LOAD _CLOSURE in your pseudo instruction.
4.34 Solutions to Practice Problems
4.34.1 Solution to Practice Problem 4.1
The Java compiler insists that if the
class is called Test then
the file must be Test.java.
This is necessary because when class A is compiled and uses the Test class, the Java compiler must find
the Test class. There is
nothing included or imported into class A to tell the compiler where to look.
Instead, Java looks for a file called Test.java if class Test is used in class A.
4.34.2 Solution to Practice Problem 4.2
The C++ compiler
uses macro processor directives to explicitly include header files
which declare classes and other entities. The header file names are
explicitly provided in the include macro processor directive. The
C++ compiler does not need to infer the name of the module from the
name of the class like Java does.
4.34.3 Solution to Practice Problem 4.3
In C++ programs the name of the
executable for the program is passed in argv[0]. So the value of argc is always at least one. In a Java
program the program consists of a collection of .class files. The
main function must be defined inside the public class for one of
these .class files, so the name of the main module is always known
by the programmer and therefore is not passed as an argument.
4.34.4 Solution to Practice Problem 4.4
Polymorphism occurs in Python because
methods are looked up by name at run-time. This leads to run-time
type checking, not compile-time as is supported by C++ and Java.
C++ and Java are statically typed languages. Python is dynamically
typed. Further, Python supports inheritance, but the purpose of
inheritance in Python is only for code re-use. Polymorphism is not
related to inheritance in Python programs since polymorphism occurs
because of the late, just in time, lookup of methods in an object’s
attribute dictionary.
4.34.5 Solution to Practice Problem 4.5
The confusion may occur when an
ArrayList of Integer was declared. When calling
a.remove(1) will autoboxing
occur? The answer is no, in this case because the ArrayList class contains a method with
int as a parameter Java
will choose the method with the closest argument types when there
are multiple to choose from. Nevertheless, the programmer must be
aware of this to correctly choose the right remove method. If the Object argument version is really the
one to call, then it must called as a.remove(new Integer(1)) to get the
correct remove called.
Boxing must be explicitly called in this case. No autoboxing will
occur.
4.34.6 Solution to Practice Problem 4.6
It’s the LOAD _BUILD _CLASS
instruction. This instruction loads the built-in function onto the
operand stack so it can be called to dynamically build a
class.
4.34.7 Solution to Practice Problem 4.7
The bound variables are f, x, and y. They are bound because the name of
the function is always defined and the parameter name is given the
value of any argument passed to the function. The variable
y is bound to the same
value as x because
y appears on the left side
of an assignment statement.
The free variables are aVal and lstInts. These values have to be
supplied in the environment by forming a closure before the
function can be executed.
4.34.8 Solution to Practice Problem 4.8
The _ _init _ _
gets called during object instantiation on line 52 of
Fig. 4.39.
4.34.9 Solution to Practice Problem 4.9
Since magic methods are looked up at
run-time (i.e. dynamically) then at any time before the magic
method, the object can have its magic method implementation
replaced by an alternative implementation. There is no modification
necessary to the JCoCo Java code.