Compiled vs. interpreted languages
Programming languages can be classified in many ways; one form of classification is whether they are compiled or interpreted.
In order for any software to run on a computer it must be converted into machine code: the sequence of 0s and 1s representing a set of instructions that can be comprehended by a computer processor. A compiled language is one where the source code is converted into machine code by a tool called a compiler, and packaged as a program that can be distributed directly to end-users.
Because machine code is specific to a processor architecture and an operating system, a compiler must be written for each platform it targets. The beauty of compilers, however, is that the same source code can be used to create executable programs for different operating systems. Examples of compiled languages are C and C++.
Interpreted languages also need their source code to be converted to machine code in order to run on a specific operating system, but with interpreted languages this happens in real time as the program executes. The software that performs this conversion is called an interpreter, and the same code can be executed on any operating system that has an interpreter for that language.
Interpreted languages are generally slower than compiled languages, because the compilation step is effectively happening while the program executes. Examples of languages that can be run by interpreters are JavaScript and Perl. Many interpreted languages can also be compiled, however, so the line between interpreted and compiled languages is sometimes blurred.
Java does not easily fit into either of these categories. In one sense Java is a compiled language: all source code must be compiled before it can be executed. As you will see, the Java Development Kit (JDK) contains a utility program called javac that is responsible for compiling Java source code.
Unlike traditional compilers, however, the Java compiler does not produce machine code for a specific operating system. Instead, it produces Java bytecode.
Java bytecode does contain a set of instructions, just like machine code. Unlike machine code, however, Java bytecode cannot be executed directly by a computer. Java requires an intermediary called the Java Virtual Machine (JVM), which is provided by the Java Runtime Environment (JRE).
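To make the idea of bytecode concrete, here is a minimal sketch of a class together with the commands that compile, disassemble, and run it. The class name Adder and its add method are invented for illustration. The javap tool ships with the JDK; the instruction listing in the comments is what javap typically prints for a method this simple, although the surrounding output varies between JDK versions.

```java
// Adder.java
class Adder {
    // Adds two ints. javac turns this body into four bytecode instructions.
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3)); // prints 5
    }
}

// Compile:      javac Adder.java    (writes Adder.class, which holds bytecode)
// Disassemble:  javap -c Adder      (prints the bytecode in readable form)
// Run:          java Adder          (the JVM executes the bytecode)
//
// For the add method, javap -c typically shows:
//   0: iload_0     push the first argument onto the operand stack
//   1: iload_1     push the second argument
//   2: iadd        pop both values and push their sum
//   3: ireturn     return the int on top of the stack
```

None of these instructions refers to a particular processor or operating system, which is why the Adder.class file needs a JVM to run it.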
Any computer that wishes to run a program written in Java must have a JRE installed, and this will perform the task of converting the Java bytecode into machine code for the specific operating system as the program executes.
If you use Windows or OS X you have probably noticed that you are often asked to install a new version of Java: this is the Java Runtime Environment. Once Java is installed you can run any application written in Java.
You may be wondering why Java uses this approach. Java bytecode can be thought of as a halfway house between source code and machine code. It contains a set of instructions that can be executed by an interpreter more efficiently than regular source code.
Java chose this approach as part of its “write-once, run-anywhere” philosophy. The same compiled version of the program can be executed on any computer that has a JRE installed: it is not necessary to build different versions of the program for each platform. This has been immensely valuable, because JREs are available not just for desktop computers, but for mobile phones, set-top boxes, and home appliances. In fact, Oracle claims some 3 billion devices run Java.
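As a small illustration of this portability, the sketch below reports the platform it happens to be running on. The class name PlatformReport is invented for illustration; os.name, os.arch, and java.vm.name are standard system properties. The same compiled PlatformReport.class file can be copied between machines without recompiling, and only the printed values should change.

```java
// PlatformReport.java
class PlatformReport {
    // Builds a one-line description of the platform the JVM is running on.
    static String describe() {
        return "OS: " + System.getProperty("os.name")
                + ", CPU architecture: " + System.getProperty("os.arch")
                + ", JVM: " + System.getProperty("java.vm.name");
    }

    public static void main(String[] args) {
        // The output depends on the machine, so no sample output is shown.
        System.out.println(describe());
    }
}
```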
This approach was also intended to make the language more secure. Traditional compiled programs can do anything on the operating system that the user running them can do – including deleting files, or sending them over the Internet to a hacker. The JRE by comparison provides a sandbox that programs run inside of, and it is possible to control what the program can do inside this sandbox, or make it request the ability to perform certain operations. This approach has become standard with Android and iOS in recent years, but was built into Java from the very beginning.
Because Java bytecode is converted to machine code as the program executes, Java shares characteristics with interpreted languages. When Java was first released this was a reasonable way to look at it, and Java was marketed as an interpreted language.
As mentioned earlier, however, interpreted languages tend to be slower than compiled languages. Even though Java bytecode is more efficient to convert to machine code than raw source code, there is still some overhead. As a result, early versions of Java were slow compared to compiled languages, and this perception has stuck to some extent. Any discussion of Java on Slashdot still immediately descends into a flame war on the performance of Java.
Java has not been a purely interpreted language for a long time, however. Almost all Java runtimes now use an approach called Just-in-Time (JIT) compilation. The Java bytecode is compiled into machine code as the program runs, just before it is needed. Although there is still some overhead in this process, Java tends to perform as well as compiled languages, with the added benefit that the same Java bytecode can be run on many different operating systems.
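One way to see that a JIT compiler is at work is to ask the running JVM about it. The sketch below uses the standard java.lang.management API; the class name JitReport and the sumOfSquares workload are invented for illustration. The values reported depend on the JVM and the machine, and a JVM without a JIT compiler returns no compilation bean at all.

```java
// JitReport.java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

class JitReport {
    // A small numeric loop, called often enough that a JIT is likely to compile it.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    // Describes the JIT compiler of the running JVM, if it has one.
    static String describe() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null) {
            return "This JVM has no JIT compiler";
        }
        String text = "JIT compiler: " + jit.getName();
        if (jit.isCompilationTimeMonitoringSupported()) {
            text += ", time spent compiling so far: "
                    + jit.getTotalCompilationTime() + " ms";
        }
        return text;
    }

    public static void main(String[] args) {
        long checksum = 0;
        for (int run = 0; run < 20_000; run++) {
            checksum += sumOfSquares(1_000); // repeated calls make the method "hot"
        }
        System.out.println("Checksum: " + checksum);
        System.out.println(describe());
    }
}
```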
One interesting aspect of Java bytecode is that languages other than Java can generate it. There are now many languages that generate Java bytecode, including several mainstream languages such as Scala. In fact, Java 8 introduced a JavaScript engine called Nashorn that integrates with the Java language and allows you to write JavaScript code that executes on the Java platform (Nashorn was later removed from the JDK in Java 15).
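Script engines such as Nashorn are made available to Java programs through the standard javax.script API. Which engines are present depends on the JDK and on any libraries added to it, so the sketch below only lists what the running JVM can find and makes no assumption that Nashorn is among them. The class name ScriptEngines is invented for illustration.

```java
// ScriptEngines.java
import java.util.ArrayList;
import java.util.List;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;

class ScriptEngines {
    // Returns "engine name (language name)" for every script engine the JVM can find.
    static List<String> available() {
        List<String> names = new ArrayList<>();
        for (ScriptEngineFactory factory : new ScriptEngineManager().getEngineFactories()) {
            names.add(factory.getEngineName() + " (" + factory.getLanguageName() + ")");
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> engines = available();
        if (engines.isEmpty()) {
            System.out.println("No script engines are installed in this JVM");
        } else {
            engines.forEach(System.out::println);
        }
    }
}
```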
Although the version of Java most commonly used is owned by Oracle, anyone can technically write a JDK and JVM. The most common alternative to the Oracle version is OpenJDK, which is an open source implementation of Java.