Broad Translators
What you'll discover
- The spectrum of languages, from high abstractions to low machine instructions
- A deep look at the translators, including preprocessors, compilers, assemblers, linkers, and interpreters
- A dive into Microsoft's ecosystem, with a focus on the Common Language Runtime (CLR)
- Comparisons of key terms: bytecode, binary code, machine code, managed code, and unmanaged code
- The distinctions between Intermediate Language (IL), Intermediate Representation (IR), and Assembly
- A table summarizing which languages support Just-In-Time (JIT) or Ahead-of-Time (AOT) compilation
- The relationship between JIT/AOT compilation and static/dynamic typing in programming languages. Correlation || causation?
Languages from High to Low
high-level language
Usually what we write in
assembly language
- Human readable
- Some developers write assembly language for optimization
machine language
- Readable by computers
- Also known as binary language
The term "source code" depends on context; it can refer to high-level language or assembly language.
Broad Definition of Translators
Broadly speaking, a translator is any tool that converts code from one form into another:
- preprocessor
- compiler
- assembler
- interpreter
compiler
Translates source code into another form, which is not necessarily an executable file. For example, Java compiles to bytecode for the JVM.
interpreter
Unlike a compiler, it reads and executes the code at the same time, without creating an executable file.
An interpreter directly executes instructions written in a programming or scripting language without first converting them to object code or machine code.
assembler
Translates assembly language into machine language.
Preprocessor → Compiler → Assembler → Linker
Tools and Their Associated Languages
- Pre-processor: Handles preprocessing directives such as #include and #define. A must for C and C++.
- Compiler: Converts preprocessed code to assembly language. Needed for most compiled languages.
  - GCC for C, C++
  - rustc for Rust (Rust to IR to assembly)
  - javac for Java (emits bytecode rather than assembly)
  - MSVC for C, C++ (Windows platform)
- Assembler: Converts assembly code to object files. Low-level languages like assembly start here.
- Linker: Links multiple object files and libraries into an executable file.
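As a rough illustration of what the pre-processing step does, the sketch below expands object-like #define macros and drops the directive lines. It is a toy written in Python: the function name `expand_defines` is made up for this example, and a real C pre-processor handles far more (#include, function-like macros, conditionals, string literals).

```python
import re

def expand_defines(source: str) -> str:
    """Toy model of a pre-processor: expand object-like #define macros."""
    macros = {}
    output = []
    for line in source.splitlines():
        match = re.match(r"\s*#define\s+(\w+)\s+(.*)", line)
        if match:
            # Record the macro and drop the directive from the output.
            macros[match.group(1)] = match.group(2).strip()
            continue
        # Replace whole-word occurrences of every macro seen so far.
        for name, value in macros.items():
            pattern = rf"\b{re.escape(name)}\b"
            line = re.sub(pattern, lambda _m, v=value: v, line)
        output.append(line)
    return "\n".join(output)

c_source = """#define BUFFER_SIZE 64
char buffer[BUFFER_SIZE];"""

print(expand_defines(c_source))  # char buffer[64];
```

The compiler never sees BUFFER_SIZE; by the time it runs, the text has already been rewritten.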
Required Tools for Each Programming Language
The tools each language requires:
- C/C++: Needs all four
- Java: The compiler converts source to bytecode; no assembler or linker is needed
  - Java source code
  - Java compiler (javac): turns .java into .class
  - Java bytecode
  - JVM execution (Class Loader → Bytecode Verifier → Interpreter/JIT Compiler): turns .class into machine code
  - Machine code
- Python: Usually needs only an interpreter, none of the tools above
  - The most common implementation is CPython, which is written in C. Other implementations include Jython (Java) and IronPython (.NET).
  - Source is parsed to an AST, the AST is compiled to bytecode (an implicit step), and then interpretation (execution) starts.
- Assembly: Starts from the assembler; no pre-processor or compiler is needed
  - The assembly code is assembled, linked, loaded, and then executed.
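The Python steps (parse to AST, compile to bytecode, execute) normally happen implicitly, but CPython exposes each of them through the standard library. A minimal sketch that performs them one at a time; the variable names and the one-line program are only for illustration:

```python
import ast

source = "result = 6 * 7"

# Step 1: parse the source text into an abstract syntax tree (AST).
tree = ast.parse(source)

# Step 2: compile the AST into a code object that holds the bytecode.
code = compile(tree, filename="<example>", mode="exec")

# Step 3: the interpreter executes the bytecode in a namespace.
namespace = {}
exec(code, namespace)

print(type(tree).__name__)   # Module
print(namespace["result"])   # 42
```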
Interpreted Languages and Tools
An interpreter translates and executes source code directly, step by step.
Examples: Python, Ruby, PHP, Perl, Lua
JavaScript engines such as V8 additionally use JIT technology.
Usually don't need preprocessors or assemblers… but may have preprocessor-like concepts (macro expansion):
- Python: Module imports, decorators
- Ruby: Module imports, metaprogramming mechanisms
- JavaScript: Module imports (ES6+), transpilers like Babel
We don't need to know exactly how this works, but the underlying mechanisms do have functions similar to an assembler:
- Virtual machines: a VM executes the code.
  - CPython uses the Python Virtual Machine (PVM).
  - Java uses the Java Virtual Machine (JVM).
  - These virtual machines may internally use assembly language to implement certain functionality or optimizations.
- JIT compilation: Some interpreters use JIT compilation techniques to compile frequently used code segments into machine code. This process may involve optimizations at the assembly language level.
  - e.g., Modern JavaScript engines use JIT compilation, compiling hot spots of code into machine code at runtime.
- Bytecode: Many interpreted languages first compile source code into bytecode, which the VM then interprets and executes. Bytecode can be viewed as a high-level form of assembly language.
  - e.g., Python primarily uses an interpreter for execution, but it first compiles source code into bytecode and caches it in .pyc files so that modules do not have to be recompiled on every import.
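To see why bytecode resembles a high-level assembly language, you can disassemble a function with the standard `dis` module. This is only a sketch; the function `add` is made up, and the exact instruction names differ between CPython versions, so no output is shown.

```python
import dis

def add(a, b):
    return a + b

# The raw bytecode is just bytes stored on the function's code object.
raw = add.__code__.co_code
print(len(raw), "bytes of bytecode")

# dis decodes those bytes into named instructions, much like a disassembler
# turns machine code back into assembly mnemonics.
for instruction in dis.get_instructions(add):
    print(instruction.opname, instruction.argrepr)
```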
Which role in translators is JIT technology closest to?
The compiler: a JIT also produces machine code, but it does so at runtime and only for the parts that actually get executed.
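The idea of compiling only what actually runs can be sketched with a toy model. Everything here is hypothetical (the tuple-based expression format, `HotSpotRunner`, and the threshold of 3), and the "compiled" form is a Python function rather than real machine code; the point is only the control flow: interpret first, count calls, translate once the code turns out to be hot.

```python
def interpret(expr, x):
    """Walk the expression tree on every call (the slow, interpreted path)."""
    if expr == "x":
        return x
    if isinstance(expr, (int, float)):
        return expr
    op, left, right = expr
    a, b = interpret(left, x), interpret(right, x)
    return a + b if op == "add" else a * b

def to_source(expr):
    """Translate the tree into Python source text (the 'compile' step)."""
    if expr == "x" or isinstance(expr, (int, float)):
        return str(expr)
    op, left, right = expr
    symbol = "+" if op == "add" else "*"
    return f"({to_source(left)} {symbol} {to_source(right)})"

class HotSpotRunner:
    """Interpret at first; compile once the expression proves to be hot."""

    def __init__(self, expr, threshold=3):
        self.expr = expr
        self.threshold = threshold
        self.calls = 0
        self.compiled = None  # filled in lazily, only if the code gets hot

    def run(self, x):
        if self.compiled is not None:
            return self.compiled(x)
        self.calls += 1
        if self.calls >= self.threshold:
            # Hot spot detected: translate once, reuse on later calls.
            self.compiled = eval(f"lambda x: {to_source(self.expr)}")
        return interpret(self.expr, x)

runner = HotSpotRunner(("add", ("mul", "x", 2), 1))  # x * 2 + 1
results = [runner.run(n) for n in range(5)]
print(results)                       # [1, 3, 5, 7, 9]
print(runner.compiled is not None)   # True
```

An expression that is only evaluated once or twice never pays the translation cost, which is the trade-off a JIT makes.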
CIL and CLR in the Microsoft Ecosystem
Let’s discuss CIL and CLR here.

CIL (MSIL)
C# and VB.NET use the .NET Compiler Platform, also known as Roslyn, to compile source code into IL that the runtime can then execute.
Compiled CIL (MSIL) is packaged in two kinds of files, .DLL and .EXE, both of which include metadata.
DLL vs. EXE: DLLs require a host EXE, while EXEs run as independent processes
- EXE: An independent program with its own address space, aimed at executing applications
- DLL: Requires a host and mainly provides methods/classes for use by other applications. It is Microsoft's implementation of shared libraries.
A .NET assembly (the .DLL or .EXE file, not assembly language) includes not only CIL, but also:
- Type Information
- Security Information
Is CIL considered Assembly Language?
- CIL is higher-level (more readable, more abstract)
- CIL is platform-independent (not targeting specific platforms or hardware architectures, cross-platform execution)
But Assembly Language targets specific processors
- Both are intermediate states on the way to machine code
Can CIL be considered closer to bytecode?
Both CIL and bytecode are IRs, both are used by virtual machines, both use JIT, so it can be said that:
CIL is an intermediate language that combines some features of assembly language (human-readable, low-level) and bytecode (platform-independent, designed for virtual machines). It plays a role in the .NET ecosystem similar to Java bytecode in the Java ecosystem.
CLR (Runtime)
CIL (bytecode) is ultimately executed as machine code by the CLR (Common Language Runtime): the CLR's JIT compiler translates IL into machine code, and the CLR also manages the resulting native code in memory.
NGEN: .NET also provides the Native Image Generator (NGEN) tool, which can pre-compile CIL into native code to reduce JIT compilation overhead.
Java: Bytecode vs. Binary Code
In the Java ecosystem, bytecode is executed by the JVM, which interprets it and JIT-compiles frequently used code into machine code.
Outside the Java world, bytecode and binary code have distinctly different meanings.
For the JVM, bytecode is its binary code. As long as a system has a JVM, it can run compiled bytecode. This is why bytecode is often considered binary code in the Java context.
Note: The term "Java interpreter" often refers to the interpretation component within the JVM, not a separate tool that directly converts source code to bytecode.
Bytecode vs. Assembly Language vs. Object Code
All three are intermediate states.
Bytecode vs. Assembly Language
Both bytecode and assembly language are considered Intermediate Representations (IR) that fall between source code and machine code. However, they differ in what they target:
- Bytecode is for software interpretation/execution (e.g., by virtual machines)
- Assembly language is created for hardware execution (e.g., by CPUs)
In short, bytecode is generated for a virtual machine (software), while assembly language is created for a CPU (hardware).
Object Code
Object code can be thought of as an intermediate step in the compilation process:
- Multiple object code files are combined by a linker to produce the final executable machine code
- Each object file leaves placeholders for addresses it does not know yet; the linker fills them in using offsets once it has laid all the files out together
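A toy model can make the placeholder-and-offset idea concrete. This is a sketch under loud assumptions: `ObjectFile`, `link`, and the instruction names are invented for illustration, and real object formats (ELF, COFF, Mach-O) are far richer.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectFile:
    name: str
    code: list                      # instruction words; None marks a placeholder
    symbols: dict = field(default_factory=dict)      # defined symbol -> local offset
    relocations: list = field(default_factory=list)  # (local offset, symbol needed)

def link(objects):
    """Concatenate object files and patch every placeholder with a final address."""
    image, bases, global_symbols = [], {}, {}

    # Pass 1: lay the files out one after another and record where symbols land.
    for obj in objects:
        bases[obj.name] = len(image)
        for symbol, offset in obj.symbols.items():
            global_symbols[symbol] = bases[obj.name] + offset
        image.extend(obj.code)

    # Pass 2: resolve each placeholder using the global symbol table.
    for obj in objects:
        for offset, symbol in obj.relocations:
            if symbol not in global_symbols:
                raise KeyError(f"undefined symbol: {symbol}")
            image[bases[obj.name] + offset] = global_symbols[symbol]
    return image

main_o = ObjectFile(
    name="main.o",
    code=["CALL", None, "HALT"],       # the call target is not known yet
    symbols={"main": 0},
    relocations=[(1, "helper")],
)
helper_o = ObjectFile(name="helper.o", code=["PUSH", 42, "RET"], symbols={"helper": 0})

print(link([main_o, helper_o]))
# ['CALL', 3, 'HALT', 'PUSH', 42, 'RET']
```

The placeholder in main.o becomes 3 because helper.o was placed right after the three words of main.o.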
JVM vs. CLR and CLR Implementations
JVM vs. CLR
JVM (Java Virtual Machine) and CLR (Common Language Runtime) are similar concepts. Both are runtime environments for executing bytecode.
- JVM is for Java
- CLR is for .NET
CLR Implementations
.NET Framework was released in 2002 and runs only on Windows. .NET Core was released in 2016 and is cross-platform.
CLR is a concept with three main implementations:
- CoreCLR: The runtime for .NET Core
- .NET Framework’s CLR:
.NET Framework version | CLR version |
.NET Framework 4 and later | 4 |
.NET Framework 3 | 2 |
.NET Framework 2 | 2 |
.NET Framework 1.1 | 1.1 |
- Mono runtime: Originally an independent project, later developed by Xamarin, which has since been acquired by Microsoft (the Xamarin CLR is based on the Mono CLR)
- Mono runtime doesn't have a specific name
- Unity was based on Mono but is now extending towards lower levels, potentially moving away from Mono
- For Android, Xamarin converts code to IL (Intermediate Language), then uses Mono runtime's JIT Compiler
- For iOS, it uses AOT (Ahead-Of-Time) compilation, similar to UWP
(before iOS 14.2, Apple didn’t accept JIT)
UWP uses AOT, not JIT
UWP (Universal Windows Platform) doesn't use JIT compilation. In the .NET ecosystem:
- High-level languages are compiled to IL
- UWP then compiles that IL ahead of time instead of JIT-compiling it at runtime
Additional note: .NET Native is a specialized AOT compiler for UWP
technologies | mode |
.NET Native | AOT(Ahead-of-Time) |
.NET Framework CLR / CoreCLR / Mono CLR | JIT |
Machine Code vs. Native Code vs. Managed Code vs. Unmanaged Code
managed code (has its own environment and context to work in)
C#, VB.NET, and Java, which are executed in their own VMs (e.g. the .NET CLR and the JVM). Platforms that understand the IL convert it to machine code.
How to remember? "Managed" means it needs extra management, which also provides garbage collection.
unmanaged code
C and C++, which are compiled directly into machine code. Programmers need to handle more of the dirty work themselves.
native code
Native code is compiled for a specific hardware architecture.
machine code
A broader concept.
If the computer can understand the code directly, then it is machine code.
unmanaged code & native code are interchangeable
These two terms are interchangeable because both describe code that works directly with the hardware.
machine code & native code are interchangeable
These two terms can be interchangeable depending on the context, because both are relative.
For example, native code is designed to run directly on specific hardware, so from the perspective of that particular environment it is almost machine code.
Both are the last step for that context or hardware.
If we are only talking about what the computer can understand, we generally call it machine code, whatever language it came from.
This relative nature of the terms explains why they are sometimes used interchangeably, especially when discussing code execution in a specific hardware environment.
Assembly vs. LLVM IR vs. IL
First, IR vs. IL
- IR (Intermediate Representation): Typically refers to an internal form used by compilers.
- IL (Intermediate Language): Usually refers to an intermediate form closer to high-level languages, retaining more of the source language's structure.
However, these terms are often used interchangeably. For example, LLVM uses IR, while .NET uses IL.
Assembly vs. LLVM IR
LLVM IR retains more of the original high-level language's concepts.
Assembly has very clear and specific instructions that closely match the machine's architecture.
From high to low: IL > LLVM IR > Assembly > machine code
JIT vs. AOT
Just in Time vs. Ahead of Time
By language
JIT: Compiles code during execution, like JavaScript.
AOT: Compiles all code before running, like C.
By mode: dev vs. prod
Production (AOT):
- Faster, smaller bundle
- More work for server, less for client
- Example: vue-loader uses AOT
Development (JIT):
- Not bundled together, files packaged separately and dynamically
- Easier for development, lighter on CPU, but more work for browser
Modes in Angular
JIT:
platformBrowserDynamic().bootstrapModule(AppModule)
AOT:
platformBrowser().bootstrapModuleFactory(AppModuleNgFactory)
Do static languages equal AOT?
Many static languages use AOT, but it's not a rule.
Exceptions:
- Static languages can use JIT: Java is JIT-compiled inside its VM.
- Dynamic languages can use AOT: Python can be compiled ahead of time with Cython.
language | AOT | JIT | notes |
Java | ✅ | ✅ | |
C# | ✅ | ✅ | JIT by default (.NET CLR); AOT available (.NET Native) |
JavaScript | ✅ | ✅ | JIT in browsers, Node.js; AOT with tools like Closure Compiler |
Vue | ✅ | | 1. vue-loader performs AOT compilation 2. JIT-like behavior: Vue 3 introduced a new feature which optimizes the rendering process at runtime |
Angular | ✅ | ✅ | |
ReactJs | | | Depends on the ecosystem: 1. Babel and webpack do AOT-like work 2. Next.js SSR is similar to AOT |
Python | ✅ | ✅ | Interpreted by default; JIT with Numba and PyPy; AOT with Cython |
C/C++ | ✅ | | |
Rust | ✅ | | |
Go | ✅ | | |
That’s why Vue’s initial development startup is slow and fails on any error up front, whereas React’s JIT-style behavior only shows an error once you route to the page that contains it.
Do interpreted languages equal dynamic languages?
language | language kind | type checking |
Python | interpreted | dynamic |
JavaScript | interpreted | dynamic |
Java | compiled | static |
C++ | compiled | static |
Exceptions | | |
Haskell, OCaml | interpreted | static |
Erlang | compiled | dynamic |
Correlation is not causation.
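One way to see that the two axes are independent is to look at Python itself: it has a real compile step, yet type checking still happens only at runtime. A minimal sketch (the two-line program and the variable names are made up for illustration):

```python
source = 'value = "1"\ntotal = value + 1'

# Compiling to bytecode succeeds: no type checking happens at this stage.
code = compile(source, "<example>", "exec")

# The type error only surfaces when the bytecode is actually executed.
try:
    exec(code, {})
    outcome = "ok"
except TypeError as error:
    outcome = f"TypeError at runtime: {error}"

print(outcome)
```

Having a compiler says nothing about when types are checked; that is a property of the language's type system.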
Big thanks to these resources
https://www.spreered.com/compiler_for_dummies/
https://stackoverflow.com/questions/1210873/difference-between-dll-and-exe
https://stackoverflow.com/questions/11701063/is-cil-an-assembly-language-and-jit-an-assembler
https://www.geeksforgeeks.org/difference-between-byte-code-and-machine-code/
https://techterms.com/definition/bytecode
https://stackoverflow.com/questions/466790/assembly-code-vs-machine-code-vs-object-code
https://www.geeksforgeeks.org/language-processors-assembler-compiler-and-interpreter/
JVM, CLR
https://pediaa.com/what-is-the-difference-between-jvm-and-clr/
https://stackify.com/net-ecosystem-runtime-tools-languages/
https://blog.csdn.net/yinfourever/article/details/108258319
https://niraj-vishwakarma.medium.com/how-unity-supports-cross-platform-feature-ae722321cfa
Angular
https://medium.com/@Sujithnath/angular-aot-vs-jit-comparison-ce1d96ede491