@jialin.huang

© 2024 jialin00.com

Original content since 2022

Broad Translators

What you'll discover

  • The spectrum of languages, from high abstractions to low machine instructions
  • A deep look at many translators, including preprocessors, compilers, and linkers
  • Dive into Microsoft's ecosystem, with a focus on the Common Language Runtime (CLR)
  • Comparisons of key terms: bytecode, binary code, machine code, managed code, and unmanaged code
  • The distinctions between Intermediate Language (IL), Intermediate Representation (IR), and Assembly
  • A table summarizing which languages support Just-In-Time (JIT) or Ahead-of-Time (AOT) compilation
  • The relationship between JIT/AOT compilation and static/dynamic typing in programming languages. Correlation || causation?

Languages from High to Low

high-level language

Usually what we write in

assembly language

  • Human readable
  • Some developers write assembly language for optimization

machine language

  • Readable by computers
  • Also known as binary language

The term "source code" depends on context; it can refer to high-level language or assembly language.

Broad Definition of Translators

Broadly, a translator is any tool that converts code from one form to another:

  1. preprocessor
  2. compiler
  3. assembler
  4. interpreter

compiler

A compiler translates source code into another form, which is not necessarily an executable file. For example, Java compiles to bytecode for its JVM.

Interpreter

Unlike a compiler, an interpreter reads and executes the program as it goes, without creating an executable file.

An interpreter directly executes instructions written in a programming or scripting language without first converting them to object code or machine code.

assembler

Translates assembly language into machine language.

Preprocessor → Compiler → Assembler → Linker

Tools and Their Associated Languages

  • Preprocessor: Handles preprocessing directives such as
    • #include
    • #define
    • Required for C and C++
  • Compiler: Converts preprocessed code to a lower-level form, usually assembly language. Needed for most compiled languages.
    1. GCC for C, C++
    2. rustc for Rust
      1. Rust → IR → assembly
    3. javac for Java (emits JVM bytecode rather than assembly)
    4. MSVC for C, C++ (Windows)
  • Assembler: Converts assembly code to object files. Low-level languages like assembly start here.
  • Linker: Links multiple object files and libraries into an executable file.

Required Tools for Each Programming Language

Tools each language requires:

  • C/C++: Needs all four
  • Java: The compiler converts source code to bytecode; no assembler or linker is needed
    1. Java source code
    2. Java compiler

      Turns .java into .class

    3. Java bytecode
    4. JVM execution (Class Loader → Bytecode Verifier → Interpreter/JIT Compiler)

      Turns .class into machine code

    5. Machine code
  • Python: Usually needs only an interpreter, none of the tools above
    • The most common implementation is CPython, which is written in C. Other implementations include Jython (Java) and IronPython (.NET)

    Source is parsed into an AST, the AST is compiled to bytecode (an implicit step), and then the bytecode is interpreted (executed)

  • Assembly: Starts at the assembler; no preprocessor or compiler is needed

    Assembly code is assembled, linked, loaded, and then executed
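The Python steps listed above (parse to AST, compile to bytecode, execute) can be reproduced by hand with the standard library. This is only a rough sketch of what CPython does implicitly; the source string and the `<example>` filename are made up for illustration.

```python
import ast

source = "result = (2 + 3) * 4"

# Step 1: parse the source text into an abstract syntax tree (AST).
tree = ast.parse(source, filename="<example>", mode="exec")

# Step 2: compile the AST into a code object, which holds the bytecode.
code_obj = compile(tree, filename="<example>", mode="exec")

# Step 3: hand the code object to the interpreter for execution.
namespace = {}
exec(code_obj, namespace)

print(type(tree).__name__)      # Module
print(type(code_obj).__name__)  # code
print(namespace["result"])      # 20
```

Normally all three steps happen behind the scenes when you run `python script.py`; splitting them apart just makes the hidden bytecode step visible.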

Interpreted Languages and Tools

Directly translates and executes source code step by step.

Examples: Python, Ruby, PHP, Perl, Lua

JavaScript engines such as V8 also use JIT compilation.

These languages usually don't need preprocessors or assemblers, but they may have preprocessor-like concepts (similar in spirit to macro expansion):

  • Python: Module imports, decorators
  • Ruby: Module imports, metaprogramming mechanisms
  • JavaScript: Module imports (ES6+), transpilers like Babel

We don't need to know exactly how this works, but the underlying mechanisms do have functions similar to an assembler:

  • Virtual Machines: use VM to execute code.
    • CPython: Python uses the Python Virtual Machine (PVM)
    • Java uses the Java Virtual Machine (JVM).

    These virtual machines may internally use Assembly Language to implement certain functionalities or optimizations.

  • JIT Compilation: Some interpreters use JIT compilation techniques to compile frequently used code segments into machine code. This process may involve optimizations at the Assembly Language level.
    e.g., Modern JavaScript engines use JIT compilation, compiling hot spots of code into machine code at runtime.
  • Bytecode: Many interpreted languages first compile source code into bytecode, which the VM then interprets and executes. Bytecode can be viewed as a high-level form of Assembly Language.
    e.g., Python primarily uses an interpreter for execution, but it first compiles source code into bytecode and caches it in .pyc files so that later imports can skip recompilation.
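To see why bytecode reads like a high-level assembly language, CPython's standard `dis` module can list the instructions behind a function. The function `add_tax` is a made-up example, and the exact opcode names differ between Python versions, so no fixed output is shown here.

```python
import dis

def add_tax(price):
    # Hypothetical function used only to have something to disassemble.
    return price * 1.05

# Collect the opcode names of the bytecode that CPython generated.
instructions = [ins.opname for ins in dis.get_instructions(add_tax)]
print(instructions)

# dis.dis(add_tax) prints the same information as a formatted listing.
```

Each entry is one VM instruction (load a value, multiply, return), which is roughly the level of detail an assembly listing gives for a real CPU.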

Which role in translators is JIT technology closest to?

A compiler. JIT translates code into machine code like a compiler does, but it does so at runtime and only for the parts that actually need to be executed.
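The idea of "interpret first, compile only what gets hot" can be sketched with a toy. Everything here is hypothetical (the tuple-based expression tree, `HOT_THRESHOLD`, `HotSpotRunner`), and the "compiled" form is just a Python function built with `compile`, not real machine code as a production JIT would emit.

```python
HOT_THRESHOLD = 3  # hypothetical cutoff; real engines tune this heuristically

def interpret(node, x):
    """Walk the expression tree on every call (the slow path)."""
    kind = node[0]
    if kind == "const":
        return node[1]
    if kind == "var":
        return x
    left, right = interpret(node[1], x), interpret(node[2], x)
    return left + right if kind == "add" else left * right

def to_source(node):
    """Translate the tree into a Python expression string."""
    kind = node[0]
    if kind == "const":
        return repr(node[1])
    if kind == "var":
        return "x"
    op = "+" if kind == "add" else "*"
    return f"({to_source(node[1])} {op} {to_source(node[2])})"

class HotSpotRunner:
    def __init__(self, tree):
        self.tree = tree
        self.calls = 0        # counts interpreted calls only
        self.compiled = None  # filled in once the code is considered hot

    def run(self, x):
        if self.compiled is not None:
            return self.compiled(x)  # fast path: already compiled
        self.calls += 1
        if self.calls >= HOT_THRESHOLD:
            # "JIT" step: compile the hot expression once, reuse it afterwards.
            src = f"lambda x: {to_source(self.tree)}"
            self.compiled = eval(compile(src, "<jit>", "eval"))
        return interpret(self.tree, x)

tree = ("add", ("mul", ("var",), ("const", 2)), ("const", 1))  # x * 2 + 1
runner = HotSpotRunner(tree)
results = [runner.run(i) for i in range(5)]
print(results)  # [1, 3, 5, 7, 9]
```

The first three calls go through the tree-walking interpreter; from the fourth call on, the precompiled function is used instead.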

CIL and CLR in the Microsoft Ecosystem

Let’s discuss CIL and CLR here.

CIL (MSIL)

C# and VB.NET use the .NET Compiler Platform, also known as Roslyn, to compile source code into IL at build time; the CLR then consumes that IL at runtime.

Compiled CIL (MSIL) is packaged, together with metadata, into one of two file types: .DLL or .EXE

DLL vs. EXE: DLLs require a host EXE, while EXEs run as independent processes

  • EXE: An independent program with its own address space, aimed at executing applications
  • DLL: Requires a host; mainly provides methods/classes for use by other applications. It is Microsoft's implementation of shared libraries.

A .NET assembly (the compiled .DLL or .EXE, not to be confused with assembly language) includes not only CIL but also:

  • Type Information
  • Security Information

Is CIL considered Assembly Language?

  1. CIL is higher-level (more readable, more abstract)
  2. CIL is platform-independent (it does not target a specific platform or hardware architecture, so it can execute cross-platform)

    Assembly Language, in contrast, targets specific processors

  3. Both are intermediate states on the way to machine code

Can CIL be considered closer to bytecode?

Both CIL and bytecode are IRs, both are used by virtual machines, both use JIT, so it can be said that:

CIL is an intermediate language that combines some features of assembly language (human-readable, low-level) and bytecode (platform-independent, designed for virtual machines). It plays a role in the .NET ecosystem similar to Java bytecode in the Java ecosystem.

CLR (Runtime)

CIL (bytecode) is eventually executed as machine code by the CLR (Common Language Runtime): the CLR's JIT compiler translates IL into native machine code, and the CLR also manages that translated native code in memory.

NGEN: .NET also provides the Native Image Generator (NGEN) tool, which can pre-compile CIL into native code to reduce JIT compilation overhead.

Java: Bytecode vs. Binary Code

In the Java ecosystem, bytecode is typically executed by the Java interpreter inside the JVM, with frequently executed code translated into machine code by the JIT compiler.

Outside the Java world, bytecode and binary code have distinctly different meanings.

For the JVM, bytecode is its binary code. As long as a system has a JVM, it can run compiled bytecode. This is why bytecode is often considered binary code in the Java context.

Note: The term "Java interpreter" often refers to the interpretation component within the JVM, not a separate tool that directly converts source code to bytecode.

Bytecode vs. Assembly Language vs. Object Code

All three are intermediate states.

Bytecode vs. Assembly Language

Both bytecode and assembly language can be considered Intermediate Representations (IR) that fall between source code and machine code. However, there is one big difference:

  • Bytecode is generated for a virtual machine (software), which interprets or executes it
  • Assembly language is written for a specific CPU (hardware) and maps closely to that CPU's machine instructions
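A minimal sketch of "bytecode for a virtual machine": the opcodes and the `run` loop below are invented for illustration and do not correspond to the JVM's or CPython's real instruction sets. The point is that the program is just a list of numbers that only this piece of software knows how to execute.

```python
# Hypothetical opcodes for a tiny stack-based virtual machine.
PUSH, ADD, MUL, HALT = range(4)

def run(program):
    """Fetch, decode, and execute instructions until HALT."""
    stack = []
    pc = 0  # program counter: index of the next instruction
    while True:
        op = program[pc]
        if op == PUSH:
            stack.append(program[pc + 1])  # operand follows the opcode
            pc += 2
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            pc += 1
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
            pc += 1
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op} at position {pc}")

# Bytecode for (2 + 3) * 4
program = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT]
print(run(program))  # 20
```

A CPU does the same fetch-decode-execute cycle in hardware on machine code; here the cycle is an ordinary loop, which is what makes the bytecode portable to any machine that can run the VM.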

Object Code

Object code can be thought of as an intermediate step in the compilation process:

  1. A linker combines multiple object code files to produce an executable
  2. The linker resolves the placeholders and offsets within the object code to connect everything together
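The placeholder-and-offset idea can be sketched with a toy linker. The "object file" layout here (a `code` list where `"@name"` marks an unresolved reference, plus a `defines` map of symbol offsets) is an assumption made up for this example; real formats such as ELF or COFF are far more involved.

```python
def link(objects):
    """Combine toy object files into one image, resolving symbol placeholders."""
    # Pass 1: decide where each object lands and record final symbol addresses.
    symbol_table = {}
    base = 0
    for obj in objects:
        for name, offset in obj["defines"].items():
            symbol_table[name] = base + offset
        base += len(obj["code"])

    # Pass 2: copy the code, replacing each "@symbol" placeholder with its address.
    image = []
    for obj in objects:
        for word in obj["code"]:
            if isinstance(word, str) and word.startswith("@"):
                name = word[1:]
                if name not in symbol_table:
                    raise KeyError(f"undefined symbol: {name}")
                image.append(symbol_table[name])
            else:
                image.append(word)
    return image, symbol_table

# Two hypothetical object files: main calls helper, which lives in another file.
main_o = {"code": ["CALL", "@helper", "RET"], "defines": {"main": 0}}
util_o = {"code": ["NOP", "LOAD", "RET"], "defines": {"helper": 1}}

image, symbols = link([main_o, util_o])
print(image)    # ['CALL', 4, 'RET', 'NOP', 'LOAD', 'RET']
print(symbols)  # {'main': 0, 'helper': 4}
```

`helper` sits at offset 1 inside the second object, which is placed after the three words of the first, so its final address is 4.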

JVM vs. CLR and CLR Implementations

JVM vs. CLR

JVM (Java Virtual Machine) and CLR (Common Language Runtime) are similar concepts. Both are runtime environments for executing bytecode.

  • JVM is for Java
  • CLR is for .NET

CLR Implementations

.NET Framework was released in 2002 and only works on Windows. .NET Core was released in 2016 and is cross-platform.

CLR is a concept with three main implementations:

  1. CoreCLR: The runtime for .NET Core
  2. .NET Framework’s CLR:

    | .NET Framework version | CLR version |
    | --- | --- |
    | .NET Framework 4 and later | 4 |
    | .NET Framework 3 | 2 |
    | .NET Framework 2 | 2 |
    | .NET Framework 1.1 | 1.1 |

  3. Mono Runtime: Originally an independent project; its developers later formed Xamarin, which has since been acquired by Microsoft (the Xamarin CLR is based on the Mono CLR)
    • The Mono runtime doesn't have a specific name
    • Unity was based on Mono but is now extending towards lower levels, potentially moving away from Mono
    • For Android, Xamarin compiles code to IL (Intermediate Language), then uses the Mono runtime's JIT compiler
    • For iOS, it uses AOT (Ahead-Of-Time) compilation, similar to UWP

      (before iOS 14.2, Apple didn’t accept JIT)

UWP uses AOT, not JIT

UWP (Universal Windows Platform) doesn't use JIT compilation. In the .NET ecosystem:

  • High-level languages are compiled to IL
  • UWP then compiles that IL ahead of time (AOT) instead of JIT-compiling it at runtime

Additional Note: .NET Native is a specialized AOT Compiler for UWP

| technology | mode |
| --- | --- |
| .NET Native | AOT (Ahead-of-Time) |
| .NET Framework CLR / CoreCLR / Mono CLR | JIT |

Machine Code vs. Native Code vs. Managed Code vs. Unmanaged Code

managed code (has its own environment and context to run in)

C#, VB.NET, and Java are executed in their own VMs (e.g., the .NET CLR and the JVM). A platform that understands the IL converts it to machine code.

How to remember? "Managed" means the code needs a runtime to manage it, and that runtime also provides services such as garbage collection.

unmanaged code

C and C++, which are compiled directly into machine code. Programmers need to handle more of the dirty work themselves, such as memory management.

native code

Native code is compiled for a specific hardware architecture.

machine code

Broader concept.

If the computer can understand the code directly, then it’s machine code.

unmanaged code & native code are interchangeable

These two terms are interchangeable because both describe code that works directly with the hardware.

machine code & native code are interchangeable

This pair can be interchangeable depending on the context, due to their relative nature.
For example, native code is designed to run directly on specific hardware,
so from the perspective of that particular environment, it is almost machine code.

Both are the last step for that (context || hardware)

If we're just talking about what the computer can understand, we generally call it machine code for all languages when it's in a form the computer can understand.

This relative nature of the terms explains why they are sometimes used interchangeably, especially when discussing code execution in a specific hardware environment.

assembly vs. LLVM IR vs. IL

First, IR vs. IL

  • IR (Intermediate Representation): Typically refers to an internal form used by compilers.
  • IL (Intermediate Language): Usually refers to an intermediate form closer to high-level languages.

However, these terms are often used interchangeably. For example, LLVM uses IR, while .NET uses IL.

IL also tends to retain more of the source language's structure.

Assembly vs. LLVM IR

LLVM IR tends to retain more of the original high-level language's concepts.

Assembly has very clear and specific instructions that closely match the machine's architecture.

from high to low: IL > LLVM IR > Assembly > machine code

JIT vs. AOT

Just in Time vs. Ahead of Time

for languages

JIT: Like JavaScript, compiles code during execution.

AOT: Like C, compiles all code before running.

for modes: dev vs. prod

Production (AOT):

  • Faster, smaller bundle
  • More work for the server, less for the client
  • Example: vue-loader uses AOT

Development (JIT):

  • Not bundled together; files are packaged separately and dynamically
  • Easier for development and lighter on the CPU at build time, but more work for the browser

mode in Angular

JIT:

platformBrowserDynamic().bootstrapModule(AppModule)

AOT:

platformBrowser().bootstrapModuleFactory(AppModuleNgFactory)

Static languages equal AOT?

Many static languages use AOT, but it's not a rule.

Exceptions:

  1. Static languages can use JIT: Java, for example, uses JIT in its VM.
  2. Dynamic languages can use AOT: Python, for example, has Cython.

| language | AOT | JIT |
| --- | --- | --- |
| Java | | |
| C# | Available (.NET Native) | By default (.NET CLR) |
| JavaScript | With tools like Closure Compiler | In browsers, Node.js |
| Vue | vue-loader is indeed AOT | Vue 3 introduced a new feature which optimizes the rendering process at runtime |
| Angular | | |
| ReactJS | Depends on the ecosystem: Babel and webpack do AOT things; Next.js SSR is similar to AOT | |
| Python | With Cython | With Numba and PyPy (interpreted by default) |
| C/C++ | | |
| Rust | | |
| Go | | |

That’s why Vue’s initial development startup is slow and can’t tolerate any errors, whereas React’s JIT will only show errors if you route to a page with errors.

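Interpreted languages equal dynamic languages?

| language | language kind | type checking |
| --- | --- | --- |
| Python | interpreted | dynamic |
| JavaScript | interpreted | dynamic |
| Java | compiled | static |
| C++ | compiled | static |
| Exceptions | | |
| Haskell, OCaml | interpreted (e.g., via GHCi or the OCaml toplevel) | static |
| Erlang | compiled | dynamic |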

Correlation is not causation.
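To make the "dynamic" entries in the type checking column concrete, here is a small sketch in Python. The function `describe` is made up for illustration; it contains a type error that goes unnoticed until the faulty line actually runs, which is exactly what a static type checker would reject up front.

```python
def describe(flag):
    if flag:
        # Type mismatch (str + int). Python only checks this when the line runs.
        return "total: " + 5
    return "ok"

print(describe(False))  # ok

caught = None
try:
    describe(True)
except TypeError as err:
    caught = type(err).__name__
    print(caught)  # TypeError
```

Note that this behavior comes from the language's type system, not from whether the implementation interprets or compiles the code, which is the point of the table.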

Big thanks to these resources

https://www.spreered.com/compiler_for_dummies/

https://stackoverflow.com/questions/1210873/difference-between-dll-and-exe

https://stackoverflow.com/questions/11701063/is-cil-an-assembly-language-and-jit-an-assembler

https://www.geeksforgeeks.org/difference-between-byte-code-and-machine-code/

https://techterms.com/definition/bytecode

https://stackoverflow.com/questions/466790/assembly-code-vs-machine-code-vs-object-code

https://www.geeksforgeeks.org/language-processors-assembler-compiler-and-interpreter/

JVM, CLR

https://pediaa.com/what-is-the-difference-between-jvm-and-clr/

https://stackify.com/net-ecosystem-runtime-tools-languages/

https://blog.csdn.net/yinfourever/article/details/108258319

https://stackoverflow.com/questions/34987202/net-framework-net-core-net-native-dnx-core-clr-cil-pcl-simple-explain

https://www.vskills.in/certification/tutorial/net-technology-framework-and-common-language-runtimeclr/

https://niraj-vishwakarma.medium.com/how-unity-supports-cross-platform-feature-ae722321cfa

https://stackoverflow.com/questions/3434202/what-is-the-difference-between-native-code-machine-code-and-assembly-code

Angular

https://levelup.gitconnected.com/just-in-time-jit-and-ahead-of-time-aot-compilation-in-angular-8529f1d6fa9d

https://medium.com/@Sujithnath/angular-aot-vs-jit-comparison-ce1d96ede491
