TON Virtual Machine
The Open Network Virtual Machine (TVM or TON VM) is a virtual machine designed to execute smart contracts on the TON blockchain. It is the basis for transaction processing and network state management. Developers use TVM tools to create decentralized applications.
TVM processes incoming messages and persistent data, creates new messages, and modifies data. The requirements for TVM include support for future extensions while maintaining backward compatibility, high code density to save space on the blockchain, and complete determinism (every run of the same code on the same input must produce the same result).
History and Evolution of TVM
TON Virtual Machine was developed by the Telegram team in 2018. Since then, it has evolved through several key milestones:
- First version of TVM (2018): introduced alongside the launch of the TON test network. This version focused on smart contract functionality and its integration into the TON blockchain.
- March 2020: detailed technical documentation on TVM was published, describing the design of the virtual machine, including its supported data types and the optimization of operations.
- Main network launch: after testing and refinement, the TON main network was launched in 2020. At this point, TVM already supported more complex operations and provided fully deterministic code execution.
TVM Basics
TVM uses two notations for representing bit strings:
- Hexadecimal notation. A bit string whose length is a multiple of four is split into groups of four bits, each written as a hexadecimal digit from 0 to F. If the length is not a multiple of four, the string is first padded with a single binary 1 followed by as many binary 0s as needed to reach the next multiple of four, and a special completion tag (_) is appended to the hexadecimal string to indicate the modification.
- Serialization to a sequence of octets. To represent a bit string as bytes, it is split into groups of eight bits. If its length is not a multiple of eight, the string is first padded in the same way so that it divides evenly into octets.
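Both notations can be illustrated with a short Python sketch. It assumes the padding rule of a single binary 1 followed by binary 0s, with `_` as the completion tag; the function names `to_hex_notation` and `to_octets` are hypothetical and chosen only for this example.

```python
def to_hex_notation(bits: str) -> str:
    """Render a string of '0'/'1' characters in hexadecimal notation."""
    tag = ""
    if len(bits) % 4 != 0:
        # Pad with a single 1, then 0s up to the next multiple of four,
        # and mark the result with the completion tag.
        bits += "1"
        bits += "0" * (-len(bits) % 4)
        tag = "_"
    digits = [format(int(bits[i:i + 4], 2), "X") for i in range(0, len(bits), 4)]
    return "".join(digits) + tag


def to_octets(bits: str) -> bytes:
    """Pad a bit string to a whole number of bytes and pack it."""
    if len(bits) % 8 != 0:
        # Same padding idea, but up to the next multiple of eight.
        bits += "1"
        bits += "0" * (-len(bits) % 8)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


print(to_hex_notation("10100101"))  # A5  (length already a multiple of four)
print(to_hex_notation("101"))       # B_  (padded to 1011, tag added)
print(to_octets("101").hex())       # b0  (padded to 10110000)
```

Because the padding always starts with a 1, a reader of the tagged form can recover the original string by dropping the trailing 0s and the last 1.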
TVM as a Stack machine
TVM functions as a stack machine. A stack machine is a model of computation in which data is stored in a stack-type data structure (last in, first out) rather than in variables or registers. Basic operations, including arithmetic operations, retrieve arguments from the stack and return the result there.
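The stack discipline can be shown with a toy interpreter in Python. This is only a sketch of the general stack-machine model: the mnemonics PUSHINT, ADD, SUB, MUL, and SWAP are borrowed for readability, while the `run_stack_program` function and the tuple-based program format are assumptions made for this example and do not reflect how TVM encodes or executes code.

```python
def run_stack_program(program):
    """Execute a list of (mnemonic, operand) pairs on a fresh stack."""
    stack = []
    for op, arg in program:
        if op == "PUSHINT":
            stack.append(arg)            # constant goes on top of the stack
        elif op == "ADD":
            y, x = stack.pop(), stack.pop()
            stack.append(x + y)          # result replaces both arguments
        elif op == "SUB":
            y, x = stack.pop(), stack.pop()
            stack.append(x - y)
        elif op == "MUL":
            y, x = stack.pop(), stack.pop()
            stack.append(x * y)
        elif op == "SWAP":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack


# (2 + 3) * 4 written without any named variables
program = [
    ("PUSHINT", 2),
    ("PUSHINT", 3),
    ("ADD", None),
    ("PUSHINT", 4),
    ("MUL", None),
]
print(run_stack_program(program))  # [20]
```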
Value types in TVM
TVM operates with predefined value types:
- Integer: 257-bit signed integers.
- Cell: objects containing up to 1023 bits of data and up to 4 references to other cells.
- Tuple: ordered collections of values, possibly of different types.
- Slice: read-only views of a cell or of a part of it, containing some of its data and references; used to read data out of cells.
- Builder: incomplete cells that support appending data and references; used to serialize data into new cells.
- Continuation: objects that represent the continuation points of an execution.
- Null: a type representing the absence of a value, used to represent empty or uninitialized values.
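How builders, cells, and slices relate to each other can be sketched in Python. The model below is heavily simplified: a cell is reduced to an immutable bit string, references are omitted, and the class and method names (`Builder`, `Slice`, `store_uint`, `end_cell`, `load_uint`) are assumptions chosen for this illustration rather than an actual TVM interface.

```python
MAX_CELL_BITS = 1023


class Builder:
    """Accumulates bits that will become the data of a new cell."""

    def __init__(self):
        self.bits = ""

    def store_uint(self, value: int, width: int) -> "Builder":
        if width < 1 or not 0 <= value < (1 << width):
            raise ValueError("value does not fit into the requested width")
        if len(self.bits) + width > MAX_CELL_BITS:
            raise OverflowError("cell data is limited to 1023 bits")
        self.bits += format(value, f"0{width}b")
        return self

    def end_cell(self) -> str:
        # In this sketch a finished cell is just its bit string.
        return self.bits


class Slice:
    """Read cursor over the data of an existing cell."""

    def __init__(self, cell_bits: str):
        self.bits = cell_bits
        self.pos = 0

    def load_uint(self, width: int) -> int:
        if width < 1 or self.pos + width > len(self.bits):
            raise ValueError("not enough bits left in the slice")
        chunk = self.bits[self.pos:self.pos + width]
        self.pos += width
        return int(chunk, 2)

    def remaining_bits(self) -> int:
        return len(self.bits) - self.pos


cell = Builder().store_uint(5, 3).store_uint(255, 8).end_cell()
s = Slice(cell)
print(s.load_uint(3), s.load_uint(8), s.remaining_bits())  # 5 255 0
```

The split into a write-only builder and a read-only slice mirrors the roles described in the list above: data is serialized once into a cell and later unpacked field by field.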
TVM Instructions
TVM instructions are categorized by the type of operations:
- Stack and tuple primitives (stack reordering and tuple manipulation)
- Constant primitives (inserting predefined values into the stack)
- Arithmetic primitives (performing standard arithmetic operations)
- Cell primitives (creating new cells and accessing data in existing cells)
- Flow Control and Continuation Primitives (controlling the flow of program execution)
- Custom Primitives (specific operations required for TON applications)
TVM Cells
TVM cells are the basic elements of memory and persistent storage. Each cell contains up to 1023 bits of data and up to 4 references to other cells, so linked cells form a directed acyclic graph (DAG). All blockchain data is represented as collections of cells, which simplifies storage and processing.
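The size limits and the DAG shape can be modeled with a small Python sketch. The `Cell` class and the `count_reachable` helper are hypothetical names for this example, and the representation (a string of '0'/'1' characters plus a tuple of child cells) is an assumption that ignores real serialization and hashing.

```python
from dataclasses import dataclass

MAX_BITS = 1023
MAX_REFS = 4


@dataclass(frozen=True)
class Cell:
    bits: str = ""
    refs: tuple = ()

    def __post_init__(self):
        if any(b not in "01" for b in self.bits):
            raise ValueError("cell data must consist of bits")
        if len(self.bits) > MAX_BITS:
            raise ValueError("a cell holds at most 1023 bits")
        if len(self.refs) > MAX_REFS:
            raise ValueError("a cell holds at most 4 references")


def count_reachable(root: Cell) -> int:
    """Count distinct cell objects reachable from root."""
    seen = set()
    pending = [root]
    while pending:
        cell = pending.pop()
        if id(cell) in seen:
            continue
        seen.add(id(cell))
        pending.extend(cell.refs)
    return len(seen)


leaf = Cell("1010")
left = Cell("1", (leaf,))
right = Cell("0", (leaf,))          # shares the same leaf as `left`
root = Cell("", (left, right))
print(count_reachable(root))        # 4
```

Because each cell here is immutable and can only reference cells that already exist, a cycle cannot be constructed, while sharing a child between two parents (as with `leaf`) is allowed. That is what makes the structure a DAG rather than a plain tree.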
Cell Types
- Ordinary cells (type −1): standard cells containing data and references.
- Exotic cells: have a type from 1 to 255 and special rules for deserialization and hashing. These include pruned branch cells, library reference cells, and Merkle cells.
Flow control, Continuations and Exceptions in TVM
Continuations in TVM are the central mechanism for controlling the flow of program execution. They are used for subroutine calls, conditional and iterative operations, and exception handling. A continuation is a saved execution point that can be activated later, allowing program execution to continue from the saved state.
Types of Continuations
- Ordinary continuations: contain the executable code, a stack, a list of saved control register values, and the code page used to interpret instructions.
- Simple continuations: characterized by code and code page (cp) only, with no additional stack data.
- Current continuation (cc): represents the code that is currently being executed. It is a key component of the TVM state and defines the current operation.
TVM repeatedly decodes the next instruction from the current continuation, executes it, and updates the program state, until the continuation runs out of code or a return instruction is encountered.
Continuations are controlled through the JMP and RET instructions, which allow switching between continuations by passing control and necessary data.
Exceptions in TVM are managed by special continuations that are activated when errors or specific conditions occur. These continuations handle exceptions by providing error parameters for further action.
TVM includes conditional and iteration primitives such as IF, WHILE, and REPEAT. They enrich the language with constructs to control code execution in response to dynamically changing conditions.
Continuations can be used to mimic object-oriented structures, where each object is represented by a continuation, and the object's methods are activated through these continuations.
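The way control passes between continuations on a call, a jump, and a return can be sketched with a toy interpreter in Python. It is a deliberately reduced model under several assumptions: a continuation is just a list of instructions, the only control register modeled is a return continuation called `c0`, and the stack is shared by all continuations. The mnemonics PUSHCONT, CALLX, JMPX, and RET are used as labels; the function `run_with_continuations` and the program format are hypothetical.

```python
def run_with_continuations(code):
    stack = []
    cc = list(code)   # code of the current continuation
    c0 = None         # return continuation as (code, saved_c0); None means "stop"
    while True:
        if not cc:
            op, arg = "RET", None          # running out of code acts as a return
        else:
            (op, arg), cc = cc[0], cc[1:]  # decode one instruction, advance cc
        if op == "PUSHINT":
            stack.append(arg)
        elif op == "ADD":
            y, x = stack.pop(), stack.pop()
            stack.append(x + y)
        elif op == "PUSHCONT":
            stack.append(list(arg))        # a continuation is an ordinary value
        elif op == "CALLX":
            target = stack.pop()
            c0 = (cc, c0)                  # remember where to come back to
            cc = target
        elif op == "JMPX":
            cc = stack.pop()               # switch without saving a return point
        elif op == "RET":
            if c0 is None:
                return stack
            cc, c0 = c0                    # resume the saved continuation
        else:
            raise ValueError(f"unknown instruction: {op}")


body = [("PUSHINT", 2), ("ADD", None)]
call = [("PUSHINT", 1), ("PUSHCONT", body), ("CALLX", None),
        ("PUSHINT", 10), ("ADD", None)]
jump = [("PUSHINT", 1), ("PUSHCONT", body), ("JMPX", None),
        ("PUSHINT", 10), ("ADD", None)]
print(run_with_continuations(call))  # [13]
print(run_with_continuations(jump))  # [3]
```

With the call, the rest of the caller is saved and resumed after the body finishes; with the jump, the remaining caller code is simply abandoned, which is why the two programs leave different results on the stack.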
Code Pages and Instruction Coding
Code pages provide transparent execution of code written for different versions of TVM, and allow interaction between instances of code written at different times.
Each ordinary continuation includes a 16-bit code page (cp) field that specifies which code page is used to execute its code. This allows each continuation to use the version of the instruction encoding it was written for, which is critical to maintaining compatibility between different versions of TVM.
Instructions in TVM are encoded with a binary prefix code, which ensures unambiguity in decoding. An invalid opcode exception is generated if the prefix of the current continuation code does not match a valid instruction of the current code page.
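Decoding with a prefix code can be sketched in Python as follows. The code page table below is entirely hypothetical: the bit patterns were made up for this example and are not actual TVM opcodes. What the sketch does show is the property the text relies on: because no encoding is a prefix of another, at most one instruction can match at any position, and a position with no match is reported as an invalid opcode.

```python
class InvalidOpcode(Exception):
    pass


# Hypothetical code page: bit prefixes mapped to mnemonics (not real encodings).
CODE_PAGE = {
    "0000": "NOP",
    "0001": "SWAP",
    "0010": "DUP",
    "10100000": "ADD",
    "10100001": "SUB",
}


def is_prefix_free(code_page) -> bool:
    """Check that no encoding is a proper prefix of another one."""
    keys = list(code_page)
    return not any(a != b and b.startswith(a) for a in keys for b in keys)


def decode(bits: str, code_page=CODE_PAGE):
    """Split a bit string into instructions of the given code page."""
    result = []
    pos = 0
    while pos < len(bits):
        for prefix, name in code_page.items():
            if bits.startswith(prefix, pos):
                result.append(name)
                pos += len(prefix)
                break
        else:
            raise InvalidOpcode(f"no valid instruction at bit {pos}")
    return result


print(is_prefix_free(CODE_PAGE))          # True
print(decode("0010" "10100000" "0000"))   # ['DUP', 'ADD', 'NOP']
```

Passing a different table as `code_page` changes how the same bits are interpreted, which is the role the cp field plays for a continuation.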
It is possible to automatically switch between code pages depending on the operation performed, which allows optimizing code execution. For example, specialized code pages can be used for certain types of operations, such as stack manipulation or data processing, where instructions of the same type often follow each other.
Comparison of TVM with Ethereum Virtual Machine (EVM)
TON Virtual Machine and Ethereum Virtual Machine are both stack-based virtual machines for running smart contracts, but they differ in several respects.
TVM uses the "bag of cells" model, where data is represented as cells, each containing up to 1023 bits (about 128 bytes) of data and up to 4 links to other cells. This helps to handle complex data structures such as trees and directed acyclic graphs (DAGs). EVM, on the other hand, uses 256-bit integers and a Merkle Patricia Trie (MPT) to store state. This data structure requires more resources because 256-bit numbers and MPTs need more memory and computational power.
TVM also checks arithmetic operations for overflow automatically, which improves code reliability. EVM arithmetic wraps around silently on overflow, so such checks must be added by the compiler or by the developer, which complicates development.
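The difference in overflow behavior can be sketched in Python. The sketch assumes that TVM integers are 257-bit signed values, so a result must lie between -2^256 and 2^256 - 1, and that EVM addition operates on 256-bit words modulo 2^256. The function names `tvm_add` and `evm_add` and the `IntegerOverflow` exception are hypothetical stand-ins for the respective machine behavior.

```python
TVM_INT_MIN = -(2 ** 256)
TVM_INT_MAX = 2 ** 256 - 1
EVM_WORD = 2 ** 256


class IntegerOverflow(Exception):
    pass


def tvm_add(x: int, y: int) -> int:
    """Addition that fails loudly when the result leaves the 257-bit range."""
    result = x + y
    if not TVM_INT_MIN <= result <= TVM_INT_MAX:
        raise IntegerOverflow("result does not fit into a 257-bit signed integer")
    return result


def evm_add(x: int, y: int) -> int:
    """Addition on 256-bit words that wraps around on overflow."""
    return (x + y) % EVM_WORD


print(evm_add(EVM_WORD - 1, 1))  # 0: the overflow goes unnoticed
try:
    tvm_add(TVM_INT_MAX, 1)
except IntegerOverflow as exc:
    print("overflow detected:", exc)
```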
TVM's cryptographic features include support for Curve25519, Weil pairings for zk-SNARKs, and SHA-256, while EVM uses secp256k1 and Keccak-256. In other words, TVM supports more complex cryptographic operations.
The main programming language for TVM is FunC, a statically typed language that offers support for algebraic data types. The main language for EVM is Solidity, which is also statically typed.