The Bitcoin scripting language and its specification

Jun 24, 2022

In our series of blogs giving you a preview of our introductory course to Bitcoin Infrastructure, Bitcoin Curriculum Specialists Brendan Lee and Evan Freeman have given you insight into the difference between a node and a miner, the double-spending problem of the digital economy, Bitcoin’s first seen rule, misunderstandings about Bitcoin’s decentralisation, incentive-driven behaviours built into Bitcoin, and how nodes benefit from light-speed tx propagation.

Today, Brendan talks about Bitcoin’s scripting language, Bitcoin transaction rules, and formal grammar for Bitcoin script.

The need for a unanimous approach to Bitcoin opcodes

One of the aspects of Bitcoin which gives it such a broad range of applicability and function is its scripting language.

Based on Forth, the language is stack-based and uses Reverse Polish Notation as the means to enter and process data. While the language appears simple, when used properly it can provide a Turing-complete environment within which complex and diverse applications can be built.
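To make the stack-based, Reverse Polish style concrete, here is a minimal sketch of a toy evaluator. It is not a node implementation: it handles only three opcodes, and it stores Python integers on the stack rather than byte sequences, which is a simplification made for readability. The function name run_script is hypothetical.

```python
def run_script(tokens):
    """Evaluate a toy script given as a list of string tokens.

    Operands are pushed onto the stack; each opcode pops its inputs
    from the top of the stack and pushes its result back.
    """
    stack = []
    for token in tokens:
        if token == "OP_ADD":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif token == "OP_EQUAL":
            b = stack.pop()
            a = stack.pop()
            stack.append(a == b)
        elif token == "OP_DUP":
            stack.append(stack[-1])
        else:
            # Anything that is not a known opcode is treated as a number to push
            # (a simplification: real script pushes byte sequences).
            stack.append(int(token))
    return stack


# "2 3 OP_ADD 5 OP_EQUAL" asks: does 2 + 3 equal 5?
result = run_script(["2", "3", "OP_ADD", "5", "OP_EQUAL"])
```

Note how the operator always follows its operands, so no brackets or precedence rules are needed.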

The scripting language is made up of a set of 186 opcodes, each of which performs an operation on the processing stack. Any node that wishes to process transactions on the Bitcoin network must ensure that its client implementation processes each opcode in a way that exactly matches the outcome expected by every other node on the network at that time, including whether each opcode is enabled or disabled.

Even a minor change in the way opcodes are processed can render transactions already committed to the ledger unspendable, causing irreparable damage to the system’s integrity and usability. It is therefore vital that every node processes each opcode in a script in exactly the same manner.

Upholding bugs

Interestingly, this also means that bugs that existed in the execution algorithms must also be upheld. Notably, a bug in the OP_CHECKMULTISIG opcode requires that an extra data item be added to the stack before the first signature or the opcode will fail to execute properly.

For this reason, anyone spending an output with OP_CHECKMULTISIG in it must add one extra data item to their script. Additionally, any node client that did not take this known bug into account would validate scripts that use the opcode incorrectly, potentially rejecting transactions or blocks that should be considered valid, or accepting transactions and blocks that the rest of the network considers invalid.
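The effect of the bug on the stack can be sketched with a toy model. The code below is an illustration under stated assumptions, not real validation logic: the counts are plain Python integers, the signatures and keys are strings, and toy_verify is a hypothetical stand-in for real signature checking. What it does show is that the opcode removes one item more than the keys and signatures account for.

```python
def toy_verify(sig, pubkey):
    """Hypothetical stand-in for signature verification (illustration only)."""
    return sig == "sig-" + pubkey


def check_multisig(stack, verify=toy_verify):
    """Toy model of how OP_CHECKMULTISIG consumes the stack.

    Expected layout, bottom to top:
    extra item, m signatures, m, n public keys, n
    """
    n = stack.pop()
    pubkeys = [stack.pop() for _ in range(n)][::-1]
    m = stack.pop()
    sigs = [stack.pop() for _ in range(m)][::-1]
    if not stack:
        raise IndexError("missing the extra item required by the historical bug")
    stack.pop()  # the extra item is consumed and otherwise ignored in this sketch

    # Signatures must appear in the same order as the keys they match.
    key_index = 0
    for sig in sigs:
        while key_index < len(pubkeys) and not verify(sig, pubkeys[key_index]):
            key_index += 1
        if key_index == len(pubkeys):
            stack.append(False)
            return stack
        key_index += 1
    stack.append(True)
    return stack


# A 2-of-3 spend: note the leading "" supplied only to satisfy the bug.
outcome = check_multisig(["", "sig-A", "sig-C", 2, "A", "B", "C", 3])
```

Leaving out the leading item makes the toy model raise an error, which mirrors why a spender must always supply it.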

Data Types

Every data item in Bitcoin Script is a byte sequence. Some operations interpret their parameters as numeric or boolean values and require the item to fulfil the specifications of those types. Some operations produce items on the stack which are valid numeric or boolean values.

A byte sequence has a length and a value. The length of the byte sequence must be an integer greater than or equal to zero and less than or equal to 2^32-1 (UINT32_MAX).

The byte sequence of length zero is called the “null value”.

Any data item can be interpreted as a boolean value. If the data item consists entirely of bytes with value zero, or the data item is the null value, then the boolean value of the item is false. Otherwise, the boolean value of the item is true.
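The boolean rule above translates almost directly into code. This is a minimal sketch of the rule exactly as stated here; the function name as_bool is hypothetical.

```python
def as_bool(item: bytes) -> bool:
    """Interpret a script data item as a boolean.

    False for the null value (zero length) and for any item made up
    entirely of zero bytes; True otherwise.
    """
    return any(byte != 0 for byte in item)


null_value = as_bool(b"")            # zero-length item
all_zero = as_bool(b"\x00\x00")      # only zero bytes
non_zero = as_bool(b"\x00\x01")      # at least one non-zero byte
```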

A data item can be interpreted as a numeric value. The numeric value is encoded in a byte sequence using little-endian notation. When script items are processed using opcodes that perform mathematical functions, the node will treat any byte sequence up to 7,500 bytes long as a numeric value, allowing for 'bignum' calculations to be performed in script.
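A rough sketch of the numeric interpretation follows. The little-endian byte order comes from the text above; the handling of negative numbers is an assumption not stated in this article, based on the sign-and-magnitude convention commonly used for script numbers, where the highest bit of the last byte marks the value as negative. The function name decode_num is hypothetical.

```python
def decode_num(item: bytes) -> int:
    """Decode a script data item as a number (little-endian).

    Assumption: the top bit of the final (most significant) byte is a
    sign flag rather than part of the magnitude.
    """
    if not item:
        return 0  # the null value is treated as zero
    magnitude = int.from_bytes(item, "little")
    sign_bit = 0x80 << (8 * (len(item) - 1))
    if magnitude & sign_bit:
        return -(magnitude & ~sign_bit)
    return magnitude


small = decode_num(b"\x01")          # least significant byte comes first
larger = decode_num(b"\x00\x01")     # second byte is worth 256
negative = decode_num(b"\x81")       # sign flag set on the last byte
```

Python integers have arbitrary precision, so the same function copes with the long byte sequences that 'bignum' calculations involve.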

Formal Grammar for Bitcoin Script

The Formal Grammar for Bitcoin Script is set by node operators. This contains the full set of approved opcodes and their exact spelling and function.

It’s also worth highlighting the following features of this formal grammar:

  • The complete script consists of two sections, the unlocking script (scriptSig) and the locking script (scriptPubKey). The locking script is from the transaction output that is being spent, while the unlocking script is included in the transaction input that is spending the output.

  • Current consensus rules state that an unlocking script can only contain the first 96 opcodes, which allow constants and data to be pushed onto the stack. This requirement is a part of the Validity of Script Consensus Rule, defined in the full specification.

  • A branching operator (OP_IF or OP_NOTIF) must have a matching OP_ENDIF.

  • An OP_ELSE can only be included between a branching operator and OP_ENDIF pair. There can only be at most one OP_ELSE between a branching operator and an OP_ENDIF.

  • OP_RETURN may appear at any location in a valid script. The functionality of OP_RETURN has been restored and is defined in the OP_RETURN Functionality section of the full specification. Grammatically, any bytes after an OP_RETURN that is not in a branch block are not evaluated and there are no grammatical requirements for those bytes.

Note that disabled operations are part of this grammar. A disabled operation is grammatically correct but will produce a failure if executed.
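The branching rules listed above can be checked with a small stack of open blocks. The sketch below is illustrative only: it works on a list of opcode names rather than raw script bytes, it checks only the branch and OP_RETURN rules, and the function name branches_balanced is hypothetical.

```python
def branches_balanced(opcodes):
    """Check the branching rules of the grammar on a list of opcode names.

    Each open OP_IF / OP_NOTIF is tracked with a flag recording whether
    its single permitted OP_ELSE has already been seen.
    """
    open_blocks = []
    for op in opcodes:
        if op in ("OP_IF", "OP_NOTIF"):
            open_blocks.append(False)
        elif op == "OP_ELSE":
            # Must sit inside a branch block, and only once per block.
            if not open_blocks or open_blocks[-1]:
                return False
            open_blocks[-1] = True
        elif op == "OP_ENDIF":
            if not open_blocks:
                return False
            open_blocks.pop()
        elif op == "OP_RETURN" and not open_blocks:
            # Nothing after an OP_RETURN outside a branch block is constrained.
            return True
    # Every branching operator needs a matching OP_ENDIF.
    return not open_blocks


well_formed = branches_balanced(["OP_IF", "OP_1", "OP_ELSE", "OP_0", "OP_ENDIF"])
two_elses = branches_balanced(["OP_IF", "OP_ELSE", "OP_ELSE", "OP_ENDIF"])
```

A disabled opcode would pass a check like this one, since it is grammatically correct and only fails when executed.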

An introduction to Bitcoin infrastructure

If the topic of Bitcoin mining and infrastructure is within your professional purview, you’re sure to benefit from the BSV Academy’s introduction to Bitcoin Infrastructure course.

The introduction to Bitcoin Infrastructure course is focused on providing students with a solid understanding of the role that nodes and node operators play in the construction of the network. In particular, it will focus on the incentives that drive enterprise operators to spend large sums of money to build and operate their infrastructure.

To sign up for this free course, head over here.

Brendan Lee

Training and Development Manager - Bitcoin Association