yellow paper

29
ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER FINAL DRAFT - UNDER REVIEW DR. GAVIN WOOD CO-FOUNDER & LEAD, ETHEREUM PROJECT [email protected] Abstract. The blockchain paradigm when coupled with cryptographically-secured transactions has demonstrated its utility through a number of projects, not least Bitcoin. Each such project can be seen as a simple application on a decentralised, but singleton, compute resource. We can call this paradigm a transactional singleton machine with shared-state. Ethereum implements this paradigm in a generalised manner. Furthermore it provides a plurality of such resources, each with a distinct state and operating code but able to interact through a message-passing framework with others. We discuss its design, implementation issues, the opportunities it provides and the future hurdles we envisage. 1. Introduction With ubiquitous internet connections in most places of the world, global information transmission has become incredibly cheap. Technology-rooted movements like Bit- coin have demonstrated, through the power of the default, consensus mechanisms and voluntary respect of the social contract that it is possible to use the internet to make a decentralised value-transfer system, shared across the world and virtually free to use. This system can be said to be a very specialised version of a cryptographically se- cure, transaction-based state machine. Follow-up systems such as Namecoin adapted this original “currency appli- cation” of the technology into other applications albeit rather simplistic ones. Ethereum is a project which attempts to build the gen- eralised technology; technology on which all transaction- based state machine concepts may be built. Moreover it aims to provide to the end-developer a tightly integrated end-to-end system for building software on a hitherto un- explored compute paradigm in the mainstream: a trustful object messaging compute framework. 1.1. Driving Factors. There are many goals of this project; one key goal is to facilitate transactions be- tween consenting individuals who would otherwise have no means to trust one another. This may be due to geographical separation, interfacing difficulty, or perhaps the incompatibility, incompetence, unwillingness, expense, uncertainty, inconvenience or corruption of existing legal systems. By specifying a state-change system through a rich and unambiguous language, and furthermore archi- tecting a system such that we can reasonably expect that an agreement will be thus enforced autonomously, we can provide a means to this end. Dealings in this proposed system would have several attributes not often found in the real world. The incor- ruptibility of judgement, often difficult to find, comes nat- urally from a disinterested algorithmic interpreter. Trans- parency, or being able to see exactly how a state or judge- ment came about through the transaction log and rules or instructional codes, never happens perfectly in human- based systems since natural language is necessarily vague, information is often lacking, and plain old prejudices are difficult to shake. Overall, I wish to provide a system such that users can be guaranteed that no matter with which other individ- uals, systems or organisations they interact, they can do so with absolute confidence in the possible outcomes and how those outcomes might come about. 1.2. Previous Work. Buterin [2013] first proposed the kernel of this work in late November, 2013. Though now evolved in many ways, the key functionality of a block- chain with a Turing-complete language and an effectively unlimited inter-transaction storage capability remains un- changed. Hashcash, introduced by Back [2002] (in a five-year retrospective), provided the first work into the usage of a cryptographic proof of computational expenditure as a means of transmitting a value signal over the Internet. Though not widely adopted, the work was later utilised and expanded upon by Nakamoto [2008] in order to de- vise a cryptographically secure mechanism for coming to a decentralised social consensus over the order and contents of a series of cryptographically signed financial transac- tions. The fruits of this project, Bitcoin, provided a first glimpse into a decentralised transaction ledger. Other projects built on Bitcoin’s success; the alt-coins introduced numerous other currencies through alteration to the protocol. Some of the best known are Litecoin and Primecoin, discussed by Sprankel [2013]. Other projects sought to take the core value content mechanism of the protocol and repurpose it; Aron [2012] discusses, for ex- ample, the Namecoin project which aims to provide a de- centralised name-resolution system. Other projects still aim to build upon the Bitcoin net- work itself, leveraging the large amount of value placed in the system and the vast amount of computation that goes into the consensus mechanism. The Mastercoin project, first proposed by Willett [2013], aims to build a richer protocol involving many additional high-level features on top of the Bitcoin protocol through utilisation of a num- ber of auxiliary parts to the core protocol. The Coloured Coins project, proposed by Rosenfeld [2012], takes a sim- ilar but more simplified strategy, embellishing the rules 1

Upload: koffka-khan

Post on 07-Sep-2015

248 views

Category:

Documents


11 download

DESCRIPTION

ethereum

TRANSCRIPT

  • ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER

    FINAL DRAFT - UNDER REVIEW

    DR. GAVIN WOODCO-FOUNDER & LEAD, ETHEREUM PROJECT

    [email protected]

    Abstract. The blockchain paradigm when coupled with cryptographically-secured transactions has demonstrated itsutility through a number of projects, not least Bitcoin. Each such project can be seen as a simple application ona decentralised, but singleton, compute resource. We can call this paradigm a transactional singleton machine withshared-state.

    Ethereum implements this paradigm in a generalised manner. Furthermore it provides a plurality of such resources,each with a distinct state and operating code but able to interact through a message-passing framework with others.We discuss its design, implementation issues, the opportunities it provides and the future hurdles we envisage.

    1. Introduction

    With ubiquitous internet connections in most placesof the world, global information transmission has becomeincredibly cheap. Technology-rooted movements like Bit-coin have demonstrated, through the power of the default,consensus mechanisms and voluntary respect of the socialcontract that it is possible to use the internet to makea decentralised value-transfer system, shared across theworld and virtually free to use. This system can be saidto be a very specialised version of a cryptographically se-cure, transaction-based state machine. Follow-up systemssuch as Namecoin adapted this original currency appli-cation of the technology into other applications albeitrather simplistic ones.

    Ethereum is a project which attempts to build the gen-eralised technology; technology on which all transaction-based state machine concepts may be built. Moreover itaims to provide to the end-developer a tightly integratedend-to-end system for building software on a hitherto un-explored compute paradigm in the mainstream: a trustfulobject messaging compute framework.

    1.1. Driving Factors. There are many goals of thisproject; one key goal is to facilitate transactions be-tween consenting individuals who would otherwise haveno means to trust one another. This may be due togeographical separation, interfacing difficulty, or perhapsthe incompatibility, incompetence, unwillingness, expense,uncertainty, inconvenience or corruption of existing legalsystems. By specifying a state-change system through arich and unambiguous language, and furthermore archi-tecting a system such that we can reasonably expect thatan agreement will be thus enforced autonomously, we canprovide a means to this end.

    Dealings in this proposed system would have severalattributes not often found in the real world. The incor-ruptibility of judgement, often difficult to find, comes nat-urally from a disinterested algorithmic interpreter. Trans-parency, or being able to see exactly how a state or judge-ment came about through the transaction log and rulesor instructional codes, never happens perfectly in human-based systems since natural language is necessarily vague,

    information is often lacking, and plain old prejudices aredifficult to shake.

    Overall, I wish to provide a system such that users canbe guaranteed that no matter with which other individ-uals, systems or organisations they interact, they can doso with absolute confidence in the possible outcomes andhow those outcomes might come about.

    1.2. Previous Work. Buterin [2013] first proposed thekernel of this work in late November, 2013. Though nowevolved in many ways, the key functionality of a block-chain with a Turing-complete language and an effectivelyunlimited inter-transaction storage capability remains un-changed.

    Hashcash, introduced by Back [2002] (in a five-yearretrospective), provided the first work into the usage ofa cryptographic proof of computational expenditure as ameans of transmitting a value signal over the Internet.Though not widely adopted, the work was later utilisedand expanded upon by Nakamoto [2008] in order to de-vise a cryptographically secure mechanism for coming to adecentralised social consensus over the order and contentsof a series of cryptographically signed financial transac-tions. The fruits of this project, Bitcoin, provided a firstglimpse into a decentralised transaction ledger.

    Other projects built on Bitcoins success; the alt-coinsintroduced numerous other currencies through alterationto the protocol. Some of the best known are Litecoin andPrimecoin, discussed by Sprankel [2013]. Other projectssought to take the core value content mechanism of theprotocol and repurpose it; Aron [2012] discusses, for ex-ample, the Namecoin project which aims to provide a de-centralised name-resolution system.

    Other projects still aim to build upon the Bitcoin net-work itself, leveraging the large amount of value placed inthe system and the vast amount of computation that goesinto the consensus mechanism. The Mastercoin project,first proposed by Willett [2013], aims to build a richerprotocol involving many additional high-level features ontop of the Bitcoin protocol through utilisation of a num-ber of auxiliary parts to the core protocol. The ColouredCoins project, proposed by Rosenfeld [2012], takes a sim-ilar but more simplified strategy, embellishing the rules

    1

  • ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER FINAL DRAFT - UNDER REVIEW 2

    of a transaction in order to break the fungibility of Bit-coins base currency and allow the creation and tracking oftokens through a special chroma-wallet-protocol-awarepiece of software.

    Additional work has been done in the area with discard-ing the decentralisation foundation; Ripple, discussed byBoutellier and Heinzen [2014], has sought to create a fed-erated system for currency exchange, effectively creatinga new financial clearing system. It has demonstrated thathigh efficiency gains can be made if the decentralisationpremise is discarded.

    Early work on smart contracts has been done by Szabo[1997] and Miller [1997]. Around the 1990s it became clearthat algorithmic enforcement of agreements could becomea significant force in human cooperation. Though no spe-cific system was proposed to implement such a system,it was proposed that the future of law would be heavilyaffected by such systems. In this light, Ethereum maybe seen as a general implementation of such a crypto-lawsystem.

    2. The Blockchain Paradigm

    Ethereum, taken as a whole, can be viewed as atransaction-based state machine: we begin with a gene-sis state and incrementally execute transactions to morphit into some final state. It is this final state which we ac-cept as the canonical version of the world of Ethereum.The state can include such information as account bal-ances, reputations, trust arrangements, data pertainingto information of the physical world; in short, anythingthat can currently be represented by a computer is admis-sible. Transactions thus represent a valid arc between twostates; the valid part is importantthere exist far moreinvalid state changes than valid state changes. Invalidstate changes might, e.g. be things such as reducing anaccount balance without an equal and opposite increaseelsewhere. A valid state transition is one which comesabout through a transaction. Formally:

    (1) t+1 (t, T )

    where is the Ethereum state transition function. InEthereum, , together with are considerably more pow-erful then any existing comparable system; allows com-ponents to carry out arbitrary computation, while al-lows components to store arbitrary state between trans-actions.

    Transactions are collated into blocks; blocks arechained together using a cryptographic hash as a meansof reference. Blocks function as a journal, recording a se-ries of transactions together with the previous block andan identifier for the final state (though do not store thefinal state itselfthat would be far too big). They alsopunctuate the transaction series with incentives for nodesto mine. This incentivisation takes places as a state-transition function, adding value to a nominated account.

    Mining is the process of dedicating effort (working) tobolster one series of transactions (a block) over any otherpotential competitor block. It is achieved thanks to acryptographically secure proof. This scheme is known asa proof-of-work and is discussed in detail in section 11.5.

    Formally, we expand to:

    t+1 (t, B)(2)B (..., (T0, T1, ...))(3)

    (, B) (B,((, T0), T1)...)(4)Where is the block-finalisation state transition func-

    tion (a function that rewards a nominated party); B isthis block, which includes a series of transactions amongstsome other components; and is the block-level state-transition function.

    This is the basis of the blockchain paradigm, a modelthat forms the backbone of not only Ethereum, but all de-centralised consensus-based transaction systems to date.

    2.1. Value. In order to incentivise computation withinthe network, there needs to be an agreed method for trans-mitting value. To address this issue, Ethereum has an in-trinsic currency, Ether, known also as ETH and sometimesreferred to by the Old English D. The smallest subdenom-ination of Ether, and thus the one in which all integer val-ues of the currency are counted, is the Wei. One Ether isdefined as being 1018 Wei. There exist other subdenomi-nations of Ether:

    Multiplier Name

    100 Wei1012 Szabo1015 Finney1018 Ether

    Throughout the present work, any reference to value,in the context of Ether, currency, a balance or a payment,should be assumed to be counted in Wei.

    2.2. Which History? Since the system is decentralisedand all parties have an opportunity to create a new blockon some older pre-existing block, the resultant structure isnecessarily a tree of blocks. In order to form a consensus asto which path, from root (the genesis block) to leaf (theblock containing the most recent transactions) throughthis tree structure, known as the blockchain, there mustbe an agreed-upon scheme. If there is ever a disagree-ment between nodes as to which root-to-leaf path downthe block tree is the best blockchain, then a fork occurs.

    This would mean that past a given point in time(block), multiple states of the system may coexist: somenodes believing one block to contain the canonical transac-tions, other nodes believing some other block to be canoni-cal, potentially containing radically different or incompat-ible transactions. This is to be avoided at all costs as theuncertainty that would ensue would likely kill all confi-dence in the entire system.

    The scheme we use in order to generate consensus is asimplified version of the GHOST protocol introduced bySompolinsky and Zohar [2013]. This process is describedin detail in section 10.

    3. Conventions

    I use a number of typographical conventions for theformal notation, some of which are quite particular to thepresent work:

    The two sets of highly structured, top-level, state val-ues, are denoted with bold lowercase Greek letters. Theyfall into those of world-state, which are denoted (or avariant thereupon) and those of machine-state, .

  • ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER FINAL DRAFT - UNDER REVIEW 3

    Functions operating on highly structured values aredenoted with an upper-case greek letter, e.g. , theEthereum state transition function.

    For most functions, an uppercase letter is used, e.g.C, the general cost function. These may be subscriptedto denote specialised variants, e.g. CSSTORE, the cost func-tion for the SSTORE operation. For specialised and possiblyexternally defined functions, I may format as typewritertext, e.g. the Keccak-256 hash function (as per the winningentry to the SHA-3 contest) is denoted KEC (and generallyreferred to as plain Keccak).

    Tuples are typically denoted with an upper-case letter,e.g. T , is used to denote an Ethereum transaction. Thissymbol may, if accordingly defined, be subscripted to re-fer to an individual component, e.g. Tn, denotes the nonceof said transaction. The form of the subscript is used todenote its type; e.g. uppercase subscripts refer to tupleswith subscriptable components.

    Scalars and fixed-size byte sequences (or, synony-mously, arrays) are denoted with a normal lower-case let-ter, e.g. n is used in the document to denote a transactionnonce. Those with a particularly special meaning may begreek, e.g. , the number of items required on the stackfor a given operation.

    Arbitrary-length sequences are typically denoted as abold lower-case letter, e.g. o is used to denote the byte-sequence given as the output data of a message call. Forparticularly important values, a bold uppercase letter maybe used.

    Throughout, we assume scalars are positive integersand thus belong to the set P. The set of all byte sequencesis B, formally defined in Appendix B. If such a set of se-quences is restricted to those of a particular length, it isdenoted with a subscript, thus the set of all byte sequencesof length 32 is named B32 and the set of all positive in-tegers smaller than 2256 is named P256. This is formallydefined in section 4.3.

    Square brackets are used to index into and referenceindividual components or subsequences of sequences, e.g.s[0] denotes the first item on the machines stack. Forsubsequences, ellipses are used to specify the intendedrange, to include elements at both limits, e.g. m[0..31]denotes the first 32 items of the machines memory.

    In the case of the global state , which is a sequence ofaccounts, themselves tuples, the square brackets are usedto reference an individual account.

    When considering variants of existing values, I followthe rule that within a given scope for definition, if weassume that the unmodified input value be denoted bythe placeholder then the modified and utilisable valueis denoted as , and intermediate values would be , &c. On very particular occasions, in order to max-imise readability and only if unambiguous in meaning, Imay use alpha-numeric subscripts to denote intermediatevalues, especially those of particular note.

    When considering the use of existing functions, givena function f , the function f denotes a similar, element-wise version of the function mapping instead between se-quences. It is formally defined in section 4.3.

    I define a number of useful functions throughout. Oneof the more common is `, which evaluates to the last itemin the given sequence:

    (5) `(x) x[x 1]

    4. Blocks, State and Transactions

    Having introduced the basic concepts behindEthereum, we will discuss the meaning of a transaction, ablock and the state in more detail.

    4.1. World State. The world state (state), is a map-ping between addresses (160-bit identifiers) and accountstates (a data structure serialised as RLP, see AppendixB). Though not stored on the blockchain, it is assumedthat the implementation will maintain this mapping in amodified Merkle Patricia tree (trie, see Appendix D). Thetrie requires a simple database backend that maintains amapping of bytearrays to bytearrays; we name this under-lying database the state database. This has a number ofbenefits; firstly the root node of this structure is crypto-graphically dependent on all internal data and as such itshash can be used as a secure identity for the entire sys-tem state. Secondly, being an immutable data structure,it allows any previous state (whose root hash is known) tobe recalled by simply altering the root hash accordingly.Since we store all such root hashes in the blockchain, weare able to trivially revert to old states.

    The account state comprises the following four fields:

    nonce: A scalar value equal to the number of trans-actions sent from this address or, in the caseof accounts with associated code, the number ofcontract-creations made by this account. For ac-count of address a in state , this would be for-mally denoted [a]n.

    balance: A scalar value equal to the number of Weiowned by this address. Formally denoted [a]b.

    storageRoot: A 256-bit hash of the root node of aMerkle Patricia tree that encodes the storage con-tents of the account (a mapping between 256-bitinteger values), encoded into the trie as a mappingfrom the 256-bit integer keys to the RLP-encoded256-bit integer values. The hash is formally de-noted [a]s.

    codeHash: The hash of the EVM code of thisaccountthis is the code that gets executedshould this address receive a message call; it isimmutable and thus, unlike all other fields, can-not be changed after construction. All such codefragments are contained in the state database un-der their corresponding hashes for later retrieval.This hash is formally denoted [a]c, and thus thecode may be denoted as b, given that KEC(b) =[a]c.

    Since I typically wish to refer not to the tries root hashbut to the underlying set of key/value pairs stored within,I define a convenient equivalence:

    (6) TRIE(LI([a]s)

    ) [a]sThe collapse function for the set of key/value pairs in

    the trie, LI , is defined as the element-wise transformationof the base function LI , given as:

    (7) LI((k, v)

    ) (k, RLP(v))where:

    (8) k B32 v P

  • ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER FINAL DRAFT - UNDER REVIEW 4

    It shall be understood that [a]s is not a physicalmember of the account and does not contribute to its laterserialisation.

    If the codeHash field is the Keccak-256 hash of theempty string, i.e. [a]c = KEC

    (()), then the node repre-

    sents a simple account, sometimes referred to as a non-contract account.

    Thus we may define a world-state collapse function LS :

    (9) LS() {p(a) : [a] 6= }where

    (10) p(a) (a, RLP(([a]n,[a]b,[a]s,[a]c)))This function, LS , is used alongside the trie function

    to provide a short identity (hash) of the world state. Weassume:

    (11) a : [a] = (a B20 v([a]))where v is the account validity function:

    v(x) xn P256 xb P256 xs B32 xc B32(12)

    4.2. The Transaction. A transaction (formally, T ) is asingle cryptographically-signed instruction constructed byan actor externally to the scope of Ethereum. It is as-sumed that the external actor will be human in nature,though software tools will be used in its construction anddissemination. There are two types of transactions: thosewhich result in message calls and those which result inthe creation of new accounts with associated code (knowninformally as contract creation). Both types specify anumber of common fields:

    nonce: A scalar value equal to the number of trans-actions sent by the sender; formally Tn.

    gasPrice: A scalar value equal to the number ofWei to be paid per unit of gas for all computa-tion costs incurred as a result of the execution ofthis transaction; formally Tp.

    gasLimit: A scalar value equal to the maximumamount of gas that should be used in executingthis transaction. This is paid up-front, before anycomputation is done and may not be increasedlater; formally Tg.

    to: The 160-bit address of the message calls recipi-ent or, for a contract creation transaction, , usedhere to denote the only member of B0 ; formallyTt.

    value: A scalar value equal to the number of Wei tobe transferred to the message calls recipient or,in the case of contract creation, as an endowmentto the newly created account; formally Tv.

    v, r, s: Values corresponding to the signature of thetransaction and used to determine the sender ofthe transaction; formally Tw, Tr and Ts. This isexpanded in Appendix F.

    Additionally, a contract creation transaction contains:

    init: An unlimited size byte array specifying theEVM-code for the account initialisation proce-dure, formally Ti.

    init is an EVM-code fragment; it returns the body,a second fragment of code that executes each time theaccount receives a message call (either through a trans-action or due to the internal execution of code). init is

    executed only once at account creation and gets discardedimmediately thereafter.

    In contrast, a message call transaction contains:

    data: An unlimited size byte array specifying theinput data of the message call, formally Td.

    Appendix F specifies the function, S, which mapstransactions to the sender, and happens through theECDSA of the SECP-256k1 curve, using the hash of thetransaction (excepting the latter three signature fields) asthe datum to sign. For the present we simply assert thatthe sender of a given transaction T can be representedwith S(T ).

    (13)

    LT (T ) {

    (Tn, Tp, Tg, Tt, Tv, Ti, Tw, Tr, Ts) if Tt = (Tn, Tp, Tg, Tt, Tv, Td, Tw, Tr, Ts) otherwise

    Here, we assume all components are interpreted by theRLP as integer values, with the exception of the arbitrarylength byte arrays Ti and Td.

    (14) Tn P256 Tv P256 Tp P256 Tg P256 Tw P5 Tr P256 Ts P256 Td B Ti B

    where

    (15) Pn = {P : P P P < 2n}The address hash Tt is slightly different: it is either a

    20-byte address hash or, in the case of being a contract-creation transaction (and thus formally equal to ), it isthe RLP empty byte-series and thus the member of B0:

    (16) Tt {B20 if Tt 6= B0 otherwise

    4.3. The Block. The block in Ethereum is the collec-tion of relevant pieces of information (known as the blockheader), H, together with information corresponding tothe comprised transactions, T, and a set of other blockheaders U that are known to have a parent equal to thepresent blocks parents parent (such blocks are known asuncles). The block header contains several pieces of infor-mation:

    parentHash: The Keccak 256-bit hash of the par-ent blocks header, in its entirety; formally Hp.

    unclesHash: The Keccak 256-bit hash of the uncleslist portion of this block; formally Hu.

    coinbase: The 160-bit address to which all fees col-lected from the successful mining of this block betransferred; formally Hc.

    stateRoot: The Keccak 256-bit hash of the rootnode of the state trie, after all transactions areexecuted and finalisations applied; formally Hr.

    transactionsRoot: The Keccak 256-bit hash of theroot node of the trie structure populated witheach transaction in the transactions list portionof the block; formally Ht.

    receiptsRoot: The Keccak 256-bit hash of the rootnode of the trie structure populated with the re-ceipts of each transaction in the transactions listportion of the block; formally He.

    logsBloom: The Bloom filter composed from in-dexable information (logger address and log top-ics) contained in each log entry from the receipt of

  • ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER FINAL DRAFT - UNDER REVIEW 5

    each transaction in the transactions list; formallyHb.

    difficulty: A scalar value corresponding to the dif-ficulty level of this block. This can be calculatedfrom the previous blocks difficulty level and thetimestamp; formally Hd.

    number: A scalar value equal to the number of an-cestor blocks. The genesis block has a number ofzero; formally Hi.

    gasLimit: A scalar value equal to the current limitof gas expenditure per block; formally Hl.

    gasUsed: A scalar value equal to the total gas usedin transactions in this block; formally Hg.

    timestamp: A scalar value equal to the reasonableoutput of Unixs time() at this blocks inception;formally Hs.

    extraData: An arbitrary byte array containingdata relevant to this block. This must be 1024bytes or fewer; formally Hx.

    nonce: A 256-bit hash which proves that a sufficientamount of computation has been carried out onthis block; formally Hn.

    The other two components in the block are simply alist of uncle block headers (of the same format as above)and a series of the transactions. Formally, we can refer toa block B:

    (17) B (BH , BT, BU)4.3.1. Transaction Receipt. In order to encode informa-tion about a transaction concerning which it may be use-ful to form a zero-knowledge proof, or index and search,we encode a receipt of each transaction containing cer-tain information from concerning its execution. Each re-ceipt, denoted BR[i] for the ith transaction) is placed inan index-keyed trie and the root recorded in the header asHe.

    The transaction receipt is a tuple of four items com-prising the post-transaction state, R, the cumulative gasused in the block containing the transaction receipt as ofimmediately after the transaction has happened, Ru, theset of logs created through execution of the transaction, Rland the Bloom filter composed from information in thoselogs, Rb:

    (18) R (R, Ru, Rb, Rl)The function LR trivially prepares a transaction receipt

    for being transformed into an RLP-serialised byte array:

    (19) LR(R) (TRIE(LS(R)), Ru, Rb, Rl)thus the post-transaction state, R is encoded into a triestructure, the root of which forms the first item.

    We assert Ru, the cumulative gas used is a positive in-teger and that the logs Bloom, Rb, is a hash of size 512bits (64 bytes):

    (20) Ru P Rb B64The log entries, Rl, is a series of log entries, termed,

    for example, (O0, O1, ...). A log entry, O, is a tuple of aloggers address, Oa, a series of 32-bytes log topics, Otand some number of bytes of data, Od:

    (21) O (Oa, (Ot0, Ot1, ...), Od)

    (22) Oa B20 tOt : t B32 Od B

    We define the Bloom filter function, M , to reduce a logentry include a single 64-byte hash:

    (23) M(O)

    t{Oa}Ot

    (M3:512(t)

    )whereM3:512 is a specialised Bloom filter that sets three

    bits out of 512, given an arbitrary byte series. It does thisthrough taking the low-order 9 bits of each of the firstthree pairs of bytes in a Keccak-256 hash of the byte se-ries. Formally:

    M3:512(x : x B) y : y B64 where:(24)y = (0, 0, ..., 0) except:(25)

    i{0,2,4} : Bm(x,i)(y) = 1(26)m(x, i) KEC(x)[i, i+ 1] mod 512(27)

    where B is the bit reference function such that Bj(x)equals the bit of index j (indexed from 0) in the byte arrayx.

    4.3.2. Holistic Validity. We can assert a blocks validity ifand only if it satisfies several conditions: it must be in-ternally consistent with the uncle and transaction blockhashes and the given transactions BT (as specified in sec11), when executed in order on the base state (derivedfrom the final state of the parent block), result in a newstate of the identity Hr:(28)Hr TRIE(LS((, B))) Hu KEC(RLP(LH(BU))) Ht TRIE({i < BT, i P : p(i, LT (BT[i]))}) He TRIE({i < BT, i P : p(i, LR(BR[i]))}) Hb

    rBR

    (rb)

    where p(k, v) is simply the pairwise RLP transforma-tion, in this case, the first being the index of the trans-action in the block and the second being the transactionreceipt:

    (29) p(k, v) (RLP(k), RLP(v))Furthermore:

    (30) TRIE(LS()) = P (BH)Hr

    Thus TRIE(LS()) is the root node hash of the MerklePatricia tree structure containing the key-value pairs ofthe state with values encoded using RLP, and P (BH)is the parent block of B, defined directly.

    The values stemming from the computation of transac-tions, specifically the transaction receipts, BR, and thatdefined through the transactions state-accumulation func-tion, , are formalised later in section 11.4.

    4.3.3. Serialisation. The function LB and LH are thepreparation functions for a block and block header respec-tively. Much like the transaction receipt preparation func-tion LR, we assert the types and order of the structure forwhen the RLP transformation is required:

    LH(H) ( Hp, Hu, Hc, Hr, Ht, He, Hb, Hd,Hi, Hl, Hg, Hs, Hx, Hn )

    (31)

    LB(B) (LH(BH), L

    T (BT), L

    H(BU)

    )(32)

    With LT and LH being element-wise sequence trans-

    formations, thus:(33)f((x0, x1, ...)

    ) (f(x0), f(x1), ...) for any function f

  • ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER FINAL DRAFT - UNDER REVIEW 6

    The component types are defined thus:

    (34) Hp B32 Hu B32 Hc B20 Hr B32 Ht B32 He B32 Hb B64 Hd P Hi P Hl P Hg P Hs P Hx B Hn B32

    where

    (35) Bn = {B : B B B = n}We now have a rigorous specification for the construc-

    tion of a formal block structure. The RLP function RLP(see Appendix B) provides the canonical method for trans-forming this structure into a sequence of bytes ready fortransmission over the wire or storage locally.

    4.3.4. Block Header Validity. We define P (BH) to be theparent block of B, formally:

    (36) P (H) B : KEC(RLP(BH)) = HpThe block number is the parents block number incre-

    mented by one:

    (37) Hi P (H)Hi + 1The canonical difficulty of a block of header H is de-

    fined as D(H):(38)

    D(H)

    222 if Hi = 0

    P (H)Hd + bP (H)Hd1024 c if Hs < P (H)Hs + 8max{1024, x} otherwise

    where:

    (39) x = P (H)Hd bP (H)Hd

    1024c

    The canonical gas limit of a block of header H is de-fined as L(H):

    L(H)

    106 if Hi = 0

    125000 if L(H) < 125000

    L(H) otherwise

    (40)

    L(H) 1023P (H)Hl + b 65P (H)Hgc1024

    (41)

    Hs is the timestamp of block H and must fulfil therelation:

    (42) Hs > P (H)Hs

    This mechanism enforces a homeostasis in terms of thetime between blocks; a smaller period between the last twoblocks results in an increase in the difficulty level and thusadditional computation required, lengthening the likelynext period. Conversely, if the period is too large, thedifficulty, and expected time to the next block, is reduced.

    The nonce, Hn, must satisfy the relation:

    (43) PoW(H,Hn) 62256

    Hd

    Where PoW is the proof-of-work function (see section11.5): this evaluates to an pseudo-random number cryp-tographically dependent on the parameters H and Hn.Given an approximately uniform distribution in the range[0, 2256), the expected time to find a solution is propor-tional to the difficulty, Hd.

    This is the foundation of the security of the blockchainand is the fundamental reason why a malicious node can-not propagate newly created blocks that would otherwise

    overwrite (rewrite) history. Because the nonce mustsatisfy this requirement, and because its satisfaction de-pends on the contents of the block and in turn its com-posed transactions, creating new, valid, blocks is difficultand, over time, requires approximately the total computepower of the trustworthy portion of the mining peers.

    Thus we are able to define the block header validityfunction V (H):

    V (H) PoW(H,Hn) 6 2256

    Hd(44)

    Hd = D(H) (45)Hl = L(H) (46)Hs > P (H)Hs (47)Hi = P (H)Hi + 1 (48)Hx 1024(49)

    Noting additionally that extraData must be at most1024 bytes.

    5. Gas and Payment

    In order to avoid issues of network abuse and to side-step the inevitable questions stemming from Turing com-pleteness, all programmable computation in Ethereum issubject to fees. The fee schedule is specified in units ofgas (see Appendix G for the fees associated with var-ious computation). Thus any given fragment of pro-grammable computation (this includes creating contracts,making message calls, utilising and accessing account stor-age and executing operations on the virtual machine) hasa universally agreed cost in terms of gas.

    Every transaction has a specific amount of gas associ-ated with it: gasLimit. This is the amount of gas whichis implicitly purchased from the senders account balance.The purchase happens at the according gasPrice, alsospecified in the transaction. The transaction is consideredinvalid if the account balance cannot support such a pur-chase. It is named gasLimit since any unused gas at theend of the transaction is refunded (at the same rate of pur-chase) to the senders account. Gas does not exist outsideof the execution of a transaction. Thus for accounts withtrusted code associated, a relatively high gas limit may beset and left alone.

    In general, Ether used to purchase gas that is not re-funded is delivered to the coinbase address, the addressof an account typically under the control of the miner.Transactors are free to specify any gasPrice that theywish, however miners are free to ignore transactions asthey choose. A higher gas price on a transaction will there-fore cost the sender more in terms of Ether and deliver agreater value to the miner and thus will more likely beselected for inclusion by more miners. Miners, in general,will choose to advertise the minimum gas price for whichthey will execute transactions and transactors will be freeto canvas these prices in determining what gas price tooffer. Since there will be a (weighted) distribution of min-imum acceptable gas prices, transactors will necessarilyhave a trade-off to make between lowering the gas priceand maximising the chance that their transaction will bemined in a timely manner.

  • ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER FINAL DRAFT - UNDER REVIEW 7

    6. Transaction Execution

    The execution of a transaction is the most complex partof the Ethereum protocol: it defines the state transitionfunction . It is assumed that any transactions executedfirst pass the initial tests of intrinsic validity. These in-clude:

    (1) The transaction is well-formed RLP, with no ad-ditional trailing bytes;

    (2) the transaction signature is valid;(3) the transaction nonce is valid (equivalent to the

    sender accounts current nonce);(4) the gas limit is no smaller than the intrinsic gas,

    g0, used by the transaction;(5) the sender account balance contains at least the

    cost, v0, required in up-front payment.

    Formally, we consider the function , with T being atransaction and the state:

    (50) = (, T )

    Thus is the post-transactional state. We also defineg to evaluate to the amount of gas used in the executionof a transaction and l to evaluate to the transactionsaccrued log items, both to be formally defined later.

    6.1. Substate. Throughout transaction execution, weaccrue certain information that is acted upon immediatelyfollowing the transaction. We call this transaction sub-state, and represent it as A, which is a tuple:

    (51) A (As, Al, Ar)

    The tuple contents include As, the suicide set: a setof accounts that will be discarded following the transac-tions completion. Al is the log series: this is a series ofarchived and indexable checkpoints in VM code execu-tion that allow for contract-calls to be easily tracked byonlookers external to the Ethereum world (such as decen-tralised application front-ends). Finally there is Ar, therefund balance, increased through using the SSTORE in-struction in order to reset contract storage to zero fromsome non-zero value. Though not immediately refunded,it is allowed to partially offset the total execution costs.

    For brevity, we define the empty substate A0 to haveno suicides, no logs and a zero refund balance:

    (52) A0 (, (), 0)

    6.2. Execution. We define intrinsic gas g0, the amountof gas this transaction requires to be paid prior to execu-tion, as follows:(53)

    g0

    iTi,Td

    {Gtxdatazero if i = 0

    Gtxdatanonzero otherwise+Gtransaction

    where Ti, Td means the series of bytes of the trans-actions associated data and initialisation EVM-code,depending on whether the transaction is for contract-creation or message-call. G is defined in Appendix G.

    The up-front cost v0 is calculated as:

    (54) v0 TgTp + Tv

    The validity is determined as:

    (55) S(T ) 6= [S(T )] 6=

    Tn = [S(T )]n g0 6 Tg v0 6 [S(T )]b Tg 6 BHl `(BR)u

    Note the final condition; the sum of the transactionsgas limit, Tg, and the gas utilised in this block prior, givenby `(BR)u, must be no greater than the blocks gasLimit,BHl.

    The execution of a valid transaction begins with anirrevocable change made to the state: the nonce of theaccount of the sender, S(T ), is incremented by one andthe balance is reduced by the up-front cost, v0. The gasavailable for the proceeding computation, g, is defined asTg g0. The computation, whether contract creation ora message call, results in an eventual state (which maylegally be equivalent to the current state), the change towhich is deterministic and never invalid: there can be noinvalid transactions from this point.

    We define the checkpoint state 0:

    0 except:(56)0[S(T )]b [S(T )]b TgTp(57)0[S(T )]n [S(T )]n + 1(58)

    Evaluating P from 0 depends on the transactiontype; either contract creation or message call; we definethe tuple of post-execution provisional state P , remain-ing gas g and substate A:(59)

    (P , g, A)

    (0, S(T ), To,

    g, Tp, Tv, Ti, 0) if Tt = 3(0, S(T ), To,

    Tt, Tt, g, Tp, Tv, Td, 0) otherwise

    where g is the amount of gas remaining after deductingthe basic amount required to pay for the existence of thetransaction:

    (60) g Tg g0and To is the original transactor, which can differ from thesender in the case of a message call or contract creationnot directly triggered by a transaction but coming fromthe execution of EVM-code.

    Note we use 3 to denote the fact that only the firstthree components of the functions value are taken; thefinal represents the message-calls output value (a bytearray) and is unused in the context of transaction evalua-tion.

    After the message call or contract creation is processed,the state is finalised by determining the amount to be re-funded, g from the remaining gas, g, plus some allowancefrom the refund counter, to the sender at the original rate.

    (61) g g + min{Tg g

    2

    , Ar}

    The total refundable amount is the legitimately remain-ing gas g, added to Ar, with the latter component beingcapped up to a maximum of half (rounded down) of thetotal amount used Tg g.

    The Ether for the gas is given to the miner, whose ad-dress is specified as the coinbase of the present block B. So

  • ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER FINAL DRAFT - UNDER REVIEW 8

    we define the pre-final state in terms of the provisionalstate P :

    P except(62)[S(T )]b P [S(T )]b + gTp(63)

    [m]b P [m]b + (Tg g)Tp(64)m BHc(65)

    The final state, , is reached after deleting all accountsthat appear in the suicide list:

    except(66)i As : [i] (67)

    And finally, we specify g, the total gas used in thistransaction and l, the logs created by this transaction:

    g(, T ) Tg g(68)l(, T ) Al(69)

    These are used to help define the transaction receipt,discussed later.

    7. Contract Creation

    There are number of intrinsic parameters used whencreating an account: sender (s), original transactor (o),available gas (g), gas price (p), endowment (v) togetherwith an arbitrary length byte array, i, the initialisationEVM code and finally the present depth of the message-call/contract-creation stack (e).

    We define the creation function formally as the func-tion , which evaluates from these values, together withthe state to the tuple containing the new state, remain-ing gas and accrued transaction substate (, g, A), as insection 6:

    (70) (, g, A) (, s, o, g, p, v, i, e)The address of the new account is defined as being the

    rightmost 160 bits of the Keccak hash of RLP encodingof the structure containing only the sender and the nonce.Thus we define the resultant address for the new accounta:

    (71) a B96..255(KEC(RLP(

    (s,[s]n 1))))

    where KEC is the Keccak 256-bit hash function, RLP isthe RLP encoding function, Ba..b(X) evaluates to binaryvalue containing the bits of indices in the range [a, b] ofthe binary data X and [x] is the address state of x or ifnone exists. Note we use one fewer than the senders noncevalue; we assert that we have incremented the sender ac-counts nonce prior to this call, and so the value usedis the senders nonce at the beginning of the responsibletransaction or VM operation.

    The accounts nonce is initially defined as zero, thebalance as the value passed, the storage as empty and thecode hash as the Keccak 256-bit hash of the empty string;the senders balance is also reduced by the value passed.Thus the mutated state becomes :

    (72) except:

    [a] (0, v + v, TRIE(), KEC(()))(73)[s]b [s]b v(74)

    where v is the accounts pre-existing value, in the eventit was previously in existence:

    (75) v {

    0 if [a] = [a]b otherwise

    Finally, the account is initialised through the executionof the initialising EVM code i according to the executionmodel (see section 9). Code execution can effect severalevents that are not internal to the execution state: theaccounts storage can be altered, further accounts can becreated and further message calls can be made. As such,the code execution function evaluates to a tuple of theresultant state , available gas remaining g, the ac-crued substate A and the body code of the account b.

    Code execution depletes gas; thus it may exit beforethe code has come to a natural halting state. In this (andseveral other) exceptional cases we say an Out-of-Gas ex-ception has occurred: The evaluated state is defined asbeing the empty set and the entire create operationshould have no effect on the state, effectively leaving itas it was immediately prior to attempting the creation.The gas remaining should be zero in any such exceptionalcondition. If the creation was conducted as the receptionof a transaction, then this doesnt affect payment of theintrinsic cost: it is paid regardless.

    If such an exception does not occur, then the remaininggas is refunded to the originator and the now-altered stateis allowed to persevere. Thus formally, we may specify theresultant state, gas and substate as (, g, A) where:

    (76) (, g, A,o) (, g, I)

    g

    0 if = g if g < c

    g c otherwise(77)

    if = if g < c

    except:

    [a]c = KEC(o) otherwise

    (78)

    Where I contains the parameters of the execution envi-ronment as defined in section 9.

    Ia a(79)Io o(80)Ip p(81)Id ()(82)Is s(83)Iv v(84)Ib i(85)Ie e(86)

    where c is the code-deposit cost:

    (87) c Gcreatedata |o|Id evaluates to the empty tuple as there is no input

    data to this call. IH has no special treatment and is de-termined from the blockchain. The exception in the deter-mination of dictates that the resultant byte sequencefrom the execution of the initialisation code specifies thefinal body code for the newly-created account, with [a]c

  • ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER FINAL DRAFT - UNDER REVIEW 9

    being the Keccak 256-bit hash of the newly created ac-counts body code and o the output byte sequence of thecode execution.

    No code is deposited in the state if the gas does notcover the additional per-byte contract deposit fee, how-ever, the value is still transferred and the execution side-effects take place.

    7.1. Subtleties. Note that while the initialisation codeis executing, the newly created address exists but with nointrinsic body code. Thus any message call received byit during this time causes no code to be executed. If theinitialisation execution ends with a SUICIDE instruction,the matter is moot since the account will be deleted beforethe transaction is completed. For a normal STOP code,or if the code returned is otherwise empty, then the stateis left with a zombie account, and any remaining balancewill be locked into the account forever.

    8. Message Call

    In the case of executing a message call, several param-eters are required: sender (s), transaction originator (o),recipient (r), the account whose code is to be executed (c,usually the same as recipient), available gas (g), value (v)and gas price (p) together with an arbitrary length bytearray, d, the input data of the call and finally the presentdepth of the message-call/contract-creation stack (e).

    Aside from evaluating to a new state and transactionsubstate, message calls also have an extra componenttheoutput data denoted by the byte array o. This is ignoredwhen executing transactions, however message calls canbe initiated due to VM-code execution and in this casethis information is used.

    (88) (, g, A,o) (, s, o, r, c, g, p, v,d, e)We define 1, the first transitional state as the orig-

    inal state but with the value transferred from sender torecipient:

    (89) 1[r]b [r]b + v 1[s]b [s]b vThroughout the present work, it is assumed that if

    1[r] was originally undefined, it will be created as an ac-count with no code or state and zero balance and nonce.Thus the previous equation should be taken to mean:

    (90) 1 1 except:

    (91) 1[s]b 1[s]b v

    (92) and 1 except:

    (93)

    {1[r] (v, 0, KEC(()), TRIE()) if [r] = 1[r]b [r]b + v otherwise

    The accounts associated code (identified as the frag-ment whose Keccak hash is [c]c) is executed according tothe execution model (see section 9). Just as with contractcreation, if the execution halts in an exceptional fashion(i.e. due to an exhausted gas supply, stack underflow, in-valid jump destination or invalid instruction), then no gasis refunded to the caller and the state is reverted to thepoint immediately prior to balance transfer (i.e. ).

    { if = otherwise

    (94)

    (, g, s,o)

    ECREC(1, g, I) if a = 1

    SHA256(1, g, I) if a = 2

    RIP160(1, g, I) if a = 3

    ID(1, g, I) if a = 4

    (1, g, I) otherwise

    (95)

    Ia a(96)Io o(97)Ip p(98)Id d(99)Id d(100)Is s(101)Iv v(102)Ie e(103)

    Let KEC(Ib) = [c]c(104)

    It is assumed that the client will have stored the pair(KEC(Ib), Ib) at some point prior in order to make the de-termination of Ib feasible.

    As can be seen, there are four exceptions to the usageof the general execution framework for evaluation of themessage call: these are four so-called precompiled con-tracts, meant as a preliminary piece of architecture thatmay later become native extensions. The four contractsin addresses 1, 2, 3 and 4 execute the elliptic curve publickey recovery function, the SHA2 256-bit hash scheme, theRIPEMD 160-bit hash scheme and the identity functionrespectively.

    Their full formal definition is in Appendix E.

    9. Execution Model

    The execution model specifies how the system state isaltered given a series of bytecode instructions and a smalltuple of environmental data. This is specified through aformal model of a virtual state machine, known as theEthereum Virtual Machine (EVM). It is a quasi-Turing-complete machine; the quasi qualification comes from thefact that the computation is intrinsically bounded througha parameter, gas, which limits the total amount of com-putation done.

    9.1. Basics. The EVM is a simple stack-based architec-ture. The word size of the machine (and thus size ofstack item) is 256-bit. This was chosen to facilitate theKeccak-256 hash scheme and elliptic-curve computations.The memory model is a simple word-addressed byte array.The stack has an unlimited size. The machine also has anindependent storage model; this is similar in concept tothe memory but rather than a byte array, it is a word-addressable word array. Unlike memory, which is volatile,storage is non volatile and is maintained as part of thesystem state. All locations in both storage and memoryare well-defined initially as zero.

    The machine does not follow the standard von Neu-mann architecture. Rather than storing program code ingenerally-accessible memory or storage, it is stored sepa-rately in a virtual ROM interactable only through a spe-cialised instruction.

  • ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER FINAL DRAFT - UNDER REVIEW 10

    The machine can have exceptional execution for sev-eral reasons, including stack underflows and invalid in-structions. Like the out-of-gas (OOG) exception, theydo not leave state changes intact. Rather, the machinehalts immediately and reports the issue to the executionagent (either the transaction processor or, recursively, thespawning execution environment) and which will deal withit separately.

    9.2. Fees Overview. Fees (denominated in gas) arecharged under three distinct circumstances, all three asprerequisite to the execution of an operation. The firstand most common is the fee intrinsic to the computa-tion of the operation. Most operations require a singlegas fee to be paid for their execution; exceptions includeSSTORE, SLOAD, CALL, CREATE, BALANCE, EXP, LOGand SHA3. Secondly, gas may be deducted in order toform the payment for a subordinate message call or con-tract creation; this forms part of the payment for CRE-ATE, CALL and CALLCODE. Finally, gas may be paiddue to an increase in the usage of the memory.

    Over an accounts execution, the total fee for memory-usage payable is proportional to smallest multiple of 32bytes that are required such that all memory indices(whether for read or write) are included in the range. Thisis paid for on a just-in-time basis; as such, referencing anarea of memory at least 32 bytes greater than any previ-ously indexed memory will certainly result in an additionalmemory usage fee. Due to this fee it is highly unlikelyaddresses will ever go above 32-bit bounds. That said,implementations must be able to manage this eventuality.

    Storage fees have a slightly nuanced behaviourto in-centivise minimisation of the use of storage (which corre-sponds directly to a larger state database on all nodes),the execution fee for an operation that clears an entry inthe storage is not only waived, a qualified refund is given;in fact, this refund is effectively paid up-front since theinitial usage of a storage location costs substantially morethan normal usage.

    See Appendix H for a rigorous definition of the EVMgas cost.

    9.3. Execution Environment. In addition to the sys-tem state , and the remaining gas for computation g,there are several pieces of important information used inthe execution environment that the execution agent mustprovide; these are contained in the tuple I:

    Ia, the address of the account which owns thecode that is executing.

    Io, the sender address of the transaction that orig-inated this execution.

    Ip, the price of gas in the transaction that origi-nated this execution.

    Id, the byte array that is the input data to thisexecution; if the execution agent is a transaction,this would be the transaction data.

    Is, the address of the account which caused thecode to be executing; if the execution agent is atransaction, this would be the transaction sender.

    Iv, the value, in Wei, passed to this account aspart of the same procedure as execution; if theexecution agent is a transaction, this would bethe transaction value.

    Ib, the byte array that is the machine code to beexecuted.

    IH , the block header of the present block. Ie, the depth of the present message-call or

    contract-creation (i.e. the number of CALLs orCREATEs being executed at present).

    The execution model defines the function , which cancompute the resultant state , the remaining gas g, thesuicide list s, the log series l, the refunds r and the resul-tant output, o, given these definitions:

    (105) (, g, s, l, r,o) (, g, I)9.4. Execution Overview. We must now define the function. In most practical implementations this will bemodelled as an iterative progression of the pair comprisingthe full system state, and the machine state, . For-mally, we define it recursively with a function X. Thisuses an iterator function O (which defines the result of asingle cycle of the state machine) together with functionsZ which determines if the present state is an exceptionalhalting state of the machine and H, specifying the outputdata of the instruction if and only if the present state is anormal halting state of the machine.

    The empty sequence, denoted (), is not equal to theempty set, denoted ; this is important when interpretingthe output of H, which evaluates to when execution is tocontinue but a series (potentially empty) when executionshould halt.

    (, g, I) X0,1,2,4((,, A0, I)

    )(106)

    g g(107)pc 0(108)m (0, 0, ...)(109)i 0(110)s ()(111)

    (112)

    X((,, A, I)

    ) (,, A0, I, ()

    )if Z(,, I)

    O(,, A, I) o if o 6= X(O(,, A, I)

    )otherwise

    where

    o H(, I)(113)(a, b, c) d (a, b, c, d)(114)

    Note that we must drop the fourth value in the tuplereturned by X to correctly evaluate , hence the subscriptX0,1,2,4.X is thus cycled (recursively here, but implementations

    are generally expected to use a simple iterative loop) untileither Z becomes true indicating that the present state isexceptional and that the machine must be halted and anychanges discarded or until H becomes a series (rather thanthe empty set) indicating that the machine has reached acontrolled halt.

    9.4.1. Machine State. The machine state is defined asthe tuple (g, pc,m, i, s) which are the gas available, theprogram counter pc P256 , the memory contents, the ac-tive number of words in memory (counting continuouslyfrom position 0), and the stack contents. The memorycontents m are a series of zeroes of size 2

    256.For the ease of reading, the instruction mnemonics,

    written in small-caps (e.g. ADD), should be interpreted

  • ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER FINAL DRAFT - UNDER REVIEW 11

    as their numeric equivalents; the full table of instructionsand their specifics is given in Appendix H.

    For the purposes of defining Z, H and O, we define was the current operation to be executed:

    (115) w {Ib[pc] if pc < IbSTOP otherwise

    We also assume the fixed amounts of and , specify-ing the stack items removed and added, both subscript-able on the instruction and an instruction cost function Cevaluating to the full cost, in gas, of executing the giveninstruction.

    9.4.2. Exceptional Halting. The exceptional halting func-tion Z is defined as:

    (116) Z(,, I) g < C(,, I) w = s < w (w {JUMP, JUMPI} s[0] / D(Ib))

    This states that the execution is in an exceptional halt-ing state if there is insufficient gas, if the instruction is in-valid (and therefore its subscript is undefined), if thereare insufficient stack items, if a JUMP/JUMPI destinationis invalid or if a CALL, CALLCODE or CREATE instructionis executed when the call stack limit of 1024 is reached.The astute reader will realise that this implies that no in-struction can, through its execution, cause an exceptionalhalt.

    9.4.3. Jump Destination Validity. We previously used Das the function to determine the set of valid jump desti-nations given the code that is being run. We define thisas any position in the code occupied by a JUMPDEST in-struction.

    All such positions must be on valid instruction bound-aries, rather than sitting in the data portion of PUSHoperations and must appear within the explicitly definedportion of the code (rather than in the implicitly definedSTOP operations that trail it).

    Formally:

    (117) D(c) DJ(c,0) Y (c, 0)

    where:

    (118) Y (c, i) {{} if i > |c|{i} Y (c, N(i, c[i])) otherwise

    (119)

    DJ(c, i)

    {} if i > |c|{i} DJ(c, N(i, c[i])) if c[i] = JUMPDESTDJ(c, N(i, c[i])) otherwise

    where N is the next valid instruction position in thecode, skipping the data of a PUSH instruction, if any:(120)

    N(i, w) {i+ w PUSH1 + 2 if w [PUSH1,PUSH32]i+ 1 otherwise

    9.4.4. Normal Halting. The normal halting function H isdefined:(121)

    H(, I)

    HRETURN() if w = RETURN

    () if w {STOP, SUICIDE} otherwise

    The data-returning halt operation, RETURN, has aspecial function HRETURN, defined in Appendix H.

    9.5. The Execution Cycle. Stack items are added orremoved from the left-most, lower-indexed portion of theseries; all other items remain unchanged:

    O((,, A, I)

    ) (,, A, I)(122) w w(123)

    s s+ (124)x [w, s) : s[x] s[x+ ](125)

    The gas is reduced by the instructions gas cost andfor most instructions, the program counter increments oneach cycle, for the three exceptions, we assume a functionJ , subscripted by one of two instructions, which evaluatesto the according value:

    g g C(,, I)(126)

    pc

    JJUMP() if w = JUMP

    JJUMPI() if w = JUMPI

    N(pc, w) otherwise

    (127)

    In general, we assume the memory, suicide list and sys-tem state dont change:

    m m(128)i i(129)A A(130) (131)

    However, instructions do typically alter one or severalcomponents of these values. Altered components listed byinstruction are noted in Appendix H, alongside values for and and a formal description of the gas requirements.

    10. Blocktree to Blockchain

    The canonical blockchain is a path from root to leafthrough the entire block tree. In order to have consensusover which path it is, conceptually we identify the paththat has had the most computation done upon it, or, theheaviest path. Clearly one factor that helps determine theheaviest path is the block number of the leaf, equivalentto the number of blocks, not counting the unmined genesisblock, in the path. The longer the path, the greater thetotal mining effort that must have been done in order toarrive at the leaf. This is akin to existing schemes, suchas that employed in Bitcoin-derived protocols.

    This scheme notably ignores so-called stale blocks:valid, mined blocks, which were propagated too late intothe network and thus were beaten to network consensus bya sibling block (one with the same parent). Such blocksbecome more common as the network propagation timeapproaches the ideal inter-block time. However, by count-ing the computation work of stale block headers, we areable to do better: we can utilise this otherwise wastedcomputation and put it to use in helping to buttress the

  • ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER FINAL DRAFT - UNDER REVIEW 12

    more popular blockchain making it a stronger choice overless popular (though potentially longer) competitors.

    This increases overall network security by making itmuch harder for an adversary to silently mine a canonicalblockchain (which, it is assumed, would contain differenttransactions to the current consensus) and dump it on thenetwork with the effect of overriding existing blocks andreversing the transactions within.

    In order to validate the extra computation, a givenblock B may include the block headers from any knownuncle blocks (i.e. blocks whose parent is equivalent to theNth generation grandparent of B, for a limited set of N).Since a block header includes the nonce, a proof-of-work,then the header alone is enough to validate the computa-tion done. Any such blocks contribute toward the totalcomputation or total difficulty of a chain that includesthem. To incentivise computation and inclusion, a rewardis given both to the miner of the stale block and the minerof the block that references it.

    Thus we define the total difficulty of block B recur-sively as:

    Bt Bt +Bd +

    UBUUd(132)

    B P (BH)(133)As such given a block B, Bt is its total difficulty, B

    is its parent block, Bd is its difficulty and BU is its set ofuncle blocks.

    11. Block Finalisation

    The process of finalising a block involves four stages:

    (1) Validate (or, if mining, determine) uncles;(2) validate (or, if mining, determine) transactions;(3) apply rewards;(4) verify (or, if mining, compute a valid) state and

    nonce.

    11.1. Uncle Validation. The validation of uncle head-ers means nothing more than verifying that each uncleheader is both a valid header and satisfies the relation ofNth-generation uncle to the present block where N 6.Formally:

    (134)

    UBUV (U) k(U,P (BH), 6)

    where k denotes the is-kin property:

    (135) k(U,H, n)

    false if n = 0

    s(U,H)

    k(U,P (H), n 1) otherwiseand s denotes the is-sibling property:

    (136) s(U,H) (P (H) = P (U) H 6= U)

    11.2. Transaction Validation. The given gasUsedmust correspond faithfully to the transactions listed:BHu, the total gas used in the block, must be equal tothe accumulated gas used according to the final transac-tion:

    (137) BHg = `(R)u

    11.3. Reward Application. The application of rewardsto a block involves raising the balance of the accounts ofthe coinbase address of the block and each uncle by a cer-tain amount. We raise the blocks coinbase account by Rb;for each uncle, we raise the blocks coinbase by an addi-tional 1

    32of the block reward and the coinbase of the uncle

    by 1516

    of the reward. Formally we define the function :

    (B,) : = except:(138)[BHc]b = [BHc]b + (1 +

    |BU|32

    )Rb(139)

    UBU [Uc]b = [Uc]b +15

    16Rb(140)

    If there are collisions of the coinbase addresses betweenuncles and the block (i.e. two uncles with the same coin-base address or an uncle with the same coinbase addressas the present block), additions are applied cumulatively.

    We define the block reward as 1500 Finney:

    (141) Let Rb = 1.5 1018

    11.4. State & Nonce Validation. We may now definethe function, , that maps a block B to its initiation state:(142)

    (B) {0 if P (BH) = i : TRIE(LS(i)) = P (BH)Hr otherwise

    Here, TRIE(LS(i)) means the hash of the root node ofa trie of state i; it is assumed that implementations willstore this in the state database, trivial and efficient sincethe trie is by nature an immutable data structure.

    And finally define , the block transition function,which maps an incomplete block B to a complete blockB:

    (B) B : B = B except:(143)Bn = n : PoW(B

    , n) s[1]

    0 otherwise

    0x12 SLT 2 1 Signed less-than comparision.

    s[0] {

    1 if s[0] < s[1]

    0 otherwise

    Where all values are treated as twos complement signed 256-bit integers.

    0x13 SGT 2 1 Signed greater-than comparision.

    s[0] {

    1 if s[0] > s[1]

    0 otherwise

    Where all values are treated as twos complement signed 256-bit integers.

    0x14 EQ 2 1 Equality comparision.

    s[0] {

    1 if s[0] = s[1]

    0 otherwise

    0x15 ISZERO 1 1 Simple not operator.

    s[0] {

    1 if s[0] = 0

    0 otherwise

    0x16 AND 2 1 Bitwise AND operation.i [0..255] : s[0]i s[0]i s[1]i

    0x17 OR 2 1 Bitwise OR operation.i [0..255] : s[0]i s[0]i s[1]i

    0x18 XOR 2 1 Bitwise XOR operation.i [0..255] : s[0]i s[0]i s[1]i

    0x19 NOT 1 1 Bitwise NOT operation.

    i [0..255] : s[0]i {

    1 if s[0]i = 0

    0 otherwise

    0x1a BYTE 2 1 Retrieve single byte from word.

    i [0..255] : s[0]i {s[1](i+8s[0]) if i < 8 s[0] < 320 otherwise

    For Nth byte, we count from the left (i.e. N=0 would be the most significant in big endian).

    20s: SHA3

    Value Mnemonic Description

    0x20 SHA3 2 1 Compute Keccak-256 hash.s[0] Keccak(m[s[0] . . . (s[0] + s[1] 1)])i M(i,s[0],s[1])

  • ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER FINAL DRAFT - UNDER REVIEW 24

    30s: Environmental Information

    Value Mnemonic Description

    0x30 ADDRESS 0 1 Get address of currently executing account.s[0] Ia

    0x31 BALANCE 1 1 Get balance of the given account.

    s[0] {[s[0]]b if [s[0] mod 2

    160] 6= 0 otherwise

    0x32 ORIGIN 0 1 Get execution origination address.s[0] IoThis is the sender of original transaction; it is never an account with non-emptyassociated code.

    0x33 CALLER 0 1 Get caller address.s[0] IsThis is the address of the account that is directly responsible for this execution.

    0x34 CALLVALUE 0 1 Get deposited value by the instruction/transaction responsible for this execution.s[0] Iv

    0x35 CALLDATALOAD 1 1 Get input data of current environment.s[0] Id[s[0] . . . (s[0] + 31)]This pertains to the input data passed with the message call instruction or transaction.

    0x36 CALLDATASIZE 0 1 Get size of input data in current environment.s[0] IdThis pertains to the input data passed with the message call instruction or transaction.

    0x37 CALLDATACOPY 3 0 Copy input data in current environment to memory.

    i{0...s[2]1}m[s[0] + i] {Id[s[1] + i] if i < Id0 otherwise

    This pertains to the input data passed with the message call instruction or transaction.

    0x38 CODESIZE 0 1 Get size of code running in current environment.s[0] Ib

    0x39 CODECOPY 3 0 Copy code running in current environment to memory.

    i{0...s[2]1}m[s[0] + i] {Ib[s[1] + i] if i < IbSTOP otherwise

    0x3a GASPRICE 0 1 Get price of gas in current environment.s[0] IpThis is gas price specified by the originating transaction.

    0x3b EXTCODESIZE 1 1 Get size of an accounts code.s[0] [s[0] mod 2160]c

    0x3c EXTCODECOPY 4 0 Copy an accounts code to memory.

    i{0...s[3]1}m[s[1] + i] {

    c[s[2] + i] if i < cSTOP otherwise

    where c [s[0] mod 2160]c

  • ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER FINAL DRAFT - UNDER REVIEW 25

    40s: Block Information

    Value Mnemonic Description

    0x40 BLOCKHASH 1 1 Get the hash of one of the 256 most recent complete blocks.s[0] P (IHp ,s[0], 0)where P is the hash of a block of a particular number, up to a maximum age.0 is left on the stack if the looked for block number is greater than the current block numberor more than 256 blocks behind the current block.

    P (h, n, a)

    0 if n > Hi a = 256 h = 0h if n = Hi

    P (Hp, n, a+ 1) otherwise

    and we assert the header H can be determined as its hash is the parent hashin the block following it.

    0x41 COINBASE 0 1 Get the blocks coinbase address.s[0] IHc

    0x42 TIMESTAMP 0 1 Get the blocks timestamp.s[0] IHs

    0x43 NUMBER 0 1 Get the blocks number.s[0] IHi

    0x44 DIFFICULTY 0 1 Get the blocks difficulty.s[0] IHd

    0x45 GASLIMIT 0 1 Get the blocks gas limit.s[0] IHl

  • ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER FINAL DRAFT - UNDER REVIEW 26

    50s: Stack, Memory, Storage and Flow Operations

    Value Mnemonic Description

    0x50 POP 1 0 Remove item from stack.

    0x51 MLOAD 1 1 Load word from memory.s[0] m[s[0] . . . (s[0] + 31)]i max(i, d(s[0] + 32) 32e)

    0x52 MSTORE 2 0 Save word to memory.m[s[0] . . . (s[0] + 31)] s[1]i max(i, d(s[0] + 32) 32e)

    0x53 MSTORE8 2 0 Save byte to memory.m[s[0]] (s[1] mod 256)i max(i, d(s[0] + 1) 32e)

    0x54 SLOAD 1 1 Load word from storage.s[0] [Ia]s[s[0]]

    0x55 SSTORE 2 0 Save word to storage.[Ia]s[s[0]] s[1]

    CSSTORE(,)

    Gsset if s[1] 6= 0 [Ia]s[s[0]] = 0Gsclear if s[1] = 0 [Ia]s[s[0]] 6= 0Gsreset otherwise

    Ar Ar +{Rsclear if s[1] = 0 [Ia]s[s[0]] 6= 00 otherwise

    0x56 JUMP 1 0 Alter the program counter.JJUMP() s[0]This has the effect of writing said value to pc. See section 9.

    0x57 JUMPI 2 0 Conditionally alter the program counter.

    JJUMPI() {s[0] if s[1] 6= 0pc + 1 otherwise

    This has the effect of writing said value to pc. See section 9.

    0x58 PC 0 1 Get the program counter.s[0] pc

    0x59 MSIZE 0 1 Get the size of active memory in bytes.s[0] 32i

    0x5a GAS 0 1 Get the amount of available gas.s[0] g

    0x5b JUMPDEST 0 0 Mark a valid destination for jumps.This operation has no effect on machine state during execution.

  • ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER FINAL DRAFT - UNDER REVIEW 27

    60s & 70s: Push Operations

    Value Mnemonic Description

    0x60 PUSH1 0 1 Place 1 byte item on stack.s[0] c(pc + 1)where c(x)

    {Ib[x] if x < Ib0 otherwise

    The bytes are read in line from the program codes bytes array.The function c ensures the bytes default to zero if they extend past the limits.The byte is right-aligned (takes the lowest significant place in big endian).

    0x61 PUSH2 0 1 Place 2-byte item on stack.s[0] c

    ((pc + 1) . . . (pc + 2)

    )where c is defined as above.The bytes are right-aligned (takes the lowest significant place in big endian).

    ......

    ......

    ...

    0x7f PUSH32 0 1 Place 32-byte (full word) item on stack.s[0] c

    ((pc + 1) . . . (pc + 32)

    )where c is defined as above.The bytes are right-aligned (takes the lowest significant place in big endian).

    80s: Duplication Operations

    Value Mnemonic Description

    0x80 DUP1 1 2 Duplicate 1st stack item.s[0] s[0]

    0x81 DUP2 2 3 Duplicate 2nd stack item.s[0] s[1]

    ......

    ......

    ...

    0x8f DUP16 16 17 Duplicate 16th stack item.s[0] s[15]

    90s: Exchange Operations

    Value Mnemonic Description

    0x90 SWAP1 2 2 Exchange 1st and 2nd stack items.s[0] s[1]s[1] s[0]

    0x91 SWAP2 3 3 Exchange 1st and 3rd stack items.s[0] s[2]s[2] s[0]

    ......

    ......

    ...

    0x9f SWAP16 17 17 Exchange 1st and 17th stack items.s[0] s[16]s[16] s[0]

  • ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER FINAL DRAFT - UNDER REVIEW 28

    a0s: Logging Operations

    For all logging operations, the state change is to append an additional log entry on to the substates log series:Al Al (Ia, t,m[s[0] . . . (s[0] + s[1] 1)])The entrys topic series, t, differs accordingly:

    Value Mnemonic Description

    0xa0 LOG0 2 0 Append log record with no topics.t ()

    0xa1 LOG1 3 0 Append log record with one topic.t (s[2])

    ......

    ......

    ...

    0xa4 LOG4 6 0 Append log record with four topics.t (s[2],s[3],s[4],s[5])

  • ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER FINAL DRAFT - UNDER REVIEW 29

    f0s: System operations

    Value Mnemonic Description

    0xf0 CREATE 3 1 Create a new account with associated code.i m[s[1] . . . (s[1] + s[2] 1)](,g, A

    +) {

    (, Ia, Io,g, Ip,s[0], i, Ie + 1) if s[0] 6 [Ia]b Ie < 1024(,g,

    )otherwise

    except [Ia]n = [Ia]n + 1A A uniondblA+ which implies: As As A+s Al Al A+l Ar Ar +A+rs[0] xwhere x = 0 if the code execution for this operation failed due to lack of gas or Ie = 1024(the maximum call depth limit is reached) or s[0] > [Ia]b (balance of the caller is toolow to fulfil the value transfer); and otherwise x = A(Ia,[Ia]n), the address of the newlycreated account, otherwise.i M(i,s[1],s[2])Thus the operand order is: value, input offset, input size.

    0xf1 CALL 7 1 Message-call into an account.i m[s[3] . . . (s[3] + s[4] 1)]o m[s[5] . . . (s[5] + s[6] 1)]

    (, g, A+,o)

    (, Ia, Io, t, t,

    s[0], Ip,s[2], i, Ie + 1)if s[2] 6 [Ia]b Ie < 1024

    (, g,,o) otherwiseg g + gs[0] xA A uniondblA+t s[1]where x = 0 if the code execution for this operation failed due to lack of gas or ifs[2] > [Ia]b (not enough funds) or Ie = 1024 (call depth limit reached); x = 1otherwise.i M(M(i,s[3],s[4]),s[5],s[6])Thus the operand order is: gas, to, value, in offset, in size, out offset, out size.

    0xf2 CALLCODE 7 1 Message-call into this account with alternative accounts code.Exactly equivalent to CALL except:

    (, g, A+,o)

    (, Ia, Io, Ia, t,

    s[0], Ip,s[2], i, Ie + 1)if s[2] 6 [Ia]b Ie < 1024

    (, g,,o) otherwiseNote the change in the fourth parameter to the call from the 2nd stack value s[1](as in CALL) to the present address Ia. This means that the recipient is in fact thesame account as at present, simply that the code is overridden altered.

    0xf3 RETURN 2 0 Halt execution returning output data.HRETURN() m[s[0] . . . (s[0] + s[1] 1)]This has the effect of halting the execution at this point with output defined.See section 9.i M(i,s[0],s[1])

    0xff SUICIDE 1 0 Halt execution and register account for later deletion.As As {Ia}[s[0] mod 2

    160]b [s[0] mod 2160]b + [Ia]b[Ia]b 0

    Appendix I. Genesis Block

    The genesis block is 13 items, and is specified thus:

    (216)((

    0256, KEC(RLP(())), 0160, stateRoot, 0, 2

    22, 0, 0, 1000000, 0, 0, 0, KEC((42)

    )), (), ()

    )Where 0256 refers to the parent hash, a 256-bit hash which is all zeroes; 0160 refers to the coinbase address, a 160-bit

    hash which is all zeroes; 222 refers to the difficulty; 0 refers to the timestamp (the Unix epoch); the transaction trie rootand extradata are both 0, being equivalent to the empty byte array. The sequences of both uncles and transactions areempty and represented by (). KEC

    ((42)

    )refers to the Keccak hash of a byte array of length one whose first and only byte

    is of value 42. KEC(RLP(()))

    value refers to the hash of the uncle lists in RLP, both empty lists.The proof-of-concept series include a development premine, making the state root hash some value stateRoot. The

    latest documentation should be consulted for the value of the state root.