Writing Proof Of Work Blockchain full-node from scratch
In this article, I want to cover a simplified but working example of decentralized Blockchain based on Proof Of Work algorithm, some sort of simplified Bitcoin. You can think about it as a simplified version of Bitcoin.
What I will describe:
- PoW blockchain theory
- How to write blockchain in Python from scratch
- How to write Full-Node and create decentralized nodes to run the blockchain on them
The code you can find in my GitHub: https://github.com/creotiv/full_node_blockchain
What the point in all of this?
First you should understand that Blockchain and Cryptocurrencies are not the same. Blockchain is a cryptography algorithm to store data in a decentralized way when you cant trust anyone. Cryptocurrencies are little part of all projects that used Blockchain.
Second. Idea of decentralized data storing and processing lies in the ability to create conditions where the system can be controlled only by most of the people and not by just some of them. Which blocks people, parties, countries from enforcing their rules on users of the system. Basically some sort of Free speech, but for the internet services.
What need to understand before proceeding
- No off chain data. As we can’t trust anyone, that implies that we cant trust any data outside our blockchain database. That means that we can’t use things like time, API calls to any services, etc. Which makes things much harder.
Some systems that used on-chain services called Oracles to access off-chain data, but that’s another story.
- Blockchain trilemma. Which come from the CAP theorem. You can pick only two. So there can’t be Blazing Fast, Decentralized, and Secure solution.
Proof of Work is one of the oldest algorithms of consensus used in a Blockchain. Consensus — is a way how decentralized actors/nodes can agree on something that happens in the system, like data update.
So for us Blockchain is a decentralized database where we have transactions to update it state and have a consensus algorithm so nodes can agree that the update is valid. And that what is Blockchain from a global perspective.
So assume we have many nodes, each of it has acurrent data state. So we need some way to update that state and save an order of our updates(in most cases like money transfers it matters).
We have 3 ways of doing our transactions: save new data state in the transaction, save diff of states(+$100/-$100), save state flow.
In the first one we can’t verify the chain of our modification, in case of error or attack. Even current centralized financial solutions doesn’t save just your current money balance, instead they save log of all updates to your balance, from which are calculation your current balance.
That’s why most(didn’t see them all :) ) blockchain solutions use diff of state(Ethereum) or state flow pattern(Bitcoin) to describe transactions.
Here is how looks like state flow:
Each Transaction has Inputs and Outputs, where each Input pointing on the previous output. Each Output can be used as an input only 1 time.
Each input is signed by a private key of user wallet to prove that he is the owner of that Output.
So for example you have one Output from the exchange that saying that you were bought 10 coins to your wallet. And now you transferring 5 coins to your friend wallet. You left 5 coins from the previous output not spent, if you left them like this system will count them as a fee for processing your Transaction. To not pay a fee, you can add additional Output that will move money back to your wallet.
That’s the idea behind the transactions in a bitcoin-like system.
But now we need add some order to Txs, as we have a decentralized system and data can mess up.
We cant use time as other systems, because it not trusted data. That where we get Blocks, from which came the name Blockchain.
Blocks add an order for our transactions. All Txs that go in the same block considering running at the same time.
Here is how they look like:
Each block consists of data(list of transactions), hash of the block(got from mining), and link to the previous block hash, and some additional data like time, nonce, block index.
First transaction in a block is a COINBASE transaction, that doesnt have previous hash and is used to give reward for mining.
Each block hash is constructed based on Tx hashes, previous block hash, nonce and other data. Thus you cant replace block from inside the chain. The only way how you can change the data in the whole blockchain is to recompute hashes for all blocks. And that’s where the Proof of Work algorithm takes his place.
Proof Of Work algorithm
As recomputing hashes it’s not a problem, we need some mechanism to make this unreal. PoW one of such things.
Instead of just use any hash for our block, PoW set rules on which hash we can consider valid. Such rules add a level of difficulty to calculate the hash, cause now it need many iterations to find a valid hash, thus it takes time and resources.
From that point we see that attackers can’t modify running blockchain without having at least 51% of all resources used in it. Because not forget that block mined decentralized by thousands of different nodes.
Mining is a process of finding hash for the Block that will be considered valid by the system.
If you look at the block description you will find nonce field. That field is used to change the hash for mining, as we can change block data.
That’s working because even small modification to data, makes hash totally different.
For each mined Block miner receive some reward and also all fees from Txs.
With some amount of blocks reward and difficulty of hash rules may change. That’s how the total supply of coins is limited.
Merkel Tree — is an algorithm to hash data with a tree structure, which adds a possibility to update a resulting hash without whole tree re-computation when new data added.
It is used to hash list of transactions in a Block and save resources during mining by removing the need to recompute all Txs hashes each time.
Also, it gives the ability to find if some data in a tree with less computation needed.
We have some number of nodes. Each of them has a duplicate of the whole data of the blockchain(such nodes called Full-Node).
Nodes are using the gossip communication principle — If we get data that we didn’t see we broadcast it to other nodes that we know. In such a way data like Txs, blocks, new nodes addresses, etc, are shared across all nodes.
When some time/number of Txs passed from the adding(mining) of the last block, nodes start mining for a new block concurrently, the first who mine it, add it to his chain and share to other nodes, they validate it and if it ok add it also and broadcast it farther.
In some situations Split Brain situation may occur, when two nodes mine two different blocks at the ~same time. In such case some nodes continue mining new Block based on first Block, and other on the second. The first chain which will be longer wins and will be added to the blockchain.
- Running mining concurrently and increasing difficulty use too much resources, that could be used more useful.
- As difficulty too big for one person to mine, people gathered in a mining pools, which uses their resources to mine block and spread reward among all participants.
As we see if 3 pool merges, they will have more than 51% mining resources, which will give them ability to compromise the network. Also it shows that bitcoin is not such decentralized solution as many people think.
You can proceed straight to the code: https://github.com/creotiv/full_node_blockchain
I tried to not add different hard things that are not related to the blockchain directly, to minimize code amount so you are not lost in it. Also nodes don’t save their states on shut down.
What i covered in this demo:
- Blockchain based on Proof Of Work algorithm
- Transaction spent control. Each Tx Input pointed to the previous Tx Output
- Signed Inputs by wallet private key
- Using Merkel Tree for faster Block hash computation during mining
- Mining process
- Sync process between nodes
- Transaction and Block verifiers
- Same configuration on reward and difficulty for all blocks. Thus no supply limits.
- Nodes blocks, txs, nodes address gossip broadcast
- Covering Blockchain split brain situation but only for one level.
- Openapi schema + UI (generated by FastAPI)
- Some tests for blockchain. Cause it very simple to mess things up with all these hashes
What i didn’t covered in this demo:
- Multi-level split brain
- Automatic node discovery in subnets through service discovery protocol and ping
- Integration testing
- Byzantine testing
- Many things that real blockchain solution has. If you interesting in such, you can open Bitcoin or Ethereum after reading this.
- Multiprocessing for mining
- Light client
- No limitation on block sizes or number of Txs. Block mining starts after previous mining ends.
We create some wrapper around RSA python library. Wallet consists from 2 keys: public and private. The Public key is our blockchain address, and it used to verify the signature on data, which we make with our private key(which is not shared with anyone).
In Bitcoin and other solution ECDSA cryptography are used instead of RSA
In a blocks.py we describe our blockchain building blocks as Txs, Input, Output and Block . Each class has hash, as_dict, from_dict methods.
We sign each Input with our wallet instance.
Output class has field input_hash that used to create a unique hash for each output in a transaction, in other cases it would be similar in many cases
As i said before we use the Merkel Tree algorithm to hash all transactions in a block to speed up mining
One of the main parts of our system, as we need to be sure that the data that we get from the other nodes are valid.
Previous Inputs to our Txs controlled with internal DB, that updated with each block to remove needs of passing through thewhole blockchain(27GB now) to find needed data. Basically it’s how Blockchain is saved on nodes in a network.
Method force_block used to run mining of the new block by gathering some number of Txs ordered by fee and add Coinbase Tx to them.
After block added the chain we use rollover_block to update our DB with new data.
In case when new block(that we got from a different node) create Split Brain issue we use rollback_block to rollback the chain to the previous block and merge the new longest chain(code do not support multi level split brain, as it almost impossible in the real world)
There also some tests to verify that blockchain code works normaly.
Now we need to make this tun concurrently
Creating full-node with FastApi
I used FastApi as it fast, simple, use asyncio and can build OpenApi schema and debug UI from code(which is awesome).
To wrap some additional functionality around blockchain code i added some API layer between node and blockchain.
Here we have 3 async tasks: mine, broadcast and sync_data.
We should run mining as a separate process, but this will add more code, so right now it just running in the same thread, which is ok for the test. Broadcast is used to spread data across known nodes. And Data Sync getting blockchain from the node on start or if get outperforming block.
Mining are running without any stop, block after block, if we dont add any Txs then it will have only coinbase transaction.
If we get a new block before mining of the block ends we stop mining and proceed with mining from the new block.
All duplicates Tx, Blocks removed and not broadcasted to the network.