Bitcoin Parser: Understanding Raw Transactions
Learn how to parse raw Bitcoin transactions manually — from hex to human-readable
📖 What is a Raw Bitcoin Transaction?
A raw Bitcoin transaction is the actual data that gets broadcast to the network and stored on the blockchain. It's not human-readable. It's a string of hexadecimal digits that encodes:
- Version: Which transaction format is being used
- Inputs: Which previous outputs are being spent
- Outputs: Where the bitcoins are being sent
- Locktime: The earliest time or block height at which the transaction can be added to a block
When you use a wallet, it hides all this complexity. But understanding the raw structure helps you:
- Debug transaction issues
- Understand how Bitcoin actually works
- Build custom transaction tools
- Verify what your wallet is doing
🏗️ Transaction Structure Overview
Bitcoin uses little-endian for most integer fields. That means the least significant byte is stored first, so the bytes appear in reverse order compared to how the number is normally written. For example, 01000000 in little-endian means 1.
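The byte-order rule can be checked with Python's built-in int.from_bytes. This is a minimal sketch; the helper name le_hex_to_int is illustrative and not part of the parser in this article.

```python
def le_hex_to_int(hex_str):
    """Decode a hex string that holds a little-endian unsigned integer."""
    return int.from_bytes(bytes.fromhex(hex_str), "little")


# A 4-byte version field: the first byte is the least significant one
print(le_hex_to_int("01000000"))  # 1
print(le_hex_to_int("02000000"))  # 2

# Reading the same bytes as big-endian gives a very different number
print(int.from_bytes(bytes.fromhex("01000000"), "big"))  # 16777216
```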
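- Version (4 bytes): Currently 1 or 2. Determines which rules apply.
- Input Count (varint): Encoded as a variable-length integer (varint).
- Previous TX Hash (32 bytes): The UTXO being spent comes from this transaction.
- Output Index (4 bytes): 0-indexed position of the UTXO.
- ScriptSig Length (varint): Variable-length integer.
- ScriptSig (variable): Contains DER signature + sighash type + public key.
- Sequence (4 bytes): Used for replace-by-fee (RBF) and locktime. ffffffff = disabled.
- Output Count (varint): Number of outputs, which say where the bitcoins are being sent.
- Amount (8 bytes): Value in satoshis. 1 BTC = 100,000,000 satoshis.
- ScriptPubKey Length (varint): Variable-length integer.
- ScriptPubKey (variable): 76a914{20-byte hash}88ac = standard P2PKH script (legacy Bitcoin address).
- Locktime (4 bytes): 0 = no locktime. Can be a block height or UNIX timestamp.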
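As a rough illustration of how the sequence field relates to replace-by-fee and locktime, the sketch below applies the usual thresholds: ffffffff is final, anything lower lets locktime take effect, and anything below fffffffe signals opt-in RBF under BIP 125. The function name describe_sequence and the returned keys are assumptions made for this example.

```python
SEQUENCE_FINAL = 0xFFFFFFFF


def describe_sequence(sequence):
    """Summarize what a single input's sequence number signals."""
    return {
        # ffffffff: input is final, neither RBF nor locktime is signalled
        "final": sequence == SEQUENCE_FINAL,
        # Locktime is only enforced if at least one input is not final
        "locktime_enabled": sequence < SEQUENCE_FINAL,
        # BIP 125: a value below fffffffe opts in to replace-by-fee
        "signals_rbf": sequence < SEQUENCE_FINAL - 1,
    }


for seq in (0xFFFFFFFF, 0xFFFFFFFE, 0xFFFFFFFD):
    print(hex(seq), describe_sequence(seq))
```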
🐍 Bitcoin Transaction Parser (Python)
This parser reads a raw transaction hex string and breaks it down into its components:
def parse_varint(data_bytes, offset):
    """
    Parse a variable-length integer (varint) used in Bitcoin transactions.

    Bitcoin uses a compact encoding for integers:
    - If value < 0xFD (253): stored as a single byte
    - If value <= 0xFFFF: stored as 0xFD followed by 2 bytes
    - If value <= 0xFFFFFFFF: stored as 0xFE followed by 4 bytes
    - Otherwise: stored as 0xFF followed by 8 bytes

    Returns:
        (value, bytes_consumed)
    """
    fbyte = data_bytes[offset]
    if fbyte < 0xFD:
        return (fbyte, 1)
    elif fbyte == 0xFD:
        value = int.from_bytes(data_bytes[offset+1:offset+3], 'little')
        return (value, 3)
    elif fbyte == 0xFE:
        value = int.from_bytes(data_bytes[offset+1:offset+5], 'little')
        return (value, 5)
    else:
        value = int.from_bytes(data_bytes[offset+1:offset+9], 'little')
        return (value, 9)
def parse_tx(tx):
    """
    Parse a raw Bitcoin transaction hex string into a structured dictionary.

    Only the legacy (non-SegWit) serialization is handled.

    Transaction structure:
    - Version (4 bytes, little-endian)
    - Input Count (varint)
    - For each input:
        - Previous TX Hash (32 bytes, reversed for display)
        - Output Index (4 bytes, little-endian)
        - ScriptSig Length (varint)
        - ScriptSig (variable bytes)
        - Sequence (4 bytes, little-endian)
    - Output Count (varint)
    - For each output:
        - Amount (8 bytes, little-endian, satoshis)
        - ScriptPubKey Length (varint)
        - ScriptPubKey (variable bytes)
    - Locktime (4 bytes, little-endian)
    """
    hbytes = bytes.fromhex(tx)
    offset = 0

    # Version (4 bytes, little-endian)
    version = int.from_bytes(hbytes[offset:offset+4], 'little')
    offset += 4

    # Input Count (varint)
    input_count, vbytes = parse_varint(hbytes, offset)
    offset += vbytes

    inputs = []
    for _ in range(input_count):
        # Previous Transaction Hash (32 bytes, reversed for display)
        prev_tx_hash = hbytes[offset:offset+32][::-1].hex()
        offset += 32

        # Output Index (4 bytes, little-endian)
        output_index = int.from_bytes(hbytes[offset:offset+4], 'little')
        offset += 4

        # ScriptSig Length (varint)
        scriptSig_len, vbytes = parse_varint(hbytes, offset)
        offset += vbytes
        script_bytes = hbytes[offset:offset+scriptSig_len]

        # Parse the ScriptSig (signature + public key).
        # This assumes the P2PKH layout; other script types (P2SH, coinbase)
        # are not decoded correctly by this simple parser.
        der_sign, sighash_type, pub_key = None, None, None
        if scriptSig_len > 0:
            # A push opcode (e.g. 0x47) normally precedes the DER signature,
            # which itself starts with 0x30. Skip the push opcode if present.
            pos = 0 if script_bytes[0] == 0x30 else 1
            der_len = script_bytes[pos+1]
            der_sign = script_bytes[pos:pos+der_len+2]
            pos += der_len + 2

            # Sighash type (1 byte: 0x01 = SIGHASH_ALL)
            sighash_type = script_bytes[pos]
            pos += 1

            # Public key length and value
            pubkey_len = script_bytes[pos]
            pos += 1
            pub_key = script_bytes[pos:pos+pubkey_len]
        offset += scriptSig_len

        # Sequence number (4 bytes, little-endian)
        sequence = int.from_bytes(hbytes[offset:offset+4], 'little')
        offset += 4

        inputs.append({
            "prev_hash": prev_tx_hash,
            "index": output_index,
            "scriptSig_len": scriptSig_len,
            "der_sign": der_sign.hex() if der_sign else None,
            "sighash_type": sighash_type,
            "pub_key": pub_key.hex() if pub_key else None,
            "sequence": sequence
        })

    # Output Count (varint)
    output_count, vbytes = parse_varint(hbytes, offset)
    offset += vbytes

    outputs = []
    for _ in range(output_count):
        # Amount (8 bytes, little-endian, in satoshis)
        amount = int.from_bytes(hbytes[offset:offset+8], 'little')
        offset += 8

        # ScriptPubKey Length (varint)
        script_pubkey_len, vbytes = parse_varint(hbytes, offset)
        offset += vbytes

        # ScriptPubKey (locking script)
        script_pubkey = hbytes[offset:offset+script_pubkey_len].hex()
        offset += script_pubkey_len

        outputs.append({
            "amount_satoshis": amount,
            "amount_btc": amount / 100000000,
            "script_pubkey_len": script_pubkey_len,
            "script_pubkey": script_pubkey
        })

    # Locktime (4 bytes, little-endian)
    locktime = int.from_bytes(hbytes[offset:offset+4], 'little')

    return {
        "version": version,
        "input_count": input_count,
        "inputs": inputs,
        "output_count": output_count,
        "outputs": outputs,
        "locktime": locktime
    }
# Example usage
if __name__ == "__main__":
    # Raw transaction hex (simplified for demonstration)
    tx_hex = "0100000001a530bdca8a35b98eb5a62c196191b9782cb119a19c423f765c32a7d33877f8dd000000008a47304402200102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f2002202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f400141040102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f40ffffffff0140420f00000000001976a914000000000000000000000000000000000000000088ac00000000"
    result = parse_tx(tx_hex)

    print("=" * 60)
    print("BITCOIN TRANSACTION PARSER")
    print("=" * 60)
    print(f"Version: {result['version']}")
    print(f"Input Count: {result['input_count']}")
    print("-" * 60)
    for i, inp in enumerate(result['inputs']):
        print(f"INPUT #{i+1}:")
        print(f" Previous TX Hash: {inp['prev_hash']}")
        print(f" Output Index: {inp['index']}")
        print(f" ScriptSig Length: {inp['scriptSig_len']}")
        if inp['der_sign']:
            print(f" DER Signature: {inp['der_sign'][:64]}...")
            # Only label the type as SIGHASH_ALL when the byte really is 0x01
            sighash_name = "SIGHASH_ALL" if inp['sighash_type'] == 0x01 else "other"
            print(f" Sighash Type: {inp['sighash_type']} ({sighash_name})")
        if inp['pub_key']:
            print(f" Public Key: {inp['pub_key'][:64]}...")
        print(f" Sequence: {inp['sequence']}")
        print("-" * 60)
    print(f"Output Count: {result['output_count']}")
    for i, out in enumerate(result['outputs']):
        print(f"OUTPUT #{i+1}:")
        print(f" Amount: {out['amount_btc']} BTC ({out['amount_satoshis']} satoshis)")
        print(f" ScriptPubKey Len: {out['script_pubkey_len']}")
        print(f" ScriptPubKey: {out['script_pubkey']}")
        print("-" * 60)
    print(f"Locktime: {result['locktime']}")
    print("=" * 60)
🔍 Understanding Each Field
Let's break down every component of a raw Bitcoin transaction. Each field has a specific purpose and follows strict rules.
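Version
1 or 2, stored as 4 little-endian bytes: 01000000 = 1, 02000000 = 2.
Input Count
A varint, for example 01 (1 input) or 02 (2 inputs).
Previous TX Hash
The ID of the transaction whose output is being spent: SHA256(SHA256(previous_tx_data)). Inside the raw transaction the 32 bytes appear in reverse order compared to how explorers display them.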
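The double-SHA256 step can be sketched with Python's hashlib. The helper name txid_from_raw is an assumption for this example, and the sample bytes are a 10-byte placeholder (version 1, no inputs, no outputs, locktime 0), not a real transaction. For SegWit transactions the ID is computed over the serialization without witness data, which this sketch does not strip.

```python
import hashlib


def txid_from_raw(raw_bytes):
    """Hash the serialized transaction twice with SHA-256 and reverse
    the digest, which is the byte order block explorers display."""
    first = hashlib.sha256(raw_bytes).digest()
    second = hashlib.sha256(first).digest()
    return second[::-1].hex()


# Placeholder bytes only: version, input count 0, output count 0, locktime
sample = bytes.fromhex("01000000" + "00" + "00" + "00000000")
print(txid_from_raw(sample))
```

The reversed hex string is what goes into the "Previous TX Hash" lookup on an explorer; the unreversed digest is what appears inside a spending transaction's input.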
ScriptSig Length
Native SegWit inputs have scriptSig_len = 0. The signature moves to the witness field (not parsed here).
DER Signature
Starts with 0x30. Layout: 0x30 | length | 0x02 | r-length | r | 0x02 | s-length | s
Sighash Type
Usually SIGHASH_ALL (0x01). It means "I agree to this exact transaction, no changes allowed."
Public Key
- Compressed (33 bytes): 02 or 03 (even/odd y-coordinate), followed by the 32-byte x-coordinate. More efficient.
- Uncompressed (65 bytes): 04, followed by the 32-byte x and 32-byte y coordinates. Older format.
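The DER layout 0x30 | length | 0x02 | r-length | r | 0x02 | s-length | s can be walked byte by byte. The sketch below is a simplified reader, assuming single-byte lengths (true for Bitcoin-sized signatures) and doing no strict-encoding checks; parse_der_signature is a name chosen for this example. The sample signature uses the same made-up r and s bytes (01..20 and 21..40) as the demonstration transaction.

```python
def parse_der_signature(sig):
    """Split a DER-encoded ECDSA signature into its integers r and s."""
    if sig[0] != 0x30:
        raise ValueError("DER signature must start with 0x30")
    if sig[1] != len(sig) - 2:
        raise ValueError("length byte does not match signature size")
    pos = 2
    values = []
    for _ in range(2):  # first r, then s
        if sig[pos] != 0x02:
            raise ValueError("expected INTEGER marker 0x02")
        length = sig[pos + 1]
        # Unlike most transaction fields, DER integers are big-endian
        values.append(int.from_bytes(sig[pos + 2:pos + 2 + length], "big"))
        pos += 2 + length
    return values[0], values[1]


r_bytes = bytes(range(1, 33))   # 01 02 ... 20
s_bytes = bytes(range(33, 65))  # 21 22 ... 40
sig = bytes([0x30, 0x44, 0x02, 0x20]) + r_bytes + bytes([0x02, 0x20]) + s_bytes
r, s = parse_der_signature(sig)
print(hex(r))
print(hex(s))
```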
Amount
- Little-endian bytes: 00e1f50500000000
- Reverse bytes: 0000000005f5e100
- Hexadecimal value: 0x05F5E100
- Decimal value: 100,000,000 satoshis = 1 BTC
More examples:
- 00e1f50500000000 = 100,000,000 sats (1 BTC)
- 00ca9a3b00000000 = 1,000,000,000 sats (10 BTC)
- 0065cd1d00000000 = 500,000,000 sats (5 BTC)
- 0094357700000000 = 2,000,000,000 sats (20 BTC; 0x77359400 in hex, fits in 8 bytes)
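The reverse-then-read arithmetic can be reproduced in a few lines. This is a small sketch, with decode_amount as an illustrative helper name; it uses Decimal for the BTC value so that no floating-point rounding creeps in.

```python
from decimal import Decimal

SATS_PER_BTC = 100_000_000


def decode_amount(hex_str):
    """Return (satoshis, BTC) for an 8-byte little-endian amount field."""
    sats = int.from_bytes(bytes.fromhex(hex_str), "little")
    return sats, Decimal(sats) / SATS_PER_BTC


for raw in ("00e1f50500000000", "00ca9a3b00000000",
            "0065cd1d00000000", "0094357700000000"):
    sats, btc = decode_amount(raw)
    print(f"{raw} = {sats:,} sats = {btc} BTC")
```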
ScriptPubKey
- P2PKH: 76a914{20-byte pubkey hash}88ac. "Pay to Public Key Hash". Most common. Requires a signature + public key.
- P2SH: a914{20-byte script hash}87. "Pay to Script Hash". Used for multisignature wallets.
- P2WPKH: 0014{20-byte pubkey hash}. Native SegWit. Lower fees.
- OP_RETURN: 6a{data}. Provably unspendable. Used to store data on the blockchain.
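Locktime
0000e0ff = 4,292,870,144 (little-endian bytes of 0xFFE00000). Values of 500,000,000 or more are read as UNIX timestamps and smaller values as block heights, so this one is a timestamp far in the future.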
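Whether a locktime is a block height or a timestamp depends on a fixed threshold of 500,000,000. The sketch below applies that rule to a raw 4-byte field; describe_locktime is a hypothetical helper name, and the timestamp is rendered in UTC.

```python
from datetime import datetime, timedelta, timezone

LOCKTIME_THRESHOLD = 500_000_000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def describe_locktime(locktime):
    """Interpret a decoded locktime value."""
    if locktime == 0:
        return "no locktime"
    if locktime < LOCKTIME_THRESHOLD:
        return f"block height {locktime}"
    # Adding a timedelta avoids platform limits on large timestamps
    when = EPOCH + timedelta(seconds=locktime)
    return f"timestamp {when.isoformat()}"


for raw in ("00000000", "00350c00", "0000e0ff"):
    value = int.from_bytes(bytes.fromhex(raw), "little")
    print(raw, value, describe_locktime(value))
```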
📊 Quick Reference Table
| Field | Size | Purpose | Typical Value |
|---|---|---|---|
| Version | 4 bytes | Transaction format | 01000000 (1) |
| Input Count | varint | Number of inputs | 01 |
| Previous TX Hash | 32 bytes | Points to previous UTXO | a530bdca... |
| Output Index | 4 bytes | Which output to spend | 00000000 |
| ScriptSig Length | varint | Unlocking script size | 8a (138) |
| DER Signature | ~70-73B | ECDSA proof of ownership | 30440220... |
| Sighash Type | 1 byte | What is signed | 01 (ALL) |
| Public Key | 33/65B | Your public key | 04010203... |
| Sequence | 4 bytes | Replacement/locktime | ffffffff |
| Output Count | varint | Number of outputs | 01 |
| Amount | 8 bytes | Value in satoshis | 40420f0000000000 |
| ScriptPubKey Length | varint | Locking script size | 19 (25) |
| ScriptPubKey | variable | Spending conditions | 76a914...88ac |
| Locktime | 4 bytes | Earliest block height or time for inclusion | 00000000 |
📏 Variable-Length Integers (Varint)
Bitcoin uses a compact encoding for integers to save space. This is called varint or compact size uint.
| Value Range | Encoding | Bytes Used |
|---|---|---|
| 0 to 252 | As a single byte | 1 |
| 253 to 65,535 | 0xFD + 2 bytes little-endian | 3 |
| 65,536 to 4,294,967,295 | 0xFE + 4 bytes little-endian | 5 |
| 4,294,967,296 to 2^64-1 | 0xFF + 8 bytes little-endian | 9 |
Most transactions have 1-2 inputs and outputs, so using 1 byte for counts saves space. Larger counts use more bytes only when necessary.
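The table can be turned into an encoder that mirrors parse_varint. This is a sketch of the inverse operation; encode_varint is a name introduced here and is not used by the parser in this article.

```python
def encode_varint(value):
    """Encode a non-negative integer using Bitcoin's varint rules."""
    if value < 0 or value > 0xFFFFFFFFFFFFFFFF:
        raise ValueError("varint must fit in an unsigned 64-bit integer")
    if value < 0xFD:
        return value.to_bytes(1, "little")
    if value <= 0xFFFF:
        return b"\xfd" + value.to_bytes(2, "little")
    if value <= 0xFFFFFFFF:
        return b"\xfe" + value.to_bytes(4, "little")
    return b"\xff" + value.to_bytes(8, "little")


for n in (1, 252, 253, 65535, 65536, 2**32):
    print(n, encode_varint(n).hex())
```

The boundary values 252, 253, 65,535 and 65,536 are the ones worth checking by hand, because that is where the encoding switches width.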
💻 Running the Parser
# Save the code as bitcoin_parser.py, then run it
python bitcoin_parser.py
# Or run it directly in Python
python -c "exec(open('bitcoin_parser.py').read())"
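Replace the tx_hex variable with a raw legacy (non-SegWit) transaction from a block explorer. SegWit serialization adds marker, flag, and witness fields that this parser does not read.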
- Go to Mempool.space
- Click on any transaction
- Look for "Raw Transaction" or "Hex"
- Copy the hex and paste it into the code
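Hex copied from an explorer is often in SegWit serialization, which places a marker byte 00 and a flag byte 01 right after the 4-byte version (BIP 144). A quick check like the sketch below can tell the two formats apart before the hex is handed to the parser; uses_segwit_serialization is a hypothetical helper name.

```python
def uses_segwit_serialization(tx_hex):
    """Return True if the bytes after the version are the SegWit
    marker (0x00) and flag (0x01) rather than an input count."""
    raw = bytes.fromhex(tx_hex)
    return len(raw) > 5 and raw[4] == 0x00 and raw[5] == 0x01


# Legacy layout: version, then an input count of 1
print(uses_segwit_serialization("0100000001" + "a5"))       # False
# SegWit layout: version, marker 00, flag 01, then the input count
print(uses_segwit_serialization("02000000" + "0001" + "01"))  # True
```

A legacy transaction always has at least one input, so a zero byte in the input-count position is a reliable sign of the marker.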
📜 Understanding Scripts
ScriptSig (Unlocking Script)
Contains the signature and public key that "unlock" the UTXO. Format for P2PKH: <push><DER signature + sighash byte><push><public key>, where each push is a single length byte.
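In a P2PKH scriptSig each element is preceded by a one-byte push opcode giving its length (values 1 to 75 push that many bytes). The sketch below splits such a script on those push bytes; split_p2pkh_script_sig is an illustrative name, the sample bytes are made up to match the shape of the demonstration transaction, and other script types are not handled.

```python
def split_p2pkh_script_sig(script_sig):
    """Split <push sig+sighash> <push pubkey> into its three parts."""
    sig_push = script_sig[0]
    if not 1 <= sig_push <= 75:
        raise ValueError("expected a direct push opcode (1 to 75 bytes)")
    sig_and_type = script_sig[1:1 + sig_push]
    key_pos = 1 + sig_push
    key_push = script_sig[key_pos]
    pub_key = script_sig[key_pos + 1:key_pos + 1 + key_push]
    # The last byte of the first push is the sighash type
    return sig_and_type[:-1], sig_and_type[-1], pub_key


# Made-up sample: 70-byte DER signature, SIGHASH_ALL, 65-byte public key
der_sig = (bytes([0x30, 0x44, 0x02, 0x20]) + bytes(range(1, 33))
           + bytes([0x02, 0x20]) + bytes(range(33, 65)))
pub_key = b"\x04" + bytes(range(1, 65))
script_sig = (bytes([len(der_sig) + 1]) + der_sig + b"\x01"
              + bytes([len(pub_key)]) + pub_key)

sig, sighash, key = split_p2pkh_script_sig(script_sig)
print(len(script_sig), len(sig), hex(sighash), len(key))
```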
ScriptPubKey (Locking Script)
Contains the conditions for spending. Common types:
| Type | Script | Meaning |
|---|---|---|
| P2PKH | 76a914{20-byte hash}88ac | Pay to Public Key Hash (standard address) |
| P2SH | a914{20-byte hash}87 | Pay to Script Hash (multisig, etc.) |
| P2WPKH | 0014{20-byte hash} | Pay to Witness Public Key Hash (SegWit) |
| OP_RETURN | 6a{data} | Provably unspendable (store data) |
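The patterns in the table can be matched directly against the script bytes. This is a minimal sketch covering only the four types listed; classify_script_pubkey is a name made up for this example, and anything else is reported as "unknown".

```python
def classify_script_pubkey(script_hex):
    """Name the locking-script type using the byte patterns above."""
    script = bytes.fromhex(script_hex)
    if (len(script) == 25 and script[:3] == b"\x76\xa9\x14"
            and script[23:] == b"\x88\xac"):
        return "P2PKH"
    if len(script) == 23 and script[:2] == b"\xa9\x14" and script[22:] == b"\x87":
        return "P2SH"
    if len(script) == 22 and script[:2] == b"\x00\x14":
        return "P2WPKH"
    if script[:1] == b"\x6a":
        return "OP_RETURN"
    return "unknown"


print(classify_script_pubkey("76a914" + "00" * 20 + "88ac"))  # P2PKH
print(classify_script_pubkey("a914" + "11" * 20 + "87"))      # P2SH
print(classify_script_pubkey("0014" + "22" * 20))             # P2WPKH
print(classify_script_pubkey("6a0568656c6c6f"))               # OP_RETURN
```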