A hash function is a series of mathematical steps (an algorithm) that you
perform on some input data to produce a fingerprint, also called a digest
or simply a hash. There are basic hash functions (not used in blockchains)
and cryptographic hash functions (used in blockchains).
We’ll need to understand basic hash functions before moving to
cryptographic hash functions.
Basic Hash Function
A really basic hash function might be ‘Use the first character of the input’.
So using this function you’d get: Hash(‘What time is it?’) => ‘W’
The input to this function is ‘What time is it?’ and is sometimes called the
preimage or the message.
The output of this function is ‘W’ and is called the digest, the hash value,
or simply the hash.
Hash functions are deterministic because the output is determined by the
input. If a function is deterministic, it always produces the same output
for any given input. All mathematical functions are deterministic (adding,
multiplying, dividing, etc).
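The toy function above can be written out in a few lines. This is only a sketch: the name first_char_hash is made up for illustration, and raising an error on empty input is an assumption, since the text does not say what should happen in that case.

```python
def first_char_hash(message: str) -> str:
    """Toy hash from the text: use the first character of the input."""
    if not message:
        # Assumption: the text does not define the hash of an empty message.
        raise ValueError("the toy hash needs at least one character")
    return message[0]


preimage = "What time is it?"       # the input, also called the message
digest = first_char_hash(preimage)  # the output, also called the hash
print(digest)  # W

# Deterministic: hashing the same preimage again gives the same digest.
assert first_char_hash(preimage) == digest
```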
Cryptographic Hash Functions
A cryptographic hash function is special and has some characteristics that
make it useful in cryptography and for cryptocurrencies, as we will see
later. Wikipedia [70] states that the ideal cryptographic hash function has
five main properties (my comments in parentheses):
1. It is deterministic, so the same message always results in the same hash.
2. It is quick to compute the hash value for any given message (you can
easily go ‘forwards’).
3. It is not feasible to generate a message from its hash value except by
trying all possible messages (you can’t go ‘backwards’).
4. A small change to a message should change the hash value so extensively
that the new hash value appears uncorrelated with the old hash value (a
small change makes a big difference).
5. It is not feasible to find two different messages with the same hash
value (it is hard to create a hash clash).
What does this mean? The combination of properties 2 (you can easily go
‘forwards’) and 3 (you can’t go ‘backwards’) means that cryptographic hash
functions are sometimes called ‘trapdoor functions’, although the more
precise term is ‘one-way functions’. It is easy to create a hash from a
message, but you can’t re-create the input from the hash. Nor can you guess
or infer what the message may be by looking at the hash (property 4). The
only way to go backwards is to try every possible combination of inputs and
see if the hash value matches the one you are trying to reverse. This is
called a brute force attack.
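A brute force attack can be sketched as follows. To keep the search small enough to run in a moment, this example assumes the attacker already knows the message is at most three lowercase letters. The helper names sha256_hex and brute_force are hypothetical, and SHA-256 is used simply because Python's standard hashlib module provides it.

```python
import hashlib
import itertools
import string


def sha256_hex(message: str) -> str:
    """Hash a text message with SHA-256 and return the hex digest."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


def brute_force(target_digest: str, alphabet: str, max_length: int):
    """Try every combination of characters up to max_length.

    Returns the first message whose hash matches, or None if none does.
    """
    for length in range(1, max_length + 1):
        for candidate in itertools.product(alphabet, repeat=length):
            guess = "".join(candidate)
            if sha256_hex(guess) == target_digest:
                return guess
    return None


target = sha256_hex("cat")  # pretend we only know this digest
recovered = brute_force(target, string.ascii_lowercase, 3)
print(recovered)  # cat
```

The search covers only 18,278 candidates (26 + 676 + 17,576). Each extra character multiplies the work by the size of the alphabet, which is why going backwards is not feasible for realistic inputs.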
So would our previous hash function (‘Use the first character’) be a good
cryptographic hash function? Let’s see:
1. Yes, it is deterministic. ‘What time is it?’ always hashes to ‘W’.
2. Yes, it is quick to compute the output, you simply take the first
character.
3. Yes, by knowing only ‘W’ it is not feasible to guess the original
sentence (but see 5).
4. No, a small change in the message doesn’t necessarily change the
output. ‘What time is at?’ also hashes down to ‘W’.
5. No, we can easily create loads of inputs that will all hash down to the
same output. Anything starting with ‘W’ will work.
So our earlier hash function is no good as a cryptographic hash function.
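The two failures (properties 4 and 5) are easy to demonstrate. This sketch reuses the toy function under the hypothetical name first_char_hash. The colliding messages are made-up examples.

```python
def first_char_hash(message: str) -> str:
    """Toy hash from the text: use the first character of the input."""
    return message[0]


original = "What time is it?"
tweaked = "What time is at?"

# Property 4 fails: a small change to the message leaves the hash unchanged.
print(first_char_hash(original) == first_char_hash(tweaked))  # True

# Property 5 fails: any message starting with 'W' produces the same hash.
colliding = ["Why?", "Where is it?", "W"]
digests = {first_char_hash(m) for m in colliding + [original]}
print(digests)  # {'W'}
```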
So what is a good cryptographic hash function? There are some
established industry standard cryptographic hash functions designed to
meet all of these criteria. They have names like MD5 [71] (Message Digest,
now considered broken because hash clashes can be created in practice) or
SHA-256 (Secure Hash Algorithm), and they have an additional benefit in that
their output is usually of a fixed length. This means that whatever you use
as an input to the hash function, whether it is a sentence, a file, a hard
drive, or an entire data centre, you will always get a short digest back.
Here is the kind of output you get:
You can even try this on your computer. If you have a Mac, run the
Terminal application and type:
md5 -s "What time is it?"
echo "What time is it?" | shasum -a 256
You will see that your results are the same as mine. Of course, that is
what you would expect from a cryptographic hash: it is deterministic.
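If you are not on a Mac, the same experiment can be run with Python's standard hashlib module. The sketch below hashes three inputs of very different sizes and records only the length of each digest, to show that the output length does not depend on the input. The one-million-byte input is an arbitrary stand-in for ‘a large file’.

```python
import hashlib

inputs = [
    b"",                   # empty input
    b"What time is it?",   # a short sentence
    b"x" * 1_000_000,      # a much larger input (arbitrary stand-in for a file)
]

lengths = []
for data in inputs:
    md5_hex = hashlib.md5(data).hexdigest()
    sha_hex = hashlib.sha256(data).hexdigest()
    lengths.append((len(md5_hex), len(sha_hex)))
    print(len(data), "bytes in ->", len(md5_hex), "and", len(sha_hex), "hex characters out")

# Deterministic: hashing the same bytes twice gives the same digest.
assert hashlib.sha256(inputs[1]).hexdigest() == hashlib.sha256(inputs[1]).hexdigest()

# Note: `echo` appends a newline, so to reproduce the shasum command above
# you would hash b"What time is it?\n" rather than b"What time is it?".
```

MD5 always produces 128 bits (32 hex characters) and SHA-256 always produces 256 bits (64 hex characters).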
If you change the input slightly, you get a very different result:
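One way to see this for yourself is to count how many bits of the digest change when a single letter of the message changes. This is a sketch using SHA-256 from Python's hashlib. The helper names sha256_bytes and differing_bits are made up for illustration.

```python
import hashlib


def sha256_bytes(message: str) -> bytes:
    """Hash a text message with SHA-256 and return the raw 32-byte digest."""
    return hashlib.sha256(message.encode("utf-8")).digest()


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


first = sha256_bytes("What time is it?")
second = sha256_bytes("What time is at?")  # one letter changed

changed = differing_bits(first, second)
print(first.hex())
print(second.hex())
print(f"{changed} of {len(first) * 8} bits differ")
```

For a good cryptographic hash function you would expect roughly half of the 256 bits to flip, which is what ‘appears uncorrelated’ means in practice.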
Hash functions can be used for proving that two things are the same
without revealing the two things. For example, let’s say that you want to
make a prediction and don’t want others to know the prediction, but you
want to be able to reveal the prediction later. You’d write the prediction
down privately, hash it, and display the hash to your audience. People can
see that you’ve committed to a prediction but can’t back-calculate what
your prediction is. Later, you can reveal the prediction, and others can
calculate the hash and see that it matches the hash you published.
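The prediction example can be sketched like this. One detail is an addition not found in the text: a random nonce is mixed in with the prediction and revealed later alongside it. Without it, an audience could simply hash a list of likely predictions and compare. The function names commit and verify, and the sample predictions, are hypothetical.

```python
import hashlib
import secrets


def commit(prediction: str, nonce: str) -> str:
    """Return the hash to publish now; keep prediction and nonce private."""
    return hashlib.sha256(f"{nonce}:{prediction}".encode("utf-8")).hexdigest()


def verify(published_hash: str, prediction: str, nonce: str) -> bool:
    """Recompute the hash from the revealed values and compare."""
    return commit(prediction, nonce) == published_hash


# Commit phase: write the prediction down privately and publish only the hash.
nonce = secrets.token_hex(16)  # assumption: random value to prevent guessing
prediction = "It will rain on Friday"
published_hash = commit(prediction, nonce)

# Reveal phase: anyone can check the revealed prediction against the hash.
print(verify(published_hash, prediction, nonce))                    # True
print(verify(published_hash, "It will be sunny on Friday", nonce))  # False
```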
Cryptographic hashes, the output from cryptographic hash functions, are
used in Bitcoin in a number of places:
• In the mining process
• As identifiers for transactions
• As identifiers for blocks, in order to link them in a chain
• Ensuring that data tampering is immediately evident
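The last two points can be illustrated with a toy chain in which each block stores the hash of the block before it. This is a simplified sketch and not Bitcoin's actual block format. The field names, the "|" separator, the all-zero starting hash and the sample data are all assumptions made for illustration.

```python
import hashlib

GENESIS_PREVIOUS_HASH = "0" * 64  # assumption: placeholder for the first block


def block_hash(previous_hash: str, data: str) -> str:
    """A block's identifier: the hash of its data plus the previous block's hash."""
    return hashlib.sha256(f"{previous_hash}|{data}".encode("utf-8")).hexdigest()


def build_chain(items):
    """Link each item to the one before it by including the previous hash."""
    chain = []
    previous_hash = GENESIS_PREVIOUS_HASH
    for data in items:
        current = block_hash(previous_hash, data)
        chain.append({"previous_hash": previous_hash, "data": data, "hash": current})
        previous_hash = current
    return chain


def is_valid(chain) -> bool:
    """Recompute every hash and check that every link still matches."""
    previous_hash = GENESIS_PREVIOUS_HASH
    for block in chain:
        if block["previous_hash"] != previous_hash:
            return False
        if block_hash(block["previous_hash"], block["data"]) != block["hash"]:
            return False
        previous_hash = block["hash"]
    return True


chain = build_chain(["Alice pays Bob 1", "Bob pays Carol 2", "Carol pays Dave 3"])
print(is_valid(chain))  # True

chain[1]["data"] = "Bob pays Carol 200"  # tamper with an old block
print(is_valid(chain))  # False
```

Changing any old data changes that block's hash, which no longer matches the identifier stored with it, so the tampering shows up as soon as anyone re-checks the chain.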