A hash function is a series of mathematical steps or algorithms that you

can perform on some input data, resulting in a fingerprint, or digest, or

simply, a hash. There are basic hash functions (not used in blockchains)

and cryptographic hash functions (used in blockchains).

We’ll need to understand basic hash functions before moving to

cryptographic hash functions.

Basic Hash Function

A really basic hash function might be ‘Use the first character of the input’.

So using this function you’d get: Hash(‘What time is it?’) => ‘W’

The input to this function is ‘What time is it?’ and is sometimes called the

preimage or the message.

The output of this function is ‘W’ and is called the digest, the hash value,

or simply the hash.

Hash functions are deterministic because the output is determined by the

input. If a function is deterministic, it always produces the same output

for any given input. All mathematical functions are deterministic (adding,

multiplying, dividing, etc).
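As a minimal sketch, the ‘first character’ function and its determinism could be written in Python as follows. The name first_char_hash is made up for this illustration, and the empty-message check is an assumption, since the text does not say what should happen with empty input.

```python
def first_char_hash(message: str) -> str:
    """Toy hash: return the first character of the message (the preimage)."""
    if not message:
        # Assumption: the toy function is simply undefined for empty input.
        raise ValueError("this toy hash is undefined for an empty message")
    return message[0]


digest = first_char_hash("What time is it?")
print(digest)  # prints: W

# Deterministic: the same input gives the same output every time.
print(first_char_hash("What time is it?") == digest)  # prints: True
```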

Cryptographic Hash Functions

A cryptographic hash function is special and has some characteristics that make it useful in cryptography and for cryptocurrencies, as we will see later.

Wikipedia[70] states that the ideal cryptographic hash function has five main properties (my comments in parentheses):

1. It is deterministic, so the same message always results in the same hash.

2. It is quick to compute the hash value for any given message (you can easily go ‘forwards’).

3. It is not feasible to generate a message from its hash value except by trying all possible messages (you can’t go ‘backwards’).

4. A small change to a message should change the hash value so extensively that the new hash value appears uncorrelated with the old hash value (a small change makes a big difference).

5. It is not feasible to find two different messages with the same hash value (it is hard to create a hash clash).

What does this mean? The combination of properties 2 (you can easily go ‘forwards’) and 3 (you can’t go ‘backwards’) means that cryptographic hash functions are sometimes loosely called ‘trapdoor functions’, although the more precise term is ‘one-way functions’. It is easy to create a hash from a message, but you can’t re-create the input from the hash. Nor can you guess or infer what the message may be by looking at the hash (property 4). The only way to go backwards is to try every possible combination of inputs and see if the hash value matches the one you are trying to reverse. This is called a brute force attack.
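To make the brute force idea concrete, here is a small sketch in Python. It uses SHA-256 from the standard hashlib module, and it assumes, purely for illustration, that the attacker already knows the secret message is exactly three lowercase letters. The helper names sha256_hex and brute_force are made up for this example.

```python
import hashlib
from itertools import product
from string import ascii_lowercase
from typing import Optional


def sha256_hex(message: str) -> str:
    """Going 'forwards': one cheap call turns a message into its digest."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


def brute_force(target_digest: str, length: int) -> Optional[str]:
    """Going 'backwards': try every lowercase string of the given length."""
    for letters in product(ascii_lowercase, repeat=length):
        candidate = "".join(letters)
        if sha256_hex(candidate) == target_digest:
            return candidate
    return None  # nothing in the search space matched


secret = "cat"
target = sha256_hex(secret)
recovered = brute_force(target, 3)  # at most 26**3 = 17,576 guesses
print(recovered)  # prints: cat
```

With only three lowercase letters the search finishes almost instantly. Each extra character multiplies the number of guesses by 26, which is why the same approach quickly becomes impractical for longer or less predictable inputs.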

So would our previous hash function (‘Use the first character’) be a good cryptographic hash function? Let’s see:

1. Yes, it is deterministic. ‘What time is it?’ always hashes to ‘W’.

2. Yes, it is quick to compute the output, you simply take the first

character.

3. Yes, by knowing only ‘W’ it is not feasible to guess the original

sentence (but see 5).

4. No, a small change in the message doesn’t necessarily change the

output. ‘What time is at?’ also hashes down to ‘W’.

5. No, we can easily create loads of inputs that will all hash down to the

same output. Anything starting with ‘W’ will work.

So our earlier hash function is no good as a cryptographic hash function.
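The two failures can be checked in a few lines of Python. This is only a sketch: first_char_hash is a made-up name for the toy function, and the third message is an arbitrary example not taken from the text.

```python
def first_char_hash(message: str) -> str:
    """Toy hash: return the first character of the message."""
    return message[0]


original = "What time is it?"
tweaked = "What time is at?"      # small change to the message (property 4)
unrelated = "Where is the exit?"  # a completely different message (property 5)

# Property 4 fails: the small change does not alter the hash at all.
print(first_char_hash(original) == first_char_hash(tweaked))    # prints: True

# Property 5 fails: finding a clash only takes another message starting with 'W'.
print(first_char_hash(original) == first_char_hash(unrelated))  # prints: True
```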

So what is a good cryptographic hash function? There are some established industry standard cryptographic hash functions that were designed to meet all of these criteria. They have names like MD5[71] (Message Digest) or SHA-256 (Secure Hash Algorithm), and they have an additional benefit in that their output is of a fixed length: 128 bits for MD5 and 256 bits for SHA-256. This means that whatever you use as an input to the hash function, whether it is a sentence, a file, a hard drive, or an entire data centre, you will always get a short digest back. (Practical collisions have since been found for MD5, so it no longer satisfies property 5 and appears here for illustration only.)

Here is the kind of output you get:

You can even try this on your computer. If you have a Mac, run the Terminal application and type:

md5 -s "What time is it?"

or

echo "What time is it?" | shasum -a 256

You will see that your results are the same as mine. Of course, that is the whole point of a cryptographic hash: it is deterministic.
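If you are not on a Mac, roughly the same experiment can be run with Python’s standard hashlib module. This is a sketch under two assumptions about the commands above: that md5 -s hashes exactly the characters of the string, and that echo appends a newline, so the shasum command hashes the message plus a trailing newline.

```python
import hashlib

message = "What time is it?"

# Counterpart of: md5 -s "What time is it?"  (assumed to hash the bare string)
md5_digest = hashlib.md5(message.encode("utf-8")).hexdigest()

# Counterpart of: echo "What time is it?" | shasum -a 256
# echo adds a trailing newline, so the newline is part of the hashed input.
sha256_digest = hashlib.sha256((message + "\n").encode("utf-8")).hexdigest()

print(len(md5_digest), md5_digest)        # 32 hex characters = 128 bits
print(len(sha256_digest), sha256_digest)  # 64 hex characters = 256 bits

# Deterministic: hashing the same input again gives the same digest.
again = hashlib.sha256((message + "\n").encode("utf-8")).hexdigest()
print(again == sha256_digest)  # prints: True

# Fixed length: a million-byte input still produces a 64-character digest.
big_digest = hashlib.sha256(b"x" * 1_000_000).hexdigest()
print(len(big_digest))  # prints: 64
```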

If you change the input slightly, you get a very different result:
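One way to see this for yourself is to hash two nearly identical messages and count how many bits of the two digests differ. The sketch below uses SHA-256 via Python’s hashlib; the helper name sha256_bits is made up for this example.

```python
import hashlib


def sha256_bits(message: str) -> int:
    """Return the SHA-256 digest of the message as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(message.encode("utf-8")).digest(), "big")


original = "What time is it?"
tweaked = "What time is at?"  # a single character changed

# XOR leaves a 1 in every bit position where the two digests differ.
differing_bits = bin(sha256_bits(original) ^ sha256_bits(tweaked)).count("1")

print(hashlib.sha256(original.encode("utf-8")).hexdigest())
print(hashlib.sha256(tweaked.encode("utf-8")).hexdigest())
print(f"{differing_bits} of 256 bits differ")
```

For a well-behaved cryptographic hash you would expect roughly half of the 256 bits to differ, which is what ‘appears uncorrelated’ looks like in practice.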

Hash functions can be used for proving that two things are the same

without revealing the two things. For example, let’s say that you want to

make a prediction and don’t want others to know the prediction, but you

want to be able to reveal the prediction later. You’d write the prediction

down privately, hash it, and display the hash to your audience. People can

see that you’ve committed to a prediction but can’t back-calculate what

your prediction is. Later, you can reveal the prediction, and others can

calculate the hash and see that it matches the hash you published.
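The prediction scheme can be sketched in Python as follows. One detail here is an assumption that goes beyond the text: a random value (a nonce) is mixed in with the prediction before hashing, because a short or guessable prediction could otherwise be recovered by brute force. The function names commit and verify, and the example predictions, are made up for this illustration.

```python
import hashlib
import secrets
from typing import Tuple


def commit(prediction: str) -> Tuple[str, str]:
    """Return (commitment, nonce): publish the commitment, keep the nonce private."""
    nonce = secrets.token_hex(16)  # assumption: random nonce, not part of the text
    commitment = hashlib.sha256(f"{nonce}:{prediction}".encode("utf-8")).hexdigest()
    return commitment, nonce


def verify(commitment: str, nonce: str, prediction: str) -> bool:
    """Recompute the hash from the revealed values and compare it."""
    recomputed = hashlib.sha256(f"{nonce}:{prediction}".encode("utf-8")).hexdigest()
    return secrets.compare_digest(recomputed, commitment)


# Step 1: commit privately and show only the hash to the audience.
commitment, nonce = commit("Team A wins 2-1")
print(commitment)

# Step 2: later, reveal the prediction and nonce so anyone can check them.
print(verify(commitment, nonce, "Team A wins 2-1"))  # prints: True
print(verify(commitment, nonce, "Team B wins 2-1"))  # prints: False
```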

Cryptographic hashes, the output from cryptographic hash functions, are used in Bitcoin in a number of places:

• In the mining process

• As identifiers for transactions

• As identifiers for blocks, in order to link them in a chain

• Ensuring that data tampering is immediately evident
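The last two points can be illustrated with a toy chain in Python. This is a deliberately simplified sketch and not Bitcoin’s actual block format: the block layout, the ‘|’ separator, the all-zero starting value, and the names block_hash, build_chain, and is_valid are all assumptions made for the example.

```python
import hashlib
from typing import List, Tuple

Block = Tuple[str, str, str]  # (previous hash, data, this block's hash)
GENESIS_PREV = "0" * 64       # placeholder 'previous hash' for the first block


def block_hash(prev_hash: str, data: str) -> str:
    """A block's identifier covers its data and the previous block's hash."""
    return hashlib.sha256(f"{prev_hash}|{data}".encode("utf-8")).hexdigest()


def build_chain(items: List[str]) -> List[Block]:
    chain: List[Block] = []
    prev = GENESIS_PREV
    for data in items:
        current = block_hash(prev, data)
        chain.append((prev, data, current))
        prev = current  # the next block will point back at this one
    return chain


def is_valid(chain: List[Block]) -> bool:
    """Recompute every hash and check that each block links to the one before."""
    prev = GENESIS_PREV
    for stored_prev, data, stored_hash in chain:
        if stored_prev != prev or block_hash(stored_prev, data) != stored_hash:
            return False
        prev = stored_hash
    return True


chain = build_chain(["Alice pays Bob 1", "Bob pays Carol 2", "Carol pays Dan 3"])
print(is_valid(chain))  # prints: True

# Tamper with the first block's data but leave its stored hash alone.
prev, _, stored = chain[0]
chain[0] = (prev, "Alice pays Bob 100", stored)
print(is_valid(chain))  # prints: False
```

Because each identifier depends on the previous one, hiding the change would require recomputing the hash of the altered block and of every block after it.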