Hashing is very different from encryption, even though many people believe it is an encryption protocol. Yes, it does scramble the data, so in that respect it looks like encryption, but the big difference is that encryption is designed to be reversed by anyone holding the key, whereas a hash cannot be reversed at all. Now, I’m not the kind of person who believes that something is impossible, but I’ve had to surrender myself to this concept. This is not to say that it cannot be broken, because it can (for example, by guessing inputs until one produces the same hash); it just cannot be reversed.
If we look at a hashing algorithm such as MD5, which produces a 128-bit output, its role and purpose is to take any data you wish and turn it into exactly 128 bits. Now imagine I have a manual of 800 pages and I run it through the MD5 algorithm: the output would be 128 bits. That’s 128 1’s and 0’s. How is it possible to turn 128 1’s and 0’s back into an 800-page manual? 128 bits might be big enough to indicate what language the book was written in, but what font type? Font size? Where is the page numbering? Are there pictures? And so on. There is simply not enough information left in the hash to rebuild the original.
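To make the fixed-size point concrete, here is a minimal Python sketch using the standard `hashlib` module. The helper name `md5_bits` and the sample inputs are my own, chosen only for illustration; the long input is just a stand-in for the 800-page manual.

```python
import hashlib

def md5_bits(data: bytes) -> int:
    """Return the size of the MD5 digest of `data`, in bits."""
    return len(hashlib.md5(data).digest()) * 8

# Two inputs of very different sizes (both are made-up examples).
short_input = b"hello"
long_input = b"A page of the manual. " * 100_000  # roughly 2 MB of text

print(md5_bits(short_input))  # 128
print(md5_bits(long_input))   # 128

# The digest is usually shown as 32 hexadecimal characters (32 * 4 = 128 bits).
print(hashlib.md5(long_input).hexdigest())
```

Whatever goes in, 128 bits come out, which is why there is no way to work backwards from the hash to the manual.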
So if it’s impossible to reverse something that has been hashed, what is it used for? The simple answer is integrity. Integrity lets us prove that the data has not been tampered with or changed in any way and, when a shared secret is included in the hash, that it came from the correct person.
As an example, if I were to send you an e-mail that said “please pay John Doe $100”, and John Doe were to intercept that e-mail and change it to say “please pay John Doe $1000”, I would not be too happy when my account was debited with the wrong amount. So what if, instead of just sending you a clear message, I took the original data “please pay John Doe $100” together with a secret word that only you and I know, like “secretpassword”, and hashed both values together? This would result in a 128-bit hash value (Result1) that I would then attach to the original message “please pay John Doe $100”.
When you receive the e-mail, it will contain the original message “please pay John Doe $100” and ‘Result1’. You take the message as received and the password that only you and I know, “secretpassword”, and hash them together. You end up with a result (Result2). If ‘Result1’ is equal to ‘Result2’, then the message is correct and has not been tampered with. If the same two inputs, “please pay John Doe $100” and “secretpassword”, are used on both sides, the result has to be the same. If the result is not the same, the two inputs used on my side are not the same as the two inputs used on your side. Assuming we both used the same password, the only thing that could have changed is the message, proving that the message has been tampered with, and we can throw it away.
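The Result1 / Result2 comparison can be sketched in Python as follows. One assumption to flag: rather than simply gluing the message and the secret together before hashing, this sketch uses HMAC (Python's standard `hmac` module), which is the usual standardised way of hashing a message together with a secret key. MD5 is used only so the result is 128 bits, as in the example above. The function names `tag_message` and `is_untampered` are made up for illustration.

```python
import hashlib
import hmac

SHARED_SECRET = b"secretpassword"  # known only to the sender and the receiver

def tag_message(message: bytes, secret: bytes) -> str:
    """Hash the message together with the shared secret (HMAC-MD5, 128-bit result)."""
    return hmac.new(secret, message, hashlib.md5).hexdigest()

def is_untampered(message: bytes, received_tag: str, secret: bytes) -> bool:
    """Recompute the hash on the receiving side and compare it with the one that arrived."""
    expected = tag_message(message, secret)          # this is 'Result2'
    return hmac.compare_digest(expected, received_tag)

# Sender side: compute Result1 and attach it to the message.
original = b"please pay John Doe $100"
result1 = tag_message(original, SHARED_SECRET)

# Receiver side: the untouched message checks out ...
print(is_untampered(original, result1, SHARED_SECRET))   # True

# ... but a message altered in transit does not.
tampered = b"please pay John Doe $1000"
print(is_untampered(tampered, result1, SHARED_SECRET))   # False
```

John Doe can change the text of the e-mail, but without “secretpassword” he has no way to produce a matching hash for the altered text.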
Hashing is also used extensively in passwords for authentication. When I log onto my computer in the morning, I type my username “user” and my password “password”. My computer sends my username to my Domain Controller in clear text (no encryption or hashing), and sends the hash of my password, not the actual password! My Domain Controller knows what my password is supposed to be, so it looks up my user account in its database, retrieves the value on file for my password (in practice this is normally the hash itself rather than the plain password), and compares that hash with the one I sent. If the two results are the same, I typed my password in correctly; if they are different, I got my password wrong. This is really good from a security point of view: if someone were to ‘listen in’ on my conversation to try to capture my password as it’s sent to the Domain Controller, all they would get is the hash value, and not my password.
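The exchange above can be sketched in a few lines of Python. This is only a sketch under assumptions, not how any particular Domain Controller is implemented: SHA-256 stands in for whatever hash the real system uses, the server is assumed to keep the hash of the password on file, and `ACCOUNTS`, `client_login_request` and `server_check` are made-up names.

```python
import hashlib
import hmac

def hash_password(password: str) -> str:
    """Hash a password (SHA-256 is an arbitrary choice for this sketch)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Hypothetical server-side account database: username -> hash on file.
ACCOUNTS = {"user": hash_password("password")}

def client_login_request(username: str, typed_password: str) -> tuple[str, str]:
    """What the client puts on the wire: username in clear text, password as a hash."""
    return username, hash_password(typed_password)

def server_check(username: str, received_hash: str) -> bool:
    """Compare the hash that arrived with the hash on file for that user."""
    stored_hash = ACCOUNTS.get(username)
    if stored_hash is None:
        return False  # unknown user
    return hmac.compare_digest(stored_hash, received_hash)

# Correct password: the two hashes match.
print(server_check(*client_login_request("user", "password")))   # True

# Wrong password: the hashes differ, so the logon is refused.
print(server_check(*client_login_request("user", "passw0rd")))   # False
```

Anyone listening in sees the username and a hash, but never the word “password” itself.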
Note: When computers hash passwords for a logon, they also include extra information in the calculation, such as the session number, which prevents the hash from being replayed by someone else.
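Here is a rough Python sketch of why mixing in a per-session value stops a replay. Everything in it is an assumption made for illustration: a random value from `secrets` plays the part of the session number, HMAC-SHA-256 is used to combine it with the password hash, and the function names are hypothetical. Real logon protocols differ in the details.

```python
import hashlib
import hmac
import secrets

def hash_password(password: str) -> bytes:
    """Hash of the password (SHA-256 chosen arbitrarily for this sketch)."""
    return hashlib.sha256(password.encode("utf-8")).digest()

def new_session_value() -> bytes:
    """Server side: a fresh random value for each logon attempt."""
    return secrets.token_bytes(16)

def login_response(password_hash: bytes, session_value: bytes) -> str:
    """Mix the session value into the hash so the result is only good for this session."""
    return hmac.new(password_hash, session_value, hashlib.sha256).hexdigest()

def verify(stored_hash: bytes, session_value: bytes, response: str) -> bool:
    """Server side: recompute the expected response for this session and compare."""
    expected = login_response(stored_hash, session_value)
    return hmac.compare_digest(expected, response)

stored_hash = hash_password("password")  # what the server has on file

# Session 1: the genuine user answers the server's session value.
session1 = new_session_value()
response1 = login_response(hash_password("password"), session1)
print(verify(stored_hash, session1, response1))   # True

# Session 2: an eavesdropper replays the response captured in session 1.
session2 = new_session_value()
print(verify(stored_hash, session2, response1))   # False
```

Because the value sent over the wire changes every session, a captured response is useless the next time around.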