Data hashing

The core of the method

Hashing converts a data array of any length into a bit string of fixed size. The transformations that do this are called hash functions or compression functions. Values returned by a hash function are called hash values, hash codes, hash sums, checksums or simply hashes. Popular hash algorithms are MD5 (Message Digest version 5) and SHA-1 (Secure Hash Algorithm 1); both are widely deployed, but practical collision attacks are known for each. Other hashing algorithms include WHIRLPOOL, SHA-512, SHA-384, HAVAL, Tiger (2), etc., of which SHA-384 and SHA-512 are considerably stronger than MD5 and SHA-1.
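The fixed-size property can be checked with a short sketch. It assumes Python's standard hashlib module; the helper name digest_hex and the sample inputs are illustrative, not part of any particular product.

```python
import hashlib


def digest_hex(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hexadecimal digest of data for the named algorithm."""
    return hashlib.new(algorithm, data).hexdigest()


# Inputs of very different lengths: empty, one byte, one million bytes.
for message in (b"", b"a", b"a" * 1_000_000):
    md5_bits = len(digest_hex(message, "md5")) * 4      # 4 bits per hex digit
    sha512_bits = len(digest_hex(message, "sha512")) * 4
    print(len(message), md5_bits, sha512_bits)
```

Each line of output shows the input length followed by the digest sizes in bits, which stay at 128 for MD5 and 512 for SHA-512 whatever the input length.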

Hashing algorithms are irreversible one-way functions. In other words, you can convert plain text into a hash code, but you cannot convert the hash code back into the plain text. A hash is used to verify data integrity, e.g. that information was not modified during transmission.

How it works

A hash is a sort of checksum of the data, or its fingerprint. Just as fingerprints are unique to every human being, a hash code is, for practical purposes, unique to every message or data array: two different inputs that produce the same hash (a collision) are possible in principle, but extremely unlikely with a good hash function. This feature is used to quickly check large data volumes for changes or updates. A message may look unchanged to the human eye while the hashes prove the contrary: someone did work on it.

For example:

Original Message

The quick brown fox jumps over the lazy dog

9e107d9d372bb6826bd81d3542a419d6

Looks identical?

The quick brown fox jumps оver the lazy dog

c6384847658ab31c1d6c7be43571a6e9

As you can see, the two messages look alike. Replacing one of the “o” letters with, let’s say, a Cyrillic “о” leaves no visible hint of the change (guess which one was changed!). This is where hashes come into play: they clearly show that the messages are not identical underneath.
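A minimal sketch of the same experiment, assuming Python's hashlib and UTF-8 encoding of the text. Which "o" gets swapped here is an arbitrary choice made for illustration, and the digest of the altered text depends on the encoding, so it need not match the value printed above.

```python
import hashlib


def md5_hex(text: str) -> str:
    """MD5 digest of the UTF-8 encoding of text, as a hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


latin = "The quick brown fox jumps over the lazy dog"

# Swap the first Latin "o" for the Cyrillic letter U+043E (arbitrary choice).
lookalike = latin.replace("o", "\u043e", 1)

print(md5_hex(latin))        # 9e107d9d372bb6826bd81d3542a419d6
print(md5_hex(lookalike))    # a different digest
print(latin == lookalike)    # False, although both render almost identically
```

The two strings have the same number of characters and look the same in most fonts, yet their digests share no obvious resemblance.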

Hashing benefits

A hash code is typically much smaller than its original plain text. For this reason, two large data volumes can be compared for equivalence by comparing their checksums, provided both were calculated with the same algorithm. Once the checksums are known, this is much faster than comparing every character and every single bit of the two data arrays, and it also works when the copies are stored in different places, because only the short hashes need to be exchanged.
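The comparison described above might be sketched as follows. SHA-256, the 1 MiB chunk size and the function names file_digest and same_content are assumptions chosen for illustration; any algorithm works as long as both sides use the same one.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that it never has to fit in memory at once."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        # read() returns b"" at end of file, which stops the iteration.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def same_content(first: Path, second: Path) -> bool:
    """Treat two files as equal when their digests match."""
    return file_digest(first) == file_digest(second)
```

Computing a digest still reads every byte once, so the saving comes from storing or transmitting the short digest rather than the data itself, for example by keeping the digest of a backup and re-checking the file against it later.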

Two key features of hashing, (1) the impossibility of recovering the plain text from the hash code and (2) a different hash for every changed bit of information, are used for authentication and for the creation of digital signatures.
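As one illustration of hash-based authentication, the sketch below uses HMAC, a keyed-hash construction from Python's standard hmac module. This is a swapped-in technique: a true digital signature needs an asymmetric key pair, which is outside the scope of this sketch. The key, the messages and the names make_tag and verify_tag are hypothetical.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> str:
    """Compute an HMAC-SHA256 authentication tag for message."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_tag(key: bytes, message: bytes, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


secret = b"shared-secret-key"  # hypothetical key known to both parties
tag = make_tag(secret, b"transfer 100 to Alice")

print(verify_tag(secret, b"transfer 100 to Alice", tag))  # True
print(verify_tag(secret, b"transfer 900 to Alice", tag))  # False
```

Only someone holding the key can produce a tag that verifies, and any change to the message invalidates the tag, which combines both features listed above.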