- Hashing is a technique used for storing and retrieving keys in a rapid manner.
- Hashing can also be defined as a concept of distributing keys in an array called a hash table using a predetermined function called hash function.
Hash Table
- Hash table is a table (array) where we store the original string. Index of the table is the hashed key while the value is the original string.
- The
size of hash table is usually several orders of magnitude lower than the total
number of possible string, so several string might have a same hashed-key. example of hash fuction:
There
are many ways to hash a string into a key for
constructing hash functions:
- Mid-Square Square the string/identifier and
then using an appropriate number of bits from the middle of the square to
obtain the hash-key.
Steps:1. Square the value of the key. (k2)2. Extract the middle r bits of the result obtained in Step 1.
Function : h(k) = s
k = key
s = the hash key obtained by
selecting r bits from k2
2. Division
Divide the string/identifier by
using the modulus operator.
It’s the most simple method of
hashing an integer x.
Function:
h(z) = z mod M
z = key
M
= the value using to divide the key, usually using a prime number, the table
size or the size of memory used.
3. Folding
The Folding method works in two
steps :
a. Divide the key value into a number
of parts where each part has the same number of digits except the last part
which may have lesser digits than the other parts.
b. Add the individual parts. That is
obtain the sum of part1 + part2 + ... + part n. The hash value produced by
ignoring the last carry, if any.
4. Digit Extraction
A predefined digit of the given number is considered as the hash address.
5. Rotating Hash
Use any hash method (such as division or mid-square method)
After getting the hash code/address from the hash method, do rotation.
Rotation is performed by shifting the digits to get a new hash address.
Collision
What
happened if we want to store these strings using the previous hash function (use the first character of each string)•define, float, exp, char, atan, ceil, acos, floor.
There are several strings which have the same hash-key, it’s float and floor (hash-key:5), char and ceil (hash-key: 2). It’s called a collision. How can we handle this? there are 2 way:
1. Linear
Probing
Search
the next empty slot and put the string there.
Linear
probing has a bad
search complexity if there are many collisions.
example:
h[
]
|
Value
|
Step
|
0
|
atan
|
1
|
1
|
acos
|
2
|
2
|
char
|
1
|
3
|
define
|
1
|
4
|
exp
|
1
|
5
|
float
|
1
|
6
|
ceil
|
5
|
7
|
floor
|
3
|
…
|
In
chaining, we store each string
in a chain (linked list).
So
if there is collision, we only need
to iterate on that chain.
example:
h[
]
|
Value
|
0
|
atan
à
acos
|
1
|
NULL
|
2
|
char
à
ceil
|
3
|
define
|
4
|
exp
|
5
|
float
à
floor
|
6
|
NULL
|
7
|
NULL
|
…
|
Blockchain
A blockchain is also a distributed data structure but its purpose is completely different.
Think of it as a history, or a ledger. The purpose is to store a continuously-growing list of record without the possibility of tampering and revision.
It is mainly used in the bitcoin currency system for keeping track of transactions. Its property of being tamper-proof let everybody know the exact balance of an account by knowing its history of transaction.
In a blockchain, each node of the network stores the full data. So it is absolutely not the same idea as the DHT in which data are divided among nodes. Every new entry in the blockchain must be validated by a process called mining whose details are out of the scope of this answer but this process insure consensus of the data.
The two structures are both distributed data structure but serve different purposes. DHT aims to provide an efficient (in term of lookup time and storage footprint) structure to divide data on a network and blockchain aims to provide a tamper-proof data structure.