Encoding vs Encryption vs Hashing vs Obfuscation

Introduction
Encoding, encryption, hashing, and obfuscation are all transformations applied to data, each for a different purpose, and it is easy to confuse when to use which. Let's clear up that confusion so that we can use these terms with confidence in our next technical discussion with clients or colleagues.
Encoding
Definition: The process of converting data from one format to another using a publicly known scheme.
Purpose: To transform data so that it can be properly (and safely) consumed by a different type of system.
Goal: The goal is not to keep the information secret but to ensure that the data can be consumed correctly.
Reversible: Yes. The recipient only needs to know which scheme was used for encoding; no key is involved.
Ex: ASCII, Base64, Unicode (for example UTF-8)
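As a minimal sketch of the point above, the snippet below encodes a string with UTF-8 and then Base64 and reverses both steps. The sample text and variable names are only illustrative assumptions; the point is that no key is needed, only knowledge of the schemes.

```python
import base64

original = "Héllo, wörld!"  # illustrative text with non-ASCII characters

# Step 1: character encoding (Unicode text -> bytes) using the UTF-8 scheme.
raw_bytes = original.encode("utf-8")

# Step 2: Base64 encoding (arbitrary bytes -> ASCII-safe text).
b64_text = base64.b64encode(raw_bytes).decode("ascii")

# Anyone who knows the two schemes can reverse them; there is no secret.
decoded = base64.b64decode(b64_text).decode("utf-8")

print(b64_text)
print(decoded == original)  # True
```

Because the transformation is fully reversible by anyone, encoded data should never be treated as protected.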
Encryption
Definition: The process of transforming data into an unreadable format (ciphertext) so that an attacker cannot read it and only the intended recipient can convert it back into readable form.
Purpose: To maintain the security and confidentiality of data.
Goal: To ensure the data cannot be consumed by anyone other than the intended recipient.
Reversible: Yes, but only by someone who has the correct key and knows the algorithm.
Ex: Data sent to a website over HTTPS is encrypted by TLS, which uses public-key cryptography to establish the session and a symmetric cipher for the data itself.
Based on the type of keys used, encryption is classified as:
- Symmetric encryption - AES, DES (a single shared key is used to encrypt and decrypt the data; DES is now considered insecure)
- Asymmetric encryption - RSA, ECC (two keys are used: a public key for encryption and a private key for decryption). Diffie–Hellman belongs to the same public-key family, but it is a key-agreement protocol rather than an encryption algorithm.
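To make the "single key" idea concrete, here is a toy sketch of symmetric encryption using a one-time pad (XOR with a random key as long as the message). This is only an illustration built on the standard library, not what HTTPS or any production system uses; real code should rely on AES or similar from a vetted cryptography library. The helper name `xor_bytes` and the sample message are assumptions made for the example.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every data byte with the matching key byte."""
    if len(key) != len(data):
        raise ValueError("key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))

message = b"meet at noon"
key = secrets.token_bytes(len(message))  # shared secret, must be used only once

ciphertext = xor_bytes(message, key)     # encrypt
recovered = xor_bytes(ciphertext, key)   # decrypt with the SAME key

# Someone holding a different key gets meaningless bytes back.
wrong_key = secrets.token_bytes(len(message))
garbage = xor_bytes(ciphertext, wrong_key)

print(recovered == message)  # True
```

Unlike the encoding case, knowing the algorithm is not enough here: without `key`, the ciphertext cannot be turned back into the message.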
Hashing
Definition: The process of transforming data of any size into a fixed-length value (a hash, or digest).
Purpose: To detect whether the data has been tampered with, for example during transfer.
Goal: To maintain the integrity of the data.
Reversible: No. Recovering the original data from the hash value is computationally infeasible.
Ex: MD5, SHA-256 (MD5 is no longer collision-resistant and should not be used where security matters)
Obfuscation
Definition: Transforming data or code to make it harder to understand while keeping it functional.
Purpose: To make it harder for someone inspecting the data or code to understand it.
Goal: To make it more difficult for a human to attack or copy.
Reversible: Yes. It takes time and effort, but it can be done.
Ex: ProGuard, DexGuard, JS Obfuscator
Now that we know these terms, let's apply them to some real-world use cases.
Securing the source code
Obfuscation is mostly used to protect source code from being copied by others. We at Creditvidya use obfuscation for all the apps that we publish to the Play Store and all the JARs that we develop and share with clients.
Note: Obfuscation makes reverse engineering difficult but not impossible. It makes the code harder for humans to read, but machines (computers) can still execute and analyze it just as easily.
Sharing files
File transfers use hashing to maintain the integrity of the data. The sender transmits the file along with its hash. Once the file has been downloaded, the receiver generates a hash of the received file and compares it with the sender's hash. If the two match, the file has not been altered in transit, provided the hash itself reached the receiver unmodified.
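The comparison described above could be sketched as follows, using SHA-256 from Python's standard library. The payload and the helper names `sha256_hex` and `is_intact` are hypothetical; in practice the bytes would come from the transferred file.

```python
import hashlib
import hmac

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

def is_intact(received: bytes, expected_hash: str) -> bool:
    """Recompute the hash of the received bytes and compare it with the expected one."""
    return hmac.compare_digest(sha256_hex(received), expected_hash)

# Sending side: compute the hash that travels with the file.
payload = b"quarterly-report contents"
sent_hash = sha256_hex(payload)

# Receiving side: verify what actually arrived.
print(is_intact(payload, sent_hash))         # True  (unchanged)
print(is_intact(payload + b"!", sent_hash))  # False (modified)
```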
A hash generated using a good hash function should follow these rules:
- The same input should always produce the same hash value.
- It should be practically impossible to find two different inputs that produce the same hash value (collision resistance).
- Even the smallest modification to the input should change the hash value drastically. (This is called the avalanche effect.)
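Two of these properties, determinism and the avalanche effect, are easy to observe directly. The sketch below counts how many of the 256 output bits of SHA-256 differ between two inputs; the input strings and helper names are illustrative assumptions.

```python
import hashlib

def sha256_int(text: str) -> int:
    """SHA-256 digest of the text, interpreted as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")

def differing_bits(a: str, b: str) -> int:
    """Number of bit positions in which the two digests differ."""
    return bin(sha256_int(a) ^ sha256_int(b)).count("1")

same = differing_bits("hello world", "hello world")     # identical input
changed = differing_bits("hello world", "hello worle")  # one character changed

print(same)     # 0: the same input always gives the same hash
print(changed)  # expected to be somewhere near half of the 256 bits
```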
Encoding can also be used in conjunction with hashing, not only to maintain integrity but also to improve performance. A compression encoding (such as gzip) removes redundancy from the data, so the encoded file is usually smaller than the original, which means faster transfer and less storage space on disk. Not every encoding shrinks data, though: Base64, for example, makes it roughly a third larger.
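Whether an encoding shrinks the data depends on the scheme. As a rough sketch, the snippet below compares a compression encoding (zlib, the algorithm family behind gzip) with Base64 on the same deliberately repetitive sample data; the sample itself is an assumption chosen to make the redundancy obvious.

```python
import base64
import zlib

data = b"status=OK;" * 1000  # highly repetitive sample, 10,000 bytes

compressed = zlib.compress(data)  # compression encoding: removes redundancy
b64 = base64.b64encode(data)      # Base64: makes data text-safe, but larger

print(len(data), len(compressed), len(b64))

# Both encodings are reversible without any key.
restored = zlib.decompress(compressed)
print(restored == data)  # True
```

On data with little redundancy (already compressed or encrypted files, for instance) the compressed size will be much closer to the original.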
Storing Passwords
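The authentication (login) process uses hashing: it is recommended to store only a salted hash of each password in the database, never the password itself, and preferably to compute it with a deliberately slow password-hashing function such as bcrypt, scrypt, or Argon2.
Salting: The process of adding a random string, unique to each password, before hashing. The salt is stored alongside the hash and does not need to be secret; its job is to ensure that identical passwords produce different hashes and to defeat precomputed lookup tables.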
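A minimal sketch of salted password hashing, using PBKDF2 from Python's standard library, might look like this. The function names, the 16-byte salt, and the iteration count are assumptions for illustration; a random salt is generated per password and stored next to the digest, and a real system would tune the work factor or use a dedicated library.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # assumed work factor; tune for your own hardware

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); both are stored in the database."""
    salt = secrets.token_bytes(16)  # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the login attempt with the stored salt and compare digests."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)

salt, digest = hash_password("s3cret")
print(verify_password("s3cret", salt, digest))  # True
print(verify_password("guess", salt, digest))   # False
```

Because each call to `hash_password` draws a new salt, two users with the same password end up with different stored digests.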
Sending confidential data
If your APIs or files that are shared across systems (be it internal or third-party) contain PII (personally identifiable information) or sensitive data (credit card numbers, passwords), it is recommended to use an encryption mechanism. We at Creditvidya use a hybrid encryption mechanism (symmetric encryption for the data and asymmetric encryption for the key) to make it impractical for attackers to gain access.
Hope this clears up the confusion around the aforementioned terminology.
Recap
| | Encoding | Encryption | Hashing | Obfuscation |
| --- | --- | --- | --- | --- |
| Purpose | Accessibility of data | Confidentiality of data | Integrity of data | Security of data (to some extent) |
| Key involved | No | Yes | No | No |
| Reversible | Yes | Yes, you must have the encryption key used | No | Yes (but hard) |
| Data readable after conversion | No | No | No | No (hard to read) |