172: Hashes Offer The Best Error Detection.
Take Up Code - A podcast by Take Up Code: build your own computer games, apps, and robotics with podcasts and live classes
If you receive some information, how do you know if it’s intact or has been changed? You use a hash in a similar way to a checksum by including it along with the data. If you want to verify that the data is correct, recalculate the hash and compare.

There are many ways to calculate a hash. This episode will describe two broad categories of hash functions: non-cryptographic hash functions and cryptographic hash functions. You can create or use hash functions that avoid accidental collisions or sometimes don’t worry much at all about collisions.

I was at a pharmacy recently and noticed that behind the counter was space to hold prescriptions that were waiting to be picked up. Each prescription was placed in a paper bag with what looked like the first two letters of the person’s last name. That’s a hash function. Not very complicated, but still a hash function. Two letters give you 26 times 26 for a total of 676 possible hash values. There are many reasons this is not a very good hash function for general use, including the fact that it’s not very random. This means that some hash values will be more common than others. And I don’t know anybody whose last name begins with double Q’s. But for the pharmacy, using the first two letters of a person’s last name works great. It’s a good hash function for their needs. There’s no need for anything more complicated or elaborate.

Other hash functions could be more complicated and make use of thousands or millions or even more possible hash values. These are all non-cryptographic hash functions because, while they can help detect accidental errors, they won’t help protect your information against attacks done on purpose. A non-cryptographic hash function is designed to meet your application’s needs, assuming you don’t have to worry about your hash values being used by an attacker. To guard against an attacker, you need a cryptographic hash function.
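To make the pharmacy example concrete, here is a minimal Python sketch of that two-letter hash. The function name `pharmacy_hash` and the sample names are made up for illustration, and the sketch assumes last names are written with the 26 letters A to Z. It is not how any real pharmacy system works, just one way to picture the idea.

```python
from collections import Counter
import string


def pharmacy_hash(last_name: str) -> str:
    """Hypothetical hash: the first two letters of a last name, uppercased."""
    # Keep only A-Z so punctuation such as apostrophes is skipped.
    letters = [c for c in last_name.upper() if c in string.ascii_uppercase]
    return "".join(letters[:2])


# Two letters, 26 choices each: 26 * 26 possible hash values.
possible_values = len(string.ascii_uppercase) ** 2

# Made-up sample names to show that the buckets fill unevenly.
names = ["Smith", "Smythe", "Small", "Jones", "Johnson", "Quinn"]
buckets = Counter(pharmacy_hash(name) for name in names)

print(possible_values)
print(buckets)
```

With these sample names, three people share the SM bag and nobody lands in QQ, which is the uneven spread described above. For a shelf of paper bags, a clerk can easily look through a few bags with the same letters, so the collisions don't matter.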
Listen to the whole episode for more information about well-known cryptographic hash functions MD5, SHA-1, SHA-2 (including SHA-256 and SHA-512), and SHA-3. You can also read the full transcript below.

Transcript

You might want to listen to episode 43 about hash tables. It’s a different way to use hashes, but I explained several properties of hashes that apply even here to error detection. I’ll assume you already know what I’m talking about when I refer to hash collisions.

You use a hash in a similar way to a checksum by including it along with the data. If you want to verify that the data is correct, recalculate the hash and compare. There are many ways to calculate a hash. This episode will describe two broad categories of hash functions: non-cryptographic hash functions and cryptographic hash functions.

Before that, though, why are hash functions sometimes better than checksums? One reason is that checksums can’t detect if extra zero values are inserted or if some portions of the data are swapped. Let’s say you have a simple checksum that really does add bytes. If you have a short message with the values 5 and 3, then the sum is 8. But what’s 5 plus 3 plus zero? Still 8. What’s 5 plus 0 plus 3? Still 8. What about 3 plus 5? Still 8. A hash, depending on how you calculate it, could detect these types of errors.

You can create or use hash functions that avoid accidental collisions or sometimes don’t worry much at all about collisions. I was at a pharmacy recently and noticed that behind the counter was space to hold prescriptions that were waiting to be picked up. Each prescription was placed in a paper bag with what looked like the first two letters of the person’s last name. That’s a hash function. Not very complicated, but still a hash function. Two letters give you 26 times 26 for a total of 676 possible hash values. There are many reasons this is not a very good hash function for general use, including the fact that