Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

It's helpful to understand that the algorithm wasn't designed the way it's presented in this illustration, and consists of somewhat discrete components.

It's an iterated design, like a block cipher, meaning that it's built around a simple round function that's repeated a bunch of times on each input, rather than a super-complicated function run once.

It belongs to a large, important family of cryptographic hashes called Merkle-Damgård, which pads messages to a fixed block size, chunks them into blocks, feeds and feeds each block to an iterated "compression function" (the heart of the hash) that takes a block and the chained n-1th compression value and spits out the nth compression value. So a lot of the mechanism in this illustration is just the vanilla mechanics of an MD hash.

Iterated cryptographic designs generally have "schedule" or "expansion" functions that take the (small) input to the iterated function and "expand" it so that not all the iterations are working on exactly the same data; in the same way, a block cipher will have a "key schedule" that expands your 128-bit key into distinct "round keys" for each round. Message and key schedules, to me, feel like tiny little ciphers embedded in the larger cipher, and they're yet more of the complexity you're looking at, but can be studied independently (the SHA2 message schedule is apparently a big part of what makes it difficult to attack).

When you see magic numbers in a block cryptography design like this, two relatively safe bets you can make is that they're either "nothing up my sleeve" numbers chosen from e.g. mathematical constants, just for the sake of avoiding arguments, or that they're the product of statistical testing to maximize things like diffusion or minimize things like linear or differential characteristics.

With all block cipher cryptography one goal you always have is introducing nonlinearity; you can accidentally design a simple iterated function that is actually linear, and that cipher will be solvable simply with Gaussian elimination. People have shown me CTF levels that, for instance, linearized the AES round function. So when you see particular operations chained together in a cipher/hash design, keep in mind that they're balancing goals of (1) ensuring nonlinearity, (2) maximizing diffusion, the rate at which a single-bit change in the message totally scrambles the output in the shortest number of rounds, (3) optimizing metrics to avoid differential and linear cryptanalysis, (4) maximizing performance on target hardware.

As was suggested downthread: a good way to come at this stuff is to start with MD4, and then read about MD5 (vs. MD4), then SHA1, and then take a closer look at SHA2. This stuff didn't get figured out all at once, and all these hashes are related. You might find MD4 easier to get your head around.

For the linear and differential cryptanalysis stuff, which is surprisingly (given its age) important in cipher/hash design, a great starting point is the Heys tutorial, which is built around worked examples of both attacks:

https://ioactive.com/wp-content/uploads/2015/07/ldc_tutorial...



You might find MD4 easier to get your head around.

You can even generate collisions for it by hand calculation:

https://www.iacr.org/archive/fse2007/45930331/45930331.pdf


What a great question and what an excellent answer. Thank you!




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: