NIST Draft: The KMAC, TupleHash, and FPH Functions [pdf] (ietf.org)
26 points by niftich on June 22, 2016 | 13 comments


In case you're wondering, these are new NIST standards for things you can do with SHA-3 beyond simply using it as a hash.

KMAC is SHA-3's alternative to HMAC. You can use HMAC with SHA-3, but it's wasteful, since part of the point of SHA-3 is to make the HMAC construction (which applies the hash function twice to achieve separation between the key and the data being authenticated) unnecessary.
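A rough sketch of the difference in Python. The prefix-key line is only an illustration of the underlying idea, not the actual KMAC algorithm, which adds its own padding and domain separation per the draft:

```python
import hashlib
import hmac

key = b"secret key"
msg = b"attack at dawn"

# HMAC applies the hash function twice (an inner and an outer pass),
# which SHA-2 needs because of length-extension attacks:
tag_hmac = hmac.new(key, msg, hashlib.sha3_256).hexdigest()

# SHA-3's sponge construction is not length-extendable, so a simple
# prefix-key MAC is already sound in principle; KMAC standardizes a
# padded, domain-separated variant of this idea:
tag_prefix = hashlib.sha3_256(key + msg).hexdigest()
```

The two constructions produce unrelated tags; the point of KMAC is that the single-pass style becomes safe and standardized.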

TupleHash is a construction that reliably hashes an array of byte strings, some entries of which might be the empty string.

FPH is a construction that exploits parallelism to more quickly hash huge strings.
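The core trick can be sketched in a few lines of Python. This is a toy illustration of the idea only, not the standardized FPH algorithm, which is built on cSHAKE and encodes parameters like the block size into the hash:

```python
import hashlib

def parallel_hash_sketch(data: bytes, block_size: int = 8192) -> str:
    """Toy sketch of the FPH idea: hash fixed-size blocks independently
    (each block digest can be computed on a separate core), then hash
    the concatenation of the block digests."""
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    digests = b"".join(hashlib.sha3_256(c).digest() for c in chunks)
    return hashlib.sha3_256(digests).hexdigest()
```

Because the per-block hashes are independent, a long input can be split across cores, unlike plain SHA-3, which must consume the message sequentially.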


These slides from 2015 [1] detail some of the rationale that went into designing these functions.

Specifically, slide 13 says these 'domain-separated' functions are "completely unrelated", i.e., they can't be computed from each other. According to the slides, "this should make it a little harder to shoot yourself in the foot with these functions".

It appears the designers are realizing that assembling crypto building blocks in ways that remain secure and make sense is difficult, so they're providing ready-made versions made for specific uses.

[1] http://csrc.nist.gov/news_events/cif_2015/research/day1_rese...


Two interesting bits of context:

* Being usable in a straightforward way as a MAC --- something SHA-2, like MD5 and SHA-1, isn't, because all three are vulnerable to length-extension attacks --- was one of the design criteria for the SHA-3 contest.

* This is also part of a heartening trend in crypto research towards "misuse-resistant" cryptography. The "misusers" in MR crypto are... wait for it... application developers. :)

A good example of a crypto primitive designed specifically for misuse resistance is Rogaway's SIV mode, which is making a comeback in GCM-SIV. The most popular example of misuse-resistant crypto is of course the NaCl/Sodium library.


The others I understand, but TupleHash seems a bit odd. It essentially just defines a naive serialization format for tuples, but I would have thought that it would generally be left to applications to figure out how to get a hashable blob from their data structures.

Is there some large application for tuple hashes that explains its inclusion in a standard? Because I don't know of any.


Because people often don't realize that if you want to hash a bunch of pieces of data in a cryptographically secure fashion, you have to serialize them (with an injective function). People tend to think that just somehow feeding it all into the hash function will take care of it, so simple concatenation is common, with occasionally disastrous consequences.
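A two-line demonstration of the problem (the field names here are made up for illustration):

```python
import hashlib

# Naive concatenation loses the boundaries between fields, so two
# different tuples can serialize to the same byte string:
h1 = hashlib.sha3_256(b"user=admin" + b"&role=guest").digest()
h2 = hashlib.sha3_256(b"user=admin&role=" + b"guest").digest()
print(h1 == h2)  # True -- the hash cannot tell the two tuples apart
```

Both calls feed the identical byte string `user=admin&role=guest` into the hash, even though the tuples are different.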


In particular, there's all sorts of mischief an attacker can do when serialized(A,B)==serialized(C,D) but A!=C and B!=D. I've seen OAuth implementations do this, allowing someone to submit the signature from a "good" message along with "bad" arguments (of their own choosing), with a very different meaning.

An impatient application developer might just use hash(concat(args)). Raw concatenation is a serialization, but it's ambiguous: s("manage", "able") == s("man", "ageable"). The safest approach is to ensure that the serialization function is reversible (even though you're just going to feed it into a hash, which obviously isn't). When your arguments are arbitrary bytes, you either use a delimiter character (and escape any places where it appears naturally), or prefix each string with (a fixed-length representation of) its length. Either way, if you could conceivably parse the serialized form back into the original arguments, then you're safe from tricky format-confusion attacks.
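The length-prefix approach can be sketched like this. This is an illustrative encoding only; the real TupleHash uses its own standardized encoding over cSHAKE:

```python
import hashlib
import struct

def hash_tuple(*parts: bytes) -> str:
    """Hash a tuple of byte strings unambiguously by length-prefixing
    each part, which makes the encoding injective (reversible)."""
    h = hashlib.sha3_256()
    for p in parts:
        h.update(struct.pack(">Q", len(p)))  # 8-byte big-endian length
        h.update(p)
    return h.hexdigest()

# The ambiguity of raw concatenation disappears:
print(hash_tuple(b"manage", b"able") == hash_tuple(b"man", b"ageable"))  # False
```

Since every part carries its own length, the serialized form parses back into exactly one tuple, which is the property the parent comment asks for.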


That's what "injective" means :-)

Also, I think that serialization is actually injective by definition. If you can't deserialize it, it's not a serialization, it's just an arbitrary sequence of symbols that you maybe happen to have generated based on some non-serial structure.

And actually, it's not just the safest way, it is the only way. Though it's true that the hash is not injective, it is collision resistant (based on current understanding), which is what makes it a cryptographic hash function. The only theoretical alternative to (injective) serialization would be to feed the data into a collision-resistant function that takes tuple inputs, which by definition would be a cryptographic hash function, which would make it pointless to feed the result into another cryptographic hash function.

The solutions you suggest are good advice, though. Also, don't forget to tag message types if you use a key for multiple purposes, so one message cannot be replayed as another type of message.
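The type-tagging advice can be sketched like this (the message types and key name are made up for illustration):

```python
import hashlib
import hmac

key = b"shared key"

def tagged_mac(msg_type: bytes, payload: bytes) -> str:
    # Bind the MAC to the message type, so a tag computed over a
    # "transfer" message cannot be replayed as a "refund" message.
    # The NUL delimiter is safe here only because the fixed type
    # names never contain NUL bytes.
    return hmac.new(key, msg_type + b"\x00" + payload, hashlib.sha3_256).hexdigest()
```

Without the tag, any two message types sharing a key and a payload format would produce interchangeable MACs.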


I have been so desperate for an expert-crafted standardized tree-hash. Hallelujah!


Though it does not meet all your criteria -- it's not standardized by an institute -- BLAKE2 [1] also supports tree-hashing, is expert-crafted from a SHA-3 finalist, and has seen adoption in the last two years.

Notably, on recent architectures, it's faster than SHA-3, SHA-256, or even SHA-1.
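One of its misuse-resistance conveniences, as exposed in Python's hashlib: BLAKE2 supports keyed hashing natively, so you can use it as a MAC without an HMAC wrapper:

```python
import hashlib

# Keyed BLAKE2b as a MAC -- no HMAC construction needed:
tag = hashlib.blake2b(b"message", key=b"secret key", digest_size=32).hexdigest()
```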

[1] https://blake2.net/


It is really the standardization that's the thing, as my application would be hashing disk images for digital forensics / evidence-authentication purposes. BLAKE2 seems like a fine algorithm otherwise.


Without actually comparing the spec to the Sakura paper [1]: are these instantiations of the Sakura scheme?

[1] http://keccak.noekeon.org/Sakura.pdf


The FPH (Fast Parallel Hash) schemes are: their core is cSHAKE, a user-customizable (parameterized) SHAKE, which uses the Sakura construction.

In the Sakura paper's conclusion, they state:

"In the current FIPS 202 draft, the SHA-3 extendable-output functions chosen by NIST adopt SAKURA for sequential hashing as a special case of tree hashing." Similarly, the FIPS 202 document refers to SAKURA when it talks about the XOFs SHAKE128 and SHAKE256:

"The two SHA-3 XOFs are also specified in a manner that allows for the development of dedicated variants. Moreover, the SHA-3 XOFs are compatible with the Sakura coding scheme for tree hashing in order to support the development of parallelizable variants of the XOFs, to be specified in a separate document." [1]

[1] http://www.nist.gov/customcf/get_pdf.cfm?pub_id=919061


But is it trustworthy?



