It's a very fun historical article, but it's no longer worth it on modern CPUs, ...

enriquto · on May 25, 2020

It's more a comment about the syntax of the C switch statement than about micro-optimization.

klyrs · on May 25, 2020

There's two kinds of C programmers: those familiar with Duff's device, and those who wouldn't expect Duff's device to even compile.

AdmiralAsshat · on May 26, 2020

I'm familiar with it and I still don't think it should compile...

burfog · on May 26, 2020

I'm disappointed that gcc's nested function extension doesn't let me use a switch to have a second entry point for the inner function.

I also can't seem to get there with a plain goto. I can't jump out either. Probably the computed goto extension will do the job, but that is cheating.

marvy · on May 26, 2020

Using computed goto would (probably) make the code compile, but it wouldn't actually work. That's would be heading directly into undefined behavior territory (aka land of the nasal demons).

To give just one or two examples of why this is doomed: imagine you want to jump "inward". If the inner function takes any arguments, you haven't passed them! And if you jump inward, and the inner function then hits a return statement, do you return to the outer function, or to its caller?

And after you resolve all such semantic issues, you need to actually implement all this, without slowing down code which doesn't use this feature.

And all this complexity doesn't buy you much because you have two other options:

1. just call the function to jump inwards, and simply return to jump out.

2. Don't bother with nested functions, and just inline the code.

marvy · on May 26, 2020

So it turns out I'm kind of wrong! My arguments for jumping "inward" still seem pretty valid, but they don't generalize to jumping "outward". And indeed, GCC actually allows this, but you need to use the right syntax:

https://gcc.gnu.org/onlinedocs/gcc/Nested-Functions.html

enriquto · on May 26, 2020

> There's two kinds of C programmers: those familiar with Duff's device, and those who wouldn't expect Duff's device to even compile.

There's also the C programmers that come from assembler for whom the construction expressed by Duff's device is very natural. They would have a hard time believing that somebody find it surprising. For those people, it is obvious that the code should compile and its meaning is clear, as can be mapped directly to the compiled code.

tedunangst · on May 25, 2020

Sounds like a benchmark is in order!

marvy · on May 25, 2020

What's not worth it? Loop unrolling? Something else?

chrisseaton · on May 25, 2020

Does duff's device create irreducible control flow? I couldn't tell you off the top of my head but seems likely and if it does that's a good reason to avoid as some modern compilers do not like irreducible control flow.

tom_mellior · on May 25, 2020

Yes, it creates a loop with multiple entry points, which is irreducible. (I think it's even one of several equivalent definitions of irreducibility.)

Symmetry · on May 26, 2020

Putting the code into Godbolt the control flow

      switch (count % 8) {

turns into

          jmp     [QWORD PTR .L4[0+rdx*8]]

just like you'd do if you were doing the assembly by hand.

chrisseaton · on May 26, 2020

Yes that’s how gcc implements a switch. But what’s they got to do with irreducibility?

throwaway_pdp09 · on May 25, 2020

'irriducible'?

chrisseaton · on May 25, 2020

It comes from 'reduce' - I think it's spelt 'irreducible', but maybe you're American and I'm British.

throwaway_pdp09 · on May 25, 2020

Made me smile. No, I'm questioning the meaning here. I have a vague inkling it's a property of the dataflow graph of a program, I'm just struggling to remember what. Is it literally that repeated collapses of basic node types (if/while/sequence) will end up with a single node? Basically 'does it properly nest'? Or is it something else? (and I'm a brit too BTW)

chrisseaton · on May 25, 2020

Yes it means properly-nested, or really the same thing as 'structured' as in 'structured programming.' I'm sure there's some graph-theoretic definition but I don't know it off the top of my head.

x3blah · on May 25, 2020

https://raw.githubusercontent.com/lemire/fastbase64/master/s...

m463 · on May 26, 2020

modern cpus rarely do i/o that way anymore