Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

It's a very fun historical article, but it's no longer worth it on modern CPUs, where it actually harms performance.


It's more a comment about the syntax of the C switch statement than about micro-optimization.


There's two kinds of C programmers: those familiar with Duff's device, and those who wouldn't expect Duff's device to even compile.


I'm familiar with it and I still don't think it should compile...


I'm disappointed that gcc's nested function extension doesn't let me use a switch to have a second entry point for the inner function.

I also can't seem to get there with a plain goto. I can't jump out either. Probably the computed goto extension will do the job, but that is cheating.


Using computed goto would (probably) make the code compile, but it wouldn't actually work. That's would be heading directly into undefined behavior territory (aka land of the nasal demons).

To give just one or two examples of why this is doomed: imagine you want to jump "inward". If the inner function takes any arguments, you haven't passed them! And if you jump inward, and the inner function then hits a return statement, do you return to the outer function, or to its caller?

And after you resolve all such semantic issues, you need to actually implement all this, without slowing down code which doesn't use this feature.

And all this complexity doesn't buy you much because you have two other options:

1. just call the function to jump inwards, and simply return to jump out.

2. Don't bother with nested functions, and just inline the code.


So it turns out I'm kind of wrong! My arguments for jumping "inward" still seem pretty valid, but they don't generalize to jumping "outward". And indeed, GCC actually allows this, but you need to use the right syntax:

https://gcc.gnu.org/onlinedocs/gcc/Nested-Functions.html


> There's two kinds of C programmers: those familiar with Duff's device, and those who wouldn't expect Duff's device to even compile.

There's also the C programmers that come from assembler for whom the construction expressed by Duff's device is very natural. They would have a hard time believing that somebody find it surprising. For those people, it is obvious that the code should compile and its meaning is clear, as can be mapped directly to the compiled code.


Sounds like a benchmark is in order!


What's not worth it? Loop unrolling? Something else?


Does duff's device create irreducible control flow? I couldn't tell you off the top of my head but seems likely and if it does that's a good reason to avoid as some modern compilers do not like irreducible control flow.


Yes, it creates a loop with multiple entry points, which is irreducible. (I think it's even one of several equivalent definitions of irreducibility.)


Putting the code into Godbolt the control flow

      switch (count % 8) {
turns into

          jmp     [QWORD PTR .L4[0+rdx*8]]
just like you'd do if you were doing the assembly by hand.


Yes that’s how gcc implements a switch. But what’s they got to do with irreducibility?


'irriducible'?


It comes from 'reduce' - I think it's spelt 'irreducible', but maybe you're American and I'm British.


Made me smile. No, I'm questioning the meaning here. I have a vague inkling it's a property of the dataflow graph of a program, I'm just struggling to remember what. Is it literally that repeated collapses of basic node types (if/while/sequence) will end up with a single node? Basically 'does it properly nest'? Or is it something else? (and I'm a brit too BTW)


Yes it means properly-nested, or really the same thing as 'structured' as in 'structured programming.' I'm sure there's some graph-theoretic definition but I don't know it off the top of my head.



modern cpus rarely do i/o that way anymore




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: