Well, yes. It's much easier to diagnose after the problem has been found :)
The 'gotcha' part in this case was that the SSL keygen code did not just use a random number seed but was grasping for entropy from other sources too (PIDs, perhaps? I can't recall the detail). That unknown made initial attempts to recreate the problem difficult.
Plus the failing test in question was not directly testing SSL, merely making use of it. There were other SSL-specific tests run separately, but they missed this strange corner case (in 'normal' use it wasn't as if 1 in 256 transactions would fail; that would have been much more obvious and we'd have had spurious-seeming failures all over the place).
That brings up another test pain: you can write lots of specific unit tests for every individual feature your code has, and they can all pass just fine. But when features A, D and H happen to all be in use at once, you hit a separate problem. Onwards to 100% coverage...
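One partial mitigation is to generate the combinations explicitly rather than test each flag in isolation. A rough sketch, assuming pytest; the feature names and run_system are made-up placeholders for whatever the real code exposes:

    import itertools
    import pytest

    FEATURES = ["A", "D", "H"]  # hypothetical feature flags

    # Every non-empty combination of features, not just one flag at a time.
    COMBOS = [combo
              for r in range(1, len(FEATURES) + 1)
              for combo in itertools.combinations(FEATURES, r)]

    @pytest.mark.parametrize("combo", COMBOS)
    def test_features_together(combo):
        config = {name: True for name in combo}
        # run_system is a hypothetical entry point for the code under test
        assert run_system(config) is not None

It doesn't scale to dozens of flags, but it at least catches the "A + D + H" style interactions that per-feature tests never exercise.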
> Well, yes. It's much easier to diagnose after the problem has been found :)
That's kind of unfair; it's pretty much the logical deduction if you think about randomness in your unit tests. I.e., suppose the test fails, then what? You want to reproduce it, but really all you know is that the test indicates the possibility of a bug. In order to investigate, you need to follow the same code path that the test did. How do you do that? Capture the seed.
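A minimal sketch of what "capture the seed" looks like in practice, assuming Python and pytest-style tests; encode/decode stand in for whatever the test actually exercises:

    import os
    import random

    def make_rng():
        # Use TEST_SEED from the environment when reproducing a failure;
        # otherwise draw a fresh seed and print it so it lands in the test log.
        seed = os.environ.get("TEST_SEED")
        seed = int(seed) if seed is not None else random.SystemRandom().randrange(2**32)
        print(f"TEST_SEED={seed}")
        return random.Random(seed)

    def test_roundtrip_with_random_payload():
        rng = make_rng()
        payload = bytes(rng.randrange(256) for _ in range(64))
        # encode/decode are hypothetical stand-ins for the code under test
        assert decode(encode(payload)) == payload

When the test fails, the seed is in the output, and `TEST_SEED=<value>` re-runs the exact same code path.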
There are so many other possible factors: timings, interactions with other processes, memory allocations, network conditions, etc. A random seed is just one of several non-deterministic factors that could have been the cause.
Fair enough, though interactions with other processes and network conditions are not relevant for unit tests. Timings maybe, but then you still need a strategy to turn a failing test into information you can use to fix a bug.
I have horrible memories of time spent tracking down other test failures caused by helper programs not shutting down cleanly and causing obscure bugs in later tests. E.g. a test failing because it couldn't bind to port 443, because an earlier test's helper program sometimes didn't clean up properly. That's just one possibility for a process and/or network condition that ruins a test.
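The usual defence is to make teardown unconditional. A hedged sketch as a pytest fixture; "./helper-server" and the port are hypothetical stand-ins for whatever the tests actually launch:

    import subprocess
    import pytest

    @pytest.fixture
    def helper_server():
        # Start the (hypothetical) helper the tests depend on.
        proc = subprocess.Popen(["./helper-server", "--port", "8443"])
        try:
            yield proc
        finally:
            proc.terminate()                 # always request a clean shutdown
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                proc.kill()                  # force it so the port is actually freed
                proc.wait()

The `finally` runs even when the test body fails, so a broken test doesn't leave a zombie holding the port for the next one.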