I wouldn’t call it thread-safe when race conditions are possible.

jldugger · on Oct 17, 2021

Then we need a term for when code race conditions are possible but rare enough that nobody using the software notices. thread-timebomb?

nuerow · on Oct 17, 2021

> Then we need a term for when code race conditions are possible but rare enough that nobody using the software notices. thread-timebomb?

There's already a term for that: not thread-safe.

The definition of thread safety does not include theoretical or practical assessments regarding how frequently a problem can occur. It only assesses whether a specific class of problems is eliminated or not.

jldugger · on Oct 17, 2021

>The definition of thread safety does not include theoretical or practical assessments regarding how frequent a problem can occur.

Well, obviously.

The challenge I am putting forth on HN is to meaningfully describe _usable_ thread-unsafe software. If you've spent enough time outside university, you'll be aware that there are all kinds of theoretical race conditions that are not triggered in practical use.

klyrs · on Oct 17, 2021

If you've worked at industrial scale, you'll be aware that even the most theoretical-seeming race condition will be triggered frequently.

The_Colonel · on Oct 17, 2021

That reminds me of how I was called in to fix some Java service, which had been running successfully in production for 10 years with hardly any incident, but suddenly started crashing hard, all the time. It was of course a thread safety issue (concurrent non-synchronized access to a HashMap) which lay dormant for 10 years only to wreak havoc later.

Nothing obvious changed (it was still running a decade-old JRE). Perhaps it was a kernel security patch, perhaps a RAM module was replaced, or perhaps the runtime data just grew or changed in some way that woke up this monster.
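
For illustration only (this is not the code from that service): a minimal sketch of the kind of shared-map update involved, assuming several threads increment one counter in a common map. The class name `SharedCounterDemo`, the helper `countConcurrently`, and the key `"hits"` are all made up for the example. With a `ConcurrentHashMap`, `merge` is atomic, so no increment is lost. Passing a plain `HashMap` instead compiles just as well, but concurrent unsynchronized writes to it are unsupported and can lose updates or corrupt the map's internals, possibly only under the right timing.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SharedCounterDemo {
    // Hypothetical helper: starts `threads` workers that each increment the
    // same key `incrementsPerThread` times, waits for them, and returns the total.
    static int countConcurrently(Map<String, Integer> map, int threads, int incrementsPerThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    // Atomic read-modify-write on ConcurrentHashMap;
                    // NOT safe if `map` is a plain HashMap.
                    map.merge("hits", 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return map.getOrDefault("hits", 0);
    }
}
```

Calling `SharedCounterDemo.countConcurrently(new ConcurrentHashMap<>(), 4, 10_000)` returns 40000 on every run. The same call with `new HashMap<>()` carries no such guarantee, and may well appear to work for a long time before it fails.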

jldugger · on Oct 18, 2021

> If you've worked at industrial scale,

Fun fact, I actually do! It's from that perspective that I wrote that: every time you perturb the software environment, a new set of bugs arises that never showed up in the old environment.

dkersten · on Oct 17, 2021

That's not useful. If you have a race condition, you will eventually hit it, and when you do, you may get incorrect results or corrupt data. Thread unsafe is thread unsafe, regardless of how rare it appears to be.

Also, rare on one computer (or today's computer) might not be rare on another (tomorrow's faster one, for example).

These types of bugs are also very hard to detect. You might not know your data is corrupted. Reminds me of how bad calculations in Excel have cost companies billions of dollars, except now the calculations could be "correct" and the error sitting dormant, just waiting for the right timings to happen. Much better to not make assumptions about the safety and think about it up front: if you are using multiple threads, you need to carefully consider your thread safety.

phkahler · on Oct 18, 2021

>> then we need a term for when code race conditions are possible but rare enough

There is no such thing as "rare enough". Random or probabilistic bugs are one of the worst things software can have.

lazide · on Oct 18, 2021

While I agree - plenty of CIOs and managers vote the opposite with their wallets.

Why not just set it to auto reboot every week, that seems to fix it - right?

phkahler · on Oct 18, 2021

MCAS only made the planes crash sometimes, and then only when a sensor failed.

formerly_proven · on Oct 17, 2021

Heisenbug.