Hacker News | new | past | comments | ask | show | jobs | submit | eMSF's comments | login

Related to this, GNU's libstdc++ shared_ptr implementation actually opts not to use atomic arithmetic when it infers that the program is not using threads.


I never heard of this and went to check in the source and it really does exist: https://codebrowser.dev/llvm/include/c++/11/ext/concurrence....


The code you linked is a compile-time configuration option, which doesn't quite match "infer" IMO. I think GP is thinking of the way that libstdc++ basically relies on the linker to tell it whether libpthread is linked in and skips atomic operations if it isn't [0].

[0]: https://snf.github.io/2019/02/13/shared-ptr-optimization/


It's a compile-time flag which is defined when libpthread is linked into the binary.


Sure, but I think that's independent of what eMSF was describing. From libgcc/gthr.h:

    /* If this file is compiled with threads support, it must
           #define __GTHREADS 1
       to indicate that threads support is present.  Also it has define
       function
         int __gthread_active_p ()
       that returns 1 if thread system is active, 0 if not.
I think the mechanism eMSF was describing (and the mechanism in the blogpost I linked) corresponds to __gthread_active_p().

I think the distinction between the two should be visible in some cases - for example, what happens for shared libraries that use std::shared_ptr and don't link libpthread, but are later used with a binary that does link libpthread?


Hm, not sure. I can see that shared_ptr::_M_release [0] is implemented in terms of __exchange_and_add_dispatch [1], which in turn is implemented in terms of __is_single_threaded [2]. __is_single_threaded falls back to __gthread_active_p only when __GTHREADS is defined and the <sys/single_threaded.h> facility is not available.

The implementation of __gthread_active_p is indeed a runtime check [3] which AFAICS applies only to single-threaded programs. Perhaps the shared-library use case also fits here?

A strange optimization IMHO, so I wonder what the motivation behind it was. What is being relied on here is a plain word access being atomic [4] without actually using the atomics [5].

[0] https://codebrowser.dev/llvm/include/c++/11/bits/shared_ptr_...

[1] https://codebrowser.dev/llvm/include/c++/11/ext/atomicity.h....

[2] https://codebrowser.dev/llvm/include/c++/11/ext/atomicity.h....

[3] https://codebrowser.dev/kde/include/x86_64-linux-gnu/c++/11/...

[4] https://codebrowser.dev/llvm/include/c++/11/ext/atomicity.h....

[5] https://codebrowser.dev/llvm/include/c++/11/ext/atomicity.h....


> Implementation of __gthread_active_p is indeed a runtime check [3] which AFAICS applies only to single-threaded programs. Perhaps the shared-library use-case also fits here?

The line you linked is for some FreeBSD/Solaris versions which appear to have some quirks with the way pthreads functions are exposed in their libc. I think the "normal" implementation of __gthread_active_p is on line 248 [0], and that is a pretty straightforward check against a weak symbol.

> Strange optimization IMHO so I wonder what was the motivation behind it.

I believe the motivation is to avoid needing to pay the cost of atomics when there is no parallelism going on.

> The cost function being optimized in this case is depending on WORD being atomic [4] without actually using the atomics [5].

Not entirely sure what you're getting at here? The former is used for single-threaded programs so there's ostensibly no need for atomics, whereas the latter is used for non-single-threaded programs.

[0]: https://codebrowser.dev/kde/include/x86_64-linux-gnu/c++/11/...


> Not entirely sure what you're getting at here?

> I believe the motivation is to avoid needing to pay the cost of atomics when there is no parallelism going on.

Obviously yes. What I am wondering is what benefit it brings in practice. A single-threaded program with shared_ptrs using atomics vs. shared_ptrs using plain words seems like a non-problem to me - e.g. I doubt it has a measurable performance impact. Atomics only slow down a program under contention, and single-threaded programs can't have contention.


> What I am wondering is what benefit it brings in practice. A single-threaded program with shared_ptrs using atomics vs. shared_ptrs using plain words seems like a non-problem to me - e.g. I doubt it has a measurable performance impact.

I mean, the blog post basically starts with an example where the performance impact is noticeable:

> I found that my Rust port of an immutable RB tree insertion was significantly slower than the C++ one.

And:

> I just referenced pthread_create in the program and the reference count became atomic again.

> Although uninteresting to the topic of the blog post, after the modifications, both programs performed very similarly in the benchmarks.

So in principle an insert-heavy workload for that data structure could see a noticeable performance impact.

> Atomics only slow down a program under contention, and single-threaded programs can't have contention.

Not entirely sure I'd agree? My impression is that while uncontended atomics are not too expensive they aren't exactly free compared to the corresponding non-atomic instruction. For example, Agner Fog's instruction tables [0] states:

> Instructions with a LOCK prefix have a long latency that depends on cache organization and possibly RAM speed. If there are multiple processors or cores or direct memory access (DMA) devices, then all locked instructions will lock a cache line for exclusive access, which may involve RAM access. A LOCK prefix typically costs more than a hundred clock cycles, even on single-processor systems. This also applies to the XCHG instruction with a memory operand.

And there's this blog post [1], which compares the performance of various concurrency mechanisms/implementations including uncontended atomics and "plain" code and shows that uncontended atomics are still slower than non-atomic operations (~3.5x if I'm reading the raw data table correctly).

So if the atomic instruction is in a hot loop then I think it's quite plausible that it'll be noticeable.

[0]: https://www.agner.org/optimize/instruction_tables.pdf

[1]: https://travisdowns.github.io/blog/2020/07/06/concurrency-co...


Thanks, I'll revisit your comment. Some interesting things you shared.


Even with a 5" screen, it was bigger than, for example, the iPhone 17 in every single dimension due to its hefty bezels (and not insignificantly so; the iPhone 17 is closer in width to the iPhone 13 Mini than to a Dell Streak).

Screen diagonal is in general a somewhat misleading figure for comparing phones from different generations, as "full screen" phones tend to have a taller aspect ratio and hence a larger diagonal even with the same body dimensions.


Only a little bigger but you're right, I didn't think the bezels on the Streak looked that big but I see they're really pretty substantial. It's the dimensions of a modern 6.5" phone, basically.


While it is true that only class types have member functions in C++, that does not mean that objects of other types demand the use of macros. C++ also supports non-member functions, and the standard library contains a fair amount of these, including `std::size`, which can be used to "get" the length of an array.

(C++ arrays are different from arrays in many other programming languages, though not necessarily Rust, in that their type specifies their length, so in a way this is something you already "have" but certainly there are cases where it is convenient to "get" this information from an object.)


Ah, std::size, since c++17. No wonder nobody knows about it, that’s after we all started switching to rust.


How could you ever continue after the second statement without checking if you actually read an integer or not? How would you know what you can do with a?


You couldn't or wouldn't. But why have a read statement like cin >> that looks so nice and clean when you then have to go and check everything with flags and boolean casts on stateful objects?

I agree. It's lunacy. Just be explicit and use functions or equivalent, like literally every other language does.


Well in a language like Haskell you could solve this with monads and do-notation. The general idiom in Haskell is to use a Maybe or Either monad to capture success/failure and you assume you’re on the happy path. Then you put the error handling at the consumer end of the pipeline when you unwrap the Maybe or Either.

I believe Rust has adopted similar idioms. I’ve heard the overall idea referred to as Railway-oriented programming.

In C++ you could implement it with exceptions, though they bring in a bunch of their own baggage that you don’t have to deal with when using monads.


from_chars_result closely matches the strtol line of functions we've had for decades. Not returning the end position would be weird here!


Well, obviously it doesn't have parentheses. It's not like this is the only instance where adding parentheses affects the end result.

You could write even more complex declarators (but don't have to), but that would not prove that some other syntax is inherently intuitive. Case in point, I cannot parse the Go syntax as I do not know Go.

In my experience pointers to arrays are rather uncommon and I'm not sure that I've ever written a function returning one, having even less of a need for a pointer to such. (Thus out of all these, only your first example is somewhat common in practice.)


> I cannot parse the Go syntax as I do not know Go.

Or you probably never even tried. You should be immediately able to parse it if I provide the hint that `*`, `&` and `[10]` mean roughly the same thing as in C, because `*[10]int` has no reasonable reading as an array of 10 of something. You can't say the same of C.


Right. These are just "old chestnuts" used to scare C noobs, particularly in interviews. IIRC the K&R C book itself had an example program to convert C declarations to English, and there also exists a utility program called "cdecl" that does the same.


Better to use that English explanation as a model of readable syntax.


It is but you just have to know how to map it.


ABP was initially released almost a full decade before UBO was a thing (although I don't know if the Chrome version is related), so I wouldn't judge someone for just using it...


Why does everything have to fit in the memory all of a sudden? Open files, there's your infinite tape.


Your position in a file has to be uniquely specifiable with fpos_t, so you can't have an infinite file in C.

> fpos_t is a complete object type other than an array type capable of recording all the information needed to specify uniquely every position within a file.

[7.23.1p3 of N3054 working draft of C2x, as that's what I have open right now.]


> Your position in a file has to be uniquely specifiable

`socket()` has entered the conversation.


`socket()` is not part of C99.


It will eventually not matter, but I didn't suggest using a single file.


The Finnish caption is quite a bit worse than "all your base are belong to us"-style invalid grammar. Translating it back, even with the best intentions (ignoring the naive attempt at translating "from" in place), it reads "Create images from AI generated words".


A lot of folks write char const * (and T const& in C++), and the "rule of thumb" is to read the declaration right to left. In this case, pointer (to) const char. Works also with multiple consts or levels of indirection.

