Async2 – The .NET Runtime Async experiment concludes

cube2222 · on Aug 22, 2024

This all makes sense but

> Green threads are different. The memory of a green thread is allocated on the heap. But all of this comes with a cost: As they aren't managed by the OS, they can't take advantage of multiple cores inherently. But for I/O-bound operations, they are a good fit.

this is clearly not true? Am I missing some nuance here, as I'm sure the author knows what they're talking about?

Green threads can totally use a multi-threaded runtime, like e.g. Go does, and it works just fine. The main hurdle with them is arguably FFI.

neonsunset · on Aug 22, 2024

What this likely means is for you to take advantage of the underlying runtime multiplexing green threads over multiple physical ones running on multiple cores, you need to explicitly fork the execution flow.

This could be as simple as a web server firing off a new green thread or a goroutine for an incoming request, or as contrived as doing so manually within a function scope.

In practice, there really is not much difference with async/await. "Green threads" is a combination of implementation details and a subset of what async/await abstractions achieve.

Effectively, Goroutines are in many ways similar to C# Task<T>s. The difference is that in Go you are expected to explicitly send the result via a channel or some other data structure and then synchronize the completion of the execution, where-as with tasks you simply await that.

There could be an argument made about preference of implicit suspend (Go, Java, BEAM family) over explicit suspend (C#/F#, Rust, JS, Python, C++ co_await, Swift), but for practical purposes invoking a function with 'go' keyword in Golang is very similar to firing off a synchronous method with Task.Run in C#, or calling an asynchronous method (with sufficiently short body before first yield) and not immediately awaiting it.

As I usually post it on HN, tasks make the following patterns trivial:

    using var http = new HttpClient {
        BaseAddress = new("https://news.ycombinator.com/")
    };

    // not immediately awaited requests are executed in parallel
    var frontPage = http.GetStringAsync("news?p=1");
    var secondPage = http.GetStringAsync("news?p=2");

    Console.WriteLine($"{await frontPage}\n\n{await secondPage}");

spinningslate · on Aug 22, 2024

> The difference is that in Go you are expected to explicitly send the result via a channel or some other data structure and then synchronize the completion of the execution, where-as with tasks you simply await that.

That may be be the case in Go but it's not an inherent property of green threads. See, for example, Gleam Tasks [0] which are based on green threads and provide the syntatic convenience of being able to await the result rather than receiving a message:

    let task = task.async(fn() { do_some_work() })
    let value = do_some_other_work()
    value + task.await(task, 100)

They do so without the disadvantage of bifurcating the code base into sync and async functions.

[0] https://hexdocs.pm/gleam_otp/gleam/otp/task.html

neonsunset · on Aug 22, 2024

Of course.

The discussion regarding Goroutines is to highlight that, despite prevalent claims of otherwise, they are not doing something unique and for developers who are used to languages with powerful concurrency primitives look like an incomplete task abstraction. "Green Threads" really is an implementation detail, in many ways orthogonal to pros/cons of implicit and explicit suspend points.

I hope your opinion about C#'s task system has improved since the last time[0], given what Gleam (and, in many ways, Elixir) does looks practically identical :)

[0]: https://news.ycombinator.com/item?id=40427935

spinningslate · on Aug 22, 2024

>I hope your opinion about C#'s task system has improved since the last time[0], given what Gleam (and, in many ways, Elixir) does looks practically identical :)

Well no, not really I'm afraid. My reservation has always been with the codebase birfircation per my previous post. Gleam/BEAM languages, Go, and now Java, don't have async and sync functions. They have one kind of function which can be called either synchronously or asynchronously. The difference is in who decides: the function caller or function implementer. That a sync function can't call an async one amplifies the problem.

I know you dislike the "coloured function" metaphor [0] but for me it's a significant issue. I look at lots of C# and Python code, and see libraries now encumbered with both sync and async function variants (e.g. [1] [2]). That, to me, is a significant downside to async/await as implemented in those languages.

The Gleam example has all the convenience and readability of its C#/Python counterpart - but without the downsides.

I don't doubt you'll disagree :).

[0] https://news.ycombinator.com/item?id=39497110

[1] https://learn.microsoft.com/en-us/dotnet/api/system.io.strin...

[2] https://fastapi.tiangolo.com/async/

ikekkdcjkfke · on Aug 22, 2024

Does anyone actually do anything other than immediately await the async thing? But all callers need to wrap everything in Task<> and awaits and async and whatnot... If you want to do some parallel processing on a collection I'm sure we could find a way to do that instead of adding all the clutter we have now

andygocke · on Aug 23, 2024

> The Gleam example has all the convenience and readability of its C#/Python counterpart - but without the downsides.

This was mentioned in the write-up, but the big downside is interop. Green threads have significant downside when going across OS threads.

This is the same reason why Rust ended up with async. Async is basically the cost you pay for C interop. However, C# runtime-async will likely be much simpler than Rust async since ownership is GC-managed and doesn't need to be transferred across threads.

All that said, I'm also not convinced the codebase bifurcation is a bad thing. Async ~= I/O. As a regular C# user, I'm not particularly unhappy about splitting my app into "I/O things" and "not I/O" things.

neonsunset · on Aug 22, 2024

> The Gleam example has all the convenience and readability of its C#/Python counterpart - but without the downsides.

https://benchmarksgame-team.pages.debian.net/benchmarksgame/... :)

Assuming it is unlikely Gleam does something that makes it outperform Erlang, which it compiles to. Now, it comes down to just how BEAM VM is, but nonetheless an argument - C# is a close to the metal programming language where your choices directly affect what happens under the hood, including the use of async/await. For some it may be an undesirable trait but it's precisely what makes it so fast, in domains where languages that are "more abstracted away" used to historically struggle and continue to have lower performance ceiling.

As usual, the "coloring" point misses the patterns that async/await enables, and that it is in many ways an "I/O Monad". Still, mixing faux-concept "differently colored threads" in .NET does not come with the same degree of pain it does in Rust or elsewhere (and there are good reasons for that).

You can block threads if you have to (which includes synchronously waiting for tasks), and Threadpool is desiged to deal with that appropriately and increase/decrease worker threads count to maintain optimal throughput. You just don't have to pay for it always, and as multitude of alternate implementations suggest there is no free lunch, as usual.

Also, sync vs non-async overloads, as I previously discussed, often do I/O in a completely different way. Other languages sell it with loud buzzwords like "NIO", while .NET keeps it boring - the workload will scale without your explicit effort, never throttling independent execution flows.

I continue to be convinced of the sheer degree of harm done by "that one article", and applying C# in practice cures this perception once you stop worrying and love the easy concurrency and parallelism that come with it.

spinningslate · on Aug 22, 2024

you've made the performance argument before and I don't refute that C# shows up better in benchmarks than BEAM languages. Or many others for that matter (including Python).

The reductive argument there is that if performance is the sole priority then write machine code. That's extreme. A more robust one is that, according to the same benchmarks you reference, Rust is meaningfully faster than C# and C faster still. So if performance is the overriding objective then use one of those.

You'll justifiably push back on that and raise other factors in favour of .Net. And that's the point: it's about trade-offs and preferences.

For the apps I've built and been involved with, real world performance has been within commercial tolerance using languages that, at least according to benchmarks, are slower than the top performers. In teams of moderate size and above, managing codebase size and evolution is usually a bigger challenge. Requiring sync and async variants of functions detracts from that: not to mention the overhead of ensuring some level of consistency in when to use each form.

> I continue to be convinced of the sheer degree of harm done by "that one article"

We'll just have to disagree agreeably on that one. I see the coloured function metaphor as an elegant articulation of an important limitation, and it has served the community well in describing the problem.

> applying C# in practice cures this perception once you stop worrying and love the easy concurrency and parallelism that come with it.

Another disagree agreeably. In isolation yes, but with a non-trivial cost in bifurcation and the need for async/sync variants.

neonsunset · on Aug 23, 2024

The claim was not that the performance is absolute. Instead, I'm poking at the assertion that Gleam's implementation does not have downsides, which is rather silly in the context of our discussion, is it not? (also async in C# and in Python are very different)

Not to mention, in the original reply this was raised as a discussion of implicit vs explicit suspend points as means to asynchrony and M:N threading and their trade-offs. Instead, you felt like reframing this as purely Language A vs Language B. I too am guilty of this, but I try to do better. In either case it tends to be less productive and derails the discussion, and is just not very nice.

On the "bifurcation of sync/async in .NET" question which seems to be what the argument in its confusion revolves around, I have written a long-form post and extracted it into a gist to avoid polluting the discussion: https://gist.github.com/neon-sunset/640a38f9f2af73ad888cb5b0...

Still, this subject deserves a better, proper, much more information-dense overview, ideally accessible to people unfamiliar with details of either async/await and tasks/futures or implicit suspend, how they relate to implementation strategies available to each of them, etc.

Unfortunately, I only have so much time and can spend so much effort on this, nor am sure whether there's value in that - I'm getting an impression that these replies come from a place focused on just seeking to confirm their point of view and signing boring praises to how Erlang and its derivatives is the one and only approach rather than understanding what drives different design decisions for achieving concurrency/parallelism across programming languages.

gpderetta · on Aug 22, 2024

There is nothing particularly closer to the metal in stackless coroutines, compared to stackful.

pjmlp · on Aug 23, 2024

Also a big reason why C++ co-routines are the way they are, is that they were originally modeled in C# async/await as per Microsoft design in C++/CX before submission to the WG21 process.

With the big difference that all those magic classes, that also exist in a similar form in .NET, do have support in Windows Concurrency Runtime, and later C++/WinRT.

However since WG21 left the runtime part as exercise for the reader, we have the current mess of C++ co-routine talks at each conference, and even so, not everyone gets them.

bilekas · on Aug 22, 2024

There are nuances with multi-threading in C#.

I don't agree with OP about I/O-bound ops, I think if you're looking to green threading, you've taken a wrong approach.

> [0] the Task.Runmethod offloads the provided action to the thread pool, and the await keyword yields control back to the caller until the task completes.

All async code must be in an async call stack, virtual threads are 100% transparent because its the runtime scheduling them so you get a but more control than relying on the yeild of dotnet at least as I see it.

Again I don't see the huge demand for it personally, but I barely touch dotnet too often so take this with a grain of salt.

[0] https://stackify.com/c-threading-and-multithreading-a-guide-...

arghwhat · on Aug 22, 2024

> I don't agree with OP about I/O-bound ops, I think if you're looking to green threading, you've taken a wrong

It depends in the implementation. In Go for example, all I/O is async and suspend your green thread, replacing it with another runnable green thread.

This works the same as if you managed an event loop on your own for the purpose of I/O, which is the best way to handle I/O outside for regular user space code. It’s just automatic with your code resembling a simple, blocking scenario.

OPs note on threading would be C# or runtime specific - green threads have no problem with parallelism, with runtimes commonly having a thread per core (or more) and having them all run green threads in parallel.

DougBTX · on Aug 22, 2024

The “inherently” means not by default, i.e., the runtime has to support moving green threads between OS threads itself.

01HNNWZ0MV43FF · on Aug 22, 2024

That's not how I use "inherent", maybe they should just say "default" then?

But that's like... C is also single threaded by default, what isn't?

jayd16 · on Aug 22, 2024

They will never be transparently/fundamentalally managed by the OS alone. The runtime will need to determine how to juggle green threads across multiple OS threads. In that way, this mapping is not inherent.

It can be designed around but that itself is a runtime design decision and I would not say it's akin to default vs custom.

recursive · on Aug 22, 2024

With respect, it's not particularly relevant how you use "inherent". It's a standard usage. Rather than asking the whole rest of the world to change, you should probably learn the definition.

layer8 · on Aug 22, 2024

“Inherently” means “intrinsically”, meaning it’s a characteristic that can’t be changed without changing the nature of the thing. It doesn’t mean “by default”.

cube2222 · on Aug 22, 2024

Ah, that makes sense, thanks!

louthy · on Aug 22, 2024

Presumably, it just means there needs to be explicit forking of the green thread for cpu bound operations, otherwise everything will run synchronously (because there’s no point where the green thread is paused to wait for an IO IRQ).

That is unless your compiler or JIT injects occasional yields into your synchronous code!

adgjlsfhk1 · on Aug 22, 2024

Jits already have to do this for GC so it's actually free

Rohansi · on Aug 22, 2024

And that wouldn't be great for performance.

01HNNWZ0MV43FF · on Aug 22, 2024

The overhead for epoch stopping like wasm uses can be something like 1%. I did a synthetic test with native code once because I was curious.

I think Go also injects yields into its generated code for go routine scheduling

pron · on Aug 22, 2024

The efficiency and complexity of user mode threads heavily depend on constraints imposed by the particular language. E.g. if the language supports pointers into the stack, user mode threads would be less efficient; if the language is largely dependent on manual memory management -- user mode threads would be more expensive; if the language already has some other concurrency primitives (like async/await) -- user mode threads will be more expensive (although in this case in terms of complexity rather than runtime efficiency). Because Java exposes relatively little of its implementation details, we've been able to implement efficient user mode threads even without any FFI overhead.

andygocke · on Aug 23, 2024

Well, any additional FFI overhead, right?

The cost for exposing very little tends to be that marshaling costs more due to the requirement that values be copied between domains rather than shared.

pron · on Aug 23, 2024

That's a matter of perspective.

Calling a C function in a shared library (dll, so) from Java using the new FFM API has the same overhead as calling such a function from C++ (although the overhead is higher if the called function upcalls into Java again, though that is relatively rare, or if the function blocks, only that makes the additional overhead negligible). But the FFM API does not directly expose Java objects to native code at all, although it does allow Java code to access and mutate "off-heap" native memory (C data) from Java code as efficiently as accessing and mutating Java heap memory. So if your goal is to expose Java objects to native code, then yes, that would require marshalling (although ideally you should do the opposite and expose native memory to Java code as trhough a Java interface, which would have no overhead).

However, relying on FFI in Java is far less common than in Python, Rust, or even C# or Go, and in the rare cases it's done it's easy to do it cheaply as I described. So I guess it's true to say that if you wanted FFI to work in the same manner it is employed in those other languages then yes, it would be more expensive as it would require marshalling, but that's just not the case in Java given the combination of Java's performance and size of its ecosystem of libraries.

Languages with worse performance or with smaller ecosystems do need to rely much more heavily on FFI and so they often choose to sacrifice the flexibility of their implementation in favour of a more direct flavour of FFI.

andygocke · on Aug 23, 2024

I agree with your general point, that it depends on your specific problem how difficult this is, but I disagree about how common or easy to work around.

Regarding

> But the FFM API does not directly expose Java objects to native code at all, although it does allow Java code to access and mutate "off-heap" native memory (C data) from Java code as efficiently as accessing and mutating Java heap memory

I just don’t buy it. First, I think it’s very common to want to expose managed memory to native. In fact, it might be the dominant case. If I want to call out to perform a crypto operation on a block of bytes I got from a Java operation, I don’t want to copy them first.

Second, I think you’re missing the use case for manipulating system APIs. If you want to perform some system call and the call requires setting up some structures as arguments, that’s going to be pretty expensive in Java. For things that are called a lot it can add up. For example, windows has a profiling and eventing system called ETW. To use it you create a set of events and call the system. It’s not uncommon to do this for thousands or millions of events per second. The way C# handles this is stack allocating an event blob and calling directly. I can’t imagine a Java workaround that would be as fast or simple. It seems like you’d have to pool a native event blob allocation and fill it in from Java.

It’s true that most Java programmers aren’t blocked by this but I think that’s because many Java programmers don’t try to use Java for these tasks. They don’t write systems software in Java and they don’t embed into big, performance-sensitive native apps, like games.

pron · on Aug 23, 2024

> First, I think it’s very common to want to expose managed memory to native. In fact, it might be the dominant case. If I want to call out to perform a crypto operation on a block of bytes I got from a Java operation, I don’t want to copy them first.

Doing it this way is not so common in Java anyway. First, primitive operations for crypto are intrinsics in Java and operate without FFI at all. Second, IO input and output buffers in high-performance applications are typically in off-heap buffers anyway (i.e. you serialize data to an off-heap buffer and then do crypto and then send it over the wire, or you receive data in an off-heap buffer, do crypto, and then deserialize).

> Second, I think you’re missing the use case for manipulating system APIs. If you want to perform some system call and the call requires setting up some structures as arguments, that’s going to be pretty expensive in Java.

It's not, because FFM allows you to manipulate native structs with no overhead. You do this efficient kind of stack allocation of native structures with FFM's Arenas and SegmentAllocator (https://docs.oracle.com/en/java/javase/22/docs/api/java.base...)

> They don’t write systems software in Java and they don’t embed into big, performance-sensitive native apps, like games.

It's true low-level programs are typically not written in Java, but the applications programming market is bigger. I wouldn't be at all surprised if applications written in Java alone comprise a bigger market than all intrinsically low-level applications combined. As for embedding in another application, there is no intrinsic reason not to do it in Java, but 1. traditionally and for "environmental" reasons Java hasn't been huge in the games space (except for Minecraft, of course) and 2. it's been less than six months since FFM became a permanent feature in the JDK; JNI, the FFI mechanism that preceded FFM was really quite cumbersome to use so it's not surprising people opted for more convenient FFI.

andygocke · on Aug 24, 2024

> First, primitive operations for crypto are intrinsics in Java and operate without FFI at all.

This is a pretty strange assertion given that I didn’t specify the crypto operation I wanted to perform. Is XAES-256-GCM available in the Java standard library?

> Doing it this way is not so common in Java anyway

Sure, because doing it the other way would be very expensive. But that doesn’t mean applications which can’t front or backload native processing don’t exist, it just means they will have slower throughput in Java.

It’s fine for a language to make that tradeoff, but it is a tradeoff

pron · on Aug 24, 2024

> Is XAES-256-GCM available in the Java standard library?

No (is it in any language's standard library?) but everything you need to implement it in Java is available.

> But that doesn’t mean applications which can’t front or backload native processing don’t exist, it just means they will have slower throughput in Java.

They won't, because working with native memory is just as efficient as working with heap memory. You store your bytes in a MemorySegment and you don't care if it's backed by an on- or off-heap buffer. I guess you could say, oh, but when working with FFI in Java you may need to keep some buffers off-heap if you don't want to copy bytes, but that's common practice in Java since JDK 1.4 (2002).

> It’s fine for a language to make that tradeoff, but it is a tradeoff

There is a tradeoff, but it's not on performance. Rather than expose Java heap objects directly to native code (which is possible with the old JNI, but not the recommended approach), Java says keep the bytes that you want to efficiently pass to native code off-heap and makes it easy to do (through the same interface for on- and off-heap data).

Rather than constrain the implementation, which could have performance implications always, Java gives you the choice to have no FFI overhead at the cost of a tiny bit of convenience when doing FFI. Given how rare FFI is in Java compared to many other languages, that is obviously the right design decision and it helps performance rather than harms it. So there is a tradeoff, but you're clearly trading away less than you would have if FFI were more common and the core implementation were impacted by it.

Ultimately, the question of "is it better to sacrifice language performance and flexibility in exchange for doing X (without significant performance overhead) in 3 lines instead of 30" depends entirely on the answer to the question how often users of the language need to do X. If the language is Java and X is FFI, the answer is "rarely" and so you're paying a small cost for a large gain. The tradeoff between the convenience of low/no-overhead FFI and language performance and flexibility becomes much more difficult and impactful in languages where FFI is more common.

andygocke · on Aug 23, 2024

I'm not sure that the original description is precisely correct, but yours isn't correct either.

Basically, you can't treat green threads just like "a multi-threaded runtime" and have it just work. That is, a 1:1 mapping between green threads and OS threads is just OS threads.

So fundamentally if you bounce your green stacks off of the actual stack they're going to need to go somewhere... and that place must be the heap.

There are pluses and minuses to this implementation, but the biggest minus is that it makes FFI very complicated. C# has an extremely rich native-interop history (having historically been used to integrate closely with Windows C++ applications) and therefore this approach raised some serious challenges.

In some sense, async is the cost for clean interop with the C/system ABI. Transition across OS threads requires something like async.

cube2222 · on Aug 23, 2024

I meant that you can have a multi-threaded runtime that will be executing your green threads in a multi-threaded fashion. Like in Go you have (by default) as many worker OS threads as CPUs, and the Go runtime will take care of scheduling your green threads on those worker OS threads (+ create threads as needed for blocking syscalls if I remember correctly, but that's getting way to deep into the details). And this will, in fact, "just work" from the user's perspective.

And yes, as both you said, and I said at the end of my previous comment, the main hurdle of green threads imo is FFI, but it's not what the article mentions, which is what surprised me.

andygocke · on Aug 23, 2024

Ah, I see. You were saying that green threads can usually be scheduled on multiple os threads and take advantage of parallelism. Yup, I agree. Apologies for the confusion.

neonsunset · on Aug 22, 2024

For everyone reading this blog post I caution to take the conclusions there with a grain of salt as they are an interpretation of the notes written down here: https://github.com/dotnet/runtimelab/blob/feature/async2-exp...

It is difficult to draw conclusions at the present moment on e.g. memory consumption until the work on this, which is underway, makes it into mainline runtime. It's important to understand that the experiment was first and foremost a research to look into modernizing async implementation, and was a massive success. Now once that is proven, the tuned and polished implementation will be made.

Once it is done and makes into a release (it could even be as early as .NET 10), then further review will be possible.

With that said, thank you for writing it, .NET tends to be criminally underrated and slept on by the most of the industry, and these help to alleviate it even if just a bit.

hirvi74 · on Aug 22, 2024

> .NET tends to be criminally underrated and slept on by the most of the industry

I have been programming in .Net/C# for about 8 years now. I absolutely hate Microsoft with every fiber of my being. However, I can't thank them enough for C# (and all the FOSS contributors as well). .Net has been such a pleasant experience that I truly do not want to program in any other language.

naasking · on Aug 22, 2024

> I absolutely hate Microsoft with every fiber of my being.

Why? I get it back when they were super hostile to Linux and open source, but those days are long past.

jayd16 · on Aug 22, 2024

They go from actively user hostile to passively user hostile. We just happen to be in a passive phase where we only have to worry about ads creeping into the command line and what have you.

Still... they often make good stuff, you have to admit.

MangoCoffee · on Aug 22, 2024

>hostile to Linux and open source, but those days are long past

Microsoft remains a giant elephant with questionable business practices. Their shift to SaaS and cloud computing meant they have to play nice with others.

naasking · on Aug 22, 2024

Sure, but that doesn't clarify why one would hate MS with every fiber of their being. I can see that with Oracle given they're still behaving that way, but MS seems mostly fine now. Execs have changed, the environment on what products are viable has changed (as you pointed out), so their attitude has changed with the times too.

pjmlp · on Aug 23, 2024

See the drama regarding the dotnet watch removal, killing VS4Mac, Xamarin related products are now a shade of what they were upon acquisition, C# DevKit has the same license as VS proper and many VS features are never going to leave VS to VSCode, the desktop development mess,....

It is much better now, however wind blows in all directions.

orphea · on Aug 22, 2024

They are super hostile to users of their OS, and these days are right now.

geodel · on Aug 22, 2024

Absolutely. It is just sad experience using Windows beyond some web browsing with Non-MS browser.

naasking · on Aug 22, 2024

Is the hate for Windows 11? I'm still using Windows 10 Pro. Aside from the occasional forced restart, it's fine.

vips7L · on Aug 23, 2024

Only Linux die hards on HN think they’re hostile. Real users of Windows never have this opinion. Windows is better than any flavor of Linux I’ve ever used.

anonymoushn · on Aug 22, 2024

Maybe because GP uses Windows sometimes.

runevault · on Aug 22, 2024

I see people complain sometimes that c# adds too many features, and I understand the concern, but with limited exceptions (I don't like the new global using stuff), they tend to feel tasteful to me. Also I'm really hopeful we'll get Discriminated Unions (under a name I'm forgetting) in c# 10.

jayd16 · on Aug 22, 2024

C# is a multi-paradigm language with everything and the kitchen sink but somehow the syntax and elegance is still best in class.

leosanchez · on Aug 22, 2024

It's C# 12 now. Maybe we will get discriminated unions in C# 15 or so

runevault · on Aug 22, 2024

Oh right I always get thrown off on .NET vs c# versions.

bob1029 · on Aug 22, 2024

I don't know if this proposal makes a lot of sense.

The existing async1/TPL path is stable & predictable. If you find yourself needing more performance, you can reach for hardware Thread instances and use whatever locking/waiting/sharing/context strategies you desire. Anything else is a weird blend of compromises that is going to have caveats that are not immediately obvious for your specific use case.

For example, async2 w/ runtime JIT appears to have some tradeoffs with regard to GC & memory usage and the experiment writeup leaves some open-ended questions here[0].

[0]: https://github.com/dotnet/runtimelab/blob/feature/async2-exp...

GordonS · on Aug 22, 2024

> If you find yourself needing more performance, you can reach for hardware Thread instances

True for code your team write, but async/await is kind of viral, and many libraries now only have async APIs, which makes them difficult to shoehorn into a threading approach.

bob1029 · on Aug 22, 2024

> async/await is kind of viral

Exactly part of my concern. Hypothetically we've got 2 competing viruses now.

GordonS · on Aug 22, 2024

Surely it's the same "virus"? Unless I'm mistaken, the async2 proposal wouldn't require any code changes, as it essentialy just moves compiler stuff to the runtime.

pjmlp · on Aug 22, 2024

They still need to interop the old async with the new async, without requiring recompilation of everything, otherwise it is going to be another .NET Framework to .NET Core, which is still ongoing.

neonsunset · on Aug 22, 2024

The current generated state-machine mechanism is not going away any time soon. Instead, Roslyn as well as any other compiler that targets CIL is expected, when appropriate, to stop generating state machines and start generating specially annotated method calls recognized by .NET.

This means that all existing libraries will continue working as is, and newly written or existing code that is compiled against e.g. C# 14/.NET 10(?) can automatically opt into new mechanism without user intervention. The write-up does cover the considerations for interoperability between these two models as well.

In fact, this will make using async from other CIL-based languages much easier as the burden of capturing the state and suspending/resuming the execution will be on the runtime rather than language compiler authors.

pjmlp · on Aug 23, 2024

The kind of scenarios where this kind of interop is put to test is when old code, without being recompiled, takes a subclass, an interface implementation, a member reference, a delegate, or a lambda, compiled with async2 model, as parameter, and everything works as expected.

Something that I hope they take into account, having new code calling into the old code, is the easy part of the problem.

andygocke · on Aug 23, 2024

Yes, we considered that and implemented a solution. Effectively, the runtime will generate thunk methods that will invisibly bridge between the two worlds. Calling (and overriding) regular async methods with runtime async methods will be stitched up by the runtime. The user will never see the difference.

pjmlp · on Aug 24, 2024

Cool! Thanks for jumping in.

andygocke · on Aug 23, 2024

In practice we don't think there will end up being tradeoffs in async2 vs. async1. If you look below, at the "JIT state machine" section, you'll see that async2 looked better and the GC behavior differences were probably transient.

Overall, there are no architectural reasons why the compiler version should be better. The runtime should be able to make perf decisions that are at least as good in every case.

whaleofatw2022 · on Aug 22, 2024

Trying to do clever stuff with threads tends to lead from brow beating, wailing, and gnashing of teeth by various folks who insist to just trust the threadpool.

kodablah · on Aug 22, 2024

> And for the transition phase, there has to be interop for async ↔ async2

For those like me who weren't clear whether `async2` was expected to be a real keyword in the final language, it's not[0].

0 - https://github.com/dotnet/runtime/issues/94620#issuecomment-...

TwentyPosts · on Aug 22, 2024

Have you read the article?

> As async is already used as an identifier in C#, the team decided to use async2 as a codename for the experiment. If that thing ever makes it into the runtime, it will be called async - so it will be a replacement for the current async implementation.

augusto-moura · on Aug 22, 2024

async2 looks like such a terrible keyword name, that it didn't even crossed my mind as an option. Only following your linked comment I understood that they used it as a temporary name while testing

jayd16 · on Aug 22, 2024

I'm encouraged by the talk of improved exception handling and a recognition of how much garbage it creates. It makes it hard to do things in an idiomatic way and keep GC pressure low.

I hope they also revisit task state tracking and TaskCanceledException. It feels out of place to use an exception (and unroll the stack) as control flow in the same API they added ValueTask to keep Task handles on the stack.

ikekkdcjkfke · on Aug 22, 2024

What is the fundemental problem we are trying to solve here? My pet problem with async is that it takes so much syntax, and i wonder why we need to do that. As i understand it it is all about blocking calls taking up OS threads.

So i will try to naievly solve it here, and maybe i end up with the same conclusions.

When doing a blocking call, the OS thread could just start executing something else transparently and not 'yield' anything to the code, it just stops executing the code and comes back and executes further, no await, no tasks no nothing. Ok, but how did we end up in a thread? The http listener started 4 threads and it's just putting a stack of function pointers and context memory for the threads to eat when they want. There is a separate engine on another OS thread that handles where the threads can start executing ready code again. No Task or await keywords show up in this code. I have no idea how stack traces work but i guess that can just be saved to memory and loaded back again when the threads feel like executing some ready code

irdc · on Aug 22, 2024

The rationale for async/await I keep hearing is that the separate stack required for each thread uses a lot of memory. I’m not sure going full on function colouring is the answer though. A concerted effort for language runtimes to use less stack space and then simply allocating smaller thread stacks sounds to me like a more elegant solution. It certainly is a whole lot easier to debug than a deeply-nested async/await chain.

iknowstuff · on Aug 22, 2024

But also avoiding the cost of context switching, and the ability to handle many tasks in embedded environments.

https://tweedegolf.nl/en/blog/65/async-rust-vs-rtos-showdown

irdc · on Aug 22, 2024

This makes it sound like using async/await is more like hand-optimising your code, somewhat similar to rewriting it in assembler. If a way can be found to improve thread switching times (eg. by only switching registers the program actually uses), the fact that the threaded code is easier to debug and reason about becomes a large advantage.

throwuxiytayq · on Aug 23, 2024

Only in the same way that assembly is safer, more readable and easier to write than C.

I’m not seeing significant improvements happening to the OS level threading model anytime soon, or ever really, at last in the context of existing mainstream operating systems.

layer8 · on Aug 22, 2024

What you propose is similar to how Java virtual threads work.

ikekkdcjkfke · on Aug 22, 2024

Thanks for pointing me in right direction. That's exactly what I'm wishing for in .net

jayd16 · on Aug 22, 2024

Ok well OS threads are already scheduled this way... The drawback is they're heavier because they make assumptions that the runtime doesn't need to make.

Furthermore, explicit Async/await provides a syntax for co-opoerative multithreading and that enables other patterns that implicit designs don't.

xeromal · on Aug 22, 2024

It's a bummer green threads didn't work out

giancarlostoro · on Aug 22, 2024

I still remember first hearing about Rust and how it was using Green Threads, then out of the blue a year later passes or months, and I'm reading that it doesn't do any of that anymore, it's basically C++ on steroids mixed with functional programming. I never did look up why they gave up on Green threads and other things, I wonder if they faced similar challenges?

steveklabnik · on Aug 22, 2024

https://github.com/rust-lang/rfcs/blob/master/text/0230-remo...

Note this was in 2014, before Rust 1.0.

giancarlostoro · on Aug 22, 2024

Yes it was a while back, wow 10 years.

hyutmyjahsd · on Aug 22, 2024

I miss peak rust.

wyager · on Aug 22, 2024

Rust has settled into an excellent design niche where the language does not provide any features that require a language runtime.

Rust's async/await is a purely compile-time feature, with no runtime support required.

This is extremely powerful because it means you can do things like easily run async code on embedded devices with no OS.

To run async code in rust, you bring your own executor. The most popular one is tokio (for desktop OSs), but you also have stuff like pollster (for running async code as blocking code) and embassy or RTIC (for embedded).

eknkc · on Aug 22, 2024

I might be completely wrong as I only played with rust but the current method seems to couple any async libraries you’d like to write to the executor. Which feels wrong imo.

Would it be feasible to provide async io (file, network) code in std and let the executors execute?

Or even some std traits that abstract the filesystem, network etc so that the executors can bring their own implementations but at least library code could only depend on std?

wyager · on Aug 22, 2024

Correct, libraries do have to either pick a specific async IO library or use a trait for async IO.

I have seen some of the latter in embedded-land, although I'm not sure if it's used in desktop async rust as well.

The rust compiler currently discourages async traits with a warning, which seems wrong to me, but we use them extensively in embedded anyway.

anonymoushn · on Aug 22, 2024

The feature is less than 1 year old, so a lot of the ecosystem couldn't use the feature because of being written before the feature existed. It does seem silly to have a warning today.

jtrueb · on Aug 22, 2024

Agreed, as much as there are complaints about usability of async in Rust, many alternatives proposed would fail in resource constrained systems. Green threading would be impossible or very limiting on embedded, where each stack eats away at something like 1-2% of memory and allocation is likely impossible or prohibitively expensive.

I do note that some of the futures are starting to take up a couple KiB of RAM and ROM.

anonymoushn · on Aug 22, 2024

This is a fine approach but it's trivial to ship green threads with no runtime support as well. Various libraries implementing the feature exist for C or Zig etc.

kbolino · on Aug 22, 2024

If you can accept cooperative scheduling that runs tasks only on a single core, then yes, lightweight solutions exist. This is more like "event-driven coroutines" than "green threads" though.

If you want "full" green threads with preemptive scheduling and parallel execution across multiple cores, then an active runtime is needed.

jasode · on Aug 22, 2024

>I still remember first hearing about Rust and how it was using Green Threads, [...] I never did look up why they gave up on Green threads

It's because green threads have unavoidable performance costs. I collected various threads that has Rust contributors explaining the costs they didn't want to pay: https://news.ycombinator.com/item?id=39062953

A language like Go is willing to pay those performance costs because they deliberately sit at a higher level of abstraction than lower-level Rust.

Each chose different tradeoffs for different goals:

- Golang : green threads worth the perf cost to gain higher productivity (goroutines, channels, etc) for the type of programs that Go programmers like to write

- Rust : green threads are not worth the perf cost so as to not sacrifice the best possible performance for the type of programs Rust programmers like to write

You can't have a language+runtime that satisfies both camps 100% perfectly. The language designer must choose the tradeoff. But because this tradeoff decision is not blatantly well-known and disseminated ... it also perpetually confuses Language X proponents on why Language Y didn't do the same thing as Language X.

giancarlostoro · on Aug 22, 2024

Thank you for this! This is exactly why I posted my comment, I was hoping someone would share some insight about Rust.

01HNNWZ0MV43FF · on Aug 22, 2024

What do Go channels do that takes advantage of their green threads?

zamalek · on Aug 22, 2024

Rust can have green threads. It depends on the reactor. The abstraction sits one level lower than nearly everything else out there.

Ygg2 · on Aug 22, 2024

There were plans for green threads, however, green threads were abandoned due to runtime costs around like 6 years ago?

https://github.com/rust-lang/rfcs/blob/master/text/0230-remo...

EDIT: 10 years ago.

anonymoushn · on Aug 22, 2024

"green threads" as discussed in TFA don't require a runtime. The relevant issues are maybe https://github.com/rust-lang/rust/issues/33368 and https://internals.rust-lang.org/t/what-is-the-current-safety...

steveklabnik · on Aug 22, 2024

It’s coming up on ten years. Time flies.

Ygg2 · on Aug 22, 2024

I thought it was 2018, it was 2014 instead.

dboreham · on Aug 22, 2024

User level (aka green) threading is difficult. Very few implementations work (Erlang, Golang, <any others>+) and it takes a great deal of time and effort to get to that "fully working" state.

+Perhaps the new Java fiber stuff works but I don't have enough data to be sure yet.