I worked in Python for years and while I suppose I'm glad for any improvement, I have never understood the obsession with true multi-threading. Languages are about trade-offs and Python, again and again, chooses flexibility over performance. It's a good choice! You can see it in how widely Python is used and the diversity of its applications.
Performance doesn't come from any one quality, but from the holistic goals at each level of the language. I think some of the most frustrating aspects from the history of Python have been when the team lost focus on why and how people used the language (i.e. the 2 -> 3 transition, though I have always loved 3). I hope that this is a sensible optimization and not an over-extension.
Losing the GIL makes the language strictly more flexible. Previous GILectomies tanked performance to an unacceptable degree. In single-threaded code, this one is a moderate performance improvement in some benchmarks, and a small detriment in others -- which is about as close to perfect as one could expect from such a change. That's why people are excited about it.
At a higher level, Python is getting serious about performance. But this gives both flexibility and performance.
Call me optimistically skeptical. I share similar reservations about GIL obsession with the original comment author, but if this is true:
> The overall effect of this change, and a number of others with it, actually boosts single-threaded performance slightly—by around 10%
Then it sounds like having your cake and eating it too (optimism). Although my experience keeps nagging at me with, "there is no such thing as a free lunch" (skepticism).
The no-GIL version is actually about 8% slower on single-threaded performance than the GIL version, but the author bundled in some unrelated performance improvements that make the no-GIL version overall 10% faster than today's Python.
Right, the 20% boost is unrelated to the Gilectomy.
> though, as Guido van Rossum noted, the Python developers could always just take the performance improvements without the concurrency work and be even faster yet.
Why be 10% faster single threaded when you can be 20% faster single threaded!
> The resulting interpreter is about 9% faster than the no-GIL proof-of-concept (or ~19% faster than CPython 3.9.0a3). That 9% difference between the “nogil” interpreter and the stripped-down “nogil” interpreter can be thought of as the “cost” of the major GIL-removal changes.
Why? It's not like CPython is a speed demon. I'd think there's some low-hanging fruit, simply because performance is such a low priority for the maintainers. It doesn't even do TCO, after all.
> Although my experience keeps nagging at me with, "there is no such thing as a free lunch" (skepticism).
Well, yeah, someone had to make the changes. That's the cost that was paid.
You can get a mass-produced machete that is cheaper and higher-quality than a 7th-century sword. It's easy for one thing to be better than another thing across several dimensions simultaneously. That's why certain technologies go out of use -- they have negative value compared to other technologies. But that has nothing to do with the principle that there's no such thing as a free lunch.
I feel like you aren't well informed on why removing the GIL results in a single-threaded performance hit. And while I think it's always nice to keep in mind the developer effort required, it's not the only cost as GIL removal has been done before (several times, even as far back as Python 1.5 [1]).
The crux of the issue (as I understand it) is that the GIL absolves the Python interpreter of downstream memory access control. You can replace the GIL with memory access controls of various strategies, but the overhead of that access control is just that: overhead. In a multi-threaded program the concurrency gains should outweigh that overhead, but in a single-threaded one it's just extra work that wasn't being done before.
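You can get a rough feel for the overhead at the Python level with an admittedly loose analogy: even an uncontended lock costs something on every operation. This is not how the no-GIL branch implements its access control (it uses finer-grained techniques such as biased reference counting), just an illustration of why per-operation bookkeeping shows up as a single-threaded tax:

    import threading, timeit

    lock = threading.Lock()
    counter = 0

    def bare():
        global counter
        counter += 1

    def locked():
        global counter
        with lock:          # never contended here, but still paid for on every call
            counter += 1

    print("no lock:", timeit.timeit(bare, number=1_000_000))
    print("locked :", timeit.timeit(locked, number=1_000_000))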
Which brings us back to no free lunch. It turns out that the claimed "10% faster" without the GIL is actually a result of Gross (the GIL-removal author) doing a multitude of unrelated performance improvements. These improvements raise performance enough that single-threaded no-GIL code (with its overhead) still comes out ~10% faster than today. But as Guido pointed out, the core developers could upstream the performance improvements without the GIL removal:
> To be clear, Sam’s basic approach is a bit slower for single-threaded code, and he admits that. But to sweeten the pot he has also applied a bunch of unrelated speedups that make it faster in general, so that overall it’s always a win. But presumably we could upstream the latter easily, separately from the GIL-freeing part. [2]
To be explicit, I was skeptical because I believed that GIL removal requires adding overhead for managing memory access. Having dug a bit deeper into it that seems confirmed. The proposed GIL removal strategy _is slower for single-threaded code_ like other solutions before it. It turns out the reported performance increase was the result of orthogonal performance improvements overshadowing the overhead of GIL removal.
Put another way, if the performance improvements were upstreamed without removing the GIL the resulting performance increase would be ~20% instead of just ~10%. Which is what Guido was getting at in the quote I cited. Assuming the benchmarks to be true for the moment, this means that removing the GIL on this PoC branch is a 10% performance hit to single-threaded workloads.
> "there is not such thing as a free lunch" (skepticism).
When you carry a heavy suitcase filled with lead and you drop it, things get lighter for free. You paid for it by carrying the damn thing around with you for the whole time.
Well, CPython probably won't ever get there. But Python as a language maybe could.
The GraalPython implementation of Python 3 is built on the JVM, which is a fully thread safe high performance runtime, and Graal/Truffle provide support for speculation on many things. For pure Python it provides a 5-7x speedup already and the implementation is not really mature. Although at the moment they're working on compatibility, in future it might be possible to speculatively remove GIL locks because you have support for things like forcing JITd code to a safepoint and discarding it, if you want to change the basic semantics of the language.
How does it relate to PyPy? I read that the latter uses a tracing JIT, while GraalPython builds on Truffle's AST-based one, which basically maps the JVM's primitive structures to Python's, thus making use of all the man-hours that went into the JVM's development.
But last time I checked, PyPy had much better performance than Graal, even though TruffleJS (the JavaScript interpreter built on the same model as GraalPython) has comparable performance to the V8 engine for long-running code. Though the latter is the most actively developed Truffle language, let me add.
It's sort of taking Jython's implementation approach to a much greater extreme, and bypassing bytecode, so it isn't limited by the Java semantics anymore.
It resolves a few big problems Jython had:
- GraalPython is Python 3, not Python 2
- It can use native extensions that plug into the CPython interpreter like NumPy, SciPy etc. The C code is itself virtualized and compiled by the JVM!
Yah, that's definitely the future I'm hoping for. What I am worried about are the kind of transition issues I mentioned. Python 2 -> 3 strictly made the language more flexible too - but the Python ecosystem is about existing code almost more than the language and I worry that we could find similar problems here. Potential for plenty of growing pains while chasing relatively small gains.
In the company I'm working for, we had to spend more engineer time on GIL workarounds (dealing with the extra complexity caused by multiprocessing, e.g. patching C++ libraries to put all their state into shared memory) than we needed for the Python 2 -> 3 migration. And we've only managed to parallelize less than half of our workload so far.
Even if this will be a major breaking change to Python, it'll be worth it for us.
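For anyone who hasn't had to do it, the flavour of plumbing involved looks roughly like this stdlib-only toy (not our actual code; real workloads end up marshalling whole data structures or C++ object state through buffers like this, which is where the engineering time goes):

    from multiprocessing import Process
    from multiprocessing.shared_memory import SharedMemory

    def worker(name):
        shm = SharedMemory(name=name)        # attach to the existing block by name
        shm.buf[0] = shm.buf[0] * 2 % 256    # mutate the shared bytes in place
        shm.close()

    if __name__ == "__main__":
        shm = SharedMemory(create=True, size=1024)
        shm.buf[0] = 21
        p = Process(target=worker, args=(shm.name,))
        p.start()
        p.join()
        print(shm.buf[0])                    # 42, written by the child process
        shm.close()
        shm.unlink()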
Python needs to be compiled into machine language to ever have a chance of competing on speed. We can already get around the GIL with multiprocessing, but Python is still too slow even when not bound by copying memory between processes.
The phrase "competing on speed" begs the question "competing...with what?" If the answer is "machine compiled languages", then yes, it's unlikely Python will ever match their speed without also being compiled to machine code, but there are plenty of other interpreted languages with better performance than Python (even ruling out stuff like Java that technically isn't "compiled into machine language" in the way that phrase usually would mean); lots of work is done on JavaScript interpreters to improve performance, and I don't think that specifically has cost the language much flexibility.
I use python. I don’t love it but it has a good selection of libraries for what I do. It’s not blazing fast but not terribly slow either.
As for multiprocessing, I currently have 150 Python processes running on the work cluster, each doing their bit of a large task. The heavy lifting is in a Python library, but it's C code. It's actually not bad performance-wise and frankly wasn't too bad to code up. I think for my use case threads would make it harder.
Java is technically compiled into machine language; it's a matter of choosing a JDK that offers such options. Many people don't, but that is their problem, not a lack of options.
JavaScript interpreters that people actually use have a JIT built in.
I don't think the goal is to "compete on speed", but I'm sure people wouldn't complain about their Python scripts running 15x faster on their 16 core CPU.
And it is also about flexibility. What I love about Python is the simplicity, and let's be honest, multiprocessing is anything but. Especially if you fall into one of the gotchas (unpicklable data, for example).
It's because threads in Python are only really good for parallel I/O, and are ineffective for CPU bound workloads.
This can be a problem for a lot of threading use cases. If I'm working on an ETL app that parses large amounts of data, the related CPU-bound tasks need to either run sequentially, call out to C extensions, or use multiple processes, which incurs an overhead.
It's a pain when you know threads would suit your use case well, but the threading implementation in the language you're working in isn't up to the task.
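A toy illustration of the asymmetry (indicative only, not a benchmark): the I/O-bound batch below finishes in roughly one sleep's worth of time because sleeping releases the GIL, while the CPU-bound batch takes about as long as running its tasks back to back:

    import time
    from concurrent.futures import ThreadPoolExecutor

    def io_task(_):
        time.sleep(0.5)                           # releases the GIL while waiting

    def cpu_task(_):
        sum(i * i for i in range(2_000_000))      # holds the GIL while computing

    for name, task in [("io ", io_task), ("cpu", cpu_task)]:
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=4) as pool:
            list(pool.map(task, range(4)))
        print(name, round(time.perf_counter() - start, 2), "s")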
It's very interesting you mentioned ETL tasks. In ETL batch jobs, a unit in the batch is defined small enough to rarely be CPU bound; rather, as you mentioned, it is I/O bound. In what situation must you define a unit of work to be so heavily CPU bound? To me that's a smell for too large of a unit.
I'm using ETL as shorthand for a "I wrote a script at home that parses my data and puts it in a database, and threads might save time" situation. I wouldn't reach for a thread pool for anything serious.
While it would be nice to have the language be able to do it, the fact that you can't today leads to some tidy separation of concerns for parallelization. For instance, many people I've talked to use things like Spark or dask to get high scale on data processing tasks. That means that all of the management of distributed jobs is handled through an easily googleable framework that your ops team can manage, as opposed to needing to build all of that yourself.
I see this as being a nice stopgap solution for those who are too big for single-threaded but not big enough to need Spark.
> Performance doesn't come from any one quality, but from the holistic goals at each level of the language.
It starts to become an issue when you have built a few well-performing subsystems and now want them to run together and interact. With the GIL, your subsystems are suddenly not performing as well anymore. Without the GIL, you can still get good performance (within limits of course).
Performance referring here to throughput and/or latency (responsiveness).
I agree, but I don't do anything that can be split up, and would benefit from sharing memory. That is really the only benefit of removing the GIL. Multiprocessing can do true concurrency, and so can Celery, which even allows you to use multiple computers. The only time that is a pain is when you need to share memory, or I guess maybe if you're low on resources and can't spare the overhead from multiple processes.
I think a JIT would be the best possible improvement for CPython as far as speed is concerned. Though I can imagine there are plenty of people doing processor-heavy stuff with C extensions who would benefit from sharing memory. So from their perspective removing the GIL would be a better improvement.
So basically a JIT would help every Python program, and removing the GIL would only help a small subset of Python programs. Though I'm just happy I get to make a living using Python.
Edit: This was in the back of my head, but I didn't mention it, and it would be unfair to dismiss. A JIT does slow down startup, so for short programs that finish quickly it may make things worse. Though I suspect it would be easy enough to have an option to turn off the JIT at the start of the program.
Python's existing threading support (via the threading module) can already do true concurrency just fine. Concurrency and parallelism are not the same thing. The GIL limits parallelism, making separate OS threads operate concurrently but not in parallel. Removing the GIL will allow threads in Python programs to operate concurrently and in parallel.
>> So basically a JIT would help every Python program, and removing the GIL would only help a small subset of Python programs.
What if the "Global Interpreter Lock" needs to be removed for JIT? I put that in quotes to highlight it because AFAICT no compiled (or JITed) language has such a thing. I think it functions differently than regular stuff like critical sections.
High performance JIT compiling VMs don't use a GIL, they use a different trick called safe points.
The compiled code polls a global or per-thread variable as it runs (but in a very optimized way). When one thread tries to change something that might break another thread, the other threads are brought to a clean halt at specific points in the program where the full state of the abstract interpreter can be reconstructed from the stack and register state. Then the thread stacks are rewritten to force the thread back into the interpreter and the compiled code is deallocated.
The result is that if you need to change something that is in practice only changed very rarely, instead of constantly locking/unlocking a global lock (very, very slow) you replace it with a polling operation (can be very fast as the CPU will execute it speculatively).
However, this requires a lot of very sophisticated low level virtual machinery. The JVM has it. V8 has it. CLR has a limited form of it. Maybe PyPy does, I'm not sure? Most other runtimes do not. For the Python community, very likely the best way to upgrade performance would be to start treating CPython as stable/legacy, then support and encourage efforts like GraalPython. That way the community can re-use all the effort put into the JVM.
PyPy can utilize something called software transactional memory to the same effect.
This gives you an unusually fast Python that is also GIL-less. It doesn't seem to be used much, so there may be some compatibility problems or similar, but for a trivial test it worked just as described many years ago.
It also tells me that the GIL isn't terribly important for most things Python is used for. It certainly isn't for me.
Yes higher core counts are more and more common, but the language has thirty years of single-threaded path-dependence. Lots of elements of it work the way they do because there was a GIL. I could be wrong, but I am skeptical that Python will ever be the best choice for high performance code. It's always worth improving the speed of code when you can, but more often than not you "get" something for going slower. I hope my worries are wrong and this is actually a free win!
No shared memory. To communicate between processes you usually use sockets, to communicate between threads you mutate variables. This is a huge performance difference.
A tangent but I find it amusing to contrast the perpetual Python GIL debate with all the new computation platforms that claim to be focused on scalability. Those are mostly single threaded or max out at a few virtual CPUs (eg "serverless" platforms) and there people applaud it. There people view the isolation as supporting scalability.
Yeah, I know about that argument but it just doesn't make sense to me. Removing the GIL means that 1) you make your language runtime more complex and 2) you make your app more complex.
Is it truly worth it just to avoid some memory overhead? Or is there some other windows specific thing that I'm missing here?
> Yeah, I know about that argument but it just doesn't make sense to me. Removing the GIL means that 1) you make your language runtime more complex and 2) you make your app more complex.
#2 need not be true; e.g., the approach proposed here is transparent to most Python code and even minimizes the impact on C extensions, still exposing the same GIL hook functions which C code would use in the same circumstances, though they have a slightly different effect.
Well actually, on the types of CPUs that OP refers to (128 threads i.e. AMD Threadripper), L3 cache is only shared within each pair of CCXs that form a CCD. If you launch a program with 32 threads, they may have 1, 2, 3 or 4 distinct L3 caches to work with.
Moreover, unless thread pinning is enforced, a given thread will bounce around between different cores during execution, so the number of distinct L3 caches in action will not be constant.
Of course you have the same story with memory, accessing another thread's memory is slower if that thread is on another CCD.
TL;DR NUMA makes life hard if you want to get consistent performance from parallelism.
I mean is there anything here preventing one from only writing their code to be single threaded tho? This is an addition to the capability and not a detraction.
Say your webapp talks to a database or a cache. It'd be really nice if you could use a single connection to that database instead of 64 connections. Or if you wanted to cache some things on the web server, it would be nice if you could have 1 copy easily accessible vs needing 64 copies and needing to fill those caches 64x as much.
Unfortunately using a single db/RPC connection for many active threads is not done in any multithreaded system I’m aware of for good reasons. Sharing this type of resource across threads is not safe without expensive and performance-destroying mutexes. In practice each thread needs exclusive access to its own database connection while it is active. This is normally achieved using connection pooling which can save a few connections when some threads are idle, but 1 connection for 64 active web worker threads is not a recipe for a performant web app. If you can point to a multithreaded web app server that works this way I’d be very interested to hear about it.
The idea of a process-local cache (or other data) shared among all worker threads is a different story. Along with reduced memory consumption, I see this as one of the bigger advantages of threaded app servers. However, preforking multiprocess servers can always use shmget(2) to share memory directly with a bit more work.
> Unfortunately using a single db/RPC connection for many active threads is not done in any multithreaded system I’m aware of for good reasons. Sharing this type of resource across threads is not safe without expensive and performance-destroying mutexes
lol, you're so deep into python stockholm-syndrome "don't share anything between threads because we don't support that at all even a little bit" that you don't even realize that connection pools exist. Instead of holding a connection open per process, you can have one connection pool with 30 connections that services 200 threads (exact ratio depends on how many are actually using connections, of course). literally everybody "shares a single DB/RPC connection across multiple threads" (or at least shares a number of connections across a number of threads), except python.
and yeah you can turn that into yet another standalone service that you gotta deliver in your docker-compose setup, but everybody else just builds that into the application itself.
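To make the shape of that concrete, a thread-shared pool needs nothing exotic; here's a minimal stdlib-only sketch (`make_conn` is a stand-in for whatever call your driver uses to open a connection):

    import queue

    class ConnectionPool:
        def __init__(self, make_conn, size=30):
            self._free = queue.Queue()
            for _ in range(size):
                self._free.put(make_conn())

        def acquire(self):
            return self._free.get()     # blocks until some connection is free

        def release(self, conn):
            self._free.put(conn)

Two hundred worker threads can share the thirty connections; each thread only holds one for the duration of a query.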
> that you don't even realize that connection pools exist
The GP mentions connection pooling literally three sentences later.
> literally everybody "shares a single DB/RPC connection across multiple threads" (or at least shares a number of connections across a number of threads), except python.
Right, but multiple ≠ many. You're discussing the former. GP is discussing the latter.
Depending on the structure, it can indeed be many. Both in the case of protocols which support multiplexing of requests, and in situations where you have multiple databases (thus a given thread might not need to be talking to a particular database all the time).
Popularity probably has as much to do, if not more, with ease of access (or lack of alternatives) as with good design of the language. PHP is equally popular as Python, if not more so.
I'm not a PHP expert, but I did not know it was also used in data science, game programming, embedded programming and machine learning as Python is. Of course they are both used for web services.
PHP doesn't ship with an API for creating threads, but PHP can be executed in threads depending on setup. And it does that without using a GIL; instead it internally uses something called the Thread-Safe Resource Manager.
I don't know much about it, but I've heard here and there about Swoole, a "PHP extension for Async IO, Coroutines and Fibers".
> Swoole is a complete PHP async solution that has built-in support for async programming via fibers/coroutines, a range of multi-threaded I/O modules (HTTP Server, WebSockets, TaskWorkers, Process Pools) and support for popular PHP clients like PDO for MySQL, Redis and CURL.
>again and again, chooses flexibility over performance. It's a good choice! You can see it in how widely Python is used and the diversity of its applications.
What does it mean? How is Python different here from Java/C#?
This is my main issue with python. The whole GIL thing is basically necessary because of Python's heavily dynamic model, which is almost always used improperly.
Mypy and other static analysis tools are becoming more common in part because IMO they basically require you to stop and think about your "Pythonic" dynamic patterns (containers of mixed element types, duck typing of function arguments, mutable OOP etc.), and often realize that they are a bad idea.
So in some way we are hampering multithreading to support programming constructs that are mostly used to make python flavoured spaghetti, especially in the hands of beginners and non-programmers who are encouraged to learn Python...
I'm not sure Python is fixable at this point. Oh well.
Say you have a big dictionary containing tens of millions of records, used to filter, say, another TB of data. In current Python land, you either:
1. Use multiprocessing, but then each process needs to build its own copy of the dictionary, or
2. Create an external DB, and use that DB's client to retrieve the data in some way.
This pattern has occurred again and again in my use cases, and it is always messy to solve in Python. If Python had true multi-threading, then sharing a big but read-only object among real threads would be a possibility, and believe me, a lot of people would be really happy.
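The shape of code I'd like to be able to write looks something like this sketch (toy data; today the GIL means these threads won't actually run the lookups in parallel):

    from concurrent.futures import ThreadPoolExecutor

    big_index = {i: i % 7 for i in range(1_000_000)}    # stand-in for the huge dict

    def filter_chunk(chunk):
        # read-only lookups against the single shared dict
        return [key for key in chunk if big_index.get(key) == 0]

    chunks = [range(i, i + 100_000) for i in range(0, 1_000_000, 100_000)]
    with ThreadPoolExecutor(max_workers=8) as pool:
        kept = sum(len(part) for part in pool.map(filter_chunk, chunks))
    print(kept)                                         # number of records kept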
That's a workaround that is not as ergonomic as writing Python more directly. The person working on this GIL project is one of the major maintainers of a key ML library. They have used the C/C++ binding approach but want to make life easier and be able to multi-thread directly in Python.
It wasn't entirely Python's inborn merits. For example, we don't use the best language for each task today. Instead, we use whatever language our coworkers and other companies use, which we often convince ourselves is the best language for a task.
The timing was critical, and although Python may be the Facebook of languages, one can't discount that it was extraordinarily lucky for Guido to be in the precise time and place to capitalize on good design choices.
Ditto with Ruby and web dev. The language was never designed with that application in mind (and very few people used it for that when I got into Ruby around 2000). Path dependence mostly accounts for the ubiquity of Python in science and Ruby in web dev; it just as easily could have gone the other way around.
Note that Python's poor single-threaded performance compared to single-threaded performance of other languages makes the ability to multi-thread that much more crucial. You can sometimes get away with 10x slower code but not 100x slower.
I've had to rewrite my Python code in another language in 3 different projects already (multi-processing wasn't an option) and I'm not even a heavy user. Removing GIL would be very welcome.
> You can sometimes get away with 10x slower code but not 100x slower.
That Python and other languages in its speed class are still used for new projects in production demonstrates that you can, often, get away with 100× slower code. But, sure, you can get away with 10× slower more often.
Think about the datascience use case: you need to load data from disk or network as fast as possible and compute a lot of CPU-bound operations right after that.
Threads will allow you to split your I/O in multiple procedures, so you might start computations as soon as possible when the data is ready. They will also allow you to massively speed up aggregates without having to create a new process each time (which doesn't allow you to share memory). Threads are a BIG issue when you don't want to rely on asyncio[1].
Note [1]: Asyncio pools are single-threaded because of the GIL. This is already bad enough in practice, but they also perform very badly in CPU-bound contexts. This makes them an absolute no-go when dealing with data-science code: a CPU-bound core in an I/O-bound wrapper.
Support is partial and uneven. Moreover, you don't generally use a single data-science library; it's often a mix of pandas, numpy, scikit-learn, custom algorithms, a custom optimization suite, and arrow. Letting individual libraries release the GIL is nice, but you need deep knowledge of those libraries to know which computations are threadable and which aren't.
In practice, for single-machine workloads, it's currently mostly numpy and/or whatever deep learning framework you use that does the number crunching.
This means that provided the code is operating on sufficiently large amounts of data (such that each call into numpy is of sufficient duration), the multithreading in BLAS/LAPACK within numpy usually gives you weak scaling with respect to thread count without any tricks.
The issue, however, is that this requires converting everything by hand from arrays of structs into structs of arrays, removing as many iterations from Python as possible, potentially balancing thread usage between Python and numpy, etc. By this point IMO your "Python" code looks more like Fortran or SQL with better string IO...
The number-crunching part is already fast enough; however, the aggregation, parsing and filtering that come beforehand are really, really slow. This part is often done in pure Python because it's often custom code tailored to the data you're manipulating.
We're not interested in scaling up the parts that are already fast, but the rather mundane, uninteresting work that come before.
Lack of multithreading can easily be a win for a language. A tiny subset of problems really needs it these days, and for everything else it's a potential way to either screw things up or make them way more complicated than they need to be.
Doesn't require, sure. Nothing "requires" multithreading. It may benefit from it, though, since threads are lower overhead (context switching and memory) than processes. If you have any shared data, then that too may be a benefit (but I guess your point is that most web requests don't share data).
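As a rough way to see the overhead difference yourself (numbers vary a lot by OS and start method, so treat this as a sketch rather than a benchmark):

    import time
    from threading import Thread
    from multiprocessing import Process

    def noop():
        pass

    def spawn_cost(cls, n=50):
        start = time.perf_counter()
        workers = [cls(target=noop) for _ in range(n)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        return (time.perf_counter() - start) / n

    if __name__ == "__main__":
        print("thread :", spawn_cost(Thread))
        print("process:", spawn_cost(Process))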
A lot of people will argue that it's not worth optimising for, that programmer time is more important and expensive. This may be true at the start, and is probably even worthwhile when you need to get a product out fast to test the waters in a startup, but I've worked in multiple companies that have spent significant developer time trying to reduce their cloud infrastructure costs. At that point, having your language and framework be able to make good use of the available hardware really can make a real difference to the overall performance and therefore cost, both hardware cost and the engineering time to optimise it later.
The productivity gap between languages that optimise for development and those that try to at least somewhat optimise for runtime really isn't that large nowadays. Even modern Java is quite productive now, compared to 10 or 15 years ago. Outside of a startup trying to find product-market fit by building its MVPs super fast to see what works, I think it's usually worth spending a bit more on up-front development time, a one-off cost, to reduce the recurring infrastructure cost.
Of course, it takes a lot more than just a language that supports multithreading to do this, but everything the language and the libraries/frameworks you use do to help you is helpful. I'd rather have a tool that won't get in my way later, that gives me lots of room to grow when performance starts to become an issue, than one where I need to invest in significant and painful development time (based on personal experiences at least) later. This is one area where Go seems to shine, perhaps Rust too, although I have not tried any web backend dev in Rust yet so don't know how productive it would be.
If it's done without causing complications then sure. I'm highly skeptical that it can be, as no other language ever has been able to do that.
Rust got it pretty good, but they designed a significant part of their language so that multithreading could be done well. Python did almost the opposite.
From my perspective as a huge Python fan, efficient multithreading is simply the only major thing missing from the language. I would still use C/C++/assembly for bleeding edge performance needs, but efficient multithreading in Python would have me reaching for alternatives far less often.
Basically I love peanut butter ice cream (Python) I’d just like it even more with sprinkles.
One does not preclude another: the language can be flexible and offer higher concurrency that it does now. My workstation has 64 hyperthreads. Python can use one at a time. That's messed up since I use it as a general purpose language.
I don’t see how the GIL makes writing thread safe software any easier. The GIL might prevent two Python threads executing simultaneously, but it doesn’t change the fact that a Python thread can be preempted, meaning your global state can change at any point during execution without warning.
Most of the issues with multi-threading come from concurrency, not parallelism. The GIL allows concurrency, you just don’t get any of the advantages of parallelism, which is normally the reason for putting up with the complexity concurrency creates.
There are certain classes of errors that it prevents. E.g.:

Thread 1:

    a = 0xFFFFFFFF00000000

Thread 2:

    a = 0x00000000FFFFFFFF

One might think that the only two possible values of a, if those run concurrently, are 0xFFFFFFFF00000000 and 0x00000000FFFFFFFF. But actually 0x0000000000000000 and 0xFFFFFFFFFFFFFFFF are also possible, because the loads and stores themselves aren't necessarily atomic.

The GIL (AFAICT) will prevent the latter two possibilities.
Most CPUs guarantee that aligned loads and stores up to the register size, i.e. now usually up to 64-bit, are atomic.
The compilers also take care to align most variables.
So while your scenario is not impossible, it would take some effort to force "a" to be not aligned, e.g. by being a member in a structure with inefficient layout.
Normally in a multithreaded program all shared variables should be aligned, which would guarantee atomic loads and stores.
Well, thread safety is exactly about these cases of "well, it's hardly ever a problem".
Real life bugs have come from misapplication of correct parameters for memory barriers, even on x86. Python GIL removes a whole class of potential errors.
Not that I'm against getting rid of the GIL, but I'm more sceptical that it won't trigger bugs.
Though in my opinion python just isn't a good language for large programs for other reasons. But it'd be nice to be able to multithread some 50 line scripts.
Python's integers are arbitrary precision. If large enough, that certainly won't be atomic on any normal CPU. I'm not sure how the arbitrary precision integers work internally, but it's possible they wouldn't be atomic for any value.
"Most" ends up being surprising(ex: ARM has a fairly weak memory model). I've seen a lot of code with aligned access and extensive use of "volatile" in MSVC/x86 explode once it was ported to other architectures.
Older ARM CPUs did not have a well defined memory model, but all 64-bit ARM CPUs have a well defined behavior, which includes the atomicity of any loads and stores whose size is up to 64-bit and which are aligned, i.e. the same as Intel/AMD CPUs.
The current ARM memory model is more relaxed regarding the ordering of loads and stores, but not regarding the atomicity of single loads and stores.
It depends what you mean by 'undefined behavior'. The GIL makes operations atomic on the bytecode instruction level. Critically, this includes loading and storing of objects, meaning that refcounting is atomic. However, this doesn't extend to most other operations, which generally need to pull an object onto the stack, manipulate it, and store it back in separate opcodes.
So with Python concurrency, you can get unpredictable behavior (such as two threads losing values when incrementing a counter), but not undefined behavior in the C sense, such as use-after-free.
No. The L in GIL stands for lock. So only the thread that holds it can write or read from the object, and the behavior is well defined at the C level, because C lock acquire and release operations are defined to be memory barriers.
But when each thread reads the variable, you have no control over which value you see, since you don't control when each thread gets to run. So it's undefined in the sense that you don't know which values you will get: a thread might get the value it wrote, or the value the other thread wrote. The threads might not get the same value either.
The GIL exists to protect the interpreter's internal data, not your application's data. If you access mutable data from more than one thread, you still need to do your own synchronisation.
How is the GIL different from atomics in other languages? There are many cases where atomics are useful.
One example would be incrementing a counter for statistics purposes. If the counter is atomic, and the reader of the value is ok with a slightly out of date value, it's fine. If code is doing this in GIL Python, it's working now, and will break after the GIL is removed.
I know you came to the same conclusion in another comment, but here's a look at it using the `dis` module:

    a += 1

turns into

    LOAD_FAST    a    (load a)
    LOAD_CONST   1    (load the constant 1)
    INPLACE_ADD       (perform the addition)
    STORE_FAST   a    (store the result back into a)

So if the interpreter switches threads between LOAD_FAST and STORE_FAST (i.e. anywhere around the INPLACE_ADD), you could clobber the value another thread wrote to `a`.
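If you want to see the clobbering rather than take the bytecode listing's word for it, a toy like this will usually show it on the 3.9-era interpreters being discussed (newer CPython releases check for thread switches less often, e.g. mainly at backward jumps, so the same toy may come out exact there):

    import threading

    counter = 0

    def work(n=100_000):
        global counter
        for _ in range(n):
            counter += 1    # load, add, store -- a thread switch can land in between

    threads = [threading.Thread(target=work) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)          # 400000 only if no increment was clobbered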
From your other comment:
> so even though an increment on a loaded int is atomic, loading it and storing it aren't
It's always the loads and stores that are the problem.
But that's the problem with relying on the GIL: it has lock in the name, but it does not protect you unless you understand the internals and know what you're doing. It protects the interpreter. This isn't much different from programming in other languages without a GIL: if you understand the internals and what you're doing, you may or may not need locks, because you will know what is and isn't atomic (and even when things are atomic, it's still difficult to write thread-safe code! Lock-free algorithms are much harder than using mutexes).
Thread-safe code requires thinking hard about your code; the GIL does not protect you from that.
Numbers are immutable, and incrementing one, e.g. in an attribute, is actually many bytecode ops. So this doesn't work even currently, unless you are fine with losing updates. But a version of this question using another example (e.g. using a list as a queue) is interesting.
Yep. So in a final answer to the original question of backwards bug-compatibility (https://news.ycombinator.com/item?id=28897534), it seems that it will be retained under the current proposal.
The Python documentation seems misleading to me on this:
> In theory, this means an exact accounting requires an exact understanding of the PVM bytecode implementation. In practice, it means that operations on shared variables of built-in data types (ints, lists, dicts, etc) that “look atomic” really are.
    count = 0

    def inc():
        global count
        count += 1
sure "looks atomic" to me, so according to the documentation should be, but isn't.
On the other hand, I think you could build a horribly inefficient actual atomic counter with a bare list, leaning on append and len.
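Something like the following, assuming (as the docs imply) that append and len on a plain list stay atomic; the horrible inefficiency being that memory grows with every increment:

    _ticks = []

    def incr():
        _ticks.append(None)    # a single append, atomic under the GIL

    def value():
        return len(_ticks)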
I think it's quite likely there's correct and non-horrible code relying on list append and length being atomic. Although it sounds like this might continue working without the GIL:
> A lot of work has gone into the list and dict implementations to make them thread-safe. And so on.
Indeed, the proposal explicitly covers maintaining existing guarantees, which is why I was confused that so many people just assumed it would break them.
My question was about from a program correctness standpoint, not an efficiency standpoint. But it does look there is a difference from a correctness standpoint, as other comments address.
I agree that in principle the GIL is there for the interpreter, but in practice, by globally ensuring all Python objects are in a coherent and safe state for each running thread, it makes many things quite thread-safe. For example, you could use a normal list as a FIFO queue, if you were willing to rely on the current behaviour.
Sure, but you really need to understand the internals to be able to make these assumptions.
For instance, in your example, I know that the call to the C code to do the list operations is atomic (a single bytecode instruction), but I can't assume that all such calls to C code are safe because of this, unless I know for sure that the C code doesn't itself release the GIL. I assume that simple calls like list append/pop wouldn't have any reason to do this, but I can't assume this for any given function/method call that delegates to C, since some calls do release the GIL.
So, with or without GIL, you either really need to understand what's going on under the hood so you can avoid using locks in your code (GIL or atomics-based lock-free programming), or you use locks (mutex, semaphore, condition variables etc). No matter what you do, to write thread safe programs, you need to understand what you're doing and how things are synchronizing and operating. The GIL doesn't remove that need.
Of course, removing the GIL removes one possible implementation option; I just don't believe the GIL really makes it any easier. Once you know enough internals to know what is and isn't safe with the GIL, you could just as easily do your own synchronization.
Not really. If you're doing an atomic write to the same object from two different threads, you're going to have one win the race and the other lose. That may be a bug in your code, but it's not undefined behavior at the language level.
Python doesn't care about these errors (because python's integers are not as simple as 64-bit operations), however if you generalise this error to a "transaction" of two operations (registers in this case), you'd end up with the same ability to see a view of a state that should not be seen by another thread.
AFAIK CPUs implement atomic load and store instructions and the performance overhead of these is very small compared to something like a software busy lock. So I think it's quite possible to take away the GIL while still making it impossible to load only half of a value.
> The GIL might prevent two Python threads executing simultaneously, but it doesn’t change the fact that a Python thread can be preempted, meaning your global state can change at any point during execution without warning.
That thread behavior is enough to reduce the likelihood of races and collisions; particularly if the critical sections are narrow.
That just means the GIL is good at hiding concurrency bugs. It doesn't make writing correct code any easier. Arguably you could say it makes writing correct concurrent code harder, because it'll take significantly longer for concurrency bugs to cause errors.
> Then we need a term for when code race conditions are possible but rare enough that nobody using the software notices. thread-timebomb?
There's already a term for that: not thread-safe.
The definition of thread safety does not include theoretical or practical assessments regarding how frequent a problem can occur. It only assesses whether a specific class of problems is eliminated or not.
>The definition of thread safety does not include theoretical or practical assessments regarding how frequent a problem can occur.
Well, obviously.
The challenge I am putting forth on HN is to meaningfully describe _usable_ thread-unsafe software. If you've spent enough time outside university, you'll be aware that there are all kinds of theoretical race conditions that are not triggered in practical use.
That reminds me how I was called in to fix some Java service which had run successfully in production for 10 years with hardly any incident, but suddenly started crashing hard, all the time. It was of course a thread-safety issue (concurrent non-synchronized access to a HashMap) which lay dormant for 10 years only to wreak havoc later.
Nothing obvious changed (it was still running a decade-old JRE); perhaps it was a kernel security patch, perhaps some RAM was replaced, or even just the runtime data increased/changed in some way which woke up this monster.
Fun fact, I actually do! It's from that perspective I wrote that: every time you perturb the software environment, a new set of bugs that didn't happen in the old env before arises.
That's not useful. If you have a race condition, you will eventually hit it and when you do, you may get incorrect results or corrupt data. Thread unsafe is thread unsafe, regardless how rare it appears to be.
Also, rare on one computer (or today's computer) might not be rare on another (tomorrow's faster one, for example).
These types of bugs are also very hard to detect. You might not know your data is corrupted. Reminds me of how bad calculations in Excel have cost companies billions of dollars, except now the calculations could be "correct" and the error sitting dormant, just waiting for the right timings to happen. Much better not to make assumptions about the safety and to think about it up front: if you are using multiple threads, you need to carefully consider your thread safety.
There is no such thing as probability. All there is is possible and not possible.
I don't know how the point of the comment could be missed, but what I am saying is, it is a mistake, a rookie baby not-a-programmer not even any kind of engineer in any field, to even think in those sorts of terms at all. At least not in the platonic ideal worlds of math or code or protocol or systems design or legal documents, etc.
Physical events have probability that is unavoidable. How fast does the gas burn? "Probably this fast"
There is no excuse for any coder to even utter the word "likely".
The ONLY answers to "Is this operation atomic?" or "Is this function correct?" or "Does this CPU perform division correctly?" are either yes or no. There is no freaking "most of the time."
"Likely" only exists in the realm of user data and where it is explicitly created as part of an algorithm.
There are whole branches of computer science and IT dedicated to reducing the likelihood of unpleasant outcomes: cryptography, security, disaster recovery etc.
You cannot guarantee your public key algorithm is impossible to break, but you can use keys long enough that an attacker has an arbitrarily low chance of success with the best known methods.
You cannot prove your program is bug free, outside of highly specialized fields like aircraft control, but you can build a multi-layered architecture that can reduce the likelihood of successful intrusion. You cannot prevent a EMP bomb from wiping all your hard-drives at once, but you will likely maintain integrity of your database for uncorrelated hardware errors.
"Likely" is a tool that works in the real world. If you will chase mathematic certainty, your competition will likely eat your lunch.
Where you might be correct is that "unlikely" is very close to "likely" in the particular topic of thread safety, you just need a sufficiently large userbase with workloads and environments sufficiently different from your test setup.
I have already eaten my competition's lunch through not being afraid of a little rigor, and through not leaving a wake of shit that only works on good days behind me.
From running the same software on two moderately powerful embedded systems, one single-core and one multi-core, the latter is a lot more reliable in immediately exposing races and concurrency issues.
Is this true? It looks like += compiles to four bytecode instructions: two loads, an increment, and a store. It should be possible for a thread to get paused after the load but before the store, resulting in a stale read and lost write.
I have seen that once in supposedly thread-safe C++ software in an industrial automation application. What happened was that it was somehow relying on the Windows UI messaging system (the message pump) which is single-threaded, and the fact that there were no multi-core CPUs when that stuff was written a long time ago. The latter has the effect that the CPU always sees a consistent view on the memory, no locks, mutexes and barriers needed.
Porting the thing to a multi-core system revealed that there were a lot of nasty concurrency bugs, like dead-locks or crashes which happened after a day of operation. And this wasn't a toy system - it was in use for a long time in an industrial application, and the customer was not too happy about the intermittent dead-locks. I commiserated with the poor engineer who had the quite stressful task to debug this, equipped with a lot of dedication but an insufficient background.
Frankly, while it would be nice to be able to write parallel code in pure Python, I think that Clojure with its purely-functional approach has the better concepts for this. And moreover, actually improving performance by parallel computation (using several CPUs to work in parallel on the same thing) is damn hard and unsolved in many cases (just come up with an efficient parallel Fast Fourier Transform and you might get a Turing award). What is mostly needed (outside of massive data processing pipelines) is concurrency for event-driven systems. Python can handle that, Clojure does handle it in a much more elegant way.
The GIL doesn't really help Python code though, because the interpreter may switch threads between any two opcodes.
It only protects the state of the Python interpreter and that of C/Cython extension modules. Though even there, you can have unexpected thread switches, e.g. in Cython `self.obj = None` can result in a thread switch if the value previously stored in `self.obj` had a `__del__` method implemented in Python.
And AFAIK pretty much any Python object allocation can trigger the cycle collector which can trigger `__del__` on (completely unrelated) objects in reference cycles, so it's pretty much impossible to rely on the GIL to keep any non-trivial code block atomic.
I have read many anti-GIL arguments over the years that approach soundness as optional. Is this change going to make a bunch of previously sound code unsound?
> These changes are major enough that a fair number of existing Python libraries that work directly with Python’s internals (e.g., Cython) would need to be rewritten. But the cadence of Python’s release schedule just means such breaking changes would need to be made in a major point release instead of a minor one.
If this is as promising as it sounds, it seems Python 4 now has its "thing" and is on the horizon. Or at least may become a serious thing to talk about
I began using python during the python3.0 betas, and I watched the 2 vs 3 saga from the (unusual?) perspective of a v3 hobbyist with no back-compat requirements.
What struck me as most significant was the opportunistic breakage of things not related to the unicode transition. In the many years it took to win people over to v3, they could have marched over all the breaking changes a year at a time. Given that side-by-side installs of python3.x point versions are very functional, with or without venvs, this would have been much more palatable. Perhaps harder than it sounds though.
I attempted a couple of 2to3 translations of open source libraries over the years, with varying degrees of success. Every time I found that most of the changes were easy, but debugging the broken bits was hard due to the sheer volume of source changes. If instead I could have done conversions where there was only a single major semantic change at a time, it would be so much easier to figure out what was going wrong at any given step. Furthermore, I imagine that a single-breaking-change mentality would lead to better documentation on how to transition for each version.
For this reason, I have become rather suspicious of yearly release schedules. Swift is even more frustrating: the version changes are really just dictated by Apple's yearly PR calendar. Some big things get rushed out for WWDC before they are ready, and smaller fixes can get held back until the next year. I would much rather that the language teams just prioritize one thing at a time, release it when it is ready, and foster a community where staying up-to-date on the latest version is easy and desirable (a more complicated story for Apple than for Python I think, due to ABI, OS version, etc).
From past discussions on HN I've gathered that there is such a thing as release fatigue, where developers get irritated when libraries release breaking changes too often. Nevertheless I often wonder if languages and libraries could improve faster by making more breaking changes, one at a time, with robust side-by-side installs to facilitate testing across versions. I wish side-by-side library versions were possible in Python, just to facilitate regression testing.
Bringing this all back to the post, I sincerely hope that if Python 4 is a breaking change to the GIL, that it will be only that.
I'm curious what others think about all this. Thoughts?
If every release has a single breaking change, then that language is said to be unstable/not-production ready. IMHO, that's not at all an acceptable way of doing point releases. People will just be scared of new releases. No one will adopt a new language version as soon as it releases. Java never breaks backwards compatibility and still there are people running Java 8. Imagine what would happen if every point release carries breaking changes. It makes you feel that the language is not mature, the library ecosystem broken, since you'll have to keep track of version compatibility for each library that you use. It's a nightmare for both library developers and end-users. Few people would like to use such a language
> If every release has a single breaking change, then that language is said to be unstable/not-production ready. IMHO, that's not at all an acceptable way of doing point releases.
This.
It makes absolutely no sense to claim that having to deal with a single non-backwards compatible release is somehow worse than having to deal with a sequence of non-backwards compatible releases.
Even though the migration from Python 2 to Python 3 faced some resistance, if anything the decision was totally vindicated.
I thought the use case mentioned made sense, that of essentially being able to perform patches in a series, as opposed to trying to fix many breaking changes at once. You know. Iterative development. I think 'makes absolutely no sense' is a little harsh.
O boy, angry rant incoming. I'll say something petulant and overly dramatic, but I don't like the direction in which python is going, and I'm glad there's finally some news about focus on actual innovation instead of tacking on syntactic cruft.
I want the python that Guido promised me, with 2021 performance. I don't want some abhorrent committee-designed piece of middle-of-the-road shitware glue language that I must use because everyone uses it.
I want a language that doesn't spin its single-threaded wheel in a sea of CPU cores, and I want a language that has one obvious way of doing things without needing to grok and parse dumb """clever""" hacks that will only be abused by midlevel programmers to show off how they saved typing a few lines of additional code.
To me, speed + simplicity = ergonomics = joy. I want a new Python 4 to focus exclusively and intensely on performance improvements and ergonomics.
The walrus operator is a tired old trope to hate on, but I don't see the point personally. Same goes for the structural pattern matching thing. The tacking on of typing features feels superfluous in a language that's not compiled or even statically typed.
But for the sake of maximum pedantry, let me paste a nitpicky little detail from a somewhat recent syntactic addition:
    >>> def f(a, b, /, **kwargs):
    ...     print(a, b, kwargs)
    ...
    >>> f(10, 20, a=1, b=2, c=3)
    10 20 {'a': 1, 'b': 2, 'c': 3}
a and b are used in two ways.
Since the parameters to the left of / are not exposed as possible keywords, the parameter names remain available for use in **kwargs
Jesus fucking hell on a tricycle, so now I have *'s and /'s showing up in function signatures so someone can prematurely optimize the re-use of variable names without breaking backwards compatibility?!
Python is becoming a mockery, dying a death through a thousand little cuts to its ergonomics.
I'm sure you're already aware of this example since it's the canonical one, but to me personally the point is very clear: I use regular expressions all the time and always have to write that little bit of boilerplate, which the walrus operator now lets me get rid of.
Avoiding tedious boilerplate by adding nice features like the walrus operator is precisely what lets us avoid "death through a thousand little cuts to its ergonomics", in my view.
Sure, maybe writing
    m = re.match("^foo", s)
    if m is not None:
        ...
isn't so bad, but in that case maybe writing
    i = 0
    while i < len(stuff):
        element = stuff[i]
        ...
        i += 1
wouldn't be so bad, and we could get rid of Python's iterator protocol?
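For comparison, the walrus form of the regex snippet collapses the assignment and the test into one line, something like:

    import re

    s = "foobar"
    if (m := re.match("^foo", s)) is not None:
        print(m.group())    # foo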
I think regex matches might be literally the only use case for := that I come across with any kind of nontrivial frequency, and it's only a minor nuisance at that. Certainly nothing to warrant an entirely new yet different syntax for something we already have.
The iterator protocol is way more general than what you have; it's not remotely comparable.
Regex is just a prominent example of a certain pattern. Depending on the work and style, one often has functions which return something, and in the case of a non-empty result you want to do something more. The walrus can shrink data-intensive code quite a bit, in my experience.
The loop can be written using This One Weird Trick that Walruses Hate:
    from functools import partial

    for foo in iter(partial(data.get, "foo"), None):
        handle_foo(foo)
So I would only use the walrus operator for the first example (the if statement), which even though it is exactly the same as doing it in two steps just feels nicer as a single step.
It should have been “if … as y” and reused existing syntax. I’ve never seen anyone use the extended variant (multiple assignment) that walrus allows. The extra colons with this and typing makes it look like a standard punctuation-heavy language we sought to avoid in the first place.
AFAIK, the purpose of “/” is so that python-implemented functions can be fully signature-(and, therefore, also type-)compatible with builtins and C-implemented functions that required positional arguments but do not accept those arguments being passed as keyword arguments.
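A tiny illustration of that compatibility point; my_len below is just a made-up wrapper, not anything from the stdlib:
def my_len(obj, /):
    # Pure-Python wrapper whose signature mirrors the C builtin.
    return len(obj)

my_len([1, 2, 3])      # fine
len(obj=[1, 2, 3])     # TypeError: len() takes no keyword arguments
my_len(obj=[1, 2, 3])  # TypeError as well, thanks to the "/"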
Not the OP, but some of the late additions to Python were, *in MY very humble opinion*, not very Pythonic; just syntactic sugar that means there is now more than one way of doing things.
On that list: the walrus operator and the new switch thing. If I understand them fully and correctly, those two things don't enable developers to do anything that was impossible before; instead they add new ways to do things that were already possible.
That's the Python I know and love.
Of course, this doesn't mean I'll love Python any less, just that I wish there were more focus on stuff that matters, like the topic of this article. Or maybe on making type hinting better.
IMO, the "one obvious way to do things" has always been a comforting fiction. There are numerous ways to do everything, the worst offenders forcing people to make tradeoffs between debuggability and readability (e.g., for loops versus list comprehensions). Many of them are purely about readability (ternary expression versus if blocks) and many of them are about style (ternary expression versus use of or/and short-circuiting). Even so, before the walrus operator, there was never a way to define a variable that only existed in the scope of a particular if statement.
After using pattern matching in Rust and switch statements in JavaScript, I personally am very excited for that addition to Python, but I understand the feature is divisive and will concede it as a matter of opinion.
Edit: turns out the walrus operator does not cause the variable to move out of scope after the if block, which is disappointing. IMO the worse anti-pattern has already been part of the language, which is not creating new scopes for if statements.
Half of Python has always been syntactic sugar that could also be written one way or another with more primitive code. I mean for-loops, elif, import, the whole OOP machinery; all just redundant syntactic sugar. Pythonic never meant making the syntax less "sweet". Python is about making simple, straightforward code that removes unnecessary friction. And that's exactly what the walrus and switch are doing on their own.
But yes, of course one can also argue that they add friction at the global scale, because it's yet another syntax element to know about, and the benefit looks rather small on the surface. But that's the problem with syntax: it's always a trade-off between overhead and benefit.
I'll point out that the walrus operator was actually accepted while Guido was still BDFL (and the vitriol surrounding the decision to include it led directly to him stepping down from the position [1]), so even granting that it's a poor addition to the language, it does not support the claim that "design by committee" has led to poor language design decisions.
They gave no specific criticisms. This thread was born of a request for specific criticisms. When that happens, I try to operate as though the assumptions laid out in the parents hold for the children. I think this makes sense to do, especially when you appeared to step in as a proxy expanding on the parent's opinion. Even if that wasn't your intention, this is a public thread, and the most relevant place to post things as a response to a sentiment in a thread may not be directly to a person who holds that exact sentiment. If you don't take issue with "design by committee" then you need not be concerned. I don't think you think that, and I think no less of you regardless.
Disagree: the recent changes are things I put to work immediately and in a large fraction of the code. They're not niche and "should have" been added years ago. If anything, I'm thrilled with the work of the "committee," whose judgments are better than the result of any individual. Postgres is the same.
Gone are the days when you invest in a platform like Python and they make crazy decisions that kill the platform's future (e.g. Perl 5). Ignore small syntax stuff like := and focus on the big stuff.
> Disagree: the recent changes are things I put to work immediately and in a large fraction of the code.
That says nothing about their quality. It just says you like them. If you gave me unhealthy food I'd probably eat it immediately too. Doesn't mean I think it's good for me.
> Ignore small syntax stuff like := and focus on the big stuff.
They're not "small" when you immediately start using them in a "large fraction of your code". And a simple syntax that's easy to understand is practically Python's raison d'être. They added constructs with some pretty darn unexpected meanings into what was supposed to be an accessible language, and you want people to ignore them? I would ignore them in a language like C++ (heck, I would ignore syntax complications in C++ to a large degree), but ignoring features that make Python harder to read? To me that's like putting performance-killing features in C++ and asking people to ignore them. It's not that I can't ignore them—it's that that's not the point.
I simply do not understand how the walrus operator is harder to read. Maybe an example?
my_match = regex.match(foo)
if my_match:
    return my_match.groups()
# continues with the now useless my_match in scope
Versus
if my_match := regex.match(foo):
    return my_match.groups()
# continues without useless my_match in scope
How is the second one less readable? Have you ever heard of a real world example of a beginner or literally anyone ever actually expressing confusion over this?
The problem isn't that simple use case. Although even in that case, they already had '=' as an assignment operator, and they could've easily kept it like the majority of other languages do instead of introducing an inconsistency.
The more major problem with the walrus operator is more complicated expressions they made legal with it. Like, could you explain to me why making these legal was a good thing?
def foo():
    return ...

def bar():
    yield ...

while foo() or (w := bar()) < 10:
    # w is in-scope here, but possibly nonexistent!
    # Even in C++ it would at least *exist*!
    print(w)

# The variable is still in-scope here, and still *nonexistent*
# Ditto as above, but even worse outside the loop
print(w := w + 1)
If they just wanted your use case, they could've made only expressions of the form 'if var := val' legal, and maybe the same with 'while', not full-blown assignments in arbitrary expressions, which they had (very wisely) prohibited for decades for the sake of readability. And they would've scoped the variable to the 'if', not made it accessible after the conditional. But nope, they went ahead and just did what '=' does in any language, and to add insult to injury, they didn't even keep the existing syntax when it has exactly the same meaning. And it's not like they even added += and -= and all those along with it (or +:= and -:= because apparently that's their taste) to make it more useful in that direction, if they really felt in-expression assignments were useful, so it's not like you get those benefits either.
While the walrus operator gives a way to see this sort of non-C++ behavior, it's more showing that Python isn't C++ than something special about the operator.
Here's another way to trigger the same NameError, via "global":
import random

def foo():
    return random.randrange(2)

def bar():
    global w
    w = random.randrange(20)
    return w

while foo() or (bar() < 10):
    print(w)
For even more Python-is-not-C++-fun:
import re

def parse_str(s):
    def m(pattern):  # I <3 Perl!
        nonlocal _
        _ = re.match(pattern, s)
        return _ is not None
    if m("Name: (.*)$"):
        return ("name", _[1])
    if m("State: (..) City: (.*)$"):
        return ("city", (_[2], _[1]))
    if m(r"ZIP: (\d{5})(-(\d{4}))?$"):
        return ("zip", _[1] + (_[2] if _[2] else ""))
    return ("Unknown", s)
    del _  # Remove this line and the function isn't valid Python(!)

for line in (
    "Name: Ernest Hemingway",
    "State: FL City: Key West",
    "ZIP: 33040",
):
    print(parse_str(line))
Right, I'm quite well-aware of that, but I'm saying this change has made the situation even worse. If they ensured the variables were scoped and actually initialized it'd have actually been an improvement.
Regarding these two comments:
    # w is in-scope here, but possibly nonexistent!
    # Even in C++ it would at least *exist*!
I don't see how bringing up C++'s semantics is relevant when Python has long raised an UnboundLocalError for similar circumstances.
If I understand you correctly, you believe Python should have introduced scoping so the "w" would be valid only in the if, elif, and else clauses, and not after the 'if' ends.
This would be similar to how the error object works in the 'except' clause:
>>> try:
...     1/0
... except Exception as err:
...     err = "Hello"
...
>>> err
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'err' is not defined
If so, I do not have the experience or insight to say anything meaningful.
In your example, if you leave out the parentheses around w := bar(), you get "SyntaxError: cannot use assignment expressions with operator" which makes me think it's a bug in the interpreter and not intentionally designed to allow it.
I am baffled to learn that it's kept in scope outside of the statement it's assigned, and I agree it would have a negative impact on readability if used outside of the if statement.
> if you leave out the parentheses around w := bar(), you get "SyntaxError: cannot use assignment expressions with operator" which makes me think it's a bug in the interpreter and not intentionally designed to allow it.
No, I'm pretty sure that's intentional. You want the left-hand side of an assignment to be crystal clear, which "foo() or w := bar()" is not. It looks like it's assigning to (foo() or w).
def thing(): return True

if thing() or w := "ok":  # SyntaxError: cannot use assignment expressions with operator
    pass
print(w)

. . .

if thing() or (w := "ok"):
    pass
print(w)  # NameError: name 'w' is not defined
The first error makes me think your concern (that w is conditionally undefined) was anticipated and supposed to be guarded against with the SyntaxError. I believe the fact you can bypass it with parentheses is a bug and not an intentional design decision.
Oh I see, you're looking at it from that angle. But no, it's intentional. Check out PEP 572 [1]:
> The motivation for this special case is twofold. First, it allows us to conveniently capture a "witness" for an any() expression, or a counterexample for all(), for example:
if any((comment := line).startswith('#') for line in lines):
    print("First comment:", comment)
else:
    print("There are no comments")
I have a hard time believing even the authors (let alone you) could tell me with a straight face that that's easy to read. If they really believe that, I... have questions about their experiences.
Your new example makes me wonder: if I can intentionally conditionally bring variables into existence with the walrus operator, what's the motivation behind the SyntaxError in my statement above? I maintain my belief that the real issue here is, readability aside, if blocks do not implement a new scope, which has always been a problem in the language. The walrus operator just gives you new ways to trip over that problem.
From the PEP:
> An assignment expression does not introduce a new scope. In most cases the scope in which the target will be bound is self-explanatory: it is the current scope. If this scope contains a nonlocal or global declaration for the target, the assignment expression honors that. A lambda (being an explicit, if anonymous, function definition) counts as a scope for this purpose.
I find this particularly strange and inconsistent:
lines = ["1"]
[(comment := line).startswith('#') for line in lines]
print(comment) # 1
[x for x in range(3)]
print(x) # NameError: name 'x' is not defined
I'm saying it's the same reason why (x + y = z) should be illegal even if (x + (y = z)) is legal in any language. It's not specific to Python by any means. The target of an assignment needs to be obvious and not confusing. You don't want x + y to look like it's being assigned to.
There are two aspects I have been thinking about while looking at this: Introduction of non-obvious behavior (foot-guns) and readability. Readability is important, but I have been thinking primarily about the foot-gun bits, and you have been emphasizing the readability bits. I can't really accurately assess readability of something until I encounter it in the wild.
If the precedence were higher, then you'd get a situation like
x := 1 if cond else 2
never resulting in x being assigned 2, which is pretty unintuitive.
And you have to realize, even if the precedence works out, nobody is going to remember the full ordering for every language they use. People mostly remember a partial order that they're comfortable with, and the rest they either avoid or look up as needed. Like in C++, I couldn't tell you exactly how (a << b = x ? c : d) groups (though I could make an educated guess), and I don't have any interest in remembering it either.
Ultimately, this isn't about the actual precedence. Even if the precedence was magically "right", it's about readability. It's just not readable to assign to a compound expression, even if the language has perfect precedence.
I know they don't, normally. I really thought that was basically the point of the walrus operator to begin with, that the variable was only in scope for the lifetime of the if statement where it's needed. Huge bummer to find out that's not true.
Scope in Python is normally defined by functions/methods, not blocks. The same happens with for-loops and with-blocks, so this is consistent. And it's good, because it can be very useful. The exception here is try/except blocks, where the caught error is cleaned up after leaving the except block, for reasons.
IMO the real abomination was already present in the language, which is that if blocks do not introduce new scope. My IDE protects me from the bugs this could easily introduce when I try to use a variable that may not yet be in scope, but it should be detected before runtime.
I will readily admit that the walrus operator doesn't do what I thought it did and I have no interest in whatever utility it provides as it exists today.
> IMO the real abomination was already present in the language, which is that if blocks do not introduce new scope.
Definitely. You would think if they're going to undermine decades of their own philosophy, they would instead introduce variable declarations and actually help mitigate some bugs in the process.
I don't know how important this is, but I believe it does make it less readable for "outsiders".
As a non-Python programmer it is usually pretty easy for me to correctly guess what a piece of Python code does. (And once in a while I need to take a look at some Python code).
Walrus operator got me. I tried to guess what it did, but even having simple code examples I could not. My guesses were along the lines of binding versus plain assignment, or some such. None of my guesses were even close. I had to google it to find out (of course I could also read the documentation).
IMO the match statement has some very unintuitive behaviour:
match status:
    case 404:
        return "Not found"

not_found = 404
match status:
    case not_found:
        return "Not found"
The first checks for equality (`status == 404`) and the second performs an assignment (`not_found = status`).
`not_found` behaving differently from the literal `404` breaks an important principle: “if you see an undocumented constant, you can always name it without changing the code’s meaning” [0].
Actually, I don’t really want the feature. It’s complicated and it doesn’t really fit with the rest of the language (it breaks fundamental rules, as above, and has scoping issues).
Worst of all, though, it’s really just another way to write if-elif-else chains that call `isinstance` - a pattern which Python traditionally discouraged in favour of duck-typing.
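A dotted name does behave as a value pattern, for the record; here's a minimal sketch of the contrast, using a made-up Status enum (not anything from the discussion above):
import enum

class Status(enum.IntEnum):
    NOT_FOUND = 404

def describe(status):
    match status:
        case Status.NOT_FOUND:  # dotted name: a value pattern, compared with ==
            return "Not found"
        case other:             # bare name: a capture pattern, binds other = status
            return f"Unhandled: {other}"

print(describe(404))  # Not found
print(describe(200))  # Unhandled: 200
That workaround exists, but it doesn't change the underlying complaint: naming a bare constant still silently flips the case from a comparison to an assignment.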
Do you not like the idea of pattern matching as a feature or do you not like the implementation details? This kind of seems like another clumsy scoping problem, no?
I would love a good pattern matching feature, but this is not it. And this is a seriously broken design at a fundamental level, not an "implementation detail". I actually have no clue how it's implemented and couldn't care less honestly. I just know it's incredibly dangerous for the user to actually use, and incredibly unintuitive on its face. It's as front-and-center as a design decision could possibly be, I think.
And no, this is not really a scoping issue. Match is literally writing to a variable in one pattern but not the other. A conditional write is just a plain inconsistency.
The sad part is both of these features are stumbling over the fact that Python doesn't have variable declarations/initialization. If they'd only introduced a different syntax for initializations, both of these could have been much clearer.
> I actually have no clue how it's implemented and couldn't care less honestly.
I guess I'm not sure where "design" ends and "implementation" begins? To me, how to handle matching on variables that already exists is both, because "pattern matching and destructuring" are the features and how that must work in the context of the actual language is "implementation". It being written in a design doc and having real world consequences in the resulting code doesn't make it not part of the implementation.
Instead of quibbling over terms, I was much more interested in whether you like the idea of pattern matching.
I think not liking the final form a feature takes in the language is fundamentally different from wholesale disliking the direction the language design is going.
Design is the thing the client sees, implementation is the stuff they don't see. In this case the user is the one using match expressions. And they're seeing variables mutate inconsistently. It's practically impossible for a user not to see this, even if they wanted to. Calling that an implementation detail is like calling your car's steering wheel an implementation detail.
But I mean, you can call it that if you prefer. It's just as terrible and inexcusable regardless of its name. And yes, as I mentioned, I would have loved to have a good pattern matching system, but so far the "direction" they're going is actively damaging the language by introducing more pitfalls instead of fixing the existing ones (scopes, declarations, etc.). Just because pattern matching in the abstract could be a feature, that doesn't mean they're going in a good direction by implementing it in a broken way.
I guess like they say, the road to hell is paved with good intentions.
> Design is the thing the client sees, implementation is the stuff they don't see.
By this definition, bugs and other unintended consequences that the user encounters are "Design".
> Calling that an implementation detail is like calling your car's steering wheel an implementation detail.
Yes, if there weren't so many important decisions behind the outcome of a car being steered with a steering wheel, it could be a steering handle, or a steering joystick, or just about anything else that allows you to orient the front wheels of the car. The same is true of the pedals on the floor. Those could be implemented as controls on the steering wheel instead. Whether it's an implementation detail depends on the specificity of the feature in question. When I asked you about an "implementation detail", it was scoped to "the feature is pattern matching" (can I steer the car?) and you scoped it to "the feature is pattern matching without overwriting variables conditionally in surprising ways" (can I steer the car with failure modes that aren't fatal?).
Yes, I am definitely now quibbling over terms. I'm not sure what an appropriate response would have been? Just silence? You responded rather uncharitably and I didn't like it. I felt the need to defend my position.
Not silence, just continuing whatever your underlying point was if your goal wasn't to quibble over semantics. Now I have no idea what you're referring to, but this is clearly getting personal, so let's just leave this here. I think at this point we're both clear on the concrete problems with these features and what our positions are.
> If instead I could have done conversions where there was only a single major semantic change at a time,
That was the point of the "from __future__" imports. You could get most of the way toward Python 3 so that 2to3 would be easier to work with and the new semantics could be gradually baked into the code prior to migration.
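Roughly the pattern, for anyone who missed that era (the specific feature list below is just illustrative):
# At the top of a Python 2 module, opting in to Python 3 semantics piecemeal:
from __future__ import print_function, division, unicode_literals

print(7 / 2)   # 3.5 rather than 3, thanks to "division"
print("text")  # a unicode string under "unicode_literals"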
Python 3 had 25 years of cruft to clean up. They won't have to do that again.
Idealists and free developers (who have created the majority of the Python interpreter) agree with you.
Corporate developers, who have taken over Python and other people's work, like unnecessary changes, because they get many billable hours of seemingly complex work that can be done on autopilot.
Corporations might even take over more C extensions whose developers are no longer willing to put up with the churn and who have moved to C++ or Java.
In the long run, this is bad for Python. But many developers want to milk the snake until their retirement and don't care what happens afterwards.
"Corporate developers, who have taken over Python and other people's work, like unnecessary changes, because they get many billable hours of seemingly complex work that can be done on autopilot."
In my 20+ year career I have never worked with a programmer that matches this description.
I'd wonder if it would be easier to introduce a totally new API along the lines of ruby's ractor API[1] that enables thread parallelism while keeping existing Thread behavior identical as with the GIL. Tons of python code relies on threaded code that is thread-safe under the GIL, but would completely blow up if the GIL was naively replaced.
Yeah, that's what I thought. I think the greatest barrier now is that most multithreaded Python code is just barely thread-safe, even with the GIL. I occasionally have to remind colleagues that even though the GIL makes individual bytecode instructions atomic, you still need mutexes and other synchronization primitives to ensure there is no race condition across multiple instructions. I'd imagine this change would be an optional interpreter feature initially, since removing the GIL would break the vast majority of multithreaded code out in the wild, and it would be much more difficult to create an automated conversion tool like they did for the syntactic changes between 2.7 and 3.
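A minimal sketch of that reminder (names are made up): the read-modify-write below spans several bytecode instructions, so the language gives no atomicity guarantee even with the GIL, and the lock is what actually protects it.
import threading

counter = 0
lock = threading.Lock()

def increment_unsafe(n):
    global counter
    for _ in range(n):
        counter += 1  # load, add, store: not guaranteed atomic, GIL or no GIL

def increment_safe(n):
    global counter
    for _ in range(n):
        with lock:  # the mutex covers the whole multi-instruction sequence
            counter += 1

threads = [threading.Thread(target=increment_safe, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # reliably 400000 with the lock; the unsafe version may lose updates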
Cython itself should be a relatively simple fix (relative to the difficulty that Cython devs are accustomed to). Libraries that use Cython in a pure way (that is, not fussing with refcounts in hand-written C code) should "just work" after Cython gets updated. It's the poor folk who have done straight C extensions without the benefit of Cython that I'm concerned about.
A simple solution would be to introduce two new types: ConcurrentThread and ParallelThread. Alias the old Thread to ConcurrentThread and keep its behaviour. No breaking changes, and the difference is easy to explain. People who need it can use the new, truly parallel version.
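Just to make the shape of that proposal concrete (these class names come from the comment above and are not a real CPython API):
import threading

class ConcurrentThread(threading.Thread):
    """Today's behaviour: concurrency under the GIL, no true parallelism."""

class ParallelThread(threading.Thread):
    """Hypothetical: would run truly in parallel on a no-GIL interpreter."""

# Existing code keeps its old semantics via the alias:
Thread = ConcurrentThread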
The story of losing the GIL is very popular in the news, and I like it too!
.. but let's not count our chickens before they hatch. I'm wondering whether the Python dev community will take on this challenge. I hope so; Sam seems to have really put in a lot of effort!
In my opinion, there's a time in a language's life when it should slow down the pace of "innovation". Code bases are complex things, and constantly updating and upgrading them just to keep up with the language may be counterproductive.
Python is there now, if you ask me. It should slow down and focus more on "maintenance" work with little to no impact on its interface. And maybe work on big projects like multithreading or stronger typing in the background, and ship them when they're fully ready.
The gradual typing available now seems suitable. I have written plenty of code in typed contexts and plenty without. Python's "consenting adults" approach seems a win.
Perhaps, without the GIL and with typing information included, additional performance gains will be on offer.
But the "have it your way" nature of Python is a bigger win than either end of the data typing spectrum.
We won't get data race free guarantees but if built into pandas or Vaex we can have a near transparent API.
It'll really open things up for those apps running on 32 core machines. They're out there, I deploy these things frequently (Plotly Dash framework for large Enterprise customers).
Benchmarks show a 19.4x improvement going to 20 threads, an almost linear speedup. That's pretty amazing; in Java I feel I never manage to achieve linear speedup past a few threads. How are they managing the overhead?
Amdahl's law? It depends on the code. If you throw 100 cores at half of your code, that part may get 100x faster. But if the other half is single threaded you'll only see overall performance double.
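That back-of-the-envelope math as a one-line helper (just Amdahl's formula, nothing specific to the benchmarks in the article):
def amdahl_speedup(parallel_fraction, workers):
    # Overall speedup when only parallel_fraction of the work scales across workers.
    return 1 / ((1 - parallel_fraction) + parallel_fraction / workers)

print(amdahl_speedup(0.50, 100))  # ~1.98: 100 cores on half the code barely doubles throughput
print(amdahl_speedup(0.95, 20))   # ~10.3: even 95%-parallel code is well short of 20x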
I use Ruby and not Python, but I think both have a lot of the same benefits and weaknesses.
IMO, removing the GIL is a major mistake. The GIL is what allows you easy concurrency and to keep the language's 'magic' while ensuring correctness. If you need parallelism, there's processes and probably other tactics (I'm not super up to date on Python things). If you simply remove the GIL you have a bunch of race conditions, so you need a bunch of new language constructs, and it just adds a bunch of complexity to solve problems that don't really need solving.
IMO they should just do what Ruby did with Ractors; basically a cheap alternative to spawning more processes. Rewriting absolutely everything that uses threads to be thread-safe is a waste of time.
It's already easy to write race conditions in Python.
if x in d:
    del d[x]
else:
    d[x] = True
Is a classic example -- if two threads execute that, you can't predict the outcome (but a KeyError is quite likely)
The GIL only protects the CPython virtual machine; it doesn't protect user code. Concurrent code with shared mutable state already needs explicit mutexes.
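A runnable version of that classic, for the curious (whether a given run actually hits the KeyError depends on timing):
import threading

d = {}

def toggle(key, iterations):
    for _ in range(iterations):
        # The membership test and the del/insert are separate operations;
        # the GIL can hand control to another thread between them.
        if key in d:
            del d[key]
        else:
            d[key] = True

threads = [threading.Thread(target=toggle, args=("x", 100_000)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Some runs finish cleanly; others die with KeyError: 'x' in one of the threads.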
What thread safe code can I write with the GIL that will have a race without it?
I already have to be careful to only write to a shared object from one thread, since I have no guarantees on order of execution.
The main benefit of the GIL, from my recent reading, is that it makes reference counting fast and thread-safe. The meat of the proposal is changing reference counting so that it stays atomic, and almost as fast, without the GIL.
What about setting a simple boolean flag, e.g. setting "cancelled = True" in the UI thread to cancel an operation in a background thread?
In Java you would have to worry about safe publication to make the change visible to the other thread, but thanks to the GIL changes in Python are always (I think?) made visible to other threads.
Changes are always eventually made visible to other threads in all modern languages; what's uncertain is the timing of thread wakeup and the possibility that a thread may not flush a cache. The GIL doesn't magically solve race conditions, and in Java and other such languages you usually have a keyword like volatile that makes flushes explicit. In any case, I very much doubt the GIL work would affect Python's cache coherency model.
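The belt-and-braces version of that flag pattern is threading.Event, which makes the cross-thread visibility explicit instead of leaning on CPython's behaviour. A minimal sketch (the worker body is made up):
import threading
import time

cancelled = threading.Event()

def background_work():
    while not cancelled.is_set():
        time.sleep(0.1)  # stand-in for one chunk of work
    print("worker saw the cancellation")

worker = threading.Thread(target=background_work)
worker.start()

time.sleep(0.5)
cancelled.set()  # the "UI thread" flips the flag; the Event makes visibility explicit
worker.join()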
The GIL is not a feature but a design problem with both Ruby and Python. Implementations of both without a GIL have existed for quite long (e.g. jvm based implementations like jython and jruby). It's fine; not having a GIL is an enabler. E.g. being able to run ruby on rails on an application server with threads for each connection used to be a popular thing. I've migrated a few ruby things to jruby. It's shockingly easy. Mostly stuff just works.
The nice thing about everything being single threaded is that nothing will break if you remove the GIL. It will still be single threaded. It's only when you actively start using multiple threads that things might break. So, you won't have race conditions until you do that and then only if you do things that you shouldn't be doing like sharing things across threads that you should not be sharing because they aren't thread safe.
Removing the GIL will simply enable people to start gradually fixing things and give them the option to use threads instead of forcing them to use completely different languages.
Ractor style asynchronous programming might be a good idea for python as well. One does not exclude the other.
No new race conditions are expected, as this work replaces the GIL with finer-grained, performant locks. It still lowers single-threaded performance somewhat compared to the GIL build (about 10 percent), but a 10 percent drop in single-threaded performance is worth considering in exchange for real multithreading.
The race conditions in a GIL program and a no-GIL program should be the same. The GIL is not the only way to keep certain operations safe.
It is a trade-off: pure single-threaded code gets somewhat worse in exchange for much better multithreaded code. And the current sentiment on the python-dev mailing list looks positive. Previous attempts at GIL removal had a much bigger drop in single-threaded performance.
> Multithreaded performance, on some benchmarks, scales almost linearly with each new thread in the best case—e.g., when using 20 threads, an 18.1× speedup on one benchmark and a 19.8× speedup on another.
Interesting. When I write Python, and I write Python most of the time, I'm not chasing performance. But when it comes to speed, an 18x speedup gets Python up to par with currently much faster languages like Java. At least if you are willing to spam threads like your life depends on it.
I do not understand the significance of this. If I want serious number-crunching multi-threaded performance in Python, I execute it on the GPU?
conda install numba and conda install cudatoolkit come to mind?
And you should go data-oriented, so all that remains of objects is arrays of structures with the index being the object identifier. I'm honestly puzzled about the use case.
But what would be nice right now is a map-reduce API without shared data, or with read-only data only.
Something like ProcessPoolExecutor, but instead of spawning new processes and pickling input/output data, it would create GIL-free threads with no pickling of input/output data.
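A sketch of that wish, under the assumption that a no-GIL build would let the thread version actually run in parallel (crunch and the chunking below are made up):
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def crunch(chunk):
    return sum(x * x for x in chunk)

def main():
    chunks = [range(i * 1_000_000, (i + 1) * 1_000_000) for i in range(8)]

    # Today: true parallelism means processes, and inputs/outputs cross a pickle boundary.
    with ProcessPoolExecutor() as pool:
        print(sum(pool.map(crunch, chunks)))

    # The wish: the same map/reduce over threads, sharing read-only data with no pickling,
    # which only pays off once the GIL no longer serializes the workers.
    with ThreadPoolExecutor() as pool:
        print(sum(pool.map(crunch, chunks)))

if __name__ == "__main__":
    main()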
Knowledge of the GIL forces a lot of python developers to not try to write multithreaded scripts. Doing away with it will make life harder for many folks, IMO. Have you tried explaining multithreading to some python scripters?
I'm a random commenter with a chip on his shoulder.
<grumble> Please escape Discord links so people don't accidentally click a live one and thereby expose data to Hammer & Chisel, Inc. </grumble>
It's a constant slap-in-the-face how prevalent Discord is in the tech community. Need support with an obscure library? Discord. Want to talk to contributors for a project? Discord.
I honestly don't know why people so greatly prefer Discord to IRC. The interface? Network effects? Whatever it is, Discord and H&C are disgusting, and I pray the tech community finds a way to escape it.
Fill you in on what? I'd be happy to, I'm just missing context.
Also, it's so disheartening to see people tout Discord's interface as its 'killer feature.' As far as I'm concerned, if I can't mold it into a decent TUI for my terminals, it's okay at best. If an app actively resists its interface being molded, however, that's just evil.
Why I Hate Discord (A Manifesto) [Without Sources]
Discord is aggressively proprietary.
People ought to own their data, but Discord's architecture ensures everything gets hoovered into the mother-ship. At first, this was for nothing more than to facilitate their client-server communications model; recently, all of the hoovered data gets submitted to their AI moderation platform. (searched for sources on this but it was very hard to find anything. I remember talk about this ~c. Nov 2020, might be wrong)
I should be able to modify an interface to my tastes. Modifying Discord is explicitly against their ToS, including their interface; attempting to do so will lead to a ban. Don't like their painted whore? Prefer to chat from your bespoke terminal? GTFO, Hammer & Chisel knows what's best for We Peons.
Addenda: I find Hammer & Chisel developers and Discord admins to be disgusting. This is hearsay & personal opinion from my time hanging around the developer chats, but they all came across as nasty people. I, personally, believe many of the news reports surrounding the grooming controversies; searching "Discord admin controversy," "Discord allthefoxes controversy," "Discord cub policy," &c. turn up some relevant articles. Most of these articles are from low-quality reporting shops, but I buy in to the narrative.
Here's a use case: I was training a neural net and wanted to do some preprocessing (similar to image resizing, but without an existing C function). Inputs are batched, so the preprocessing is trivially parallelizable. I tried to multithread it in Python and got no speedup at all.
That was a really sad moment, and I've never felt good about python since.
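What that usually looks like, reduced to a toy; the preprocessing body here is a made-up stand-in for the real per-batch work:
import time
from concurrent.futures import ThreadPoolExecutor

def preprocess(batch):
    # Pure-Python, CPU-bound work: nothing here releases the GIL.
    return [sum((x * 3) % 7 for x in sample) for sample in batch]

batches = [[range(20_000)] * 8 for _ in range(32)]

start = time.perf_counter()
for batch in batches:
    preprocess(batch)
print("serial :", time.perf_counter() - start)

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(preprocess, batches))
print("threads:", time.perf_counter() - start)  # about the same, or worse, under the GIL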
Except for embarrassingly parallel problems, the trade-off of generating more parallelism is usually needing finer-grained communication. Canonical examples in the literature are matrix multiplication, triangulation, and refinement.
Large shared state is basically always the answer. You can cop out and say "use a database or Redis" if that's fast enough, but that's just making someone else use many threads with shared memory.
MySQL comes to mind. Unlike Postgres where connections are expensive MySQL encourages loading up the server with hundreds of simultaneous connections per server.
Like anything, it’s possible to split this work out to a separate process but the IPC overhead is a lot.
Whether it matters depends on the workload: for some it matters a ton, for some not a bit. And if you run one process per core, you can get some benefits from co-location and similar optimizations the OS can do for you.
I’m not saying performance is not different between these types of solutions or the dev work is not sometimes different.
I’m saying that the penalties (including development effort) for work co-ordinated between processes compared to threads can vary from nearly zero (sometimes even net better) to terrible depending on the nature of the workload. Threads are very convenient programmatically, but also have some serious drawbacks in certain scenarios.
For instance, fork()'d sub-processes do a good job of avoiding segfaulting/crashing/deadlocking everything all at once if there is a memory issue or other problem while doing the work. It's also very difficult in native threading models to do per-work resource management or quotas (like maximum memory footprint, max number of open sockets, or max number of open files), since everything is grouped together at the OS level and it's a lot of work to reinvent that yourself (which you'd need to do with threads). Also, the same shared-memory convenience can cause some pretty crazy memory corruption in otherwise isolated areas of your system if you're not careful, which is not possible with separate processes.
I do wish the Python concurrency model was better. Even with its warts, it is still possible to do a LOT with it in a performant way. Some workloads are definitely not worth the trouble right now, however.
Last I did this, when the processes were fork()s of the parent (the typical way this was done), memory overhead was minimal compared to threads.
A couple percent. That was somewhat workload-dependent, however; if there is a lot of memory churn or data marshalling/unmarshalling happening as part of the workload, they'll quickly diverge and you'll burn a ton of CPU in the process.
Typical ways around that include mmap'ing things or various kinds of shared-memory IPC, but that is a lot of work.
Also, generally no one spins up distinct new processes for the 'coordinated distinct-process work queue' when they can just fork(), which should be way faster; pretty much every platform uses copy-on-write for this, so it also has minimal memory overhead (at least initially).
The problem is (perhaps amusingly) with refcounting. As the processes run, they'll each be doing refcount operations on the same module/class/function/etc objects which causes the memory to be unshared.
Only where there is memory churn. If you’re in a tight processing loop (checksumming a file? Reading data in and computing something from it?) then the majority of objects are never referenced or dereferenced from the baseline.
Also, since the copy on write generally is memory page by memory page, even if you were doing a lot of that, if most of those ref counts are in a small number of pages, it’s not likely to really change much.
It would be good to get real numbers here of course. I couldn’t find anyone obviously complaining about obvious issues with it in Python after a cursory search though.
Interactive use of Python: plotting, working with data, also would benefit from better multithreading. It's interactive, so it's (a bit) frustrating to wait for it to compute and see that it uses just one thread (the statistics ops are usually well threaded already, but plotting is not).
I have seen and written Python code that spawns various threads with shared mutable state. Is it possible that some day the same code would run in parallel? That could be a terrible (very) breaking change. I'm not against allowing in-process parallel execution but please let it require a new API.
Translation: I have written buggy, racy software that has specific dependencies on thread timing. Please do not make significant improvements to Python because it will reveal these bugs in my software, and I will be forced to fix the bugs and use proper synchronization.
If you make something that works because of an explicit memory and concurrency model (and it's not like there were other options at the time), it is indeed legitimate to worry about a major shift in those models that would cause problems.
Even if those changes are better for other ways of solving problems.
Is that how you call things that have been working flawlessly and solving people's problems for over 10 years? Is needlessly breaking things that work an improvement to you?
You could probably simply lock the Python version you use for such code. No breakage there. If you must upgrade to a newer Python version, then you will have to repair broken code.
It did buy a decade or so (or more, really); it's not like the Python 2 distribution you downloaded and shipped with your program back then is going to get tracked down and shot in the head by Guido anytime soon.
If you’re relying on whatever python version is distributed with whatever machine it happens to be on, there are a huge number of problems you’re already going to have.
It's concurrent, not parallel. A thread switch won't happen inside the execution of one opcode (which covers some dictionary update operations), so it's safe in many cases where parallel execution isn't.
Yes, single-instruction operations would be fine, but if you're writing multithreaded code you are probably doing things that the GIL doesn't protect all the time. Like dict-updates on classes that implement __set__, or `if not a[x]: a[x] = y` sorts of two-phased checks, or just like, anything else. You can't get very far with global state without reckoning with concurrency, GIL or not.
I assume that a change to relax the GIL will both allow you to opt-out of it, and allow you to use locking versions of primitive data-structures, anyway; it's not like it's going to just vanish overnight with no guardrails.
It is probably a bad practice to not acquire a mutex for that concurrent dictionary update. The code should be improved in that regard, with or without any potential Python language change.
If performance isn't hugely important you could make blanket-locking wrappers around common data structures and swap them in-place for all of your global state.
.. but, as I said, removing the GIL will almost certainly be opt-in.
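A sketch of such a blanket-locking wrapper (LockedDict is made up, not a stdlib type, and a real one would need to wrap many more methods):
import threading

class LockedDict:
    def __init__(self, *args, **kwargs):
        self._lock = threading.Lock()
        self._data = dict(*args, **kwargs)

    def __getitem__(self, key):
        with self._lock:
            return self._data[key]

    def __setitem__(self, key, value):
        with self._lock:
            self._data[key] = value

    def __delitem__(self, key):
        with self._lock:
            del self._data[key]

    def setdefault(self, key, default=None):
        # One lock around the whole check-then-act, unlike a bare `if key in d: ...`
        with self._lock:
            return self._data.setdefault(key, default)

state = LockedDict()
state["jobs_done"] = 0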
It really is, they don't make it clear at all. Every time I have to ask the question of "is this atomic under the GIL" I struggle to find the right answer.
That was my first thought as well. I use Python to whip up quick scripts and enjoy not having to worry about shared memory, even when I'm using concurrency. I'd hate to lose that.
The difference is that it is so much easier, by orders of magnitude, to write code that gets shit done in python than any native language I'm aware of.
I did ask a question. The second part is me being belligerent but the question was sincere.
I think Python is shit, but so is most everything else; I'm interested in whether people jump ship or just work around its issues at scale. I work on a programming language designed to avoid messy Python scripts internally, so I am sincerely interested in these decisions.
The question is unanswerable. Python started as a scripting and prototyping language and that hasn't and won't completely change. It's fantastic at what it does from that perspective, late additions of complexity notwithstanding.