I worked in Python for years and while I suppose I'm glad for any improvement, I have never understood the obsession with true multi-threading. Languages are about trade-offs and Python, again and again, chooses flexibility over performance. It's a good choice! You can see it in how widely Python is used and the diversity of its applications.
Performance doesn't come from any one quality, but from the holistic goals at each level of the language. I think some of the most frustrating aspects from the history of Python have been when the team lost focus on why and how people used the language (i.e. the 2 -> 3 transition, though I have always loved 3). I hope that this is a sensible optimization and not an over-extension.
Losing the GIL makes the language strictly more flexible. Previous GILectomies tanked performance to an unacceptable degree. In single-threaded code, this one is a moderate performance improvement in some benchmarks, and a small detriment in others -- which is about as close to perfect as one could expect from such a change. That's why people are excited about it.
At a higher level, Python is getting serious about performance. But this gives both flexibility and performance.
Call me optimistically skeptical. I share similar reservations about GIL obsession with the original comment author, but if this is true:
> The overall effect of this change, and a number of others with it, actually boosts single-threaded performance slightly—by around 10%
Then it sounds like having your cake and eating it too (optimism). Although my experience keeps nagging at me with, "there is no such thing as a free lunch" (skepticism).
The no-GIL version is actually about 8% slower on single-threaded performance than the GIL version, but the author bundled in some unrelated performance improvements that make the no-GIL version overall 10% faster than today's Python.
Right, the 20% boost is unrelated to the Gilectomy.
> though, as Guido van Rossum noted, the Python developers could always just take the performance improvements without the concurrency work and be even faster yet.
Why be 10% faster single threaded when you can be 20% faster single threaded!
> The resulting interpreter is about 9% faster than the no-GIL proof-of-concept (or ~19% faster than CPython 3.9.0a3). That 9% difference between the “nogil” interpreter and the stripped-down “nogil” interpreter can be thought of as the “cost” of the major GIL-removal changes.
Why? It's not like CPython is a speed demon. I'd think there's some low-hanging fruit, simply because performance is such a low priority for the maintainers. It doesn't even do TCO, after all.
> Although my experience keeps nagging at me with, "there is no such thing as a free lunch" (skepticism).
Well, yeah, someone had to make the changes. That's the cost that was paid.
You can get a mass-produced machete that is cheaper and higher-quality than a 7th-century sword. It's easy for one thing to be better than another thing across several dimensions simultaneously. That's why certain technologies go out of use -- they have negative value compared to other technologies. But that has nothing to do with the principle that there's no such thing as a free lunch.
I feel like you aren't well informed on why removing the GIL results in a single-threaded performance hit. And while I think it's always nice to keep in mind the developer effort required, it's not the only cost as GIL removal has been done before (several times, even as far back as Python 1.5 [1]).
The crux of the issue (as I understand it) is that the GIL absolves the Python interpreter of downstream memory access control. You can replace the GIL with memory access controls of various strategies, but the overhead of that access control is just that: overhead. In a multi-threaded program the concurrency gains should outweigh that overhead, but in a single-threaded one it's just extra work that wasn't being done before.
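You can get a rough feel for the overhead at the Python level with an admittedly loose analogy: even an uncontended lock costs something on every operation. This is not how the no-GIL branch implements its access control (it uses finer-grained techniques such as biased reference counting), just an illustration of why per-operation bookkeeping shows up as a single-threaded tax:

    import threading, timeit

    lock = threading.Lock()
    counter = 0

    def bare():
        global counter
        counter += 1

    def locked():
        global counter
        with lock:          # never contended here, but still paid for on every call
            counter += 1

    print("no lock:", timeit.timeit(bare, number=1_000_000))
    print("locked :", timeit.timeit(locked, number=1_000_000))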
Which brings us back to no free lunch. It turns out that the claimed "10% faster" without the GIL is actually a result of Gross (the GIL-removal author) doing a multitude of unrelated performance improvements. These improvements raise performance enough that single-threaded no-GIL code (with its overhead) still comes out ~10% faster than today. But as Guido pointed out, the core developers could upstream the performance improvements without the GIL removal:
> To be clear, Sam’s basic approach is a bit slower for single-threaded code, and he admits that. But to sweeten the pot he has also applied a bunch of unrelated speedups that make it faster in general, so that overall it’s always a win. But presumably we could upstream the latter easily, separately from the GIL-freeing part. [2]
To be explicit, I was skeptical because I believed that GIL removal requires adding overhead for managing memory access. Having dug a bit deeper into it that seems confirmed. The proposed GIL removal strategy _is slower for single-threaded code_ like other solutions before it. It turns out the reported performance increase was the result of orthogonal performance improvements overshadowing the overhead of GIL removal.
Put another way, if the performance improvements were upstreamed without removing the GIL the resulting performance increase would be ~20% instead of just ~10%. Which is what Guido was getting at in the quote I cited. Assuming the benchmarks to be true for the moment, this means that removing the GIL on this PoC branch is a 10% performance hit to single-threaded workloads.
> "there is not such thing as a free lunch" (skepticism).
When you carry a heavy suitcase filled with lead and you drop it, things get lighter for free. You paid for it by carrying the damn thing around with you for the whole time.
Well, CPython probably won't ever get there. But Python as a language maybe could.
The GraalPython implementation of Python 3 is built on the JVM, which is a fully thread safe high performance runtime, and Graal/Truffle provide support for speculation on many things. For pure Python it provides a 5-7x speedup already and the implementation is not really mature. Although at the moment they're working on compatibility, in future it might be possible to speculatively remove GIL locks because you have support for things like forcing JITd code to a safepoint and discarding it, if you want to change the basic semantics of the language.
How does it relate to PyPy? I read that the latter uses a tracing JIT, while GraalPython builds on Truffle's AST-based one, which basically maps the JVM's primitive structures to Python's, thus making use of all the man-hours that went into the JVM's development.
But last time I checked, PyPy had much better performance than Graal, even though TruffleJS (the JavaScript interpreter built on the same model as GraalPython) has comparable performance to the V8 engine for long-running code. Though the latter is the most actively developed Truffle language, let me add.
It's sort of taking Jython's implementation approach to a much greater extreme, and bypassing bytecode, so it isn't limited by the Java semantics anymore.
It resolves a few big problems Jython had:
- GraalPython is Python 3, not Python 2
- It can use native extensions that plug into the CPython interpreter like NumPy, SciPy etc. The C code is itself virtualized and compiled by the JVM!
Yah, that's definitely the future I'm hoping for. What I am worried about are the kind of transition issues I mentioned. Python 2 -> 3 strictly made the language more flexible too - but the Python ecosystem is about existing code almost more than the language and I worry that we could find similar problems here. Potential for plenty of growing pains while chasing relatively small gains.
In the company I'm working for, we had to spend more engineer time on GIL workarounds (dealing with the extra complexity caused by multiprocessing, e.g. patching C++ libraries to put all their state into shared memory) than we needed for the Python 2 -> 3 migration. And we've only managed to parallelize less than half of our workload so far.
Even if this will be a major breaking change to Python, it'll be worth it for us.
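For anyone who hasn't had to do it, the flavour of plumbing involved looks roughly like this stdlib-only toy (not our actual code; real workloads end up marshalling whole data structures or C++ object state through buffers like this, which is where the engineering time goes):

    from multiprocessing import Process
    from multiprocessing.shared_memory import SharedMemory

    def worker(name):
        shm = SharedMemory(name=name)        # attach to the existing block by name
        shm.buf[0] = shm.buf[0] * 2 % 256    # mutate the shared bytes in place
        shm.close()

    if __name__ == "__main__":
        shm = SharedMemory(create=True, size=1024)
        shm.buf[0] = 21
        p = Process(target=worker, args=(shm.name,))
        p.start()
        p.join()
        print(shm.buf[0])                    # 42, written by the child process
        shm.close()
        shm.unlink()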
Python needs to be compiled into machine language to ever have a chance of competing on speed. We can already get around the GIL with multiprocessing, but Python is still too slow even when not bound by copying memory between processes.
The phrase "competing on speed" begs the question "competing...with what?" If the answer is "machine compiled languages", then yes, it's unlikely Python will ever match their speed without also being compiled to machine code, but there are plenty of other interpreted languages with better performance than Python (even ruling out stuff like Java that technically isn't "compiled into machine language" in the way that phrase usually would mean); lots of work is done on JavaScript interpreters to improve performance, and I don't think that specifically has cost the language much flexibility.
I use python. I don’t love it but it has a good selection of libraries for what I do. It’s not blazing fast but not terribly slow either.
As for multiprocessing, I currently have 150 Python processes running on the work cluster, each doing their bit of a large task. The heavy lifting is in a Python library, but it's C code. It's actually not bad performance-wise and frankly wasn't too bad to code up. I think for my use case threads would make it harder.
Java is technically compiled into machine language; it's a matter of choosing a JDK that offers such options. Many people don't, but that is their problem, not a lack of options.
JavaScript interpreters that people actually use have a JIT built in.
I don't think the goal is to "compete on speed", but I'm sure people wouldn't complain about their Python scripts running 15x faster on their 16 core CPU.
And it is also about flexibility. What I love about Python is the simplicity, and let's be honest, multiprocessing is anything but. Especially if you fall into one of the gotchas (unpicklable data, for example).
It's because threads in Python are only really good for parallel I/O, and are ineffective for CPU bound workloads.
This can be a problem for a lot of threading use cases. If I'm working on an ETL app that parses large amounts of data, the related CPU-bound tasks need to either run sequentially, call out to C extensions, or use multiple processes, which incurs an overhead.
It's a pain when you know threads would suit your use case well, but the threading implementation in the language you're working in isn't up to the task.
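A toy illustration of the asymmetry (indicative only, not a benchmark): the I/O-bound batch below finishes in roughly one sleep's worth of time because sleeping releases the GIL, while the CPU-bound batch takes about as long as running its tasks back to back:

    import time
    from concurrent.futures import ThreadPoolExecutor

    def io_task(_):
        time.sleep(0.5)                           # releases the GIL while waiting

    def cpu_task(_):
        sum(i * i for i in range(2_000_000))      # holds the GIL while computing

    for name, task in [("io ", io_task), ("cpu", cpu_task)]:
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=4) as pool:
            list(pool.map(task, range(4)))
        print(name, round(time.perf_counter() - start, 2), "s")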
It's very interesting you mentioned ETL tasks. In ETL batch jobs, a unit in the batch is defined small enough to rarely be CPU bound; rather, as you mentioned, it is I/O bound. In what situation must you define a unit of work to be so heavily CPU bound? To me that's a smell for too large of a unit.
I'm using ETL as shorthand for a "I wrote a script at home that parses my data and puts it in a database, and threads might save time" situation. I wouldn't reach for a thread pool for anything serious.
While it would be nice to have the language be able to do it, the fact that you can't today leads to some tidy separation of concerns for parallelization. For instance, many people I've talked to use things like Spark or dask to get high scale on data processing tasks. That means that all of the management of distributed jobs is handled through an easily googleable framework that your ops team can manage, as opposed to needing to build all of that yourself.
I see this as being a nice stopgap solution for those who are too big for single-threaded but not big enough to need Spark.
> Performance doesn't come from any one quality, but from the holistic goals at each level of the language.
It starts to become an issue when you have built a few well-performing subsystems and now want them to run together and interact. With the GIL, your subsystems are suddenly not performing as well anymore. Without the GIL, you can still get good performance (within limits of course).
Performance referring here to throughput and/or latency (responsiveness).
I agree, but I don't do anything that can be split up, and would benefit from sharing memory. That is really the only benefit of removing the GIL. Multiprocessing can do true concurrency, and so can Celery, which even allows you to use multiple computers. The only time that is a pain is when you need to share memory, or I guess maybe if you're low on resources and can't spare the overhead from multiple processes.
I think a JIT would be the best possible improvement for CPython as far as speed is concerned. Though I can imagine there are plenty of people doing processor-heavy stuff with C extensions who would benefit from sharing memory. So from their perspective removing the GIL would be a better improvement.
So basically a JIT would help every Python program, and removing the GIL would only help a small subset of Python programs. Though I'm just happy I get to make a living using Python.
Edit: This was in the back of my head, but I didn't mention it, and it would be unfair to dismiss. A JIT does slow down startup, so for short programs that finish quickly it may make things worse. Though I suspect it would be easy enough to have an option to turn off the JIT at the start of the program.
Python's existing threading support (via the threading module) can already do true concurrency just fine. Concurrency and parallelism are not the same thing. The GIL limits parallelism, making separate OS threads operate concurrently but not in parallel. Removing the GIL will allow threads in Python programs to operate concurrently and in parallel.
>> So basically a JIT would help every Python program, and removing the GIL would only help a small subset of Python programs.
What if the "Global Interpreter Lock" needs to be removed for JIT? I put that in quotes to highlight it because AFAICT no compiled (or JITed) language has such a thing. I think it functions differently than regular stuff like critical sections.
High performance JIT compiling VMs don't use a GIL, they use a different trick called safe points.
The compiled code polls a global or per-thread variable as it runs (but in a very optimized way). When one thread tries to change something that might break another thread, the other threads are brought to a clean halt at specific points in the program where the full state of the abstract interpreter can be reconstructed from the stack and register state. Then the thread stacks are rewritten to force the thread back into the interpreter and the compiled code is deallocated.
The result is that if you need to change something that is in practice only changed very rarely, instead of constantly locking/unlocking a global lock (very, very slow) you replace it with a polling operation (can be very fast as the CPU will execute it speculatively).
However, this requires a lot of very sophisticated low level virtual machinery. The JVM has it. V8 has it. CLR has a limited form of it. Maybe PyPy does, I'm not sure? Most other runtimes do not. For the Python community, very likely the best way to upgrade performance would be to start treating CPython as stable/legacy, then support and encourage efforts like GraalPython. That way the community can re-use all the effort put into the JVM.
PyPy can utilize something called software transactional memory to the same effect.
This gives you an unusually fast Python that is also GIL-less. It doesn't seem to be used much, so there may be some compatibility problems or similar, but for a trivial test it worked just as described many years ago.
It also tells me that the GIL isn't terribly important for most things Python is used for. It certainly isn't for me.
Yes higher core counts are more and more common, but the language has thirty years of single-threaded path-dependence. Lots of elements of it work the way they do because there was a GIL. I could be wrong, but I am skeptical that Python will ever be the best choice for high performance code. It's always worth improving the speed of code when you can, but more often than not you "get" something for going slower. I hope my worries are wrong and this is actually a free win!
No shared memory. To communicate between processes you usually use sockets, to communicate between threads you mutate variables. This is a huge performance difference.
A tangent but I find it amusing to contrast the perpetual Python GIL debate with all the new computation platforms that claim to be focused on scalability. Those are mostly single threaded or max out at a few virtual CPUs (eg "serverless" platforms) and there people applaud it. There people view the isolation as supporting scalability.
Yeah, I know about that argument but it just doesn't make sense to me. Removing the GIL means that 1) you make your language runtime more complex and 2) you make your app more complex.
Is it truly worth it just to avoid some memory overhead? Or is there some other windows specific thing that I'm missing here?
> Yeah, I know about that argument but it just doesn't make sense to me. Removing the GIL means that 1) you make your language runtime more complex and 2) you make your app more complex.
#2 need not be true; e.g., the approach proposed here is transparent to most Python code and even minimizes the impact on C extensions, still exposing the same GIL hook functions which C code would use in the same circumstances, though they have a slightly different effect.
Well actually, on the types of CPUs that OP refers to (128 threads i.e. AMD Threadripper), L3 cache is only shared within each pair of CCXs that form a CCD. If you launch a program with 32 threads, they may have 1, 2, 3 or 4 distinct L3 caches to work with.
Moreover, unless thread pinning is enforced, a given thread will bounce around between different cores during execution, so the number of distinct L3 caches in action will not be constant.
Of course you have the same story with memory, accessing another thread's memory is slower if that thread is on another CCD.
TL;DR NUMA makes life hard if you want to get consistent performance from parallelism.
I mean is there anything here preventing one from only writing their code to be single threaded tho? This is an addition to the capability and not a detraction.
Say your webapp talks to a database or a cache. It'd be really nice if you could use a single connection to that database instead of 64 connections. Or if you wanted to cache some things on the web server, it would be nice if you could have 1 copy easily accessible vs needing 64 copies and needing to fill those caches 64x as much.
Unfortunately using a single db/RPC connection for many active threads is not done in any multithreaded system I’m aware of for good reasons. Sharing this type of resource across threads is not safe without expensive and performance-destroying mutexes. In practice each thread needs exclusive access to its own database connection while it is active. This is normally achieved using connection pooling which can save a few connections when some threads are idle, but 1 connection for 64 active web worker threads is not a recipe for a performant web app. If you can point to a multithreaded web app server that works this way I’d be very interested to hear about it.
The idea of a process-local cache (or other data) shared among all worker threads is a different story. Along with reduced memory consumption, I see this as one of the bigger advantages of threaded app servers. However, preforking multiprocess servers can always use shmget(2) to share memory directly with a bit more work.
> Unfortunately using a single db/RPC connection for many active threads is not done in any multithreaded system I’m aware of for good reasons. Sharing this type of resource across threads is not safe without expensive and performance-destroying mutexes
lol, you're so deep into python stockholm-syndrome "don't share anything between threads because we don't support that at all even a little bit" that you don't even realize that connection pools exist. Instead of holding a connection open per process, you can have one connection pool with 30 connections that services 200 threads (exact ratio depends on how many are actually using connections, of course). literally everybody "shares a single DB/RPC connection across multiple threads" (or at least shares a number of connections across a number of threads), except python.
and yeah you can turn that into yet another standalone service that you gotta deliver in your docker-compose setup, but everybody else just builds that into the application itself.
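To make the shape of that concrete, a thread-shared pool needs nothing exotic; here's a minimal stdlib-only sketch (`make_conn` is a stand-in for whatever call your driver uses to open a connection):

    import queue

    class ConnectionPool:
        def __init__(self, make_conn, size=30):
            self._free = queue.Queue()
            for _ in range(size):
                self._free.put(make_conn())

        def acquire(self):
            return self._free.get()     # blocks until some connection is free

        def release(self, conn):
            self._free.put(conn)

Two hundred worker threads can share the thirty connections; each thread only holds one for the duration of a query.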
> that you don't even realize that connection pools exist
The GP mentions connection pooling literally three sentences later.
> literally everybody "shares a single DB/RPC connection across multiple threads" (or at least shares a number of connections across a number of threads), except python.
Right, but multiple ≠ many. You're discussing the former. GP is discussing the latter.
Depending on the structure, it can indeed be many. Both in the case of protocols which support multiplexing of requests, and in situations where you have multiple databases (thus a given thread might not need to be talking to a particular database all the time).
Popularity probably has as much to do, if not more, with ease of access (or lack of alternatives) as with good design of the language. PHP is equally popular as Python, if not more so.
I'm not a PHP expert, but I did not know it was also used in data science, game programming, embedded programming and machine learning as Python is. Of course they are both used for web services.
PHP doesn't ship with an API for creating threads, but PHP can be executed in threads depending on setup. And it does that without using a GIL; instead it internally uses something called the Thread-Safe Resource Manager.
I don't know much about it, but I've heard here and there about Swoole, a "PHP extension for Async IO, Coroutines and Fibers".
> Swoole is a complete PHP async solution that has built-in support for async programming via fibers/coroutines, a range of multi-threaded I/O modules (HTTP Server, WebSockets, TaskWorkers, Process Pools) and support for popular PHP clients like PDO for MySQL, Redis and CURL.
>again and again, chooses flexibility over performance. It's a good choice! You can see it in how widely Python is used and the diversity of its applications.
What does it mean? How is Python different here from Java/C#?
This is my main issue with python. The whole GIL thing is basically necessary because of Python's heavily dynamic model, which is almost always used improperly.
Mypy and other static analysis tools are becoming more common in part because IMO they basically require you to stop and think about your "Pythonic" dynamic patterns (containers of mixed element types, duck typing of function arguments, mutable OOP etc.), and often realize that they are a bad idea.
So in some way we are hampering multithreading to support programming constructs that are mostly used to make python flavoured spaghetti, especially in the hands of beginners and non-programmers who are encouraged to learn Python...
I'm not sure Python is fixable at this point. Oh well.
Say you have a big dictionary containing tens of millions of records, used to filter, say, another TB of data. In current Python land, you either:
1. Use multiprocessing, but then each process needs to build its own copy of the dictionary, or
2. Create an external DB, and use that DB's client to retrieve the data in some way.
This pattern has occurred again and again in my use cases, and it is always messy to solve in Python. If Python had true multi-threading, then sharing a big but read-only object among real threads would be a possibility, and believe me, a lot of people would be really happy.
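The shape of code I'd like to be able to write looks something like this sketch (toy data; today the GIL means these threads won't actually run the lookups in parallel):

    from concurrent.futures import ThreadPoolExecutor

    big_index = {i: i % 7 for i in range(1_000_000)}    # stand-in for the huge dict

    def filter_chunk(chunk):
        # read-only lookups against the single shared dict
        return [key for key in chunk if big_index.get(key) == 0]

    chunks = [range(i, i + 100_000) for i in range(0, 1_000_000, 100_000)]
    with ThreadPoolExecutor(max_workers=8) as pool:
        kept = sum(len(part) for part in pool.map(filter_chunk, chunks))
    print(kept)                                         # number of records kept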
That's a workaround that is not as ergonomic as writing Python more directly. The person working on this GIL project is one of the major maintainers of a key ML library. They have used the C/C++ binding approach but want to make life easier and be able to multi-thread directly in Python.
It wasn't entirely Python's inborn merits. For example, we don't use the best language for each task today. Instead, we use whatever language our coworkers and other companies use, which we often convince ourselves is the best language for a task.
The timing was critical, and although Python may be the Facebook of languages, one can't discount that it was extraordinarily lucky for Guido to be in the precise time and place to capitalize on good design choices.
Ditto with Ruby and web dev. The language was never designed with that application in mind (and very few people used it for that when I got into Ruby around 2000). Path dependence mostly accounts for the ubiquity of Python in science and Ruby in web dev; it just as easily could have gone the other way around.
Note that Python's poor single-threaded performance compared to single-threaded performance of other languages makes the ability to multi-thread that much more crucial. You can sometimes get away with 10x slower code but not 100x slower.
I've had to rewrite my Python code in another language in 3 different projects already (multi-processing wasn't an option) and I'm not even a heavy user. Removing GIL would be very welcome.
> You can sometimes get away with 10x slower code but not 100x slower.
That Python and other languages in its speed class are still used for new projects in production demonstrates that you can, often, get away with 100× slower code. But, sure, you can get away with 10× slower more often.
Think about the datascience use case: you need to load data from disk or network as fast as possible and compute a lot of CPU-bound operations right after that.
Threads will allow you to split your I/O in multiple procedures, so you might start computations as soon as possible when the data is ready. They will also allow you to massively speed up aggregates without having to create a new process each time (which doesn't allow you to share memory). Threads are a BIG issue when you don't want to rely on asyncio[1].
Note [1]: Asyncio pools are single-threaded because of the GIL. This is already bad enough in practice, but they also perform very badly in CPU-bound contexts. This makes them an absolute no-go when dealing with data-science code: a CPU-bound core in an I/O-bound wrapper.
Support is partial and uneven. Moreover, you don't generally use a single data-science library; it's often a mix of pandas, numpy, scikit-learn, custom algorithms, a custom optimization suite, and arrow. Letting individual libraries release the GIL is nice, but you need deep knowledge of those libraries to know which computations are threadable and which aren't.
In practice, for single-machine workloads, it's currently mostly numpy and/or whatever deep learning framework you use that does the number crunching.
This means that provided the code is operating on sufficiently large amounts of data (such that each call into numpy is of sufficient duration), the multithreading in BLAS/LAPACK within numpy usually gives you weak scaling with respect to thread count without any tricks.
The issue, however, is that this requires converting everything by hand from arrays of structs into structs of arrays, removing as many iterations from Python as possible, potentially balancing thread usage between Python and numpy, etc. By this point IMO your "Python" code looks more like Fortran or SQL with better string IO...
The number-crunching part is already fast enough; however, the aggregation, parsing and filtering that come beforehand are really, really slow. This part is often done in pure Python because it's often custom code tailored to the data you're manipulating.
We're not interested in scaling up the parts that are already fast, but the rather mundane, uninteresting work that come before.
Lack of multithreading can easily be a win for a language. A tiny subset of problems really needs it these days, and for everything else it's a potential way to either screw things up or make them way more complicated than they need to be.
Doesn't require, sure. Nothing "requires" multithreading. It may benefit from it, though, since threads are lower overhead (context switching and memory) than processes. If you have any shared data, then that too may be a benefit (but I guess your point is that most web requests don't share data).
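As a rough way to see the overhead difference yourself (numbers vary a lot by OS and start method, so treat this as a sketch rather than a benchmark):

    import time
    from threading import Thread
    from multiprocessing import Process

    def noop():
        pass

    def spawn_cost(cls, n=50):
        start = time.perf_counter()
        workers = [cls(target=noop) for _ in range(n)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        return (time.perf_counter() - start) / n

    if __name__ == "__main__":
        print("thread :", spawn_cost(Thread))
        print("process:", spawn_cost(Process))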
A lot of people will argue that it's not worth optimising for, that programmer time is more important and expensive. This may be true at the start, and is probably even worthwhile when you need to get a product out fast to test the waters in a startup, but I've worked in multiple companies that have spent significant developer time trying to reduce their cloud infrastructure costs. At that point, having your language and framework be able to make good use of the available hardware really can make a real difference to the overall performance and therefore cost, both hardware cost and the engineering time to optimise it later.
The productivity gap between languages that optimise for development and those that try to at least somewhat optimise for runtime really isn't that large nowadays. Even modern Java is quite productive now, compared to 10 or 15 years ago. Outside of a startup trying to find product-market fit by building its MVPs super fast to see what works, I think it's usually worth spending a bit more on up-front development time, a one-off cost, to reduce the recurring infrastructure cost.
Of course, it takes a lot more than just a language that supports multithreading to do this, but everything the language and the libraries/frameworks you use do to help you is helpful. I'd rather have a tool that won't get in my way later, that gives me lots of room to grow when performance starts to become an issue, than one where I need to invest in significant and painful development time (based on personal experiences at least) later. This is one area where Go seems to shine, perhaps Rust too, although I have not tried any web backend dev in Rust yet so don't know how productive it would be.
If it's done without causing complications then sure. I'm highly skeptical that it can be, as no other language ever has been able to do that.
Rust got it pretty good, but they designed a significant part of their language so that multithreading could be done well. Python did almost the opposite.
From my perspective as a huge Python fan, efficient multithreading is simply the only major thing missing from the language. I would still use C/C++/assembly for bleeding edge performance needs, but efficient multithreading in Python would have me reaching for alternatives far less often.
Basically I love peanut butter ice cream (Python) I’d just like it even more with sprinkles.
One does not preclude another: the language can be flexible and offer higher concurrency that it does now. My workstation has 64 hyperthreads. Python can use one at a time. That's messed up since I use it as a general purpose language.
I don’t see how the GIL makes writing thread safe software any easier. The GIL might prevent two Python threads executing simultaneously, but it doesn’t change the fact that a Python thread can be preempted, meaning your global state can change at any point during execution without warning.
Most of the issues with multi-threading come from concurrency, not parallelism. The GIL allows concurrency, you just don’t get any of the advantages of parallelism, which is normally the reason for putting up with the complexity concurrency creates.
There are certain classes of errors that it prevents. E.g.:

Thread 1:

    a = 0xFFFFFFFF00000000

Thread 2:

    a = 0x00000000FFFFFFFF

One might think that the only two possible values of a, if those run concurrently, are 0xFFFFFFFF00000000 and 0x00000000FFFFFFFF. But actually 0x0000000000000000 and 0xFFFFFFFFFFFFFFFF are also possible, because the loads and stores themselves aren't necessarily atomic.

The GIL (AFAICT) will prevent the latter two possibilities.
Most CPUs guarantee that aligned loads and stores up to the register size, i.e. now usually up to 64-bit, are atomic.
The compilers also take care to align most variables.
So while your scenario is not impossible, it would take some effort to force "a" to be not aligned, e.g. by being a member in a structure with inefficient layout.
Normally in a multithreaded program all shared variables should be aligned, which would guarantee atomic loads and stores.
Well, thread safety is exactly about these cases of "well, it's hardly ever a problem".
Real life bugs have come from misapplication of correct parameters for memory barriers, even on x86. Python GIL removes a whole class of potential errors.
Not that I'm against getting rid of the GIL, but I'm more sceptical that it won't trigger bugs.
Though in my opinion python just isn't a good language for large programs for other reasons. But it'd be nice to be able to multithread some 50 line scripts.
Python's integers are arbitrary precision. If large enough, that certainly won't be atomic on any normal CPU. I'm not sure how the arbitrary precision integers work internally, but it's possible they wouldn't be atomic for any value.
"Most" ends up being surprising(ex: ARM has a fairly weak memory model). I've seen a lot of code with aligned access and extensive use of "volatile" in MSVC/x86 explode once it was ported to other architectures.
Older ARM CPUs did not have a well defined memory model, but all 64-bit ARM CPUs have a well defined behavior, which includes the atomicity of any loads and stores whose size is up to 64-bit and which are aligned, i.e. the same as Intel/AMD CPUs.
The current ARM memory model is more relaxed regarding the ordering of loads and stores, but not regarding the atomicity of single loads and stores.
It depends what you mean by 'undefined behavior'. The GIL makes operations atomic on the bytecode instruction level. Critically, this includes loading and storing of objects, meaning that refcounting is atomic. However, this doesn't extend to most other operations, which generally need to pull an object onto the stack, manipulate it, and store it back in separate opcodes.
So with Python concurrency, you can get unpredictable behavior (such as two threads losing values when incrementing a counter), but not undefined behavior in the C sense, such as use-after-free.
No. The L in GIL stands for lock. So only the thread that holds it can write or read from the object, and the behavior is well defined at the C level, because C lock acquire and release operations are defined to be memory barriers.
But when each thread reads the variable, you have no control over which value you see, since you don't control when each thread gets to run. So it's undefined in the sense that you don't know which values you will get: a thread might get the value it wrote, or the value the other thread wrote. The threads might not get the same value either.
The GIL exists to protect the interpreter's internal data, not your application's data. If you access mutable data from more than one thread, you still need to do your own synchronisation.
How is the GIL different from atomics in other languages? There are many cases where atomics are useful.
One example would be incrementing a counter for statistics purposes. If the counter is atomic, and the reader of the value is ok with a slightly out of date value, it's fine. If code is doing this in GIL Python, it's working now, and will break after the GIL is removed.
I know you came to the same conclusion in another comment, but here's a look at it using the `dis` module:

    a += 1

turns into

    LOAD_FAST    a    (load a)
    LOAD_CONST   1    (load the constant 1)
    INPLACE_ADD       (perform the addition)
    STORE_FAST   a    (store the result back into a)

So if the interpreter switches threads between LOAD_FAST and STORE_FAST (i.e. anywhere around the INPLACE_ADD), you could clobber the value another thread wrote to `a`.
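If you want to see the clobbering rather than take the bytecode listing's word for it, a toy like this will usually show it on the 3.9-era interpreters being discussed (newer CPython releases check for thread switches less often, e.g. mainly at backward jumps, so the same toy may come out exact there):

    import threading

    counter = 0

    def work(n=100_000):
        global counter
        for _ in range(n):
            counter += 1    # load, add, store -- a thread switch can land in between

    threads = [threading.Thread(target=work) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)          # 400000 only if no increment was clobbered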
From your other comment:
> so even though an increment on a loaded int is atomic, loading it and storing it aren't
It's always the loads and stores that are the problem.
But that's the problem with relying on the GIL: it has lock in the name, but it does not protect you unless you understand the internals and know what you're doing. It protects the interpreter. This isn't much different from programming in other languages without a GIL: if you understand the internals and what you're doing, you may or may not need locks, because you will know what is and isn't atomic (and even when things are atomic, it's still difficult to write thread-safe code! Lock-free algorithms are much harder than using mutexes).
Thread-safe code requires thinking hard about your code; the GIL does not protect you from that.
Numbers are immutable, and incrementing one, e.g. in an attribute, is actually many bytecode ops. So this doesn't work even currently, unless you are fine with losing updates. But a version of this question using another example (e.g. using a list as a queue) is interesting.
Yep. So in a final answer to the original question of backwards bug-compatibility (https://news.ycombinator.com/item?id=28897534), it seems that it will be retained under the current proposal.
The Python documentation seems misleading to me on this:
> In theory, this means an exact accounting requires an exact understanding of the PVM bytecode implementation. In practice, it means that operations on shared variables of built-in data types (ints, lists, dicts, etc) that “look atomic” really are.
    count = 0

    def inc():
        global count
        count += 1
sure "looks atomic" to me, so according to the documentation should be, but isn't.
On the other hand, I think you could build a horribly inefficient actual atomic counter with a bare list, leaning on append and len.
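Something like the following, assuming (as the docs imply) that append and len on a plain list stay atomic; the horrible inefficiency being that memory grows with every increment:

    _ticks = []

    def incr():
        _ticks.append(None)    # a single append, atomic under the GIL

    def value():
        return len(_ticks)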
I think it's quite likely there's correct and non-horrible code relying on list append and length being atomic. Although it sounds like this might continue working without the GIL:
> A lot of work has gone into the list and dict implementations to make them thread-safe. And so on.
Indeed, the proposal explicitly covers maintaining existing guarantees, which is why I was confused that so many people just assumed it would break them.
My question was about from a program correctness standpoint, not an efficiency standpoint. But it does look there is a difference from a correctness standpoint, as other comments address.
I agree that in principle the GIL is there for the interpreter, but in practice, by globally ensuring all Python objects are in a coherent and safe state for each running thread, it makes many things quite thread-safe. For example, you could use a normal list as a FIFO queue, if you were willing to rely on the current behaviour.
Sure, but you really need to understand the internals to be able to make these assumptions.
For instance, in your example, I know that the call to the C code to do the list operations is atomic (a single bytecode instruction), but I can't assume that all such calls to C code are safe because of this, unless I know for sure that the C code doesn't itself release the GIL. I assume that simple calls like list append/pop wouldn't have any reason to do this, but I can't assume this for any given function/method call that delegates to C, since some calls do release the GIL.
So, with or without GIL, you either really need to understand what's going on under the hood so you can avoid using locks in your code (GIL or atomics-based lock-free programming), or you use locks (mutex, semaphore, condition variables etc). No matter what you do, to write thread safe programs, you need to understand what you're doing and how things are synchronizing and operating. The GIL doesn't remove that need.
Of course, removing the GIL removes one possible implementation option; I just don't believe the GIL really makes it any easier. Once you know enough internals to know what is and isn't safe with the GIL, you could just as easily do your own synchronization.
Not really. If you're doing an atomic write to the same object from two different threads, you're going to have one win the race and the other lose. That may be a bug in your code, but it's not undefined behavior at the language level.
Python doesn't care about these errors (because python's integers are not as simple as 64-bit operations), however if you generalise this error to a "transaction" of two operations (registers in this case), you'd end up with the same ability to see a view of a state that should not be seen by another thread.
AFAIK CPUs implement atomic load and store instructions and the performance overhead of these is very small compared to something like a software busy lock. So I think it's quite possible to take away the GIL while still making it impossible to load only half of a value.
> The GIL might prevent two Python threads executing simultaneously, but it doesn’t change the fact that a Python thread can be preempted, meaning your global state can change at any point during execution without warning.
That thread behavior is enough to reduce the likelihood of races and collisions; particularly if the critical sections are narrow.
That just means the GIL is good at hiding concurrency bugs. It doesn't make writing correct code any easier. Arguably you could say it makes writing correct concurrent code harder, because it'll take significantly longer for concurrency bugs to cause errors.
> Then we need a term for when code race conditions are possible but rare enough that nobody using the software notices. thread-timebomb?
There's already a term for that: not thread-safe.
The definition of thread safety does not include theoretical or practical assessments regarding how frequent a problem can occur. It only assesses whether a specific class of problems is eliminated or not.
>The definition of thread safety does not include theoretical or practical assessments regarding how frequent a problem can occur.
Well, obviously.
The challenge I am putting forth on HN is to meaningfully describe _usable_ thread-unsafe software. If you've spent enough time outside university, you'll be aware that there are all kinds of theoretical race conditions that are not triggered in practical use.
That reminds me how I was called in to fix some Java service which had run successfully in production for 10 years with hardly any incident, but suddenly started crashing hard, all the time. It was of course a thread-safety issue (concurrent non-synchronized access to a HashMap) which lay dormant for 10 years only to wreak havoc later.
Nothing obvious changed (it was still running a decade-old JRE); perhaps it was a kernel security patch, perhaps some RAM was replaced, or even just the runtime data increased/changed in some way which woke up this monster.
Fun fact, I actually do! It's from that perspective I wrote that: every time you perturb the software environment, a new set of bugs that didn't happen in the old env before arises.
That's not useful. If you have a race condition, you will eventually hit it and when you do, you may get incorrect results or corrupt data. Thread unsafe is thread unsafe, regardless how rare it appears to be.
Also, rare on one computer (or today's computer) might not be rare on another (tomorrow's faster one, for example).
These types of bugs are also very hard to detect. You might not know your data is corrupted. Reminds me of how bad calculations in Excel have cost companies billions of dollars, except now the calculations could be "correct" and the error sitting dormant, just waiting for the right timings to happen. Much better not to make assumptions about the safety and to think about it up front: if you are using multiple threads, you need to carefully consider your thread safety.
There is no such thing as probability. All there is is possible and not possible.
I don't know how the point of the comment could be missed, but what I am saying is, it is a mistake, a rookie baby not-a-programmer not even any kind of engineer in any field, to even think in those sorts of terms at all. At least not in the platonic ideal worlds of math or code or protocol or systems design or legal documents, etc.
Physical events have probability that is unavoidable. How fast does the gas burn? "Probably this fast"
There is no excuse for any coder to even utter the word "likely".
The ONLY answers to "Is this operation atomic?" or "Is this function correct?" or "Does this CPU perform division correctly?" are either yes or no. There is no freaking "most of the time."
"Likely" only exists in the realm of user data and where it is explicitly created as part of an algorithm.
There are whole branches of computer science and IT dedicated to reducing the likelihood of unpleasant outcomes: cryptography, security, disaster recovery etc.
You cannot guarantee your public key algorithm is impossible to break, but you can use keys long enough that an attacker has an arbitrarily low chance of success with the best known methods.
You cannot prove your program is bug free, outside of highly specialized fields like aircraft control, but you can build a multi-layered architecture that can reduce the likelihood of successful intrusion. You cannot prevent a EMP bomb from wiping all your hard-drives at once, but you will likely maintain integrity of your database for uncorrelated hardware errors.
"Likely" is a tool that works in the real world. If you will chase mathematic certainty, your competition will likely eat your lunch.
Where you might be correct is that "unlikely" is very close to "likely" in the particular topic of thread safety, you just need a sufficiently large userbase with workloads and environments sufficiently different from your test setup.
I have already eaten my competition's lunch through not being afraid of a little rigor, and through not leaving a wake of shit that only works on good days behind me.
From running the same software on two moderately powerful embedded systems, one single-core and one multi-core, the latter is a lot more reliable in immediately exposing races and concurrency issues.
Is this true? It looks like += compiles to four bytecode instructions: two loads, an increment, and a store. It should be possible for a thread to get paused after the load but before the store, resulting in a stale read and lost write.
I have seen that once in supposedly thread-safe C++ software in an industrial automation application. What happened was that it was somehow relying on the Windows UI messaging system (the message pump) which is single-threaded, and the fact that there were no multi-core CPUs when that stuff was written a long time ago. The latter has the effect that the CPU always sees a consistent view on the memory, no locks, mutexes and barriers needed.
Porting the thing to a multi-core system revealed that there were a lot of nasty concurrency bugs, like dead-locks or crashes which happened after a day of operation. And this wasn't a toy system - it was in use for a long time in an industrial application, and the customer was not too happy about the intermittent dead-locks. I commiserated with the poor engineer who had the quite stressful task to debug this, equipped with a lot of dedication but an insufficient background.
Frankly, while it would be nice to be able to write parallel code in pure Python, I think that Clojure with its purely-functional approach has the better concepts for this. And moreover, actually improving performance by parallel computation (using several CPUs to work in parallel on the same thing) is damn hard and unsolved in many cases (just come up with an efficient parallel Fast Fourier Transform and you might get a Turing award). What is mostly needed (outside of massive data processing pipelines) is concurrency for event-driven systems. Python can handle that, Clojure does handle it in a much more elegant way.
The GIL doesn't really help Python code though, because the interpreter may switch threads between any two opcodes.
It only protects the state of the Python interpreter and that of C/Cython extension modules. Though even there, you can have unexpected thread switches, e.g. in Cython `self.obj = None` can result in a thread switch if the value previously stored in `self.obj` had a `__del__` method implemented in Python.
And AFAIK pretty much any Python object allocation can trigger the cycle collector which can trigger `__del__` on (completely unrelated) objects in reference cycles, so it's pretty much impossible to rely on the GIL to keep any non-trivial code block atomic.
I have read many anti-GIL arguments over the years that approach soundness as optional. Is this change going to make a bunch of previously sound code unsound?
> These changes are major enough that a fair number of existing Python libraries that work directly with Python’s internals (e.g., Cython) would need to be rewritten. But the cadence of Python’s release schedule just means such breaking changes would need to be made in a major point release instead of a minor one.
If this is as promising as it sounds, it seems Python 4 now has its "thing" and is on the horizon. Or at least may become a serious thing to talk about
I began using python during the python3.0 betas, and I watched the 2 vs 3 saga from the (unusual?) perspective of a v3 hobbyist with no back-compat requirements.
What struck me as most significant was the opportunistic breakage of things not related to the unicode transition. In the many years it took to win people over to v3, they could have marched over all the breaking changes a year at a time. Given that side-by-side installs of python3.x point versions are very functional, with or without venvs, this would have been much more palatable. Perhaps harder than it sounds though.
I attempted a couple of 2to3 translations of open source libraries over the years, with varying degrees of success. Every time I found that most of the changes were easy, but debugging the broken bits was hard due to the sheer volume of source changes. If instead I could have done conversions where there was only a single major semantic change at a time, it would be so much easier to figure out what was going wrong at any given step. Furthermore, I imagine that a single-breaking-change mentality would lead to better documentation on how to transition for each version.
For this reason, I have become rather suspicious of yearly release schedules. Swift is even more frustrating: the version changes are really just dictated by Apple's yearly PR calendar. Some big things get rushed out for WWDC before they are ready, and smaller fixes can get held back until the next year. I would much rather that the language teams just prioritize one thing at a time, release it when it is ready, and foster a community where staying up-to-date on the latest version is easy and desirable (a more complicated story for Apple than for Python I think, due to ABI, OS version, etc).
From past discussions on HN I've gathered that there is such a thing as release fatigue, where developers get irritated when libraries release breaking changes too often. Nevertheless I often wonder if languages and libraries could improve faster by making more breaking changes, one at a time, with robust side-by-side installs to facilitate testing across versions. I wish side-by-side library versions were possible in Python, just to facilitate regression testing.
Bringing this all back to the post, I sincerely hope that if Python 4 is a breaking change to the GIL, that it will be only that.
I'm curious what others think about all this. Thoughts?
If every release has a single breaking change, then that language is said to be unstable/not-production ready. IMHO, that's not at all an acceptable way of doing point releases. People will just be scared of new releases. No one will adopt a new language version as soon as it releases. Java never breaks backwards compatibility and still there are people running Java 8. Imagine what would happen if every point release carries breaking changes. It makes you feel that the language is not mature, the library ecosystem broken, since you'll have to keep track of version compatibility for each library that you use. It's a nightmare for both library developers and end-users. Few people would like to use such a language
> If every release has a single breaking change, then that language is said to be unstable/not-production ready. IMHO, that's not at all an acceptable way of doing point releases.
This.
It makes absolutely no sense to claim that having to deal with a single non-backwards compatible release is somehow worse than having to deal with a sequence of non-backwards compatible releases.
Even though the migration from Python 2 to Python 3 faced some resistance, if anything the decision was totally vindicated.
I thought the use case mentioned made sense, that of essentially being able to perform patches in a series, as opposed to trying to fix many breaking changes at once. You know. Iterative development. I think 'makes absolutely no sense' is a little harsh.
O boy, angry rant incoming. I'll say something petulant and overly dramatic, but I don't like the direction in which python is going, and I'm glad there's finally some news about focus on actual innovation instead of tacking on syntactic cruft.
I want the python that Guido promised me, with 2021 performance. I don't want some abhorrent committee-designed piece of middle-of-the-road shitware glue language that I must use because everyone uses it.
I want a language that doesn't spin its single-threaded wheel in a sea of CPU cores, and I want a language that has one obvious way of doing things without needing to grok and parse dumb """clever""" hacks that will only be abused by midlevel programmers to show off how they saved typing a few lines of additional code.
To me, speed + simplicity = ergonomics = joy. I want a new Python 4 to focus exclusively and intensely on performance improvements and ergonomics.
The walrus operator is a tired old trope to hate on, but I don't see the point personally. Same goes for the structural pattern matching thing. The tacking on of typing features feels superfluous in a language that's not compiled or even statically typed.
But for the sake of maximum pedantry, let me paste a nitpicky little detail from a somewhat recent syntactic addition:
    >>> def f(a, b, /, **kwargs):
    ...     print(a, b, kwargs)
    ...
    >>> f(10, 20, a=1, b=2, c=3)
    10 20 {'a': 1, 'b': 2, 'c': 3}
a and b are used in two ways.
Since the parameters to the left of / are not exposed as possible keywords, the parameter names remain available for use in **kwargs
Jesus fucking hell on a tricycle, so now I have *'s and /'s showing up in function signatures so someone can prematurely optimize the re-use of variable names without breaking backwards compatibility?!
Python is becoming a mockery, dying a death through a thousand little cuts to its ergonomics.
I'm sure you're already aware of this example since it's the canonical one, but to me personally the point is very clear: I use regular expressions all the time and always have to write that little bit of boilerplate, which the walrus operator now lets me get rid of.
Avoiding tedious boilerplate by adding nice features like the walrus operator is precisely what lets us avoid "death through a thousand little cuts to its ergonomics", in my view.
Sure, maybe writing
    m = re.match("^foo", s)
    if m is not None:
        ...
isn't so bad, but in that case maybe writing
    i = 0
    while i < len(stuff):
        element = stuff[i]
        ...
        i += 1
wouldn't be so bad, and we could get rid of Python's iterator protocol?
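For comparison, the walrus form of the regex snippet collapses the assignment and the test into one line, something like:

    import re

    s = "foobar"
    if (m := re.match("^foo", s)) is not None:
        print(m.group())    # foo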
I think regex matches might be literally the only use case for := that I come across with any kind of nontrivial frequency, and it's only a minor nuisance at that. Certainly nothing to warrant an entirely new yet different syntax for something we already have.
The iterator protocol is way more general than what you have; it's not remotely comparable.
Regex is just a prominent example of a certain pattern. Depending on the work and style, one often has functions which return something, and in the case of a non-empty result you want to do something more. The walrus can shrink data-intensive code quite a bit, in my experience.
The loop can be written using This One Weird Trick that Walruses Hate:
    from functools import partial

    for foo in iter(partial(data.get, "foo"), None):
        handle_foo(foo)
So I would only use the walrus operator for the first example (the if statement), which even though it is exactly the same as doing it in two steps just feels nicer as a single step.
It should have been “if … as y” and reused existing syntax. I’ve never seen anyone use the extended variant (multiple assignment) that walrus allows. The extra colons with this and typing makes it look like a standard punctuation-heavy language we sought to avoid in the first place.
AFAIK, the purpose of “/” is so that python-implemented functions can be fully signature-(and, therefore, also type-)compatible with builtins and C-implemented functions that required positional arguments but do not accept those arguments being passed as keyword arguments.
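A tiny illustration of that compatibility point; my_len below is just a made-up wrapper, not anything from the stdlib:
def my_len(obj, /):
    # Pure-Python wrapper whose signature mirrors the C builtin.
    return len(obj)

my_len([1, 2, 3])      # fine
len(obj=[1, 2, 3])     # TypeError: len() takes no keyword arguments
my_len(obj=[1, 2, 3])  # TypeError as well, thanks to the "/"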
Not the OP, but some of the late additions to Python were, *in MY very humble opinion*, not very Pythonic; just syntactic sugar that means there is now more than one way of doing things.
On that list: the walrus operator and the new switch thing. If I understand them fully and correctly, those two things don't enable developers to do anything that was impossible before; instead they add new ways to do things that were already possible.
That's the Python I know and love.
Of course, this doesn't mean I'll love Python any less, just that I wish there were more focus on stuff that matters, like the topic of this article. Or maybe on making type hinting better.
IMO, the "one obvious way to do things" has always been a comforting fiction. There are numerous ways to do everything, the worst offenders forcing people to make tradeoffs between debuggability and readability (e.g., for loops versus list comprehensions). Many of them are purely about readability (ternary expression versus if blocks) and many of them are about style (ternary expression versus use of or/and short-circuiting). Even so, before the walrus operator, there was never a way to define a variable that only existed in the scope of a particular if statement.
After using pattern matching in Rust and switch statements in JavaScript, I personally am very excited for that addition to Python, but I understand the feature is divisive and will concede it as a matter of opinion.
Edit: turns out the walrus operator does not cause the variable to move out of scope after the if block, which is disappointing. IMO the worse anti-pattern has already been part of the language, which is not creating new scopes for if statements.
Half of Python has always been syntactic sugar that could also be written one way or another with more primitive code. I mean for-loops, elif, import, the whole OOP machinery; all just redundant syntactic sugar. Pythonic never meant making the syntax less "sweet". Python is about making simple, straightforward code that removes unnecessary friction. And that's exactly what the walrus and switch are doing on their own.
But yes, of course one can also argue that they add friction at the global scale, because it's yet another syntax element to know about, and the benefit looks rather small on the surface. But that's the problem with syntax: it's always a trade-off between overhead and benefit.
I'll point out that the walrus operator was actually accepted while Guido was still BDFL (and the vitriol surrounding the decision to include it led directly to him stepping down from the position [1]), so even granting that it's a poor addition to the language, it does not support the claim that "design by committee" has led to poor language design decisions.
They gave no specific criticisms. This thread was born of a request for specific criticisms. When that happens, I try to operate as though the assumptions laid out in the parents hold for the children. I think this makes sense to do, especially when you appeared to step in as a proxy expanding on the parent's opinion. Even if that wasn't your intention, this is a public thread, and the most relevant place to post things as a response to a sentiment in a thread may not be directly to a person who holds that exact sentiment. If you don't take issue with "design by committee" then you need not be concerned. I don't think you think that, and I think no less of you regardless.
Disagree: the recent changes are things I put to work immediately and in a large fraction of the code. They're not niche and "should have" been added years ago. If anything, I'm thrilled with the work of the "committee," whose judgments are better than the result of any individual. Postgres is the same.
Gone are the days when you invest in a platform like Python and they make crazy decisions that kill the platform's future (e.g. Perl 5). Ignore small syntax stuff like := and focus on the big stuff.
> Disagree: the recent changes are things I put to work immediately and in a large fraction of the code.
That says nothing about their quality. It just says you like them. If you gave me unhealthy food I'd probably eat it immediately too. Doesn't mean I think it's good for me.
> Ignore small syntax stuff like := and focus on the big stuff.
They're not "small" when you immediately start using them in a "large fraction of your code". And a simple syntax that's easy to understand is practically Python's raison d'être. They added constructs with some pretty darn unexpected meanings into what was supposed to be an accessible language, and you want people to ignore them? I would ignore them in a language like C++ (heck, I would ignore syntax complications in C++ to a large degree), but ignoring features that make Python harder to read? To me that's like putting performance-killing features in C++ and asking people to ignore them. It's not that I can't ignore them—it's that that's not the point.
I simply do not understand how the walrus operator is harder to read. Maybe an example?
my_match = regex.match(foo)
if my_match:
    return my_match.groups()
# continues with the now useless my_match in scope
Versus
if my_match := regex.match(foo):
    return my_match.groups()
# continues without useless my_match in scope
How is the second one less readable? Have you ever heard of a real world example of a beginner or literally anyone ever actually expressing confusion over this?
The problem isn't that simple use case. Although even in that case, they already had '=' as an assignment operator, and they could've easily kept it like the majority of other languages do instead of introducing an inconsistency.
The more major problem with the walrus operator is more complicated expressions they made legal with it. Like, could you explain to me why making these legal was a good thing?
def foo():
    return ...

def bar():
    yield ...

while foo() or (w := bar()) < 10:
    # w is in-scope here, but possibly nonexistent!
    # Even in C++ it would at least *exist*!
    print(w)

# The variable is still in-scope here, and still *nonexistent*
# Ditto as above, but even worse outside the loop
print(w := w + 1)
If they just wanted your use case, they could've made only expressions of the form 'if var := val' legal, and maybe the same with 'while', not full-blown assignments in arbitrary expressions, which they had (very wisely) prohibited for decades for the sake of readability. And they would've scoped the variable to the 'if', not made it accessible after the conditional. But nope, they went ahead and just did what '=' does in any language, and to add insult to injury, they didn't even keep the existing syntax when it has exactly the same meaning. And it's not like they even added += and -= and all those along with it (or +:= and -:= because apparently that's their taste) to make it more useful in that direction, if they really felt in-expression assignments were useful, so it's not like you get those benefits either.
While the walrus operator gives a way to see this sort of non-C++ behavior, it's more showing that Python isn't C++ than something special about the operator.
Here's another way to trigger the same NameError, via "global":
import random

def foo():
    return random.randrange(2)

def bar():
    global w
    w = random.randrange(20)
    return w

while foo() or (bar() < 10):
    print(w)
For even more Python-is-not-C++-fun:
import re

def parse_str(s):
    def m(pattern):  # I <3 Perl!
        nonlocal _
        _ = re.match(pattern, s)
        return _ is not None
    if m("Name: (.*)$"):
        return ("name", _[1])
    if m("State: (..) City: (.*)$"):
        return ("city", (_[2], _[1]))
    if m(r"ZIP: (\d{5})(-(\d{4}))?$"):
        return ("zip", _[1] + (_[2] if _[2] else ""))
    return ("Unknown", s)
    del _  # Remove this line and the function isn't valid Python(!)

for line in (
    "Name: Ernest Hemingway",
    "State: FL City: Key West",
    "ZIP: 33040",
):
    print(parse_str(line))
Right, I'm quite well-aware of that, but I'm saying this change has made the situation even worse. If they ensured the variables were scoped and actually initialized it'd have actually been an improvement.
Regarding these two comments:
    # w is in-scope here, but possibly nonexistent!
    # Even in C++ it would at least *exist*!
I don't see how bringing up C++'s semantics is relevant when Python has long raised an UnboundLocalError for similar circumstances.
If I understand you correctly, you believe Python should have introduced scoping so the "w" would be valid only in the if, elif, and else clauses, and not after the 'if' ends.
This would be similar to how the error object works in the 'except' clause:
>>> try:
...     1/0
... except Exception as err:
...     err = "Hello"
...
>>> err
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'err' is not defined
If so, I do not have the experience or insight to say anything meaningful.
In your example, if you leave out the parentheses around w := bar(), you get "SyntaxError: cannot use assignment expressions with operator" which makes me think it's a bug in the interpreter and not intentionally designed to allow it.
I am baffled to learn that it's kept in scope outside of the statement it's assigned, and I agree it would have a negative impact on readability if used outside of the if statement.
> if you leave out the parentheses around w := bar(), you get "SyntaxError: cannot use assignment expressions with operator" which makes me think it's a bug in the interpreter and not intentionally designed to allow it.
No, I'm pretty sure that's intentional. You want the left-hand side of an assignment to be crystal clear, which "foo() or w := bar()" is not. It looks like it's assigning to (foo() or w).
def thing(): return True

if thing() or w := "ok":  # SyntaxError: cannot use assignment expressions with operator
    pass
print(w)

. . .

if thing() or (w := "ok"):
    pass
print(w)  # NameError: name 'w' is not defined
The first error makes me think your concern (that w is conditionally undefined) was anticipated and supposed to be guarded against with the SyntaxError. I believe the fact you can bypass it with parentheses is a bug and not an intentional design decision.
Oh I see, you're looking at it from that angle. But no, it's intentional. Check out PEP 572 [1]:
> The motivation for this special case is twofold. First, it allows us to conveniently capture a "witness" for an any() expression, or a counterexample for all(), for example:
if any((comment := line).startswith('#') for line in lines):
    print("First comment:", comment)
else:
    print("There are no comments")
I have a hard time believing even the authors (let alone you) could tell me with a straight face that that's easy to read. If they really believe that, I... have questions about their experiences.
Your new example makes me wonder: if I can intentionally conditionally bring variables into existence with the walrus operator, what's the motivation behind the SyntaxError in my statement above? I maintain my belief that the real issue here is, readability aside, if blocks do not implement a new scope, which has always been a problem in the language. The walrus operator just gives you new ways to trip over that problem.
From the PEP:
> An assignment expression does not introduce a new scope. In most cases the scope in which the target will be bound is self-explanatory: it is the current scope. If this scope contains a nonlocal or global declaration for the target, the assignment expression honors that. A lambda (being an explicit, if anonymous, function definition) counts as a scope for this purpose.
I find this particularly strange and inconsistent:
lines = ["1"]
[(comment := line).startswith('#') for line in lines]
print(comment) # 1
[x for x in range(3)]
print(x) # NameError: name 'x' is not defined
I'm saying it's the same reason why (x + y = z) should be illegal even if (x + (y = z)) is legal in any language. It's not specific to Python by any means. The target of an assignment needs to be obvious and not confusing. You don't want x + y to look like it's being assigned to.
There are two aspects I have been thinking about while looking at this: Introduction of non-obvious behavior (foot-guns) and readability. Readability is important, but I have been thinking primarily about the foot-gun bits, and you have been emphasizing the readability bits. I can't really accurately assess readability of something until I encounter it in the wild.
If the precedence were higher, then you'd get a situation like
x := 1 if cond else 2
never resulting in x being assigned 2, which is pretty unintuitive.
And you have to realize, even if the precedence works out, nobody is going to remember the full ordering for every language they use. People mostly remember a partial order that they're comfortable with, and the rest they either avoid or look up as needed. Like in C++, I couldn't tell you exactly how (a << b = x ? c : d) groups (though I could make an educated guess), and I don't have any interest in remembering it either.
Ultimately, this isn't about the actual precedence. Even if the precedence was magically "right", it's about readability. It's just not readable to assign to a compound expression, even if the language has perfect precedence.
I know they don't, normally. I really thought that was basically the point of the walrus operator to begin with, that the variable was only in scope for the lifetime of the if statement where it's needed. Huge bummer to find out that's not true.
Scope in Python is normally defined by functions/methods, not blocks. The same happens with for-loops and with-blocks, so this is consistent. And it's good, because it can be very useful. The exception here is try/except blocks, where the caught error is cleaned up after leaving the except block, for reasons.
IMO the real abomination was already present in the language, which is that if blocks do not introduce new scope. My IDE protects me from the bugs this could easily introduce when I try to use a variable that may not yet be in scope, but it should be detected before runtime.
I will readily admit that the walrus operator doesn't do what I thought it did and I have no interest in whatever utility it provides as it exists today.
> IMO the real abomination was already present in the language, which is that if blocks do not introduce new scope.
Definitely. You would think if they're going to undermine decades of their own philosophy, they would instead introduce variable declarations and actually help mitigate some bugs in the process.
I don't know how important this is, but I believe it does make it less readable for "outsiders".
As a non-Python programmer it is usually pretty easy for me to correctly guess what a piece of Python code does. (And once in a while I need to take a look at some Python code).
Walrus operator got me. I tried to guess what it did, but even having simple code examples I could not. My guesses were along the lines of binding versus plain assignment, or some such. None of my guesses were even close. I had to google it to find out (of course I could also read the documentation).
IMO the match statement has some very unintuitive behaviour:
match status:
    case 404:
        return "Not found"

not_found = 404
match status:
    case not_found:
        return "Not found"
The first checks for equality (`status == 404`) and the second performs an assignment (`not_found = status`).
`not_found` behaving differently from the literal `404` breaks an important principle: “if you see an undocumented constant, you can always name it without changing the code’s meaning” [0].
Actually, I don’t really want the feature. It’s complicated and it doesn’t really fit with the rest of the language (it breaks fundamental rules, as above, and has scoping issues).
Worst of all, though, it’s really just another way to write if-elif-else chains that call `isinstance` - a pattern which Python traditionally discouraged in favour of duck-typing.
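A dotted name does behave as a value pattern, for the record; here's a minimal sketch of the contrast, using a made-up Status enum (not anything from the discussion above):
import enum

class Status(enum.IntEnum):
    NOT_FOUND = 404

def describe(status):
    match status:
        case Status.NOT_FOUND:  # dotted name: a value pattern, compared with ==
            return "Not found"
        case other:             # bare name: a capture pattern, binds other = status
            return f"Unhandled: {other}"

print(describe(404))  # Not found
print(describe(200))  # Unhandled: 200
That workaround exists, but it doesn't change the underlying complaint: naming a bare constant still silently flips the case from a comparison to an assignment.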
Do you not like the idea of pattern matching as a feature or do you not like the implementation details? This kind of seems like another clumsy scoping problem, no?
I would love a good pattern matching feature, but this is not it. And this is a seriously broken design at a fundamental level, not an "implementation detail". I actually have no clue how it's implemented and couldn't care less honestly. I just know it's incredibly dangerous for the user to actually use, and incredibly unintuitive on its face. It's as front-and-center as a design decision could possibly be, I think.
And no, this is not really a scoping issue. Match is literally writing to a variable in one pattern but not the other. A conditional write is just a plain inconsistency.
The sad part is both of these features are stumbling over the fact that Python doesn't have variable declarations/initialization. If they'd only introduced a different syntax for initializations, both of these could have been much clearer.
> I actually have no clue how it's implemented and couldn't care less honestly.
I guess I'm not sure where "design" ends and "implementation" begins? To me, how to handle matching on variables that already exists is both, because "pattern matching and destructuring" are the features and how that must work in the context of the actual language is "implementation". It being written in a design doc and having real world consequences in the resulting code doesn't make it not part of the implementation.
Instead of quibbling over terms, I was much more interested in whether you like the idea of pattern matching.
I think not liking the final form a feature takes in the language is fundamentally different from wholesale disliking the direction the language design is going.
Design is the thing the client sees, implementation is the stuff they don't see. In this case the user is the one using match expressions. And they're seeing variables mutate inconsistently. It's practically impossible for a user not to see this, even if they wanted to. Calling that an implementation detail is like calling your car's steering wheel an implementation detail.
But I mean, you can call it that if you prefer. It's just as terrible and inexcusable regardless of its name. And yes, as I mentioned, I would have loved to have a good pattern matching system, but so far the "direction" they're going is actively damaging the language by introducing more pitfalls instead of fixing the existing ones (scopes, declarations, etc.). Just because pattern matching in the abstract could be a feature, that doesn't mean they're going in a good direction by implementing it in a broken way.
I guess like they say, the road to hell is paved with good intentions.
> Design is the thing the client sees, implementation is the stuff they don't see.
By this definition, bugs and other unintended consequences that the user encounters are "Design".
> Calling that an implementation detail is like calling your car's steering wheel an implementation detail.
Yes, if there weren't so many important decisions behind the outcome of a car being steered with a steering wheel, it could be a steering handle, or a steering joystick, or just about anything else that allows you to orient the front wheels of the car. The same is true of the pedals on the floor. Those could be implemented as controls on the steering wheel instead. Whether it's an implementation detail depends on the specificity of the feature in question. When I asked you about an "implementation detail", it was scoped to "the feature is pattern matching" (can I steer the car?) and you scoped it to "the feature is pattern matching without overwriting variables conditionally in surprising ways" (can I steer the car with failure modes that aren't fatal?).
Yes, I am definitely now quibbling over terms. I'm not sure what an appropriate response would have been? Just silence? You responded rather uncharitably and I didn't like it. I felt the need to defend my position.
Not silence, just continuing whatever your underlying point was if your goal wasn't to quibble over semantics. Now I have no idea what you're referring to, but this is clearly getting personal, so let's just leave this here. I think at this point we're both clear on the concrete problems with these features and what our positions are.
> If instead I could have done conversions where there was only a single major semantic change at a time,
That was the point of the "from __future__" imports. You could get most of the way toward Python 3 so that 2to3 would be easier to work with and the new semantics could be gradually baked into the code prior to migration.
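Roughly the pattern, for anyone who missed that era (the specific feature list below is just illustrative):
# At the top of a Python 2 module, opting in to Python 3 semantics piecemeal:
from __future__ import print_function, division, unicode_literals

print(7 / 2)   # 3.5 rather than 3, thanks to "division"
print("text")  # a unicode string under "unicode_literals"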
Python 3 had 25 years of cruft to clean up. They won't have to do that again.
Idealists and free developers (who have created the majority of the Python interpreter) agree with you.
Corporate developers, who have taken over Python and other people's work, like unnecessary changes, because they get many billable hours of seemingly complex work that can be done on autopilot.
Corporations might even take over more C extensions whose developers are no longer willing to put up with the churn and who have moved to C++ or Java.
In the long run, this is bad for Python. But many developers want to milk the snake until their retirement and don't care what happens afterwards.
"Corporate developers, who have taken over Python and other people's work, like unnecessary changes, because they get many billable hours of seemingly complex work that can be done on autopilot."
In my 20+ year career I have never worked with a programmer that matches this description.
I'd wonder if it would be easier to introduce a totally new API along the lines of ruby's ractor API[1] that enables thread parallelism while keeping existing Thread behavior identical as with the GIL. Tons of python code relies on threaded code that is thread-safe under the GIL, but would completely blow up if the GIL was naively replaced.
Yeah, that's what I thought. I think the greatest barrier now is that most multithreaded Python code is just barely thread-safe, even with the GIL. I occasionally have to remind colleagues that even though the GIL makes individual bytecode instructions atomic, you still need mutexes and other synchronization primitives to ensure there is no race condition across multiple instructions. I'd imagine this change would be an optional interpreter feature initially, since removing the GIL would break the vast majority of multithreaded code out in the wild, and it would be much more difficult to create an automated conversion tool like they did for the syntactic changes between 2.7 and 3.
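A minimal sketch of that reminder (names are made up): the read-modify-write below spans several bytecode instructions, so the language gives no atomicity guarantee even with the GIL, and the lock is what actually protects it.
import threading

counter = 0
lock = threading.Lock()

def increment_unsafe(n):
    global counter
    for _ in range(n):
        counter += 1  # load, add, store: not guaranteed atomic, GIL or no GIL

def increment_safe(n):
    global counter
    for _ in range(n):
        with lock:  # the mutex covers the whole multi-instruction sequence
            counter += 1

threads = [threading.Thread(target=increment_safe, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # reliably 400000 with the lock; the unsafe version may lose updates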
Cython itself should be a relatively simple fix (relative to the difficulty that Cython devs are accustomed to). Libraries that use Cython in a pure way (that is, not fussing with refcounts in hand-written C code) should "just work" after Cython gets updated. It's the poor folk who have done straight C extensions without the benefit of Cython that I'm concerned about.
A simple solution would be to introduce two new types: ConcurrentThread and ParallelThread. Alias the old Thread to ConcurrentThread and keep its behaviour. No breaking changes, and the difference is easy to explain. People who need it can use the new, truly parallel version.
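Just to make the shape of that proposal concrete (these class names come from the comment above and are not a real CPython API):
import threading

class ConcurrentThread(threading.Thread):
    """Today's behaviour: concurrency under the GIL, no true parallelism."""

class ParallelThread(threading.Thread):
    """Hypothetical: would run truly in parallel on a no-GIL interpreter."""

# Existing code keeps its old semantics via the alias:
Thread = ConcurrentThread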
The story of losing the GIL is very popular in the news, and I like it too!
.. but let's not count our chickens before they hatch. I'm wondering whether the Python dev community will take on this challenge. I hope so; Sam seems to have really put in a lot of effort!
In my opinion, there's a time in a language's life when it should slow down the pace of "innovation". Code bases are complex things, and constantly updating and upgrading them just to keep up with the language may be counterproductive.
Python is there now, if you ask me. It should slow down and focus more on "maintenance" work with little to no impact on its interface. And maybe work on big projects like multithreading or stronger typing in the background, and ship them when they're fully ready.
The gradual typing available now seems suitable. I have written plenty of code in typed contexts and plenty without. Python's "consenting adults" approach seems a win.
Perhaps, without the GIL and with typing information included, additional performance gains will be on offer.
But the "have it your way" nature of Python is a bigger win than either end of the data typing spectrum.
We won't get data race free guarantees but if built into pandas or Vaex we can have a near transparent API.
It'll really open things up for those apps running on 32 core machines. They're out there, I deploy these things frequently (Plotly Dash framework for large Enterprise customers).
Benchmarks show a 19.4x improvement going to 20 threads, an almost linear speedup. That's pretty amazing; in Java I feel I never manage to achieve linear speedup past a few threads. How are they managing the overhead?
Amdahl's law? It depends on the code. If you throw 100 cores at half of your code, that part may get 100x faster. But if the other half is single threaded you'll only see overall performance double.
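That back-of-the-envelope math as a one-line helper (just Amdahl's formula, nothing specific to the benchmarks in the article):
def amdahl_speedup(parallel_fraction, workers):
    # Overall speedup when only parallel_fraction of the work scales across workers.
    return 1 / ((1 - parallel_fraction) + parallel_fraction / workers)

print(amdahl_speedup(0.50, 100))  # ~1.98: 100 cores on half the code barely doubles throughput
print(amdahl_speedup(0.95, 20))   # ~10.3: even 95%-parallel code is well short of 20x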
I use Ruby and not Python, but I think both have a lot of the same benefits and weaknesses.
IMO, removing the GIL is a major mistake. The GIL is what allows you easy concurrency and to keep the language's 'magic' while ensuring correctness. If you need parallelism, there's processes and probably other tactics (I'm not super up to date on Python things). If you simply remove the GIL you have a bunch of race conditions, so you need a bunch of new language constructs, and it just adds a bunch of complexity to solve problems that don't really need solving.
IMO they should just do what Ruby did with Ractors; basically a cheap alternative to spawning more processes. Rewriting absolutely everything that uses threads to be thread-safe is a waste of time.
It's already easy to write race conditions in Python.
if x in d:
    del d[x]
else:
    d[x] = True
Is a classic example -- if two threads execute that, you can't predict the outcome (but a KeyError is quite likely)
The GIL only protects the CPython virtual machine; it doesn't protect user code. Concurrent code with shared mutable state already needs explicit mutexes.
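A runnable version of that classic, for the curious (whether a given run actually hits the KeyError depends on timing):
import threading

d = {}

def toggle(key, iterations):
    for _ in range(iterations):
        # The membership test and the del/insert are separate operations;
        # the GIL can hand control to another thread between them.
        if key in d:
            del d[key]
        else:
            d[key] = True

threads = [threading.Thread(target=toggle, args=("x", 100_000)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Some runs finish cleanly; others die with KeyError: 'x' in one of the threads.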
What thread safe code can I write with the GIL that will have a race without it?
I already have to be careful to only write to a shared object from one thread, since I have no guarantees on order of execution.
The main benefit of the GIL, from my recent reading, is that it makes reference counting fast and thread-safe. The meat of the proposal is changing reference counting so that it stays atomic, and almost as fast, without the GIL.
What about setting a simple boolean flag, e.g. setting "cancelled = True" in the UI thread to cancel an operation in a background thread?
In Java you would have to worry about safe publication to make the change visible to the other thread, but thanks to the GIL changes in Python are always (I think?) made visible to other threads.
Changes are always eventually made visible to other threads in all modern languages; what's uncertain is the timing of thread wakeup and the possibility that a thread may not flush a cache. The GIL doesn't magically solve race conditions, and in Java and other such languages you usually have a keyword like volatile that makes flushes explicit. In any case, I very much doubt the GIL work would affect Python's cache coherency model.
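The belt-and-braces version of that flag pattern is threading.Event, which makes the cross-thread visibility explicit instead of leaning on CPython's behaviour. A minimal sketch (the worker body is made up):
import threading
import time

cancelled = threading.Event()

def background_work():
    while not cancelled.is_set():
        time.sleep(0.1)  # stand-in for one chunk of work
    print("worker saw the cancellation")

worker = threading.Thread(target=background_work)
worker.start()

time.sleep(0.5)
cancelled.set()  # the "UI thread" flips the flag; the Event makes visibility explicit
worker.join()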
The GIL is not a feature but a design problem with both Ruby and Python. Implementations of both without a GIL have existed for quite long (e.g. jvm based implementations like jython and jruby). It's fine; not having a GIL is an enabler. E.g. being able to run ruby on rails on an application server with threads for each connection used to be a popular thing. I've migrated a few ruby things to jruby. It's shockingly easy. Mostly stuff just works.
The nice thing about everything being single threaded is that nothing will break if you remove the GIL. It will still be single threaded. It's only when you actively start using multiple threads that things might break. So, you won't have race conditions until you do that and then only if you do things that you shouldn't be doing like sharing things across threads that you should not be sharing because they aren't thread safe.
Removing the GIL will simply enable people to start gradually fixing things and give them the option to use threads instead of forcing them to use completely different languages.
Ractor style asynchronous programming might be a good idea for python as well. One does not exclude the other.
No new race conditions are expected, as this work replaces the GIL with finer-grained, performant locks. It still lowers single-threaded performance somewhat compared to the GIL build (about 10 percent), but a 10 percent drop in single-threaded performance is worth considering in exchange for real multithreading.
The race conditions in a GIL program and a no-GIL program should be the same. The GIL is not the only way to keep certain operations safe.
It is a trade-off: pure single-threaded code gets somewhat worse in exchange for much better multithreaded code. And the current sentiment on the python-dev mailing list looks positive. Previous attempts at GIL removal had a much bigger drop in single-threaded performance.
> Multithreaded performance, on some benchmarks, scales almost linearly with each new thread in the best case—e.g., when using 20 threads, an 18.1× speedup on one benchmark and a 19.8× speedup on another.
Interesting. When I write Python, and I write Python most of the time, I'm not chasing performance. But when it comes to speed, an 18x speedup gets Python up to par with currently much faster languages like Java. At least if you are willing to spam threads like your life depends on it.
I do not understand the significance of this. If I want serious number-crunching multi-threaded performance in Python, I execute it on the GPU?
conda install numba and conda install cudatoolkit come to mind?
And you should go data-oriented, so all that remains of objects is arrays of structures with the index being the object identifier. I'm honestly puzzled about the use case.
But what would be nice right now is a map-reduce API without shared data, or with read-only data only.
Something like ProcessPoolExecutor, but instead of spawning new processes and pickling input/output data, it would create GIL-free threads with no pickling of input/output data.
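A sketch of that wish, under the assumption that a no-GIL build would let the thread version actually run in parallel (crunch and the chunking below are made up):
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def crunch(chunk):
    return sum(x * x for x in chunk)

def main():
    chunks = [range(i * 1_000_000, (i + 1) * 1_000_000) for i in range(8)]

    # Today: true parallelism means processes, and inputs/outputs cross a pickle boundary.
    with ProcessPoolExecutor() as pool:
        print(sum(pool.map(crunch, chunks)))

    # The wish: the same map/reduce over threads, sharing read-only data with no pickling,
    # which only pays off once the GIL no longer serializes the workers.
    with ThreadPoolExecutor() as pool:
        print(sum(pool.map(crunch, chunks)))

if __name__ == "__main__":
    main()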
Knowledge of the GIL forces a lot of python developers to not try to write multithreaded scripts. Doing away with it will make life harder for many folks, IMO. Have you tried explaining multithreading to some python scripters?
I'm a random commenter with a chip on his shoulder.
<grumble> Please escape Discord links so people don't accidentally click a live one and thereby expose data to Hammer & Chisel, Inc. </grumble>
It's a constant slap-in-the-face how prevalent Discord is in the tech community. Need support with an obscure library? Discord. Want to talk to contributors for a project? Discord.
I honestly don't know why people so greatly prefer Discord to IRC. The interface? Network effects? Whatever it is, Discord and H&C are disgusting, and I pray the tech community finds a way to escape it.
Fill you in on what? I'd be happy to, I'm just missing context.
Also, it's so disheartening to see people tout Discord's interface as its 'killer feature.' As far as I'm concerned, if I can't mold it into a decent TUI for my terminals, it's okay at best. If an app actively resists its interface being molded, however, that's just evil.
Why I Hate Discord (A Manifesto) [Without Sources]
Discord is aggressively proprietary.
People ought to own their data, but Discord's architecture ensures everything gets hoovered into the mother-ship. At first, this was for nothing more than to facilitate their client-server communications model; recently, all of the hoovered data gets submitted to their AI moderation platform. (searched for sources on this but it was very hard to find anything. I remember talk about this ~c. Nov 2020, might be wrong)
I should be able to modify an interface to my tastes. Modifying Discord is explicitly against their ToS, including their interface; attempting to do so will lead to a ban. Don't like their painted whore? Prefer to chat from your bespoke terminal? GTFO, Hammer & Chisel knows what's best for We Peons.
Addenda: I find Hammer & Chisel developers and Discord admins to be disgusting. This is hearsay & personal opinion from my time hanging around the developer chats, but they all came across as nasty people. I, personally, believe many of the news reports surrounding the grooming controversies; searching "Discord admin controversy," "Discord allthefoxes controversy," "Discord cub policy," &c. turn up some relevant articles. Most of these articles are from low-quality reporting shops, but I buy in to the narrative.
Here's a use case: I was training a neural net and wanted to do some preprocessing (similar to image resizing, but without an existing C function). Inputs are batched, so the preprocessing is trivially parallelizable. I tried to multithread it in Python and got no speedup at all.
That was a really sad moment, and I've never felt good about python since.
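What that usually looks like, reduced to a toy; the preprocessing body here is a made-up stand-in for the real per-batch work:
import time
from concurrent.futures import ThreadPoolExecutor

def preprocess(batch):
    # Pure-Python, CPU-bound work: nothing here releases the GIL.
    return [sum((x * 3) % 7 for x in sample) for sample in batch]

batches = [[range(20_000)] * 8 for _ in range(32)]

start = time.perf_counter()
for batch in batches:
    preprocess(batch)
print("serial :", time.perf_counter() - start)

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(preprocess, batches))
print("threads:", time.perf_counter() - start)  # about the same, or worse, under the GIL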
Except for embarrassingly parallel problems, the trade-off of generating more parallelism is usually needing finer-grained communication. Canonical examples in the literature are matrix multiplication, triangulation, and refinement.
Large shared state is basically always the answer. You can cop out and say "use a database or Redis" if that's fast enough, but that's just making someone else use many threads with shared memory.
MySQL comes to mind. Unlike Postgres where connections are expensive MySQL encourages loading up the server with hundreds of simultaneous connections per server.
Like anything, it’s possible to split this work out to a separate process but the IPC overhead is a lot.
Whether it matters depends on the workload: for some it matters a ton, for some not a bit. And if you run one process per core, you can get some benefits from co-location and similar optimizations the OS can do for you.
I’m not saying performance is not different between these types of solutions or the dev work is not sometimes different.
I’m saying that the penalties (including development effort) for work co-ordinated between processes compared to threads can vary from nearly zero (sometimes even net better) to terrible depending on the nature of the workload. Threads are very convenient programmatically, but also have some serious drawbacks in certain scenarios.
For instance, fork()'d sub-processes do a good job of avoiding segfaulting/crashing/deadlocking everything all at once if there is a memory issue or other problem while doing the work. It's also very difficult in native threading models to do per-work resource management or quotas (like maximum memory footprint, max number of open sockets, or max number of open files), since everything is grouped together at the OS level and it's a lot of work to reinvent that yourself (which you'd need to do with threads). Also, the same shared-memory convenience can cause some pretty crazy memory corruption in otherwise isolated areas of your system if you're not careful, which is not possible with separate processes.
I do wish the Python concurrency model was better. Even with its warts, it is still possible to do a LOT with it in a performant way. Some workloads are definitely not worth the trouble right now, however.
Last I did this, when the processes were fork()s of the parent (the typical way this was done), memory overhead was minimal compared to threads.
A couple percent. That was somewhat workload-dependent, however; if there is a lot of memory churn or data marshalling/unmarshalling happening as part of the workload, they'll quickly diverge and you'll burn a ton of CPU in the process.
Typical ways around that include mmap'ing things or various kinds of shared-memory IPC, but that is a lot of work.
Also, generally no one spins up distinct new processes for the 'coordinated distinct-process work queue' when they can just fork(), which should be way faster; pretty much every platform uses copy-on-write for this, so it also has minimal memory overhead (at least initially).
The problem is (perhaps amusingly) with refcounting. As the processes run, they'll each be doing refcount operations on the same module/class/function/etc objects which causes the memory to be unshared.
Only where there is memory churn. If you’re in a tight processing loop (checksumming a file? Reading data in and computing something from it?) then the majority of objects are never referenced or dereferenced from the baseline.
Also, since the copy on write generally is memory page by memory page, even if you were doing a lot of that, if most of those ref counts are in a small number of pages, it’s not likely to really change much.
It would be good to get real numbers here of course. I couldn’t find anyone obviously complaining about obvious issues with it in Python after a cursory search though.
Interactive use of Python: plotting, working with data, also would benefit from better multithreading. It's interactive, so it's (a bit) frustrating to wait for it to compute and see that it uses just one thread (the statistics ops are usually well threaded already, but plotting is not).
I have seen and written Python code that spawns various threads with shared mutable state. Is it possible that some day the same code would run in parallel? That could be a terrible (very) breaking change. I'm not against allowing in-process parallel execution but please let it require a new API.
Translation: I have written buggy, racy software that has specific dependencies on thread timing. Please do not make significant improvements to Python because it will reveal these bugs in my software, and I will be forced to fix the bugs and use proper synchronization.
If you make something that works because of an explicit memory and concurrency model (and it's not like there were other options at the time), it is indeed legitimate to worry about a major shift in those models that would cause problems.
Even if those changes are better for other ways of solving problems.
Is that how you call things that have been working flawlessly and solving people's problems for over 10 years? Is needlessly breaking things that work an improvement to you?
You could probably simply lock the Python version you use for such code. No breakage there. If you must upgrade to a newer Python version, then you will have to repair broken code.
It did buy a decade or so (or more, really); it's not like the Python 2 distribution you downloaded and shipped with your program back then is going to get tracked down and shot in the head by Guido anytime soon.
If you’re relying on whatever python version is distributed with whatever machine it happens to be on, there are a huge number of problems you’re already going to have.
It's concurrent, not parallel. A thread switch won't happen inside the execution of one opcode (which covers some dictionary update operations), so it's safe in many cases where parallel execution isn't.
Yes, single-instruction operations would be fine, but if you're writing multithreaded code you are probably doing things that the GIL doesn't protect all the time. Like dict-updates on classes that implement __set__, or `if not a[x]: a[x] = y` sorts of two-phased checks, or just like, anything else. You can't get very far with global state without reckoning with concurrency, GIL or not.
I assume that a change to relax the GIL will both allow you to opt-out of it, and allow you to use locking versions of primitive data-structures, anyway; it's not like it's going to just vanish overnight with no guardrails.
It is probably a bad practice to not acquire a mutex for that concurrent dictionary update. The code should be improved in that regard, with or without any potential Python language change.
If performance isn't hugely important you could make blanket-locking wrappers around common data structures and swap them in-place for all of your global state.
.. but, as I said, removing the GIL will almost certainly be opt-in.
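A sketch of such a blanket-locking wrapper (LockedDict is made up, not a stdlib type, and a real one would need to wrap many more methods):
import threading

class LockedDict:
    def __init__(self, *args, **kwargs):
        self._lock = threading.Lock()
        self._data = dict(*args, **kwargs)

    def __getitem__(self, key):
        with self._lock:
            return self._data[key]

    def __setitem__(self, key, value):
        with self._lock:
            self._data[key] = value

    def __delitem__(self, key):
        with self._lock:
            del self._data[key]

    def setdefault(self, key, default=None):
        # One lock around the whole check-then-act, unlike a bare `if key in d: ...`
        with self._lock:
            return self._data.setdefault(key, default)

state = LockedDict()
state["jobs_done"] = 0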
It really is, they don't make it clear at all. Every time I have to ask the question of "is this atomic under the GIL" I struggle to find the right answer.
That was my first thought as well. I use Python to whip up quick scripts and enjoy not having to worry about shared memory, even when I'm using concurrency. I'd hate to lose that.
The difference is that it is so much easier, by orders of magnitude, to write code that gets shit done in python than any native language I'm aware of.
I did ask a question. The second part is me being belligerent but the question was sincere.
I think Python is shit, but so is most everything else; I'm interested in whether people jump ship or just work around its issues at scale. I work on a programming language designed to avoid messy Python scripts internally, so I am sincerely interested in these decisions.
The question is unanswerable. Python started as a scripting and prototyping language and that hasn't and won't completely change. It's fantastic at what it does from that perspective, late additions of complexity notwithstanding.