Funny, the pathlib functionality is really similar to a Saturday-morning hack that I was playing with a few months ago.
Pathlib, asyncio, unicode default, bundled pip and venv...
I do like the look of Python3. It's looking a lot more interesting, the further along it gets. I still haven't really done anything with it, due to python2 being bundled with our OSX and Linux workstations, which means painless installs. This year, however, I do want to start porting my projects over to Python3 (well, making them run on both, at least...)
What surprised me is that after a long time with string-based path libs, Java and Python are now moving to OO-based ones. I wonder how many other languages already had abstractions for this (Smalltalk? Ruby?).
What I'm glad about is that they apparently did a good job with the conversion, rather than making it OO just for the sake of it and ending up with a crappy API excused with "but it's OO".
"Note This module has been included in the standard library on a provisional basis. Backwards incompatible changes (up to and including removal of the package) may occur if deemed necessary by the core developers."
Most new modules in the standard library come with that warning. It's pragmatic.
Python has a relatively large standard library (one of its selling points), and not everything in the standard library can be right the first time. No matter how long a Python release spends in beta, there are probably flaws that will only be discovered when people start using new modules to do their actual jobs (where they won't be using a beta version of Python).
In Python 1 and 2, if a standard library module had a flaw that required a backwards-incompatible change to fix, the flaw would have to remain there. In Python 3 they acknowledge that the first version may require backwards-incompatible changes.
As far as I know it's always a string. You just have a bunch of helper methods to handle "path strings". But they're still strings, not Path objects (or File, like in Java).
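To illustrate, a quick sketch contrasting the two styles (the file paths here are made up):

    import os.path
    from pathlib import Path

    # string-based: every helper takes and returns plain strings
    cfg = os.path.join(os.path.expanduser("~"), "project", "settings.cfg")
    print(os.path.splitext(cfg)[1])  # '.cfg'

    # object-based: a Path object with methods and the / operator
    cfg = Path(os.path.expanduser("~")) / "project" / "settings.cfg"
    print(cfg.suffix)  # '.cfg'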
It makes me slightly sad that the median routine just sorts the data, and looks at the middle. I was hoping for quickselect, which uses 4N comparisons in the average case.
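For reference, a minimal quickselect sketch (my own toy version, not stdlib code):

    import random

    def quickselect(data, k):
        """Return the k-th smallest element (0-indexed), expected O(n)."""
        data = list(data)
        while True:
            if len(data) == 1:
                return data[0]
            pivot = random.choice(data)
            lows = [x for x in data if x < pivot]
            pivots = [x for x in data if x == pivot]
            if k < len(lows):
                data = lows
            elif k < len(lows) + len(pivots):
                return pivot
            else:
                k -= len(lows) + len(pivots)
                data = [x for x in data if x > pivot]

    print(quickselect([5, 1, 4, 2, 3], 2))  # 3, the median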
Or maybe Blum, Floyd, Pratt, Rivest, and Tarjan if you're being fancy, which is 24N comparisons in the worst case (but more than 4N in the average case).
A few months ago I wrote a C implementation of a lazy "sorted" function, (https://github.com/naftaliharris/lazysort). One of the things you can use it for is computing medians, with the same "sort-then-take-the-middle-element" algorithm that the statistics module uses. Internally, however, what will happen is that the lazy data-structure that results from a call to the lazy "sorted" function will find the middle element with quickselect.
For lists of size 10M, I got a 7.3x speedup vs using the sorted function, and a 1.6x speedup vs using numpy.median. The sorted function was faster up until lists of size 100, presumably due to lower overhead. (The lazily-sorted list data-structure also keeps track of the pivots resulting from partitions, so that later method calls run faster by exploiting the partially-sorted structure that remains from earlier method calls).
Here's a plot of the time to compute the median for these different algorithms, (taken from a paper I wrote that's under review): http://imgur.com/oX1QLnS
I haven't got time at the moment, but I would consider contributing an implementation in the long run, if no one else does by then. It just shouldn't be O(n log(n)) when it doesn't have to be.
I've just spent the morning implementing various "faster" median algorithms, and it seems that constant factors, function call overhead (for recursive quickselects), etc make all alternative approaches quite slow. The sort-then-select approach is pretty damn fast, especially since I also think / am pretty sure sorted() is implemented in C...
So far I've tried median-of-medians, recursive quickselect, iterative quickselect tracking indices and partitioning in place, and heapq.nlargest. The implementation in Python 3.4.0 is both cleaner/easier to read and faster than anything I can make by an order of magnitude for 10,000 element lists. I'm sure someone else here can do better than me, but (s)he'd have a hard time beating CPython, imo.
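For comparison, the sort-then-select approach is essentially just this (a paraphrase of the idea, not the exact stdlib source):

    def median(data):
        data = sorted(data)  # Timsort, implemented in C
        n = len(data)
        if n % 2 == 1:
            return data[n // 2]
        i = n // 2
        return (data[i - 1] + data[i]) / 2  # average the two middle values

    print(median([1, 3, 5, 7]))  # 4.0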
The reason it's used for non-primitives is that it's optimized for sorting lists where comparison is relatively expensive, such as the dereferencing Python does on every item and Java does for non-primitive items.
Thanks for putting the effort in and testing it! Very interesting results.
That's very interesting. I used a C implementation of quickselect in the past that was way faster than sorting. That said, my data was also larger than 10,000 elements.
FWIW, here is some old timing data on a 2008 era Mac:
    approx timing on my Mac:

    n           time (ms)
    10          <1
    100         <1
    1000        <1
    10000       <1
    100000      30
    1000000     230
    10000000    2530
    100000000   33000
You are probably right that pure python will have trouble competing with a C sort, except for quite large N.
This might be a stupid question, but would you know where the source for the statistics module can be found? I would be quite interested in submitting a patch, but I can't seem to find the source.
Tangential observation: most implementations of quickselect to find the median I've seen use 6N comparisons in the average case; for an even length list they select the middle values using two independent quickselects. The code to do it in a single quickselect gets hairy quite quickly.
Or maybe Introselect, which does Quickselect, but switches to Median-of-Medians (i.e., BFPRT) if the recursion depth gets too big. It has the average-case performance of Quickselect, but, like M-of-M, requires only O(n) comparisons when given a list of length n.
The whole statistics module seems very minimal to me, I was hoping for it to include distributions and random number generators etc. but it only includes a few very basic functions, and not even the best implementation of them. Hopefully it will expand in future releases, but I don't really see the point of including it with such limited functionality.
People who need distributions and more advanced features probably want numpy anyway, which already has all of that. This just covers the very basics that people might run into while doing 'normal' non-statistical programming.
Asyncio sounds promising, does anyone with more knowledge know whether it comes close to Go channels?
Go channels are the only experience I have with concurrent programming, but once I understood them it quickly became second nature to build everything with channels. Can things like websockets now be used as cleanly as in Go?
I've been using asyncio over the last week, and I've found it remarkably clean and intuitive. It's difficult to compare to channels in Go, because it's such a different solution to the same problem - that of structuring concurrent programs.
They aren't like channels. If you want something like channels, try greenlets for CPython, Stackless Python (a modified version of CPython), or PyPy.
I suspect that you could easily implement channels on top of asyncio - goroutines = coroutines, hide the yield-from inside 'send', select = asyncio.wait(return_when=FIRST_COMPLETED).
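A rough sketch of that mapping, using 3.4-era syntax (@asyncio.coroutine and yield from); the Channel class and all the names here are mine, purely for illustration:

    import asyncio

    class Channel:
        """A toy Go-style channel built on asyncio.Queue."""
        def __init__(self, maxsize=0):
            self._queue = asyncio.Queue(maxsize)

        @asyncio.coroutine
        def send(self, value):  # like `ch <- value` in Go
            yield from self._queue.put(value)

        @asyncio.coroutine
        def recv(self):  # like `<-ch` in Go
            return (yield from self._queue.get())

    @asyncio.coroutine
    def producer(ch):
        for i in range(3):
            yield from ch.send(i)

    @asyncio.coroutine
    def consumer(ch):
        for _ in range(3):
            print((yield from ch.recv()))

    ch = Channel()
    loop = asyncio.get_event_loop()
    loop.run_until_complete(asyncio.gather(producer(ch), consumer(ch)))
    loop.close()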
If I understand the PEP right, this explanation is missing the part where it's only useful to extend abstract classes; python classes are open so you can just add a method to a concrete class to get single-dispatch if that's all you need. I don't think the example given needs generics at all.
Nice! That's something I like about Haskell and Elixir: the ability to define behavior for a given case. Though it looks like that's constrained to type-based dispatch (a la Java method overloading). Is there any way to define for specific values?
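For context, here's roughly what functools.singledispatch looks like in use; as far as I know it dispatches on the type of the first argument only, not on specific values:

    from functools import singledispatch

    @singledispatch
    def describe(obj):  # the fallback for unregistered types
        return "something else"

    @describe.register(int)
    def _(obj):
        return "an int: %d" % obj

    @describe.register(list)
    def _(obj):
        return "a list of %d items" % len(obj)

    print(describe(42))      # an int: 42
    print(describe([1, 2]))  # a list of 2 items
    print(describe("hi"))    # something else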
This, unicode, and exceptions are IMHO good enough reasons to start thinking about upgrading. You might say that pip is a non-issue because everybody who develops in Python has it installed anyway, but in the Windows world deploying Python apps has always been a pain (yes, I use py2exe; it's not perfect).
This makes a proper Python 3 virtualenv into something good (up from barely OK in 3.3, where you needed to install setuptools manually in your virtualenv).
pyvenv in Python 3.3 was immensely frustrating to use, because it leaves you without any package manager that understands your virtualenv.
I've seen this rumor that virtualenv "doesn't work" in Python 3. It seems to be propagated by people who make Python 2 virtualenvs and get confused, including a StackOverflow thread full of people basically typing commands at random to try to fix the problem. But pyvenv in 3.3 had much more potential for confusion.
The fact that Python 3.4 bundles pip may finally resolve this confusion and make pyvenv appropriate to use. But I'm sure virtualenv will keep working fine as well, as long as you don't mix up major versions of Python.
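For anyone following along, the 3.4 flow is just (POSIX shell shown; 'requests' is only an example package):

    python3.4 -m venv myenv
    source myenv/bin/activate
    pip install requests  # pip is already present, thanks to ensurepip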
Yes, the regular virtualenv may work with Python 3, but it's a small detail in a frustrating setup. Still, when I finally found out how you're supposed to do it in 3.4, it was a relief.
Just tested 3.4, and it works like in 2.x, so problem solved and no one has to worry about this anymore.
I don't want pip and virtualenv to be constantly changing and evolving. We finally reached a stage where they're usable reliably. Some stability wouldn't hurt.
Added to the stdlib was "ensurepip", a simple installer for pip. The ensurepip module includes a copy of pip that it installs from (in other words, ensurepip doesn't hit the network).
There are various reasons why it does this, part of which is to enable easy upgrades to newer versions of pip both inside of CPython itself, and for the end user to upgrade it locally.
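Usage is a one-liner:

    python -m ensurepip            # bootstrap the bundled pip if it's missing
    python -m ensurepip --upgrade  # also upgrade an older pip to the bundled version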
Glad to see customizable memory allocators made it in. Now if we could only take it up a notch and allow the runtime to assign custom allocators to individual objects, I could implement something like CCL's WATCH [0] for Python objects, which would make debugging multi-threaded code much easier.
Congrats to the python-core team and thanks to everyone for another great release!
tl;dr: "watchpoints" in Python can do this already (since before Clojure existed), but it's not built into standard pdb.
You could do this yourself with a custom allocator easily: just inspect the object type or object id, and have your meta-allocator select the correct allocator.
I'm not sure how that would help you with a CCL style WATCH though...
pdb, with set_trace or conditions on breakpoints, can let you WATCH individual objects. Search Google for watchpoints.
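A minimal sketch of the idea (my own toy code, not a built-in pdb feature): intercept writes with __setattr__ and drop into the debugger there.

    class Watched:
        """Reports (or breaks on) every attribute write to this object."""
        def __setattr__(self, name, value):
            old = getattr(self, name, "<unset>")
            print("WATCH %s: %r -> %r" % (name, old, value))
            # import pdb; pdb.set_trace()  # uncomment to break on writes
            super().__setattr__(name, value)

    w = Watched()
    w.x = 1  # WATCH x: '<unset>' -> 1
    w.x = 2  # WATCH x: 1 -> 2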
Good to see that the internal hash function has been replaced with SipHash. It took quite a while for Python to stop trying to tweak its own hash function/PRF. Now Python is on par with Ruby and Perl.
Speculation: a Python 3.x runtime might be the mystery announcement promised for the Google Cloud Platform Live event on March 25. Unless it's a Node runtime?
My $$ is on a Node runtime, assuming Google is interested in increasing the number of people that use it. Python 3 would be nice, but it probably won't get people interested the way Node would.
Agreed. However, Node doesn't really fit their request-based abstraction, as Node assumes a long-running server that holds things in memory (like Socket.io connections).
But I haven't really used Node enough to make statements like this, so correct me if I'm wrong.
I've written several production apps in it, and the unicode fixes in particular make things much easier, along with little things like being able to do enums.
Ultimately, they really aren't all that different - I think the differences tend to be overblown somewhat on HN.
If the course/teacher you like only teaches using 2.7, learn that. You can pick up the changes from 2.7->3.4 on your own pretty easily once you know the basic language.
Yes, there's nothing you will gain by starting with 2.7. The language is polished, and the majority of modules are ported (of those that matter, currently only Twisted and OpenCV aren't).
If you for some reason have to work on an older version (for example, RedHat still uses 2.6), knowing Py3 isn't really an obstacle to programming there. Of course a few nice features will be missing, but people talk as if it were a completely different language for some reason. With Python 3, at least you have a nicer, more consistent language.
If you were learning Java, would you start with JDK 1.3 or the latest one?
That depends. If you intend to work on other people's code or currently-active projects (or anything likely to be active soon), starting with 2.7 and learning the differences later would be a good approach.
If you're just using it for its own sake, or only for building things locally that either don't need to run anywhere else or only need to run where you have enough control to install your own Python, and you don't and won't need dependencies that aren't currently ported... then definitely start with 3.x. :)
There's very little difference between the two (a few standard library differences at most + a little syntax). 3.4 (and Python 3 generally) is the future of Python. Learn it first and you'll be on good footing.
The problem with statements like these is that people always move the goalposts: if challenged, the claim shifts from "nobody" to talking about proportions. There are plenty of people who will use this straight away. Maybe it's a small proportion, but Python has a lot of users developing a lot of different types of applications, so even a small proportion is a large absolute number.

I guess some of the bellyaching is because there's a feeling that the community and ecosystem of libraries is becoming fragmented - but the community is large enough to take it, and anyone really needing a particular library should be capable of expending the effort to port it or find an alternative. There are plenty of users who don't engage with the community much and/or just use established libraries/the standard library anyway.
In conclusion, not everyone is in the same boat as you. Try to see beyond your own nose, please.
I imagine some of the disappointment is that there is a large community of people using the language ecosystem "Python 2", and that ecosystem seemed to have been prematurely killed, not just in the hope of helping the small proportion of people who considered the new language ecosystem "Python 3" valuable, but seemingly in a failed attempt to actively strongarm people into switching. It's as if the people who worked on C++ standardized Rust as "C++14" and indicated not only that no language improvements were ever coming to the old C++, but that all of the compiler implementers had given up on it and were unlikely to ever release updates, even to fix bugs: that if people wanted support from the language called "C++" they needed to switch to this new language ecosystem.

Sure, the syntax disparity is much smaller between Python 2 and Python 3, but the library semantics are still highly skewed (if it were just some simple syntax issues, everything would have been trivial to port immediately). But really, the key issue is the language ecosystem: the number of available libraries, the distribution in major operating systems, the cloud hosting options, etc. While these almost look vaguely hopeful for the Python 3 crowd at this point, it took five years of enforced death of Python 2 to strongarm it to even this point, and as it stands it is still weak. Maybe after five years of enforced death to C++ compilers, someone could also claim a transition to Rust looked hopeful.

And again: I understand that for some people this is great. I don't think anyone is actively complaining that Python 3 shouldn't exist, more that its existence to the detriment of Python 2 snubbed a very large community of people. One could say the same thing to the people who advocate everyone upgrading ASAP, or even ever: "not everyone is in the same boat as you; try to see beyond your own nose please".
> I don't think anyone is actively complaining that Python 3 shouldn't exist, more that its existence to the detriment of Python 2 snubbed a very large community of people.
There's a very simple solution, for those that can't/won't upgrade: Fork it. Work on OldNewPython 2.8. The core devs of Python want to work on Python 3, so that's where we're headed. There's no need to argue technical merits of Python 2 vs 3, the simple fact is that the development momentum is with Python 3 now.
Open source is a do-ocracy. Bring together your other like minded devs to work on Python 2 and keep on keeping on. We can't expect the core devs to keep maintaining an old version of Python. A lot of them are doing this for fun or as a hobby.
Python 3 came out in 2008. You're saying 6 years post-release until a moderately sized change (the majority of which is regarded as improvements) to a language is used in production.
Actually, it's not. No one wants to spend time and effort on regressions just because the tool maintainers want print to be a function and not a statement, or because of some other small, trivial change.
This isn't just a "cosmetic" change; it's actually useful: print as a statement can't be used in list comprehensions, and print as a function comes with end=, which is a huge time saver.
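For example:

    # end= replaces the default trailing "\n"
    for word in ("no", "trailing", "newline"):
        print(word, end=" ")
    print()

    # and as a function, print can appear inside expressions:
    [print(n) for n in range(3)]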
Nitpick, but the longest break so far between major Java releases has been 4.5 years[1] (between Java 6 and 7), which also spanned Sun's acquisition by Oracle.
Still better than the 13 years it took for C++11, or having to wait around 6 years for what might be C++17 to become available across all major compilers (looking at you, modules).
I'm not sure it is. With C++, you can just start to work. There's no one saying "C++11 is completely different and not at all compatible with the previous version, so you'll need to port all your existing code. Oh, and we're not going to support the old version anymore. Except we sort of will. But not. Kind of."
I like both languages actually, and I use python quite a bit, but holy crap has python 3 been a train wreck. C++ moves slowly, but predictably. Python has two active versions: one you're not supposed to use and one you can't.
I have heard Stroustrup speak and he mentioned multiple times how important it is to maintain compatibility with all existing code in between language versions. Say what you want about C++, but I think they have the right approach. If you are the steward of such a fundamental project, it is definitely in your users' (and therefore your project's) best interest to keep everything that exists working. It's just not worth the pain to make trivial changes in the name of aesthetics or "usability" that break every single hello world program (making print a function instead of a statement) or to change how the division operator works for your millions of existing users to save some initial confusion for hypothetical potential users.
C++ may have become a bit unwieldy as a result of the combination of this policy and the focus on adding new features, but developers working on existing code bases can continue to develop happily, ignoring as much of C++11 as they like, without the concern that the plug will be pulled on their platform some time soon. As they decide to adopt the new features, they can, based on the merits of each feature, not because they are forced to by the platform's developers.
There are starting to be some compelling features in py3k, but none of them required the massive language breakage that happened between Python 2.x and these releases. All of the compelling features could have been added piecemeal through the normal backward-compatible deprecation and release process. And the devs who were so bothered by print and exec not being functions, or by other trivia, could have continued to complain on the mailing list while the rest of us got our work done.
I tried very hard to be in the python3 camp. I'm among those who found the changes to be mostly good ones, and I've talked myself into biting the bullet a couple of different times, but it always turns out the same.
Last month I gave up and reverted the project I'm working on now back to python2 after spending an afternoon looking for unofficial ports of a large library only to find that someone on Github had forked it, done the hard work of fixing 2200 broken bits, and had his pull request ignored without comment for a full year.
Apparently it's sort of unmaintained in general, but I was trying to use it for something I wasn't happy with scikit-learn for, and at the time I wasn't aware of how long it had been since it was last updated.
So in hindsight it wasn't the best example, but I had already had to deal with tracking down a random person's networkx branch as well, so that was enough to make me finally just fix the few things needed to work in 2.7.
I think you'll have an increasingly hard time finding good examples. Most packages that aren't ready for Python 3 by now are just unmaintained. At best they're so bogged down in their own complexity that you should be wary of using them for anything new.
Sometimes you need to use unmaintained or legacy code, and that sucks. But there are lots of programmers who don't have such a burden, and they shouldn't be discouraged from Python 3.
So it is. I'm not sure why I couldn't find that six months ago while I was looking.
Also, I don't mean to imply that people should be discouraged from using Python 3. Like I said, I want to use it myself, and would be if I hadn't run into problems.
As a developer, it's easier for me to ship Java apps with the current runtime than it is to ship a Python app with an updated runtime. Therefore I'm more likely to stay current with Java than with Python, especially when my target is any Linux from CentOS 5 and newer (including Debian, etc.; CentOS 5 is used more as a reference date).
I think adoption will improve substantially when major Linux distros start shipping with Python 3.x as the default interpreter. I believe this is the case with Ubuntu 14.04 LTS.
I think most distributions don't intend to change /usr/bin/python to 3 in the foreseeable future, as that would break scripts. The meaningful way distros work on the 2->3 transition is upgrading the apps installed by default to use 3, so python2 can be dropped from the default CD image.
I find it hard to think of anything that doesn't run on Python3 by now. The only thing that's still good about Python2 is that it's much faster. I always hope for a release that addresses that.
Do you have more info about how Python 2 is faster than Python 3? I had understood that it was faster for some stuff but slower for other stuff. This benchmark says they're roughly the same: https://speakerdeck.com/pyconslides/python-3-dot-3-trust-me-...
Thank you for posting this. Pages like that are nice for pointing out to people who don't have issues that many of us do. I think people tend to forget that most programming languages are not single-purpose and can be used in widely varying environments for significantly different purposes.
Twisted is a pretty big dependency. The entire stack that I work on depends heavily on Twisted, and we use a lot of it, so until it's ready on Python 3, it'll be hard for us to move.
We also depend on Paramiko, which is also red on the WOS.
Twisted is a bit of a bigger matter. A lot of the Python 3 tasks have been closed, but it's a pretty big beast, with lots of corners for bugs to hide in.
Which I think is pretty revealing about the state of Twisted and gevent.
I know from experience that gevent is not in a healthy state of maintenance. There have been some recent commits that indicate it might get better eventually, but I would not design new code to use gevent.
If there's anyone not using python3 in production now, I'd be wary of that team. Maybe it could be forgiven two years ago, but now... There's just too much to be gained with modern python.
But hey, there's still some organisations using php3 in production!
> If there's anyone not using python3 in production now, I'd be wary of that team.
I'd imagine the majority of teams are using python2 in production, especially when LTS OSs ship with Python 2.7. Things like Anaconda make it a lot easier to install Python 3 alongside 2.7 and move individual projects to 3.
We have a simple setup and use Python 3.3.3 (should be at 3.3.5 soonish). We have a dependency on pyodbc, and they don't have a version for Python 3.4. Also, we use pywin32. (Yes, on Windows. We use it to monitor machines, do graphviz diagrams, XML-RPC, that sort of thing, in a Windows MSSQL environment. Yes, I keep on top of it.)
We try to do everything we need with the standard library, which is surprisingly good. Also, I do have one service using 2.7 because of web.py. (Thank you, Aaron!)