Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I've just spent the morning implementing various "faster" median algorithms, and it seems that constant factors, function call overhead (for recursive quickselects), etc make all alternative approaches quite slow. The sort-then-select approach is pretty damn fast, especially since I also think / am pretty sure sorted() is implemented in C...

So far I've tried median-of-medians, recursive quickselect, iterative quickselect tracking indices and partitioning in place, and heapq.nlargest. The implementation in Python 3.4.0 is both cleaner/easier to read and faster than anything I can make by an order of magnitude for 10,000 element lists. I'm sure someone else here can do better than me, but (s)he'd have a hard time beating CPython, imo.



Not only is sorting a Python list implemented in C, the algorithm was sufficiently impressive to be picked up by Java for non-primitives: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/bfd7abda8f79

The reason it's for non-primitives is because it's optimized for sorting lists where comparison is relatively expensive , such as the dereferencing Python does on every item and Java does for non-primitive items.

Thanks for putting the effort in and testing it! Very interesting results.


That's very interesting. I used a C implementation of quickselect in that past that was way faster than sorting. That said, my data was also larger than 10,000 elements.

FWIW, here is some old timing data on a 2008 era Mac:

     *  approx timing on my Mac
     *          n, time in ms 
     *         10, <1
     *        100, <1
     *       1000, <1
     *      10000, <1
     *     100000, 30
     *    1000000, 230
     *   10000000, 2530
     *  100000000, 33000
You are probably right that pure python will have trouble competing with a C sort, except for quite large N.


Since this is pure Python performance you might want to give the different algorithms a try on PyPy




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: