In-place merge algorithm using up to O(n+m) swaps and no temp swap space (gvelim.github.io)
66 points by gvelim on March 8, 2022 | 19 comments


It looks like this is not really an “in-place” algorithm as it uses O(n) temporary space for the array of references. (If you allow that, one can always make an array of references, use any algorithm to perform a merge on the references, then use the references to permute the original array.)

There are in-place algorithms for merging, such as Bing-Chao Huang, Michael A. Langston, “Practical In-Place Merging” (1988): http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.88.1...


Yeah, I guess the data itself does remain in place; maybe there's a better name for this? Copy-free, perhaps?


Would O(n+m) really be a record? I didn't see any claims in your paper about computation cost, but I have only had time to read its summary.


If, in a row of n+m items, you have to move the first one to the end and all other ones one place forward, you can, at best, move one item to its correct place per swap, except for the last swap, which places two. So, worst case, you can’t do with fewer than n+m-1 swaps.

In general, to perform a permutation, you can put each of its cycles in the correct spot with one fewer swap than the number of items in the cycle. So, for a permutation with k cycles, the best you can do is n+m-k swaps.
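
Here's a rough C sketch of that cycle argument (my own illustration, not from the article): resolve one cycle at a time, and each cycle of length L costs exactly L-1 swaps, so a permutation with k cycles over n items takes n-k swaps.

    #include <stdio.h>

    /* dst[i] is the index where a[i] should end up. Each cycle of length L
       is settled with L-1 swaps, so k cycles over n items take n-k swaps. */
    static int apply_permutation(int *a, int *dst, int n) {
        int swaps = 0;
        for (int i = 0; i < n; i++) {
            while (dst[i] != i) {
                int j = dst[i];
                int t = a[i]; a[i] = a[j]; a[j] = t;   /* move one item to its final spot */
                dst[i] = dst[j]; dst[j] = j;           /* mark position j as settled */
                swaps++;
            }
        }
        return swaps;
    }

    int main(void) {
        int a[]   = {10, 20, 30, 40, 50};
        int dst[] = {4, 0, 1, 2, 3};   /* "first to the end, rest forward": one 5-cycle */
        printf("swaps = %d\n", apply_permutation(a, dst, 5));   /* prints swaps = 4 */
        for (int i = 0; i < 5; i++) printf("%d ", a[i]);        /* 20 30 40 50 10 */
        printf("\n");
        return 0;
    }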


Misleading title, the algorithm uses O(n+m) temporary space. No idea why the author is misrepresenting this.

If you want to look at in-place merging, take a look at Wikisort: https://github.com/BonzaiThePenguin/WikiSort


Guys, do you know this trick:

    var a = 7, b = 9; 
    a = b - a; 
    b = b - a; 
    a = a + b;
    console.log(a, b); // prints 9 7


I love tricks like these even though I can't figure out when it'd be an advantage. This trick works with XOR for similar reasons:

  A = A^B
  B = A^B
  A = A^B
I've seen it used to store prev^next in one pointer in doubly linked lists. The coolest part is that travelling forward or backward depends only on whether you start the "xor-unpack" from the head or the tail; the loop itself is no different for prev or next!
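
Something like this (a quick sketch from memory, not production code):

    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    /* XOR-linked list: a single "link" field holds prev ^ next. */
    struct node {
        int value;
        uintptr_t link;
    };

    static uintptr_t xorp(struct node *a, struct node *b) {
        return (uintptr_t)a ^ (uintptr_t)b;
    }

    int main(void) {
        struct node n[3] = { {1, 0}, {2, 0}, {3, 0} };
        n[0].link = xorp(NULL,  &n[1]);
        n[1].link = xorp(&n[0], &n[2]);
        n[2].link = xorp(&n[1], NULL);

        /* The same loop walks either direction; only the starting pair
           (NULL, head) vs (NULL, tail) decides forward vs backward. */
        struct node *prev = NULL, *cur = &n[0];   /* use &n[2] to walk backward */
        while (cur) {
            printf("%d ", cur->value);
            struct node *next = (struct node *)(cur->link ^ (uintptr_t)prev);
            prev = cur;
            cur = next;
        }
        printf("\n");   /* prints 1 2 3 */
        return 0;
    }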


Actually, djbsort [1] uses a similar trick to sort two integers in constant time:

    int32_t ab = b ^ a;
    int32_t c = b - a;
    c ^= ab & (c ^ b);
    c >>= 31;
    c &= ab;
    a ^= c;
    b ^= c;
How it works: if c = b - a were computed as a 33-bit integer, it would be negative exactly when b < a and non-negative otherwise. We could then use an arithmetic shift right to spread the top sign bit, creating a mask that is all 1s when b < a and all 0s otherwise. However, it isn't 33-bit, so we have to detect overflow.

If a and b are both non-negative or both negative, there was no overflow. So only if their top bits differ (indicating different signs and thus possible overflow), and the top bit of the result differs from that of b (since when the signs of a and b differ, the sign of b is our desired result), do we flip the top bit of our result. That is the role of the mysterious line c ^= ab & (c ^ b);.

Then, as mentioned before, we do an arithmetic right shift to spread the top bit into the entirety of c, creating a mask. We filter this mask with c &= ab so it only has 1s at the bit positions where a and b differ. Finally, we toggle those bit positions by XORing both a and b with the mask, swapping their values if they're out of order and leaving them unchanged otherwise.
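
If the 33-bit picture helps, here's the same comparison widened to 64 bits so no sign-bit patching is needed. This is my own illustrative rewrite, not from djbsort, and like the snippet above it assumes an arithmetic right shift of negative values:

    #include <stdint.h>

    /* Widen the subtraction so it can't overflow: the sign of the 64-bit
       difference is exactly the "33rd bit" discussed above. */
    static void minmax2_wide(int32_t *pa, int32_t *pb) {
        int32_t a = *pa, b = *pb;
        int64_t d = (int64_t)b - (int64_t)a;   /* negative exactly when b < a */
        int32_t mask = (int32_t)(d >> 63);     /* all 1s if b < a, else all 0s */
        mask &= a ^ b;                         /* keep bits where a and b differ */
        *pa = a ^ mask;                        /* conditional swap via XOR */
        *pb = b ^ mask;
    }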

[1]: https://sorting.cr.yp.to


The advantage is not having to use a third register to swap two values around. On systems like x86, registers are in very short supply. It's not quite as relevant anymore, as x86 has had XCHG [0] for a long time. Compilers are quite good at detecting when it should be used, so you don't often have to write the XOR trick explicitly. XCHG also doesn't corrupt your flags register.

[0] https://c9x.me/x86/html/file_module_x86_id_328.html


The XOR trick also introduces data dependencies that make it slower on some CPUs (you can’t start the second XOR until the result of the first is available, or the third one until the result of the second is available).

When using a third register, CPUs with register renaming need not even move data at all. They can just move the labels “register A” and “register B” around.


Alternatively you can do:

  a ^= b;
  b ^= a;
  a ^= b;
This uses less energy because the bitwise operations don't need any carries to propagate between bits.


Saving one swap temporary by adding 3 reads and 3 calculations.


and an int overflow!


Unless they are unsigned ints, in which case it is defined behaviour: it becomes modulo arithmetic.
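
For example (a quick sketch; the wrap-around comments assume 32-bit unsigned, but the swap works for any pair of unsigned values):

    #include <stdio.h>

    int main(void) {
        /* Unsigned arithmetic is defined to wrap modulo UINT_MAX+1, so the
           swap comes out right even though the intermediates overflow. */
        unsigned a = 0xFFFFFFF9u, b = 7u;
        a = b - a;   /* 7 - 4294967289 wraps to 14 */
        b = b - a;   /* 7 - 14 wraps to 4294967289 */
        a = a + b;   /* 14 + 4294967289 wraps to 7 */
        printf("%u %u\n", a, b);   /* prints 7 4294967289 -- swapped */
        return 0;
    }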


Thank you, I'd forgotten and remembered and forgotten that again.

In a sort of ironic comedy, maybe an msint.h file could include definitions such as mint8, sint8, ... mint64, sint64, etc., as well as an explanation of the difference between modular and signed (two's complement) arithmetic operations.


Yes, I've known that trick for a long time. There's hardly ever a reason to use it though, unless you're swapping registers in assembly and don't have a spare register.


If the arrays are in one contiguous block, I think this is equivalent to a 2-way shuffle in place; here's a neat paper on it: https://arxiv.org/pdf/0805.1598.pdf


I would appreciate references on the time complexity of merging sorted lists (if I am not worried about space complexity at all), for arbitrary n and m. There is an excellent discussion in Volume 3 of TAOCP, but the references within are quite old.


It's trivial to solve this problem under those criteria by using selection sort.

https://en.wikipedia.org/wiki/Selection_sort
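
Roughly like this (a throwaway sketch of what I mean; the two sorted runs sit back to back in one array and plain selection sort merges them, O(1) extra space at O((n+m)^2) time):

    #include <stdio.h>

    /* Selection sort over the concatenation of the two sorted runs:
       repeatedly select the smallest remaining element into place. */
    static void selection_merge(int *a, int len) {
        for (int i = 0; i + 1 < len; i++) {
            int min = i;
            for (int j = i + 1; j < len; j++)
                if (a[j] < a[min]) min = j;
            int t = a[i]; a[i] = a[min]; a[min] = t;
        }
    }

    int main(void) {
        int a[] = {1, 4, 9, 2, 3, 8};   /* two sorted runs: {1,4,9} and {2,3,8} */
        selection_merge(a, 6);
        for (int i = 0; i < 6; i++) printf("%d ", a[i]);   /* prints 1 2 3 4 8 9 */
        printf("\n");
        return 0;
    }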



