In-place merge algorithm using up to O(n+m) swaps and no temp swap space (gvelim.github.io)
66 points by gvelim on March 8, 2022 | 19 comments


It looks like this is not really an “in-place” algorithm as it uses O(n) temporary space for the array of references. (If you allow that, one can always make an array of references, use any algorithm to perform a merge on the references, then use the references to permute the original array.)

There are in-place algorithms for merging, such as Bing-Chao Huang, Michael A. Langston, “Practical In-Place Merging” (1988): http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.88.1...


Yeah, I guess the data itself does remain in place; maybe there's a better name for this? Copy-free, perhaps?


Would O(n+m) really be a record? I didn't see any claims in your paper about computation cost, but I have only had time to read its summary.


If, in a row of n+m items, you have to move the first one to the end and all other ones one place forward, you can, at best, move one item to its correct place per swap, except for the last swap, which places two. So, worst case, you can’t do with fewer than n+m-1 swaps.

In general, to perform a permutation, you can put each of its cycles in the correct spot with one fewer swap than the number of items in the cycle. So, for a permutation with k cycles, the best you can do is n+m-k swaps.
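
Here's a rough C sketch of that cycle argument (my own illustration, not from the article): resolve one cycle at a time, and each cycle of length L costs exactly L-1 swaps, so a permutation with k cycles over n items takes n-k swaps.

    #include <stdio.h>

    /* dst[i] is the index where a[i] should end up. Each cycle of length L
       is settled with L-1 swaps, so k cycles over n items take n-k swaps. */
    static int apply_permutation(int *a, int *dst, int n) {
        int swaps = 0;
        for (int i = 0; i < n; i++) {
            while (dst[i] != i) {
                int j = dst[i];
                int t = a[i]; a[i] = a[j]; a[j] = t;   /* move one item to its final spot */
                dst[i] = dst[j]; dst[j] = j;           /* mark position j as settled */
                swaps++;
            }
        }
        return swaps;
    }

    int main(void) {
        int a[]   = {10, 20, 30, 40, 50};
        int dst[] = {4, 0, 1, 2, 3};   /* "first to the end, rest forward": one 5-cycle */
        printf("swaps = %d\n", apply_permutation(a, dst, 5));   /* prints swaps = 4 */
        for (int i = 0; i < 5; i++) printf("%d ", a[i]);        /* 20 30 40 50 10 */
        printf("\n");
        return 0;
    }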


Misleading title, the algorithm uses O(n+m) temporary space. No idea why the author is misrepresenting this.

If you want to look at in-place merging, take a look at Wikisort: https://github.com/BonzaiThePenguin/WikiSort


Guys, do you know this trick:

    var a = 7, b = 9; 
    a = b - a; 
    b = b - a; 
    a = a + b;
    console.log(a, b); // prints 9 7


I love tricks like these even though I can't figure out when it'd be an advantage. This trick works with XOR for similar reasons:

  A = A^B
  B = A^B
  A = A^B
I've seen it used to store prev^next in one pointer in doubly linked lists. The coolest part is that travelling forward or backward depends only on whether you start the "xor-unpack" from the head or the tail; the loop itself is no different for prev or next!
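
Something like this (a quick sketch from memory, not production code):

    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    /* XOR-linked list: a single "link" field holds prev ^ next. */
    struct node {
        int value;
        uintptr_t link;
    };

    static uintptr_t xorp(struct node *a, struct node *b) {
        return (uintptr_t)a ^ (uintptr_t)b;
    }

    int main(void) {
        struct node n[3] = { {1, 0}, {2, 0}, {3, 0} };
        n[0].link = xorp(NULL,  &n[1]);
        n[1].link = xorp(&n[0], &n[2]);
        n[2].link = xorp(&n[1], NULL);

        /* The same loop walks either direction; only the starting pair
           (NULL, head) vs (NULL, tail) decides forward vs backward. */
        struct node *prev = NULL, *cur = &n[0];   /* use &n[2] to walk backward */
        while (cur) {
            printf("%d ", cur->value);
            struct node *next = (struct node *)(cur->link ^ (uintptr_t)prev);
            prev = cur;
            cur = next;
        }
        printf("\n");   /* prints 1 2 3 */
        return 0;
    }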


Actually, djbsort [1] uses a similar trick to sort two integers in constant time:

    int32_t ab = b ^ a;
    int32_t c = b - a;
    c ^= ab & (c ^ b);
    c >>= 31;
    c &= ab;
    a ^= c;
    b ^= c;
How it works: if c = b - a were computed as a 33-bit integer, it would be negative exactly when b < a and non-negative otherwise. We could then use an arithmetic shift right to spread the top sign bit, creating a mask that is all 1s when b < a and all 0s otherwise. However, it isn't 33-bit, so we have to detect overflow.

If a and b are both non-negative or both negative, there was no overflow. So only if their top bits differ (indicating different signs and thus possible overflow), and the top bit of the result differs from that of b (since when the signs of a and b differ, the sign of b is our desired result), do we flip the top bit of our result. That is the role of the mysterious line c ^= ab & (c ^ b);.

Then, as mentioned before, we do an arithmetic right shift to spread the top bit into the entirety of c, creating a mask. We filter this mask with c &= ab so it only has 1s at the bit positions where a and b differ. Finally, we toggle those bit positions by XORing both a and b with the mask, swapping their values if they're out of order and leaving them unchanged otherwise.
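
If the 33-bit picture helps, here's the same comparison widened to 64 bits so no sign-bit patching is needed. This is my own illustrative rewrite, not from djbsort, and like the snippet above it assumes an arithmetic right shift of negative values:

    #include <stdint.h>

    /* Widen the subtraction so it can't overflow: the sign of the 64-bit
       difference is exactly the "33rd bit" discussed above. */
    static void minmax2_wide(int32_t *pa, int32_t *pb) {
        int32_t a = *pa, b = *pb;
        int64_t d = (int64_t)b - (int64_t)a;   /* negative exactly when b < a */
        int32_t mask = (int32_t)(d >> 63);     /* all 1s if b < a, else all 0s */
        mask &= a ^ b;                         /* keep bits where a and b differ */
        *pa = a ^ mask;                        /* conditional swap via XOR */
        *pb = b ^ mask;
    }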

[1]: https://sorting.cr.yp.to


The advantage is not having to use a third register to swap two values around. On systems like x86, registers are in very short supply. It's not quite as relevant anymore, as x86 has had XCHG [0] for a long time. Compilers are quite good at detecting when it should be used, so you don't often have to write the XOR trick explicitly. XCHG also doesn't corrupt your flags register.

[0] https://c9x.me/x86/html/file_module_x86_id_328.html


The XOR trick also introduces data dependencies that make it slower on some CPUs (you can’t start the second XOR until the result of the first is available, or the third one until the result of the second is available).

When using a third register, CPUs with register renaming need not even move data at all. They can just move the labels “register A” and “register B” around.


Alternatively you can do:

  a ^= b;
  b ^= a;
  a ^= b;
This uses less energy because the bitwise operations don't need any carries to propagate between bits.


Saving one swap temporary by adding 3 reads and 3 calculations.


and an int overflow!


Unless they are unsigned ints, in which case it is defined behaviour: it becomes modulo arithmetic.
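
For example (a quick sketch; the wrap-around comments assume 32-bit unsigned, but the swap works for any pair of unsigned values):

    #include <stdio.h>

    int main(void) {
        /* Unsigned arithmetic is defined to wrap modulo UINT_MAX+1, so the
           swap comes out right even though the intermediates overflow. */
        unsigned a = 0xFFFFFFF9u, b = 7u;
        a = b - a;   /* 7 - 4294967289 wraps to 14 */
        b = b - a;   /* 7 - 14 wraps to 4294967289 */
        a = a + b;   /* 14 + 4294967289 wraps to 7 */
        printf("%u %u\n", a, b);   /* prints 7 4294967289 -- swapped */
        return 0;
    }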


Thank you, I'd forgotten and remembered and forgotten that again.

In a sort of ironic comedy, maybe an msint.h file could include definitions such as mint8, sint8, ... mint64, sint64, etc., as well as an explanation of the difference between modular and signed (two's complement) arithmetic operations.


Yes, I've known that trick for a long time. There's hardly ever a reason to use it though, unless you're swapping registers in assembly and don't have a spare register.


If the arrays are in one contiguous block, I think this is equivalent to a 2-way shuffle in place; here's a neat paper on it: https://arxiv.org/pdf/0805.1598.pdf


I would appreciate references on the time complexity of merging sorted lists (if I am not worried about space complexity at all), for arbitrary n and m. There is an excellent discussion in Volume 3 of TAOCP, but the references within are quite old.


It's trivial to solve this problem under those criteria by using selection sort.

https://en.wikipedia.org/wiki/Selection_sort
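
Roughly like this (a throwaway sketch of what I mean; the two sorted runs sit back to back in one array and plain selection sort merges them, O(1) extra space at O((n+m)^2) time):

    #include <stdio.h>

    /* Selection sort over the concatenation of the two sorted runs:
       repeatedly select the smallest remaining element into place. */
    static void selection_merge(int *a, int len) {
        for (int i = 0; i + 1 < len; i++) {
            int min = i;
            for (int j = i + 1; j < len; j++)
                if (a[j] < a[min]) min = j;
            int t = a[i]; a[i] = a[min]; a[min] = t;
        }
    }

    int main(void) {
        int a[] = {1, 4, 9, 2, 3, 8};   /* two sorted runs: {1,4,9} and {2,3,8} */
        selection_merge(a, 6);
        for (int i = 0; i < 6; i++) printf("%d ", a[i]);   /* prints 1 2 3 4 8 9 */
        printf("\n");
        return 0;
    }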



