Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

This is specific to Microsoft SQL Server, which has a rather fraught history in this department.

I believe the best option in PostgreSQL is still INSERT ... ON CONFLICT ...

In general, this is one of those spots where every single RDBMS behaves differently, so knowledge from one platform doesn't necessarily transfer well.

(Which, in turn, is one of the reasons why it's not necessarily a good idea to try to be database-agnostic.)



Sqlite has the same as Postgres; INSERT ... ON CONFLICT ... DO ...

It's very ergonomic and intuitive. I understand why SQL Server needs to retain older forms, but why can't they also adopt these new forms that are more pleasant for the end user? I suppose the people buying licenses to SQL Server usually aren't the ones actually using it.


Officially, MERGE is the method defined in the SQL standard, and is by far the most widely supported syntax.

Everyone else uses different, DBMS-specific, non-portable syntaxes. A lot of them (e.g., PostgreSQL and Sqlite) have some similarities, but are still mutually incompatible.

I think that Microsoft arguably made the right decision by choosing to conform to the standard. It's just kind of unfortunate that the ISO standard syntax is (in my personal opinion) not very good.


I am the primary author of INSERT ... ON CONFLICT in Postgres. I extensively studied similar features from other database systems as part of that project.

MERGE simply doesn't provide the guarantees that most users want -- not in theory and not in practice. For example, the Oracle docs introduce MERGE by saying: "This statement is a convenient way to combine multiple operations. It lets you avoid multiple INSERT, UPDATE, and DELETE DML statements."

It's likely impossible to make the concurrency guarantees work sensibly in READ COMMITTED mode while supporting MERGE's very general semantics. There are subtle but very real problems with making it behave like ON CONFLICT in one context but not in others. In my view MERGE is a perfectly reasonable feature, but it just isn't what people want when they say they want to upsert.

The SQL standard deliberately underspecifies anything related to transaction isolation or concurrency. I cannot see the standard providing any guidance on upsert for this reason.


Thank you for your work! Postgres is great to use and the regular release of useful features has been a very nice benefit.

One unobvious (to me) but possibly necessary side effect of the INSERT...ON CONFLICT pattern is that it reserves a primary id for every row ahead of time, presumably because it does not know what will be updated and what will be inserted. We have a heavily updated table that started failing inserts one day because the primary id sequence hit the max int (2.1b), despite the table only reaching 100mm rows. The immediate fix was to reset the sequence to -1 and have it start going down...


I hear that one from time to time. It's a consequence of the fact that sequences are generally not transactional, and the fact that the underlying "speculative insertion" infrastructure really does have to do most of the work of the insertion before it can decide whether or not the insertion really should go ahead.

I'm sympathetic, but unfortunately I cannot think of any tractable way of avoiding the problem at the implementation level. Maybe it would be possible to do something with an identity column (as opposed to a SERIAL column or a raw sequence) -- we have access to sufficient context there. Maybe we could skip consuming a sequence in the common case where there is a clear conflict at the start, and the sequence isn't accessed through the EXCLUDED pseudo table. We wouldn't actually promise anything more, but in practice we wouldn't burn through sequences at the same rate in cases where updates are common.

This is quite a significant project, but maybe it'll happen some day.


insert isn't really reserving so much as just getting nextval from the seq, and may have to nextval a second time. I've found for tables like this to NOT use int-pk - I use ULIDs for these kinds (and actually, I've been using ULID for like everything lately)


The Postgres team decided against implementing MERGE when they designed the ON CONFLICT feature (it actually started as a MERGE).

It's been a while, but as far as I remember, the main reason was that the behaviour of MERGE mandated by the SQL standard, wasn't clear enough and doesn't deal with race conditions when multiple transactions try the same insert (or update) (not so much the syntax itself).


Unfortunately, SQL Server's merge statement has a LOT of issues. One of the worst being that it is isn't atomic (without non-SQL-standard hints)[0]!

It really feels like it was a half-assed feature.

Great that they followed the standard, not so great that they made it nearly unusable with such a poor implementation.

I was super excited when we got SQL server 2008 at work (since upserts are a relatively common issue) and was devastated to learn all the issues with MERGE. I didn't care that the syntax was funky, I just wanted to avoid needing to manage transactions/locks/etc.

[0] https://weblogs.sqlteam.com/dang/2009/01/31/upsert-race-cond...


> it is isn't atomic

merge is a single, logically-atomic statement in sql server (with or without hints)

it doesn't have the same physical locking characteristics as a single insert or update statement, how could it?


The way I see it, the standard is should be taken as a bare minimum requirement. Ideally standards compliant SQL would work on any RDBMS, though I don't think that ideal has ever been realized. Supplementing the standard with extensions when the standard happens to have ergonomic problems seems to do more good than harm. IMHO writing SQL with the extensions of postgres or sqlite is a lot more enjoyable, which I think counts for a lot.


There's room for argument there. From my perspective, as someone who's had to spend a lot of time maintaining databases, I would rather have just one way of doing it, even if the one way of doing it that I get stuck with is not the way I would have picked if I had had my druthers.

The problem with TMTOWTDI is that it means there's also more than one way to have to understand and reason about. Which ultimately makes my job harder.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: