
Yes, listen/notify is something very different. We would often write new projections that consume events from years back up until today.

You want sequence numbers that indicate the event's position in a partitioned log.

Something like "int identity", except that the int is assigned during commit, so you have the guarantee that if you see IDs 5 and 7, then 6 will never show up later. That way each consumer can store a cursor tracking its progress through the table, safe against concurrent inserts.
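To make the cursor idea concrete, here is a toy in-memory sketch (all names invented, no DB involved). It only works because `seq` is assumed to be gap-free and commit-ordered: once the consumer has seen seq 7, seq 6 can never appear later, so advancing the cursor to the highest seq seen never skips an event.

```python
def consume_new_events(log, cursor):
    """Return (new_events, new_cursor) for all events with seq > cursor.

    log is a list of (seq, payload) tuples, seq assigned in commit order.
    Advancing the cursor is safe only if no smaller seq can appear later.
    """
    new = sorted((seq, payload) for seq, payload in log if seq > cursor)
    new_cursor = new[-1][0] if new else cursor
    return new, new_cursor

log = [(1, "a"), (2, "b"), (3, "c")]
batch, cursor = consume_new_events(log, 0)   # first poll sees everything
log.append((4, "d"))                         # a later commit appends seq 4
batch2, cursor = consume_new_events(log, cursor)  # next poll sees only (4, "d")
```

With insert-time IDs instead, a long-running transaction could commit ID 6 after the cursor already passed 7, and that event would be lost.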

I was hoping to do it using CDC, but Microsoft SQL has a minimum 1 minute delay on CDC, which destroys any live data use case. Perhaps postgres allows listening to the replication log with lower latency?



> Perhaps postgres allows listening to the replication log with lower latency?

Yes, I think that's what "logical decoding" refers to. Postgres can emit a "logical" version of the WAL (something with a stable spec written down, so that other services can stream and decode it). My understanding is that "logical replication" was designed for low-latency situations like creating read replicas.
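For a quick look at what logical decoding emits, `pg_recvlogical` with the built-in `test_decoding` output plugin is the simplest path (a CLI sketch; the database and slot names here are made up):

```shell
# Create a logical replication slot using the test_decoding plugin
pg_recvlogical -d mydb --slot=demo_slot --create-slot -P test_decoding

# Stream decoded changes to stdout as transactions commit
pg_recvlogical -d mydb --slot=demo_slot --start -f -
```

Changes show up essentially as soon as the transaction commits, which is the low-latency property being asked about.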

I haven't heard of the logical log being preserved for "years back" but that's an interesting case...


That is OK; I guess I would write a job that listens to the logical WAL and uses it to write an event sequence number.


> Something like "int identity" except that the int is assigned during commit, so that you have guarantee that if you see IDs 5 and 7, then 6 will never show up

I don't think that's possible, nor is it something you should actually need.

If two transactions tx1 and tx2 are concurrent (let's say tx2 begins after tx1 and also finishes after tx1), then tx2 has done some work without access to tx1's data (as tx1 hadn't committed yet when tx2 began). So either:

- tx1's data is relevant to tx2, so tx2 needs to abort and retry. In which case the sequence number doesn't _need_ to be assigned at commit time; it can be assigned at any time and will be increasing monotonically between related transactions.

- tx1's data is irrelevant to tx2, in which case the ordering is irrelevant and you don't need to assign the sequence number as late as commit time.

The "relevance" is what partition keys encode: if tx1 and tx2 are potentially conflicting, they should use the same partition key. It doesn't enforce that sequence numbers increase monotonically within a physical partition, but it enforces that they do for a given _partition key_ (which is what should matter, the key->partition assignment is arbitrary).
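A tiny sketch of that per-partition-key ordering argument (all names invented): each key gets its own counter, so sequence numbers are monotonic for a given key while unrelated keys order independently.

```python
from collections import defaultdict

# One counter per partition key: conflicting transactions share a key,
# so their sequence numbers are ordered relative to each other.
counters = defaultdict(int)

def next_seq(partition_key):
    counters[partition_key] += 1
    return counters[partition_key]

a1 = next_seq("account-42")  # first event for this key
b1 = next_seq("account-7")   # unrelated key, independent ordering
a2 = next_seq("account-42")  # ordered strictly after a1
```

There is no global ordering across keys here, and by the argument above there doesn't need to be one.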

> Perhaps postgres allows listening to the replication log with lower latency?

Pretty sure it does: you can listen to the WAL, which is about as instant as it gets. We were doing that at a previous company: a process (Debezium) would listen to the WAL for a specific "events" table and write to Kafka. The main downside is that the table isn't an outbox; it keeps growing even after the events have been pushed to Kafka.


You explain why I don't need them, but for a very different use case than the one I refer to.

My point is I want a new primitive -- a pub/sub sequence number -- to avoid having Kafka around at all.

What Kafka does is "only" to generate and store such a sequence number after all (it orders events on a partition; the sequence number I talk about is the same thing, just in a different storage format). So you do need it in the setup you describe too; you just let Kafka generate it instead of having it in SQL.

Assuming your workload is fine with a single DB, the only thing Kafka gives you is in fact assigning such a post-commit row sequence number (+API/libraries building on it).

This is the mechanism used to implement pub/sub: Every consumer tracks what sequence number it has read to (and Kafka guarantees that the sequence number is increasing).
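That mechanism fits in a few lines: a toy model of pub/sub over one sequence-numbered log (names invented), where each consumer keeps only its own cursor and the broker keeps no per-consumer state beyond that.

```python
log = []  # shared append-only log of (seq, event); seq is dense and increasing

def publish(event):
    log.append((len(log) + 1, event))

def poll(cursor):
    """Return events after cursor and the advanced cursor position."""
    new = [e for e in log if e[0] > cursor]
    return new, (new[-1][0] if new else cursor)

publish("created")
publish("updated")
batch_a, cur_a = poll(0)       # consumer A reads both events
publish("deleted")
batch_b, cur_b = poll(0)       # consumer B starts from scratch, sees all three
batch_a2, cur_a = poll(cur_a)  # consumer A sees only the new event
```

Consumers progress independently and can replay from any point, which is the property being claimed here: the sequence number, not the broker, is the primitive.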

That is what mssql-changefeed linked above is about: assigning those event log sequence numbers in the DB instead, and not using any event brokers (or outboxes) at all.

For Postgres I would likely then consume the WAL and write sequence numbers to another table based on those...

It may seem clunky but IMO installing and operating Kafka just to get those pub/sub sequence numbers assigned is even clunkier.


It sounds like the Log Sequence Number in Postgres is what you are looking for. If you subscribe to a Postgres publication via logical replication, each commit will be emitted with a monotonically increasing LSN.
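Postgres prints LSNs as two hex halves, e.g. `16/B374D848` (the `pg_lsn` type). A small helper (hypothetical, not a library API) can turn one into a single 64-bit integer so a consumer can store and compare its cursor numerically:

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a Postgres LSN like '16/B374D848' to a 64-bit integer.

    The part before '/' is the high 32 bits, the part after is the low 32.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)
```

Since logical replication delivers commits in LSN order, the integer cursor behaves exactly like the gap-free-enough sequence number discussed upthread (monotonic, though not dense).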


I'm not aware of any one-minute minimum delay for CDC. We are currently running an on-prem SQL Server -> CDC -> Debezium -> Azure Event Hub -> Azure Function App back to on-prem SQL Server, and that has a 5-10 second delay from source system transaction commit to update/insert.



