Reminds me of PG-Strom[1] which is a Postgres extension for GPU-bound index acce...

bob1029 · 2025-06-28T20:00:00 1751140800

> However, I wonder whether the GPU's are a good fit for this to begin with.

I think the GPU could be a great fit for OLAP, but when it comes to the nasty OLTP use cases the CPU will absolutely dominate.

Strictly serialized transaction processing facilities demand extremely low latency compute to achieve meaningful throughput. When the behavior of transaction B depends on transaction A being fully resolved, there are no magic tricks you can play anymore.

Consider that talking to L1 is at least 1,000x faster than talking to the GPU. Unless you can get a shitload of work done with each CPU-GPU message (and it is usually the case that you can), this penalty is horrifyingly crippling.
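To put rough numbers on that, here is a back-of-the-envelope sketch. The 1 ns L1 figure and the per-item cost are assumptions chosen for illustration, not measurements; only the 1,000x ratio comes from the comment above.

```python
# Illustrative figures only (assumptions, not measurements).
L1_NS = 1.0                          # assumed cost of one L1 access
GPU_ROUND_TRIP_NS = 1_000 * L1_NS    # the "at least 1,000x" ratio from above


def serialized_tps(step_latency_ns: float) -> float:
    """Upper bound on transactions/sec when each transaction must wait
    for the previous one to fully resolve (one step per transaction)."""
    return 1e9 / step_latency_ns


def batched_tps(round_trip_ns: float, batch_size: int, per_item_ns: float) -> float:
    """Throughput when one round trip is amortized over a batch of
    independent items (per_item_ns is a hypothetical on-device cost)."""
    return batch_size * 1e9 / (round_trip_ns + batch_size * per_item_ns)


if __name__ == "__main__":
    print(f"serialized, L1-bound : {serialized_tps(L1_NS):,.0f} tx/s")
    print(f"serialized, GPU-bound: {serialized_tps(GPU_ROUND_TRIP_NS):,.0f} tx/s")
    print(f"batched x1000 on GPU : {batched_tps(GPU_ROUND_TRIP_NS, 1_000, 1.0):,.0f} items/s")
```

Under these assumptions the strictly serialized case loses three orders of magnitude by crossing to the GPU, while a batch of 1,000 independent items recovers most of it. Dependent transactions cannot be batched that way.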

tucnak · 2025-06-28T20:28:14 1751142494

I think TrueTime would constitute a "trick," insofar as ordering is concerned?

> Consider that talking to L1 is at least 1,000x faster than talking to the GPU.

This is largely true for "traditional" architectures, but s/GPU/TPU and s/L1/CMEM and suddenly this is no big deal anymore. I'd like Googlers to correct me here, but it seems well in line with classic MapReduce, and probably something that they're doing a lot outside of LLM inference... ads?

bob1029 · 2025-06-28T21:41:57 1751146917

How does the information get to & from the GPU in the first place?

If a client wishes to use your GPU-based RDBMS engine, it needs to make a trip through the CPU first, does it not?

tucnak · 2025-06-28T22:14:14 1751148854

Not necessarily! The setup I'm discussing is explicitly non-GPU, and it's not necessarily a TPU either. Any accelerator card with NoC capability will do: the requests are queued/batched from the network, trickle through the adjacent compute/network nodes, and are written back to the network. This is what "compute-in-network" means; the CPU is never involved, and main memory is never involved. You read from the network, you write to the network, that's it. On-chip memory on these accelerators is orders of magnitude larger than L1 (FPGAs are known for low-latency systolic stuff), and the on-package memory is large HBM stacks similar to those you would find in a GPU.

dbetteridge · 2025-06-29T00:51:35 1751158295

Could you (assuming no care about efficiency) send the query to both the GPU and CPU pipelines at the same time and use whichever comes back first?
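A minimal sketch of that "race both pipelines" idea, using Python's standard `concurrent.futures`. The `cpu_pipeline` and `gpu_pipeline` functions are hypothetical stand-ins that just sleep; no real engine or driver API is implied.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def cpu_pipeline(query: str) -> str:
    """Hypothetical CPU executor; the sleep stands in for real work."""
    time.sleep(0.05)
    return f"cpu:{query}"


def gpu_pipeline(query: str) -> str:
    """Hypothetical GPU executor; slower here purely for illustration."""
    time.sleep(0.2)
    return f"gpu:{query}"


def race(query: str, pipelines) -> str:
    """Submit the query to every pipeline and return the first result."""
    pool = ThreadPoolExecutor(max_workers=len(pipelines))
    futures = [pool.submit(p, query) for p in pipelines]
    done, pending = wait(futures, return_when=FIRST_COMPLETED)
    for f in pending:
        f.cancel()  # best effort: a task that is already running keeps running
    pool.shutdown(wait=False)  # do not block on the losing pipeline
    # Note: if the first finisher raised, .result() re-raises that exception.
    return next(iter(done)).result()


if __name__ == "__main__":
    print(race("SELECT 1", [cpu_pipeline, gpu_pipeline]))
```

The loser still burns its resources until it finishes, which is the efficiency cost the question sets aside. A real system would also need a way to abort work already running on the slower device.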

Joel_Mckay · 2025-06-29T02:19:01 1751163541

Most database query optimizer engines do a few tests to figure out the most pragmatic approach.
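As a rough illustration of that kind of choice, here is a toy cost model that picks a device from an estimated row count. The `Device` class and every cost constant are made up for the example; real optimizers use far richer statistics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    fixed_overhead_us: float  # assumed launch/transfer cost per query
    per_row_us: float         # assumed processing cost per row

    def estimated_cost_us(self, rows: int) -> float:
        return self.fixed_overhead_us + rows * self.per_row_us


# Hypothetical numbers: the GPU has a high fixed cost but cheap rows.
CPU = Device("cpu", fixed_overhead_us=1.0, per_row_us=0.01)
GPU = Device("gpu", fixed_overhead_us=50.0, per_row_us=0.001)


def choose_device(rows: int, devices=(CPU, GPU)) -> Device:
    """Pick the device with the lowest estimated cost for this row count."""
    return min(devices, key=lambda d: d.estimated_cost_us(rows))


if __name__ == "__main__":
    for rows in (100, 1_000_000):
        print(rows, choose_device(rows).name)
```

With these made-up constants, small point lookups stay on the CPU and large scans go to the GPU, which matches the OLTP versus OLAP split discussed upthread.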

GPUs can incur higher failure risks, and thus one will not normally find them in high-reliability roles. =3

philippemnoel · 2025-07-01T00:42:55 1751330575

> Postgres is not typically considered to "scale well," but oftentimes this is a statement about its tablespaces more than anything; it has foreign data[4] API, which is how you extend Postgres as single point-of-consumption, foregoing some transactional guarantees in the process. This is how pg_analytics[5] brings DuckDB to Postgres, or how Steampipe[6] similarly exposes many Cloud and SaaS applications. Depending on where you stand on this, the so-called alternative SQL engines may seem like moving in the wrong direction. Shrug.

Maintainer of pg_analytics (now part of pg_search) here. I 100% agree that the statements against Postgres are often exaggerated. In practice, we see both the smallest and the largest companies "just use Postgres" while mid-scale companies often overthink their solution.

That said, there are indeed phenomenal "alternate" SQL engines. I've seen many users have great success with tools like ClickHouse, which ParadeDB is not yet competitive with, and sometimes (dare I say) even Elasticsearch. As for whether this one is one of them... that I couldn't say.

Joel_Mckay · 2025-06-29T02:15:17 1751163317

Thanks for reminding us of the project name.

Personally, I'd rather have another dual-CPU Epyc host with maximum ECC RAM, as I have witnessed NVIDIA GPUs fail closed and take out host power supplies. =3