
> Submit the write to the primary file

> Link fsync to that write (IOSQE_IO_LINK)

> The fsync's completion queue entry only arrives after the write completes

> Repeat for secondary file

Wait, so the OS can re-order the fsync() to happen before the write request it is supposed to be syncing? Is there a citation or link to some code for that? It seems too ridiculous to be real.

> O_DSYNC: Synchronous writes. Don't return from write() until the data is actually stable on the disk.

If you call fsync() this isn't needed correct? And if you use this, then fsync() isn't needed right?


> Wait, so the OS can re-order the fsync() to happen before the write request it is supposed to be syncing? Is there a citation or link to some code for that? It seems too ridiculous to be real.

This is an io_uring-specific thing. It doesn't guarantee any ordering between operations submitted at the same time, unless you explicitly ask it to with the `IOSQE_IO_LINK` they mentioned.

Otherwise it's as if you called write() from one thread and fsync() from another, before waiting for the write() call to return. That obviously defeats the point of using fsync() so you wouldn't do that.

> If you call fsync(), [O_DSYNC] isn't needed correct? And if you use [O_DSYNC], then fsync() isn't needed right?

I believe you're right.


I guess I'm a bit confused about why the author recommends using both this flag and fsync.

Related: I would think that grouping your writes and then fsyncing, rather than fsyncing every time, would be more efficient, but it looks like a previous commenter did some testing and that isn't always the case: https://news.ycombinator.com/item?id=15535814
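
For reference, the batched version looks roughly like this with plain blocking calls (a hypothetical sketch; the file name and record format are made up):

```python
import os

# Hypothetical sketch: group several appends, then pay for one fsync,
# instead of fsyncing after every individual write.
def append_batch(path, records):
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        for rec in records:
            os.write(fd, rec)  # lands in the page cache
        os.fsync(fd)           # one flush covers the whole batch
    finally:
        os.close(fd)

append_batch("app.log", [b"event-1\n", b"event-2\n", b"event-3\n"])
```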


I'm not sure there's any good reason. Other commenters mentioned AI tells. I wouldn't consider this article a trustworthy or primary source.


Yeah, that seems reasonable. The article seems to mix fsync and O_DSYNC without discussing their relationship, which reads more like AI output and less like a human who understands it.

It also seems if you were using io_uring and used O_DSYNC you wouldn't need to use IOSQE_IO_LINK right?

Even if you were doing primary and secondary log file writes, they are to different files so it doesn't matter if they race.


> It also seems if you were using io_uring and used O_DSYNC you wouldn't need to use IOSQE_IO_LINK right? Even if you were doing primary and secondary log file writes, they are to different files so it doesn't matter if they race.

I think there are a lot of reasons to use this flag besides a write()+f(data)sync() sequence:

* If you're putting something in a write-ahead log then applying it to the primary storage, you want it to be fully committed to the write-ahead log before you start changing the primary storage, so if there's a crash halfway through the primary storage change you can use the log to get to a consistent state (via undo or redo).

* If you're trying to atomically replace a file via the rename-a-temporary-file-into-place trick, you can submit the whole operation to the ring at once, but you'd want to use `IOSQE_IO_LINK` to ensure the temporary file is fully written/synced before the rename happens.
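
A rough sketch of that second pattern with ordinary blocking syscalls (Python used only for illustration; with io_uring these steps would be linked SQEs instead):

```python
import os

# Sketch of the rename-a-temporary-file-into-place trick.
def atomic_replace(path, data: bytes):
    tmp = path + ".tmp"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)           # temp file contents must be durable...
    finally:
        os.close(fd)
    os.rename(tmp, path)       # ...before they become visible under `path`
    # fsync the directory so the rename itself survives a crash
    dfd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```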

btw, a clarification about my earlier comment: `O_SYNC` (no `D`) should be equivalent to calling `fsync` after every write. `O_DSYNC` should be equivalent to calling the weaker `fdatasync` after every write. The difference is whether non-essential inode metadata (e.g. mtime) is also flushed.
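
In syscall terms, the rough equivalences look like this (a sketch via Python's os module; these flags are Linux/Unix-specific):

```python
import os

data = b"hello\n"

# O_SYNC: every write() also flushes data plus all metadata,
# i.e. roughly write() followed by fsync().
fd = os.open("a.log", os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o644)
os.write(fd, data)
os.close(fd)

# O_DSYNC: like O_SYNC but skips non-essential metadata (e.g. mtime),
# i.e. roughly write() followed by fdatasync().
fd = os.open("b.log", os.O_WRONLY | os.O_CREAT | os.O_DSYNC, 0o644)
os.write(fd, data)
os.close(fd)

# The explicit-call versions of the same thing:
fd = os.open("c.log", os.O_WRONLY | os.O_CREAT, 0o644)
os.write(fd, data)
os.fdatasync(fd)   # or os.fsync(fd) for the O_SYNC-equivalent behaviour
os.close(fd)
```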


> I think there are a lot of reasons to use this flag besides a write()+f(data)sync() sequence:

> * If you're putting something in a write-ahead log then applying it to the primary storage, you want it to be fully committed to the write-ahead log before you start changing the primary storage, so if there's a crash halfway through the primary storage change you can use the log to get to a consistent state (via undo or redo).

I guess I meant exclusively in terms of writing to the WAL. As I understand it, most DBMSes synchronously write the log entries for a transaction and asynchronously write the data pages to disk via a separate API, or just mark the pages as dirty and let the buffer pool manager flush them to disk at its discretion.

> * If you're trying to atomically replace a file via the rename-a-temporary-file-into-place trick, you can submit the whole operation to the ring at once, but you'd want to use `IOSQE_IO_LINK` to ensure the temporary file is fully written/synced before the rename happens.

Makes sense


> As I understand it, most DBMSes synchronously write the log entries for a transaction and asynchronously write the data pages to disk via a separate API, or just mark the pages as dirty and let the buffer pool manager flush them to disk at its discretion.

I think they do need to ensure that page doesn't get flushed before the log entry in some manner. This might happen naturally if they're doing something in single-threaded code without io_uring (or any other form of async IO). With io_uring, it could be a matter of waiting for the completion entry for the log write before submitting the page write, but it could also be done with the link instead.
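
As a single-threaded sketch of that ordering requirement (blocking calls; with io_uring the same effect comes from waiting on the log write's CQE, or from the link):

```python
import os

# Hypothetical WAL sketch: the log record must be durable before the
# corresponding page write is allowed to reach the data file.
def commit(log_fd, data_fd, log_record: bytes, page: bytes, page_offset: int):
    os.write(log_fd, log_record)
    os.fdatasync(log_fd)                    # redo/undo info is now durable
    os.pwrite(data_fd, page, page_offset)   # the page may now be flushed any time
```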


> I think they do need to ensure that page doesn't get flushed before the log entry in some manner.

Yes, I agree. I meant that they synchronously write the log entries, then return success to the caller, and then deal with dirty data pages. As I recall, the buffer pool manager has to do something special with dirty pages for transactions that are not committed yet.


Cool! I've always wanted something like this. Usually I just have to manually remove redundant CSS and styling options.

Can you explain why the viewport width and height are needed?


Content that appears in the viewport before scrolling is considered 'above-the-fold' and is thus prioritised to load quickest. The viewport dimensions are used to figure out what will be above-the-fold.
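
To make that concrete, a tool could do something like the following with a headless browser, treating any element whose box intersects the initial viewport as above-the-fold (a hypothetical sketch using Playwright; not necessarily how this project does it):

```python
from playwright.sync_api import sync_playwright

# Hypothetical sketch: list tag names of elements that intersect the
# initial viewport, i.e. the "above-the-fold" content.
def above_the_fold(url, width=1280, height=720):
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(viewport={"width": width, "height": height})
        page.goto(url)
        tags = page.evaluate(
            """() => [...document.querySelectorAll('*')]
                .filter(el => {
                    const r = el.getBoundingClientRect();
                    return r.bottom > 0 && r.top < window.innerHeight;
                })
                .map(el => el.tagName)"""
        )
        browser.close()
        return tags
```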


Wireless ethernet adapters


What was the CRDT bug?


> PyTorch has dominated the AI scene since TF1 fumbled the ball at 10th yard line

can you explain why you think TensorFlow fumbled?


I see good answers already, but here's a concrete example:

In my University we had to decide between both libraries so, as a test, we decided to write a language model from scratch. The first minor problem with TF was that (if memory serves me right) you were supposed to declare your network "backwards" - instead of saying "A -> B -> C" you had to declare "C(B(A))". The major problem, however, was that there was no way to add debug messages - either your network worked or it didn't. To make matters worse, the "official" TF tutorial on how to write a Seq2Seq model didn't compile because the library had changed but the bug reports for that were met for years with "we are changing the API so we'll fix the example once we're done".

PyTorch, by comparison, had the advantage of a Python-based interface - you simply defined classes like you always did (including debug statements!), connected them as variables, and that was that. So when my beginner colleagues and I had to decide which library to pick, "the one that's not a nightmare to debug" sounded much better than "the one that's more efficient if you have several billion training datapoints and a cluster". My colleagues and I then went on to become professionals, and we all brought PyTorch with us.


This was also my experience. TensorFlow's model of constructing then evaluating a computation graph felt at odds with Python's principles. It made it extremely difficult to debug because you couldn't print tensors easily! It didn't feel like Python at all.

Also the API changed constantly so examples from docs or open source repos wouldn't work.

They also had that weird thing about all tensors having a unique global name. I remember I tried to evaluate a DQN network twice in the same script and it errored because of that.

It's somewhat vindicating to see many people in this thread shared my frustrations. Considering the impact of these technologies I think a documentary about why TensorFlow failed and PyTorch took off would be a great watch.


The inability to use print debugging to tell me the dimensions of my hidden states was 100% why TF was hard for me to use as a greenhorn MSc student.

Another consequence of this was that PyTorch let you use regular old Python for logic flow.


In 2018, I co-wrote a blog post with the inflammatory title “Don’t use TensorFlow, try PyTorch instead” (https://news.ycombinator.com/item?id=17415321). As it gained traction here, it was changed to “Keras vs PyTorch” (some edgy things that work for a private blog are not good for a corporate one). Yet the initial title stuck, and you can see it resonated well with the crowd.

TensorFlow (while a huge step on top of Theano) had issues with a strange API, mixing needlessly complex parts (even for the simplest layers) with magic-box-like optimization.

There was Keras, which I liked and used before it was cool (when it still supported the Theano backend), and it was the right decision for TF to incorporate it as the default API. But it was 1–2 years too late.

At the same time, I initially looked at PyTorch as some intern’s summer project porting from Lua to Python. I expected an imitation of the original Torch. Yet the more it developed, the better it was, with (at least to my mind) the perfect level of abstraction. On the one hand, you can easily add two tensors, as if it were NumPy (and print its values in Python, which was impossible with TF at that time). On the other hand, you can wrap anything (from just a simple operation to a huge network) in an nn.Module. So it offered this natural hierarchical approach to deep learning. It offered building blocks that can be easily created, composed, debugged, and reused. It offered a natural way of picking the abstraction level you want to work with, so it worked well for industry and experimentation with novel architectures.

So, while in 2016–2017 I was using Keras as the go-to for deep learning (https://p.migdal.pl/blog/2017/04/teaching-deep-learning/), in 2018 I saw the light of PyTorch and didn’t feel a need to look back. In 2019, even for the intro, I used PyTorch (https://github.com/stared/thinking-in-tensors-writing-in-pyt...).


Actually, I opened “Teaching deep learning” and smiled as I saw how it evolved:

> There is a handful of popular deep learning libraries, including TensorFlow, Theano, Torch and Caffe. Each of them has Python interface (now also for Torch: PyTorch)

> [...]

> EDIT (July 2017): If you want a low-level framework, PyTorch may be the best way to start. It combines relatively brief and readable code (almost like Keras) but at the same time gives low-level access to all features (actually, more than TensorFlow).

> EDIT (June 2018): In Keras or PyTorch as your first deep learning framework I discuss pros and cons of starting learning deep learning with each of them.


The original TensorFlow had an API similar to the original Lua-based Torch (the predecessor to PyTorch) that required you to first build the network, node by node, then run it. PyTorch used a completely different, and much more convenient approach, where the network is built automatically for you just by running the forward pass code (and will then be used for the backward pass), using both provided node types and arbitrary NumPy compatible code. You're basically just writing differentiable code.

This new PyTorch approach was eventually supported by TensorFlow as well ("eager execution"), but the PyTorch approach was such a huge improvement that there had been an immediate shift by many developers from TF to PyTorch, and TF never seemed able to regain the momentum.
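
For anyone who never used TF1, the contrast looked roughly like this (written from memory, so treat the TF1 half as approximate):

```python
# TensorFlow 1.x style: build a symbolic graph first, run it later in a Session.
import tensorflow as tf  # 1.x API
x = tf.placeholder(tf.float32, shape=[None, 4])
y = tf.layers.dense(x, 2)   # print(y) shows a symbolic Tensor, no values yet
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    out = sess.run(y, feed_dict={x: [[1., 2., 3., 4.]]})

# PyTorch: define-by-run, ordinary Python with values you can print and branch on.
import torch
layer = torch.nn.Linear(4, 2)
t = torch.tensor([[1., 2., 3., 4.]])
out = layer(t)
print(out.shape)            # debugging is just print()
if out.sum() > 0:           # regular Python control flow works too
    out = torch.relu(out)
```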

TF also suffered from having a confusing array of alternate user libraries built on top of the core framework, none of which had great documentation, while PyTorch had a more focused approach and fantastic online support from the developer team.


LuaTorch is eager-execution too. The problem with LuaTorch was the GC. You cannot rely on a traditional GC here: since each tensor is megabytes (at the time, now gigabytes) large, you need to collect them aggressively rather than at intervals. Python's reference-counting system solves this issue (and of course, by "collecting" I don't mean freeing the memory itself; PyTorch has a simple slab allocator to manage CUDA memory).


With Lua Torch the model execution was eager, but you still had to construct the model graph beforehand - it wasn't "define by run" like PyTorch.

Back in the day, having completed Andrew Ng's ML course, I then built my own C++ NN framework copying this graph-mode Lua Torch API. One of the nice things about explicitly building a graph was that my framework supported having the model generate a GraphViz DOT representation of itself so I could visualize it.


Ah, I get what you mean now. I was mixing up the nn module and the tensor execution bits. (To be fair, the PyTorch nn module carries over many of these quirks!)


I'm no machine learning engineer, but I dabbled professionally with both frameworks a few years ago and the developer experience didn't even compare. The main issue with TF was that you could only choose between a powerful but incomprehensible, poorly documented [1], ultra-verbose and ever-changing low-level API, and an abstraction layer (Keras) that was too high-level to be really useful.

Maybe TF has gotten better since but at the time it really felt like an internal tool that Google decided to just throw into the wild. By contrast PyTorch offered a more reasonable level of abstraction along with excellent API documentation and tutorials, so it's no wonder that machine learning engineers (who are generally more interested in the science of the model than the technical implementation) ended up favoring it.

[1] The worst part was that Google only hosted the docs for the latest version of TF, so if you were stuck on an older version (because, oh I don't know, you wanted a stable environment to serve models in production), well tough luck. That certainly didn't gain TF any favors.


For me it was about 8 years ago. Back then TF was already bloated, and its bet on static compute graphs gave it two weaknesses: writing code was verbose and debugging was difficult.

The few people I knew back then used Keras instead. I switched to PyTorch for my next project, which was more "batteries included".


Imagine a total newbie trying to fine-tune an image classifier, reusing some open source example code, about a decade ago.

If their folder of 10,000 labelled images contains one image that's a different size to the others, the training job will fail with an error about unexpected dimensions while concatenating.

But it won't be able to say the file's name, or that the problem is an input image of the wrong size. It'll just say it can't concatenate tensors of different sizes.

An experienced user will recognise the error immediately, and will have run a data cleansing script beforehand anyway. But it's not experienced users who bounce from frameworks, it's newbies.
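
A pre-flight check along these lines is what saves the newbie, because it names the offending file instead of dying deep inside a concatenation (a generic sketch using PIL; the folder path and extension are made up):

```python
from pathlib import Path
from PIL import Image

# Hypothetical sanity check: report any image whose dimensions differ from
# the first one, instead of letting training fail with a nameless shape error.
def check_sizes(folder):
    expected = None
    for path in sorted(Path(folder).glob("*.jpg")):
        size = Image.open(path).size
        if expected is None:
            expected = size
        elif size != expected:
            print(f"{path}: {size}, expected {expected}")

check_sizes("dataset/train")
```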


> An experienced user will recognise the error immediately, and will have run a data cleansing script beforehand anyway. But it's not experienced users who bounce from frameworks, it's newbies.

Even seasoned developers will bounce away from frameworks or libraries - no matter if they're old dogs or the next hot thing - if the documentation isn't up to speed or if simple, common tasks require wading through dozens of pages of documentation.

Writing good documentation is hard enough, writing relevant "common usage examples" is even harder... but keeping them up to date and working is a rarely seen art.

And the greatest art of all of it is logging. Soooo many libraries refuse to implement detailed structured logging in internal classes (even though Java and PHP in particular offer very powerful mechanisms), making it much more difficult to troubleshoot problems in the field.


I just remember TF1 being super hard to use as a beginner and Google repeatedly insisting it had to be that way. People talk about the layering API, but it's more than that, everything about it was covered with sharp edges.


I personally believe TF1 was serving the needs of its core users. It provided a compilable compute graph with autodiff, and you got very efficient training and inference from it. There was a steep learning curve, but if you got past it, things worked very very well. The distributed TF never really took off: it was buggy, and I think they made some wrong early design bets for performance reasons that should have been sacrificed in favor of simplicity.

I believe some years after the TF1 release, they realized the learning curve was too steep and they were losing users to PyTorch. I think the Cloud team was also attempting to sell customers on their amazing DL tech, which was falling flat. So they tried to keep the TF brand while totally changing the product under the hood by introducing imperative programming and gradient tapes. They killed TF1, upsetting those users, while not having a fully functioning TF2, all the while having plenty of documentation pointing to TF1 references that didn't work. Any new grad student made the simple choice of using a tool that was user-friendly and worked, which was PyTorch. And most old TF1 users hopped on the bandwagon.


First, the migration to 2.0 in 2019 to add eager mode support was horribly painful. Then, starting around 2.7, backward compatibility kept being broken. Not being able to load previously trained models with a new version of the library is wildly painful.


I only remember 2015 TF and I was wondering: why would I use Python to assemble a computational graph when what I really want is to write code and then differentiate through it?


Greenfielding TF2.X and not maintaining 1.X compatibility


> Reranking: the highest value 5 lines of code you'll add. The chunk ranking shifted a lot. More than you'd expect. Reranking can many times make up for a bad setup if you pass in enough chunks. We found the ideal reranker set-up to be 50 chunk input -> 15 output.

What is re-ranking in the context of RAG? Why not just show the code if it’s only 5 lines?


OP here. Reranking is a specialized LLM that takes the user query and a list of candidate results, then re-orders them based on which ones are most relevant to the query.

Here's sample code: https://docs.cohere.com/reference/rerank
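
To the parent's point about it being ~5 lines, it's roughly this (a sketch from memory of the Cohere docs; method and model names may differ by SDK version):

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key
docs = ["chunk one ...", "chunk two ...", "chunk three ..."]
response = co.rerank(model="rerank-english-v3.0", query="user question",
                     documents=docs, top_n=2)
for r in response.results:
    print(r.index, r.relevance_score)  # indices into `docs`, best first
```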


What is the difference between reranking versus generating text embeddings and comparing with cosine similarity?


My understanding:

If you generate embeddings (of the query, and of the candidate documents) and compare them for similarity, you're essentially asking whether the documents "look like the question."

If you get an LLM to evaluate how well each candidate document follows from the query, you're asking whether the documents "look like an answer to the question."

An ideal candidate chunk/document from a cosine-similarity perspective, would be one that perfectly restates what the user said — whether or not that document actually helps the user. Which can be made to work, if you're e.g. indexing a knowledge base where every KB document is SEO-optimized to embed all pertinent questions a user might ask that "should lead" to that KB document. But for such documents, even matching the user's query text against a "dumb" tf-idf index will surface them. LLMs aren't gaining you any ground here. (As is evident by the fact that webpages SEO-optimized in this way could already be easily surfaced by old-school search engines if you typed such a query into them.)

An ideal candidate chunk/document from a re-ranking LLM's perspective, would be one that an instruction-following LLM (with the whole corpus in its context) would spit out as a response, if it were prompted with the user's query. E.g. if the user asks a question that could be answered with data, a document containing that data would rank highly. And that's exactly the kind of documents we'd like "semantic search" to surface.


I've been thinking about the problem of what to do if the answer to a question is very different from the question itself in embedding space. The KB method sounds interesting and not something I'd thought about; you sort of work on the "document side", I guess. I've also heard of HyDE, which works on the query side: you generate hypothetical answers to the user query and look for documents that are similar to those answers, if I've understood it correctly.


The responses didn't hit on the main point. Re-ranking is just a mini-LLM (for latency/cost reasons) that does a double check. The embedding model finds the closest M documents in R^N space. The re-ranker picks the top K documents from those M documents. In theory, if we just used Gemini 2.5 Pro or GPT 5 as the re-ranker, the performance would be even better than whatever small re-ranker people choose to use.


Text similarity finds items that closely match. Reranking may select items that are less semantically "similar" but are more relevant to the query.


The reranker is a cross-encoder that sees the docs and the query at the same time. What you normally do is generate embeddings ahead of time, independent of the prompt used, calculate cosine similarity with the prompt, select the top-k chunks that best match the prompt, and only then use a reranker to sort them.

Embeddings are a lossy compression, so if you feed the chunks and the prompt to the model at the same time, the results are better. But you can't do this for your whole DB, which is why you filter with cosine similarity at the beginning.
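
Putting the two stages together, a sketch with sentence-transformers (model names are just common defaults, not a recommendation, and the chunk list is made up):

```python
from sentence_transformers import SentenceTransformer, CrossEncoder, util

chunks = ["...chunk 1...", "...chunk 2...", "...chunk 3..."]
query = "how do I rotate my API key?"

# Stage 1: bi-encoder. Chunk embeddings are computed ahead of time, independent
# of the query; at query time we take cosine similarity and keep the top M.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
chunk_emb = embedder.encode(chunks, convert_to_tensor=True)
query_emb = embedder.encode(query, convert_to_tensor=True)
scores = util.cos_sim(query_emb, chunk_emb)[0]
top_m = scores.topk(k=min(50, len(chunks))).indices.tolist()

# Stage 2: cross-encoder reranker. It sees (query, chunk) pairs together, so it
# is slower but more accurate; keep only the top K chunks for the LLM prompt.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pair_scores = reranker.predict([(query, chunks[i]) for i in top_m])
reranked = [i for _, i in sorted(zip(pair_scores, top_m), reverse=True)][:15]
```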


Because LLMs are a lot smarter than embeddings and basic math. Think of the vector / lexical search as the first approximation.


Dumb question, but just cuz I didn't see it mentioned: have you tried using a `Disallow: /` in your robots.txt? Or `Crawl-delay: 10`? That would be the first thing I would try.

Sometimes these crawlers are just poorly written, not malicious. Sometimes it's both.

I would try a zip bomb next. I know there’s one that is 10 MB over the network and unzips to ~200TB.


It's for crawlers, not custom scrapers.


Respecting robots.txt is a convention not enforced by anything, so yes, the bot is certainly free to ignore it.

But I'm not sure I understand your distinction. A scraper is a crawler regardless of whether it is "custom" or an off-the-shelf solution.

The author also said the bot identified itself as a crawler:

> Mozilla/5.0 (compatible; crawler)


Could you explain this some more? How are your costs so low in comparison? Are you using serverless?


That's how the cloudy platforms get you. They're very cheap on the low end, until they're not.


No, it's because we changed the way we process our metrics.

Previously we processed our metrics by consolidating them into multidimensional entries on a minute basis.

We moved to single metric second-based collection, because it was getting too complicated to process and because we wanted second-by-second measurement to measure engagement more granularly. That increased our data retention tremendously. We're still under the cost for the other timestream products, but we'll be adjusting how we do that in a quarter or two.


I've heard this a few times. Can you explain a bit more why you think that's a problem?

I've always made the assumption that once they become "not cheap", you at least have that cost to offset investment against.


It depends on you understanding your app and how things need to be structured. We have what essentially is a video CMS, so we have two parts: a management UI that end-users use and a backend that actually delivers the video and collects metrics.

They are essentially two products, and are designed that way; if the management UI barfed the backend would continue along forever.

You can combine management and delivery in one app, but that makes delivery more fragile and will be slower because presumably it has to invoke a lot of useless stuff just to deliver bytes. I remember working with a spring app that essentially built and destroyed the whole spring runtime just to serve a request, which was an unbelievably dumb thing to do. Spring became the bottleneck, and for most requests there was actually no work done; 99% of the time was in spring doing spring things.

So really, once you separate the delivery and management it becomes easier to figure out the minimum amount of stuff you need. Redis, because you need to cache a bunch of metadata and handle lots of connections. Mysql, because you need a persistent store. Lambda, as a thin layer between everything. And a CDN, because you don't want to serve stuff out of AWS if you can help it. SQS for what essentially becomes job control. And for metric collection we use fastly with synthetic logging.

To be fair, our AWS cost was low but our CDN cost is like $1800/mo for some number of PB/mo (5? 10? I forget).

In the old days this would require at least (2 DB + 2 App server + 2 NAS) * 2 locations = 12 boxes. If we were going to do the networking ourselves we'd add 4 f5s. Ideally we'd have the app server, redis, and the various lambdas on different boxes, so 2 redis + 2 runners = 8 more servers. If we didn't use f5s we'd have 2 reverse proxies as the front end at each location. Each box would have 2 PSUs, at least a raid 1, dual NICs, and ECC. I think the lowest end Dell boxes with those features are like $5k each? Today I'd probably just stuff some 1TB SSDs in them and mirror them instead of going SAS. The NAS would be hard to spec because you have to figure out how much storage you need and they can be a pain to reconfigure. You don't want to spend too much up front, but you also don't want to have downtime while you add some more drive space.

Having built this out, it's not as easy as you'd think. I've been lucky enough to have built this sort of thing a few times. It's fun to do, but maintaining it can be a PITA. If you don't believe in documentation your deployment will fail miserably because you did something out of order.


There was a recent CIDR paper[1] demonstrating that the extension DuckPGQ[2] for DuckDB (an embedded database) offers competitive graph query performance compared to Neo4j and Umbra. No data on how it compares to KuzuDB.

[1] https://vldb.org/cidrdb/papers/2023/p66-wolde.pdf [2] https://duckpgq.org/


> No foreign-keys allowed

So it was sharded? And no cross-shard queries or txns were allowed?

Or it was replicated? And no FKs were allowed because of consistency problems?

