
This is a great point. Why is Git LFS uploading a large file in 50 byte chunks?


Ideally large files would upload in MTU-sized packets, which Nagle's algorithm will often give you. Without it, you may pay a small amount of additional overhead at each chunk boundary, where the larger chunk may not divide evenly into MTU-sized packets.

Edit: I mostly work in embedded (systems that don't run git-lfs), so perhaps my view isn't sensible here.


Dividing data into MTU-sized packets is the job of the TCP stack - or even the driver or NIC in the case of offloads. Userspace software shouldn't deal with MTUs and should always use buffer sizes that make sense for the application, e.g. 64kB or even more. Otherwise the stack won't be very efficient, with every tiny piece of data causing a syscall and independent processing by the networking stack.


Right; it sounds to me like the real bug is that git-lfs isn't buffering its writes to the socket. Correct me if I'm wrong, but if git-lfs was buffering its writes (or using sendfile) then Nagle's algorithm wouldn't matter.


It matters less often - it can still matter at the end of each write buffer. Larger write buffers remove a lot of chances for this to happen.

If the application can buffer the entire file or use sendfile, probably best to disable Nagle's algorithm so the last packet goes out immediately. Nginx does this.

Another option is turning off Nagle's algorithm at the end of each transfer, and on at the start of the next, but this causes extra syscalls.


I don't know Go. But could it be that Go has so many high-level abstractions that the code ends up operating on streams directly, with no explicit buffering?


The standard convention is to slap bufio.Reader/bufio.Writer on streams to make them more performant.

Though how LFS ends up with ~50 byte chunks is probably something very, very dumb in the LFS code itself. Better to fix that mistake than to paper over it.


bufio is for adding buffering regardless of source/dest. Better in this case is io.ReaderFrom (which will also be used transparently by io.Copy) to let the socket control the buffering and apply even more optimizations. For something like git-lfs I would expect sendfile to provide a huge improvement, depending on the underlying storage.



