More

notaplumber · on July 25, 2021

Likely because the author doesn't care about that and are productivity focused. Besides, OpenBSD uses the same drivers as Linux, ported to the OpenBSD kernel.

https://github.com/openbsd/src/commit/ad8b1aafbcc34f7eb86e4e...

notaplumber · on July 18, 2021

Although as mentioned in the article, this won't work on systems like *BSD, Solaris or macOS.

kubkon · on July 19, 2021

One tiny correction that we do support seamless C/C++ cross-compilation to macOS now with the advent of our own in-house linker `zig ld`. :-)

notaplumber · on July 18, 2021

> And while it is likely that dhclient will eventually be removed from OpenBSD Base, it will live on in Ports, where it will be available to those who desire to use it:

To be clear, OpenBSD's dhclient is not ISC dhcp client, it is a version that has been maintained in OpenBSD for many years and has a weaker form of privsep, but dhcpleased is a new daemon written to be more in-line with other OpenBSD daemons.

http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sbin/dhcpleased...

OpenBSD's previous dhclient: https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sbin/dhclient/

notaplumber · on July 18, 2021

It definitely should not have been, this is news related to OpenBSD, there are no portable versions of either daemon.

notaplumber · on July 1, 2021

I don't buy this. Some Acer laptops in the past have required that you set a Supervisor BIOS password in order to change the Secure Boot settings, but after you do that you can set the password to blank to clear it again.

zozbot234 · on July 1, 2021

Good point, this is definitely a thing on some hardware. Still crappy, but not a huge deal breaker.

zoomablemind · on July 1, 2021

+1 on the need to set the Supervisor PW in BIOS, otherwise the settings were greyed out. Puzzling indeed, I had to deal with it back some 5 years ago. Definitely found this answer on forums rather than from mfgr suport pages/docs.

However in this case it may be a different "lockdown" issue, though quite an odd choice reasoning-wise.

nrjames · on July 2, 2021

With this machine, if purchased from Amazon, the settings remain greyed out even if you set the Supervisor PW. That is not the case if you buy it elsewhere.

nrjames · on July 2, 2021

You can do that on this laptop if you don’t buy it from Amazon. There are plenty of YouTube videos showing how. If you but it from Amazon, like I did, you can set/change BIOS passwords, but Secure Boot cannot be changed and you cannot add a USB drive to the boot order. I spent hours trying to find a workaround.

josephcsible · on July 1, 2021

Even if this is the case, it's still terrible that such unintuitive, undiscoverable behavior (1) isn't documented as being the case, and (2) isn't known and explained by their tech support. And why would it only work like this if you bought from Amazon?

notaplumber · on June 20, 2021

> strlcpy... it’s not standard so it doesn’t satisfy 5 either.

It's a de facto standard supported on *BSD/macOS/Android/Linux(w/ musl libc) and a number of other commercial systems, it's also on track for standardization in POSIX (Issue 8).

https://www.austingroupbugs.net/view.php?id=986#c5050

https://www.opengroup.org/austin/docs/austin_1110.pdf

While it might not meet your arbitrary set of requirements, it already has decent adoption.

saagarjha · on June 20, 2021

Yikes, I hope they don’t add it, that would make people want to use it even more. Given its surprising performance characteristics it’s usually not what you want.

notaplumber · on June 20, 2021

It's looking very much like it will, but even if that wasn't the case, it's already widely in use and available in OS libc's and easy to copy, and it has been since OpenBSD first introduced it over 20 years ago. I also believe your performance claims are exaggerated.. and strlcpy does exactly what people want and expect in most scenarios.

saagarjha · on June 20, 2021

I've seen a lot of strlcpys, and of that I think I have seen maybe one place where the return value was used. I think if you asked people why they used strlcpy, 95+% would say "security" and then list out the characteristics my strxcpy has, rather than "I want snprintf-like behavior with a size_t return" which is what strlcpy is. I would not be surprised if most people got the time complexity of the function incorrect because of this.

asvitkine · on June 20, 2021

If the return value of strlcpy isn't used, the compiler can optimize the extra inefficiency away. (I don't know if any do, though.)

This is similar to how strlen(x)>0 calls get optimized to *x by clang and likely other compilers.

notaplumber · on June 11, 2021

> Because flash does not overwrite anything, ever.

This is repeated multiple times in the article, and I refuse to believe it is true. If NVME/SSDs never overwrote anything, they would quickly run out of available blocks, especially on OSs that don't support TRIM.

cduzz · on June 11, 2021

There's nuance to this; the deletes / overwrites are accomplished by bulk wiping entire blocks.

Rather than change the paint color in a hallway you have to tear down the house and build a new house in the vacant lot next door that's a duplicate of the original, but with the new hallway paint.

To optimize, you keep a bucket of houses to destroy, and a bucket of vacant lots, and whenever a neighborhood has lots of "to be flattened houses" the remaining active houses are copied to a vacant lot and the whole neighborhood is flattened.

So, things get deleted, but not in the way people are used to if they imagine a piece of paper and a pencil and eraser.

slaymaker1907 · on June 11, 2021

Just to add to the explanation, SSDs are able to do this because they have a layer of indirection akin to virtual memory. This means that what your OS thinks is byte 800000 of the SSD may change it's actual physical location on the SSD over time even in the absence of writes or reads to said location.

This is a very important property of SSDs and is a large reason why log structured storage is so popular in recent times. The SSD is very fast at appends, but changing data is much slower.

hinkley · on June 11, 2021

> The SSD is very fast at appends, but changing data is much slower.

No, it's worse than that. The fact that it's an overly subtle distinction is the problem.

SSDs are fast while write traffic is light. From an operational standpoint, the drive is lying to you about its performance. Unless you are routinely stress testing your system to failure, you may have a very inaccurate picture of how your system performs under load, meaning you have done your capacity planning incorrectly, and you will be caught out with a production issue.

Ultimately it's the same sentiment as people who don't like the worst-case VACUUM behavior of Postgres - best-effort algorithms in your system of record make some people very cranky. They'd rather have higher latency with a smaller error range, because at least they can see the problem.

hawski · on June 11, 2021

Are there write-once SSDs? They would have a tremendous capacity. Probably good for long term backups or archiving. Also possibly with a log structured filesystem only.

eqvinox · on June 11, 2021

Making them write-once doesn't increase the capacity; that's mostly limited by how many analog levels you can distinguish on the stored charge, and how many cells you can fit. The management overhead and spare capacity to make SSDs rewritable is –to my knowledge– in the single digit percentages.

(Also you need the translation layer even for write-once since flash generally doesn't come 100% defect free. Not sure if manufacturers could try to get it there, but that'd probably drive the cost up massively. And the translation layer is there for rewritable flash anyway... the cost/benefit tradeoff is in favor of just living with a few bugged cells.)

entangledqubit · on June 11, 2021

I suspect that hawki was assuming that a WORM SSD would be based on a different non-flash storage medium. I don't know any write once media that has similar read/write access times to an SSD.

FWIW, there are WORM microsd cards available but it looks like they still use flash under the hood.

hawski · on June 11, 2021

I don't know enough specifics, so I didn't assume anything :) In fact I was not aware of non-flash SSDs.

Because of the Internet age there probably is not much place for write once media anyway, even it would be somewhat cheaper. But maybe for specialized applications or if it would be much much cheaper per GB.

wongarsu · on June 11, 2021

The only write once media I'm aware of that is in significant use are WORM tapes. They don't offer significant advantages over regular tapes, but for compliance reasons it can be useful to just make it impossible to modify the backups.

Pet_Ant · on June 12, 2021

What about EPROMs? I mean could those be scaled down with 7nm lithography to be energy efficient incorruptible fast storage?

AstralStorm · on June 12, 2021

You mean the UV erasable kind? Essentially phase change memory? Very hard to miniaturize?

Because the older Flash aren't as stable when miniaturized as you'd expect. Current flash is a direct descendant of these, they only are more stable because the cells are much chunkier and thus with lower leakage.

Pet_Ant · on June 14, 2021

I was thinking of the anti-fuse based PROMs not EPROMs sorry. I figure if you miniatures those they'd be faster and denser and use-based fully reliable.

entangledqubit · on June 14, 2021

I thought along that route as well but I'm not sure how the feature scale of a fuse compares to the size of a flash cell - especially since the latter can contain multiple bits worth of info (MLC). Assuming the fuse write results in a serious physical state change of some sort, I suspect that the energy required for high speed writes (at SSD speeds) may become substantial.

That being said, it's not clear how innovation has occurred in this direction in the storage space.

jonny_eh · on June 11, 2021

> Making them write-once doesn't increase the capacity

It could theoretically make them cheaper. But I guess that there wouldn't be enough demand, so you'd be better off having some kind of OS enforced limitation on it.

weinzierl · on June 11, 2021

I find this a super interesting question. I always assumed that long term stability of electronic non-volatile memory is worse than that of magnetic memory. When I think about it, I can't think of any compelling reason why that should be the case. Trapped electrons vs magnetic regions; I have no intuition which one of them is likely to be more stable.

There is a question on stackoverflow about this topic with many answers but no definitive conclusion. There seem to be some papers touching the subject but at a glance I couldn't find anything useful in them.

[1] https://superuser.com/questions/4307/what-lasts-longer-data-...

olejorgenb · on June 11, 2021

According to https://www.ni.com/en-no/support/documentation/supplemental/... (Seems kinda reputable at least)

"The level of charge in each cell must be kept within certain thresholds to maintain data integrity. Unfortunately, charge leaks from flash cells over time, and if too much charge is lost then the data stored will also be lost.

During normal operation, the flash drive firmware routinely refreshes the cells to restore lost charge. However, when the flash is not powered the state of charge will naturally degrade with time. The rate of charge loss, and sensitivity of the flash to that loss, is impacted by the flash structure, amount of flash wear (number of P/E cycles performed on the cell), and the storage temperature. Flash Cell Endurance specifications usually assume a minimum data retention duration of 12 months at the end of drive life."

wtallis · on June 12, 2021

> During normal operation, the flash drive firmware routinely refreshes the cells to restore lost charge. However, when the flash is not powered the state of charge will naturally degrade with time.

You have to be careful how you interpret this bit. "Normal operation" here assumes not just that the SSD is powered, but that it is actively used to perform IO. Writes to the SSD will eventually cause data to be refreshed as a consequence of wear leveling; if you write 1TB per month to a 1TB drive then every (in-use) cell will be refreshed approximately monthly, and data degradation won't be a problem.

If you have an extremely low-write workload, the natural turnover due to wear leveling won't keep the data particularly fresh and you'll be dependent on the SSD re-writing data when it notices (correctable) read errors, which means data that is never accessed could degrade without being caught. But in this scenario, you're writing so little to the drive that the flash stays more or less new, and should have quite long data retention even without refreshing stored data.

dataflow · on June 11, 2021

> When I think about it, I can't think of any compelling reason why that should be the case. Trapped electrons vs magnetic regions; I have no intuition which one of them is likely to be more stable.

My layman intuition (which could be totally wrong) is that trapped electrons have a natural tendency to escape due to pure thermal jitter. Whereas magnetic materials tend to stick together, so there's at least that. Don't how much of this matches the actual electron physics/technology though...

sharikone · on June 11, 2021

Hmm I don't think this is conclusive. Thermal jitter makes magnetic boundaries change too, and of course you have to add to it that it is more susceptible to magnetic interference.

I don't have intuition either, but I don't think this explanation is sufficient

madacol · on June 11, 2021

> Trapped electrons vs magnetic regions;

From the physics point of view, aren't both cases the same thing?.

Isn't magnetic regions a state of the electric field? so if I move electrons in and out, the electric field should be changing as well

mananaysiempre · on June 11, 2021

No. A region of a piece of material is magnetized in a certain direction when its (ionized) atoms are mostly oriented in that direction, the presence of a constant magnetic field is (roughly speaking) only a consequence of that.

So flash memory is about the electrons, while magnetic memory is about the ions.

pas · on June 11, 2021

Aren't permanent magnetics a direct result of oriented spins? (So due to quantum effects?)

dmitrygr · on June 11, 2021

Modern multi-bit-per-cell flash has quite terrible data retention. It is especially low if it is stored in a warm place. You'd be lucky to see ten years without an occasional re-read + error-correct + re-write operation going on

phonon · on June 11, 2021

Like https://en.wikipedia.org/wiki/ROM_cartridge ?

I think Nintendo uses https://www.mxic.com.tw/en-us/products/ROM/Pages/default.asp...

https://www.mxic.com.tw/CachePages/en-us-Product-ROM-default...

salawat · on June 12, 2021

Any SSD you go through the trouble of building a max capacity disk image for, then dd'ing onto the disk before removing?

I mean... This is general purpose HW here. Write once SSD is a workflow more than an economically tenable use-case in terms of making massive size write once then burn the write circuit devices.

wtallis · on June 11, 2021

I don't think anyone would make literally write-once drives with flash memory; that's more optical disk territory. But zoned SSDs and host-managed SMR hard drives make explicit the distinction between writes and larger-scale erase operations, while still allowing random-access reads.

juloo · on June 11, 2021

That would be magnetic tapes.

alisonkisk · on June 11, 2021

Append-only garbage-collected storage was used in data center even when hard disks were (and are) popular because it's more reliable and scalable.

zdw · on June 11, 2021

inspired by that last sentence, the analogy could be rewritten as:

  - lines on page
  - pages of paper
  - whole notebooks

and might be easier for people to grok than the earlier houses/paint analogy.

jazzyjackson · on June 11, 2021

I don’t know, I like the drama of copying a neighborhood and tearing down the old one xD

jnwatson · on June 11, 2021

Reminds me of https://xkcd.com/1737/.

"When a datacenter catches fire, we just rope it off and rebuild one town over."

jdironman · on June 11, 2021

Speaking of xkcd, 2021 is return of "All your bases" See alt-text on image.

https://xkcd.com/286/

wand3r · on June 11, 2021

I think the explanation is sound maybe (I am not that familiar) but the analogy gets a bit lost when you talk about buckets of houses and buckets of vacant lots.

Maybe there is a better analogy or paradigm to view this through.

cduzz · on June 14, 2021

I should have been a little more clear -- the urban planner managing the house building / copying and neighborhood destruction (the realtime controller) The rules are: 1) You can build a house kinda quickly 2) You can't modify a house once it is built 3) you can only build a house on a vacant lot 4) you can change the "mailing address" (relative to the physical location) of the house 5) you can only knock down whole blocks of houses at once (not one at a time) 6) each time you flatten a block more crap accumulates in that block until after a while you can't build there anymore. 7) the flatten / rebuild step may be quite slow (because you have lots of houses to build) 8) You can lie and say you built a house before it is finished, if you don't have too many houses to build. (if you've got an SSD with a capacitor / battery or tiny cache and reserved area for that cache) 9) you've lied to the user and you actually have 5-100% more build-able area than you've advertised. 10) you have a finite area so eventually the dead space accumulates to the point where you can no longer safely build.

So -- you keep track of vacant lots and "dead" houses (abandoned but not flattened); whenever you've got spare time you will copy blocks with some ratio of "live" to abandoned houses to new lots so the new block only has live houses.

These pending / anticipatory compaction/garbage collection operations are what I refer to as "buckets" -- having to compact 300 (neighborhoods) blocks to achieve 300 writes is going to result in glacial performance because of this huge write amplification (behind the scenes the drive is duplicating 100s of mb / gb of data to write a small amount of user modifications)

As you might imagine, there are lots of strategies to how to approach this problem, some of which give you an SSD with extremely unpredictable (when full) performance, others will give a much more consistent but "slower" performance.

daniellarusso · on June 11, 2021

Spoiler alert - This is the plot to ‘The Prestige’.

eqvinox · on June 11, 2021

It's true and untrue depending on how you look at it. Flash memory only supports changing/"writing" bits in one direction, generally from 1 to 0. Erase, as a separate operation, clears entire sectors back to 1, but is more costly than a write. (Erase block size depends on the technology but we're talking MB on modern flash AFAIK, stuff from 2010 already had 128kB.)

So, the drives do indeed never "overwrite" data - they mark the block as unused (either when the OS uses TRIM, or when it writes new data [for which it picks an empty block elsewhere]), and put it in a queue to be erased whenever there's time (and energy and heat budget) to do so.

Understanding this is also quite important because it can have performance implications, particularly on consumer/low-end devices. Those don't have a whole lot of spare space to work with, so if the entire device is "in use", write performance can take a serious hit when it becomes limited by erase speed.

[Add.: reference for block sizes: https://www.micron.com/support/~/media/74C3F8B1250D4935898DB... - note the PDF creation date on that is 2002(!) and it compares 16kB against 128kB size.]

matheusmoreira · on June 11, 2021

> Understanding this is also quite important because it can have performance implications

Security implications too. The storage device cannot be trusted to securely delete data.

effie · on June 11, 2021

If you write whole drive capacity of random data, you should be fine.

robocat · on June 11, 2021

No. Say a particular model of SSD has over-provisioning of 10%, then even after writing the "whole" capacity of the drive, you can still be left with up to 10% of data recoverable from the Flash chips.

effie · on June 12, 2021

Right, so one better write 2x or 10x drive capacity of random data to it.

isotopp · on June 14, 2021

You should be running flash with self-encryption (and make sure you have a drive that implements that correctly).

To zap a drive you ask it to securely drop the self-encryption key. The data will still be there, but without the key it is indistinguishable from random noise.

effie · on June 15, 2021

Well who has time and energy to verify that. Just overwrite it several times, or destroy the drive.

dannyw · on June 12, 2021

For some family photos? Probably. For sensitive material or crypto keys? Absolutely not, due to overprovisoning as mentioned (which can be way higher than 10% for enterprise drives), but also due to controllers potentially lying to you especially when drives have things like pSLC caches, etc.

IshKebab · on June 11, 2021

By any reasonable definition they do overwrite data. It's just that they can't overwrite less than a block of data.

tzs · on June 11, 2021

If a logical overwrite only involved bits going from 1 to 0, are and drives smart enough to recognize this and do it as an actual overwrite instead of a copy and erase?

eqvinox · on June 11, 2021

On embedded devices, yes, this is actually used in file systems like JFFS2. But in these cases the flash chip is just dumb storage and the translation layer is implemented on the main CPU in software. So there's no "drive" really.

On NVMe/PC type applications with a controller driving the flash chips… I have absolutely no idea. I'm curious too, if anyone knows :)

jasonwatkinspdx · on June 12, 2021

I do know. Apparently you downvoted my sibling response to you as too simplistic, but I was clearly responding to someone where the embedded bare drive situation is irrelevant.

When it comes to what non bare flash drives do, you can start here: http://www.vldb.org/pvldb/vol13/p519-kakaraparthy.pdf

This paper is imperfect and the following citations are worth skimming. There's a cohort of similar papers chasing the same basic question in recent years that aren't densely cited amongst each other.

Go here next: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.46... but note that's just a jumping off point to the more recent papers.

It's hard to gain a full understanding of this layer because it's the basis of intense competition, hence held closely by controller manufacturers.

I'm far from world expert on this, but have read a lot about it and can answer with what I know to the best of my ability.

eqvinox · on June 12, 2021

> Apparently you downvoted my sibling response to you as too simplistic,

I didn't downvote your sibling response, but I did ignore it since it provided neither any sources nor any context for why I should trust your knowledge. Apparently others were less kind on your short statement.

With the additional information in this post, I'm much more willing to accept it into my head — thanks for answering this!

jasonwatkinspdx · on June 12, 2021

Yeah sorry that was unnecessarily grouchy of me.

jasonwatkinspdx · on June 11, 2021

Generally no, because the unit of write is a page.

isotopp · on June 11, 2021

Flash has a flash translation layer (FTL). It translates linear block addresses (LBA) into physical addresses ("PHY").

Flash can write blocks at a granularity similar to a memory page (cells, around 4-16 KB). It can erase only sets of blocks, at a much larger granularity (around 512-ish cell sized blocks).

The FTL will try to find free pages to write your data to. In the background, it will also try to move data around to generate unused erase blocks and then erase them.

In flash, seeks are essentially free. That means that it does no longer matter if blocks are adjacent. Also, because of the FTL, adjacent FTL are not necessarily adjacent on the physical layer. And even if you do not rewrite a block, it may be that the garbage collection moves data around at the PHY layer in order to generate completely empty erase blocks.

The net effect is that positioning as seen from the OS no longer matters at all from the OS layer, and that the OS layer has zero control over adjacency and erase at the PHY layer. Rewriting, defragging, or other OS level operations cannot control what happens physically at the flash layer.

TRIM is a "blatant layering violation" in the Linus sense: It tells the disk "hardware" what the OS thinks it no longer needs. TRIM'ed blocks can be given up and will not be kept when the garbage collector tries to free up an erase page.

anarazel · on June 11, 2021

> In flash, seeks are essentially free. That means that it does no longer matter if blocks are adjacent.

> The net effect is that positioning as seen from the OS no longer matters at all from the OS layer, and that the OS layer has zero control over adjacency and erase at the PHY layer. Rewriting, defragging, or other OS level operations cannot control what happens physically at the flash layer.

I don't agree with this. The "OS visible position" is relevant, because it influences what can realistically be written together (multiple larger IOs targeting consecutive LBAs in close time proximity). And writing data in larger chunks is very important for good performance, particularly in sustained write workloads. And sequential IO (in contrast to small random IOs) does influence how the FTL will lay out the data to some degree.

dannyw · on June 12, 2021

Disagree, because my understanding your OS visible positions have zero relevance to what will actually be translated to PHYs.

If you feed your NVMe a stream of 1GB writes spread out at completely randomised OS visible places (LBAs), the FTL may very well write it sequentially and you get the solid sustained write performance.

Conversely, you may try to write 1GB of sequential LBAs, and your FTL may very well spread it out all across the physical blocks simply because that's what’s available.

What I'm saying is that sequential reads and writ workloads are good, but whether the OS considers them sequential or not in terms of LBAs is irrelevant. The controller ignores LBAs and abstracts everything away.

My understanding could be wrong, so please correct me if I am.

anarazel · on June 12, 2021

That may sometimes be true the first times you write the random data (but in my experience it's often not true even then, and only if you carefully TRIMed the whole filesystem and it was mostly empty). But on later random writes it's rarely true, unless your randomness pattern of exactly the same as in the first run. To make room the FTL will (often in the background) need to read the non-written parts of erase blocks sized data assigned in the previous runs, just to be able to write out the new random writes. At some point new writes need to wait for this. Slowing things down.

Whereas with larger/sequential writes, there's commonly no need for read-modify-write cycles. The entire previous erase block sized chunks can just be marked as reusable with new content - the old data isn't relevant anymore.

This is pretty easy to see by just running benchmarks with sustained sequential and random write IO. But on some devices it'll take a bit - initially the writes are all in a faster area (e.g. using SLC flash instead of denser/cheaper mlc/tlc/qlc).

Of course, if all the random writes are >= erase block size, with a consistent alignment to multiples of the write size, then you're not going to see this - it's essentially sequential enough.

notaplumber · on June 11, 2021

Thanks for this part, I feel like this was a crucial piece of information I was missing. Also explains my observations about TRIM not being as important as people claim it is, the firmware on modern flash storage seems more than capable of handling this without OS intervention.

isotopp · on June 11, 2021

The GC in the device cleans up.

TRIM is useful, it gives the GC important information.

TRIM is not that important as long as the device is not full (less than 80%, generally speaking, but it is very easy to produce pathological cases that are way off in either direction). Once the device fills up above that it is crucial.

throwaway09223 · on June 11, 2021

The author clearly explains how this works in the sentence immediately following. "Instead it has internally a thing called flash translation layer (FTL)" ...

notaplumber · on June 11, 2021

I unfortunately skimmed over this, isotopp's explanation helped clear things up in my head.

throwaway09223 · on June 11, 2021

I just saw his post, it's a great explanation.

It might also help to keep in mind that both regular disk drives and solid state drives remap bad sectors. Both types of disks maintain an unaddressable storage area which is used to transparently cover for faulty sectors.

In a hard drive, faulty sectors are mapped during production and stored in the p-list, and are remapped to sectors in this extra hidden area. Sectors that fail at runtime are recorded in the g-list and are likewise remapped.

Writes may usually go to the same place in a hard drive, but it's not guaranteed there either.

dannyw · on June 12, 2021

This is not true anymore for many recent SMR HDDs. They have a translation layer, just like flash storage.

This is because for SMR HDDs, each block can either be SMR (higher density, EXTREMELY SLOW WRITES like <10mb/s possible, erases will remove multiple blocks just like flash memory), or normal (standard density, normal write speeds).

The controller abstracts this away and does writes as normal, but while the drive is idle, the controller in the background, converts these standard blocks into SMR blocks.

This is also why SMR HDDs support TRIM.

throwaway09223 · on June 12, 2021

Thanks for the info that makes a lot of sense. It looks like this tech has emerged in the time since I last did much work with disk drives.

Seems it's increasingly a bad idea to presume the implementation of internals.

aidenn0 · on June 11, 2021

Perhaps they mean it must erase an entire block before writing any data, unlike a disk that can write a single sector at a time?

dragontamer · on June 11, 2021

The issue is that DDR4 is like that too. Not only the 64 byte cache line, but DDR4 requires a transfer to the sense amplifiers (aka a RAS, row access strobe) before you can read or write.

The RAS command eradicated the entire row, like 1024 bytes or so. This is because the DDR4 cells only have enough charge for one reliable read, after that the capacitors don't have enough electrons to know if a 0 or 1 was stored.

A row close command returns the data from the sense amps back to the capacitors. Refresh commands renew the 0 or 1 as the capacitor can only hold the data for a few milliseconds.

------

The CAS latency statistic assumes that the row was already open. It's a measure of the sense amplifiers and not of the actual data.

eqvinox · on June 11, 2021

It's vaguely similar, but there's a huge difference in that flash needs to be erased before you can write it again, and that operation is much slower and only possible on much larger sizes. DDR4 doesn't care, you can always write, just the read is destructive and needs to be followed by a write.

I think this makes the comparison unhelpful since the characteristics are still very different.

Dylan16807 · on June 11, 2021

The difference is that on DDR you have infinite write endurance and you can do the whole thing in parallel.

If flash was the same way, and it could rewrite an entire erase block with no consequences, then you could ignore erase blocks. But it's nowhere near that level, so the performance impact is very large.

dragontamer · on June 11, 2021

That's a good point.

There are only 10,000 erase cycles per Flash cell. So a lot of algorithms are about minimizing those erases.

zymhan · on June 11, 2021

What does DDR have to do with NVMe?

nine_k · on June 11, 2021

You can't write a byte, or a word, either.

The "fact" that you can do it in your program without disturbing bytes around it is a convenient fiction that the hardware fabricates for you.

dragontamer · on June 11, 2021

DDR4 is effectively a block device and not 'random access'.

Pretty much only cache is RAM proper these days (aka: all locations have equal access time... that is, you can access it randomly with little performance loss).

bertday · on June 11, 2021

I’m confused. What’s the difference between a cache line and a row in RAM? They’re both multiples of bytes. You have data sharing per chunk in either case.

The distinction seems to be how big the chunk is not uniformity of access time (is a symmetrical read disk not a block device?)

dragontamer · on June 11, 2021

Hard disk chunks are 512 bytes classically, and smaller than the DDR4 row of 1024 bytes !!

So yes. DDR4 has surprising similarities to a 512byte sector hard drive (modern hard drives have 4k blocks)

>> What’s the difference between a cache line and a row in RAM?

Well DDR4 doesn't have a cache line. It has a burst length of 8, so the smallest data transfer is 64 bytes. This happens to coincide with L1 cache lines.

The row is 1024 bytes long. Its pretty much the L1 cache on the other side, so to speak. When your CPU talks to DDR4, it needs to load a row (RAS all 1024 bytes) before it can CAS read a 64 byte burst length 8 chunk.

-----------

DDR4, hard drives, and Flash are all block devices.

The main issue for Flash technologies, is that the erase size is even larger than the read/write block size. That's why we TRIM for NVMe devices.

bertday · on June 12, 2021

Thanks, I see what you mean at the interface level.

In terms of performance analogy though, hard drives do not get random access to blocks, but RAM does. The practical block size of hard drives is sequential reads of 100kiB+ due to seeks.

Dzugaru · on June 11, 2021

Of course it does [0]. It's just it assigns writes as evenly as possible (to have as even wear as possible), so log-like internal "file system" is a way to go.

https://pages.cs.wisc.edu/~remzi/OSTEP/file-ssd.pdf

notaplumber · on June 2, 2021

> I modified the OpenBSD kernel for the raspberry pi with gaming related extensions, in such a way that when the game runs there is no latency and no stuttering.

What does that even mean? What "gaming related extensions"?-- could you elaborate?

OpenBSD doesn't have any accelerated drivers for the GPU on the RPi, so I'm curious what your "no latency/no stuttering" modifications would even be, beyond perhaps maybe recompiling the kernel with HZ=1000 or something else..

alehyze · on June 2, 2021

Without going into the details: I have read the linux drivers and understood how HDMI audio and hardware sprites (planes) work amongst other details. Then implemented that myself in C and assembly with no OS. Then modified the OpenBSD kernel in a way that it can use that stuff and do even more sophisticated stuff. This is not just recompiling the kernel with some different options. It's writing substantial code. I modified 6.8 because 6.9 was not even out when I started, so it's been a couple of months of very hard work on the kernel only, not counting all the time I spent on it before when I was working bare metal. And it's not finished yet :-)

notaplumber · on June 2, 2021

This is HN, I'm sure people would appreciate hearing the details. Like how does this integrate with OpenBSD's existing audio/graphics stacks, do you plan to upstream your work? With it being OpenBSD, are there security implications?

dr_zoidberg · on June 2, 2021

Well, he did say he wants to turn that into a product, so I understand he won't release details as they are a form of trade secret for his idea.

alehyze · on June 2, 2021

I just released the "big picture". I am sure that the OpenBSD kernel devs can redo what I did ten times quicker (in terms of development time) and a million times better (in terms of clean integration with the kernel and security) if they want to. They know all of the details of the kernel :-)

That said yes, I would like to be able to turn this into a product.

SLWW · on June 2, 2021

I'm interested in this; though i struggle to understand how you could turn something like this into a product.

There's a lot of logistics in selling this type of thing. Also the BSD license isn't exactly easy to sell something alongside of.

Have you considered open-sourcing and doing the ol' "if you like my work, buy me a coffee"

I know it's extremely communist of me, but I just can't imagine you not stepping on a lot of toes (both legally and otherwise) because of the spirit that OpenBSD was written in.

ac29 · on June 2, 2021

> BSD license isn't exactly easy to sell something alongside of

Why's that? There's no reason you cant have a closed-source commercial fork of BSD code. The Playstation 4 OS is a fork of FreeBSD: https://en.wikipedia.org/wiki/PlayStation_4_system_software

alehyze · on June 2, 2021

I haven't thought this through I have to admit.

I thought that the BSD license was friendly to this kind of stuff.

If I step on people's toes I would ask for forgiveness and try to give them something back, which I think is just the right thing to do.

For example, I read that OpenBSD developers were sad that most of the contributions they got was from individuals more than from companies, when they actually help companies a lot.

Maybe this could be a way to change that situation?

At the end of the day I want to be happy with myself. I don't want to make enemies. Possibly, I would love to keep working on this for a living as I enjoy it more than everything else I did in the past.

This is not a product yet. If it's going to be, it's going to be a very long and hard way before that happens. Why is everyone so concerned about money at this stage?

Personally, I thought that this was a cool thing that some people would like. Maybe I could make a simple living out of it.

Most people comments seem to me like they are assuming that this is going to be a huge success that could make money and are worried that the other devs would be left out with nothing.

Isn't this a bit of a prejudice? And isn't it premature to assume it's going to be a success?

alehyze · on June 2, 2021

Regarding the "open source, if you like it buy me a coffee" thing, I am not against that. By saying "I'd like to turn this into a product", I don't rule that out. I already mentioned that in another comment.

To me at this stage it's more about making a living out of something I love doing rather than becoming the next big tycoon.

jmiskovic · on June 3, 2021

You would get something like a cup of coffee per month. The ecosystem is not really ripe for donation-sustained open source development. GitHub Sponsors feature works a bit better than one time donations. I also like the idea of feature/bug bounties, but I don't know how well they work in practice.

I think the best model if you want to both share the code and make money is to offer commercial version with newest features, and open source a version that's a bit behind. As you publish a new version, you open-source the previous commercial version. This is more of a "source available" model because it actively discourages any community code contributions.

It is still not clear to me how to receive money as founder and maintainer on actual open source project, without resentments from contributors that don't get paid. Doesn't seem fair and has potential to turn into big drama.

alehyze · on June 3, 2021

This makes a lot of sense. Thank you :-)

Those are complicated matters.

I just have no idea how to do this properly at the moment. I am just hearing people's opinions for now and I'll try and do something that people are generally happy with. On the other hand, it's also true that one cannot please everyone.

To me it's like: "I have done something that I think is cool, let's see what people think about it".

I mentioned that I wanted to do a product out of this and people got emotional, as if it was granted that it's going to be a huge success.

I really wish it was that easy. I think that this is not realistic. It would take a lot of work from a lot of people who need to be paid because they need to pay their own bills to make this into a success. And it still could fail.

I regret having mentioned that I may want to turn this into a product, because it put too much focus on that rather than on the tech that I have developed.

alehyze · on June 2, 2021

What it does at the moment is a bit hacky, in that it does not integrate with the existing graphics stack in the way people could think.

I haven't written a driver for the pi graphics card or for HDMI audio. What I do with graphics is to save the existing state before beginning the game, do whatever I want during the game and restoring the state when it ends. With audio I am not doing it that cleanly as I believe that OpenBSD does not currently even touch HDMI audio registers.

As for the reduction of stuttering, when the game begins I stop 3 of the four cores, assign them entirely to the game ( also with new interrupt vector tables both in EL0 and EL1 ) and when the game exits give them back to OpenBSD. That way, while the game is running without interruption, the single core that is left to OpenBSD is free to run admin tasks. Since the game has a process that can be scheduled by that single core, the game can do networking or file I/O using OpenBSD, because the different cores have the same entries in the user space MMU tables and so they share memory and can talk to each other. OpenBSD cannot interrupt the game, it can only kill it if needed.

Regarding upstreaming my work the answer is "Probably not. But if the world wants it open sourced I am not against it and could think about doing a fork (Say we call it OpenBSD4Games). Or I could just give some help with the raspberry pi drivers."

The reason for that is that I am doing quite some stuff that am pretty sure the OpenBSD devs would not be happy to put in. OpenBSD is strongly focused on security. I am pretty sure they would not want to have some code in there that hijacks 3 cores out of 4 and gives them to a user process. Another example: I have been told that giving several contiguous memory pages to a user task is something that should not be done in OpenBSD. I understand why but then I give a game quite a few contiguous memory pages that the hardware will use for the frame buffer and so has to be contiguous.

Also, this is prototype code. It works for me, it still needs a ton of work and might not even be 100% correct. I cannot possibly understand all of the little details in the kernel in 2 months.

I am proud of this achievement though and would love to be able to turn this into a real product that many people enjoy.

Regarding the "home computer" part, I thought about game devs. When doing a game, it's not only the game code that is important. The tools are very important as well. So I built GNUstep ( on OpenBSD it does not currently work on arm64 ) so that it's super easy to build tools for game development in Objective-C.

Also I thought about playing around and experimenting, and I am planning a GUI app with GNUstep that integrates a C/C++ interpreter called cling, which is developed by CERN for their physics simulations if I am not wrong. With that, a game developer will be able to experiment with code and tweak it in a way that is similar to what Xcode playgrounds does with swift on the Mac.

alehyze · on June 2, 2021

By the way, here is a video demonstration of all this. It's not very polished and VSYNC does not work, but it works pretty much how I describe here.

https://youtu.be/KhRnGbVhejk

MonkeyClub · on June 3, 2021

Kudos to you, that's very interesting!

Could you share any resources you used to understand, navigate, and extend the OpenBSD kernel?

There doesn't seem to be much in the way of resources available on that -- nothing like the NetBSD Internals guide, not to mention the wealth of information on the Linux kernel.

A blog post on that would probably be widely appreciated as well, if you have the time :)

alehyze · on June 3, 2021

Thanks :-)

The article that got me started is the pdf taken from a conference called "OpenBSD Kernel Internals: The Hitchhiker's Guide" mentioned here: https://www.openbsd.org/events.html

I just checked the link and the site seems to be down for me now, so I refer you to that page. Hopefully it will come back up or the link will be updated.

In that article it describes how to add a syscall to openbsd. So you start by writing a very simple one just to see that it works and get comfortable rebuilding the kernel and testing your syscall with a userspace program.

Then, since I roughly knew what I wanted to achieve, I started looking at kernel code and finding the parts relevant to what I wanted to do.

Then take baby steps with your changes, try and fail until you succeed with a small bit before moving on to the next. Especially value failures, as they give you hints about what is going on. Think why it could have failed and try a different approach that should not fail in the same way. Very tedious and time consuming, but a very good way to learn when you don't have the books.

Then looked for hints on the web and talked to the awesome people on the mailing lists and on the reddit OpenBSD channel.

I did not talk too much to people though, I asked a few questions at the beginning and then had a short conversation about allocating contiguous physical memory to a userspace task, which is something that OpenBSD does not do. I figured out how to do that by myself, by trial and error. Same with other details.

My mindset was: I know how to do this with bare metal, there must be a way to connect to the kernel code and do the same in a way that does not break things too much if I keep it limited in scope.

I hope this helps :-)

MonkeyClub · on June 4, 2021

Yep, this is quite nice and helpful, many thanks!

alehyze · on June 4, 2021

notaplumber · on May 26, 2021

> "OpenBSD users: Hi Mark!"

This is Mark Kettenis, who has despite comments made jokingly by marcan, been working with a few other OpenBSD developers to bring-up OpenBSD/arm64 on the Apple M1. At least on the Mac Mini the Gigabit Ethernet works, Broadcom Wi-Fi, and work on the internal NVMe storage is progressing.

There was an early teaser dmesg posted in Feburary showing OpenBSD booting multi-user (on bare metal): https://marc.info/?l=openbsd-arm&m=161386122115249&w=2

Mark has also been adding support for the M1 to the U-Boot project, which will not only benefit OpenBSD, but also Asahi Linux.

Another OpenBSD developer posted these screenshots and videos on Twitter.

https://twitter.com/bluerise/status/1359644736483655683

https://twitter.com/bluerise/status/1354216838406823936

notaplumber · on May 10, 2021

Why wouldn't it be true? It's not as if the source and images aren't available for you to look at for yourself.

As for why BSD isn't more popular than Linux, well, that's a much bigger question. It could come down to licensing, project goals (not winning popularity contests), but mostly decades of history and Linux appearing at the right place, at the right time. There is place for alternative operating systems, choice is important.

hawski · on May 10, 2021

I have some interest in DragonflyBSD and BSD ethic is close to me, but I'll say that Linux being GPL made it successful. It centralizes development and with every developer joining in it cements it even further. But right place and time above all, if it would start today the license would be a nonstarter for many.

toast0 · on May 10, 2021

> Linux being GPL made it successful. It centralizes development and with every developer joining in it cements it even further.

Centralization doesn't have much to do with the GPL. With a BSD system, you can take an upstream release and use it as the base of your system and not publish your changes. With a GPL system, you can do the same, but if you distribute your system to others, you have to publish your changes; but you don't have to work with the upstream system unless you want to; your changes might get pushed upstream by someone else, or used as inspiration by upstream, but that's not that common. If you wanted to work with upstream, you can do that with BSD as well.

It may be forgotten or not known by many, but there was a 1991 lawsuit from AT&T over code in BSDi and the code in question was in other BSD distributions at the time including FreeBSD until the 2.0 release in November 1994. Certainly Linux had its moment of legal uncertainty that turned out fine, but it came after it was already well established. BSD had a legal shadow at a much earlier time, and that may have driven some people away.

tored · on May 10, 2021

Linux is like PHP and BSD is like Haskell.

tharne · on May 10, 2021

This is an absurdly good analogy.

mikece · on May 10, 2021

If PHP supported containers and Haskell didn't -- though we all know Haskell's `jails` feature is absurdly better in all respects, and all white-coated CompSci PhDs know it.

gautamcgoel · on May 10, 2021

Can you elaborate please? I'm not quite grasping the analogy.

tored · on May 10, 2021

Linux/PHP

  * GPL license (until PHP4)
  * evolutionary
  * has warts
  * pragmatic
  * gets shit done
  * everyone uses it
  * lots tips online (good and bad)
  * you have crazy idea, too late, it has already been done

BSD/Haskell

  * BSD license 
  * designed
  * elegant
  * by the book
  * theoretically correct
  * nobody uses it
  * read the man pages
  * you shouldn't do that

probably more, maybe something about globals vs jails/monads. Many of these things stems from evolutionary vs designed.