
Yeah, this history is just wrong. What really happened is this:

Early 90s: SGI invented OpenGL to make realtime 3D graphics practical, initially for CAD/CAM and other scientific/engineering pursuits, and started shipping expensive workstations with 3d accelerated graphics. Some game artists used these workstations to prerender 3d graphics for game consoles. Note that 2D CAD/CAM accelerators had already been in market for nearly a decade, as had game consoles with varying degrees of 2D acceleration.

Mid-90s: Arcades and consoles started using SGI chips and/or chip designs to render 3d games in real time. 3DFx, founded by ex-SGI engineers, created the Voodoo accelerator to bring the technology down market to the PC for PC games, which was a rapidly growing market.

Late 90s: NVIDIA entered the already existing and growing market for OpenGL accelerators for 3D PC gaming. This was a fast-follow technical play. They competed with 3DFx on performance and won after 3DFx fell behind and made serious strategy mistakes.

Later 90s: NVIDIA created the “GPU” branding to draw attention to their addition of hardware transform and lighting (T&L) support, which 3DFx didn’t have. Really this was more of an incremental improvement in gaming capability.

Early 00s: NVIDIA nearly lost their lead to ATI with the switch to the shader model and DirectX 9, and had to redesign their architecture. ATI is now part of AMD and continues to compete with NVIDIA.

Mid 00s: NVIDIA released CUDA, which adapted shaders to general-purpose computation, completing the circle in a sense and making NVIDIA GPUs more useful for scientific work like the original SGI workstations. This later enabled the crypto boom and now generative AI.

Of course, along the way, OpenGL and GPUs have been used a lot for art, including art in games, but at no point did anybody say "hey, a lot of artists are trying to make 3D art, we should make graphics hardware for artists". Graphics hardware was made to render games faster with higher fidelity.


Author here - thank you for this. I definitely don't claim to be an expert on the history of 3d graphics, and you clearly know a lot more than me about the detailed history of NVIDIA.

That said, starting in the early 1990s is missing the whole first half of the story, no? Searching Google Books with a 1980-1990 date range for things like "3d graphics" "art" or "3d graphics" "special effects" yields a lot of primary sources that indicate that creative applications were driving demand for chips and workstations that focused on graphics. For instance this is from a trade journal for TV producers in 1987: "Perhaps the greatest dilemma facing the industrial producer today is what to do about digital graphics... because special effects, 2d painting, and 3d animation all rely on basically the same kind of hardware, it should be possible to design a 'graphics computer' that can handle several different kinds of functions." [https://www.google.com/books/edition/E_ITV/0JRYAAAAYAAJ?hl=e...]

It's not hard to find more examples like this from the 1985-1989 period.


The idea didn't spring fully formed from SGI. It was a natural extension of 2D graphics accelerators which were initially used for engineering (high value, small market) and later for business applications generally and games (lower value, large markets). 3D acceleration took the exact same path, but the utility for gaming was much higher than the general business utility.

Of course graphics hardware was also used for more creative purposes including desktop publishing, special effects for TV, and digital art, so you will find some people in those communities vaguely wishing for something better, but artistic creation, even for commercial purpose, was never the market driver of 3D acceleration. Games were. The hardware was designed for gamers first, game programmers second, game artists a distant third, and for nobody else.

The closest thing to an "art computer" around that time was the Amiga which targeted the design/audio/video production markets.


Hi Ben,

It was mostly gamers. As a gamer from that time, I can tell you the hardware was marketed to gamers, hard. I don't doubt that artists had an impact, but the world had many, many more gamers than artists, and gamers spend money for the best/mostest/etc.

I mainly know this from living through the CGA/EGA/VGA/SVGA/3D add-on card/3D era.

Thank you for taking the time to delve into this. While I may not agree with your conclusions, I respect your work, and the effort put in. :)


I think we agree, just define terms differently -- video games are art! In other words, gamers are consumers of artwork, and that consumer demand for a new kind of art drove demand for the hardware to go with it. (Naturally that wasn't the only source of demand - engineering and research applications were there from the beginning too).

Edit: this discussion is interesting because I have always just taken it for granted that video games are a form of art. Clearly others don't see it that way, which is fair! Nevertheless, I think a strong case can be made: https://en.wikipedia.org/wiki/Video_games_as_an_art_form


Games are a medium for artistic expression but saying that 3D hardware was designed to improve art production, or that NVIDIA was first in market, is incorrect. The hardware was designed to improve the consumption experience of something that is a mix of programming, game mechanics (which are both math and psychology), and potentially various art forms including visual, music, and narrative. It all needs to add up to fun or it won’t find much of an audience.

Gamers aren’t primarily spending time or money for the art and neither was NVIDIA. I will grant that the hardware improvements did make the visual aspects more lifelike and detailed and that allowed for increased artistic range, but production costs generally increased accordingly.


I know people for whom the traditional way of building a web app is completely foreign. I am curious how you would describe the concept and tools to someone who has never encountered them before outside an SPA architecture.


Curious what makes a cache better to you.


It would be faster and I still get to keep microservices from accessing the data they don't need or shouldn't access.

But I'd rather use RPC for communication.


I mean, yeah, this is why people stopped using this pattern. But these problems are getting solved, especially in Fauna (see the sketch after this list):

1. Schemaless/document/schema-on-need databases like Fauna don't force application breakage on every schema change the way SQL does

2. It's hard to reason about if it's not transparent, but it can be transparent now; see below

3. Fauna is a temporal database, which acts like version control on your stored procedures, so you can easily check and revert any change

4. Fauna is serverless and horizontally scalable without consistency/latency impact

5. This was definitely a problem when you were occupying precious CPU cores on a vertically scaled RDBMS with business logic, but compute in Fauna or in serverless lambdas scales horizontally indefinitely
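
To make this concrete, here's a rough sketch of what a stored procedure (a UDF) looks like with the classic faunadb JavaScript driver (FQL v4). The "transfer" function and the "accounts" collection are made-up names for illustration, not anything built in:

    import faunadb, { query as q } from "faunadb";

    const client = new faunadb.Client({ secret: process.env.FAUNA_SECRET! });

    async function main() {
      // Define a UDF ("stored procedure") that moves a balance between two
      // documents; a single client.query call runs as one transaction.
      await client.query(
        q.CreateFunction({
          name: "transfer",
          body: q.Query(
            q.Lambda(
              ["fromRef", "toRef", "amount"],
              q.Do(
                q.Update(q.Var("fromRef"), {
                  data: {
                    balance: q.Subtract(
                      q.Select(["data", "balance"], q.Get(q.Var("fromRef"))),
                      q.Var("amount")
                    ),
                  },
                }),
                q.Update(q.Var("toRef"), {
                  data: {
                    balance: q.Add(
                      q.Select(["data", "balance"], q.Get(q.Var("toRef"))),
                      q.Var("amount")
                    ),
                  },
                })
              )
            )
          ),
        })
      );

      // Any client (or another UDF) calls it like a normal query:
      await client.query(
        q.Call(q.Function("transfer"), [
          q.Ref(q.Collection("accounts"), "1"),
          q.Ref(q.Collection("accounts"), "2"),
          100,
        ])
      );
    }

    main().catch(console.error);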


Stored procedures and the integration database have come back for our users in a big way. It would be great to hear examples of how others are applying this pattern with other databases and APIs.


My favorite thing about this pattern is that it allows developers to build globally distributed applications while reasoning about them like a single, centralized monolith.

A lot of complicated practices that we’ve adopted to overcome the challenges of distributed systems just disappear when you have fast, strongly consistent transactional writes on a single data store. Either a thing happened or it didn’t. No need to wait around and see.

This matters even more as applications move from single-region failover to multi-region to edge computing with hundreds of locations of compute. How do you get consistent state out to each PoP?

You don’t, you integrate through the database.


Since you're the one making the claim that they're making a comeback, I'd love to hear your own personal story or some of the stories you've heard.

Personally I haven't noticed anything resembling a comeback, but I'm certainly using them more than ever... and I'm loving it.


"There are three kinds of lies: lies, damned lies, and benchmarks." - Mark Twain

What other methods does the community use for measuring distributed latency?


"We investigate the issue of coordinate stability over time and show that coordinates drift away from their initial values with time, so that 25% of node coordinates become inaccurate by more than 33 ms after one week. However, daily re-computations make 75% of the coordinates stay within 6 ms of their initial values."

That's the intro from a 2007 paper from Google:

https://static.googleusercontent.com/media/research.google.c...


After all, facts are facts, and although we may quote one to another with a chuckle the words of the Wise Statesman, 'Lies--damned lies--and statistics,' still there are some easy figures the simplest must understand, and the astutest cannot wriggle out of.

Leonard Henry Courtney, 1895


GemStone/GemFire use a transactional protocol akin to Tuxedo. Open a bunch of locks, write a bunch of updates, release the locks. As per the docs (https://gemfire82.docs.pivotal.io/docs-gemfire/latest/develo...) this does not offer isolation or even atomicity, so it doesn't give you the C in CAP at all.

These are exactly the kind of "transactions" you get when you try to implement everything at the application level rather than the database level. Couchbase transactions (in the article) are the same. And it's not that different from Vitess cross-shard transactions either, which are not isolated (https://vitess.io/docs/reference/features/two-phase-commit/). Tandem SQL used the same scheme as well I believe.

Prior to Spanner, there were no production databases that offered ACID transactions across distributed, disjoint shards.
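
To make the failure mode concrete, here's a minimal TypeScript sketch of that lock/write/unlock pattern against a purely hypothetical shard interface (not any of these products' actual APIs):

    // Hypothetical shard client, for illustration only.
    interface Shard {
      lock(key: string): Promise<void>;
      read(key: string): Promise<number>;
      write(key: string, value: number): Promise<void>; // durable immediately
      unlock(key: string): Promise<void>;
    }

    // "Transaction" in the lock/write/unlock style: grab locks on every
    // shard involved, apply the writes one by one, release the locks.
    async function naiveTransfer(a: Shard, b: Shard, amount: number) {
      await a.lock("alice");
      await b.lock("bob");
      try {
        const alice = await a.read("alice");
        const bob = await b.read("bob");
        await a.write("alice", alice - amount);
        // If the coordinator dies here, shard A keeps the debit but shard B
        // never sees the credit: no atomicity. A reader that bypasses the
        // locks (or reads a replica) can observe the half-applied state:
        // no isolation. There is no prepare/commit record to recover from.
        await b.write("bob", bob + amount);
      } finally {
        await b.unlock("bob");
        await a.unlock("alice");
      }
    }

A real distributed commit needs at least a durable prepare/commit decision (two-phase commit or equivalent) plus a concurrency control scheme layered on top of it.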


I'm sorry, what does Gemfire have to do with Gemstone/S? That seems like a completely different software from a different vendor.

> Open a bunch of locks, write a bunch of updates, release the locks.

That's how transactional databases using two-phase locking generally work, isn't it?


I thought GemFire was directly derived from GemStone, via numerous acquisitions. If GemStone has a different transaction model I don't know it.

The point is, no distributed database with a naive two-phase lock is truly transactional.


Gemstone/S is a "classic" Smalltalk-based object database developed by GemTalk Systems. On the other hand, Gemfire, courtesy of VMWare, seems to be something completely different: https://gemfire.docs.pivotal.io/96/gemfire/getting_started/g...

I have no idea how you can draw similarities between the two. One of them is basically a Smalltalk VM with automatic object persistence. The other is some kind of real-time Java-based distributed key/value data platform.


As far as I can tell, GemStone/S doesn't offer any server-side partitioning, clustering, or replication. GemFire was developed to scale the GemStone/S patterns horizontally.

The GemStone transaction docs describe a scheme that would work properly on a single machine, but don't discuss anything about distributed coordination across servers or failure modes. The installation instructions don't discuss setting up a cluster. The marketing docs discuss using thousands of VMs (clients) and scaling the dataset to "hundreds of gigabytes" based on disk storage instead of memory which is not what I would expect from a distributed system. Various benchmarks and user comments refer to using a single server for GemStone.

I will update the post to clarify that we are discussing distributed document databases only. It's easy to do anything you want on a single machine.


But of course I didn't say anything about partitioned and replicated systems, only about the options for document databases. The article of course is about "transaction models in document databases" and the observation that "document databases are very convenient for modern application development, because modern application frameworks and languages are also based on semi-structured objects", which Gemstone/S's model fits nicely since it's based on the same assumption.


These are fair critiques, and we expect document databases to evolve towards schema support in the future. GraphQL is part of this trend.


If only there was some database that let you store flexibly structured documents but keep the data normalized. Perhaps you could even construct views and indexes to accelerate different access patterns.


If only. Can you imagine if we also had some form of normalization so complete that it could actually manage an arbitrary number of dimensions?

Can you tell me how many times that users email address changed in that document over the last 3 years by executing a simple query? What was the email after the 2nd time it changed?

In 6th normal form, such a thing is trivial to manage. Discipline is the only price to pay for admission.


Fauna is temporal, so yes. Normal forms not required.


Does Fauna support multiple temporal dimensions in the same query?


Can you give me an example of the kind of query you have in mind?


I think your original claim is where I would like to focus my argument:

> Normal forms not required.

3NF (approximately where document databases live) struggles to decouple the time domain from individual facts. Let me give you an example.

Assume a Customer document has a LastModifiedUtc property. Does this tell you when their email specifically changed? No. It just says "something" in the customer document was modified.

Now, you could say "how about we just add a property per thing I want to track?" Ok - so we now have Customer.EmailLastModifiedUtc, Customer.NameLastModifiedUtc, etc. This is pretty good, but now assume we also need to know what the previous email addresses and names were. How do we go about this? Ok - no big deal, let's just add another column that is some JSON array or whatever. Customer.PreviousEmailAddresses. Cool, so now we know when the email was last modified AND we know what every last variant of it was.

What is missing? Oh right. What about when each of the previous email addresses actually changed? Who changed it? From what IP address? Certainly, we could nest a collection of documents within our document to keep track of all of this, but I hope the point is starting to come across that there may be some value in exploring higher forms of normalization. Imagine if I wanted to determine all of the email addresses that were modified by a specific IP address (across ALL customers), but only over the last 25 days. I feel like this is entirely out of the scope of a document database.

Don't get me wrong. 3NF is extremely powerful and handles many problem domains with total ease. But, once you start talking about historization of specific fields and rates of change, you may need to consider something higher order.
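
For illustration, pulling that history out into its own narrow fact set looks roughly like this (hypothetical TypeScript types, not tied to any particular database):

    // One fact per email change, instead of widening the Customer document
    // with LastModifiedUtc-style properties and nested history arrays.
    interface EmailChange {
      customerId: string;
      email: string;
      changedAtUtc: Date;
      changedBy: string;
      sourceIp: string;
    }

    // "Every email address modified by this IP across ALL customers, but
    // only over the last 25 days" becomes a simple filter over one relation.
    function emailsChangedByIp(
      facts: EmailChange[],
      ip: string,
      now: Date = new Date()
    ): EmailChange[] {
      const cutoff = new Date(now.getTime() - 25 * 24 * 60 * 60 * 1000);
      return facts.filter((f) => f.sourceIp === ip && f.changedAtUtc >= cutoff);
    }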


This is possible in Fauna. All documents are actually collections of document versions within the configurable retention period. If you ensure that every writer decorates the document with the facets you want to search by (ip address, etc.) then you can construct indexes on those facets and query them temporally. They will return event records that show when the document entered the index (when that ip updated it) and left the index (when a different ip updated it).

Map the index additions and their timestamps onto the documents themselves and you can retrieve the entire state of each record that the ip wrote at the time that it wrote it. If you want to know specifically what that ip changed, then diff it with the previous record, for example, to filter down to updates that only changed the email address.
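
A rough sketch of what that could look like with the classic faunadb JavaScript driver (FQL v4); the "customers" collection, the index name, and the "sourceIp" field are assumptions made up for the example:

    import faunadb, { query as q } from "faunadb";

    const client = new faunadb.Client({ secret: process.env.FAUNA_SECRET! });

    async function main() {
      // One-time setup: index customer documents by the IP that last wrote
      // them, assuming writers decorate each document with data.sourceIp.
      await client.query(
        q.CreateIndex({
          name: "customers_by_source_ip",
          source: q.Collection("customers"),
          terms: [{ field: ["data", "sourceIp"] }],
        })
      );

      // Temporal query: the event history of documents entering and leaving
      // the index for this IP, i.e. when that IP became (and stopped being)
      // the most recent writer of each document.
      const events = await client.query(
        q.Paginate(
          q.Events(q.Match(q.Index("customers_by_source_ip"), "10.0.0.1"))
        )
      );

      // Read a document as of a specific point in time to see exactly what
      // that writer stored.
      const snapshot = await client.query(
        q.At(
          q.Time("2021-01-01T00:00:00Z"),
          q.Get(q.Ref(q.Collection("customers"), "1234"))
        )
      );

      console.log(events, snapshot);
    }

    main().catch(console.error);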


You can never have a system that's capable of all types of post hoc querying. If your model is general enough to handle all the things you want you won't have a performant system as it can't exploit any information about the problem. The only thing I can think of that's capable of all you describe is a Write Ahead Log without any compaction.


> If your model is general enough to handle all the things you want you won’t have a performant system as it can’t exploit any information about the problem

Define "performant". Denormalizing your domain model because you feel like the database might get slow is a strong case of premature optimization, unless you have actually tried to model it this way and have measured.

You will find that most modern SQL database systems have no problem querying databases with thousands or tens of thousands of tables. In fact, having narrow tables can dramatically improve your utilization of memory bandwidth since you aren't scanning over a bunch of bytes you will never use in the result set.


Sounds a lot like SQL databases with a JSON extension to me


I suspect that might indeed have been the joke.


We haven't seen much difference in practice between frequent aborts and frequent timeouts. Both are better than deadlocks.

I meant transactions issued from a server ("cloud") client to the database, as opposed to a mobile client.

