
I guess most ORMs are like that. Is the object shared by reference across the entire runtime or do you end up with divergent objects?


It's only valid in the context of that function invocation. The docs say not to write impure functions or do anything with concurrency -- if you start two simultaneous transactions, the client isn't "smart" about it unfortunately.


I regularly worked from bed when I had an actual office to go to...variety is good.


I’m around to answer questions and discuss.

Is there anybody who has used Couchbase transactions, or sharded Mongo transactions, and can corroborate our analysis?


I like the presentation of ideas. Can you expand on use cases of Firebase’s nested document model? Seems powerful as a file system but not sure how that will play with the complexities of distributed applications.


Firebase was originally designed more as a realtime communication mechanism than an operational database. The idea was that clients would subscribe to different nodes in a data hierarchy to receive realtime notifications from other clients that were publishing to those nodes. Depending on what was in the client view, sometimes you wanted to subscribe to a leaf, sometimes to a subtree, sometimes to everything.

As these things tend to go, when there is a place to store arbitrary data, all kinds of things get shoved into it, so the mixed model in Firestore is a compromise between the original tree-of-nodes data model and a more conventional document data model.

My assumption is the Firestore-to-Spanner mapping creates subcollections as shared tables with foreign keys to the parent documents, but I don't actually know. However, that would match the mandatory 1-to-many-to-1-to-many data layout, and makes more sense than shoving all the dependent data into the document itself or creating multiple millions of SQL tables for millions of documents.


This is classic HN speak, but it seems like it should be easy to build a Raft- or Paxos-consensus-based sharded database. We have had plenty of Spanner-esque implementations since it came out. Would it be too hard to take something like TiDB and modify it to support a nested document model? Of course, testing and bringing it up to production quality is its own feat, but nonetheless I'm surprised such an interesting DB avenue gets so little attention.


As others have pointed out, measuring latency from an AWS Lambda function to a co-located single node in-memory non-durable key-value database (Redis), or to a co-located single AZ eventually consistent key-value database (DynamoDB), doesn't have anything to do with measuring client-observed latency from the browser to a globally distributed ACID-compliant document database.

A similar process co-located with a Fauna region can also normally perform simple reads in a few milliseconds.

Similarly, a browser client querying a lambda function multiple times from the other side of the world will also be quite slow, even if the lambda "thinks" its queries are fast because its database is right next to it.

It is not completely clear to me what else is going wrong, but the basic premise of the benchmark is invalid, and there are other errors in regard to Fauna. For example, index serializability only affects write throughput; it has nothing to do with read latency. The status page reports write latency, not read latency, etc.

For a more even-handed comparison of Fauna to DynamoDB, see this blog post: https://fauna.com/blog/comparing-fauna-and-dynamodb-pricing-...


I think we figured out at least one issue here.

Fauna is a temporal database, not a time series database. The code in the test that updates the score on every post after every read is creating new historical events every time it does that. These have to be read and skipped over during the read query, which will continually increase latency in proportion to the number of updates that have occurred.

By default, Fauna retains this data and makes it queryable (with transactional guarantees) for 30 days, unlike DynamoDB or Redis. Reducing the retention period would help a bit, but event garbage collection is not immediate so there will still be differences for heavily churned documents. Normally, having a few updates to a document or an index has no noticeable impact but in this case it appears to be swamping the other factors in the latency profile.

It is possible to manually remove the previous events in the update query; doing that should reduce the latency. Nevertheless, Fauna is not a time series database so this is a bit of an anti-pattern.
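
As a rough sketch (assuming the FQL v4 JavaScript driver; the collection name, document id, and field values below are hypothetical, not taken from the benchmark):

    // Sketch of both mitigations: shorter history retention, plus removing the
    // prior event as part of the update so reads don't skip over old history.
    const faunadb = require('faunadb');
    const q = faunadb.query;
    const client = new faunadb.Client({ secret: process.env.FAUNA_SECRET });

    async function run() {
      // 1. Shorten history retention on the collection so old events age out sooner.
      await client.query(q.Update(q.Collection('posts'), { history_days: 0 }));

      // 2. Drop the previous event in the same query that writes the new one.
      await client.query(
        q.Let(
          {
            ref: q.Ref(q.Collection('posts'), '1234567890'),
            prevTs: q.Select('ts', q.Get(q.Var('ref'))), // ts of the current (soon to be previous) event
          },
          q.Do(
            q.Update(q.Var('ref'), { data: { score: 42 } }),
            // Remove(ref, ts, action) deletes the historical event at that timestamp.
            // (Assumes the prior event was an 'update'; the first event is a 'create'.)
            q.Remove(q.Var('ref'), q.Var('prevTs'), 'update')
          )
        )
      );
    }

    run().catch(console.error);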


I have commented out this section and redeployed the code.

See the Fauna endpoint: https://71q1jyiise.execute-api.us-west-1.amazonaws.com/dev/f...

See the code: https://github.com/upstash/latency-comparison/blob/master/ne...


Did you delete all the extra events that have been created already?


If you mean histogram, yes I reset the histogram for Fauna.

If you mean deleting FaunaDB internal events, I do not know how to do that. Can you guide me?


AWS DynamoDB is multi-AZ and is strongly consistent (at least for a single key update).

I believe you are confusing it with the Dynamo database described in a paper Amazon published long ago. AWS DynamoDB has nothing to do with the eventually consistent design the paper describes. It was purely a marketing gimmick to call the new AWS service that.



Good article, thanks.

I'm currently reading The DynamoDB Book, and even its author acknowledges that DynamoDB is serverless more as a byproduct and shouldn't be used for that reason alone.

But as a product developer all these highly integrated AWS services that can be provisioned with CloudFormation are pretty convincing.

What's Fauna's IaC story?


I will be happy if someone from the Fauna team helps me improve my code. https://github.com/upstash/latency-comparison

Upstash is not non-durable. It is based on multitier storage (memory + EBS) implementing the Redis API. In a few weeks, I will add Upstash Premium, which replicates data to multiple zones, to the benchmark app.

In the blog post, I mentioned the qualities where Fauna is stronger than the others: https://blog.upstash.com/latency-comparison#why-is-faunadb-s...


Your own docs say that by default “writes are not guaranteed to be durable even if the client receives a success response”.


Upstash has two consistency modes: eventual consistency and strong consistency. Please see: https://docs.upstash.com/overall/consistency

In my code, the Upstash database was eventually consistent. Similarly, the index in FaunaDB was not serialized.

But neither of those should affect the latency numbers in the histogram, because those numbers are all read latencies.


That's an apples to oranges comparison though. Upstash couples durability with consistency/isolation. Regardless of configuration, FaunaDB and DynamoDB both always ensure durability of acknowledged writes with a fault tolerance of greater than 1 node failure. To compare them on equal footing, Upstash would need to be configured for strong consistency, at least according to the docs.


DynamoDB also guarantees your write is distributed to multiple data centers.


No it doesn't. The write is not durable in other datacenters when the client acknowledgement is returned. It is still possible to lose data.


It does. The three AZs they replicate across are just as good as anything else someone typically calls a "datacenter." Amazon itself operates retail out of one region and uses multi-AZ as a fault tolerance strategy.


It won’t lose data in the event of a data center failure. Each of the replicas is in a different AZ, and at least two of three have to durably write the data before the put succeeds.


An AZ is not a datacenter.


This is what I was referring to: "An Availability Zone (AZ) is one or more discrete data centers with redundant power, networking, and connectivity in an AWS Region. AZs give customers the ability to operate production applications and databases that are more highly available, fault tolerant, and scalable than would be possible from a single data center. All AZs in an AWS Region are interconnected with high-bandwidth, low-latency networking, over fully redundant, dedicated metro fiber providing high-throughput, low-latency networking between AZs. All traffic between AZs is encrypted. The network performance is sufficient to accomplish synchronous replication between AZs. AZs make partitioning applications for high availability easy. If an application is partitioned across AZs, companies are better isolated and protected from issues such as power outages, lightning strikes, tornadoes, earthquakes, and more. AZs are physically separated by a meaningful distance, many kilometers, from any other AZ, although all are within 100 km (60 miles) of each other."


While your points are valid, that you're CTO of Fauna is relevant information.


Not really. These are all facts, not opinions.


> For a more even-handed comparison of Fauna to DynamoDB, see this blog post: https://fauna.com/blog/comparing-fauna-and-dynamodb-pricing-...

An even-handed comparison from the authors of FaunaDB?


I found that comparison to be technically objective. Did you find some bias in it?


“Writes always go through a leader replica first; reads can come from any replica in eventually-consistent mode, or the leader replica in strongly consistent mode.”

This part isn’t correct. The two follower replicas can serve a consistent read even if the leader is behind / down. And there is no guarantee the primary even has the data persisted to disk when the subsequent read call is made.


If people really want to know how DynamoDB works, this is a good tech talk: https://www.youtube.com/watch?v=yvBR71D0nAQ


Dynamo strongly consistent transactions are still limited to a single region, and are eventually consistent outside of that region. For example, in Dynamo, it is not possible to enforce uniqueness via multi-region strongly consistent transactions. Fauna can do this.

The Dynamo transactions do increase latency, but not to the same degree as Fauna. However, they are not achieving the same level of transactional correctness either, or really any correctness at all in a multi-region context.
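
For illustration, here is a minimal sketch of how this is typically done in Fauna -- a serialized unique index over the field -- using the FQL v4 JavaScript driver; the collection, index, and field names are hypothetical:

    const faunadb = require('faunadb');
    const q = faunadb.query;
    const client = new faunadb.Client({ secret: process.env.FAUNA_SECRET });

    async function run() {
      // One-time setup: a serialized, unique index over data.email.
      await client.query(
        q.CreateIndex({
          name: 'users_by_email',
          source: q.Collection('users'),
          terms: [{ field: ['data', 'email'] }],
          unique: true,
          serialized: true,
        })
      );

      // Any write that would produce a duplicate email now fails transactionally,
      // regardless of which region the write lands in.
      await client.query(
        q.Create(q.Collection('users'), { data: { email: 'a@example.com' } })
      );
    }

    run().catch(console.error);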


I struggle to think of a scenario where cross-region transactions would be useful. Why not just pick a primary region where transactions will be carried out for a given piece of data if consistency guarantees are needed?


I'm a DBA, and I agree. Cross-region transactions rarely make sense because if one region is not reachable (down, network-partitioned for more than 1 minute), then you can't do writes to any region. Think about it. :)

I guess if your partition time interval was known to be very short, like a flaky ISDN link, it could make sense for some use cases using retries, but then you should just get a better link.

CockroachDB discusses a multi-city vehicle sharing use case where multi-region transactions could be worth consideration, but I'm skeptical:

https://www.cockroachlabs.com/docs/stable/multi-region-use-c...

(Developers and students get all excited about distributed systems, CAP, etc., but as a DBA I find that network partitions are largely not solvable from either a technical or a business standpoint. The solutions that do work include using vector clocks, or investing in a very reliable network, which is what Google is doing with dedicated fiber.)


You can indeed perform writes in other regions; this is the entire point of Calvin, Spanner, and other modern distributed transaction algorithms: maintaining consistency and maximizing availability in the face of partitions. Your perspective is about a decade out of date.


Yes, I know.

I was just interested if a ConsistentRead would change the latency.
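
For reference, this is a sketch of how the flag would be toggled on the read path (AWS SDK for JavaScript v3 shown; the table name and key are hypothetical, not taken from the benchmark code):

    const { DynamoDBClient } = require('@aws-sdk/client-dynamodb');
    const { DynamoDBDocumentClient, GetCommand } = require('@aws-sdk/lib-dynamodb');

    const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({ region: 'us-west-1' }));

    async function readPost(id, consistent) {
      // ConsistentRead: false (default) -> eventually consistent read
      // ConsistentRead: true            -> strongly consistent read
      const { Item } = await ddb.send(
        new GetCommand({ TableName: 'news', Key: { id }, ConsistentRead: consistent })
      );
      return Item;
    }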


It has to acquire and release locks, so yes.


Are you sure about that? DynamoDB is paxos based, so that seems unnecessary.

Given data is always replicated to at least 2 of 3 storage nodes before ACK’ing, you can always just read from 2 different replicas and be sure you have the latest data.


Oh, I thought you were referring to the multi-key consistent transactions, not single key strong consistency mode. I think you are correct.


We get people asking about it because they are both serverless -- it's not like there are a lot of serverless databases on the market.

Dynamo is indeed closer and there is a comparison here: https://fauna.com/blog/comparing-fauna-and-dynamodb-pricing-...


It is primarily an architectural and operational comparison.


Fauna has no native syntax; instead, it has language-specific DSLs. This helps with type safety and query composability, but it does indeed lead to awkward syntactical situations when the host language and Fauna semantics don't quite align.

The snippet in the grandparent post is JavaScript; is there an ORM or other library that has an example of an idiomatic JS or Typescript syntax that either of you would prefer?


I've really enjoyed writing queries in RethinkDB's [ReQL language](https://rethinkdb.com/docs/introduction-to-reql/):

> 'r.table("users").filter(u => u("email").match(".*@gmail.com$")).run(conn)'


Knex syntax (http://knexjs.org/) is reasonable. It's basically method chaining.
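
For example (a hypothetical query; the table and column names are made up):

    const knex = require('knex')({
      client: 'pg',
      connection: process.env.DATABASE_URL,
    });

    // Builds and runs roughly: select "id", "email" from "users"
    // where "email" like '%@gmail.com' order by "id" limit 10
    knex('users')
      .select('id', 'email')
      .where('email', 'like', '%@gmail.com')
      .orderBy('id')
      .limit(10)
      .then((rows) => console.log(rows));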


MongoDB is so well integrated with JavaScript and JSON. db.collection().find({}) is just so fluent.

I find all those capitalized functions in Fauna weird. In most C-based languages, like JavaScript, functions are lowercase.

Would it be possible for the Fauna JS wrapper to also accept lowercase functions?


It comes from the minimum ACU requirement per isolated cluster.

Even if scale to zero is eventually supported, some minimum capacity will still be required to avoid cold start latency.


We are in agreement: the difference in experience as you move between denormalized key/value-style modeling and normalized relational modeling is the core of the post. DynamoDB has added relational-like features, but using them in a traditional relational way goes against its architectural grain.

Is it necessary that data modeling flexibility must decline as an application matures and scales, though? This was one of the larger millstones around our necks at Twitter, and it is what we are building Fauna to avoid.


> Is it necessary that data modeling flexibility must decline as an application matures and scales, though?

Yeah, this is a valuable question, and I agree it's not an obvious answer. The tricky part is that at enough scale, humans are really the bottleneck. Teams step on each other's toes, abuse schemas, add data to a schema "because", etc., etc. It's possible that to best manage the humans, denormalization and relying on replicated data stores with slightly different views of the data is simplest.

And again, if "overly flexible" is a long-term product requirement, I'd argue you're going to eventually need full text search with all the power of Lucene (I'm betting it's on your roadmap).

If Fauna perfectly addresses this problem domain, it's likely quite helpful, but this article did not convince me it'll always be cheaper / better than DynamoDB + ElasticSearch for the complex use cases. That said, I look forward to the day I'm proved wrong :)


Easy searching is definitely on our roadmap, but for people who might pass by, I did want to point out that you can already get some form of text search due to the way we index arrays. You can easily write a sort of 'inverted index' yourself if you place ngrams in an array within a document. The concept of bindings makes this particularly easy. This is option 2 as explained here: https://stackoverflow.com/questions/62109035/how-to-get-docu...

We realize it's not the perfect solution and doesn't deliver the best developer experience at this point, though :)
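
As a rough illustration of the array-index approach described above (a minimal sketch using the FQL v4 JavaScript driver; the collection, index, and trigram helper are hypothetical, and the ngrams are computed client-side here rather than via an index binding):

    const faunadb = require('faunadb');
    const q = faunadb.query;
    const client = new faunadb.Client({ secret: process.env.FAUNA_SECRET });

    // "fauna" -> ["fau", "aun", "una"]
    const trigrams = (s) =>
      Array.from({ length: Math.max(s.length - 2, 0) }, (_, i) => s.slice(i, i + 3));

    async function run() {
      // One-time setup: index over the array field. Fauna indexes each array
      // element individually, so this behaves like an inverted index.
      await client.query(
        q.CreateIndex({
          name: 'posts_by_ngram',
          source: q.Collection('posts'),
          terms: [{ field: ['data', 'ngrams'] }],
        })
      );

      // Store the ngrams alongside the document at write time.
      await client.query(
        q.Create(q.Collection('posts'), {
          data: { title: 'hello fauna', ngrams: trigrams('hello fauna') },
        })
      );

      // "Search": any document whose ngram array contains the term matches.
      const matches = await client.query(
        q.Paginate(q.Match(q.Index('posts_by_ngram'), 'fau'))
      );
      console.log(matches);
    }

    run().catch(console.error);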

