I've worked with Redshift for about five years and BigQuery for about a year. IMO BigQuery wins hands down.
From an operational perspective we've had almost zero issues with BQ, whereas Redshift constantly needed TLC. Right from creating users/schemas to WLM tuning, structuring files as Parquet for Spectrum access, understanding why and how Spectrum performs in different scenarios, etc., everything was a chore. All this Redshift-specific specialization I was learning wasn't really contributing to the product in a meaningful way.
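(For context, "structuring files as Parquet" meant pipelines roughly like the hypothetical sketch below, written with pandas/pyarrow; the bucket paths and the partition column are made up, and pyarrow/s3fs are assumed to be installed.)

```python
# Hypothetical sketch: convert a raw CSV extract into partitioned Parquet on S3
# so Redshift Spectrum can scan it column-wise. Bucket paths and the
# partition column ("event_date") are made up.
import pandas as pd

df = pd.read_csv("s3://my-raw-bucket/events/2022-01-01.csv")

# Spectrum prunes by partition directory, so write the dataset partitioned
# on the column the external table will usually filter by.
df.to_parquet(
    "s3://my-curated-bucket/events/",
    engine="pyarrow",
    partition_cols=["event_date"],
)
```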
Switched to BQ a year ago, and it's been mostly self-driving. The only things we had to tend to were slots, plus a bit of a learning curve for the org around partitioning keys (there is a setting in BQ that fails your query if a partition key is not specified).
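That setting is the table-level require_partition_filter option. A minimal sketch with the Python client, with made-up project/dataset/table names:

```python
# Hypothetical sketch: turn on "require partition filter" for a partitioned
# table so that queries omitting a partition predicate are rejected.
# Project/dataset/table names are made up.
from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("my-project.analytics.events")

table.require_partition_filter = True
client.update_table(table, ["require_partition_filter"])

# After this, something like
#   SELECT COUNT(*) FROM analytics.events
# fails unless it filters on the partition column, e.g.
#   WHERE DATE(event_ts) = "2023-01-01"
```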
Having switched to BQ, I find it really hard to imagine going back to Redshift. It almost feels antiquated.
I briefly worked on Redshift and have used Athena/Presto and BigQuery. Redshift felt like an architectural middle ground. Presto can query almost anything, and BigQuery requires storing data in BigQuery, but like Presto, you don't have to pay for idle compute. Redshift's scaling story is more complicated, and paying for idle compute wasn't ideal. It sounds like that might have improved, but at that point you're essentially rebuilding BigQuery. There might be some use cases that need a fast columnar store that's already online, so queries take 3s rather than the ~10s you might see with BigQuery.
I generally prefer BigQuery, and between it and Bigtable, I actually prefer GCP over AWS because their offerings for hard-to-do things are really good. I'd honestly pick GCP just for those two products.
I work on BigQuery. All of these are great points; I just wanted to point out that BigQuery can federate into external data sources as well, e.g. files on Cloud Storage and Bigtable. The relevant feature is BigLake: https://cloud.google.com/bigquery/docs/biglake-intro
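As a concrete picture of the Cloud Storage case, here is a minimal sketch of the plain external-table route with the Python client (BigLake layers governance on top of this); the bucket, dataset and table names are made up:

```python
# Hypothetical sketch: register Parquet files on Cloud Storage as a regular
# external table and query them in place. Bucket, dataset and table names
# are made up.
from google.cloud import bigquery

client = bigquery.Client()

external_config = bigquery.ExternalConfig("PARQUET")
external_config.source_uris = ["gs://my-bucket/events/*.parquet"]

table = bigquery.Table("my-project.analytics.events_external")
table.external_data_configuration = external_config
client.create_table(table)

# The external table is queryable like any other table:
rows = client.query(
    "SELECT COUNT(*) FROM `my-project.analytics.events_external`"
).result()
print(list(rows))
```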
Are there any performance benefits of BigLake over external tables stored in Parquet governed by Hive? Or is the main benefit the governance flexibility?
Currently the main benefit of BigLake over the existing external tables is governance: you get row- and column-level security over Cloud Storage data. The governance is uniformly enforced across BigQuery and the BigQuery Storage API. The Storage API can be used by any engine, and we have pre-built open source connectors available for Spark, Presto/Trino, Dataflow and TensorFlow.
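For example, the open source Spark connector reads through the Storage API; a minimal PySpark sketch, where the table name is made up and the connector package version is illustrative (it should match your Spark/Scala build):

```python
# Hypothetical sketch: read a BigQuery table from Spark through the open
# source spark-bigquery-connector, which goes through the Storage API.
# Table name is made up; the package version is illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("bq-storage-api-read")
    .config(
        "spark.jars.packages",
        "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.32.2",
    )
    .getOrCreate()
)

df = spark.read.format("bigquery").load("my-project.analytics.events")
df.groupBy("event_type").count().show()
```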
We're constantly working on improving BigQuery performance over open file formats on Cloud Storage. Some of these features will be specific to BigLake. Please stay tuned.
Our most shocking discovery on Redshift was that primary key constraints are not honored. (Not sure why they even allow PK declarations given this; it just adds more confusion.)
This is the case for most column-oriented data warehouses (including BigQuery, though Snowflake does allow one PK). It's just the nature of the technology.
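To make the Redshift behavior above concrete, here's a minimal sketch (connection parameters and the table are made up): the PK is accepted as a declaration but duplicates still insert without complaint.

```python
# Hypothetical sketch: Redshift accepts a PRIMARY KEY declaration (the planner
# treats it as a hint) but does not enforce it, so duplicate key values insert
# without error. Connection parameters and the table are made up.
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.example.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="admin", password="...",
)
cur = conn.cursor()

cur.execute("CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR(64))")
cur.execute("INSERT INTO users VALUES (1, 'alice')")
cur.execute("INSERT INTO users VALUES (1, 'bob')")  # same id, no error raised
conn.commit()

cur.execute("SELECT COUNT(*) FROM users WHERE id = 1")
print(cur.fetchone()[0])  # 2: both rows were accepted
```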
I worked on BQ, and I have to say it is a poor product: poor performance, and only a few customers use it. Let's not preach BQ... After switching to Snowflake, I can tell how much better Snowflake is in terms of performance.