I've worked with Redshift for about five years and BigQuery for about a year. IMO BigQuery wins hands down.
From an operational perspective we've had almost zero issues with BQ, whereas Redshift constantly needed TLC. Right from creating users/schemas to WLM tuning, structuring files as Parquet for Spectrum access, understanding why and how Spectrum performs in different scenarios, etc., everything was a chore. All this Redshift-specific specialization I was learning wasn't really contributing to the product in a meaningful way.
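(For context, "structuring files as Parquet" meant pipelines roughly like the hypothetical sketch below, written with pandas/pyarrow; the bucket paths and the partition column are made up, and pyarrow/s3fs are assumed to be installed.)

```python
# Hypothetical sketch: convert a raw CSV extract into partitioned Parquet on S3
# so Redshift Spectrum can scan it column-wise. Bucket paths and the
# partition column ("event_date") are made up.
import pandas as pd

df = pd.read_csv("s3://my-raw-bucket/events/2022-01-01.csv")

# Spectrum prunes by partition directory, so write the dataset partitioned
# on the column the external table will usually filter by.
df.to_parquet(
    "s3://my-curated-bucket/events/",
    engine="pyarrow",
    partition_cols=["event_date"],
)
```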
Switched to BQ a year ago, and it's been mostly self-driving. The only things we had to tend to were slots, plus a bit of a learning curve for the org around partitioning keys (there is a setting in BQ that fails your query if a partition key is not specified).
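That setting is the table-level require_partition_filter option. A minimal sketch with the Python client, with made-up project/dataset/table names:

```python
# Hypothetical sketch: turn on "require partition filter" for a partitioned
# table so that queries omitting a partition predicate are rejected.
# Project/dataset/table names are made up.
from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("my-project.analytics.events")

table.require_partition_filter = True
client.update_table(table, ["require_partition_filter"])

# After this, something like
#   SELECT COUNT(*) FROM analytics.events
# fails unless it filters on the partition column, e.g.
#   WHERE DATE(event_ts) = "2023-01-01"
```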
Having switched to BQ, I find it really hard to imagine going back to Redshift. It almost feels antiquated.
I briefly worked on Redshift and have used Athena/Presto and BigQuery. Redshift felt like an architectural middle ground. Presto can query almost anything, and BigQuery requires storing data in BigQuery, but like Presto, you don't have to pay for idle compute. Redshift's scaling story is more complicated, and paying for idle compute wasn't ideal. It sounds like that might have improved, but at that point you're essentially rebuilding BigQuery. There might be some use cases that need a fast columnar store that's already online, so queries take 3s rather than the ~10s you might see with BigQuery.
I generally prefer BigQuery, and between it and Bigtable, I actually prefer GCP over AWS because their offerings for hard-to-do things are really good. I'd honestly pick GCP just for those two products.
I work on BigQuery. All of these are great points; I just wanted to point out that BigQuery can federate into external data sources as well, e.g. files on Cloud Storage and Bigtable. The relevant feature is BigLake: https://cloud.google.com/bigquery/docs/biglake-intro
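As a concrete picture of the Cloud Storage case, here is a minimal sketch of the plain external-table route with the Python client (BigLake layers governance on top of this); the bucket, dataset and table names are made up:

```python
# Hypothetical sketch: register Parquet files on Cloud Storage as a regular
# external table and query them in place. Bucket, dataset and table names
# are made up.
from google.cloud import bigquery

client = bigquery.Client()

external_config = bigquery.ExternalConfig("PARQUET")
external_config.source_uris = ["gs://my-bucket/events/*.parquet"]

table = bigquery.Table("my-project.analytics.events_external")
table.external_data_configuration = external_config
client.create_table(table)

# The external table is queryable like any other table:
rows = client.query(
    "SELECT COUNT(*) FROM `my-project.analytics.events_external`"
).result()
print(list(rows))
```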
Are there any performance benefits of BigLake over external tables stored in Parquet governed by Hive? Or is the main benefit the governance flexibility?
Currently the main benefit of BigLake over the existing external tables is governance: you get row- and column-level security over Cloud Storage data. The governance is uniformly enforced across BigQuery and the BigQuery Storage API. The Storage API can be used by any engine, and we have pre-built open source connectors available for Spark, Presto/Trino, Dataflow and TensorFlow.
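For example, the open source Spark connector reads through the Storage API; a minimal PySpark sketch, where the table name is made up and the connector package version is illustrative (it should match your Spark/Scala build):

```python
# Hypothetical sketch: read a BigQuery table from Spark through the open
# source spark-bigquery-connector, which goes through the Storage API.
# Table name is made up; the package version is illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("bq-storage-api-read")
    .config(
        "spark.jars.packages",
        "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.32.2",
    )
    .getOrCreate()
)

df = spark.read.format("bigquery").load("my-project.analytics.events")
df.groupBy("event_type").count().show()
```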
We're constantly working on improving BigQuery performance over open file formats on Cloud Storage. Some of these features will be specific to BigLake. Please stay tuned.
Our most shocking discovery on Redshift was that primary key constraints are not honored. (Not sure why they even allow PK declarations given this; it just adds more confusion.)
This is the case for most column-oriented data warehouses (including BigQuery, though Snowflake does allow one PK). It's just the nature of the technology.
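To make the Redshift behavior above concrete, here's a minimal sketch (connection parameters and the table are made up): the PK is accepted as a declaration but duplicates still insert without complaint.

```python
# Hypothetical sketch: Redshift accepts a PRIMARY KEY declaration (the planner
# treats it as a hint) but does not enforce it, so duplicate key values insert
# without error. Connection parameters and the table are made up.
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.example.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="admin", password="...",
)
cur = conn.cursor()

cur.execute("CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR(64))")
cur.execute("INSERT INTO users VALUES (1, 'alice')")
cur.execute("INSERT INTO users VALUES (1, 'bob')")  # same id, no error raised
conn.commit()

cur.execute("SELECT COUNT(*) FROM users WHERE id = 1")
print(cur.fetchone()[0])  # 2: both rows were accepted
```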
I worked on BQ, and I have to say it is a poor product: poor performance, and only a few customers use it. Let's not preach BQ... After switching to Snowflake, I can tell how much better Snowflake is in terms of performance.