
Huh, I wonder why this isn't a cost trap. S3 API requests are relatively expensive.


The bandwidth is free if you are in the same region.


But you pay for requests.


A normal search experience (displaying a search page with 20 hits) requires num segments * (1 + num terms * 2) + 20 GET requests.

We have 180 segments for our commoncrawl index, so a generous upper bound is 1,000 GET requests per query.

GET requests add about $0.0004 per commoncrawl search query. Storage costs us $5 per day, so GET request costs start to exceed storage costs at more than 10k queries per day.
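For illustration, here is a back-of-the-envelope sketch of that arithmetic (the segment count, hit count, and $5/day storage figure are the numbers quoted above; the GET price assumes S3's standard $0.0004 per 1,000 requests, and the 2-term query is a hypothetical example):

    # Rough per-query S3 GET cost using the request formula above.
    NUM_SEGMENTS = 180                 # segments in the commoncrawl index
    NUM_TERMS = 2                      # hypothetical query with 2 terms
    HITS_PER_PAGE = 20
    GET_PRICE_PER_REQUEST = 0.0004 / 1000   # assumed S3 standard GET pricing

    requests_per_query = NUM_SEGMENTS * (1 + NUM_TERMS * 2) + HITS_PER_PAGE
    cost_per_query = requests_per_query * GET_PRICE_PER_REQUEST

    STORAGE_COST_PER_DAY = 5.0
    breakeven_queries_per_day = STORAGE_COST_PER_DAY / cost_per_query

    print(f"{requests_per_query} GET requests/query, ${cost_per_query:.6f}/query")
    print(f"GET cost exceeds storage above ~{breakeven_queries_per_day:,.0f} queries/day")

With these numbers it prints roughly 920 requests per query, about $0.0004 per query, and a break-even point around 13k queries per day, which matches the ">10k per day" figure above.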

Our search engine is meant for searching large datasets with a low number of queries: logs, SIEM, e-discovery, exotic big-data datasets, etc. These use cases typically have a low daily query rate.

For a high request rate (1 query per second or more), like e-commerce, entirely decoupling storage and compute is actually a bad idea. For a low request rate (< 1,000 queries per day), using S3 without caring about the GET request cost is perfectly fine. In between, you probably want another object store with a more favorable pricing model.



