
Huh, I wonder why this isn't a cost trap. S3 API requests are relatively expensive.


The bandwidth is free if you are in the same region.


But you pay for requests.


A normal search experience (displaying a search page with 20 hits) requires num segments * (1 + num terms * 2) + 20 GET requests.

We have 180 segments for our commoncrawl index, so a generous upper bound is 1,000 GET requests per query.

GET requests add about $0.0004 per commoncrawl search query. Storage costs us $5 per day, so GET request costs start to exceed storage costs at more than 10k queries per day.
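For illustration, here is a back-of-the-envelope sketch of that arithmetic (the segment count, hit count, and $5/day storage figure are the numbers quoted above; the GET price assumes S3's standard $0.0004 per 1,000 requests, and the 2-term query is a hypothetical example):

    # Rough per-query S3 GET cost using the request formula above.
    NUM_SEGMENTS = 180                 # segments in the commoncrawl index
    NUM_TERMS = 2                      # hypothetical query with 2 terms
    HITS_PER_PAGE = 20
    GET_PRICE_PER_REQUEST = 0.0004 / 1000   # assumed S3 standard GET pricing

    requests_per_query = NUM_SEGMENTS * (1 + NUM_TERMS * 2) + HITS_PER_PAGE
    cost_per_query = requests_per_query * GET_PRICE_PER_REQUEST

    STORAGE_COST_PER_DAY = 5.0
    breakeven_queries_per_day = STORAGE_COST_PER_DAY / cost_per_query

    print(f"{requests_per_query} GET requests/query, ${cost_per_query:.6f}/query")
    print(f"GET cost exceeds storage above ~{breakeven_queries_per_day:,.0f} queries/day")

With these numbers it prints roughly 920 requests per query, about $0.0004 per query, and a break-even point around 13k queries per day, which matches the ">10k per day" figure above.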

Our search engine is meant for searching large datasets with a low number of queries: logs, SIEM, e-discovery, exotic big-data datasets, etc. These use cases typically have a low daily query rate.

For a high request rate (1 query per second or more), like e-commerce, entirely decoupling storage and compute is actually a bad idea. For a low request rate (< 1,000 queries per day), using S3 without caring about the GET request cost is perfectly fine. In between, you probably want another object store with a more favorable pricing model.



