How are you dealing with the fact that Common Crawl updates its data much less regularly than commercial search engines? And that each update is only a partial refresh?
Edit: And I will say your site design is very nice.
Thank you! We didn't plan to update the index regularly.
But since it takes only 24 hours to index 1B pages, the easiest approach would be to reindex everything, upload it to S3, and update the metadata so the search engine queries the right segments.
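To illustrate what that metadata flip could look like, here's a minimal sketch in Python, assuming a manifest-plus-pointer layout on S3 (the bucket, keys, and function here are hypothetical, not our actual scheme): the fresh segment list is written under a new versioned key, then a small pointer object is overwritten last so the engine switches to the new segments in one step.

```python
# Minimal sketch of the "reindex, upload, flip metadata" flow.
# Bucket name, key layout, and the pointer-file convention are
# all assumptions for illustration, not the project's real scheme.
import json
import time

import boto3

s3 = boto3.client("s3")

BUCKET = "example-search-index"      # hypothetical bucket
POINTER_KEY = "index/current.json"   # small file the engine reads at query time


def publish_index(segment_keys: list[str]) -> None:
    """Upload a versioned manifest, then repoint the engine at it."""
    version = time.strftime("%Y%m%d%H%M%S")
    manifest_key = f"index/manifests/{version}.json"

    # 1. Write the full segment list under a new, versioned key.
    s3.put_object(
        Bucket=BUCKET,
        Key=manifest_key,
        Body=json.dumps({"version": version, "segments": segment_keys}),
        ContentType="application/json",
    )

    # 2. Overwrite the tiny pointer object last, so readers switch
    #    from the old index to the new one in a single step.
    s3.put_object(
        Bucket=BUCKET,
        Key=POINTER_KEY,
        Body=json.dumps({"manifest": manifest_key}),
        ContentType="application/json",
    )
```

Writing the pointer last keeps the swap effectively atomic from the reader's side: queries either see the old manifest or the new one, never a half-uploaded index.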
Ah, I understand: you're showcasing the methodology for the underlying index, but you're going to open-source the engine. I see, great stuff then, super novel, and honestly the rest of the open-source search engines can definitely use some competition. Love it!