
Fantastic. I was actually working on something like this myself. I was planning to use an LLM as a fallback for recipes that don't contain properly formatted recipe data.
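For anyone wondering what "properly formatted recipe data" could look like in practice, here is a minimal sketch. It assumes the structured data is a schema.org Recipe embedded as JSON-LD, which is common on recipe blogs but not confirmed for any site in this thread. The names extract_recipe and llm_fallback are hypothetical, and the fallback is only a callable you would supply yourself, not a real LLM client.

```python
import json
from html.parser import HTMLParser


class _JsonLdCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def _find_recipe(node):
    """Depth-first search for a dict whose @type is (or includes) "Recipe"."""
    if isinstance(node, dict):
        types = node.get("@type", [])
        if isinstance(types, str):
            types = [types]
        if "Recipe" in types:
            return node
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = _find_recipe(child)
        if found is not None:
            return found
    return None


def extract_recipe(html, llm_fallback=None):
    """Return the structured Recipe dict, else whatever the fallback returns.

    llm_fallback is a hypothetical callable (html -> dict) standing in for
    an LLM-based extractor; it is only called when no usable JSON-LD exists.
    """
    collector = _JsonLdCollector()
    collector.feed(html)
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is treated the same as missing data
        recipe = _find_recipe(data)
        if recipe is not None:
            return recipe
    return llm_fallback(html) if llm_fallback else None
```

The structured path is tried first because it is cheap and deterministic; the fallback only runs for pages where it finds nothing.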

Curious how you get around anti-scraping measures like Cloudflare. I tried a recipe blog (https://www.maangchi.com) that usually blocks me with Cloudflare, but your site was able to scrape it just fine.

Edit: also, a very minor point: your counter of imported recipes seems to go up each time I visit the same recipe. It says I've converted 5, but I've just visited the same recipe 5 times.
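One way the counter could avoid double counting is to track distinct recipe URLs instead of page visits. This is only a sketch under that assumption; I don't know how the site actually counts. ImportCounter and canonical_url are hypothetical names, and the example.com URLs are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url):
    """Normalize a URL so trivially different spellings compare equal.

    Lowercases scheme and host, drops the fragment and a trailing slash.
    The query string is kept, since some sites identify recipes by it.
    """
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


class ImportCounter:
    """Counts distinct recipes rather than visits (in-memory sketch)."""

    def __init__(self):
        self._seen = set()

    def record(self, url):
        """Register a visit and return the number of distinct recipes so far."""
        self._seen.add(canonical_url(url))
        return len(self._seen)


counter = ImportCounter()
for _ in range(5):
    total = counter.record("https://example.com/recipes/stew")
# total stays at 1 because the same recipe was visited five times
```

A real service would keep the set per user in a database rather than in memory, but the idea is the same.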





Hello, I try several datacenter IPs first. If they all fail to crawl, I have a Raspberry Pi in my house that crawls using my residential IP. ;)

My home IP has not been blocked yet since I regularly do human-like operations from it. Hehe.

But on a serious note, you can try services like Bright Data or Apify that have a ton of residential IPs. So if you see a Cloudflare block page, just rotate the proxy.
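The "rotate on block page" idea can be sketched roughly like this. It is not the code behind the site, just an illustration: the proxy labels, the fetch callable, and the block-page markers are all assumptions. The markers are a heuristic based on titles Cloudflare challenge pages commonly show, and fetch is whatever HTTP client you wire in, returning (status, body).

```python
# Heuristic only: strings often seen on Cloudflare challenge/block pages.
BLOCK_MARKERS = ("Just a moment...", "Attention Required! | Cloudflare")
BLOCK_STATUSES = (403, 429, 503)


def looks_blocked(status, body):
    """Guess whether a response is a block page rather than real content."""
    return status in BLOCK_STATUSES and any(m in body for m in BLOCK_MARKERS)


def fetch_with_rotation(url, proxies, fetch):
    """Try each proxy in order and return the first unblocked response.

    proxies: ordered labels, e.g. datacenter proxies first and a
             residential one last (hypothetical names).
    fetch:   caller-supplied callable (url, proxy) -> (status, body).
    Returns (proxy, status, body), or None if every proxy was blocked.
    """
    for proxy in proxies:
        try:
            status, body = fetch(url, proxy)
        except OSError:
            continue  # connection-level failure: move on to the next proxy
        if looks_blocked(status, body):
            continue  # block page: rotate
        return proxy, status, body
    return None


def fake_fetch(url, proxy):
    """Stand-in for a real HTTP client so the sketch runs offline."""
    if proxy.startswith("datacenter"):
        return 403, "<title>Just a moment...</title>"
    return 200, "<html>recipe page</html>"


result = fetch_with_rotation(
    "https://example.com/recipe",
    ["datacenter-1", "datacenter-2", "residential-pi"],
    fake_fetch,
)
# result[0] is "residential-pi": both datacenter attempts hit a block page
```

Putting the residential proxy last keeps most traffic off the home IP, so it is only used when the cheaper options fail.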



