Hacker News | tallytarik's comments

Yes this is clearly verbatim output from an LLM.

But it's perfect HN bait, really. The title is spicy enough that folks will comment without reading the article (more so than usual), and so it survives a bit longer before being flagged as slop.


Do the HN guidelines say to flag AI content? I'm unsure how flagging for this is supposed to work on HN, and have only ever used the flag feature for obvious spam or scams.

It might be wrong, but I have started flagging this shit daily. Garbage articles that waste my time as a person who comes on here to find good articles.

I understand that reading the title and probably skimming the article makes it a good jumping off point for a comment thread. I do like the HN comments but I don't want it to be just some forum of curious tech folks, I want it to be a place I find interesting content too.


I agree. It seems this is kind of a Schelling point right now on HN and there isn't a clear guideline yet. I think your usage of flagging makes sense. Thanks

It’s entirely written by an LLM.

LOL, I was having an online chat with a friend the other day and commented I sound like an LLM.

Are you sure?

Ask an LLM to write such an article and you'll have exactly this.

- random bold emphasis that would make Disney and Marvel Comics blush

- overuse of "one paragraph then bullet points"

- many of the bullet lists have a small bold prefix followed by a single line, for no good reason

- every section has a "why it matters"

- and each section ends with a useless comparison table that is a direct screenshot of ChatGPT/Gemini

I would not mind if the author had actually applied his alleged insights in the domain, but as others have noted, the numbers are way off, and are what CSPs want you to believe to sell more Kubernetes instances. This does not inspire confidence that the author proofread the LLM output.

It's a shame, because the article has some good advice, but also a lot of misguided ideas that would make Grug scratch their head. No, you don't need Redis to have stateless applications. Having a load-balancing tier is as useful for resiliency as it is for scaling. Autoscaling is a trap. If you can afford it, start with the app and DB separated. Let your application perform connection pooling itself from day one; your framework knows better than PgBouncer how connections can be safely reused.

Overall, at a high level, the article is good and is a good outline on the order in which to optimize (sharding is dead last), but the details don't meet expectations.
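The "let your application pool connections itself" point above can be sketched with a toy pool; this is a hypothetical illustration using Python's stdlib and sqlite3 as a stand-in for Postgres, not how any particular framework implements it (real pools add staleness checks, recycling, and overflow handling):

```python
import queue
import sqlite3

class ConnectionPool:
    """Toy app-level connection pool, illustrating what frameworks
    do internally: reuse a fixed set of connections instead of
    opening one per request."""

    def __init__(self, factory, size=5):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())

    def acquire(self, timeout=5):
        # Blocks when the pool is empty rather than opening a new
        # connection, which caps concurrent DB connections per process.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)

pool = ConnectionPool(lambda: sqlite3.connect(":memory:"), size=2)
conn = pool.acquire()
assert conn.execute("SELECT 1").fetchone() == (1,)
pool.release(conn)
```

The framework-level pool knows when a transaction has cleanly finished and a connection is safe to hand out again, which an external proxy like PgBouncer has to infer.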


Yes. The over-use of bold in the intro (hell, in the first sentence) is a good hint.

All of it aligns with https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing

Maybe the images were made by hand.


Great post and a great little tool. Some of my experience using these techniques in production:

1. Trilateration mostly doesn't work with internet routing, unlike GPS. Other commenters have covered this in more detail. So the approach described here - to take the closest single measurement - is often the best you can do without prior data. This means you need a crazy high distribution of nodes across cities to get useful data at scale. We run our own servers and also sponsor Globalping and use RIPE Atlas for some measurements (I work for a geo data provider), yet even with thousands of available probes, we can only accurately infer latency-based location for IPs very close to those probes.

2. As such, latency/traceroute measurements are most useful for verifying existing location data. That means for the vast majority of IP space, we rely on having something to compare against.

3. Traceroute hops are good; the caveat being that you're geolocating a router. RIPE IPmap already locates most public routers with good precision.

4. Overall these techniques work quite well for infrastructure and server IP addresses but less so for eyeball networks.

https://ping.sx is also a nice comparison tool
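The "closest single measurement" approach from point 1 can be sketched like this (hypothetical probe names and RTTs; a real system would use thousands of probes and filter anycast and inflated measurements):

```python
def infer_location(rtt_ms_by_probe, probe_cities, max_rtt_ms=10.0):
    """Pick the city of the probe with the lowest RTT to the target.

    Only trust the answer when the best RTT is small: light in fiber
    covers roughly 100 km per millisecond of round-trip time, so even
    a 10 ms RTT leaves ~1000 km of uncertainty.
    """
    probe, rtt = min(rtt_ms_by_probe.items(), key=lambda kv: kv[1])
    if rtt > max_rtt_ms:
        return None  # no probe close enough to say anything useful
    return probe_cities[probe]

measurements = {"probe-fra": 2.1, "probe-ams": 7.9, "probe-nyc": 92.3}
cities = {"probe-fra": "Frankfurt", "probe-ams": "Amsterdam",
          "probe-nyc": "New York"}
assert infer_location(measurements, cities) == "Frankfurt"
```

The `max_rtt_ms` cutoff is why probe density matters so much: with no probe inside the radius, the method returns nothing useful.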


agree but...

https://youtu.be/_iAffzWxexA

A 20-minute talk at DEF CON


Working on improving the data pipeline for https://iplocate.io - an IP intelligence service I've worked on since 2017. A couple of recent focuses:

1. VPN and proxy detection. We already track dozens of providers, but we can do better here. There's also a bunch of metadata we collect as part of this process which we don't currently surface, so I'm looking at what else we can bring to our databases and free API.

2. Better detail and evidence on how we build and test our own geolocation database, which we create from scratch. There's been a recent trend of misinformation about geo accuracy, including from some other providers, so I want to better explain the accuracy (and inaccuracy) of various techniques, our policy for when we prefer certain data, and so on.

(Open to partnerships for any folks looking for a new provider!)


There are plenty of VPN and proxy detection services, either as a service (API) or downloadable database, which are surprisingly comprehensive. Disclaimer: I’ve run one since 2017. Years on, our primary data source is literally holding dozens of subscriptions to every commercial provider we can find, and enumerating the exit node IP addresses they use.

There are also other methods, like using zmap/zgrab to probe for servers that respond to VPN software handshakes, which can in theory be run against the entire IP space. (this also highlights non-commercial VPNs which are not generally the target of our detection, so we use this sparingly)

It will never cover every VPN or proxy in existence, but it gets pretty close.
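The enumerate-then-look-up workflow described above can be sketched as a simple membership check; the IPs and ranges here are documentation addresses, purely hypothetical, and a real dataset would be orders of magnitude larger and constantly refreshed:

```python
import ipaddress

# Hypothetical data: individual exit IPs enumerated by holding
# subscriptions, plus whole ranges inferred from active probing.
known_exit_ips = {"203.0.113.7", "203.0.113.9"}
known_exit_ranges = [ipaddress.ip_network("198.51.100.0/28")]

def is_vpn_exit(ip: str) -> bool:
    """Return True if the address is a known VPN/proxy exit."""
    if ip in known_exit_ips:
        return True
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in known_exit_ranges)

assert is_vpn_exit("203.0.113.7")      # exact match
assert is_vpn_exit("198.51.100.5")     # inside a flagged range
assert not is_vpn_exit("192.0.2.1")    # unknown address
```

Production lookups would use a radix/trie structure rather than a linear scan, but the data model is the same: observed exits plus inferred ranges.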


> Years on, our primary data source is literally holding dozens of subscriptions to every commercial provider we can find, and enumerating the exit node IP addresses they use.

Assuming your VPN identification service operates commercially, I trust that you are in full compliance with all contractual agreements and Terms of Service for the services you utilize. Many of these agreements specifically prohibit commercial use, which could encompass the harvesting of exit node IP addresses and the subsequent sale of such information.


ToS are pretty meaningless in cases like this. It amounts to getting rejected as a customer and having your account canceled.


I think ToS violations can also run afoul of CFAA.


Those are pretty old cases that I think the courts have moved away from, and even in those cases it was a ToS violation plus an explicit C&D that the company ignored.


I don't think they can any longer, I think there is case law on this.

Illinois law makes it a misdemeanor to violate web site ToS, though. And felony for the second time IIRC. Other states probably also.


Maybe the tables could be turned and we can build a service with dozens of subscriptions to every VPN detection service and report them for ToS violations ;)


> I trust that you are in full compliance with all contractual agreements and Terms of Service

Why? It's not like there's any real moral (or, likely, legal) reason to care beyond avoiding the service's ban hammer.


In Illinois you could, in theory, be jailed for up to three years for violating a web site ToS. (classified as "Computer Tampering")


I don't think that would hold up in court anymore.


It's a statutory offense, so you could get lucky and the prosecutor wouldn't prosecute it, but it's there for them to use:

https://www.ilga.gov/Documents/legislation/ilcs/documents/07...

... "the owner authorizes patrons, customers, or guests to access the computer network and the person accessing the computer network is an authorized patron, customer, or guest and complies with all terms or conditions for use of the computer network that are imposed by the owner;"


There's a little secret that most of the business world knows but individuals do not know: You don't have to follow Terms of Service. In most cases, the maximum penalty the company can impose for a ToS violation is a termination of your account. And it's not illegal to make a new account. They can legally ban you from making a new account, and you can legally evade the ban.

Unless you're the one-in-a-million unlucky user who gets prosecuted under the CFAA's very generic "unauthorized access to a protected computer" clause, like Aaron Swartz. It seems the general consensus is this doesn't apply to breaking a website ToS, and Aaron was only in so much trouble because he broke into a network closet, as well as for copyright violation. But consult a lawyer if unsure. (That's another difference: A business will ask a lawyer if it wants to do something shady, while an individual will simply avoid doing it)


Tangent: if you hold access to all VPN providers, have you thought about also releasing benchmarks for them? I would be interested in knowing which ones offer the best bandwidth / peering (ping).


> which are surprisingly comprehensive

How does the buyer even know what the precision and recall rates might be?


Probably contrary to the stealth aspect.


This will also cause problems with anyone that happens to (even accidentally/unknowingly) use apps that integrate services from companies such as BrightData/Luminati/HolaVPN/etc. where they sell idle time on your device/connection to their VPN/proxy customers.

The legitimate end-user will then no longer be able to use e.g. SoundCloud.


I fail to see the problem if people who allow their internet connections to be used by scammers/AI crawlers are banned from every service


I’m with you on this one. Some of my projects are flooded with sus traffic from Brazil. I don’t believe there are a million eager Brazilian hackers targeting me in particular. It’s pretty clear from analysis that they’re all residential hosts running proxies, knowingly or otherwise.

The more concise word for this is “botnet”. Computers participating in one should be quarantined until they stop.


> unknowingly

Oftentimes, random shovelware apps will have these proxy SDKs embedded in them, and the only mention of it being part of the software is buried in some long ToS that nobody reads.


Sort of valid today.

But the more sites that require a residential VPN for normal use, the less legitimate that argument becomes.


You might want to learn how internets work today: https://en.wikipedia.org/wiki/Network_address_translation


Interesting. I assumed all VPNs switched to IPv6 by now, making detection much harder.


IPv6 isn't magically unrouteable, it just routes much larger blocks of "end IP addresses."

You just track and block /24 or /16 as necessary.


Much of the internet still does not support IPv6, so most providers will give you an IPv4 address. In fact only a few providers even support IPv6 at all.

Even with IPv6 it's not a huge problem. With a few samples we can know that a provider is operating in a given /64 or /48 or even /32 space, and can assign a confidence level that the range is used for VPNs.
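The "few samples → confidence in a prefix" idea above can be sketched by grouping observed exit addresses into /48s; the sample addresses are hypothetical documentation-range IPs, and a real pipeline would weight samples by recency and source:

```python
import ipaddress
from collections import Counter

def provider_prefixes(sample_ips, prefix_len=48, min_samples=2):
    """Group observed IPv6 exit addresses by their /48 and keep only
    prefixes with enough independent samples to assign confidence."""
    counts = Counter(
        ipaddress.ip_network(f"{ip}/128").supernet(new_prefix=prefix_len)
        for ip in sample_ips
    )
    return {net: n for net, n in counts.items() if n >= min_samples}

samples = ["2001:db8:aa::1", "2001:db8:aa::2:5", "2001:db8:bb::9"]
flagged = provider_prefixes(samples)
assert ipaddress.ip_network("2001:db8:aa::/48") in flagged
assert ipaddress.ip_network("2001:db8:bb::/48") not in flagged
```

Repeating the pass at /64, /48, and /32 gives progressively broader (and lower-confidence) coverage of a provider's space.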


Many websites, including Soundcloud, are still only accessible over IPv4, so this is moot; even if VPNs support IPv6, it's enough to block their IPv4 exit nodes for Soundcloud.


Just out of curiosity: if I'm located in Spain and I set up an EC2 or DigitalOcean instance in Germany and use it as a SOCKS proxy over SSH, will you detect me?


It is even easier to block hosting providers. They typically publish official lists. Here's the full list for both of those providers:

https://ip-ranges.amazonaws.com/ip-ranges.json

https://digitalocean.com/geo/google.csv

(And even if they don't publish them, you can just look up the ranges owned by any autonomous network with the appropriate registry.)
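Checking an address against a published list like AWS's ip-ranges.json is straightforward; the snippet below uses an inline sample in the shape of that file (the prefix and region here are hypothetical documentation values, and in practice you'd download and periodically refresh the real file):

```python
import ipaddress
import json

# Inline sample mirroring the structure of AWS's ip-ranges.json.
sample = json.loads("""
{"prefixes": [
  {"ip_prefix": "198.51.100.0/24", "region": "eu-central-1",
   "service": "EC2"}
]}
""")

networks = [ipaddress.ip_network(p["ip_prefix"])
            for p in sample["prefixes"]]

def is_hosting_ip(ip: str) -> bool:
    """Return True if the address falls inside a published hosting range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)

assert is_hosting_ip("198.51.100.42")
assert not is_hosting_ip("203.0.113.1")
```

An SSH SOCKS proxy on such an instance is trivially flagged this way, since the exit address sits inside a range the provider itself publishes.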


It won’t end up in our proxy detection database, but we track hosting provider ranges separately: https://www.iplocate.io/data/hosting-providers/


That's a hosting service IP block. Some sites block them already. Netflix for instance.


Who's buying your service?


Sounds like snitching as a service


Most of these providers are quite open about these locations being “virtual”, so it's misleading to say they don't match where they claim to be.

There is however an interesting question about how VPNs should be considered from a geolocation perspective.

Should they record where the exit server is located, or the country claimed by the VPN (even if this is a “virtual” location)? In my view there is useful information in where the user wanted to be located in the latter case, which you lose if you only ever report the location of servers.

(disclaimer: I run a competing service. we currently provide the VPN reported locations because the majority of our customers expect it to work that way, as well as clearly flagging them as VPNs)


Yeah, Proton is quite explicit about that: https://protonvpn.com/support/how-smart-routing-works


I work for IPinfo, and I appreciate your comment.

Our product philosophy is centered on accuracy and reliability. We intentionally diverge from the broader IP geolocation industry's trust-based model. Instead of relying primarily on "aggregation and echo", we focus on evidence-backed geolocation.

Like others in the industry, we do ingest self-reported IP geolocation data, and we do that well. Given our scale and reputation, we receive a significant volume of feedback and guidance from network operators worldwide. We actively conduct outreach, and exchange ideas with ISPs, IXPs, and ASNs. We attend NOG events, participate in research conferences, and collaborate with academia. We have a community and launch hackathon events, which allow us to talk to all the stakeholders involved.

Where we differ is in who our core users are. Our primary user base operates at a critical scale, where compromises on data accuracy are simply not acceptable. For these users, IP geolocation cannot be a trust-based model. It must be backed by verifiable data and evidence.

We believe the broader internet ecosystem benefits from this approach. That belief is reflected in our decision to provide free data downloads, a free API with unlimited requests, and active collaboration with multiple platforms to make our data widely accessible. Our free datasets are licensed under CC-BY-SA 4.0, without an EULA, which makes integration straightforward, even for commercial use.

I appreciate you recognizing that our product philosophy is different. We are intentionally trying to differentiate ourselves from the industry at large, and it is encouraging to see competing services acknowledge that they are focused on a different model.


If we can pay them in virtual dollars, no problem


Working on improving the data pipeline for https://iplocate.io - an IP intelligence service I've worked on since 2017.

Recent focus has been on geolocation accuracy, and in particular being able to share more data about why we say a resource is in a certain place.

Lots of folks seem to be interested in this data, and there's very little out there. Most other industry players don't talk about their methodology, and those that do aren't overly honest about how X or Y strategy actually leads to a given prediction, or the realistic scale or inaccuracies of a given strategy, and so on. So this is an area I'm very interested in at the moment and I'm confident we can do better in. And it's overall a fascinating data challenge!


Our government has been paying Deloitte & co. to produce slop for years before AI was being used to generate said slop.

Can we get a refund for all of the others too?


ISPs have no obligation, although the ubiquity of sites and apps relying on IP geolocation mean that ISPs are incentivized to provide correct info these days.

I run a geolocation service, and over the years we've seen more and more ISPs providing official geofeeds. The majority of medium-large ISPs in the US now provide a geofeed, for example. But there's still an ongoing problem with geofeeds being kept up to date, with users being assigned to the correct 'pool', etc.

Mobile IPs are similar but are still certainly the most difficult (relative lack of geofeeds or other accurate data across providers)


Mobile IPs reflect the user's "registered area" at best, not their actual location.

This is mostly because of how APN / GGSN / P-GW systems work. E.g. you may have an APN that puts you straight into a corporate network, and the mobile network needs you to keep using that APN when roaming. This is why your roaming IP is usually in the country you're from, not the one you're currently in.

I've heard of local breakout being possible, but never actually seen it in practice.


I thought this was going to be an analysis of articles that are clearly AI-generated.

I feel like that’s an increasing ratio of top posts, and they’re usually an instant skip for me. Would be interested in some data to see if that’s true.

