
Counting an IP address as PII is kind of crappy, you need a court order to turn an IP alone into PII.

Operators should be free to log traffic at the network level, PII should only come into play once you're asking someone to provide personal information.



Lots of comments here about how IPs aren't PII b/c they can change, etc. I'm not arguing that, but consider that there is an entire _industry_ around using IPs to specifically target people, companies and households that is effective enough for businesses to write large checks to them.

Household IP Targeting - https://www.vicimediainc.com/ip-targeting-direct-mail-intern...

Or even just your ISP (who for sure know your IP addr and your address) - https://arstechnica.com/information-technology/2017/03/how-i...

The larger issue is that we (HN tech people) treat IPs as fallible because we think of them in absolutes. The advertising side of the Internet looks at them like a goldmine b/c even a 75% correlation to "truth" can still make their ads reach the people they're trying to reach in a much cheaper way.


The amount of information that can be found using your IP address: https://clearbit.com/attributes (look only at the Reveal API)


I just signed up for the trial and Reveal turns up nothing on my home IP address.


Yeah it is odd. You decided to hit my server, so I should be able to record the occurrence. How am I supposed to deflect DoS attacks if I can't maintain a list of nefarious IPs? I know that's a fairly low tech attack, but they still happen constantly. Is Fail2Ban no longer compliant?

I wouldn't be surprised if some policies pertaining to record keeping in some sectors contradict that requirement as well.


Not sure about this law but that sounds completely fine under GDPR. You need to keep your log files secure, though, and keep them no longer than necessary for what you're doing.

https://termsfeed.com/blog/gdpr-recitals/#Recital_49_8211_En...


You absolutely can still maintain that list under CCPA. What you can't do is sell your list of nefarious IP addresses. You could sell (or buy) the service of checking various IP addresses against a proprietary list of nefarious IP addresses.


To deflect a DoS attack you should not need the records for an extended amount of time. There is no reason why you cannot specify you are keeping records for security purposes and getting rid of them when no longer pertinent.
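A rough sketch of what "keep it for security purposes, then get rid of it" could look like. The `BanList` class and the 10-minute TTL are made up for illustration; this is not how Fail2Ban is implemented, just the same idea in miniature.

```python
import time


class BanList:
    """Hypothetical in-memory ban list whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._banned = {}  # ip -> timestamp of the ban

    def ban(self, ip, now=None):
        # Record (or refresh) a ban; `now` is injectable so this can be tested.
        self._banned[ip] = time.time() if now is None else now

    def purge(self, now=None):
        # Drop every entry that has outlived the retention window.
        now = time.time() if now is None else now
        expired = [ip for ip, t in self._banned.items() if now - t >= self.ttl]
        for ip in expired:
            del self._banned[ip]

    def is_banned(self, ip, now=None):
        self.purge(now)
        return ip in self._banned


bans = BanList(ttl_seconds=600)  # assumed retention: 10 minutes
bans.ban("203.0.113.7", now=0)
```

The point is only that the IP is held for as long as the ban is useful and is gone afterwards, so there is no long-lived record to worry about.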


You can do all those things under GDPR, as they are required for the running of the service.


CCPA will probably be amended at least once more before it goes into effect. If you feel that it shouldn't apply to non-membership website operators who merely log IP address and requested URL... consider writing to your California State Assemblymember and California State Senator, and possibly to the California Attorney General who will be publishing guidance regarding CCPA.

Amusingly enough, California consumers will not have privacy rights regarding any written comments sent to the California Attorney General.


Could you salt and perform a one-way hash on the IP address and store that? It would alleviate a large amount of leakage issues while still giving you uniqueness counts.
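Something like this is what I have in mind, as a minimal sketch. `SITE_SALT` and `pseudonymize` are hypothetical names, and I'm using a keyed hash (HMAC-SHA256) as one way of "salting"; whether that is enough protection is a separate question.

```python
import hashlib
import hmac
import os

# Hypothetical per-site secret; in practice it would be stored outside the logs.
SITE_SALT = os.urandom(16)


def pseudonymize(ip: str, salt: bytes = SITE_SALT) -> str:
    """Return a keyed SHA-256 digest of the IP instead of the IP itself."""
    return hmac.new(salt, ip.encode(), hashlib.sha256).hexdigest()


# The same IP always maps to the same digest, so uniqueness counts still work.
hits = ["198.51.100.4", "203.0.113.9", "198.51.100.4"]
unique_visitors = len({pseudonymize(ip) for ip in hits})
```

You store the digests, count distinct ones, and never write the raw address to disk.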


I built an Nginx plugin to do something like this https://github.com/masonicboom/ipscrub


IPv4 addresses are only 32 bits, which makes building rainbow tables almost trivial.


Yeah, though a salt would at least mean you'd have to rebuild the table for each site/database/whatever. However I'm having a hard time seeing how to really protect against this.

The IP is an identifier, so unlike a password salt (where the user is the identifier) you need a way to know what the salt is to hash the IP, and it needs to be consistent.

You can do a lookup table of IP-to-salt, but this either gives away your list of addresses (if only containing IPs you've seen) or is huge (entire ipv4 range), and either way doesn't prevent rainbow tables.

You can have a static salt for the entire site, but again this is not really helping much against rainbow tables (beyond requiring recalculating the table, once).

Is there a mitigation I'm not thinking of?


You could encrypt instead of hash, and then have some policy (e.g. the decryption library/service/piece will only allow decrypting ciphertext newer than 30 days).

If you need the ability to group ciphertexts without decrypting them, you could create a scheme which will make cryptographers cringe, but could be justified in this specific case.


For large sites, you also have the risk that you might be able to say something statistically useful about the plaintext.

For instance, you can probably assert things like which IP blocks are likely to comprise most of the entries in the table or which IP blocks or addresses cannot be in the table.

That just makes me all sorts of uncomfortable.


I thought salt was supposed to be unique per hashed value. Rainbow tables don't work in that case.


No matter how complex your scheme is, if the IP address is the only input, it's a (mathematical) function f: IP → hash. Since the IP(v4) space is 32-bit (in practice, slightly less), if you know the function f, you can trivially enumerate all inputs.

From a security point of view, if you use a fixed (unrelated to input) salt, the attacker will have a harder time discovering the function f (unless you store the salt next to your IP hashes). But from a privacy point of view, in the relationship between me (user) and you (service provider), you are the attacker. And you know your function f. Hashing IPv4 addresses, salt or not, gives me no privacy protection, since you can trivially reverse the hash, just due to the small domain size. With IPv6, this problem will resolve itself somewhat; till then, I'd prefer if you encrypted those IPs with keys that have a finite and short lifetime, in a way that a third party could audit if need be.
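To make the enumeration concrete, here is a small sketch. The function `f`, the salt value and the /24 search range are all assumptions for the demo; a real attack would walk the whole IPv4 space the same way, it would just take longer.

```python
import hashlib
import ipaddress


def f(ip: str, salt: bytes) -> str:
    """Stand-in for whatever salted hash the operator uses."""
    return hashlib.sha256(salt + ip.encode()).hexdigest()


def reverse(target_hash: str, salt: bytes, network: str):
    """Try every address in `network` until one hashes to `target_hash`."""
    for addr in ipaddress.ip_network(network):
        if f(str(addr), salt) == target_hash:
            return str(addr)
    return None


salt = b"known-to-the-operator"   # the operator always knows its own salt
stored = f("192.0.2.77", salt)    # what ends up in the "anonymized" log
recovered = reverse(stored, salt, "192.0.2.0/24")
```

Anyone who holds the salt and the function gets the original address back by brute force, which is why the hash alone is not a privacy guarantee against the operator.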


That only works if you have two pieces of information. Username and password works because you can find the salt value associated with that username and then use that for the password hash. An IP would still require an unhashed thing to look up to get the hash if you did it per IP address. For this you might be able to get away with using a sole salt value for all IP addresses, but even then, if you get hacked, it would be trivial to write a script to compute the rainbow table once the salt value is stolen.


For passwords, yes, this is generally best practice. Also, the salt is normally stored with the hashed password, as it’s not regarded as a secret.

Modern GPUs can manage several thousand million SHA256 hashes/sec, so even with a salt per hash it’s not going to take long to get a given entry, given the 32-bit address space of IPv4.
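Back-of-envelope version, with the hash rate as an explicit assumption (I'm taking "several thousand million" to mean 3 billion per second; plug in your own number):

```python
ADDRESS_SPACE = 2 ** 32                   # every possible IPv4 address
ASSUMED_HASHES_PER_SEC = 3_000_000_000    # assumption, not a benchmark

# Worst case: the attacker has to try the whole space for one salted entry.
seconds_per_entry = ADDRESS_SPACE / ASSUMED_HASHES_PER_SEC
print(f"{seconds_per_entry:.2f} s to exhaust IPv4 for one entry")
```

That comes out to roughly a second and a half per entry under this assumption, so a per-hash salt only multiplies a very small number.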


You can use bcrypt or argon2 to make it much slower than that.
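A sketch of the idea using only the standard library. Note that `pbkdf2_hmac` is standing in for bcrypt/argon2 here purely because it ships with Python; the iteration count and salt are arbitrary choices, and the estimate is for a single core on whatever machine runs this.

```python
import hashlib
import time


def slow_hash(ip: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Deliberately expensive hash of an IP (PBKDF2-HMAC-SHA256)."""
    return hashlib.pbkdf2_hmac("sha256", ip.encode(), salt, iterations)


# Time one hash, then extrapolate to a full sweep of the IPv4 space.
start = time.perf_counter()
digest = slow_hash("192.0.2.77", b"example-salt")
per_hash = time.perf_counter() - start

SECONDS_PER_YEAR = 365.25 * 24 * 3600
years_single_core = (2 ** 32 * per_hash) / SECONDS_PER_YEAR
print(f"{per_hash:.3f} s per hash, about {years_single_core:.1f} core-years for all of IPv4")
```

An attacker can parallelize, so treat the result as core-years rather than wall-clock time. It raises the cost a lot, but the input space is still only 32 bits.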


but why?

If I get hit by a DoS attack or spam, I need the IP to find out with whom I should file an abuse complaint.

Do we need to sanitize SMTP headers too? How about shutting down DNSBLs?


It's not possible to one-way hash a 32-bit IP address. A hash of a 32-bit value can always be reversed because the search space is so small.


Store only the first 16 bits of the hash maybe?


Google Analytics is supposedly GDPR compliant when they store only the first 3 octets, un-hashed.

However I'm not sure myself it makes sense. Some people will be identified by just a partial IP or even a partial hash.
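For reference, keeping only the first three octets amounts to masking the address down to its /24. A minimal sketch with the standard `ipaddress` module; `truncate_ip` and the `keep_bits` parameter are names I made up, not anything from Google Analytics:

```python
import ipaddress


def truncate_ip(ip: str, keep_bits: int = 24) -> str:
    """Zero out everything after the first `keep_bits` bits of an IPv4 address."""
    # strict=False lets us pass a host address and get its enclosing network.
    net = ipaddress.ip_network(f"{ip}/{keep_bits}", strict=False)
    return str(net.network_address)


print(truncate_ip("203.0.113.77"))       # 203.0.113.0
print(truncate_ip("203.0.113.77", 16))   # 203.0.0.0
```

With 24 bits kept, up to 256 addresses collapse into one value, which is exactly why a household on a small or quiet block can still stand out.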


Who cares if it’s trivially hackable; we’re talking about a legal checkbox that you have to tick.


A reversible hash "could reasonably be linked" with the plaintext. You can't get around the law on technicalities. Judges are not computers.


> You can't get around the law on technicalities.

Simply out of curiosity, what do you mean by that?

All my life experience and knowledge tells me that's exactly how you get around the law, unless the court has its own agenda or a strong bias.



I do. People treating privacy protections as "legal checkbox that you have to tick" are the reason regulations like this show up in the first place.


Aren't IP addresses used as PII by companies to track users that have profiles but aren't logged in?


I’d hope not. From the company’s perspective, there’s never any guarantee at all that an IP is going to be 1:1 to a real identity. IPs will be dynamically reassigned to new consumers constantly, and there are many situations where you’ll have many (sometimes very many) users sitting behind the same IP. The only situation I’ve come across where some level of PII has been retrieved from an IP is services that are able to link an IP to a particular company’s office. I’ve seen that used in Account Based Marketing funnels where you can get information that ‘somebody at ACME Corp viewed these pages on your website’.


Trackers don't care if they're wrong some of the time. The prediction problems they're using the data to build models for are pretty noisy anyway. If using inexact identifiers improves their model, they'll get used. Many technically dynamic IPs change only rarely... I think my home Comcast IP has changed once in the last 2.5 years. So the correlation between a Comcast IP and a perfect household identifier is going to be pretty good. If you have a dataset that's got search history timestamped and labeled with IP, it's probably pretty easy to figure out the physical address that goes with the IP from map searches. Cross-reference an address-to-name database and now you've got a dataset with each household's (labeled by name and address, with some error) search history.


From a company’s perspective a person uses only a handful of IPs most of the time: home and work.

Combine that with cross-site tracking and phone companies selling your info...


From a company's perspective, almost all global mobile users are behind CGNAT, a huge portion of homes are too, and offices have hundreds or thousands of people exiting from a single or a few public IPs.


You can probably mask off a few low order bits and still get most of the value for network management applications.


Another note... Per 1798.140(c)(1)(B), CCPA applies to a business that receives PII of >=50k consumers for the business’ commercial purposes, which might not apply to access logs kept purely for diagnostic purposes.


A commercial purpose of ours is keeping the web site up.


Right now it maybe isn't but that could quickly change if newer protocol versions get more common.

If IP addresses were as anonymous as claimed, there would be little incentive to save them in any long time storage.


Especially considering many home connections don't even have static IPs any more. Websites can't tell whether the IP is static or dynamic, so it would be pretty silly for them to use it.


There's been a lot of FUD surrounding the logging of IP addresses for network diagnostic and abuse purposes as a violation of GDPR (and now CCPA), but I'm not aware of any cases where that alone was sufficient to cripple a business.

Until I hear otherwise, I'm going to gamble that for now that's not the kind of reckless mishandling of personal information that regulators are trying to crack down on.


> Until I hear otherwise, I'm going to gamble that for now that's not the kind of reckless mishandling of personal information that regulators are trying to crack down on.

And you're probably right until they do otherwise.

The problem with badly-drafted laws is that they can be used to attack people who are annoying but who haven't done anything wrong... except for technically violating a law which is "supposed to" mean something else but which can be read to penalize some harmless activity the gadfly happened to engage in.

So, maybe you'll be patient when I'm not comforted by people telling me to not worry about it.


GDPR gives regulators a lot of leeway on how to crack down on things.


And that’s problematic for someone trying to understand if their business operations are legal.


Courts are not run by robots, judges are generally smart people. I agree - I think most people overthink the whole IP == PII nonsense. I think it’s more likely that IP + other factors, and your USE (or misuse) is where things become more gray.


I think the whole point of the rule of law (versus rule of authority) is to remove some of the massive ambiguity about enforcement and make the courts a bit more “robotic” and regular. You don’t want a situation where it’s luck of the draw on a judge, or where the ambiguity allows selective enforcement against people one judge or prosecutor particularly dislikes.


I agree with you ideologically. I'm not defending this law, or bad laws, or laws applied unevenly. I've been a vocal opponent of all those.

But we also have to have a certain pragmatism when deciding how to behave in a society with an impossible legal system. How much effort should I, as a developer or as a consultant to business owners or as a systems administrator, spend on purging IP addresses versus all the other things that need attention?

For that we look to how the law is applied in practice.

I was active on Slashdot back when the DMCA was first proposed and then fought its way into becoming law. There is no topic about which HN is as rancorous as Slashdot was about the DMCA. What does the situation look like now, twenty years later? Yes, there are and have been and continue to be abuses of the DMCA, but not at the internet-destroying scale that Slashdot predicted.

So I'm not going to tell you to ignore IP addresses in your log files. That's up to your judgement. But I'm going to ignore them in mine, until I see a reason to do otherwise, and when it's a topic of discussion with others, I'll tell them that according to a strict reading of the law, logged IP addresses may be a liability, but that there have been exactly 0 cases to date which have been only about some business having IP addresses in its logs for abuse and diagnostic purposes.


Agreed. On the other hand, you can't make courts fully robotic. The absurdly large size of existing laws is the consequence of trying to make them more like computer code, and having to patch countless vulnerabilities and corner cases in the process. In general, writing good laws as computer code is an AI-complete problem. That's why all laws leave some space for human judgment.


The advice we were given, and my general understanding, is that you absolutely have the right to use IP addresses for network diagnostic and [anti-] abuse purposes. What you can’t do is leave those IP addresses lying around unsecured, share them with anyone who doesn’t have a legitimate requirement for access, or otherwise use them for random purposes. Also, you probably need a lifecycle policy so you don’t hang onto that data indefinitely.


The GDPR means you need a lawful basis for processing the data. Not that you can't process it at all.

There's lots of talk about consent as a basis for processing. For lots of purposes "Legitimate Interests" is likely a better basis. You'll have to perform a legitimate interests assessment and be able to justify that the potential negative impact of your processing is outweighed by the benefits.

The ICO has an interactive tool for selecting a basis for processing https://ico.org.uk/for-organisations/resources-and-support/l... with links to more information.



