It's easy to achieve high precision when you just define a hit very broadly. See also the political flap over teachers who factually describe the existence of alternative sexualities being labeled as 'grooming'.
If you want to see the lie behind any of these child-protection surveillance initiatives when they talk about things like 90% precision or millions of hits per year, ask them how many of those detections of vile child predators resulted in an arrest warrant. Not an actual arrest, not an actual conviction, just an attempt.
The answer is extraordinarily few, and that tells you everything you need to know.
I don't mean in the algorithm itself; I mean in your evaluation of the algorithm, where you don't apply the same broad definition when counting misses for recall (or don't report recall at all).
Evaluate your algorithm thusly: if it made a hit, it's a grooming true positive unless it's extraordinarily, undeniably a false positive. Absent any ground-truth data, you just don't evaluate recall, or if you do have test data, it's only a false negative if it's undeniably abuse. Benefit of the doubt always goes to the algorithm. All hail the algorithm. All hail.
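To make the asymmetry concrete, here's a rough Python sketch with completely made-up numbers. The label names (`clear_abuse`, `ambiguous`, `clear_benign`), the counts, and the helper functions are all hypothetical and not taken from any real system. "Generous" counts anything not undeniably benign as a positive; "strict" counts only undeniable abuse. The rigged evaluation uses the generous rule for hits and the strict rule for misses.

```python
# Hypothetical reviewer labels, invented purely for illustration.
# FLAGGED: items the classifier hit on. UNFLAGGED: items it did not flag.
FLAGGED = ["clear_abuse"] * 5 + ["ambiguous"] * 80 + ["clear_benign"] * 15
UNFLAGGED = ["clear_abuse"] * 2 + ["ambiguous"] * 40 + ["clear_benign"] * 958


def generous(label: str) -> bool:
    """Positive unless the item is undeniably benign (benefit of doubt to the hit)."""
    return label != "clear_benign"


def strict(label: str) -> bool:
    """Positive only if the item is undeniably abuse."""
    return label == "clear_abuse"


def precision(flagged, tp_rule) -> float:
    """Share of flagged items that tp_rule counts as true positives."""
    tp = sum(map(tp_rule, flagged))
    return tp / len(flagged)


def recall(flagged, unflagged, tp_rule, fn_rule) -> float:
    """True positives over (true positives + misses), where each side may use
    a different labeling rule. Using different rules is exactly the rigging."""
    tp = sum(map(tp_rule, flagged))
    fn = sum(map(fn_rule, unflagged))
    return tp / (tp + fn)


if __name__ == "__main__":
    print(f"precision, generous rule:   {precision(FLAGGED, generous):.0%}")
    print(f"precision, strict rule:     {precision(FLAGGED, strict):.0%}")
    # Rigged: broad definition for hits, narrow definition for misses.
    print(f"recall, rigged:             {recall(FLAGGED, UNFLAGGED, generous, strict):.0%}")
    # Consistent: the same broad definition on both sides.
    print(f"recall, consistent generous: {recall(FLAGGED, UNFLAGGED, generous, generous):.0%}")
```

With these invented counts, the same 100 hits score 85% precision under the generous rule and 5% under the strict one, and recall drops sharply once misses are judged by the same broad definition as hits.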