Hacker News | thecatspaw's comments

personally I just open up these videos in private mode


also on a webpage you would be able to filter it by available tools/knowledge


Or not. Maybe you'd get "infinite scroll". Or some other modern webdev bullshit that's strictly inferior to PDF (or plain HTML) and CTRL+F.

I'm actually surprised by the anti-PDF sentiment here (in the general case, not necessarily this book). Modern web is so bad, that almost every day I end up on some page that would be strictly better if it were a PDF. So, to play devil's advocate, PDFs are cool because:

- The links may rot, but they remain, and so does surrounding content. Once you get a PDF, no one can take it away from you.

- It's self-contained. It can easily be transferred between devices and read without an Internet connection.

- It's a file. Yes, it's important to mention because in 2024, files cannot be taken for granted.

- Rich format without spurious dynamics and other web nonsense. Sure, PDFs technically can run arbitrary JavaScript, but hardly any reader supports that.

- Can't track you or spy on you (theoretically it could; in practice, see the previous point).

I could come up with a few more. Point being, you could do worse, and modern web quite often is worse.

As for what could be an even better format, my mind is drawn towards CHM[0]. You know, like the help files in old Windows software. A self-contained file built of interlinked HTML pages, complete with index and internal search/xref. Kind of a better EPUB[1].

(Ironically, marketers should actually love PDF - total control over presentation is exactly what they've been trying to gain on the web all these years.)

--

[0] - https://en.wikipedia.org/wiki/Microsoft_Compiled_HTML_Help

[1] - Despite being a decade older.


I guess the sentiment is because

a) it cannot be automatically reflowed

b) it’s a complete mess of a file format with tech debt spanning 30 years

c) the linked resources info

Everything else about it is fine, and apparently lots of folks write plenty of software to read it.

Perhaps EPUBs (and .mobi, etc.) come close, but they are not universally adopted.


I do agree, especially on

- It's self-contained. It can easily be transferred between devices and read without an Internet connection.

- It's a file. Yes, it's important to mention because in 2024, files cannot be taken for granted.

(Btw, it’s a sad state of affairs that we can own our files less and less.)

But I do think that in this post, the issue is that the PDF itself is pretty useless: it doesn’t contain any information and only links to… websites.

(Btw, I wouldn’t criticize this too much, because I admire the indexing work; it’s pretty cool!)


you might be interested in This Old Tony


to fix the interrupt issue, they could initially load a page with benign information and then load the help text afterwards


can you expand on what honour-based abuse means?


https://en.wikipedia.org/wiki/Honor_killing

> An honor killing (American English), honour killing (Commonwealth English), or shame killing is a traditional form of murder in which a person is killed by or at the behest of members of their family or their partner, due to culturally sanctioned beliefs that such homicides are necessary as retribution for the perceived dishonoring of the family by the victim.

> Methods of murdering include stoning, stabbing, beating, burning, beheading, hanging, throat slashing, lethal acid attacks, shooting, and strangulation. Sometimes, communities perform murders in public to warn others in the community of the possible consequences of engaging in what is seen as illicit behavior

> Often, minor girls and boys are selected by the family to act as the murderers, so that the murderer may benefit from the most favorable legal outcome. Boys and sometimes women in the family are often asked to closely control and monitor the behavior of their siblings or other members of the family, to ensure that they do not do anything to tarnish the 'honor' and 'reputation' of the family

> Sharif Kanaana, professor of anthropology at Birzeit University, says that honor killing is: "A complicated issue that cuts deep into the history of Islamic society. .. What the men of the family, clan, or tribe seek control of in a patrilineal society is reproductive power. Women for the tribe were considered a factory for making men. Honor killing is not a means to control sexual power or behavior. What's behind it is the issue of fertility or reproductive power."

> Nighat Taufeeq of the women's resource center Shirkatgah in Lahore, Pakistan says: "It is an unholy alliance that works against women: the killers take pride in what they have done, the tribal leaders condone the act and protect the killers and the police connive the cover-up." The lawyer and human rights activist Hina Jilani says, "The right to life of women in Pakistan is conditional on their obeying social norms and traditions."

> Fareena Alam, editor of a Muslim magazine, writes that honor killings which arise in Western cultures such as Britain are a tactic for immigrant families to cope with the alienating consequences of urbanization. Alam argues that immigrants remain close to the home culture and their relatives because it provides a safety net. She writes that 'In villages "back home", a man's sphere of control was broader, with a large support system. In our cities full of strangers, there is virtually no control over who one's family members sit, talk or work with.'

Hopefully that expands on it. A rotten culture of "family values" that sees women as nothing more than baby factories and keeps them under control at all times, through intimidation, persecution, monitoring, and straight up state-sanctioned killing and blaming of the victim if they try to assert themselves.


I think the idea is that you can tell people "hey, if you're suffering from abuse, you can check a website's footer for this icon to get help"


This has probably helped so many people... in the imaginations of other people.


Are there any statistics, or frankly any reason, to expect this to have helped anybody?

I'm not trying to be dismissive, but I genuinely can't imagine this helping anyone. I am completely open to being wrong though.


I don’t know if you’re a kiwi but I assume that what it does is more common knowledge there.

I do think there is real value in being able to report domestic abuse without leaving an obvious paper trail. Call logs can be accessed by the account owner, I imagine a lot of people don’t know how to clear their history or aren’t confident enough they can do it correctly, etc.

It’s some small amount of peace of mind for victims who file reports, and at a cost of “adding a fake social link and a dev’s salary for a month”, I’m pretty okay with it, even if it literally only helps a single person. Bandwidth well spent.


Thanks for giving it a second chance. I read all of it, and it was very interesting indeed


What does GPT say about how we should validate email addresses?


Prompt:

'I'm writing a nodejs javascript application and I need a regex to validate emails in my server. Can you write a regex that will safely and efficiently match emails?'

GPT4 / Gemini Advanced / Claude 3 Sonnet

GPT4: `const emailRegex = /^[a-zA-Z0-9._-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;` Full answer: https://justpaste.it/cg4cl

Gemini Advanced: `const emailRegex = /^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;` Full answer: https://justpaste.it/589a5

Claude 3: `const emailRegex = /^([a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})$/;` Full answer: https://justpaste.it/82r2v
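For a quick comparison, here's a small Node sketch that runs two of the quoted patterns (GPT-4's and Claude 3's, copied as-is) against a few addresses that RFC 5322 allows. The sample addresses are illustrative, built from forms that come up later in this thread, with example.com standing in for a real domain.

```javascript
// Two of the patterns quoted above, copied verbatim.
const patterns = {
  "GPT-4": /^[a-zA-Z0-9._-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/,
  "Claude 3": /^([a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})$/,
};

// RFC 5322-valid sample addresses (illustrative; example.com is a placeholder).
const validAddresses = [
  "plain@example.com",
  "user+tag@example.com",        // sub-addressing with "+"
  "foo-a/b-bar-tat@example.com", // qmail-style extension containing "/"
];

// Print which valid addresses each pattern wrongly rejects.
for (const [name, re] of Object.entries(patterns)) {
  const rejected = validAddresses.filter((addr) => !re.test(addr));
  console.log(`${name} rejects: ${rejected.join(", ") || "(none)"}`);
}
```

Both patterns accept the plain address and reject the other two, because neither character class includes "+" or "/".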


Whereas email more or less lasts forever (mailbox contents), and has to be backwards compatible with older versions back to (at least) RFC 821/822, or those before. It also allows almost any character (when escaped at 821 level) in the host or domain part (domain names allow any byte value).

So an Internet email address match pattern has to be: "..*@..*"; anything else can reject otherwise valid addresses.

That, however, does not account for earlier source-routed addresses, nor the old-style UUCP bang paths. However, those can probably be ignored for newly generated email.

I regularly use an email address with a "+" in the host part. When I used qmail, I often used addresses like: "foo-a/b-bar-tat@DOMAIN". Mainly for auto filtering received messages from mailing lists.
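A minimal sketch of the "..*@..*" rule as an anchored JavaScript regex, assuming the input is meant to be the whole address rather than a substring of a longer line. ".+" means the same as "..*" (one character followed by zero or more), so the pattern only demands something on each side of an "@".

```javascript
// "..*@..*" anchored to the whole input: at least one character,
// an "@", then at least one more character.
const minimalEmail = /^.+@.+$/;

const inputs = [
  "foo-a/b-bar-tat@example.com", // qmail-style address: accepted
  "user+tag@example.com",        // "+" sub-address: accepted
  "@example.com",                // nothing before "@": rejected
  "nobody@",                     // nothing after "@": rejected
  "no-at-sign",                  // no "@" at all: rejected
];

for (const input of inputs) {
  console.log(input, minimalEmail.test(input));
}
```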


Still doesn't support internationalized domain names.


Terrible answers as far as I can tell; ChatGPT's especially would throw out many valid email addresses.


chatgpt-4:

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

https://chat.openai.com/share/696f7046-7f43-4331-b12b-538566...

chatgpt-3.5:

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

https://chat.openai.com/share/aaa09ae8-3fd9-4df7-a417-948436...


…which both excludes addresses allowed by the RFC and includes addresses disallowed by the RFC. (For example, the RFC disallows two consecutive dots in the local-part.)
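To illustrate both directions, here's a sketch comparing the ChatGPT pattern above with a regex for the RFC 5322 dot-atom form of the local part (runs of atext separated by single dots). The dot-atom regex deliberately ignores quoted-string local parts, so it is not a complete RFC validator; localPart is a helper made up for this example.

```javascript
// atext from RFC 5322: letters, digits and ! # $ % & ' * + - / = ? ^ _ ` { | } ~
const ATEXT = "[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]";
// dot-atom-text: 1*atext *("." 1*atext), i.e. no leading, trailing or doubled dots.
const dotAtom = new RegExp(`^${ATEXT}+(?:\\.${ATEXT}+)*$`);

// The ChatGPT pattern quoted above, copied as-is.
const chatgpt = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// Everything before the last "@" (empty string if there is no "@").
function localPart(address) {
  const at = address.lastIndexOf("@");
  return at === -1 ? "" : address.slice(0, at);
}

for (const addr of ["john..kennedy@example.com", "o'brien@example.com"]) {
  console.log(
    addr,
    "| ChatGPT:", chatgpt.test(addr),
    "| dot-atom local part:", dotAtom.test(localPart(addr)),
  );
}
```

The ChatGPT pattern accepts the doubled dot and rejects the apostrophe; the dot-atom check does the opposite in both cases.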


I take the descriptivist approach to email validation, rather than the prescriptivist.

I know an email has to have a domain name after the @ so I know where to send it.

I also know it has to have something before the @ so the domain’s email server knows how to handle it.

But do I care whether the email server supports sub-addresses, characters outside of the commonly supported range (e.g. quotation marks and spaces), or even characters that aren’t part of the RFC? I do not.

If the user gives me that email, I’ll trust them. Worst case they won’t receive the verification email and will need to double check it. But it’s a lot better than those websites who try to tell me my email is invalid because their regex is too picky.
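The descriptivist check above can be sketched as a split at the "@", requiring a non-empty part on each side, so the domain is known for delivery. Splitting at the last "@" rather than the first is an assumption, chosen because a quoted local part may itself contain "@"; the name splitAddress is made up for illustration.

```javascript
// Returns { local, domain } if there is something on both sides of the
// last "@", otherwise null. No other validation is attempted: whether the
// mailbox exists is left to the verification email.
function splitAddress(input) {
  const at = input.lastIndexOf("@");
  if (at < 1 || at === input.length - 1) return null; // missing local part or domain
  return { local: input.slice(0, at), domain: input.slice(at + 1) };
}

console.log(splitAddress('"a@b"@example.com')); // local part keeps its quoted "@"
console.log(splitAddress("nobody@"));           // null: no domain to send to
```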


The HTML email regex validation [1] is probably the best rule to use for validating an email address in most user applications. It prohibits IP address domain literals (which the emailcore people have basically said is of limited utility [2]), and quoted strings in the localpart. Its biggest fault is allowing multiple dots to appear next to each other, which is a lot of faff to put in a regex when you already have to individually spell out every special character in atext.

[1] https://html.spec.whatwg.org/multipage/input.html#email-stat...

[2] https://datatracker.ietf.org/doc/draft-ietf-emailcore-as/
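A sketch of that suggestion in Node: the WHATWG "valid e-mail address" pattern, transcribed from the spec linked above, plus a separate check for the dot problem, which is simpler as its own small regex than folded into the atext class. The name isAcceptableEmail and the exact dot rule (no leading dot, no doubled dot, no dot right before the "@") are assumptions for illustration.

```javascript
// The regex given in the WHATWG HTML spec for <input type="email">.
const htmlEmail =
  /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

// Leading dot, doubled dot, or trailing dot in the local part (dot before "@").
const badDots = /^\.|\.\.|\.@/;

function isAcceptableEmail(address) {
  return htmlEmail.test(address) && !badDots.test(address);
}

console.log(htmlEmail.test("john..kennedy@example.com"));    // the spec pattern lets this through
console.log(isAcceptableEmail("john..kennedy@example.com")); // the extra check rejects it
console.log(htmlEmail.test("user@[192.0.2.1]"));             // IP domain literals are not allowed
console.log(htmlEmail.test('"john doe"@example.com'));       // quoted local parts are not allowed
```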


I generally agree, but the two consecutive dots (or leading/trailing dots) are an example that would very likely be a typo and that you wouldn’t particularly want to send. Similar for unbalanced quotes, angle brackets, and other grammar elements.


I wonder whether simply (regex) replacing a sequence of .'s with a single one as part of a post-processing step would be effective.


That would be bad form, IMO. The user may have typed john..kennedy@example.com by mistake instead of john.f.kennedy@example.com, and now you’ll be sending their email to john.kennedy@example.com. Similar for leading or trailing dots. You can’t just decide what a user probably meant, when they type in something invalid.
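The failure mode is easy to reproduce. Below is a sketch of the proposed post-processing step (collapseDots is a hypothetical name) showing that the rewrite silently yields a different address; comparing the input before and after can at most tell the form to ask the user to re-check what they typed.

```javascript
// The proposed post-processing step: squash any run of dots into one.
function collapseDots(address) {
  return address.replace(/\.{2,}/g, ".");
}

const typed = "john..kennedy@example.com"; // the user meant john.f.kennedy@example.com
const rewritten = collapseDots(typed);     // a different mailbox, possibly someone else's

// Instead of rewriting, only detect that the input would change
// and ask the user to confirm or correct it.
const needsConfirmation = rewritten !== typed;
console.log(rewritten, needsConfirmation);
```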


Yeah, that's about as far as I've ever been comfortable going in terms of validating email addresses too: some stuff followed by "@" followed by more stuff.

Though I guess adding a check for invalid dot patterns might be worthwhile.


What's maybe more important to note: it completely disallows the languages of some 4/5 of humanity, and partially disallows some 2/3 of the rest.


Actually a pretty good response, if the programmer bothers to read all of it.

I'd be more emphatic that you shouldn't rely on regexes to validate emails, and that this should only be used as an in-form validation first step to warn of user input errors, but the gist is there.

> This regex is *practical for most applications* (??), striking a balance between complexity and adherence to the standard. It allows for basic validation but does not fully enforce the specifications of RFC 5322, which are much more intricate and challenging to implement in a single regex pattern.

^ ("challenging"? Didn't I see that email validation requires at least a grammar and not just a regex?)

> For example, it doesn't account for quoted strings (which can include spaces) in the local part, nor does it fully validate all possible TLDs. Implementing a regex that fully complies with the RFC specifications is impractical due to their complexity and the flexibility allowed in the specifications.

> For applications requiring strict compliance, it's often recommended to use a library or built-in function for email validation provided by the programming language or framework you're using, as these are more likely to handle the nuances and edge cases correctly. Additionally, the ultimate test of an email address's validity is sending a confirmation email to it.
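Since the confirmation email is the real test, here's a minimal sketch of the token side of that flow using Node's built-in crypto module. The in-memory Map, the 24-hour expiry, and the names issueToken and confirmAddress are assumptions; actually sending the email (with the token embedded in a link) is left out.

```javascript
const crypto = require("node:crypto");

// Hypothetical in-memory store: address -> { hash, expires }.
// A real application would persist this.
const pending = new Map();
const TTL_MS = 24 * 60 * 60 * 1000; // assumed 24-hour validity window

function sha256(text) {
  return crypto.createHash("sha256").update(text).digest(); // 32-byte Buffer
}

// Create a random one-time token for the confirmation link; only its hash is stored.
function issueToken(address, now = Date.now()) {
  const token = crypto.randomBytes(32).toString("base64url");
  pending.set(address, { hash: sha256(token), expires: now + TTL_MS });
  return token;
}

// True (and the token is consumed) if it matches and has not expired.
function confirmAddress(address, token, now = Date.now()) {
  const entry = pending.get(address);
  if (!entry) return false;
  if (now > entry.expires) {
    pending.delete(address);
    return false;
  }
  // Both digests are 32 bytes, as timingSafeEqual requires equal lengths.
  const ok = crypto.timingSafeEqual(entry.hash, sha256(String(token)));
  if (ok) pending.delete(address);
  return ok;
}
```

Storing only a hash means a leaked store doesn't expose usable links, and the constant-time comparison avoids leaking how much of a guess matched.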


Not good at all, but a little better than expected. I use + in email addresses prominently and there are so many websites who don't even allow that...


Remember to first punycode the domain part of an email address before trying to validate it, or it will not work with internationalized domain names.
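A sketch of that step in Node, using url.domainToASCII to convert the domain to its punycode form before any ASCII-only check runs. It only touches the domain: a non-ASCII local part (which needs SMTPUTF8 support on the receiving side) passes through unchanged. The helper name asciiDomainAddress and the split at the last "@" are assumptions.

```javascript
const { domainToASCII } = require("node:url");

// Returns the address with its domain converted to ASCII (punycode),
// or null if either side of the last "@" is empty or the domain is invalid.
function asciiDomainAddress(address) {
  const at = address.lastIndexOf("@");
  if (at < 1 || at === address.length - 1) return null;
  const ascii = domainToASCII(address.slice(at + 1));
  if (ascii === "") return null; // domainToASCII returns "" for invalid domains
  return address.slice(0, at) + "@" + ascii;
}

console.log(asciiDomainAddress("user@español.com")); // user@xn--espaol-zwa.com
```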


Support for IDN email addresses is still patchy at best. Many systems can’t send to them; many email hosts still can’t handle being configured for them.


There really ought to be a regex repository of common use cases like these so we don't have to reinvent the wheel or dig up a random codebase that we hope is correct to copy from every time.


You can still ask their hosting company to take down the content for example, or try to pay them off, or other things.

Or, since it was in the US, make up some bogus claim, take them to court, and hope they can't afford the proceedings and fold.


I just use the preview side-by-side with my document

