Paraphrased from the repo's readme: Suppose you have a program that consumes user input. Users often give bad input, so the program needs to validate it before acting on it. One way to validate is to call a function (e.g., `check_input`) on the user input; if it doesn't raise an error, the input is safe for consumption by the rest of the program. The repo author considers this approach risky because the programmer can inadvertently omit or bypass `check_input` and the program still compiles and runs without complaint.
The repo presents an alternative validation approach, which is to parse the user input into a data type (or, not quite equivalently, into a class). The parsing process serves as validation. Consumer functions are written such that they only accept the parsed data type. Therefore it is now impossible for the programmer to inadvertently omit or bypass validation of user input.
The library is a set of convenience functions for actually writing these parsing / validation functions.
So in short: instead of representing user input (e.g. an email address) as a string, which you can forget to validate, the idea here is to create a dedicated data type for it, and use the validation step to create said data type.
The rest of your program then works with this data type instead of the string, and this way you will get a type error whenever you accidentally use unvalidated data.
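To make that concrete, here is a minimal TypeScript sketch of the pattern. The branded `Email` type and the `parseEmail` function are invented for illustration; this is not the repo's actual API.

```typescript
// The only way to obtain an Email is to go through parseEmail, so any
// consumer that accepts Email cannot receive unvalidated input.
type Email = string & { readonly __brand: "Email" };

function parseEmail(input: string): Email {
  // Deliberately simplistic check, just for illustration.
  if (!/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(input)) {
    throw new Error(`Not a valid email address: ${input}`);
  }
  return input as Email;
}

// Consumers only accept the parsed type, not raw strings.
function sendWelcomeMail(to: Email): void {
  console.log(`Sending welcome mail to ${to}`);
}

sendWelcomeMail(parseEmail("user@example.com")); // ok
// sendWelcomeMail("not validated");             // compile error
```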
A nice idea that goes in a similar direction is to expand on this and create more types for different levels of trust. E.g. you could have the data types ValidatedEmail, VerifiedEmail, and TrustedEmail and define precisely how one becomes the other. This way your type system will already tell you what is valid and what is not, and you can't accidentally mix them up.
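A hedged sketch of what those trust levels could look like, again with branded types; the names, checks, and transition rules are all invented for illustration:

```typescript
// Three distinct trust levels. The types are incompatible with each other,
// so a value can only move between them through the functions below.
type ValidatedEmail = string & { readonly __level: "Validated" };
type VerifiedEmail  = string & { readonly __level: "Verified" };
type TrustedEmail   = string & { readonly __level: "Trusted" };

function validate(input: string): ValidatedEmail {
  if (!input.includes("@")) throw new Error("malformed address");
  return input as ValidatedEmail;
}

// Only a ValidatedEmail can become a VerifiedEmail, e.g. after the user
// clicks a confirmation link. The double cast is the one trusted spot
// where the promotion is allowed to happen.
function markVerified(email: ValidatedEmail): VerifiedEmail {
  return email as unknown as VerifiedEmail;
}

// Only a VerifiedEmail can be promoted further, e.g. by an admin.
function markTrusted(email: VerifiedEmail): TrustedEmail {
  return email as unknown as TrustedEmail;
}

// markTrusted(validate("a@b.c")); // compile error: must be verified first
```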
You can also generalize this idea further by noticing that you can encode all kinds of life-cycle information in your type system. As you transform some data in a sequence of steps, you can use types to document and enforce that the steps are always executed in order.
In this example, the user input validation step is f(String) -> ValidatedEmail, and the process of verifying it is f(ValidatedEmail) -> VerifiedEmail. But the same principle can apply to e.g. an append() operation typed f(List[T], T) -> NonEmptyList[T], where you can write code accepting NonEmptyList to save yourself an emptiness check. Or take a multi-step algorithm that gets a list of users, filters them by some criterion, sorts the list, and sends those users e-mails. Type-wise, it's a flow of Users -> EligibleUsers -> SortedEligibleUsers -> ContactedEligibleUsers.
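The NonEmptyList case happens to be expressible in plain TypeScript with tuple types; a small sketch of my own, not from the thread's repo:

```typescript
// A list with at least one element: a tuple of a head plus a rest array.
type NonEmptyList<T> = [T, ...T[]];

function append<T>(list: T[], item: T): NonEmptyList<T> {
  const result: T[] = [...list, item];
  // The assertion is safe: the result always contains at least `item`.
  return result as NonEmptyList<T>;
}

// head needs no emptiness check: the type guarantees a first element.
function head<T>(list: NonEmptyList<T>): T {
  return list[0];
}

console.log(head(append([], 42))); // 42
// head([]);                        // compile error: [] is not NonEmptyList
```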
And then, why should types be singular anyway? You should be able to tag data with properties, and then filter on or transform a subset of those tags. This is an area of theory I'm not familiar with yet, but I imagine you should be able to do things like the sketch below.
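Here is a rough approximation of what I mean, abusing TypeScript intersection types as composable tags. All tag names and helpers are invented; a real system would track properties like "sorted" far more precisely than this:

```typescript
// Tags as properties that can be attached independently and stack up.
type Eligible = { readonly __eligible: true };
type Sorted   = { readonly __sorted: true };

interface User { name: string; email: string; signedUpDays: number }

// Filtering attaches the Eligible tag to the element type...
function filterEligible(users: User[]): (User & Eligible)[] {
  return users.filter(u => u.signedUpDays > 30) as (User & Eligible)[];
}

// ...and sorting adds Sorted on top, without losing Eligible.
function sortByName<T extends User>(users: T[]): (T & Sorted)[] {
  return [...users].sort((a, b) => a.name.localeCompare(b.name)) as (T & Sorted)[];
}

// Accepts only data carrying both tags, however they were acquired.
function contact(users: (User & Eligible & Sorted)[]): void {
  users.forEach(u => console.log(`mailing ${u.email}`));
}

contact(sortByName(filterEligible([])));  // ok: both tags present
// contact(filterEligible([]));           // compile error: missing Sorted
```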
I'm sure there's a programming language that does that, and there are probably lots of reasons it doesn't work in practice. I'd love to know about them, as I haven't encountered anything like it in the wild, except bits and pieces of compiler code that can sometimes propagate such information "in the background" for optimization and correctness checking.
The closest I've seen is Idris, which lets you pass around data along with proofs about that data. The proofs can be generated at either runtime or compile time (the usual distinction between the kinds of things you can do at runtime vs compile time is blurred in all sorts of delightful ways in Idris).
I really like this idea. It feels like weapons-grade TypeScript and could solve an entire class of logical errors. If someone has a good method they're already using to encode strict typing like this, I'd love to check it out.
This site on F# used to show how to do the same in C# (which would be more applicable to TypeScript?), but coding via types like this is usually best done in a language designed to make it easy. AFAIK neither TypeScript nor C++, C#, or Java were designed for that purpose, whereas Haskell and F# were.
That doesn't mean you can't do it in those other languages, only that it's tedious compared to languages that are designed for it.
To keep building on this, I think the word "parsing" is just the tip of the iceberg. Parsing is one way to port data across a type boundary, where the source and destination types are optimized for different use cases (e.g. serialization vs. type-safe representation). Since the semantic Venn diagrams of any two types might have areas of non-overlap, parse-don't-validate means establishing clear boundaries in your program where those translations happen, then defining the types on either side of the boundary to rule out the possibility of nonsense states elsewhere throughout the program. The idea of nonsense states is closely related and discussed more here[0] and here[1].
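A standard illustration of ruling out nonsense states (my own example, not taken from [0] or [1]): replace independent fields whose combinations can contradict each other with a discriminated union.

```typescript
// A bag of optional fields where half the combinations are meaningless...
interface LoadingStateLoose {
  loading: boolean;
  data?: string;
  error?: string; // nothing stops { loading: true, data: "...", error: "..." }
}

// ...versus a discriminated union that makes nonsense unrepresentable.
type LoadingState =
  | { kind: "loading" }
  | { kind: "loaded"; data: string }
  | { kind: "failed"; error: string };

function render(state: LoadingState): string {
  switch (state.kind) {
    case "loading": return "spinner";
    case "loaded":  return state.data;
    case "failed":  return `error: ${state.error}`;
  }
}
```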
Yes. It also reminds me of when I created a domain in Postgres (https://www.postgresql.org/docs/9.5/sql-createdomain.html) with a constraint written in PL/Perl to ensure that the data going in is always valid (e.g. check digits in IDs, a bit like in credit card numbers).