The complexity issues seem to happen with every networking protocol that I've seen "grow up", even those designed explicitly for simplicity, like TFTP. The fabled "xmodem" protocol is a great example, starting as a ridiculously naive call and response, sprouting some error correction (xmodem CRC), then getting improvements, morphing into "ymodem" and "zmodem". Is this a modality for software in general, or just for "feral" software, where the spec or source code escapes into the wild, then lots of people "port" it, "improve" it or otherwise tinker with it, and there's some kind of fitness function that determines survival?
A big problem is that text-based protocols are hard. People think they're "simple" but they're not — it's a lie. Text is one of the hardest things to get right, but Eurocentrism (read: ASCII) leads many to write bad parsers.
For starters, there are typically no strict delineations. With HTTP, people see "ends in a newline" and forget to consume the CR, or even to send it, because CR is a "Windowsism" or whatever. Then people have to modify their software to accept the buggy transmissions, and it snowballs.
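Here's a toy sketch of that exact trap (illustrative Python, not a real HTTP parser): HTTP/1.1 header lines end in CRLF, and a parser that splits on bare LF quietly leaves a stray CR glued to every value.

```python
# Hedged sketch, not a real HTTP parser: these header lines end in CRLF,
# as HTTP/1.1 requires. Splitting on "\n" alone leaves "\r" on every value.
raw = b"Host: example.com\r\nContent-Length: 5\r\n"

# Buggy: split on LF only; the CR sticks to each value.
buggy = dict(line.split(b": ", 1) for line in raw.split(b"\n") if line)
print(buggy[b"Content-Length"])  # b'5\r'  <- stray CR tags along

# Correct: consume the full CRLF terminator.
strict = dict(line.split(b": ", 1) for line in raw.split(b"\r\n") if line)
print(strict[b"Content-Length"])  # b'5'
```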
Take HTML, for example. Parsing it is a mess of hacks[0] because programs 20 years ago took shortcuts, or preferred to show something to the user instead of failing (remember XHTML?), so they massage the input until it works. We even have an <image> tag that is an alias for <img>.[1] Those shortcuts make bad content "work" by accident, so people start depending on them.
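You can watch that alias in action with a spec-conforming parser. A small sketch, assuming the third-party html5lib package is installed; the HTML spec's tree-construction rules rewrite an <image> start tag into <img>:

```python
# Sketch assuming `pip install html5lib`; html5lib follows the HTML spec's
# tree builder, which rewrites an <image> start tag into <img>.
import html5lib

doc = html5lib.parse("<image src='cat.png'>", namespaceHTMLElements=False)
tags = [el.tag for el in doc.iter()]
print("img" in tags, "image" in tags)  # True False -- <image> became <img>
```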
Or INI files. A nice key-value structure delimited by line endings. Except now we need sections, so we have `[x]` lines. And don't forget the LF/CR-LF problem when splitting on the line endings! And now people want arrays, so we bolt them on with TOML and the funky `[[x]]` syntax.
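A minimal sketch of that line-ending trap (illustrative, not a full INI parser): Python's str.splitlines() accepts LF, CRLF, and lone CR alike, while a naive split("\n") leaves "\r" glued to every value in a CRLF file.

```python
# Illustrative sketch, not a full INI parser. The input uses CRLF endings.
raw = "[server]\r\nhost = example.com\r\n"

naive = raw.split("\n")   # ['[server]\r', 'host = example.com\r', ''] -- stray CRs
lines = raw.splitlines()  # ['[server]', 'host = example.com'] -- LF and CRLF both work

section, values = None, {}
for line in lines:
    line = line.strip()
    if not line:
        continue
    if line.startswith("[") and line.endswith("]"):
        section = line[1:-1]          # a `[x]` section header
    else:
        key, _, value = line.partition("=")
        values[(section, key.strip())] = value.strip()

print(values)  # {('server', 'host'): 'example.com'}
```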
Text-based parsers are deceptively hard, but programmers don't want to admit it. They're easy to read, sure, but parser-mismatch vulnerabilities[2] will come back to bite you eventually.
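A toy example of the mismatch (hedged: this is the shape of real bugs like HTTP request smuggling, not a real exploit): two components parse the same bytes but disagree on which duplicate header wins.

```python
# Toy parser-mismatch demo: same bytes, two parsers, two answers.
raw = b"Length: 5\nLength: 9999\n"

def first_length(data: bytes) -> bytes:
    # A front-end that takes the first occurrence of the header.
    for line in data.split(b"\n"):
        if line.startswith(b"Length:"):
            return line.split(b":", 1)[1].strip()

def last_length(data: bytes) -> bytes:
    # A back-end that lets later occurrences override earlier ones.
    value = None
    for line in data.split(b"\n"):
        if line.startswith(b"Length:"):
            value = line.split(b":", 1)[1].strip()
    return value

print(first_length(raw), last_length(raw))  # b'5' b'9999' -- they disagree
```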
That's not to say "binary" formats are easy — just that they have a rigid structure that tends to blow up on failure instead of silently succeeding.
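For contrast, a sketch of a hypothetical fixed binary layout (the record format here is made up for illustration): when the input is malformed, the decoder fails loudly instead of limping along.

```python
# Sketch of a hypothetical 6-byte binary record: a 2-byte big-endian type
# followed by a 4-byte big-endian length. Truncated input fails loudly.
import struct

good = b"\x00\x01\x00\x00\x00\x80"
print(struct.unpack(">HI", good))  # (1, 128)

bad = good[:4]                     # truncated record
struct.unpack(">HI", bad)          # raises struct.error instead of guessing
```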
XMODEM and TFTP are pretty close to archetypal binary protocols, and they have suffered the same problem: sloppy implementations gain prominence, and then everyone has to account for the sloppiness forever after.
IMO, the problem with those is that they're too simple. XMODEM in particular was basically "send the block number, then a 128-byte blob". Not to mention the CP/M-ism of <EOT>. It has no room for extensibility, and its usage is very restrictive, so everyone shoves their own ideas into it. Including replacing <EOT> with something else.
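For concreteness, here's a sketch of the classic XMODEM (checksum variant) data packet: SOH, the block number, the block number's one's complement, 128 data bytes, and a one-byte arithmetic checksum. The helper name is mine, not from any reference implementation.

```python
# Sketch of the classic XMODEM checksum-variant packet layout; the helper
# name is illustrative, not from any reference implementation.
SOH = 0x01  # start of a 128-byte data block
EOT = 0x04  # sent on its own after the final block

def build_packet(block_num: int, data: bytes) -> bytes:
    assert len(data) <= 128
    data = data.ljust(128, b"\x1a")  # pad with ^Z, CP/M's end-of-file marker
    header = bytes([SOH, block_num & 0xFF, (block_num ^ 0xFF) & 0xFF])
    checksum = sum(data) & 0xFF      # 1-byte arithmetic checksum of the data
    return header + data + bytes([checksum])

print(len(build_packet(1, b"hello")))  # 132 bytes: 3 header + 128 data + 1 sum
```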