Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Good question. I am not familiar with string-to-double algorithms but maybe it's an easier problem? double-to-string is relatively complex, people even doing PhD in this area. There is also some inherent asymmetry: formatting is more common than parsing.


In implementing Rust's serde_json library, I have dealt with both string-to-double and double-to-string. Of the two, I found string-to-double was more complex.

Unlike formatting, correct parsing involves high precision arithmetic.

Example: the IEEE 754 double closest to the exact value "0.1" is 7205759403792794*2^-56, which has an exact value of A (see below). The next higher IEEE 754 double has an exact value of C (see below). Exactly halfway between these values is B=(A+C)/2.

  A=0.1000000000000000055511151231257827021181583404541015625
  B=0.100000000000000012490009027033011079765856266021728515625
  C=0.10000000000000001942890293094023945741355419158935546875
So for correctness the algorithm needs the ability to distinguish the following extremely close values, because the first is closer to A (must parse to A) whereas the second is closer to C:

  0.1000000000000000124900090270330110797658562660217285156249
  0.1000000000000000124900090270330110797658562660217285156251
The problem of "string-to-double for the special case of strings produced by a good double-to-string algorithm" might be relatively easy compared to double-to-string, but correct string-to-double for arbitrarily big inputs is harder.


I guess one aspect of it is that in really high performance fields where you're taking in lots of stringy real inputs (FIX messages coming from trading venues for example, containing prices and quantities) you would simply parse directly to a fixed point decimal format, and only accept fixed (not scientific) notation inputs. Except for trailing or leading zeros there is no normalisation to be done.

Parsing a decimal ASCII string to a decimal value already optimizes well, because you can scale each digit by it's power of 10 in parallel and just add up the result.


For those wishing to read up on this subject, an excellent starting point is this comprehensive post by one of the main contributors of the fast algorithm currently used in core:

https://old.reddit.com/r/rust/comments/omelz4/making_rust_fl...


> Unlike formatting, correct parsing involves high precision arithmetic.

Formatting also requires high precision arithmetic unless you disallow user-specified precision. That's why {fmt} still has an implementation of Dragon4 as a fallback for such silly cases.


> formatting is more common than parsing.

Is it, though? It's genuinely hard for me to tell.

There's both serialization and deserialization of data sets with, e.g., JSON including floating point numbers, implying formatting and parsing, respectively.

Source code (including unit tests etc.) with hard-coded floating point values is compiled, linted, automatically formatted again and again, implying lots of parsing.

Code I usually work with ingests a lot of floating point numbers, but whatever is calculated is seldom displayed as formatted strings and more often gets plotted on graphs.


For serialization and deserialization, when the goal is to produce strings that will be read again by a computer, I consider the use of decimal numbers as a serious mistake.

The conversion to string should produce a hexadecimal floating-point number (e.g. with the "a" or "A" printf conversion specifier of recent C library implementations), not a decimal number, so that both serialization and deserialization are trivial and they cannot introduce any errors.

Even if a human inspects the strings produced in this way, comparing numbers to see which is greater or less and examining the order of magnitude can be done as easy as with decimal numbers. Nobody will want to do exact arithmetic computations mentally with such numbers.


Think about things like logging and all the uses of printf which are not parsed back. But I agree that parsing is extremely common, just not the same level.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: