This paragraph justifies the decision, IMHO. > Collecting audio data for thousan...

westurner · on May 23, 2023

While, AFAIU, the United Nations Universal Declaration of Human Rights (UDHR) is the most-translated document in the world, there likely hasn't been as much subjective translation analysis of the UDHR as there has been of the given training texts.

Awesome-legal-nlp links to benchmarks like LexGLUE and FairLex but not yet LegalBench; in re: AI alignment and ethics / regional law https://github.com/maastrichtlawtech/awesome-legal-nlp#bench...

A "who hath done it" exercise:

[For each of these things, tell me whether God, Others, or You did it:] https://twitter.com/westurner/status/1641842843973976082?

"Did God do this?"

sangnoir · on May 23, 2023

The UN's UDHR likely yields <5 minutes of audio data in any given language, which is kinda useless as far as training data goes.

westurner · on May 23, 2023

Compared to e.g. religious text translation, I don't know how much subjective analysis there is of UDHR translations. It's pretty cut and dried: e.g. "Equal Protection of Equal Rights" is pretty clear.

"About the Universal Declaration of Human Rights Translation Project" https://www.ohchr.org/en/human-rights/universal-declaration/... :

> At present, there are 555 different translations available in HTML and/or PDF format.

E.g. Buddhist scriptures are also multiply translated; probably with more coverage in East Asian languages.

Thomas Jefferson, who wrote the US Declaration of Independence, had read into Transcendental Buddhism and, FWIU, is thus significantly responsible for the religious (and nonreligious) freedom we appreciate in the United States today.

omneity · on May 22, 2023

Of course, my critique is less "This project shouldn't exist" and more "It seems to me there are several biases that affect the performance of this project in the context it was presented in".

This is a great project and an important stepping stone in a multilingual AI future.