
You don't need parallel corpora for all language pairs in a "predict the next token" LLM. What I'm saying is that if an LLM is trained on English, French, and Spanish, and there is Eng-to-French parallel data, you don't need Eng-to-Spa data to get Eng-to-Spa translations.
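A rough sketch of the mechanism, assuming a multilingual sentence encoder (the model name below is just one publicly available example): translations of the same sentence land near each other in one shared embedding space, which is what lets supervision on one pair transfer to a pair the model never saw aligned.

    # Sketch: a shared multilingual embedding space places translations
    # of the same sentence close together, even for language pairs with
    # no aligned data supplied at inference time.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
    sentences = [
        "the dog sleeps",    # English
        "le chien dort",     # French
        "el perro duerme",   # Spanish
    ]
    embeddings = model.encode(sentences, convert_to_tensor=True)
    # All pairwise similarities come out high, despite no explicit
    # Eng->Spa mapping being given to the model.
    print(util.cos_sim(embeddings, embeddings))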


How would an LLM figure out which words to translate animal sounds into? Where would it learn that information? We don't know what animals are communicating, if they even have a language of sorts. There's no mapping.


Potentially the same way it learns to translate concepts that have no mapping for a given language pair in the dataset. Like I said, not every language in an LLM's corpus has parallel text in every other language to map to.


Spanish and French are both Romance languages and will have massive token overlap. Not likely to be so lucky with whale songs.
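You can see that overlap with any shared subword tokenizer; a quick sketch (bert-base-multilingual-cased here, just as one example; the exact shared pieces depend on the tokenizer):

    # Sketch: cognate-heavy Romance sentences share subword pieces under
    # a common multilingual tokenizer. The exact overlap is
    # tokenizer-dependent; the point is that it is non-empty for
    # related languages, and no analogue exists for whale audio.
    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
    fr = set(tok.tokenize("le train arrive à la station centrale"))
    es = set(tok.tokenize("el tren llega a la estación central"))
    print(fr & es)  # shared pieces, e.g. common function words and stems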


It's not about being Romance languages or not. The same holds for Korean/Mandarin or any pair of distant human languages.

>Not likely to be so lucky with whale songs

Maybe. Maybe not.



