sam_dam_gai's comments

Do you mean latitude?


Ha ha. He probably means "at a batch size of 1", i.e. not even using some amortization tricks to get better numbers.


Ah! That does make more sense!


> given an initial response generated by the target LLM from an input prompt, "backtranslation" prompts a language model to infer an input prompt that can lead to the response.

> This tends to reveal the actual intent of the original prompt, since it is generated based on the LLM's response and is not directly manipulated by the attacker.

> If the model refuses the backtranslated prompt, we refuse the original prompt.

def backtranslation_defense(inp1):
    ans1 = query(inp1)  # initial response from the target LLM
    # infer a prompt that would lead to that response
    backtrans = query(f'Which prompt gives this answer? {ans1}')
    ans2 = query(backtrans)  # response to the inferred prompt
    return ans1 if ans2 != 'refuse' else 'refuse'
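
For concreteness, here is one way the undefined query() helper could be stubbed out; the OpenAI SDK and the gpt-4o-mini model name are just my assumptions for illustration, not anything from the paper:

# Hypothetical stub for query(), using the OpenAI Python SDK (openai>=1.0).
# The model name is arbitrary; any chat model would work the same way.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def query(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

Note that ans2 != 'refuse' above is only shorthand: in practice you would need some actual refusal check on the model's output rather than an exact string match.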



Good idea!

