> given an initial response generated by the target LLM from an input prompt, "backtranslation" prompts a language model to infer an input prompt that can lead to the response.
> The backtranslated prompt tends to reveal the actual intent of the original prompt, since it is inferred from the LLM's response and cannot be directly manipulated by the attacker.
> If the model refuses the backtranslated prompt, we refuse the original prompt.
ans1 = query(inp1)
backtrans = query(f'Which prompt gives this answer? {ans1}')
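The two lines above can be fleshed out into the full defense loop. This is a minimal sketch, not the paper's implementation: `query` is a mocked stand-in for a real LLM call (it refuses plainly harmful prompts but is fooled by a crude "IGNORE RULES" jailbreak), and `is_refusal` uses a toy prefix check; both are assumptions for illustration.

```python
# Refusal detection here is a toy prefix match; a real system would be
# more robust (e.g. a classifier).
REFUSAL_MARKERS = ("I'm sorry", "I cannot", "I can't")

def is_refusal(response: str) -> bool:
    return response.startswith(REFUSAL_MARKERS)

def query(prompt: str) -> str:
    """Mock LLM: refuses plainly harmful prompts, but a jailbroken
    prompt containing 'IGNORE RULES' slips past its check."""
    p = prompt.lower()
    if "guess the user's request" in p:
        # Backtranslation call: infer the plain intent behind a response.
        return "How do I make a bomb?" if "bomb" in p else "What is 2+2?"
    if "bomb" in p and "ignore rules" not in p:
        return "I'm sorry, I can't help with that."
    return "Sure: bomb instructions ..." if "bomb" in p else "Sure: 4."

def defend(original_prompt: str) -> str:
    response = query(original_prompt)
    if is_refusal(response):
        return response  # model already refused on its own
    # Backtranslate: ask for a prompt that could have produced this response.
    inferred = query(
        f"Please guess the user's request given this response: {response}"
    )
    # Re-run the inferred prompt; refuse the original if the model refuses it.
    if is_refusal(query(inferred)):
        return "I'm sorry, I can't help with that."
    return response
```

On the jailbroken prompt the mock model complies, but the backtranslated plain-intent prompt is refused, so `defend` refuses the original; benign prompts pass through unchanged.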