Submissions from transluce.org

		Automatically Jailbreaking Frontier Language Models with Investigator Agents (transluce.org)
		2 points by simonpure 4 months ago
		Automatically Jailbreaking Frontier Language Models with Investigator Agents (transluce.org)
		2 points by piotrgrabowski 4 months ago
		Investigating truthfulness in a pre-release o3 model (transluce.org)
		4 points by boleary-gl 8 months ago
		Investigating truthfulness in a pre-release o3 model (transluce.org)
		4 points by Luc 8 months ago
		Investigating truthfulness in a pre-release o3 model (transluce.org)
		5 points by Philpax 8 months ago | 1 comment
		Docent: A system for analyzing and intervening on agent behavior (transluce.org)
		4 points by brimtown 9 months ago
		Releasing AI-driven tools for understanding AI systems (transluce.org)
		1 point by EvgeniyZh on Oct 24, 2024
		Monitor: An AI-Driven Observability Interface (transluce.org)
		3 points by brimtown on Oct 23, 2024 | 1 comment
		Show HN: Debugging LLM Failures Like "9.11 > 9.9" via Interpretability (transluce.org)
		8 points by vvvhuang on Oct 23, 2024 | 1 comment
		Scaling Automatic Neuron Description (Describing Every Neuron in Llama 3) (transluce.org)
		8 points by ekzhang on Oct 23, 2024 | 1 comment