PromptPex is an LLM-based tool to automatically generate and evaluate unit tests for a given AI model prompt. PromptPex extracts input and output specifications from the prompt and uses them to generate diverse, targeted, and valid unit tests. These tests are valuable in debugging and improving the prompt as well as understanding how the prompt performs when interpreted by different AI models.
For prompt injection attacks which are context-sensitive, we have developed a DSL (SPML) for capturing the context and then we use the same to detect conflict with the originally defined system bot / chat bot specification. Having restricted the domain of attacks helps in finer grain control and better efficiency in detecting prompt injections. We also hypothesize that since our approach works only by looking for conflicts in the attempted overrides, it is resilient to different attack techniques. It only depends on the intent to attack.
https://news.ycombinator.com/item?id=39522245
SPML, a meta language designed for writing system prompts, includes high-level language features such as support for user-defined types and comments. These features make system prompts easier to develop and more maintainable compared to those written in natural language.
The SPML compiler processes an SPML system prompt, performing type checking before converting it into SPML-IR. SPML-IR facilitates various types of analysis and transformations, similar to other compiler intermediate representations. Finally, the SPML-IR is lowered into a natural language system prompt.
Prompt injection attacks represent a significant challenge for LLM-based systems, such as chatbots. Several techniques are in place to proactively detect these attacks, including classifying the input prompt as either safe or unsafe, or determining whether the prompt violates the system's guidelines. However, merely classifying input prompts does not take into account the context in which the chatbot operates, and identifying violations can be complex for LLMs. We propose a technique that uses a meta language and the compiling-parsing approach to detect prompt injection attacks. This technique utilizes a meta language, SPML (System Prompt Meta Language), allowing for detection independent of the attack method used. It focuses solely on identifying conflicts with system prompts, ensuring a robust defense against prompt injection attacks.