Good question. The overhead is designed to be low enough for inline enforcement. For the fast, rule-based checks we typically see single-digit-millisecond evaluation times, and in gateway mode the end-to-end pre-check usually adds around 10 to 15 ms.
You’re right that, relative to an LLM call, this is usually negligible. We still treat it seriously because policy checks also sit in front of tool calls and other non-LLM operations, where latency matters more. That’s why the static checks are compiled and cached, and why the gateway path is kept tight.
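To make the "compiled and cached" idea concrete, here is a rough sketch of the general pattern. It is an illustration only, not the actual implementation: the names `Rule`, `compile_rule`, `DENY_RULES`, `pre_check`, and `timed_pre_check` are hypothetical, and it assumes the static rules can be expressed as regular expressions.

```python
import re
import time
from functools import lru_cache
from typing import NamedTuple, Pattern


class Rule(NamedTuple):
    """A compiled static rule (hypothetical structure)."""
    name: str
    pattern: Pattern[str]


@lru_cache(maxsize=1024)
def compile_rule(name: str, regex: str) -> Rule:
    # Compilation happens once per (name, regex) pair; later calls
    # return the cached Rule object instead of recompiling.
    return Rule(name, re.compile(regex))


# Hypothetical deny rules, used only for illustration.
DENY_RULES = (
    ("no_shell_rm", r"\brm\s+-rf\b"),
    ("no_aws_key", r"AKIA[0-9A-Z]{16}"),
)


def pre_check(payload: str) -> list[str]:
    """Return the names of all rules the payload violates."""
    return [
        rule.name
        for rule in (compile_rule(name, regex) for name, regex in DENY_RULES)
        if rule.pattern.search(payload)
    ]


def timed_pre_check(payload: str) -> tuple[list[str], float]:
    """Run the pre-check and report how long it took in milliseconds."""
    start = time.perf_counter()
    violations = pre_check(payload)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return violations, elapsed_ms
```

Calling `timed_pre_check("please run rm -rf /tmp/x")` returns the violated rule names together with the elapsed time, which makes it easy to measure the per-call cost on your own hardware. Only the first call pays for compilation; repeated calls reuse the cached rules.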
If you want more detail, I have a longer architecture walkthrough that goes into the execution path and performance model: https://youtu.be/hvJMs3oJOEc