Latency
The delay between a caller finishing a sentence and the AI agent beginning its response. Latency under 500ms is generally considered natural.
In voice AI, latency is a key metric for conversation quality: it measures the time from when a caller stops speaking to when the AI agent begins its response. For comparison, natural human conversation has response gaps of roughly 200-400ms.
Total latency is the sum of several components: speech-to-text processing, language model inference, and text-to-speech generation. Each step adds delay.
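As a rough illustration of this sum, the sketch below adds up a per-stage budget and checks it against the 500ms target. The stage timings are placeholder values chosen for the example, not measurements, and the function and variable names are hypothetical.

```python
# Hypothetical per-stage delays in milliseconds (illustrative values, not measurements).
STAGE_MS = {
    "speech_to_text": 150,
    "language_model": 200,
    "text_to_speech": 100,
}

TARGET_MS = 500  # the sub-500ms threshold used in this entry


def total_latency_ms(stages: dict) -> int:
    """Total latency is the sum of every stage's delay."""
    return sum(stages.values())


def within_target(stages: dict, target_ms: int = TARGET_MS) -> bool:
    """True when the summed delay is strictly below the target."""
    return total_latency_ms(stages) < target_ms
```

With these placeholder numbers the budget totals 450ms, so any single stage that slips by 50ms or more pushes the response over the target.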
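Modern systems reach sub-500ms latency by combining techniques such as streaming audio processing, edge-deployed models, and response pre-generation, or by using end-to-end speech models that replace the separate transcription and synthesis steps. At this speed, the response gap is close to that of human conversation, and most callers do not notice a delay.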
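To see why streaming matters, the sketch below compares a simplified batch pipeline, where every stage finishes before handing off, with a streaming pipeline, where each stage starts as soon as the previous one emits its first chunk. This is a toy model under stated assumptions: the Stage class and all timings are hypothetical, and real systems have additional overheads such as network transit.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    first_output_ms: int  # delay until the stage emits its first chunk
    full_output_ms: int   # delay until the stage has emitted everything


# Illustrative placeholder timings, not measurements.
PIPELINE = [
    Stage("speech_to_text", 80, 300),
    Stage("language_model", 150, 900),
    Stage("text_to_speech", 70, 400),
]


def batch_latency_ms(stages):
    """Every stage (including synthesis) completes before the next step begins."""
    return sum(s.full_output_ms for s in stages)


def streaming_latency_ms(stages):
    """Each stage starts as soon as the previous one emits its first chunk."""
    return sum(s.first_output_ms for s in stages)
```

Under these assumed timings the batch pipeline takes 1600ms before the caller hears anything, while the streaming pipeline produces its first audio after 300ms, because only the time to first output of each stage sits on the critical path.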