I've noticed that Alexa doesn't interrupt itself when it says "Alexa," but it does respond when someone else says it. How does it achieve this? Here are a few questions I have:
Self-Recognition: How does Alexa distinguish between its own voice and a user's voice saying "Alexa"?
Voice Characteristics: What specific features (e.g., pitch, tone) does Alexa analyze to recognize its own TTS voice?
Algorithms and Models: What machine learning models or algorithms are used to handle this task effectively?
Implementation: Are there any open-source libraries or best practices for developing similar functionality?
Any insights or resources would be greatly appreciated. Thanks!
Source: I worked on third-party Alexa speakers.