Speech recognition system for smart home applications

Developing a speech recognition system based on natural language poses many technical challenges. It requires a sophisticated speech recognition engine to translate what the machine hears into text, and a comprehensive natural language processor to determine the meaning or intent of that text and return a meaningful response or action. These topics have been studied extensively for decades and are not discussed in detail here. This article focuses on a technical challenge that is often overlooked but equally important in far-field voice interface systems: speech pre-processing before the signal reaches the speech recognition engine.

Even the most modern speech recognition engine has one basic requirement to function well: the input to the engine must be speech. This seems obvious, but for far-field voice interface systems it is one of the hardest requirements to meet. "Far field" here refers to a system in which the user's mouth is more than half a meter from the product microphone. For example, a smartphone held near the user's face is a "near-field" use case, whereas speaking to a PC or tablet at arm's length, or across the room to a TV, stereo system, light switch, thermostat, or smart home controller, counts as a "far-field" use case.

Several important differences between near-field and far-field use cases create technical challenges that are absent from near-field systems but daunting in far-field systems.

1. Large dynamic range: In a far-field system, the user's voice may be very quiet at the microphone because he or she is several meters away, while the interference may be very loud, for example when music is playing on a voice-controlled speaker system.

2. Low signal-to-noise ratio (SNR), low direct-to-reverberant ratio (DRR), and speech and noise from unknown directions: The SNR in far-field systems is much lower than in near-field systems. As the user moves away from the product's microphone, the speech level drops while the background noise level stays the same.

Similarly, the indirect paths from the user's mouth to the microphone, that is, the paths reflected off surfaces such as walls and windows, may carry significant power compared with the direct path from the user to the microphone (i.e., low DRR). This reverberation can cause significant problems for traditional speech processing techniques and speech recognition engines.

Finally, in far-field systems, the direction of the user's speech and the direction of the noise relative to the microphone are both unknown. In some typical applications, the noise even arrives from the same direction as the user's voice.

3. Full-duplex voice interaction: In many far-field systems, audio content such as music, movies, or voice prompts may be playing through the product's loudspeaker while the user speaks to the product. A full-duplex echo canceller is required to cancel the product's own playback output while it listens for the user's voice. The situation is even more complicated in systems where the echo canceller is not fully aware of the playback content.

Under these conditions, implementing a system that still captures speech reliably is a challenging task. This article explains why traditional methods fail to provide acceptable performance in far-field conditions, and then proposes a solution that provides superior far-field performance in a cost-effective manner.

Large dynamic range

Voice capture systems for smart home devices must support a large signal dynamic range, from soft whispers to loud audio playback. For a device approximately 0.5 to 3 meters from the user, the speech level at the device microphone ranges from approximately 75 dB SPL down to 44 dB SPL. For a physically small audio playback device, the level of the playback content at the device microphone may approach 95 dB SPL. This typical and challenging use case strongly influences the choice of microphones and analog-to-digital converters (ADCs) for the device.

For far-field applications, it is important to choose a microphone with a high signal-to-noise ratio (SNR). As mentioned above, the target speech signal may be as low as 44 dB SPL. Microphone SNR is specified relative to a 1 kHz tone at 94 dB SPL, so a microphone with an SNR of 66 dB has an equivalent noise floor of 28 dB SPL, and the worst-case ratio of speech to the microphone's own noise is 16 dB. With a 55 dB SNR microphone, the noise floor rises to 39 dB SPL and that ratio can be as low as 5 dB.
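The arithmetic in the paragraph above can be sketched in a few lines of Python. This is only an illustration of the calculation described in the text; the function names are made up for this example and do not come from any library.

```python
# Microphone SNR is quoted relative to a 1 kHz reference tone at 94 dB SPL.
REFERENCE_SPL_DB = 94.0


def mic_noise_floor_spl(snr_db: float) -> float:
    """Equivalent self-noise of the microphone, in dB SPL."""
    return REFERENCE_SPL_DB - snr_db


def speech_to_mic_noise_db(speech_spl_db: float, snr_db: float) -> float:
    """Margin between the speech level and the microphone's own noise, in dB."""
    return speech_spl_db - mic_noise_floor_spl(snr_db)


WORST_CASE_SPEECH_SPL_DB = 44.0  # quietest speech level quoted in the article

for snr_db in (66.0, 55.0):
    floor = mic_noise_floor_spl(snr_db)
    margin = speech_to_mic_noise_db(WORST_CASE_SPEECH_SPL_DB, snr_db)
    print(f"SNR {snr_db:.0f} dB: noise floor {floor:.0f} dB SPL, margin {margin:.0f} dB")
```

For the two microphones discussed above, this gives margins of 16 dB and 5 dB respectively.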

The ADC matters as well: its own noise floor adds to the microphone noise, and if its dynamic range is too small for the application, loud signals will saturate it.

Figure 1 shows the input-referred noise of two ADCs as a function of the microphone gain (enhancement) setting. The red line shows the performance of an 18-bit ADC with a dynamic range of approximately 96 dB, and the blue line shows the performance of a 24-bit ADC with a dynamic range of approximately 106 dB. For reference, the gray line shows the self-noise level of a microphone with a signal-to-noise ratio of 66 dB and a sensitivity of -43 dBV/Pa.

Figure 1: The noise of the microphone itself and the noise from the ADC will be added together to form the total noise floor of the system.
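How two noise sources add can be sketched as a power sum of uncorrelated contributions. The example below is a simplified model, not a reproduction of Figure 1: it assumes the ADC noise is independent of the gain setting and uses a hypothetical ADC full-scale level of 0 dBV, neither of which is stated in the article. Real parts behave differently, so the printed numbers are illustrative only. The microphone values (66 dB SNR, -43 dBV/Pa) are the ones quoted above.

```python
import math


def db_power_sum(*levels_db: float) -> float:
    """Total level of uncorrelated noise sources, each given in dB."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))


def mic_noise_dbv(sensitivity_dbv_per_pa: float, snr_db: float) -> float:
    """Microphone self-noise at its output, in dBV (1 Pa corresponds to 94 dB SPL)."""
    return sensitivity_dbv_per_pa - snr_db


def adc_input_referred_noise_dbv(
    full_scale_dbv: float, dynamic_range_db: float, gain_db: float
) -> float:
    """ADC noise referred back to the microphone output, before the gain stage."""
    return full_scale_dbv - dynamic_range_db - gain_db


ASSUMED_ADC_FULL_SCALE_DBV = 0.0  # hypothetical value, not given in the article

mic = mic_noise_dbv(-43.0, 66.0)
for dynamic_range_db, gain_db in ((96.0, 24.0), (106.0, 12.0)):
    adc = adc_input_referred_noise_dbv(
        ASSUMED_ADC_FULL_SCALE_DBV, dynamic_range_db, gain_db
    )
    total = db_power_sum(mic, adc)
    print(f"{dynamic_range_db:.0f} dB ADC at {gain_db:.0f} dB gain: "
          f"mic {mic:.1f} dBV, ADC {adc:.1f} dBV, total {total:.1f} dBV")
```

The point of the sketch is that the total noise floor is always above the larger of the two contributions, and that two equal contributions raise it by about 3 dB.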

Figures 2 and 3 show the properties of the system when using an ADC with a 96 dB dynamic range and a 106 dB dynamic range, respectively. The 106 dB ADC provides a lower noise floor and a higher saturation point. A reasonable setting is 24 dB of microphone gain (enhancement) for the 96 dB ADC and 12 dB for the 106 dB ADC. In this example, the 106 dB ADC gives a noise floor that is 2 dB lower and a saturation point that is 12 dB higher. The 2 dB lower noise floor is especially important for picking up speech in far-field conditions.

Figure 2: This table shows the system properties when using a 96dB ADC.

Figure 3: This table shows the system properties when using a 106dB ADC.

Taking into account factors such as peak content and resonance, the level generated at the microphone by the device's own playback (the echo) may reach 96 dB SPL or higher. Therefore, for small devices with loud playback, saturation problems are common when using ADCs with a dynamic range of 96 dB or less. When these problems occur in real systems, usually the only remedy is to reduce the microphone gain (enhancement) further, but doing so raises the noise floor. In this example, the microphone gain of the 96 dB ADC would need to be reduced to 12 dB, which results in a noise floor 4.3 dB higher than that of the 106 dB ADC. The preferred solution for far-field products is therefore a microphone with a high signal-to-noise ratio combined with an ADC with a dynamic range of 106 dB or higher.
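The trade-off between gain and saturation can be sketched as follows. This is a simplified sine-wave model with a hypothetical ADC full-scale level of 0 dBV, which the article does not specify, so the absolute clip levels it prints are not the article's figures. Only the difference between gain settings is meaningful here: each decibel of gain removed raises the acoustic level at which the ADC clips by one decibel.

```python
REFERENCE_SPL_DB = 94.0  # 1 Pa corresponds to 94 dB SPL


def clip_point_spl(
    full_scale_dbv: float, sensitivity_dbv_per_pa: float, gain_db: float
) -> float:
    """Acoustic level (dB SPL) at which the amplified microphone signal reaches ADC full scale."""
    return REFERENCE_SPL_DB + full_scale_dbv - sensitivity_dbv_per_pa - gain_db


ASSUMED_ADC_FULL_SCALE_DBV = 0.0  # hypothetical value, not given in the article
MIC_SENSITIVITY_DBV_PER_PA = -43.0  # sensitivity quoted in the article

for gain_db in (24.0, 12.0):
    clip = clip_point_spl(
        ASSUMED_ADC_FULL_SCALE_DBV, MIC_SENSITIVITY_DBV_PER_PA, gain_db
    )
    print(f"gain {gain_db:.0f} dB: clips at {clip:.0f} dB SPL (under the assumed full scale)")
```

Under this model, dropping the gain from 24 dB to 12 dB moves the clip point up by 12 dB, which is consistent with the 12 dB higher saturation point quoted for the 106 dB ADC configuration.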
