In general, I think this is a bad idea: you should start the stream at an acceptable bitrate and stick to it. For some reason the eye forgives shifts in video bitrate, but changing the audio bitrate on the fly is very noticeable and produces a rather jarring shift. For video conferencing, mono audio encoded at a decent bitrate takes up about 1% of the data compared to the video, so it is not even worth controlling; adapting it gains almost nothing and makes for a bad end-user experience. A good way to test this is to take a video of someone talking, encode sections of the audio at different bitrates, and splice the whole thing back together. Pay attention to how jarring each transition is. For some reason the human brain reacts much more sharply to changes in audio quality than to changes in video quality ... perhaps because, although we cannot always see everything around us, we can always hear it. In any case, your time is much better spent where it counts: the video! Just my $.02
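To put rough numbers on the "about 1%" point, here is a small sketch of the arithmetic. The bitrates are assumptions picked for illustration (32 kbps mono audio, 3000 kbps video), not measurements, and `audio_share` is a hypothetical helper name; plug in your own stream's figures.

```python
def audio_share(audio_kbps: float, video_kbps: float) -> float:
    """Return the audio track's fraction of the combined stream bitrate."""
    return audio_kbps / (audio_kbps + video_kbps)


# Assumed, illustrative bitrates (not taken from any real stream).
AUDIO_KBPS = 32.0    # mono speech at a decent quality
VIDEO_KBPS = 3000.0  # conferencing-grade video

total_kbps = AUDIO_KBPS + VIDEO_KBPS
share = audio_share(AUDIO_KBPS, VIDEO_KBPS)

# Bandwidth recovered if the audio bitrate were cut in half,
# expressed as a fraction of the original combined bitrate.
saved_fraction = (AUDIO_KBPS / 2) / total_kbps

print(f"audio share of stream: {share:.2%}")
print(f"saved by halving audio: {saved_fraction:.2%}")
```

With numbers in this range, even halving the audio bitrate frees up well under 1% of the stream, which is why the adaptation effort is better spent on the video side.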