chinese voice error? #3735

Closed
opentld opened this issue May 22, 2022 · 3 comments
opentld commented May 22, 2022

Platform: Windows 10, VS2019, CUDA 10.2
The English model works correctly, but the Chinese model produces wrong output.
The input speech is "测试,测试", but the output is: 鍘邋邋

speechTest --model deepspeech-0.9.3-models-zh-CN.pbmm --audio 3.wav
2022-05-22 12:11:55.426120: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
TensorFlow: v2.3.0-6-g23ad988fcd
DeepSpeech: v0.9.3-0-gf2e9c858
2022-05-22 12:11:55.436894: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-05-22 12:11:55.440660: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2022-05-22 12:11:55.469860: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2080 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 46 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2022-05-22 12:11:55.470061: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2022-05-22 12:11:55.473859: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2022-05-22 12:11:55.477597: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2022-05-22 12:11:55.479654: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2022-05-22 12:11:55.483263: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2022-05-22 12:11:55.485340: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2022-05-22 12:11:55.491976: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2022-05-22 12:11:55.492126: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2022-05-22 12:11:55.831924: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-05-22 12:11:55.832025: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2022-05-22 12:11:55.832240: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2022-05-22 12:11:55.832449: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6674 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2080, pci bus id: 0000:01:00.0, compute capability: 7.5)
audio_format=6
num_channels=2
sample_rate=16000 (desired=16000)
bits_per_sample=8
res.buffer_size=291939
2022-05-22 12:11:56.096138: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
the result is: 鍘邋邋

@djmitche @Elleo @KathyReid @danielwinkler @hwine

opentld (Author) commented May 22, 2022

I recorded an .m4a file in Chinese with the Windows 'Voice Recorder' app and then converted it to .wav:
audio_format=6
num_channels=2
sample_rate=16000 (desired=16000)
bits_per_sample=8
res.buffer_size=291939
The code seems to run correctly:

fseek(wave, 40, SEEK_SET);
rv = fread(&res.buffer_size, 4, 1, wave);
fprintf(stderr, "res.buffer_size=%ld\n", res.buffer_size);

fseek(wave, 44, SEEK_SET);
res.buffer = (char*)malloc(sizeof(char) * res.buffer_size);
rv = fread(res.buffer, sizeof(char), res.buffer_size, wave);

res.buffer_size is 291939, which means the audio file was read.
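
A non-zero buffer size only shows that bytes were read, not that they are in a usable format. The header fields printed above describe 2-channel, 8-bit audio with format tag 6 (A-law in the WAVE format registry), whereas DeepSpeech models are, as far as I know, meant to be fed 16-bit mono PCM at the desired sample rate. A minimal sketch of a sanity check under that assumption; `WavFormat` and `count_format_problems` are hypothetical names for illustration, not part of the DeepSpeech client:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the "fmt " fields printed in the log above. */
typedef struct {
    uint16_t audio_format;    /* 1 = PCM, 6 = A-law, 7 = mu-law */
    uint16_t num_channels;
    uint32_t sample_rate;
    uint16_t bits_per_sample;
} WavFormat;

/* Hypothetical helper: logs each field that is not 16-bit mono PCM at
 * desired_rate and returns the number of mismatches (0 = looks usable). */
int count_format_problems(const WavFormat *f, uint32_t desired_rate, FILE *log)
{
    int problems = 0;

    assert(f != NULL && log != NULL);

    if (f->audio_format != 1) {
        fprintf(log, "audio_format=%u, expected 1 (PCM)\n",
                (unsigned)f->audio_format);
        problems++;
    }
    if (f->num_channels != 1) {
        fprintf(log, "num_channels=%u, expected 1 (mono)\n",
                (unsigned)f->num_channels);
        problems++;
    }
    if (f->bits_per_sample != 16) {
        fprintf(log, "bits_per_sample=%u, expected 16\n",
                (unsigned)f->bits_per_sample);
        problems++;
    }
    if (f->sample_rate != desired_rate) {
        fprintf(log, "sample_rate=%lu, expected %lu\n",
                (unsigned long)f->sample_rate, (unsigned long)desired_rate);
        problems++;
    }
    return problems;
}
```

With the values from the first file (format 6, 2 channels, 8 bits, 16000 Hz) this check flags three fields; with the values from the mono conversion (format 1, 1 channel, 16 bits, 16000 Hz) it flags none.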

But when I convert the .m4a file to a mono .wav:
audio_format=1
num_channels=1
sample_rate=16000 (desired=16000)
bits_per_sample=16
res.buffer_size=62
2022-05-22 13:15:58.042539: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
the result is: ?

Here res.buffer_size=62, so it looks like the audio file was not read correctly.

Why?
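
One possible explanation, stated as an assumption rather than a confirmed diagnosis: the snippet above reads the data size at byte 40 and the samples from byte 44, which only holds for a canonical 44-byte header. Some converters write an extra chunk (for example `LIST` metadata) between the `fmt ` and `data` chunks, and in that case byte 40 holds that chunk's size instead of the audio size. A minimal sketch in C that walks the RIFF chunks to locate `data`; `read_u32_le` and `find_wav_data_chunk` are hypothetical names, not DeepSpeech functions:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Read a 32-bit little-endian value; returns 0 on success, -1 on short read. */
static int read_u32_le(FILE *fp, uint32_t *out)
{
    unsigned char b[4];
    if (fread(b, 1, 4, fp) != 4) return -1;
    *out = (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
    return 0;
}

/* Hypothetical helper: scan the chunks of a RIFF/WAVE file until "data" is
 * found. On success returns 0 and stores the payload size and the file
 * offset of the first sample byte; returns -1 if the file is not a WAV
 * file or has no "data" chunk. */
int find_wav_data_chunk(FILE *fp, uint32_t *data_size, long *data_offset)
{
    char id[4];
    uint32_t size;

    assert(fp != NULL && data_size != NULL && data_offset != NULL);

    if (fseek(fp, 0, SEEK_SET) != 0) return -1;
    if (fread(id, 1, 4, fp) != 4 || memcmp(id, "RIFF", 4) != 0) return -1;
    if (read_u32_le(fp, &size) != 0) return -1;  /* overall RIFF size, unused */
    if (fread(id, 1, 4, fp) != 4 || memcmp(id, "WAVE", 4) != 0) return -1;

    while (fread(id, 1, 4, fp) == 4 && read_u32_le(fp, &size) == 0) {
        if (memcmp(id, "data", 4) == 0) {
            *data_size = size;
            *data_offset = ftell(fp);
            return 0;
        }
        /* Skip this chunk's payload; chunks are padded to an even length. */
        if (fseek(fp, (long)size + (long)(size & 1u), SEEK_CUR) != 0) return -1;
    }
    return -1;
}
```

If this is what happened, the reported offset would be larger than 44 and the reported size would match the real audio length rather than 62. Dumping the first 64 bytes of the file in a hex viewer would confirm whether a `LIST` chunk sits at byte 36.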

JRMeyer (Contributor) commented Jun 13, 2022

FYI -- #3693

ftyers (Collaborator) commented Jul 26, 2022

DeepSpeech is unmaintained, please see #3693.

ftyers closed this as completed Jul 26, 2022