Improper Handling of Guillemets (»«) causes hallucinations #153
Comments
No, it's not the double quotes alone that cause the hallucinations; it's more the combination of the double quotes, the empty space, and the new lines that causes trouble...
Could you provide the real original text so I can check whether it's spaces, tabs, or something else?
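One way to check what is actually in the text is to list every character that is not printable ASCII, together with its code point. This is only a minimal sketch and not part of the project; `describe_unusual_chars` and the sample string are made up for illustration.

```python
import unicodedata

def describe_unusual_chars(text):
    """Return (index, code point, Unicode name) for every character
    that is not printable ASCII (tabs, newlines, guillemets, ...)."""
    report = []
    for index, ch in enumerate(text):
        # Printable ASCII (letters, digits, punctuation, plain space) is skipped.
        if ch.isascii() and ch.isprintable():
            continue
        # Control characters such as tab have no Unicode name, so pass a default.
        name = unicodedata.name(ch, "<unnamed control character>")
        report.append((index, f"U+{ord(ch):04X}", name))
    return report

# Hypothetical sample line, not taken from the attached text file.
sample = "»Ja«,\tsagte er.\n"
for entry in describe_unusual_chars(sample):
    print(entry)
```

Run against the real text file, this would show whether the gaps around the quotes are plain spaces, tabs, no-break spaces, or something else.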
I edited my original post and added the text file. And I made another video from the text I uploaded on another issue.
german_hallucinations_2.mp4
If you want only the text visible in the video, here it is:
No, this type of quotation mark is pretty common in German, if not standard, at least for novels. I went through my ebooks and haven't found one where these quotation marks aren't being used. And just as a heads-up, I even found one where they are reversed:
edit:
and thought you were asking about the use of this type of quotation mark in general, but to give a more precise answer: in German these quotation marks are almost exclusively used with their tips pointing inward, like this (»«), instead of outward («»), although I have found one ebook where they are used like this («»), but that is definitely the exception.
OK, I'm going to run some tests to see if my patch works.
Tell me if the audio generated from german_hallucinations_2.txt is OK for you.
No, the hallucinations are still there unfortunately :(
But is it better or exactly the same?
About the same. The hallucinations are in the same places.
OK, I'm working on it.
This one seems better, no?
FYI, the first word is "unknown", which is normal since there is no title.
It's a little bit better, yes! If you look at my video, the first two hallucinations and the last one are gone; the other two are still there. Maybe there are still three hallucinations. The fourth pink mark with which I highlighted the hallucinations in the video could be one long hallucination, but there seem to be two separate hallucinations, one at the end of the sentence and one at the beginning of the next.
OK, keep in mind that the default voice used is an English one, so be aware that glitches can occur.
The "strange words" are definitely too long to be small glitches though. I am pretty sure these are still proper hallucinations. Where I put HERE there are still hallucinations:
The converted text is this. Is it still buggy?
I wonder if it comes from the fact that there is no space between the guillemets and the word...
I am going to try this tomorrow; I don't know why I didn't think of trying it with these "" earlier.
You don't need to try; the eb2ab script is doing it. All characters that cause hallucinations or glitches in various languages are converted to standard ASCII characters.
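For readers wondering what such a conversion could look like: below is a rough sketch of mapping typographic quotes to ASCII and collapsing the extra whitespace mentioned earlier in the thread. It is an assumption for illustration only, not the project's actual code; `QUOTE_MAP` and `normalize_for_tts` are hypothetical names, and the set of characters is my own guess.

```python
import re

# Hypothetical mapping, NOT taken from the project's source.
QUOTE_MAP = str.maketrans({
    "\u00bb": '"',  # » right-pointing guillemet
    "\u00ab": '"',  # « left-pointing guillemet
    "\u201e": '"',  # „ low opening quote used in German
    "\u201c": '"',  # “ left double quotation mark
    "\u201d": '"',  # ” right double quotation mark
    "\u00a0": " ",  # no-break space
})

def normalize_for_tts(text):
    """Replace typographic quotes with ASCII quotes and tidy whitespace."""
    text = text.translate(QUOTE_MAP)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Collapse runs of blank lines into exactly one blank line.
    text = re.sub(r"\n\s*\n+", "\n\n", text)
    return text.strip()

print(normalize_for_tts("»Hallo«,   sagte sie."))
```

A lookup table like this keeps each replaced character visible in one place, which makes it easy to add or remove entries per language.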
This one seems to be better after some new punctuation replacements.
They are a bit shorter but definitely still there. One other thing, I don't know if it's relevant, but in German these quotation marks (" ") are a little bit different as well. The mark at the beginning of a sentence is placed at the bottom and the one at the end is placed at the top.
If you push the update I could also do some more testing, if you need help :)
Could you tell me if the last conversion is better?
You mean this audio?
The hallucinations are still there but the "strange words" are a bit shorter. But that could also be a coincidence. If you want, you could try a longer text with more quotation marks:
I have also experienced this and solved it by adding a search and replace to calibre, replacing («|») with ". After that there were no further hallucinations. As you convert the uploaded files into .txt anyway (if I am not mistaken), maybe the easiest solution would be to just do a replacement on those?
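The same search and replace can be done on the converted text outside calibre. A minimal sketch, assuming the text is already available as a Python string; `replace_guillemets` is a hypothetical helper and not something the project provides:

```python
import re

# Same pattern as in the comment above: either guillemet.
GUILLEMETS = re.compile(r"(«|»)")

def replace_guillemets(text, replacement='"'):
    """Replace every « or » with the given replacement (default: ASCII quote)."""
    return GUILLEMETS.sub(replacement, text)

print(replace_guillemets("»Nein«, sagte er."))
```

The pattern does not care about direction, so it covers both the usual German style (»«) and the reversed style («») mentioned earlier.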
@ChristopherS you should read the whole issue rather than commenting without knowing what was said... Try to read before you write...
Super sorry... Just missed the last few comments.
As you can read above, I have already fixed it many times... Also, the hallucinations depend on the voice you choose, its quality, etc., but also on the TTS engine; fairseq and xtts-v2 are radically different.
Maybe you could share your trained voice with the devs @ChristopherS? They have mentioned before that they would like to add German voices. Of course, only if the voice you chose is okay to share and if you and the devs are open to it :)
I believe this type of quotation mark (»«) triggers hallucinations somewhat consistently.
At least in German, but probably in other languages as well.
This type of quotation mark (»«), known as guillemets or chevrons, is not commonly used in English, at least ChatGPT told me so lol, so maybe the code doesn't account for them?
Here is an example video, the red lines represent hallucinations.
hallucinations.mp4
Here is the text file:
german_hallucinations_1.txt