> xVAsynth voice generation, Let's swap tips!
Lena Wolf
post Nov 29 2022, 05:32 PM
Post #1


Mouth
Joined: 18-May 21
From: Bravil



Let's have a whole new thread about voice generation with xVAsynth! This is for all the different games that it supports. There are differences of course, but I think there are more similarities and hopefully we can help each other.

I'll start - this is for Oblivion, a reprint of my post in Wolf Mods thread.

I had another go at voice generation in batch mode, and I now have a better idea of what's going on. I wrote it up here, if anyone is interested.

It's a huge pain at the moment to export quest dialogue. There is an xEdit script for Skyrim but not for Oblivion, and Skyrim quest records are too different - voice types are independent of race in Skyrim. I looked at that script... complicated. I know it's "only" a Pascal program, but it's yet another API to learn... mmm... Does anyone have a similar script for Oblivion?


--------------------
"What is life's greatest illusion?"
"Innocence, my brother."

Renee
post Nov 29 2022, 09:38 PM
Post #2


Councilor
Joined: 19-March 13
From: Ellicott City, Maryland



Hee, I'm not even sure what this is! So I Googled.

xVASynth is an AI tool for generating high-quality voice acting lines using voices from video games. The app supports hundreds of voices, across dozens of games, and provides pitch, duration, and energy control at per-letter granularity.

So if I understand this correctly, you could take some voice files from, let's say, The Witcher, and use it in some other game?


--------------------
Lena Wolf
post Nov 29 2022, 10:32 PM
Post #3


Mouth
Joined: 18-May 21
From: Bravil



Ok, sorry Renee, what you read there, it is... well... advertising. Not technically wrong, but not quite up to the everyday reality either.

What xVAsynth does is generate voice files from the text lines that you type into your quest window, and it does so for a wide range of games: certainly Morrowind, Oblivion, Skyrim and Fallout, to name a few, which is what we care about here. Here it is on Nexus.

For it to work, it requires what is called "voice models" - data files based on actual voice lines recorded by actors, such as all the different voice lines for each race in Oblivion. There is a different tool that allows you to create these models - to train them, as it is called. So if you wanted to, say, take Geralt's voice from The Witcher and make him say new lines for your mod, you would first need to train a "Geralt" model and then use it to generate your new lines.

That's too complicated for me though. I just use models that someone else already created. Previously those models were not very good, so the newly generated voice lines sounded very mechanical. But recently both the engine and the models for Oblivion got revamped and improved, and can now generate quite acceptable voice files. I believe Skyrim voice models are even better, and that's what Ghastley is using to voice his mods.

But to start with, you need those text lines exported from your quest window into a very specific format that the voice synthesizer can read. And that's a job and a half already!

It is also possible to hand-craft each line, tuning the parameters until you're happy with the way it comes out. I don't have that kind of time! But Ghastley does.

I think Zelazko is also using it, so that's three people already, and I figured we might want to exchange tips. Hence this thread.


--------------------
Renee
post Nov 29 2022, 11:05 PM
Post #4


Councilor
Joined: 19-March 13
From: Ellicott City, Maryland



Wow, that sounds really neat. I'm in shock. I mean, a similar sort of technology exists. mALX has a program which types whatever she orates, for instance. But this is the first time I've heard of text-turning-into-voice for a videogame.


--------------------
Lena Wolf
post Nov 29 2022, 11:30 PM
Post #5


Mouth
Joined: 18-May 21
From: Bravil



This is how Morroblivion is voiced. No more walls of text, they are actually saying it! The files are a bit old and it all sounds a bit mechanical, yet I'll take it over a wall of text any day (and am using it). But these new models are so much better! Only it's a big job to get from the text lines in a quest window to the actual voice files that play in-game...

This post has been edited by Lena Wolf: Nov 29 2022, 11:31 PM


--------------------
Renee
post Nov 30 2022, 08:28 PM
Post #6


Councilor
Joined: 19-March 13
From: Ellicott City, Maryland



Again, it sounds really awesome. Sorry I can't be of any help on the subject, what a neat program, though.

Edit: is xVAsynth similar to Microsoft Sam?

This post has been edited by Renee: Nov 30 2022, 08:30 PM


--------------------
Lena Wolf
post Nov 30 2022, 09:50 PM
Post #7


Mouth
Joined: 18-May 21
From: Bravil



QUOTE(Renee @ Nov 30 2022, 07:28 PM) *

Is xVAsynth similar to Microsoft Sam?

I never used MS Sam, but by the looks of it, it is similar technology. Except that of course Sam only talks like Sam, whereas xVAsynth allows you to choose from various voice models. You know, so that your Nords don't sound like your Imperials - that would be awful!



--------------------
Lena Wolf
post Nov 30 2022, 10:34 PM
Post #8


Mouth
Joined: 18-May 21
From: Bravil



I spent two days straight up generating voice lines for TWMP Northern Realms - my nickname for the conglomerate of TWMP Hammerfell, High Rock (empty as it is), Skyrim, Stirk and Chain Islands. This covers the following mods:

- TWMP Hammerfell
- TWMP High Rock (nothing there yet, but let's not leave it out)
- TWMP Skyrim Improved
- TWMP Locations
- TWMP Skyrim Alive

Yesterday the whole day (!) was spent exporting dialogue from these mods. It should be possible to do it faster, but I couldn't find another way, meaning that I was sitting here at my PC all day clicking the "Export dialogue" button on a per-quest basis. Each quest took anywhere between 1 minute and 10 minutes to export. Not fast enough to keep my attention, yet not long enough to be able to focus on something else in between... Infuriating.

Once that was done, the files had to be converted into the input format that xVAsynth expects and voice IDs had to be filled in for each line. That's 21,895 lines, thank you very much.

So this morning was spent cleaning the data. Nords and Orcs for example speak with the same voice, so once you convert the race+sex combo into a voice ID, you find a lot of duplicate lines. Delete them because the Synth is not smart enough to preprocess your data for you. Still, I was left with 5,697 lines.
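For anyone who would rather script that clean-up than do it by hand, here is a minimal Python sketch of the idea. It is only an illustration: the race+sex to voice ID table, the voice ID strings and the sample lines are all made up for the example, and the real IDs depend on which voice models you have installed.

```python
# Hypothetical race+sex -> voice ID table (assumption, not the real model names).
# Nords and Orcs share a voice, so they map to the same ID.
VOICE_IDS = {
    ("Nord", "Male"): "voice_nord_male",
    ("Orc", "Male"): "voice_nord_male",
    ("Imperial", "Male"): "voice_imperial_male",
}


def dedupe_lines(rows):
    """Turn (race, sex, text) rows into unique (voice_id, text) pairs.

    The first occurrence of each pair is kept, so the original order survives.
    """
    seen = set()
    unique = []
    for race, sex, text in rows:
        key = (VOICE_IDS[(race, sex)], text)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Made-up sample: the same generic line exported once per race.
rows = [
    ("Nord", "Male", "Welcome to Skyrim."),
    ("Orc", "Male", "Welcome to Skyrim."),
    ("Imperial", "Male", "Welcome to Skyrim."),
]
unique = dedupe_lines(rows)
```

In this sample the Nord and Orc rows collapse into one line to synthesize, while the Imperial row stays because it needs a different voice.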

I tried loading that into the Synth. It would start synthesizing, but after some 200-300 lines it would crash, not even making a dent in it. It turns out the default settings seem to be meant for a high-end PC, or maybe just a modern PC, not a 12-year-old thing like mine. I turned down the settings and enabled GPU and VRAM usage - that helped enormously. Still, I found it necessary to split up the big file per voice - trouble seems to start when the Synth tries to do the smart thing and group the data... Don't.
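Splitting the big file per voice can also be scripted. The sketch below is one possible way to do it in Python, assuming the batch data is a CSV with a column that holds the voice ID. The column names `voice_id` and `text` are assumptions for the example, so check them against the header of your own file.

```python
import csv
from collections import defaultdict
from pathlib import Path


def split_by_voice(rows, voice_column="voice_id"):
    """Group row dicts by voice ID (the column name is an assumption)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[voice_column]].append(row)
    return dict(groups)


def write_batches(groups, out_dir, fieldnames):
    """Write one CSV per voice into out_dir and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for voice_id, voice_rows in sorted(groups.items()):
        path = out_dir / f"{voice_id}.csv"
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(voice_rows)
        paths.append(path)
    return paths
```

Each resulting file holds a single voice, so it can be loaded and synthesized on its own.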

After that it didn't take too long to generate all files, but still it was all afternoon babysitting it. Again, can't get away and can't really focus on anything else.

The next step was lip sync generation. This is done with the CS (or at least I don't know another way to do it). Fortunately, Vorians took the heat on that topic and ShadeMe even made some changes in the CSE especially for that - which I also hijacked. With the latest development build, lip sync file generation can actually be done in batch mode from the Character menu, and it works! Another couple of hours and you've got it. How many hours is it already altogether? Too many.



--------------------
ghastley
post Jan 30 2023, 02:11 AM
Post #9


Councilor
Joined: 13-December 10



I've been using it the hard way, one line at a time, and tweaking the results for pitch, pacing and energy. Some voices seem to generate close to what I want the first time, but others take a lot of manual input before they sound anything but flat.

I have released Forsworn Hearthfires 0.1 and Orc Hearthfires 2.1 with some voice content, and adding it to the Succubus/Diablo mod has got the scene with Azura and Nocturnal working, so that may be out soon, too. I still want to do more work on Nocturnal's lines, as she still sounds bored. Plus, Greta needs some dialogue as a follower, and the priestesses and acolytes at the monastery need more to say.

The sheer size of my Oblivion mods means I'm not going to do the same with them. I might try to do a batch job following Lena's method, but until I can actually run Oblivion on my machine with no CD drive, that can wait.


--------------------
Mods for The Elder Scrolls single-player games, and I play ESO.
Lena Wolf
post Apr 29 2023, 01:16 AM
Post #10


Mouth
Joined: 18-May 21
From: Bravil



I finally bit the bullet and brushed up on Pascal... Haven't used it in... well... if I tell you in how long, you'll know that I am no longer in my twenties.

The result is a script for TES4Edit to export dialogue from mods while checking for character ID, race and sex in a sensible manner (i.e., not checking for factions since this is dynamic). Generic lines are thus output in many copies, while specific lines are restricted to the race and sex that can actually say them. Find this script on Nexus.

The output is filled in with xVAsynth model names and presented in the CSV format that xVAsynth expects. Sort order is not quite right, but you can't have everything - I trust you know how to use a spreadsheet or something.
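If a spreadsheet is not your thing, re-sorting the exported CSV takes only a few lines of Python. This is a rough sketch, not part of the script itself: the column names `voice_id` and `out_path` are assumptions for the example, and the sort keys should be whatever order you actually want.

```python
import csv
import io


def sort_rows(csv_text, keys=("voice_id", "out_path")):
    """Return the CSV text with its data rows sorted by the given columns.

    The default column names are assumptions; pass the ones from your own header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = sorted(reader, key=lambda row: tuple(row[k] for k in keys))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

The header row stays on top and only the data rows are reordered. The values are compared as plain text, so numbers inside file names sort alphabetically.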


--------------------
