Used SPARC to write Audio Anonymization Script #319
Conversation
Thank you @Veronika271. I see some overlap with the already existing code here: https://sensein.group/senselab/senselab/audio/tasks/voice_cloning.html. Regarding the choice of target speaker embeddings, there are a few different options beyond averaging. For instance, you could
Did you consider any of these alternatives before deciding to use the average of the source embeddings? If yes, what are your thoughts?
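As an illustration of the design space being discussed, two common strategies for choosing a target embedding can be sketched in a few lines of numpy. The function names and the "farthest external speaker" heuristic below are illustrative assumptions, not part of senselab's API and not necessarily the alternatives the comment had in mind:

```python
import numpy as np

def average_target(source_embs: np.ndarray) -> np.ndarray:
    """Strategy 1: use the mean of the source speaker embeddings
    (the approach taken in the PR under discussion)."""
    return source_embs.mean(axis=0)

def farthest_pool_target(source_embs: np.ndarray, pool: np.ndarray) -> np.ndarray:
    """Strategy 2 (hypothetical alternative): pick the embedding from an
    external speaker pool with the lowest cosine similarity to the mean
    source embedding, i.e. a maximally dissimilar real target voice."""
    centroid = source_embs.mean(axis=0)
    centroid = centroid / np.linalg.norm(centroid)
    pool_norm = pool / np.linalg.norm(pool, axis=1, keepdims=True)
    sims = pool_norm @ centroid  # cosine similarity of each pool speaker
    return pool[np.argmin(sims)]
```

Strategy 1 produces a synthetic "average" voice; strategy 2 reuses a real external speaker, which is what would let the script lean on senselab's existing voice-cloning code.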
@fabiocat93 Thank you for the feedback! I used the average of the source embeddings because Satra suggested it during a meeting, but I see how selecting an external target speaker would let me reuse the existing voice-cloning capabilities in Senselab instead. I think the choice of anonymization should depend on the size of the dataset and on what Senselab users want preserved in their anonymized samples, from naturalness to pathology biomarkers. Still, I'm happy to rewrite my code using internal speaker anonymization and Senselab's voice-cloning feature if that would be better!
I did suggest averaging the source speaker embeddings to create a new target speaker, but there are some details here that should be rethought.
The PPG code calls the last bits "neural editing". Perhaps we can have a separate API for neural editing that offers more fine-grained control of the change and only supports models that can do that.
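To make the "fine-grained control" idea concrete, here is a purely hypothetical sketch of what a neural-editing configuration could look like. None of these parameter names exist in senselab; they are placeholders for the kinds of knobs such an API might expose:

```python
from dataclasses import dataclass

@dataclass
class NeuralEditConfig:
    """Hypothetical knobs for a dedicated neural-editing API.

    The real senselab API, if one is added, would define its own
    parameters; these are assumptions for discussion only.
    """
    pitch_shift_semitones: float = 0.0   # relative pitch change to apply
    embedding_interpolation: float = 1.0 # 0 = keep source voice, 1 = full target voice
    preserve_prosody: bool = True        # keep the source timing/energy contour

    def __post_init__(self):
        # Validate that the interpolation factor is a sensible mixing weight.
        if not 0.0 <= self.embedding_interpolation <= 1.0:
            raise ValueError("embedding_interpolation must be in [0, 1]")
```

Restricting such an API to models that genuinely support partial edits (as suggested above) would mean a model either accepts the full config or is rejected up front, rather than silently ignoring knobs.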
@satra @fabiocat93 @Veronika271 do we want to have any default target voices available through senselab so that users don't have to provide their own? This seems like it would be convenient and allow us to provide thoughtful target voice suggestions. |
This would go back to the original question, "How do you select the target voice?", which we don't have a clear answer to yet.
@fabiocat93 Assuming we did have an answer to that question, would we want to provide access to the target voice through
If there were an ideal target voice, we could provide a pipeline for downloading the reference speaker embeddings or some of their audio samples as part of the senselab procedure. But 1) we haven't, and 2) the ideal target voice will depend on the user's goal and use case (e.g., children vs. adults; do we care about emotions? do we care about content in terms of words? do we care about xxx?)
Instead of providing data, provide access to downloading data+metadata as datasets. Many ML packages do that, and that could then be used to provide targets if users wanted to.
I used SPARC to write a script that shifts the pitch of the target audio to a specific pitch taken from the source audio, and replaces the target audio's speaker embeddings with the average of the source audios' speaker embeddings.
I'm not quite sure how to test this yet, other than that I ran it on an audio sample and it sounded reasonable.
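A minimal, model-agnostic sketch of the two steps described above. The dict keys, the median-based pitch matching, and the 0.0-marks-unvoiced convention are assumptions for illustration; the actual script operates on SPARC's encoder output, not on these placeholder dicts:

```python
import numpy as np

def anonymize_codes(target_codes, source_codes_list):
    """Rescale the target's pitch toward the sources' pitch and swap in
    the average source speaker embedding.

    Each codes dict holds a per-frame "pitch" track (0.0 = unvoiced
    frame) and a "spk_emb" vector. Hypothetical stand-in for SPARC codes.
    """
    out = dict(target_codes)

    # 1) Shift the target pitch track so its voiced median matches the
    #    median voiced pitch across all source audios.
    src_pitch = np.concatenate([c["pitch"] for c in source_codes_list])
    src_median = np.median(src_pitch[src_pitch > 0])
    tgt_pitch = target_codes["pitch"]
    tgt_median = np.median(tgt_pitch[tgt_pitch > 0])
    out["pitch"] = tgt_pitch * (src_median / tgt_median)

    # 2) Replace the target speaker embedding with the average of the
    #    source speaker embeddings.
    out["spk_emb"] = np.mean(
        [c["spk_emb"] for c in source_codes_list], axis=0
    )
    return out
```

One testable property, even without listening tests: the output embedding should equal the source average exactly, and the voiced-frame pitch median should match the sources' median. Assertions along those lines could be a first automated check for the script.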
Thank you!
Veronika