
Conversation

@Veronika271
Collaborator

I used SPARC to write a script that shifts the pitch of the target audio to a specific pitch taken from the source audio and replaces the target audio's speaker embeddings with the average of the source audios' speaker embeddings.
I'm not yet sure how to test this beyond running it on an audio sample, which sounded reasonable.
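A minimal sketch of the embedding-averaging step described above, assuming speaker embeddings arrive as fixed-length NumPy vectors (the 192-dimension size and the L2 re-normalization are my assumptions for illustration, not SPARC specifics):

```python
import numpy as np

def average_speaker_embedding(embeddings: list[np.ndarray]) -> np.ndarray:
    """Average several source-speaker embeddings and re-normalize to unit length.

    Assumes all embeddings share the same dimensionality; the unit-norm
    convention is an assumption, not something SPARC requires.
    """
    mean = np.mean(np.stack(embeddings), axis=0)
    return mean / np.linalg.norm(mean)

# Toy example: five random "source" embeddings averaged into one target.
rng = np.random.default_rng(0)
sources = [rng.standard_normal(192) for _ in range(5)]
avg = average_speaker_embedding(sources)  # shape (192,), unit norm
```

The averaged vector would then be substituted for the target audio's own speaker embedding before resynthesis.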
Thank you!
Veronika

@fabiocat93
Collaborator

thank you @Veronika271, I see some overlap with the existing code here: https://sensein.group/senselab/senselab/audio/tasks/voice_cloning.html

regarding the choice of target speaker embeddings, there are a few options beyond averaging. For instance, you could:

  • select an external target speaker,
  • swap identities within the same dataset (internal speaker conversion),
  • synthesize a new voice entirely.

Did you consider any of these alternatives before deciding to use the average of the source embeddings? If yes, what are your thoughts?
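The three strategies above could be sketched roughly as follows. All function names here are hypothetical, chosen only to illustrate the options; none of this is senselab's actual API:

```python
import numpy as np

def external_target(external_embedding: np.ndarray) -> np.ndarray:
    """Option 1: use a reference voice from outside the dataset as-is."""
    return external_embedding

def internal_swap(speaker_ids: list[str]) -> dict[str, str]:
    """Option 2: map every speaker to a *different* speaker in the same
    dataset (a simple rotation; requires at least two speakers)."""
    return dict(zip(speaker_ids, speaker_ids[1:] + speaker_ids[:1]))

def synthesize_voice(dim: int, seed: int = 0) -> np.ndarray:
    """Option 3: synthesize a new identity as a random unit vector in
    embedding space (a naive stand-in for a real voice-synthesis model)."""
    v = np.random.default_rng(seed).standard_normal(dim)
    return v / np.linalg.norm(v)

mapping = internal_swap(["spk_a", "spk_b", "spk_c"])
new_voice = synthesize_voice(192)
```

Each option trades off differently: an external target reuses existing cloning machinery, an internal swap keeps voices within the dataset's distribution, and synthesis avoids reusing any real person's identity.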

@Veronika271
Collaborator Author

@fabiocat93 Thank you for the feedback! I used the average of the source embeddings because Satra suggested it during a meeting, but I see how selecting an external target speaker would let me reuse Senselab's existing voice-cloning capabilities instead. I think the choice of anonymization should depend on the size of the dataset and on what Senselab users want preserved in their anonymized samples, from naturalness to pathology biomarkers. Still, I'm happy to rewrite my code using internal speaker anonymization and Senselab's voice-cloning feature if that would be better!

@satra
Collaborator

satra commented May 2, 2025

i did suggest using target speaker embedding average to create a new speaker. but there are some details here that should be rethought.

  1. the code fabio pointed to is sufficient for an initial pass at voice cloning using sparc, so i would close this PR from that perspective. @Veronika271 - in the future it may be helpful to check through the code to see if something is already there. my bad in not checking either.
  2. it would be nice to consider an API that allows for different types of voice cloning:
  • pairwise (currently implemented in the API)
  • average speaker embedding as a way of creating a new speaker. this could come from multiple targets, or from the source itself.
  • pitch shifts or other changes such as temporal alterations (not clear which models are capable of doing this besides sparc and ppg).

the ppg code calls the last bits neural editing. perhaps we can have a separate api for neural editing that offers more fine-grained control over the changes and only supports models that can do them.

@ibevers
Collaborator

ibevers commented Jun 10, 2025

@satra @fabiocat93 @Veronika271 do we want to have any default target voices available through senselab so that users don't have to provide their own? This seems like it would be convenient and allow us to provide thoughtful target voice suggestions.

@fabiocat93
Collaborator

> @satra @fabiocat93 @Veronika271 do we want to have any default target voices available through senselab so that users don't have to provide their own? This seems like it would be convenient and allow us to provide thoughtful target voice suggestions.

This would go back to the original question, "How do you select the target voice?", which we don't have a clear answer to yet.

@ibevers
Collaborator

ibevers commented Jun 10, 2025

> @satra @fabiocat93 @Veronika271 do we want to have any default target voices available through senselab so that users don't have to provide their own? This seems like it would be convenient and allow us to provide thoughtful target voice suggestions.
>
> This would go back to the original question, "How do you select the target voice?", which we don't have a clear answer to, yet

@fabiocat93 assuming we did have an answer to that question, would we want to provide access to the target voice through senselab?

@fabiocat93
Collaborator

> @satra @fabiocat93 @Veronika271 do we want to have any default target voices available through senselab so that users don't have to provide their own? This seems like it would be convenient and allow us to provide thoughtful target voice suggestions.
>
> This would go back to the original question, "How do you select the target voice?", which we don't have a clear answer to, yet
>
> @fabiocat93 assuming we did have an answer to that question, would we want to provide access to the target voice through senselab?

If there were an ideal target voice, we could provide a pipeline for downloading the reference speaker embeddings or some of their audio samples as part of the senselab procedure. But 1) we haven't identified one, and 2) the ideal target voice will depend on the user's goal and use case (e.g., children vs. adults; do we care about emotions? about content in terms of words? about xxx?)

@satra
Collaborator

satra commented Jun 10, 2025

instead of providing data, provide access to downloading data+metadata as datasets. many ml packages do that. those datasets could then be used to provide targets if users wanted to.
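A torchvision-style registry is one common way to do this. The dataset name, URL, and checksum below are placeholders, not real senselab resources:

```python
import hashlib
from pathlib import Path
from urllib.request import urlretrieve

# Hypothetical registry of downloadable voice datasets. Entries map a
# dataset name to a URL and a content checksum; both values here are
# placeholders for illustration.
REGISTRY = {
    "reference-voices-v1": {
        "url": "https://example.org/reference-voices-v1.tar.gz",  # placeholder
        "sha256": "0" * 64,  # placeholder checksum
    }
}

def fetch(name: str, cache_dir: str = "~/.cache/senselab") -> Path:
    """Download a registered dataset into the cache (if absent), verify its
    checksum, and return the local path."""
    entry = REGISTRY[name]
    dest = Path(cache_dir).expanduser() / f"{name}.tar.gz"
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(entry["url"], dest)
        digest = hashlib.sha256(dest.read_bytes()).hexdigest()
        if digest != entry["sha256"]:
            dest.unlink()
            raise ValueError(f"checksum mismatch for {name}")
    return dest
```

Users who wanted a default target voice could then fetch a curated dataset and pick (or average) speakers from it, without the library shipping audio directly.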

@fabiocat93 fabiocat93 marked this pull request as draft August 29, 2025 14:37
@ibevers ibevers added the help wanted Extra attention is needed label Nov 26, 2025