It sounds like they are just saying the same thing over and over?
But it makes sense. The chain is input text -> input tokens -> embeddings -> output tokens -> text -> speech. If you still have the output tokens from that chain, they're just a list of integers, so you could play a tone that corresponds to each integer.
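To make that concrete, here's a rough sketch of the token-to-tone idea in Python. Everything in it is my own assumption, not anything the system in question is known to do: the sample rate, the base pitch, the 48-note range, and the function names are all made up for illustration. It wraps each token ID onto a semitone scale and renders a short sine burst per token.

```python
import math

# All constants below are assumptions chosen for illustration.
SAMPLE_RATE = 8000   # samples per second
BASE_FREQ = 220.0    # Hz, lowest tone (A3)
NUM_PITCHES = 48     # four octaves of semitones


def token_to_freq(token_id):
    """Map a token ID to a pitch by wrapping it onto a semitone scale."""
    step = token_id % NUM_PITCHES
    return BASE_FREQ * 2 ** (step / 12)


def tone(freq, duration_ms=50):
    """Return sine-wave samples in [-1, 1] for one short tone."""
    n = SAMPLE_RATE * duration_ms // 1000
    return [math.sin(2 * math.pi * freq * i / SAMPLE_RATE) for i in range(n)]


def tokens_to_samples(token_ids):
    """Concatenate one tone per token ID into a single sample list."""
    samples = []
    for token_id in token_ids:
        samples.extend(tone(token_to_freq(token_id)))
    return samples


# Hypothetical output tokens; real IDs would come from the model's tokenizer.
samples = tokens_to_samples([101, 2023, 2003, 102])
```

Since the IDs wrap modulo 48 here, lots of different tokens share the same pitch, so you couldn't decode the sound back into tokens. If the same tokens keep coming out, though, you'd hear the same little melody repeat, which would fit the "same thing over and over" impression.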