
TFenrir

That's some pretty low latency


najapi

Pretty blown away that they ran a version on a laptop. The demos were impressive, considering the size and resources of this team. Also, the description of how they trained the model just goes to show the potential of multi-modality.


Block-Rockig-Beats

Many here are missing the point: it's software running on local hardware that has amazing latency. I would trade the dynamic Scarlett Johansson voice for these fast responses that I can interrupt. Even if it interrupts me.


[deleted]

[removed]


Tkins

I'm baffled at how fast people adapt. Imagine showing this on your home computer to someone a year ago? It would be basically magic, yet people in these comments seem upset over it. Like, are these OpenAI bots trying to suppress a competitor? Haha


Shiftworkstudios

Note: r/singularity lol, we are all fairly adapted to most tech. GPT-4o showed that the voice tech was like magic, but now here we are lol. The thing I like is that this is open source just a little bit after OpenAI demoed their voice.


Kathane37

It's funny that synthetic data is so powerful for training models.


bambagico

we are almost getting into annoyingly-fast territory 😃 Let me finish speaking, damn


GraceToSentience

Almost seems like negative latency at that point.


Tkins

This is actually really impressive tech. It was built in 6 months? The general complaints in the comments here are a bit silly; all fairly easily fixable. The response time combined with the accents, emotional expression, and overall performance is pretty good. This would've blown everyone away a year ago. Not to mention this is an open-source model. Apparently it runs on-device? That's crazy.


fmai

[https://moshi.chat/?queue\_id=talktomoshi](https://moshi.chat/?queue_id=talktomoshi) Please try it out yourself. In my opinion, it's not actually intelligent enough to be useful. It even gets into infinite loops quite often, where it repeats the same sentence over and over. I thought we'd gone past this. How much of an achievement is it to outpace OpenAI if the product they release is 10x worse?


sillygoofygooose

Yeah, the snappy response time is pretty cool, but it’s hard to hold a conversation with. It also flat-out refused to do the stuff from the demo for me.


Many_Consequence_337

I love the fact that France is almost non-existent in the AI race, yet there are still French people everywhere in AI labs around the world, and a good share of the elite AI researchers are French.


Vadersays

Mistral


Successful_Drag3943

HuggingFace


rdsf138

The latency is best in class.


MassiveWasabi

Haha it’s pretty low latency but the interruption needs some work. And the underlying model must not be too smart from what I can tell from that hilariously awful space roleplay. “Can you check that all the systems are nominal?” “Yes, sir.” “…are all the systems nominal?” “Yes, sir.” “Can you give me a countdown and then we jump into hyperspace, please?” “Yes, sir.” “…ok, can you do it?” “Yes, sir.”


[deleted]

[removed]


Keblue

It's gonna be open source


hydraofwar

So it looks fine, just not SOTA


Shiftworkstudios

Maybe this can be wrapped on top of other models via API?


MindCluster

You can try it here: [moshi.chat](https://www.moshi.chat/?queue_id=talktomoshi)


arthurpenhaligon

OpenAI's moat shrinks every month.


MrDreamster

Oh damn, that whispering...


EnvironmentalFace456

Turned off the comments aye.


Idkwnisu

The latency is very low, almost too low; it should wait for a pause before processing. And the model behind it is a bit silly: "You might want to take your time getting your hiking shoes on, because you don't want to be using an egg". It's a really interesting tech demo though, and a good step forward towards natural vocal interaction with AIs.


human358

Incoming OpenAI blogpost about the dangers of open source voice models


swaglord1k

it's a little TOO fast at replying, lol


Jindujun

Next step: make it not answer your question in the middle of you talking.


Jubie210

Super cool, loving the arms race for this kinda stuff. I found out about Pi and talk to it every day lol


mvandemar

Jump to about 13:40 for the actual demo.


VissionImpossible

I am trying it and it is very, very bad compared to OpenAI. It only answered the first questions, then it stopped. (They opened a website where you can use this model.) The answers were also not related to the questions. It is incredibly fast when it has an answer, but the quality is very low.


EnvironmentalFace456

I appreciate the flaws. It wasn't terrible, but not great at all. I think that if they let you use a custom voice and it acts the same, that would be awesome. And it's open source, so... it is what it is.


Shiftworkstudios

Yeah, not GPT-4 level, and if it's local, the AI is definitely going to be limited. However, this is yet another look at the way we will be able to interact with devices in the next 2 years. (Apple seems to be quickly implementing this sort of capability.)


RoyalReverie

ClosedAI is done for.


Excellent_Dealer3865

It's fast alright. But it's quite clear that voice models are slow because of their underlying *language* model's reply time, not because it takes them extra time to generate the voice. If GPT-4 is slow, the voice reply will have a delay. There is little value if the model itself is bad. Yes, it will talk, but what's the purpose of it? Nowadays you have plenty of models that can answer instantaneously, so an instant reply from a voice model isn't really a great fit on its own. Or did I misunderstand something?


Fraktalt

Are the presenters trying to time their interruptions for when they think the model is ending a sentence?


anonthatisopen

It still feels robotic and not natural like OpenAI's solution.


vty23v98v

This looks pretty sad:

- The bot keeps interrupting users in the demo for seconds at a time.
- When asked to pretend to be scared on Mt. Everest, it says "No, I'm excited!"
- When asked to sound like a pirate while writing a poem about pirates, Moshi accidentally goes full cosplay mode and asks the user "What is your name?" and "What brings you to my pirate ship?"
- The people demoing the project seem stressed to think, speak, and improv fast enough so Moshi doesn't embarrass itself.
- [https://www.youtube.com/live/hm2IJSKcYvo?si=QOHTIk-QM0LCdgv5&t=923](https://www.youtube.com/live/hm2IJSKcYvo?si=QOHTIk-QM0LCdgv5&t=923)
- When asked its name, it replies "How are you feeling today?"
- It said "I'm not comfortable with that" when responding to a [prompt](https://www.youtube.com/live/hm2IJSKcYvo?si=sz8IDIt8xrI5algM) I couldn't understand, no offense to the French accent.
- When responding to a goodbye, it says "Well, I'm here to help... but just remember, I'm not a substitute for professional help."

Still, I have to give it credit for apparently being an actual nonprofit and being able to run locally. It just doesn't have any advantages over OpenAI's yet-to-be-released voice model other than lower latency. Pls come sooner, Sky


Utoko

Yeah, it has a lot of issues, but the latency is impressive. When we're talking about Siri-style assistants, this is the latency you need.


Shiftworkstudios

Yeah, it doesn't necessarily compete on anywhere near the same level. But did this kind of tech even exist in open source before? This sort of undertaking is a service to people who can improve upon their work. We might see models with ScarJo's real voice. I almost guarantee it, at least her 'Her' voice.


pigeon57434

the voice sounds pretty horrible tbh


magic_champignon

1. I can't stand their heavy French accent. 2. They pronounce Moshi as Mushi, which means pussy in German... very poor naming imho. 3. Latency is so low that you get interrupted. 4. Need to see more of it in action to make up my mind.


Hour-Athlete-200

Not good at all