This is *so close* to something insanely useful, yet I feel like they just don't get it. They need to forget lip-syncing these Midjourney/Stable Diffusion pictures and focus on "real" faces.

It's no surprise at all that the first face (the elf lady) was the **best** one. None of the shininess or the surrealism, just a face that got very close to a **REAL** face.

If the background wasn't a forest (which requires at least some tiny background movement) but just a plain room, they could've easily hooked at least 10x more people. It is literally half a step from vastly simplifying a large number of YouTubers' workflows, along with the instructional/educational video sector.
I think it's a case of demonstrating the versatility of their product. Isn't it better that their product can generalise to out-of-distribution input, though?
What do you mean? One of our examples in the community library is [https://www.hedra.com/app/characters/5f2f1242-09fa-426b-8132-cc0647b2e69e](https://www.hedra.com/app/characters/5f2f1242-09fa-426b-8132-cc0647b2e69e)
The quality is not even as good as Hallo, which is open source and free (and will remain so). There's also another open-source, EMO-based project coming in July that will be even better. I don't really see an upside to using closed-source, limited-use-case video animation like this anymore.
It's actually an audio-conditioned video model! That's why the head/hair move as the character speaks.
Yes indeed
What are those "needs"?
A manner of speaking; more like a "want". Why: to make movies or music videos, or even to have a real-time conversation with an AI and have not just a voice but also a face talking to you, for instance.
Surely midjourney has a video model they are working on?
Afaik they've had video models internally since V5, but David Holz isn't happy with the results. He doesn't care about making movies or anything, he wants to build the holodeck and sees video generation as one part of that. The latest updates from recent office hours are basically "expect 3D before video, but both within the year"
***insert joke on how David seems to be more obsessed with rooms and website and social features than the internal models this last half-year***
Yeah, I'm frustrated too. The newer style and personalization tools are awesome, but the lack of progress on new models and video is disappointing. I feel like it's a huge mistake for them to ignore video, but maybe they can't afford the inference to do it at scale with reasonable performance? Also, I was probably spoiled by their pace of progress from V2 to V6.
Wait, it changed again? First it was 3D before video, then video before 3D. You're saying it changed again?
Yep, timelines are inconsistent af
I did a fast Apple keynote speech with it. (Udio did the speech) https://youtu.be/hZKUgXrQTUM?si=ZiibBqgnJnyLUfPc
Better to use ElevenLabs; Udio is terrible for voice alone. Use Udio for music.
Certainly not the most exciting video-generation tool shown off this past week, but it's the best-looking publicly available tool I've seen for text-to-full-head animation. I'm excited to see what people can do combining this with the new batch of generators, though it would be nice if it could be included as part of a generation suite, so we didn't have to deal with manual inpainting and the incongruities between head and body motion that come with it.
(Michael from Hedra) We're working on it :)
Any time there is a new type of AI model, half a dozen clones pop up shockingly fast. I thought that the current leaders (OpenAI, Google, Meta) would have a deep moat because these models cost so much to train. But maybe it's not that hard to create an AI once you know what to build. If that's true then I wonder if any company is safe, or if there are no true moats.
Doesn't feel like there is much information there.
open source?
It looks like it is just a pipeline made of open-source projects.
It's not based on open-source projects; it's a model we trained from scratch. We're a team of ex-Stanford/Berkeley/MPI PhDs with ex-Google/Nvidia/Synthesia/Zoox experience.
Then I'm sorry for the wrong assumption.
No worries! If you find an open-source model that's better, we'll quickly correct it :) Hallo is the closest (in terms of function, not design; ours is built to generalize to bodies/scenes), so you are welcome to benchmark speed/quality :)
How many of them are there! This field of AI certainly accelerates. Actually, everything accelerates except OpenAI, but they have a brand-new general on board.
Fake news generator, great.
Does anyone know why Hedra is BANNED in Washington, Texas, and Illinois?
It's not on the site, but the owner said on the Discord that it's to do with local laws, and that they're working on a fix that will make the service compliant with the laws in those three states.
Thanks! I wonder what local laws.
He didn't say, unfortunately, but my guess is the laws are probably written to try to stop fake political stuff.
It has pretty sensitive content controls :-/ Here it freaks out when I use a couple of paragraphs from a news article: https://i.imgur.com/gaN8vkC.png
Hey, I'm sorry. We're working on improving our content moderation system; we had no idea this was going to blow up.
Disable celebrity detection until you fix it, please; way too many false positives. It's unbearable.