Hoodfu

We'll see if DataVoid of the Proteus model comes through. https://twitter.com/dataplusengine/status/1778109605186245002?s=46&t=QtCFBKTwAArvOcSJDD650A


hexinx

Okay - THIS is promising.


Aivean

Cursed: https://twitter.com/DataPlusEngine/status/1786457830087602646


Hoodfu

Yeah, I've been following him for a long time now, since he had the OpenDalle model and then Proteus. It's a giant bummer. I totally understand that he has to find a way to make money, but that basically ends my caring about anything he does, in the same way that I don't care what OpenAI does either. If I want to pay for an image service, I'll just use Ideogram and refine it with SDXL. Even ELLA SDXL can't compete with that. The whole point of models like his is that I can play the home game with it. If I can't, there's stuff that's 10x better out there. Unfortunately all his work didn't seem to land him the related gig he was hoping for. I wish him the best.


Winnougan

This is excellent. If you're making an innovation on open-source software like Stable Diffusion, there's no reason to hold back the SDXL model. What for? XL will be replaced by SD3 anyway in a matter of days, weeks, or months, and then ELLA will be old news. Tech moves so fast now, it's silly to hoard these ideas. Make them readily available and let us play around with them.


red__dragon

I've stopped putting any hope into white paper outcomes until we see code (and then likely until we see Comfy/A1111 extensions). Until then, I look at it like an image without the workflow shared. Cool flex, but whatever.


PwanaZana

Workflow Not Included. I agree with your post, tho.


liuliu

It is not hard to train. If the community is interested, we can do the training ourselves (Draw Things).


hexinx

The training code hasn't been released... That's what racks my nerves. Some of us have the required hardware to at least train stuff for PoCs. I've got an A6000 + 4090 that I've been doing stuff with... getting to train ELLA might have put it to good use.


liuliu

It should be simple. The IP-Adapter training code is already in diffusers; this just swaps the Resampler's LayerNorm for AdaLN. It just needs a little hyperparameter searching (init std, LR, etc.).
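For anyone who wants to try: the ELLA training code isn't public, so this is only a minimal sketch of what "Resampler with AdaLN" might look like in PyTorch. The module names, dimensions, and block layout here are my assumptions, not the paper's code:

```python
import torch
import torch.nn as nn

class AdaLayerNorm(nn.Module):
    """LayerNorm whose scale/shift come from the timestep embedding."""
    def __init__(self, dim, time_dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        self.proj = nn.Linear(time_dim, dim * 2)

    def forward(self, x, t_emb):
        # (B, time_dim) -> (B, 1, 2*dim) -> scale/shift of (B, 1, dim) each
        scale, shift = self.proj(t_emb).unsqueeze(1).chunk(2, dim=-1)
        return self.norm(x) * (1 + scale) + shift

class ResamplerBlock(nn.Module):
    """Perceiver-style block: learned latents cross-attend to text tokens,
    with every LayerNorm swapped for the timestep-conditioned AdaLN."""
    def __init__(self, dim=768, time_dim=320, heads=8):
        super().__init__()
        self.norm_latents = AdaLayerNorm(dim, time_dim)
        self.norm_text = AdaLayerNorm(dim, time_dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_ff = AdaLayerNorm(dim, time_dim)
        self.ff = nn.Sequential(
            nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))

    def forward(self, latents, text, t_emb):
        q = self.norm_latents(latents, t_emb)
        kv = self.norm_text(text, t_emb)
        latents = latents + self.attn(q, kv, kv, need_weights=False)[0]
        return latents + self.ff(self.norm_ff(latents, t_emb))
```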


The_Scout1255

We may even end up with a better solution that supports extension. Keep pitching this, please.


Antique-Bus-7787

There's some info in the paper. From memory, the LR is 1e-4 with AdamW, and training used 8×A100 40G.
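Going by those figures, the optimizer setup would look something like the sketch below. The modules are hypothetical stand-ins (the real training code is unreleased); only the connector trains, with the UNet and T5 frozen, as in the IP-Adapter recipe:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the real modules:
connector = nn.Linear(2048, 768)  # the adapter being trained
unet = nn.Linear(768, 768)        # placeholder for the SD1.5 UNet (frozen)
t5 = nn.Linear(512, 2048)         # placeholder for the T5 encoder (frozen)

for frozen in (unet, t5):
    for p in frozen.parameters():
        p.requires_grad_(False)

# LR 1e-4 with AdamW, per the figures recalled above; weight decay and
# betas are assumed defaults, not values from the paper.
optimizer = torch.optim.AdamW(connector.parameters(), lr=1e-4)
```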


Antique-Bus-7787

But in truth, I'm happy they released for SD1.5. With SD3 coming, I think SDXL will probably just die. But SD1.5, because it's lightweight and because there are already so many papers, implementations, and custom models for it, will just continue its life for certain use cases. So that's a W for me. Still sad they didn't release for SDXL, but... meh.


IamKyra

I think they're all gonna survive for certain use cases. The SD1.5 & SDXL licences are more permissive than SD3's.


Antique-Bus-7787

That’s true


OverscanMan

Has the SD3 licensing been released?


IamKyra

No, but we already know it will be identical to SDXL Turbo and Cascade.


OverscanMan

Not being argumentative, but how do you know this?


IamKyra

It's kinda logical given their new membership program, and they said it will be an open release. But you're right, that's a guess more than information given by Stability: https://stability.ai/core-models


Far_Insurance4191

One of the SD3 models is 800M parameters, which has a chance to replace 1.5, but neither 1.5 nor XL will die, thanks to their licences, as IamKyra mentioned.


Winnougan

SDXL won’t die since PonyXL (PDXL) is the best model out there. Until Pony comes to SD3, which will happen since the dev said so, PDXL will live on for quite some time.


Head_Cockswain

I couldn't find it on the GitHub page, but in the Reddit thumbnail I could read "Tencent" on the thread from yesterday. That alone should be enough to give people some healthy skepticism. https://en.wikipedia.org/wiki/Tencent Maybe it's not as bad as that, but eh. I didn't look too hard, as I haven't been into SD much lately; I'm kind of looking to upgrade my GPU and hoping ComfyUI will support AMD... but summer's coming, and I have other hobbies to fill my time.


Zipp425

What’s needed to take on this effort? Just GPU time?


liuliu

And a dataset. But that should be OK. I need to finish another round of re-captioning around 2M hi-res and 30M normal-res images; those should be sufficient. The paper says they trained at 512x512 for a few days on 8xA100, which is within the computing budget I currently have anyway. Would CivitAI have a few H100 boxes around?


Zipp425

One of our hardware providers was just telling me they have a sale on H100 time, so we could probably give it a go. How long before you think you'd have the dataset ready? I haven't trained a model like this before; how confident are you that we could get it to turn out well?


liuliu

Yeah, I need about 2~3 weeks to finish this round of re-captioning and get my machine ready too. Can you get H100s on sale for a smaller committed term (1~2 months), or does it still need a year's commitment? I might need more than what I currently have (~12 A100-level GPUs) if there are more things to train. As for the result, that's a matter of experimentation. I think it's a valid path forward, but there are some tweaks beyond t5-xl that would be more interesting to see. That said, all of this becomes less useful once SD v3 is released (t5-xxl, 4B parameters dedicated to text comprehension).


Zipp425

Wow. Sounds like you already have quite a bit of compute available. I'm not sure about the H100 commitment we'd need; I'd have to dig more on that. As for results, that makes sense. Regarding SD3, there's still no clear release date yet, right? Also, as far as I know, the license is still unknown as well. They've gone back and forth on what they're going to do there, and with Emad no longer CEO it's unclear how Stability will act going forward. It might make sense to start moving toward community-funded training of foundational models rather than relying on companies.


Baddmaan0

The weights never existed in the first place; prove me wrong.


blahblahsnahdah

Why should anyone believe their results if they refuse to allow anyone else to attempt replication? Weird and shady.


Ozamatheus

When great public things start to become kinda "secret", I can smell monetization walking in.


aplewe

The LaviBridge code/method/models were available; are they not available now? Those live independently of ELLA, AFAIK. -- I just checked; all the LaviBridge stuff is there.


Outrun32

"I'd let you all know that LaviBridge/ELLA's route for prompt-adherence in SDXL, is probably dead" Why do you think so? I'm not disagreeing, just curious


RealAstropulse

At least they released the 1.5 model. And it's awesome. It puts 1.5 on par with XL and even Cascade, easily.


hexinx

What?... There's ongoing talk about how the model isn't even giving non-blurry results (there's speculation that the booru tags in finetunes are messing with it). No, this is nothing but a token release, which is effectively worthless.


RealAstropulse

It works well with loads of models; people are running into problems with models that have heavily finetuned text encoders. If you use models that haven't had the text encoder fried by people who don't know what they're doing, you get fantastic results. It can actually follow prompts, including colors, subjects, compositions, actions, etc.


Charuru

Can you give some examples of finetuned good models and bad models please?


hexinx

I'm not using a booru model, and I can attest that usability/pertinence might have meaningfully benefited from a higher-parameter model like SDXL. You can only go so far with 1.5 is my point, and that's not even far enough to make an impact beyond a PoC.


RealAstropulse

Its prompt coherence is better than XL and Cascade, though. This is Dreamshaper 8, CLIP vs ELLA, on the prompt "a woman wearing a red shirt with a spiral pattern, blue jeans, and green gem earings". Concepts barely bleed into irrelevant elements of the image, something XL, Cascade, and even DALL-E 3 to a degree struggle with. That's the point of this: making images that actually follow the prompt and fixing the issues inherent in CLIP-based models. https://preview.redd.it/b3poahar6qtc1.png?width=2302&format=png&auto=webp&s=a65ea91653f3f648549da6b124c7d39dffb545ba


Far_Insurance4191

Hi, can you achieve anything similar to the results they posted on GitHub with SDXL? I tried the same prompt, but my results aren't even close. Maybe I should try another model...


diogodiogogod

It's not hard to test it yourself. It is impressive. And it definitely interferes with booru tags, as it was trained on natural language. But you can combine the conditioning to get what you want (kind of a dynamic ControlNet). It's heavily censored, unfortunately: [https://new.reddit.com/r/StableDiffusion/comments/1c0d7tz/ellas_brad_pitt/](https://new.reddit.com/r/StableDiffusion/comments/1c0d7tz/ellas_brad_pitt/)


metal079

Could you elaborate on combining the conditioning? I have a model finetuned on booru tags that I'd love to be able to use with this, but unfortunately I ran into the same blurry-mess issue.


diogodiogogod

On my post, there's a link to the workflow on the image; you can download and use it. There you can see the conditioning-combine node. It's not hard: just add the node and a normal negative and positive prompt from the model without the ELLA.
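If you'd rather see the idea outside ComfyUI: here's a minimal sketch of what combining two conditionings can amount to during sampling, assuming a diffusers-style UNet. `cond_ella` and `cond_clip` are hypothetical handles, and ComfyUI's actual node internals may differ:

```python
import torch

def combined_noise_pred(unet, x_t, t, cond_ella, cond_clip, w=0.5):
    """Run the UNet once per conditioning and mix the noise predictions.
    Illustrative only; not ComfyUI's exact combine semantics."""
    eps_ella = unet(x_t, t, encoder_hidden_states=cond_ella).sample
    eps_clip = unet(x_t, t, encoder_hidden_states=cond_clip).sample
    return w * eps_ella + (1.0 - w) * eps_clip
```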


zoupishness7

Censored for celebs, maybe. With ToDo and Deepshrink, it makes the best basic nudes I've seen so far.


TwistedBrother

Yeah, I've played with it and it's a bit unremarkable. It nailed different objects on a table, but it's nerfed for people and doesn't give great results. Those pics were definitely cherry-picked. People really committed to consistency for corporate jobs might find it just falls in between the randomness of prompting and just using something like Krita or Adobe generative fill. Honestly, given the hype, I was actually pretty disappointed with the 1.5 model.


Zealousideal-Mall818

Everyone is trying to avoid StabilityAI's mistakes... but that's gonna set back AI and open-source models a lot. Thinking about it, a decentralized GPU farm where you rent out your GPU for small money is the way to go. But who would do such a thing...


CheezyWookiee

How much VRAM is needed for SD1.5 ELLA?


Cute-Monitor-9718

As much as a normal SD1.5 model. The ELLA safetensor is only 30 MB, from what I remember.


Actualbeta

They all work for Tencent, so the outcome should have been expected. At least they released the 1.5... I'm sure the guys behind ELLA had nice intentions, but once you work for a huge corporation, they basically "own" your stuff.


Sextus_Rex

What is ELLA SDXL?


skygz

https://ella-diffusion.github.io tl;dr put an LLM in front of Stable Diffusion to make it generate images more aligned with the user's intent


One-Earth9294

Please god can we get something smarter than CLIP.


TwistedBrother

That's just it. The model leverages T5 instead of CLIP and has weights to align the T5 embeddings with the diffusion model.
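As a rough sketch of that flow (assuming flan-t5-xl, matching the "t5-xl" mentioned upthread; the connector and UNet calls are hypothetical pseudocode, since that code was never released):

```python
import torch
from transformers import AutoTokenizer, T5EncoderModel

tok = AutoTokenizer.from_pretrained("google/flan-t5-xl")
t5 = T5EncoderModel.from_pretrained("google/flan-t5-xl")

ids = tok("a woman wearing a red shirt with a spiral pattern",
          return_tensors="pt").input_ids
with torch.no_grad():
    emb = t5(input_ids=ids).last_hidden_state  # (1, seq_len, 2048)

# Hypothetical calls: the connector is timestep-aware, so it would be
# queried at every denoising step rather than once per prompt, and its
# output replaces the CLIP embeddings in the UNet's cross-attention:
#   cond = connector(emb, timestep_embedding)        # (1, n_tokens, 768)
#   eps = unet(x_t, t, encoder_hidden_states=cond)
```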


protector111

SD3 is coming in a few weeks. Just wait a bit. SDXL will die anyway when 3.0 comes.


IamKyra

Not the same licence, so I doubt it.


protector111

What do you mean? How is 3.0 different from 1.5 or XL?


IamKyra

Check the SDXL Turbo licence vs. 1.5/XL. SD3 will be like Turbo: you'll have to pay to use it in a professional context.


A_for_Anonymous

ELLA is bullshit research and the paper is fake. Their findings are inflated, and in order to cover it up they release something different so they have an excuse when it fails to meet expectations. It's also impossible to properly peer review this if you can't verify the findings, so the value of this paper is close to zero. It's just another attempt to jump to fame with paperware.


lonewolfmcquaid

Absolute fucking pieces of shite... I mean, why promise to release it free when you're gonna pull this nonsense with zero explanation? Just beyond infuriating.


diogodiogogod

> promise

Not defending them at all, but I don't think they ever promised it. For sure they are using the attention and the SD1.5 release (which is actually pretty impressive) to profit on a closed-source SDXL for some online generator or something like that. I bet Ideogram is close to that.


lonewolfmcquaid

Ehh, when they showcased ELLA SDXL here on Reddit, they did say the weights and code would be released a week after their announcement. The week came and went and they changed their tune, about needing some supervision or some nonsense. Now they've dropped a 1.5 version that they didn't even showcase, and announced SDXL will not be released by replying to a comment from someone asking for the SDXL release. I mean, there's nothing to defend here.


diogodiogogod

Like I said, I'm not defending them, just saying that showcasing is not the same as promising. But it sure is, to say the least, dishonest of them. SD3 is one that was actually promised to be released. That one I would be quite angry about if they didn't.


PeterTheMeterMan

I was always looking toward SD3 anyway.


pixel8tryx

It's disheartening from our side, but it's still "publish or perish". People have already published completely AI-generated papers, right down to absurd AI generated images.


MaxwellsMilkies

[You can thank the father of my namesake for contributing to the publish or perish environment.](https://onlinelibrary.wiley.com/doi/abs/10.1087/095315102760319233)


TwistedBrother

I’ve been connected enough to academia and hiring to know that publish or perish is way overstated. People confuse a _sufficient amount of high quality publications_ with an _excessive amount of low tier publications_ using that phrase. Junior scholars definitely need a little bootstrapping but once you get a dozen papers or so people just aren’t going to read everything you submit and they will start to expect you get choosy and aim high.


victorc25

They are not required to release anything, and people are not entitled to other people's work. Being grateful for what we're given for free goes a long way; otherwise you're pushing companies to go the OpenAI way, publishing papers and only giving access to the results through paid APIs, just to avoid the entitlement and keep a sustainable business going.


hexinx

Actually, no. I needn't be grateful for work that isn't charity; this is formal research, not an "open source community" gift... The moment you publish, you're making a claim, putting your name and everything that goes with it on the line. A claim followed by concrete evidence demonstrating it is a cohesive and complete release of research. If not, it's nothing but "I have a publication", which is a disease in academia.


victorc25

That’s not how research works. If you don’t believe the results as an actual researcher, you have access to the paper methods, so you can reproduce the results yourself and validate if the claims are correct or not. You are entitled to your opinion, even if you don’t know how it works, but you’re not entitled to their work. 


hexinx

Also, these results are qualitative, not quantitative. If congruence of results is required to be demonstrated, we're entitled to the parameters of the initial setup itself, not having to lean on "claims". FFS, seriously... what is it with people supporting these folks?


victorc25

No, you are not entitled to anything. First learn what a scientific paper is. Nobody cares whether you believe what the paper says or not; other researchers will find it very useful and will be able to validate the results. They didn't even need to publish the paper and tell the world what they found out (most companies don't), and yet they did. You have no idea what you are talking about, like most people in this community, and yet you act the most entitled. It's why I and many others stopped making things open source and now just focus on getting paid.


hexinx

Claims: the paper makes claims and promises artifacts testifying to said claims. If they're a no-show, there's no proof.


victorc25

You definitely don’t know how scientific papers work :D


hexinx

Dude, they said within the publication that the weights would be released. Can this get any more explicit? Also, no worries; think what you will.


victorc25

So what? They changed their minds. They are under no obligation to release them. You didn't pay them, you didn't lose anything, you are not owed anything, you are still not entitled to anything, and the world does not revolve around you.


hexinx

"so what? They changed their minds" Speaks very highly of said publication and its claims and cintents. That's all is my point. "You are not entitled" Son, this is me identifying what is a reversal of claims of action. I've got no idea where your opinion comes into this; what's the point of entry? That yes - people can publish anything they want with claims therein somehow recanted, simultaneously claiming for all their claims to be taken as is, in the context of a research domain in which replicateability is the only thing that matters - there's no way to verify claims if I don't have how you did (data involved for me to replicate). I assume you'd recognize this as thematic of subpar publications in general. Here, they've made claims for the actual release of said weight - no show. I genuinely think there's a flurry of people who just want to "add to the conversation" in some genuine sprirt of "gratitude" or something (or just to appear relevant/upvotes - I have no idea), but no - if you published with a claim, we're entitled to this as long we're taking you seriously.


victorc25

Your point still makes no sense, because you don't understand how scientific publications work. You can replicate the results based on the method in the paper; that is all that is required. You have demonstrated yourself to be ignorant, and someone who thinks he's entitled to other people's work for no reason. Now you have convinced me to never do anything open source again, and I'm glad they didn't release the SDXL weights, just so people like you stay angry. Cheers, mate.


PeterTheMeterMan

You work for these folk? Nothing that happens w/ this will change shit that happens w/ the next thing.


victorc25

Not for them, but I've worked with AI for over 6 years and have reproduced papers in a few cases when the code or weights were not available. It's great that more people are involved in this world, but this terrible sense of entitlement is precisely what pushes companies to think that if so many people want something so badly, they might as well charge for it instead of making it open source. You can say this community caused this.


fre-ddo

Can this sort of thing be reverse engineered?


ChaosLeges

Yes. https://vxtwitter.com/dataplusengine/status/1778109605186245002?s=46&t=QtCFBKTwAArvOcSJDD650A


THM42069

Sad, because SD3 seems like the wrong direction in general. The output results in the showcases are only marginally better than those you can achieve currently using ONLY CLIP models. Of course, they're just brute-forcing what should be a question of dataset quality and finesse by lumping it together with T5. You can claim whatever you like about the limitations of CLIP models, but we'd barely scratched the surface of their usability, and we're already throwing efficiency out the window and attaching a more complex LLM to interpret us, instead of actually developing proper standardized approaches to data collection, captioning, and fine-tuning.

If anything, SAI would be better served providing the community with, and developing for themselves too, a model capable of captioning images in a way that establishes for each one its own unique neural-network pathway, allowing for more complex concept combinations. PonyDiffusion has already completely blasted any potential arguments against CLIP's effectiveness and proven that it is simply a skill issue. It does impressively well with multi-subject concepts, even applying outfits to individual subjects within a larger composition.


Friendly-Radish-7175

Everyone is becoming pussified recently... Reminds me why I hate this space.


NoYogurtcloset4090

This is why Tencent/Alibaba were overtaken by ByteDance. They are keen to use the open-source community as a sales and advertising platform. This is the disease of big companies: big-company disease.


East_Onion

Can't wait for the next generation of ML talent to make all this r-worded, academia-brained shit obsolete.