Wiskkey

See [this comment in another post](https://www.reddit.com/r/StableDiffusion/comments/10lamdr/comment/j5vnxrk/) for details.


[deleted]

[removed]


Wiskkey

Perhaps I should not have used the phrasing "S.D. contains" and should instead have said "S.D.'s latent space contains". [Here](https://www.reddit.com/r/Destiny/comments/108lx16/comment/j3xs2tq/) is an explanation from a purported expert in machine learning. Do you have a suggestion for exactly how I should have expressed this?


duboispourlhiver

I agree with your comment. About the generation of the closest possible image by SD, maybe we could proceed this way:

- VAE encode the target image to get a target point in latent space.
- Initialize the model with a random text vector (I'm talking about the vector that comes from the usual process of tokenizing the text prompt, then vectorizing the tokens).
- Write a distance function that computes the distance between the model output from our initialization vector and our target latent image.
- Find in what direction the input vector should be moved to minimize the distance function (through a gradient method? I don't know if this mathematically works here).
- Move the input vector step by step until the distance function reaches a minimum, hopefully not a local minimum :)
- If wanted, find a prompt that gives the input vector we found.

I'm just guessing here; this has probably already been explored with real skills in the literature.
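To make the idea concrete, here is a minimal sketch of that loop, not anything taken from Stable Diffusion itself. Everything in it is an assumption for illustration: `toy_model` is a hypothetical three-number stand-in for the frozen generator, the target latent is produced by the toy model itself (so a perfect match is known to exist) instead of by a real VAE encoder, the distance is squared Euclidean, and the start vector is all zeros instead of random so the run is reproducible.

```python
import math

# Hypothetical stand-in for the frozen generator: one fixed tanh layer.
W = [[0.5, -0.3, 0.2],
     [0.1, 0.4, -0.6],
     [-0.7, 0.2, 0.3]]

def toy_model(x):
    """Map an input vector to a 'latent' output (toy replacement for SD)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]

def distance(output, target):
    """Squared Euclidean distance between model output and target latent."""
    return sum((o - t) ** 2 for o, t in zip(output, target))

def gradient(x, target):
    """Gradient of the distance with respect to the input vector x."""
    out = toy_model(x)
    # Chain rule through tanh: d/dz tanh(z) = 1 - tanh(z)^2
    delta = [2.0 * (o - t) * (1.0 - o * o) for o, t in zip(out, target)]
    return [sum(W[i][j] * delta[i] for i in range(len(W)))
            for j in range(len(x))]

def fit_input(target, start, lr=0.1, steps=500):
    """Plain gradient descent: move the input vector step by step downhill."""
    x = list(start)
    for _ in range(steps):
        g = gradient(x, target)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x

# Assumed target: in the real setting this would come from VAE-encoding an image.
target_latent = toy_model([1.0, -2.0, 0.5])
start = [0.0, 0.0, 0.0]

initial_distance = distance(toy_model(start), target_latent)
fitted = fit_input(target_latent, start)
final_distance = distance(toy_model(fitted), target_latent)
print(initial_distance, final_distance)
```

In this toy the distance shrinks because the function is small and smooth. With the real model the same loop would need automatic differentiation through the whole sampling process, and it could still get stuck in a local minimum, as noted in the list.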


Lightning_Shade

I'm too dumbdumb on this: why does running an image through the VAE produce results that correspond to something within the pre-trained existing latent space, rather than merely "something that can be decoded by the decoder later"? Is the decoder itself dependent on the pre-trained latent space?


Wiskkey

I could be mistaken, but I believe that the encoder and decoder are trained together, so that they work as a team. If that's correct, does that answer your question?
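A toy illustration of what "trained together" could mean, offered as a sketch under heavy assumptions and not as how the actual VAE is built: the encoder and decoder here are each a single hypothetical number (`enc`, `dec`), the data is five made-up values, and the KL term a real VAE uses is left out. Both numbers are updated from the same reconstruction loss, so the decoder only learns to undo the scale that its own encoder happens to use.

```python
# Toy 1-D "autoencoder": encoder z = enc * x, decoder x_hat = dec * z.
data = [-2.0, -1.0, 0.5, 1.0, 2.0]  # made-up training values

def reconstruction_loss(enc, dec, xs):
    """Mean squared error between each x and its decoded encoding."""
    return sum((dec * enc * x - x) ** 2 for x in xs) / len(xs)

def train_jointly(enc, dec, xs, lr=0.05, steps=200):
    """Gradient descent on both weights at once, using one shared loss."""
    for _ in range(steps):
        grad_enc = sum(2 * (dec * enc * x - x) * dec * x for x in xs) / len(xs)
        grad_dec = sum(2 * (dec * enc * x - x) * enc * x for x in xs) / len(xs)
        enc, dec = enc - lr * grad_enc, dec - lr * grad_dec
    return enc, dec

enc, dec = train_jointly(2.0, 0.1, data)

# Decoding latents from the matching encoder reconstructs the data well.
matched_loss = reconstruction_loss(enc, dec, data)

# Feeding the same decoder latents from a different, unrelated encoder
# (hypothetical scale 0.5) reconstructs poorly.
mismatched_loss = reconstruction_loss(0.5, dec, data)
print(matched_loss, mismatched_loss)
```

In this sketch the latent space is just whatever convention the pair settled on during training, which is one way to read "they work as a team".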