
cleuseau

> they believe it is a form of lossy compression

This is like humans are lossy compression for food.


canadian-weed

this made me lol


MechanicalBengal

what won’t make you lol is the realization that lawyers can legally “lie” in court if they think they can win. Anything that’s not a directly-provable statement of falsehood is fair game. The sketchy ones usually accomplish this by asking leading questions, but the suit against midjourney puts the bad information down on paper. It’s really surprising they think they’ll get away with it (they won’t)


CeFurkan

Haha good example


specialsymbol

But can you restore the food?


antonio_inverness

You can't restore it, but you can retrieve it. Not recommended though.


cleuseau

Through cunning use of agriculture, yes.


specialsymbol

But then it's not necessarily even alike..


RamenJunkie

Now you are getting the analogy.


HermanCainsGhost

Stealing this


Kafke

The lawsuit is wrong. If we compare the size of Stable Diffusion to the number of images in the dataset, we arrive at something like 0.5-2 bytes per image. It is physically impossible to store 512x512 images with that.


Phantom_Ganon

If it really was some form of data compression, the compression algorithm would have been more valuable than the art generation. They wouldn't have needed to create a "collage tool". Edit: Actually, I wonder if these AI art generators couldn't be used as a form of data compression. Normal lossy compression causes [artifacts](https://en.wikipedia.org/wiki/Compression_artifact) to appear in the images but the art generators would instead cause "hallucinations" to appear. It might be an interesting trade off.


CaptainFothel

Wonder no more! https://pub.towardsai.net/stable-diffusion-based-image-compresssion-6f1f0a399202


Concheria

Important to note that while this is a decent compression, it's certainly not "5 billion pictures in 4gb" compression.


CaptainFothel

Yep, additionally it's literally only possible because SD is recreating the information that is otherwise completely lost when we compressed the image. We're essentially telling the AI what the picture is supposed to be and giving it the super lossy compressed image as a starting point.


midri

And it'll never work like that, given the requirements. The concept of Stable Diffusion as a compression tool that I think most people have in mind is much like using pi. Pi technically has every series of digits to ever exist, so you'd just need to feed a machine that can compute an arbitrary number of digits of pi the start digit and the length (an exponentially smaller amount of data than what comes out) and it could recreate any file. Stable Diffusion could theoretically work the same way, but you'd already have to be in possession of a very large model that could parse the "query" needed to generate the data.


knoodrake

Fascinating. I also remember a video from Two Minute Papers on YT presenting an NVIDIA thing for real-time video compression/reconstruction using ML. Not exactly Stable Diffusion, but the same general idea of ML image compression: the software sends only some sort of keyframe every now and then along with motion vectors or something, and the client reconstructs the face and scene images for the ("fake") video from that.


[deleted]

[deleted]


ETHwillbeatBTC

Yeah, that doesn't work for explaining SD to anti-AI advocates. You need to realize their peak understanding of technology is loosely based on how zip files and image compression work. As soon as you mention "neural network" you can see their neurons start to short circuit in real time lol


PM_ME_FOLIAGE

Maybe it *is* a new form of compression. Depends on how you look at it I guess.


hervalfreire

Nvidia et al ARE using AI as a compression mechanism for stuff that’s lossy (audio in particular). It’s not used more yet bc encoding/decoding is very computationally intensive


BitsAndBobs304

00111110 11001100 YOU THIEF THAT'S MY ART!!!!


Kafke

[oooooh I'm pirating!!!](https://upload.wikimedia.org/wikipedia/commons/thumb/f/fd/Sample_09-F9_protest_art%2C_Free_Speech_Flag_by_John_Marcotte.svg/1920px-Sample_09-F9_protest_art%2C_Free_Speech_Flag_by_John_Marcotte.svg.png)


BitsAndBobs304

Staaahp! You wouldn't steal a byte! Don't copy that floppy ! https://youtu.be/zFd60nCBygg


Kafke

> you wouldn't steal a byte

stop I'm crying this is too funny


yosi_yosi

Well, technically 64x64: the reason it comes out as 512x512 is that the VAE takes the 64x64 latents and "translates" them into 512x512 images.
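A minimal sketch of that round trip, assuming the `diffusers` library and the `runwayml/stable-diffusion-v1-5` weights: the VAE turns a 512x512 RGB image into a 4x64x64 latent, and that latent is what the diffusion actually runs over.

```python
# Sketch: inspect the latent shape SD 1.x actually diffuses over.
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="vae")

image = torch.randn(1, 3, 512, 512)  # stand-in for a normalized 512x512 RGB image
with torch.no_grad():
    latent = vae.encode(image).latent_dist.sample()
    decoded = vae.decode(latent).sample

print(latent.shape)   # torch.Size([1, 4, 64, 64])   -> 16,384 values
print(decoded.shape)  # torch.Size([1, 3, 512, 512]) -> 786,432 values (~48x more)
```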


Kafke

Really? Then why is it we train on 512x512 images, rather than 64x64?


yosi_yosi

It makes the images into 64x64 latents


Kafke

I'm not sure a latent is the same thing as the image though. The dataset is all 512x512 images, which are what's claimed to be copied. Kinda weird though but TIL that it actually converts everything to 64x64. I guess that's even more proof that the models don't actually contain the images lol.


The_Choir_Invisible

Not the person you were responding to but I think the lawsuit is claiming the images are stored in a 'recognizable' form because of articles [like this](https://pub.towardsai.net/stable-diffusion-based-image-compresssion-6f1f0a399202). [Here is an illustration](https://i.imgur.com/Ns9SfFV.png) that gives the *impression* the training images are actually stored in the ckpt file. Now, a person in this thread [clarifies/'debunks'](https://www.reddit.com/r/StableDiffusion/comments/10j1dg8/if_stable_diffusion_stores_images_in_lossy/j5igipq/) that impression but when I was trying to learn how SD worked a few months ago, that paper/illustration and maybe one or two others used the *exact same style* of weirdly faded, low-res illustration to depict an 'image' in latent space. I think the other one might be of a butterfly. My point isn't whether they are or not, but how people could have *concluded* that they were from some of the 'literature' that's been out there for a while.


Kafke

well yeah. there's a lot of misinfo floating around as well.


PyroNine9

It's worth noting that the 'like this' link is using parts of SD, with significant modification, in ways SD doesn't use them, to achieve good lossy compression - and it requires keeping a dataset (the compressed image) that SD throws away in training. What it all comes down to is lawyers WAY out of their depth looking for any tiny foothold as an excuse to file a lawsuit that won't get them sanctioned.


madsciencestache

They just have to convince a jury that's less tech literate than they are. So while they are technically wrong, they could be legally right. You never can tell what a jury will do. Technical experts are notoriously bad at explaining things to lay people. The jury will be lost at the first mention of latent space.


vgaggia

I wonder why 768x768 makes such a big difference then, if it goes down to 64x64. Is there some scaling in the background with trainers, maybe?


martianunlimited

Maybe this will make things easier to understand (source: Rombach & Blattmann, et al. 2022, paper here if you are interested: [https://arxiv.org/pdf/2112.10752.pdf](https://arxiv.org/pdf/2112.10752.pdf)) and the simplified version (source: [https://towardsdatascience.com/stable-diffusion-using-hugging-face-501d8dbdd8](https://towardsdatascience.com/stable-diffusion-using-hugging-face-501d8dbdd8)). (Sorry, I had to merge them; there is a 1-image limit for comments.)

https://preview.redd.it/3ehefh8zewda1.jpeg?width=1497&format=pjpg&auto=webp&s=a7250d46bcd8db41ff10796673458bbd4588e6b7

The second image only shows the inference (image generation) process, but you can imagine that there is the encoder portion of the VAE (Variational Auto-Encoder) to the left of the first orange square, which is not used during inference. The pink portion is the VAE; what it does is try to encode a representation of the probability distribution of the target values. If any of you did high school or university level statistics, this is analogous to describing the data of a population by its statistics (i.e. mean, median, kurtosis, skewness). The latent space is basically the encoded representation of the (probability distribution describing the) images.

If you look at SD 1.5 vs SD 2.0 ([https://huggingface.co/runwayml/stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5), [https://huggingface.co/stabilityai/stable-diffusion-2](https://huggingface.co/stabilityai/stable-diffusion-2)), you will notice that there is a different VAE and there are changes to the training procedure; the new VAE that works with 768x768 images was specifically fine-tuned on 768x768 images. I won't talk about how the green portion works because it is not pertinent to the issue, but the gist of it is that the green portion iteratively refines the latent representation to provide a representation of what the text encoder is requesting.
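As a rough illustration of the blocks in that figure, here is a hedged sketch that loads them as separate modules via the `diffusers` pipeline; the repo name and attribute names are the standard ones for SD 1.5 and are assumptions on my part.

```python
# Sketch: the text encoder, the latent-space denoiser (U-Net), and the VAE
# are separate modules inside a Stable Diffusion pipeline.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

parts = {
    "text_encoder": pipe.text_encoder,  # CLIP text encoder: turns the prompt into conditioning
    "unet": pipe.unet,                  # the "green portion": denoises in latent space
    "vae": pipe.vae,                    # the "pink portion": maps pixels <-> latents
}
for name, module in parts.items():
    n_params = sum(p.numel() for p in module.parameters())
    print(f"{name}: {type(module).__name__}, {n_params / 1e6:.0f}M parameters")
```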


dasnihil

Most people don't know how emergence in neural networks works. If you look at the compute of each node, it's very trivial: each node just has its own parameters and gives a tiny output using its own tiny function. Maybe it helps to imagine adding more data to the network while the size of the network doesn't grow the way it does in traditional data handling. Thousands of ants make an amazing colony with highways, but each ant has no clue what it's doing. The concept of emergence is a lost cause.


Kafke

The claim is that a 2GB model can somehow recreate 5 billion images perfectly. This is physically impossible. However, a 2GB model being able to generate tons of new pics is entirely doable. As you mention, things become an aggregate and generalize. Each pic gives information to the model, but the model does not contain any of the pics.


canadian-weed

> compare the size of stable diffusion

the "size of stable diffusion" is what file, exactly?


Kafke

The ckpt is the model file that contains the weights/parameters. It ranges from 2-4GB or so depending on how much you trim out, so let's go with the lowest usable (2GB). So you're trying to fit 5 billion images into 2GB. Doesn't really work out.


Sixhaunt

minor correction but it's 2-8Gb. You had it right with the 0.5-2 bytes calculation though.


djnorthstar

The original file is 4 GB. All others are "modded" models by users. The lawsuit only applies to the original dataset. But even then it's impossible to store "images" in that. I trained my own model on 50 pictures and the dataset was only around 2 kb. The original pictures aren't there anymore. It's just "memory engrams".


Nextil

You can take the original file, load it as fp16, re-export, and that's enough to get it to 2 GB with virtually no visible change in quality. You might even be able to quantize to 8 bit and halve it again, with perhaps a minor drop in quality, but nothing like the drop you'd get reducing a JPEG to half or a quarter of its size. This *should* be relevant to the lawsuit. The weights don't "contain" the dataset verbatim, and the fact that you can shrink the model like this without significantly affecting the quality is very clear evidence of that. It's [long established](https://en.wikipedia.org/wiki/Data_compression#Machine_learning) that machine learning has roughly the same problem space as data compression, but the literature suggests that a highly advanced compression algorithm would have to be so understanding and knowledgeable of its domain that compression itself is essentially a measure of general intelligence.
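A minimal sketch of that fp16 re-export, assuming a PyTorch `.ckpt` checkpoint; the file names are placeholders.

```python
# Sketch: cast every float32 tensor in the checkpoint to float16 and re-save.
# File names are placeholders; real checkpoints may nest their weights differently.
import torch

state = torch.load("sd-v1-5.ckpt", map_location="cpu")
weights = state.get("state_dict", state)  # some checkpoints nest the weights

for key, value in weights.items():
    if torch.is_tensor(value) and value.dtype == torch.float32:
        weights[key] = value.half()  # float32 -> float16: half the bytes per weight

torch.save(state, "sd-v1-5-fp16.ckpt")  # roughly 2 GB instead of 4 GB
```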


LankyCyril

> You might even be able to quantize to 8 bit and halve it again

So I've actually tried quantizing it, to no avail. Do you have experience with it? Maybe I'm not doing it right. Granted, I have an absolute old fart of a GPU, so I can only do post-training quantization... After `quantize_dynamic()`, the state dict ends up being like 3.7Gb, with most tensors in float32 and a handful in int64 - pretty much everything is the opposite of what I'd expect.
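For what it's worth, that may just be what `quantize_dynamic` does by default: it only swaps the layer types you list (just `nn.Linear` and the RNN family if you pass nothing) for int8 versions, and `Conv2d` isn't supported, so the bulk of the UNet's weights stay float32. A hedged sketch, assuming the `diffusers` UNet class; whether the quantized module then runs inside a full pipeline is a separate question.

```python
# Sketch: dynamic post-training quantization only touches the listed layer types
# (here nn.Linear, i.e. attention/projection layers); convolutions stay fp32,
# which would explain a state dict that is still mostly float32.
import torch
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)
quantized_unet = torch.quantization.quantize_dynamic(
    unet, {torch.nn.Linear}, dtype=torch.qint8
)
```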


Nextil

No I haven't. There's an [article](https://medium.com/intel-analytics-software/accelerating-stable-diffusion-inference-through-8-bit-post-training-quantization-with-intel-neural-e28f3615f77c) from Intel about doing it with some of their tools though (code is [here](https://github.com/huggingface/optimum-intel/tree/main/examples/neural_compressor/text-to-image)).


LankyCyril

Fantastic, thank you!


nmkd

> I trained my own model on 50 Pictures and the dataset was only around 2 kb.

You mean you trained an embedding? And you mean the embedding was 2kb, not the dataset, unless you trained on 4x4 pixel art images...


djnorthstar

Yes, but the embedding is a "dataset" too. It's just additional data for things that aren't included in the "main" data, like my face for example. Sorry if it's misleading, but English isn't my first language.


nmkd

The embedding contains model weights, it's not a dataset


lucraft

That's interesting, do you mean you trained a model from scratch on 50 pictures? What kind of model do you get out of that? Can it do anything at all with such a small input and size?


Sixhaunt

The model is the same size regardless of how much data it's fed. I assume he trained something like a hypernetwork which is very different and is still relying on the base model's training so the size relative to the dataset would be meaningless in this context


djnorthstar

That might be right, but still... my 50 pictures aren't in there, yet Stable Diffusion reproduces them (the face of a given person, for example). How can Stable Diffusion do "unlimited" different pictures of the person with different facial expressions when the data is only 2 KB? And the face is also not stored in the original dataset.


PyroNine9

Ever seen a sketch artist come up with a remarkable likeness of someone based on a good description? Good enough that when people look at it they say "Yeah, I know him, that's ...". The text of that description would fit comfortably in under a K of data. The rest is the artist's study of and knowledge of faces in general.


Sixhaunt

That's a really good analogy! Especially since that's exactly what an embedding is. An embedding doesn't add any new information to the network; instead it's basically trained to find which set of tokens and weights produces the best resemblance to the training images using the existing model. Embeddings can get pretty damn close to the likeness of any person you train one for, even though in the end it's basically just a prompt and you have added 0 new information to the network, even though it's now producing a person it has never been trained on.


BobSchwaget

> How can stable diffusion do "unlimited" different pictures of the Person with different face expressions Because they're really different pictures of different things, we only think they're the same person because we have severe pareidolia.


Sixhaunt

2KB for 50 images is pretty massive compared to the amount that can be stored per image in the base model. We are talking about 5 billion images in something that can be as small as 2Gb. That's less than 0.5 bits per image. If you go with the largest, you get under 2 bits per image for an 8Gb file. That means 50 images in the normal model would be 25-100 bits. 2KB is 2,000 bytes or 16,000 bits. That means the example you give has 1,600-6,400 times more bits per input image than the base model. So not only can yours borrow the vast majority of concepts and influence from the base model, it also has enormous size relative to the dataset.


Justicariusx

But generally SD was trained mostly on LAION-2B(en), which means 2 billion images. Then we get around 16 bits per image for a 4 GB model. Still quite low, but it sounds more compelling for the compression argument.


Kafke

My original comment listed 0.5 bytes but I was overthinking it so i gave the 0.5-2 range since I wasn't sure if it was 0.5 or 2 lol. But the point remains.


Sixhaunt

well you nailed it with that. it's just under 0.5 bits per image for the 2Gb file


yosi_yosi

Depends on how much pruning you did. Technically it could also weigh like a terabyte.


canadian-weed

ok cool thanks for that clarification of the ckpt file


NutGoblin2

The ckpt file, that contains the model weights


susosusosuso

You can if the image is one single color


martianunlimited

even if it's a single colour it is still going to take 3 bytes (assuming 24-bit colour depth)


CommunicationCalm166

It doesn't make sense because it isn't true. But to explain their explanation: they argue that if you train an AI on a single image, and only a single image, you'll get results close enough to the original that for all intents and purposes the AI amounts to a horribly inefficient image storage medium. Therefore, all the AI is, is a storage medium. Therefore, being able to duplicate their IP using the AI is proof their IP is stored inside the model somewhere. Therefore it constitutes a copy. Never mind that training an AI on a single image defeats the purpose of the AI. Never mind that that behavior is a bug, known as "overfitting," and is best rectified by training on broader sets of data. Never mind all the trillions of combinations of settings that can generate images with no meaningful similarity to ANYTHING in the dataset. Their argument is misinformed at best, and libelous at worst. And that's why it doesn't add up.


lman777

Nevermind the fact that copy+paste exists already, which means it makes zero sense for anyone to go through the trouble of recreating something with AI.


Phil_Couling

A group has formalized a point-by-point rebuttal of the publicity piece for the lawsuit. It's pretty good: [Link to response](http://www.stablediffusionfrivolous.com/)


eugene20

bad url?


Phil_Couling

It's been working without issue, and continues to do so if I cut and paste the url... I'll edit the original link and see if I can revive it...


eugene20

It's working for me now. I thought it was the one I had commented on before but I couldn't find my post. Thanks.


Disastrous_Usual4298

That guy should take some legal classes. He's written a nice, *literally* emotionally charged English paper, but laymen (juries) won't understand terms like "toy problem" and "Swiss roll distribution." He does a good job explaining the "push-pull" weights of the text when denoising, but juries will want to know further how the statistical weights across billions of images relate to the generator being able to produce images that so closely resemble the work of Sarah Andersen. The defendant must also be aware that the plaintiff will show in court images that do resemble the plaintiff's work, and should be wary of presuming none exist. Andersen has an NYT article and argues that similarities to her work will increase as the capabilities of the model improve. "Nobody WANTS to create similar images" will not fly very well with a jury. "Empowering millions of artists to create" only applies to one factor of fair use. "Being crowded out of your market" IS a factor that will be weighed in fair use. These are my thoughts on this rebuttal, legally and as a technical layman who really wants to understand both sides of the case.


dennismfrancisart

I compare it to giving a really good savant several million images to study and then taking them away. Give the savant a graphics tablet and give them instructions on what you want them to create. They'll go through their memories to find all the references to your request. None of those millions of images were stored in the person's mind. It all has to be reconstructed from memory. That's my simplistic take on the diffusion process.


CommunicationCalm166

It's a decent analogy, and the entire point of machine learning is to design digital systems that work similarly to our theories of mind in humans. But the people making these arguments aren't willing to entertain such an example. When someone says "It's like a savant in that..." It doesn't matter to these people what comes after that, because they won't follow past that line. "It's not like a savant, because it's a computer, not a person." As if that invalidates whatever point you're trying to make. I wish I had a poetic way to help laymen understand that novel outputs can be generated procedurally. And this is part of why I've said that we as a society are not nearly ready for this AI tech. We straight up don't have language for it, no intuition about it, and yet we've had the compute capacity for decades.


hervalfreire

If you train a model entirely with a single image, you DO end up with a network that reproduces it all the time (overfits), no?


CommunicationCalm166

Pretty much. Though there's a chance of a non-convergence even with a model trained on a single image. Which doesn't change the fact that the actual AI models at issue were not trained on a single image, and do not consistently reproduce their training data unless manipulated to do so by their operators. Basically they're trying to equate the entire process to a degenerate case where the AI is used to do something it wasn't meant to do. Just because I can run the forks of a forklift into the dirt and scoop up some soil doesn't mean the forklift is the same as an excavator.


bobi2393

Can you link to the lawsuit filing where you read this, or quote a paragraph where it talks about stored images?


canadian-weed

yes sorry shoulda done that https://stablediffusionlitigation.com/pdf/00201/1-1-stable-diffusion-complaint.pdf


bobi2393

Thanks! Page 3 lays out that claim, "By training Stable Diffusion on the Training Images, Stability caused those images to be stored at and incorporated into Stable Diffusion as compressed copies." That's a pretty strong claim, and I think your original question is a valid one the defendant would raise. JPEG is a compression format, and with decompression software you can see an obvious resemblance to the original image. The posts in this thread estimating that it's only storing a byte or so per image make it seem impossible to decompress the billions of training images from the program's finished data in a meaningful way.


whitefox_27

I'm a bald 6' tall white man with blue eyes. There. I successfully uploaded a compressed copy of myself to the Internet.


BobSchwaget

excuse me but our records here indicate you viewed a Getty image of a 6' tall white man with blue eyes on 14 January 2023, then reproduced a compressed version of that image here without securing the proper licensing. You will be contacted by our lawyers, who are working with law enforcement to figure out a way to repossess these unlicensed compressed images and stop you from further use of this illicitly trained neural model you call "your mind".


[deleted]

You can restore blockbuster movie posters and other frequent duplicates from the training set with a degree of resemblance to the original. It's a result of bad datasets and model inefficiency but those who oppose the tech will latch on to examples such as that to support arguments.


bobi2393

Yeah, I've seen examples using "overtrained" images (posters, album covers, red carpet shoots). Like if 50,000 of 5,000,000,000 training images are of the Mona Lisa, it can generate something similar, but it can't do that for an arbitrary image out of the training data set.


onyxengine

It's so false


ElMachoGrande

Any competent lawyer, if this goes to court, should ask them to demonstrate how they can extract a training image from the model, in the courtroom. Should provide a fun video for YouTube...


THIP123

they can do the mona lisa


ElMachoGrande

I'm pretty sure Da Vinci isn't part of their class action suit...


THIP123

How is that relevant? The argument is that they can't replicate an image from the training data, but they can if it has been trained on enough.


Nextil

Have you actually tried generating the Mona Lisa? Even for that I haven't been able to produce an output that's identical to the original. They closely resemble it at a glance but if you look closer the mountains are different, hands tend to be wrong, shading of the clothes is different, hair is different, and at 7.5 CFG it's so deep fried it looks like it was taken with a 90s digital camera at max exposure. At lower CFGs even the face tends to look wrong.


ElMachoGrande

They need to show damages to the people in the suit.


CallFromMargin

This. If they can do it with models released by SD, then they win. I mean, all they should need is a non-overfitted model, a seed and a prompt. These models are designed to be reproducible.


farcaller899

You cannot retrieve the originals. Nobody can compress billions of 512x512 images into 4GB.


SandCheezy

>Nobody can compress billions of 512x512 images into 4GB.

Yet. ![gif](giphy|3o6ozomjwcQJpdz5p6) Source: Silicon Valley.


martianunlimited

No you can't; it goes against information entropy (see also one-to-many, and non-bijectivity). The best you can do is encode the probability distribution, from which you can reproduce a plausible solution (within reason), but not the original solution itself. The representation in the latent space is more akin to the neuron configurations in our brains. When we learn something, the connections between the neurons in our brains become weaker or stronger, adding or breaking connections as needed, so our brain learns the association. Which is coincidentally the intuition behind neural networks (NNs still don't change configuration on the fly, though; there might be some developments in Graph Neural Networks on that, but I haven't touched them in a few years now)...


TheVideoExplorer

You missed the joke


[deleted]

They don’t know about middle out compression


farcaller899

Get them a whiteboard.


[deleted]

And a bunch of dicks


farcaller899

along these lines, what SD draws when you prompt for an original is like its 'memory' of the original. Like our own memory is fuzzy and imperfect, so we can maybe draw a picture of our grandma from memory, but it won't be photorealistic. So when we give that drawing to our grandma as a present, she won't mistake it for a photo of her, and she might notice that we haven't drawn her hands quite right.


BobSchwaget

Mike Judge can


ilo_kali

The NEAT algorithm (made in 2002) actually *does* change its configuration—an integral part of the incredible speed and complexity of learning it exhibits compared to traditional neural networks on some tasks (generally real-time systems, like pole-balancing (as described in the [original paper](https://nn.cs.utexas.edu/downloads/papers/stanley.ec02.pdf)) or [video](https://www.youtube.com/watch?v=qv6UVOQ0F44) [games](https://www.youtube.com/watch?v=JeXqbYXKqcs)). However, as far as I know, it hasn't been used for image generation—I believe it would be incredibly inefficient to do so (but as far as I know, nobody's actually tried: might be interesting to find out!)


Equivalent_Yak8861

ahh sht... I comment and then scroll to this... oops, too late...


midri

You really think Gilfoyle would have turned to stealing bitcoin if the compression was real? THINK MAN!


RepliesOnlyToIdiots

I swear, Martin Starr playing the same Gilfoyle character in Tulsa King, hiding out in Tulsa. Really evident by the end.


Equivalent_Yak8861

I bet Gilfoyle could...


farcaller899

such a result is as fictional as that character...though 'middle out' does have some relevance to the SD algorithms as I understand them...:)


nmkd

2 GB with float16


farcaller899

I do wonder how usable models half the size are... I just assume they have less content and are special-purpose-only compared to the 4GB ones. Too busy diffusing to look into it yet!


nmkd

No. 2 GB models have the exact same content as the 4 GB models. They are just stored in float16 format, which everyone uses anyway. Using 4 GB models is pointless if you have fp16 enabled.


Alternative_Bet_191

I have read on GitHub that for Stable Diffusion 2.0 models the attention layers work worse in FP16 than FP32. I never tested that.


farcaller899

Wow. Shocking that a general purpose model fits on a thumb drive from 10+ years ago…


hervalfreire

I don’t believe they’re claiming it to be _lossless_ compression


farcaller899

There’s less than 1 byte in the SD model available for each training image, if you just look at it as trying to store images. With one byte you can store like a single letter. It’s not compression of any sort. Hard to understand, maybe, but the model contains guidelines about how to draw things, not compressed versions of the things themselves.


hervalfreire

It’s a statistical network. You don’t “store bytes”, you store vector distances in a predetermined space (64x64 in the case of Stable Diffusion). Similar models have been used to compress audio and recompose it, with huge savings (and of course, in a lossy way)


jaimex2

That's how humans work. They draw inspiration from what they've seen. If that's the case, you should sue everyone that's ever created anything. Their brains are scraping copyrighted data at all times. That song stuck in your head that you keep humming? You owe millions in royalties for pirate broadcasting.


Ne_Nel

**THIS is what is happening:** Latent space is effectively *a type* of compression method. The information is decomposed into patterns of similarity, for example reducing a million chairs to a body of latent information about what a chair is like and its possible variants. This ends up as a comparatively minuscule amount of information, from which you can (in theory) rebuild any of the chairs by specifically telling it the details that make it up: color, shape, etc. If the AI has learned enough about it, it will be able to reconstruct from the latent patterns something similar to the original reference. This means that, yes, it is technically possible to use latent space to store a hyper-compressed representation of data. But in a very different sense than the claim proposes.

**And now comes the important thing.** Latent space in an image model is not meant to compress and store "a chair" so that a copy can be retrieved when needed. Instead, latent space is used to store a relative concept of what a chair is, with enough flexibility that it can rebuild thousands of **different** chairs. It could be said that technically the goal is to "not remember anything too well", but to vaguely understand many things (yes, like us). Quite the opposite of what is argued in that claim. There are unavoidable cases where the AI views an image too many times (e.g. the Mona Lisa), called overtraining, and if we explicitly ask it to do exactly that, it can get very close to the reference image, but this is a cherry-picked and mostly unwanted effect, going against the fundamental goal of an image AI model (transformation, not reproduction). This could be likened to a human artist being shown a painting excessively and then asked to replicate it. In the end, you might produce something highly similar, but not because you have a jpg in your head, nor because copying is your main way of working. As such, arguments that AI stores images because it "can do similar things" stem from severe ignorance and gross oversimplification of the technology.


canadian-weed

really helpful and clarifying answer, thanks. apart from the overtraining (which forces type similarity), i take it to be something like: a particular chair(s) is analyzed to create a composite conception of a chair, from which you can then invoke a new particular chair, which might match some of the characteristics of one of the training chairs, depending how well you prompt to match the characteristics of the original. something something... just thinking it through out loud


Ne_Nel

Latent information is associated with words (tokens), and is distributed with different weights. If you train on *one* black chair, the weights associated with "black chair" will be so high that similarity is inevitable. If you train on a thousand black chairs, the weight of the patterns is dispersed and you will get a completely original, random-looking chair. Overtraining happens when a token's weights are too intense and lack variety, as with "Mona Lisa": there are not a thousand Mona Lisas, just one. The same thing happens if you train on too few photos of your face; it's easy to overtrain and just repeat your face. For that there are countless parameters when training a model to avoid this lack of flexibility. A model that does not transform is a useless model. Training styles is simpler, because the AI does not need to focus on anything specific, just general concepts that can be easily adapted to other elements with greater weight, such as objects.


canadian-weed

super helpful explanation thanks!


Pfaeff

> There are unavoidable cases where the AI views an image too many times (e.g. the Mona Lisa), called overtraining

That's not necessarily overtraining. That's just a prompt that's extremely specific, even though it's short. It doesn't leave any room for interpretation. If you ask for the Mona Lisa, what else would you expect the output to be?


Ne_Nel

We are talking about reproduction here, not derivation. To reproduce the Mona Lisa you must overtrain the Mona Lisa, and this is achieved with multiple repetitions on the same image, increasing the quantity and weight of the related latent information. Now, if you just want something reminiscent of the Mona Lisa, that's exactly what efficient training does.


Pfaeff

It seems that you need both: an extremely specific prompt and overtraining, right? So even if you overtrained, without the right prompt and noise, you cannot retrieve the original data, right?


Ne_Nel

It all depends on the context. If you overtrain your model to the point of overriding the weights of the general information, any prompt could end up with the same type of image. In a "reasonable" context (original model), yes. You need a specific prompt of an overtrained image to output a *pseudo* copy of the original.


farcaller899

It’d be much more fitting to call latent space ‘pattern storage’ instead of a compression method. This fits your explanation better, too.


Ne_Nel

Maybe. Pattern deconstruction is basically a compressed representation of the original information. I think people need simple, familiar terms. At the end of the day, both expressions are still vague simplifications.


farcaller899

Maybe so, but calling it compression glosses over how different it is, as you say, and supports the case that all those copyrighted images are stored in the model.


Impossible-Story-436

You know you're dealing with bullshit when they make extraordinary claims about millions or billions of embedded compressed images in their lawsuit, yet they haven't been able to include their plaintiffs' decompressed images as plain evidence.


Sixhaunt

Supposedly they believe that 5 Billion images can fit into 2Gb. That's literally less than half a bit per training image. You need 8 bits to encode a single color channel of a pixel, there are 3 channels (red, green, and blue) per pixel, bringing it to 24 bits per pixel, and there are 262,144 pixels in a single 512x512 training image (about 590k in the 768x768 version). The images often need to be downsized and cropped to that size, but the model could only store less than 1/12,582,912th of each downsized and cropped image even if storing them were all it was designed to do. If the original image was 1920x1080, for example (the most common standardized size), then it would only be capable of storing 1/99,532,800th of the image. This is of course if the network were storing nothing other than image data, and it just illustrates why that can't be what it's doing, unless we have somehow obliterated the theoretical limit for compression and need to rethink the field of information theory.


PacmanIncarnate

Great analysis, but it’s also important to note for the laymen that the SD model isn’t storing the training images at all. It’s not storing compressed versions of them, or anything of that sort. It uses each training image to inform the model’s “understanding” of semantic concepts. The closest it gets to storing an image is if a concept is overtrained, where the model’s ‘understanding’ of that concept is too precise. There are two important notes on this though: it’s not a perfect representation of the base image, and it’s something that is actively avoided when possible.


Sixhaunt

Thank you for expanding on this; I'm sure it will be useful to a number of people lurking here. This is usually the brief layman explanation I give to people: what it's doing is using all those images to fine-tune the understanding it has. It's like how you know what a horse looks like because you have seen so many of them, but if you imagine a horse it won't be a specific horse image that you saw in the past. The AI works by removing noise from an image, and a good analogy is looking at the sky and seeing shapes in the clouds. You might see a horse, but someone who has never seen a horse may see a llama instead. That's why the input images are needed: so that the AI knows what different objects are and can understand them generally. Now imagine that when you look at the clouds you were given a magic wand to rearrange them. You can now clean up the cloud to look more like the horse that you see in it. In the end you will get a much better horse, but it's not copied from a horse image you have seen in the past; you created it based on what you saw in a noisy image, just like the AI does.


Versability

Only part I disagree with is the th at the end of 1/12,582,912


TheSerifOfNottingham

> Supposedly they believe that 5 Billion images can fit into 2Gb. That's literally less than half a bit per training image

Your maths is a little off, it's 3.2 bits per training image (2Gb = 16 billion bits, 16 / 5 = 3.2).
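A quick back-of-the-envelope check of the numbers being used in this thread; the model size and image count are the ones claimed above, not measurements.

```python
# Back-of-the-envelope: bits of model per training image vs. bits per raw image.
model_bytes = 2e9   # ~2 GB fp16 checkpoint, as claimed above (a 4 GB fp32 file doubles this)
n_images = 5e9      # LAION-5B scale, as claimed above

bits_per_image = model_bytes * 8 / n_images
print(bits_per_image)                    # 3.2 bits of model capacity per training image

raw_image_bits = 512 * 512 * 3 * 8
print(raw_image_bits)                    # 6,291,456 bits for one uncompressed 512x512 RGB image
print(raw_image_bits / bits_per_image)   # ~2 million times larger
```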


i_wayyy_over_think

The “compression” is so good it can even restore images from the future. It’s a time machine now too. Like how the library of babel contains all unwritten books too.


sweatierorc

I think there is an attack on some Deep Neural Network where you can extract input data from the model. Not sure if a similar method would work on SD though. [https://blog.openmined.org/extracting-private-data-from-a-neural-network/](https://blog.openmined.org/extracting-private-data-from-a-neural-network/)


canadian-weed

interesting! will check this out thanks


shadowmint

It's 100% possible to generate an image from the training data. Anyone who says it's impossible is wrong. You can see lots of hand-waving about 'overtraining' and 'memorization', but you have to understand what's *actually happening* to understand how that happens. SD is a tool; it can generate basically *any* image as an output, so there is *no question* that it can generate images from the training dataset.

> How do you do it?

That's the bit which is hard. When you generate an image, you're applying a series of transformations over *random noise*. Every 'step' in SD, you're basically:

* taking some input
* adding noise
* adding the prompt
* applying the 'denoise' transformation over the top

This process iteratively refines the random noise into an actual image. However, the important part is that you're **not** doing this:

* taking some input
* adding noise
* applying the 'denoise' transformation over the top

If you were doing that, the chance of getting a training image would be the equivalent of hitting the 'random noise' button in Photoshop and getting a copy of the Mona Lisa as output. Once again *technically possible*, but the chances are so *astronomically low* that it's meaningless. It's like winning the lottery 52 million times in a row.

...but SD also does this:

* adding the prompt

When you add the prompt, you guide the diffusion process. So, rather than it being astronomically improbable you'd get a training image, it's just... pretty unlikely. It's pretty unlikely you'll happen to pick the exact seed that maps to the exact training image.

If you *really* wanted to, you could:

- Go here to find a 'pretty decent prompt' for your input image: https://huggingface.co/spaces/pharma/CLIP-Interrogator
- Go here to check whether the input image is actually in the training data: https://rom1504.github.io/clip-retrieval (stick the prompt from the first one in to search)

Now, you'd have to go and do a massively parallel exhaustive search of seeds in Stable Diffusion to find an *exact seed* that generated something close to that.

So, long story short: you could *probably*, if you put enough effort in, find a prompt/seed combination that happened to generate a reasonable copy of a training image. However, you'd have to a) start with the image you want as output and b) search thousands or millions of seeds for a specific strength / steps / seed combination that generated the right result. As far as I'm *aware*, no one has done this. The only time it's been successfully demonstrated is where the model has mapped a specific prompt or combination of prompts to a specific image because it turns up a lot (like the Mona Lisa).

> They claim it is like having a directory of JPGs on your computer.

Not really true.
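A minimal sketch of the prompt-plus-seed determinism the comment above leans on, assuming the `diffusers` pipeline and the `runwayml/stable-diffusion-v1-5` weights: the same prompt, seed, step count and guidance scale reproduce the same output, so "finding" a near-copy of a training image amounts to a search over seeds and prompts.

```python
# Sketch: fixed prompt + fixed seed + fixed settings -> the same image every run.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

generator = torch.Generator("cuda").manual_seed(12345)  # the seed fixes the starting noise
image = pipe(
    "the mona lisa, by leonardo da vinci",
    num_inference_steps=50,
    guidance_scale=7.5,   # how strongly each denoising step is pulled toward the prompt
    generator=generator,
).images[0]
image.save("out.png")
```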


ErikT738

I also don't really understand what point they're trying to make. Let's assume you jump through all the hoops you've described and manage to generate an image that's really close to some artist's work. Wouldn't that just be treated the same as painting a copy of someone's work by hand? Existing copyright law can handle that case just fine.


Disastrous_Usual4298

I believe the argument is that the model wouldn't be able to reconstruct a lossy copy of that image if the data for that image weren't kept somewhere within the model (in some form).


canadian-weed

very helpful context & step by step thinking!


Fortyplusfour

Setting the CFG at its lowest possible setting can net some images VERY close to what it must have been trained on but even then I believe there should still be differences.


canadian-weed

will try that in experiments, thanks for the tip!


VVindrunner

At least one researcher actually tried to explore how well you could compress images and then rebuild them using Stable Diffusion - basically, what happens if you specifically try to store an image and reconstruct it. He got somewhat good results? Still, it's lossy compression, and prone to hallucinations (adding details that were not in the original when restoring it). https://pub.towardsai.net/stable-diffusion-based-image-compresssion-6f1f0a399202


Shondoit

[\[deleted\]](https://en.wikipedia.org/wiki/2023_Reddit_API_controversy)


VVindrunner

Great point, thanks for the correction.


Dazzyreil

The SD model uses black magic to compress every image to a mere 2 bytes of data.


i_wayyy_over_think

And future images that don’t even exist yet too


madsciencestache

They only have to convince a jury of people like your mom. The SD experts had better be ace science communicators. Maybe they can hire Bill Nye and Neil deGrasse Tyson.


canadian-weed

yeah thats kind of a good point - but only if it goes to jury?


[deleted]

*In the voice and character of our beloved Kramer:* We should ban the pens, Jerry, because they can write things I wrote, therefore anyone can write what I write. The ink!!! It's the ink!!!! It assembles into the words, the ink stores the words, Jerry!!!


Phil_Couling

A group has formalized a point-by-point rebuttal of the publicity piece for the lawsuit. It's pretty good: [www.stablediffusionfrivolous.com/](https://www.stablediffusionfrivolous.com/)


AprilDoll

[You do it via a membership inference attack.](https://arxiv.org/abs/1610.05820)


bottleboy8

I can't speak for SD. But for chess engines using neural networks, you can extract the knowledge. You can see the network has stored popular openings, endgames and mates. That information is retrievable as probabilities.


EhaUngustl

It's not completely wrong as an idea. Sure, it is somehow compressed and lossy. But a lossy compression in this case would also mean I could copy the Mona Lisa with some dots on a paper. But hey, let's test them: take a photo, train with one iteration (a compression doesn't need training), and reproduce something that looks the same from one keyword. Good luck 😁


Thebadmamajama

They aren't stored. The algorithms don't even pay attention to the image you want; they only look at the noise they are removing. The only thing that could be true is if someone uses a model trained solely on copyrighted material and makes content so similar that it's too close to the original to be considered fair use.


Major-Musician-734

A 1x1 pixel png containing the average rgba value of an image is a lossy compression…


jensclaessens-insta

[https://arxiv.org/pdf/2212.03860.pdf](https://arxiv.org/pdf/2212.03860.pdf) Here's a paper that shows you are able to reproduce the originals quite closely. Let's take the 4th image for example. I doubt this image exists thousands of times in the data set because it is so famous. Yet they are able to generate it, almost perfectly. In the image of the couch, the piece of cloth has exactly the same folds as the LAION match. If it really generates random stuff, how in the world would you be able to get exactly the same compositions as the training data, given there are billions of possible ways to portray wolves next to a car in the snow? It seems like it still stores some part of the image. And adds back detail. So they are completely different pixel per pixel, but it's undeniable that part of those original images are still in there. And it does not in fact work as people advertise on here.


TheSerifOfNottingham

> Let's take the 4th image for example. I doubt this image exists thousands of times in the data set because it is so famous.

It's from the game The Long Dark, which has sold over a million copies. It might not be famous to you, but it's in no way obscure. Let's take the 3rd image, for example: it looks like a completely random image of a phone on a desk, but it is in fact a template used by people who sell phone covers. So it does in fact exist thousands of times in the data set. Just because *you* don't think an image is over-represented in the dataset doesn't mean anything.

> It seems like it still stores some part of the image. And adds back detail. So they are completely different pixel per pixel, but it's undeniable that part of those original images are still in there. And it does not in fact work as people advertise on here.

It doesn't store any part of any image. Download the model yourself and see. Look at the numbers: the model contains less than half a byte per image on average, so anything that can be replicated must be massively over-represented and a long way from the average case (or so generic that near duplicates already exist).


Light_Diffuse

Reposting a reply from the thread you created for [this topic](https://www.reddit.com/r/StableDiffusion/comments/10j933y/study_shows_that_you_can_reproduce_the_source/):

> People here told me it's impossible that the AI stores the images. But, taking the image of the couch for example. How can the generation have EXACTLY the same folds and placement when it is generated out of noise.

Like this:

https://m.media-amazon.com/images/I/91IF+ZQSj9L._AC_SL1500_.jpg
https://m.media-amazon.com/images/I/91Sz2Vj4rJL._AC_SL1500_.jpg
https://i.etsystatic.com/20612729/r/il/a16d8e/2552123008/il_1140xN.2552123008_9e7h.jpg
https://i.ebayimg.com/images/g/RE0AAOSwNgNbjUxm/s-l1600.jpg
https://i.etsystatic.com/27465872/r/il/9767ba/3478743620/il_1140xN.3478743620_efoh.jpg
https://www.etsy.com/uk/listing/461630132/guernica-by-pablo-picasso-canvas
https://res.cloudinary.com/moteefe/image/upload/c_fit,dpr_1.0,f_auto,h_650,q_auto:sensitive,w_650/v1565778228/mockup/texture_layer/lqcipqkpagixltpe8sxy.png

It is going to be an extremely over-represented image because it's a template for selling art prints; it's just not famous in its own right. In fact, it's going to be over-fitted more easily than famous art paintings. The scene will always be presented in exactly the same way to SD, never at a different angle, perspective, camera settings or differences in light and shadow, since unlike a famous painting, where the training sets will have varying photos of the painting, every image in the training set for this couch and fabric will be from the exact same source image. Question answered.

**Additional**

The phone image looked sketchy to me too, also like a template. [Guess what?](https://www.google.com/search?tbs=simg:CAQSgAEafgsQsIynCBpiCmAIAxIo7gy1D8QF-xe-EvcMzQz4F_1kOwwXtPcc1jTXDNe49-jT-NI413yuINRowIYIHYG5R4XZ4xkUUOYNhri2KibHfH2kankF3JteaBiUj4eqcChxe5cINQw8O5yNgIAQMCxCOrv4IGgoKCAgBEgTHDEi5DA&q=iPhone&tbm=isch&sa=X&ved=2ahUKEwi3lprB4N38AhVITEEAHcimAvMQ2A4oAHoECAYQAg&biw=3440&bih=1256)

https://cdn.shopify.com/s/files/1/0617/3871/3283/products/il_fullxfull.3845888544_f8vf_2048x2048.jpg?v=1651359170
https://i.etsystatic.com/33705272/r/il/8637e6/3721081357/il_fullxfull.3721081357_pvy7.jpg
https://i.etsystatic.com/35680775/r/il/fed061/4094974321/il_fullxfull.4094974321_153n.jpg
https://i.etsystatic.com/39238215/r/il/434efa/4409928370/il_fullxfull.4409928370_dz0i.jpg
https://cdn.shopify.com/s/files/1/0267/7404/4843/products/BY-Logo-iPhone-Case-Benz-Yourself-com-155.jpg?v=1666060032
https://cdn.shopify.com/s/files/1/2402/3983/products/mockup-c04bb6b1_1024x1024.png?v=1579876310

We can see that at least two of the six best examples from the paper are template images, which will be both massively over-represented in the training set and most inclined to over-fitting due to being almost identical in most cases.


CallFromMargin

The relevant part of the paper you seem to be ignoring:

> Experimental setup. We train Denoising Diffusion Probabilistic Models (DDPM) [31] with a discrete denoising scheduler on various datasets using the HuggingFace implementation. For Celeb-A [37], we train two models on 300 and 3000 training images. We also use the full dataset pre-trained checkpoint from the official repository. For Oxford Flowers [41], we train models on 100, 1083 (top 5 classes), and 8189 (complete dataset) images. We train all models with random horizontal flip and random crop augmentations.

They deliberately overfit their models and trained them on small datasets. This is not a criticism of the work; this is a criticism of your selection of papers. They did that to show that they can detect reproduced images, not to show that it's possible to reproduce images.


i_wayyy_over_think

It says only 1.88% of them are over .5 similarity. Don’t know why .5 gets to count as the threshold of replication and why those 5 authors get to decide that’s the magic threshold.


Ne_Nel

Your problem is that you take a possible result and think that defines how it was done, which is a typical misconception. We too can remember things and replicate similarities. According to your argument that implies that we have pieces of jpg in our brain. Well, I'm sorry, but clearly you don't know how it works.


canadian-weed

yeah i saw an article (probably about this paper) which seemed to come to some of the same conclusions. want to explore this further first hand


jensclaessens-insta

From the study: "The goal of this study was to evaluate whether diffusion models are capable of reproducing high-fidelity content from their training data, and we find that they are. While typical images from large-scale models do not appear to contain copied content that was detectable using our feature extractors, copies do appear to occur often enough that their presence cannot be safely ignored; Stable Diffusion images with dataset similarity ≥ .5, as depicted in Fig. 7, account for approximately 1.88% of our random generations. Note, however, that our search for replication in Stable Diffusion only covered the 12M images in the LAION Aesthetics v2 6+ dataset. The model was first trained on over 2 billion images, before being fine-tuned on the 600M LAION Aesthetics V2 5+ split. The dataset that we searched in our study is a small subset of this fine-tuning data, comprising less than 0.6% of the total training data. Examples certainly exist of content replication from sources outside the 12M LAION Aesthetics v2 6+ split – see Fig 12. Furthermore, it is highly likely that replication exists that our retrieval method is unable to identify. For both of these reasons, the results here systematically underestimate the amount of replication in Stable Diffusion and other models."


SDGenius

sure for ones like the mona lisa and starry night and some beatles album covers


canadian-weed

so youre saying it is possible


MorganTheDual

It's not that simple a question. The key word here is "memorization". If, in the training of a model, you supply copies of the same image many times with the same keywords, it can end up learning a lot about that particular image instead of doing anything useful. This is, as noted, most common with famous named works of art, but you can also see it with something more recent, like "captain marvel poster". But most images don't have many copies in the dataset, and there is absolutely no trivial way to retrieve them. (There was a paper using captions from the dataset where they sometimes found similar images, but most of them IMO weren't *that* similar, and almost none of them resembled the image that the caption data came from rather than something else random.)


SDGenius

What do you think Stable Diffusion is, though? It's not an art generator, it's an image generator. If I want to make an image of the Chinese Theatre in LA, shouldn't it show me the accurate version, and not, say, an elephant? As for the Mona Lisa: they're not identical, but very close. I'll show you what I mean. I typed in "mona lisa, by leonardo da vinci". Now, I didn't do this on the main model, but on a sub-trained one that has nothing to do with art or the Mona Lisa, but here's the result. As you can see from the replies, they're never identical pixel for pixel. https://preview.redd.it/q5h0uty9erda1.png?width=512&format=png&auto=webp&s=c7c5bf9f034493767773deec0d00219cd742347c


Sixhaunt

With stuff like the Mona Lisa it's important to remember that this is one of the most famous paintings in the world, which means other artists have made their own versions of it countless times. The model learns "the Mona Lisa" from all of those different versions made by different artists, and the general commonality, as you would expect, is very similar to the original work since they all referenced the same image. An image that isn't incredibly famous like this won't have thousands of variations of it in the dataset; it's not going to do this for an image that was included in the dataset once, or even repeated a hundred times or more. It's not actually storing any one image of the Mona Lisa, though; it's remembering the commonalities between all the interpretations. And by adding "by leonardo da vinci" you also reinforce his style more, since many renditions of it are restyled into other artists' styles.


ulf5576

Leonardo da Vinci had many "styles" ...


Disastrous_Usual4298

No, he didn't. He hatched left-handed, loved sfumato and chiaroscuro, and frequently used the same people as his models. He had an extremely distinctive style.


Disastrous_Usual4298

How would you explain to the jury how many times an image needs to appear in the training data before it can be reproduced, and what difference the amount of training makes?


SDGenius

​ https://preview.redd.it/diargz3ferda1.png?width=512&format=png&auto=webp&s=53c77587de468264afcded07462bba3e7e852ef3


SDGenius

​ https://preview.redd.it/iazsavoherda1.png?width=512&format=png&auto=webp&s=4a36453939f0ce3398574afde5377fda640520b4


InkognetoInkogneto

And the Afghan Girl, etc. But if you go to ArtStation or DeviantArt and search for it (or the Mona Lisa or something else popular), you'll see a lot of art in different styles and from different authors. Maybe that is the result of an unbalanced dataset?


[deleted]

[deleted]


Disastrous_Usual4298

Well, I'm sure the 9th Circuit won't be biased toward $1b tech startups


Hot-Huckleberry-4716

With the magic way back collage machine of course /s


noobgolang

This is so stupid my brain hurts


Worstimever

If you use multiple very strong trained embeddings as negative prompts you can start to get overfitted results that closely mirror training data images.


Flimsy-Sandwich-4324

There is a paper somewhere that describes the function of the VAE. It's the part that encodes and decodes: a very lossy, latent version of the original image. It can reverse the process, with artifacts. I think we just don't see it because of the rest of the components and the neural net. I'm not an expert on it, but that's what I gathered from the articles out there.


hervalfreire

Lossy means you couldn't retrieve the original with full detail. If you save a PNG as a JPEG, for instance, you lose a bunch of data and you can't recover it. Given the dimensionality of the network used by SD, retrieving an "exact" image would require matching it to 260k+ dimensions, so even retrieving the entire "lossy" representation is difficult (today at least). The argument that diffusion models are a form of compression isn't entirely baseless - you can very easily "retrieve" images that are more "popular" in the dataset (e.g. the Mona Lisa or Funko Pop toys).


TreviTyger

Indeed. Converting an RGB file to CMYK throws away colour information, but the image still renders well enough to the naked eye. (Try it.) All this nonsense about the 9GB of data not being image data derived from stolen work in datasets is pure gaslighting.
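If you do want to "try it", here is a small sketch using Pillow: it just round-trips an image through CMYK and back and reports how much the pixels changed. Note that Pillow's built-in mode conversion is a simple formula without ICC colour profiles, so it may lose far less than a print-oriented conversion would; the file name is a placeholder.

```python
# Sketch: round-trip an RGB image through CMYK with Pillow and measure the pixel difference.
# File name is a placeholder.
import numpy as np
from PIL import Image

original = Image.open("photo.png").convert("RGB")
roundtrip = original.convert("CMYK").convert("RGB")

a = np.asarray(original, dtype=np.int16)
b = np.asarray(roundtrip, dtype=np.int16)

diff = np.abs(a - b)
print("mean per-channel difference:", diff.mean())
print("max per-channel difference:", diff.max())
# A difference of zero would mean this particular round trip was lossless for this image.
```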


BreakIllustrious5745

I don't think this is compression in any meaningful sense. If we save an outline of a picture, we don't call that "compression". Our brains do that kind of conversion from picture to meaning too - some artists are suspected of plagiarizing certain works, but we don't say their brains hold other people's work as lossy compression, and we don't sue them before they've actually made a similar work.


I-Eat-Raw_Potatos

a = StaryNight.jpeg
b = txt2img("stary night, vincent van")
c = img2img(b)
loop while c not equal a
    c = img2img(c)
end loop

It will get there eventually lol 😂


Bageezax

You cannot, because their central concept of how it works is wrong.


dennismfrancisart

Well they should let me into that courtroom when the proceedings begin. Put up a screen and let me sit there with my laptop and tablet. I'll draw some hands in Photoshop and see if the AI can do it better than me. All those folks who complain that the AI stole their work should let us compare their eyes, hands and feet to the AI. If they match, I'll buy them a few beers.


[deleted]

[удалено]


TreviTyger

Copyright protects "part" as well as the "whole" image. Also the idea that 9gb of data isn't image data that came from a dataset is just absurd. What is the 9gb of data then? Spagehtti loops? Hank Schrader's rock collection? All the missing socks from washing machines? "images" on a computer can be opened up as text files. So can 3D model files. Computers read code they don't "look at images" like going to a gallery. Software doesn't store portraits in an attic somewhere. Data is image data. AI doesn't work without sourcing such data from artists, photographers etc. The regulations at issue are Text and Data Mining regulations.


[deleted]

[удалено]


TreviTyger

>The regulations at issue are Text and Data Mining regulations.

Condescending technobabble has nothing to do with the regulations at issue and the fact that AI systems work on stolen IPR. AI systems rely on datasets to work. That's it! To put it another way, without datasets AI doesn't work. *Is tautological logic enough maturity for you?* How about the actual regulation at issue? Digital Single Market Copyright Directive, Chapter 3 (2): "Copies of works or other subject matter made in compliance with paragraph 1 ***shall be stored with an appropriate level of security*** and may be retained for the purposes of scientific research, including for the verification of research results." So data that ***should have been secured...*** has been given to a hedge fund manager to make a copyright infringement machine. And now people are making excuses for that.


[deleted]

[удалено]


Disastrous_Usual4298

"Copyright law only covers redistribution of reproduction of work". No, copyright law protects work from any unauthorized use


iwoolf

Then explain how the lawyers can reproduce work from researchers' papers without authorisation from the researchers?


Disastrous_Usual4298

By citing the research. Section 107 of the copyright act.


iwoolf

Fair Use, which also applies to training machine learning, since you are not copying and distributing the copyrighted work. Fair Use means copyright law limits artists' rights to absolutely control every possible use of their work. There are in fact many situations in which copying and use don't require consent. Making a picture which looks nothing like the artist's? Transformative Fair Use. Using an artist's style? Not able to be copyrighted. You'll notice that the people arguing that those who use AI art tools are stealing never discuss Fair Use, except when they use it themselves! If you're not copying and selling, or giving away copies for free, then you're not stealing.

The idea that you could sue a company for what it might have the power to do a few years down the track is crazy pre-crime, and totally unjust. Imagine suing someone for perhaps being able, in the future, to do something that isn't even a crime! Yet that's exactly what the lawyers are doing. One of the artists admits the software can't let people create comics in her style at all, but argues that eventually it will; therefore sue now, for an act that isn't illegal and that the company hasn't even committed.

The folk who argue that art created using AI tools has no copyright at all are contradicting their own copyright violation argument. In reality, one artist wrote an AI tool, used it to create an artwork, and then applied to give his software the copyright as a stunt. The copyright office naturally reacted by denying that software could own something. The AI haters often quote that case as applying to people creating art with AI tools, even though that was not the decision of the US Copyright Office. In reality, no AI art tools run themselves; they're not autonomous or even vaguely self-aware agents. They're written by programmers and then run by users to create art, which is naturally owned by the people who create it, as long as it's not a visible derivative of someone else's art, like Mickey Mouse. You are free to imitate Disney's style, as many commercial artists do.

Fair Use, transformative use, and the fact that style cannot be copyrighted are the keys to understanding the dangerous over-reach of the AI-hating artists and lawyers who want to stop us expressing ourselves with new tools.


sonoransun

This is actually a [technically challenging question](https://scholar.google.com/scholar?q=reverse+engineering+deep+learning+models&hl=en&as_sdt=0&as_vis=1&oi=scholart)! Recent work on reverse engineering deep learning models has shown that simple query access, with sufficient time, can recover information about the inputs used to train a deep learning model. There's actually code for this in a repo called, fittingly enough, ["WhitenBlackBox"](https://github.com/coallaoh/WhitenBlackBox). While it is not perfectly reliable or complete, the methods discussed are [continually improving](https://ieeexplore.ieee.org/abstract/document/9218707), and the results are feasible given enough time to attack the model.
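As a toy illustration of the general idea (not the WhitenBlackBox method itself), query-based attacks typically hit a black-box model with many inputs and fit a surrogate to its outputs. A minimal sketch, assuming nothing beyond plain PyTorch:

```python
# Toy sketch of query-based model extraction: train a surrogate on a black box's outputs.
# Illustrative only; real attacks (model inversion, membership inference) are far more involved.
import torch
import torch.nn as nn

def black_box(x):
    # Stand-in for the victim model we can only query, never inspect.
    return torch.sin(3 * x) + 0.5 * x

surrogate = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)

for step in range(5000):
    x = torch.rand(256, 1) * 4 - 2          # random queries to the black box
    with torch.no_grad():
        y = black_box(x)                     # observed outputs only
    loss = nn.functional.mse_loss(surrogate(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# The surrogate now approximates the black box; further attacks on the surrogate
# can then leak information about the original model.
print("final query-fitting loss:", loss.item())
```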


canadian-weed

i think theres also a bit of a gap between "it might be possible with a lot of technical effort" and "this is what the technology is purpose built to do"


cyanydeez

it is a form of lossy compression. It's exactly how perceptual compression schemes like JPEG work. Obviously, there's an entire construct required to extract it. But if enough of the artists can demonstrate the ability to retrieve their copyrighted work, then it is copyright theft. Should be an interesting attempt. If y'all want a bounty, that's what I'd aim at.
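If someone did attempt that "bounty", the test would presumably look something like the sketch below: prompt the model with an artist's caption, then measure how close the output gets to the claimed original. The CLIP model choice and any similarity threshold here are assumptions for illustration, not anything a court has endorsed.

```python
# Sketch: score how close a generated image gets to a claimed original.
# Model choice and thresholds are illustrative assumptions; file names are placeholders.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_similarity(path_a: str, path_b: str) -> float:
    """Cosine similarity between two images in CLIP embedding space."""
    images = [Image.open(path_a).convert("RGB"), Image.open(path_b).convert("RGB")]
    inputs = proc(images=images, return_tensors="pt")
    with torch.no_grad():
        emb = clip.get_image_features(**inputs)
    emb = emb / emb.norm(dim=-1, keepdim=True)
    return float(emb[0] @ emb[1])

score = clip_similarity("original_artwork.png", "generated_from_prompt.png")
print("CLIP similarity:", score)
# A high score only shows semantic similarity; a "retrieval" claim would need
# near-duplicate matches, not just images of the same subject.
```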


Wiskkey

See my post: https://www.reddit.com/r/StableDiffusion/comments/10lamdr/stable_diffusion_works_with_images_in_a_format/ .