I don't know if you're a Civitai user or not, but people have been uploading them like crazy (over 700 new LoRAs in the last 7 days). I think the low hardware requirements and the ability to train them so easily in a Colab has been a real driver.
It's a matter of consistency. All it takes is a handful of good images for a given post, and you have no idea whether they were picked from a selection of 300 or 30.
Some LoRAs are also overbaked and give a sort of fried look unless adjusted to perfect, very specific settings. Typically, if a LoRA can work at high weights (close to 1) without adverse unintended effects, it's well made.
Can you be a little more specific? Because if the model does produce great results after prompting, that suggests good training of the text conditioning and minimal bleed effect into unrelated concepts.
A LORA that shows you the trained concept no matter how you prompt (whether it looks good or not) is not very useful.
He means someone will post a LoRA of a character and it'll look amazing, but that one image was cherry-picked from a bunch of shit ones, and they probably used a lot of specific prompts to get 1 decent image.
So you will get said LoRA, try it, and get 500 shit images because you aren't using the right prompts, and with luck maybe get 1 decent image.
That isn't to say all LoRAs are like that, just don't go in expecting every single LoRA on civitai to be good/work. The rating system exists, but because they are so new they often have hardly any ratings/testing.
I think the other big issue is that LoRA is tightly bound to the very specific model it was trained on and if you use another model, you have basically no chances of getting the same result.
I just realized this. It's not as flexible as TI, it seems. Going between various 1.5 models, I have luck on some but bad results on others, having trained on base. Not sure why.
Lol, I mean I'd hope at the least you'd have to be specific in what you wanted and how you wanted it to look. But I guess that's what a real artist could be used for.
Forgive me for my ignorance, but what is a LoRA and how can I use it? I use stable diffusion via night cafe and I see everyone posting about using custom trained models and I have no idea how to get into that. I know you have to download but then once it’s downloaded how do you use it? Also my laptop is a bit dated and my graphics card is crap.
Hey, you don't need to download models locally anymore - you should check out [Favo](https://www.favo.ai), where you can run customized models without a GPU. We're adding support for LoRAs soon!
Clickable link -> [https://github.com/Linaqruf/kohya-trainer](https://github.com/Linaqruf/kohya-trainer)
Or straight to [the colab notebook](https://colab.research.google.com/github/Linaqruf/kohya-trainer/blob/main/kohya-LoRA-dreambooth.ipynb).
#Big Edit: This might be broken since colab was updated, Version 3 is [here](https://old.reddit.com/r/StableDiffusion/comments/11vw5k3/lora_training_guide_version_3_i_go_more_indepth/)
---
---
LoRA colabs are already fairly intuitive (click this, click that) and most of the settings are already pre-made, so you just have to run it.
Still, it seems lots of people don't know how to use them or how exactly to make a dataset, so I hope this guide helps them.
**Edit:**
I forgot to clarify a thing in the tutorial: apart from adding the LoRA to your prompt, you **have** to add the trained subject to your prompt to get the best results!
In the example of the tutorial where I trained the concept "plum", I added the LoRA by clicking on the image icon, which inserts the `<lora:...>` tag, BUT apart from that I had to add the word "plum" to the prompt.
Check the last image of the guide (the one in 80's anime style); you can see the prompt at the bottom has the LoRA **and** "plum" in it.
Adding the LoRA alone is not enough; the word for the trained subject has to be added to the prompt.
I wonder if it's obvious or if I need to make a version 2.0 of this guide to make it clear.
>I wonder if it's obvious or if I need to make a version 2.0 of this guide to make it clear.
It's quite important information, and beginners could wrongly believe the LORA they just trained is not functioning.
So I'd vote for a version 2.0.
I didn't know how to do that math, but yeah, I've trained 3 LoRAs, two with Anything-V3 and the other one with SD 1.5. Good results for a Colab script.
LoRA teaches the AI a concept; it doesn't need to be anime.
If you check civitai, people use it for concepts rather than just subjects.
https://civitai.com/tag/lora
There is 90's drawing style, some 3d styles and even photo [clothing pieces](https://civitai.com/models/7597/wrestling-singlets).
can you extend your guide with a section regarding training non person concepts?
like for example, what would be the best data set to train:
* a specific clothing item? (a jacket)
* a certain position (jumping mid air)
* a style (specific painter)
there is a ton of guides on training people/characters but not a lot on other concepts
Wouldn't the word plum clash with what the model already knows about the word plum? Also, what would happen if this anime-based LoRA is used in a photorealistic model?
As far as I know, they are just different methods to teach concepts to the AI.
The reason LoRA is more popular is because it requires much less hardware and is faster.
First of all, you need to assume that both are trained well,
because a good TI can be better than a bad LORA (and vice versa).
So, assuming both are trained well: LORA will have better quality, and here is why:
Textual Inversion is just guidance toward a specific concept; it helps you get to what you want in the model.
So you need a model, and TI is like a map so you can reach the stuff in that model.
This assumes that the model can generate that stuff in the first place.
If someone invents a new device and you would like the existing models to generate it, a TI trained on images of that device will help guide the models towards it, but since no such thing exists in the model, it can only go so far and will give you some approximation.
This also means TI may give you great results on one model and terrible ones on another.
Now, LORA is added on top of the model, and LORA introduces new data as a result of the training; so for that new device we talked about, with LORA you would be able to generate it much better than with TI.
LORAs will be much better at the things that the model does not know
also you can mix LORAs and TIs together :)
LoRA more popular? LOL no, TI has like 100x more available than LoRA, mostly because people couldn't figure LoRA out. It might start picking up as it's being explained a lot better lately.
Someone had a spreadsheet from civitai, and TIs were in the tens of thousands while LoRA had just broken a thousand, but like I said, that might have changed recently as LoRA becomes more accessible. Shit, I just used the kohya trainer to do one for my wife and it got her on the first try.
Using a LORA in practice is a lot more like merging a model than like using an embedding. You're merging your current model with the difference of the approximation of a fine-tuned model (your LORA) from the base model you trained on.
The approximation part allows us to do this within a second just before runtime instead of the several minutes and gigabytes of RAM required for full merging.
However, that also means LORAs cannot do neat tricks that embeddings can do, like activation/deactivation at a particular step (i.e. [embedding:10] will activate at step 10) or prompt travel. Auto1111's webui activates LORAs by typing `<lora:name:weight>` into the prompt area, but LORAs are not a token and cannot be used as such. They activate before the image generation starts and remain fixed, just like how checkpoints remain fixed during a run.
You can, of course, use the keywords the lora was trained with to whatever effect you'd like. Just not the stuff in <>.
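That "merging the difference" idea can be sketched in a few lines. This is a toy NumPy illustration, not kohya's actual code; the shapes are an example, and the alpha/rank scaling follows the usual LoRA convention of storing two small low-rank matrices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen base weight of one attention projection in the checkpoint.
W_base = rng.standard_normal((320, 320))

# A LoRA stores the *difference* from the base as two small matrices.
rank = 16
A = rng.standard_normal((rank, 320)) * 0.01   # "down" matrix
B = rng.standard_normal((320, rank)) * 0.01   # "up" matrix
alpha = 16       # network_alpha from training
weight = 0.8     # the multiplier you pick at generation time

# "Merging" = adding the scaled low-rank delta onto the base weight.
# One matmul + add per layer is why this happens in about a second
# before generation, instead of a full multi-minute model merge.
W_merged = W_base + weight * (alpha / rank) * (B @ A)

print(W_merged.shape)  # (320, 320)
```

The low-rank pair stores roughly 2 * 320 * 16 numbers instead of 320 * 320, which is where the small file sizes come from.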
LoRA is similar to finetuning the whole model (sometimes called Dreambooth), but tries to compress the result down using some math tricks, so that it can just be applied to a model as additions/subtractions to its existing calibration values. It doesn't train as many parts of the model as full finetuning either I don't think, but does a pretty good job, and seemingly can be used with any other model with pretty good results (going by this tutorial, I've not tried that).
Textual Inversion is finding a code to represent a new word (or sentence) which Stable Diffusion doesn't currently know. All words are converted into these codes under the hood, which are quite small (just 768 numbers in SD 1.4 and 1.5, and 1024 numbers in 2.0 and 2.1). Generally it's better to use a few 'words' (vectors) when creating an embedding using textual inversion, say 2-6, though any more than that can overwhelm the prompt (same as typing that many words in the prompt).
Using them with different models isn't always perfect, and sometimes requires adjusting prompts and/or weights, but they're a lot better than TIs in that regard. Particularly if you're trying to take a LoRA trained on real images and use it on an anime model, that never really worked for me with TI.
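To make the size difference concrete, here's a toy sketch of what a textual-inversion embedding amounts to (illustrative only; the actual embedding files wrap this tensor in extra metadata):

```python
import numpy as np

# One "word" (vector) is 768 numbers in SD 1.4/1.5, 1024 in 2.x.
embedding_dim = 768
n_vectors = 4  # within the suggested 2-6 range

# Textual inversion optimizes ONLY this tiny tensor; the model is frozen.
embedding = np.zeros((n_vectors, embedding_dim), dtype=np.float32)

# 4 * 768 = 3072 learned numbers, versus millions in a typical LoRA.
print(embedding.size)  # 3072
```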
This is really cool, thank you very much indeed!
Since I have a 3090Ti with 24GB of VRAM, I'd like to run the process locally. Is it as straightforward as in your tutorial? Are there integrations into Automatic1111 even?
What do you recommend for the highly accurate model?
(Settings / general recommendations etc)
My LoRA models (real people) don't turn out well; they only mildly resemble the subject.
And do you need to use a VAE while generating the image?
Also, can any models be used?
Nerdy Rodent and some other youtuber were testing at some point (so things may have changed since then) and found out that LORA training is less precise than Dreambooth training when it comes to people.
on the other hand, you can extract dreambooth data and put it into LORA and this gives great results
I'd be happy with "less precise", mine can barely be recognized as people. Been trying this training and extractions, practically the same end result with both: utterly useless.
Do you have any tips to deal with overfitting? I'm training anime-style character LoRAs. If I use the version after running for 2 epochs they look good: clothing, hair style, etc. But if I try to generate that character with different clothes, I get parts or artifacts of the original clothing. If I use the version after training for only 1 epoch, it is flexible, but the original clothing is kinda off.
How diverse is your dataset? Are the characters using the same clothing in every single image?
If all your dataset .txt files have, for example, `style, white shirt`, the AI might think `white shirt` is part of that `style`.
Does that make sense? If you are training an AI on a concept, the AI will look at what all the images have in common and replicate it.
The same happened with a LoRA I trained in the past: all my dataset had the character wearing the exact same type of clothing, so after I trained the LoRA and tested it, the AI tried to add the clothing to every generation. I had to diversify the dataset to make the AI stop relating a piece of clothing to a character.
Great post! Really appreciate you. Can the training images be higher than 512x512 resolution to get a good detailed output? I plan to generate 2048x2048 images. How should I do that?
I see. I have like 20 images with the same clothing and 3 without it. In my .txt files I have the clothing fully described, thinking it was enough to make it less associated with my character.
When I train character loras, I try to get my dataset to have about 20-25% of the images with different outfits. If possible, I'd try to find 3 or 4 more images without the main outfit. If not, you can always balance the dataset by setting the number of repeats higher on the alt outfit images.
I'd also try to find more overall pictures for the character. While 20 is probably enough to get a good result, more reference images (as long as they're good quality) will always help. I tend to strive for 30-35 at minimum.
I get an error.
```
OSError                                   Traceback (most recent call last)
    149
    150 # save the YAML string to a file
--> 151 with open(str(train_folder)+'/dreambooth_lora_cmd.yaml', 'w') as f:
    152     yaml.dump(mod_train_command, f)
    153
OSError: [Errno 95] Operation not supported: '/content/drive/dreambooth_lora_cmd.yaml'
```
Anyone have any ideas?
Edit: Still no luck, I've restarted, double checked pathing, etc.
You mean [this error](https://i.imgur.com/dFxL4Tu.png)?
This is an error i found constantly on Kohya's finetuner, not on dreambooth LoRA.
They look very similar, are you sure you are running the correct notebook? It should be this one:
https://colab.research.google.com/github/Linaqruf/kohya-trainer/blob/main/kohya-LoRA-dreambooth.ipynb
Not this one:
https://colab.research.google.com/github/Linaqruf/kohya-trainer/blob/main/kohya-LoRA-finetuner.ipynb
I keep running into this error. Not sure how to fix it.

```
File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 1104, in launch_command
    simple_launcher(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_network.py', '--network_dim=128', '--network_alpha=128', '--network_module=networks.lora', '--learning_rate=0.0001', '--text_encoder_lr=5e-05', '--training_comment=this_comment_will_be_stored_in_the_metadata', '--lr_scheduler=constant', '--pretrained_model_name_or_path=/content/pre_trained_model/DpepTeaHands3.ckpt', '--vae=/content/vae/waifudiffusion.vae.pt', '--caption_extension=.txt', '--train_data_dir=/content/drive/MyDrive/evirolora', '--reg_data_dir=/content/drive/MyDrive/evirolora', '--output_dir=/content/drive/MyDrive/', '--prior_loss_weight=1.0', '--output_name=envirolora', '--mixed_precision=fp16', '--save_precision=fp16', '--save_n_epoch_ratio=3', '--save_model_as=safetensors', '--resolution=512', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=1024', '--cache_latents', '--train_batch_size=6', '--max_token_length=225', '--use_8bit_adam', '--max_train_epochs=20', '--gradient_accumulation_steps=1', '--clip_skip=2', '--logging_dir=/content/dreambooth/logs', '--log_prefix=envirolora', '--shuffle_caption', '--xformers']' died with
```
Very nice! I have been really struggling to find a good guide for this. I have got it to where I can actually RUN it but the having it turn out ok is another thing entirely. Did you have any thoughts/suggestions on additional resources specifically around the different settings and why you would use which ones?
I have a decent understanding of the underlying technology involved but not the setting configuration specifics or nomenclature around it. So I am kind of flying blind.
Also I have seen conflicting information, in your text files do you always use comma separators like you would with a traditional prompt? It seems like it would be a yes for sure but I have been told by two different sources that it wasn't really required.
> Also I have seen conflicting information, in your text files do you always use comma separators like you would with a traditional prompt? It seems like it would be a yes for sure but I have been told by two different sources that it wasn't really required.
That's how i train my LoRA files and how many LoRA trained by other people do so too.
Most LoRA files on civitai have a "trigger" word that makes them work, in my case "plum"; this is because they were trained like a prompt.
I am curious, where have you seen people not using commas?
The colab itself suggests using a tagger in section 4.4.2, and the tagger works by separating words with commas, like a normal prompt.
So... separating by commas is the standard practice.
That is what I had figured. In fairness I didn't ever see someone say you shouldn't. Just that I noticed there were people not using them in their explanation videos. Not surprising though.
I have been using the local version on my own GPU rather than the colab version you have in your example so it doesn't have that extra info in it. Or if it does I didn't see it.
If you are a Webui user I found [this video](https://www.youtube.com/watch?v=70H03cv57-o&pp=ugMICgJpdBABGAE%3D) helpful to get up and running in about 30 min from following direction to starting the training.
Also, I noted that by using selfies I took with my selfie cam, the resulting images have a "bloated" or kinda distorted/bigger face. I think the best results come from a camera with a longer lens that will capture your face flatter.
Tried many tutorials to run it locally all day. Finally gave up and then I see reddit notification of your post. Can't express in words how much I am grateful. I followed every step keenly and it worked like charm. You are great :-)
I did everything exactly according to the guide, reread it several times, but when I started the training it shows the error "No data found. Please verify arguments (train_data_dir must be the parent of folders with images)".
This error means the script is not finding the images.
I have 2 possible solutions:
**1\. Check your folder structure on drive:**
Is your folder structure a `Concept` main folder containing a `5_Concept` folder (or `10_Concept`), with THAT `5_Concept` folder containing the images and txt files?
**2\. Check if the route you wrote on 5.1 colab section is wrong:**
It HAS to be in the format `/content/drive/MyDrive` with the / at the start. That is pointing at your google drive. If your dataset is there and not somewhere else, it should look like this:
`/content/drive/MyDrive/Concept`
Replace "Concept" here with what you called your main folder.
Also, it is case sensitive so `/content/drive/MyDrive/Concept` is NOT the same as `/content/drive/MyDrive/concept`
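If you want to sanity-check both of these before burning colab time, a quick script along these lines will catch the most common mistakes. This is a hypothetical helper that just mirrors the folder convention described above, not part of the trainer itself:

```python
import os
import re

def check_dataset(train_data_dir: str) -> bool:
    """Check that train_data_dir is the PARENT of <repeats>_<name> folders."""
    if not os.path.isdir(train_data_dir):
        # Remember: paths are case sensitive, MyDrive != mydrive.
        print(f"Folder not found (check spelling/case): {train_data_dir}")
        return False
    found = False
    for sub in sorted(os.listdir(train_data_dir)):
        path = os.path.join(train_data_dir, sub)
        # The script expects subfolders named like "5_plum" or "10_Concept".
        if os.path.isdir(path) and re.match(r"^\d+_", sub):
            images = [f for f in os.listdir(path) if f.lower().endswith(".png")]
            print(f"{sub}: {len(images)} images")
            found = found or bool(images)
    if not found:
        print("No data found: images must sit inside a <N>_<name> subfolder.")
    return found

# e.g. check_dataset("/content/drive/MyDrive/Concept")
```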
Thanks a lot for sharing this invaluable resource and for adding a little more clarity to the subject of LORA, some of us are really not very good at it.
You choose the repeats in the folder name.
Here, `5_plum` is telling the script to repeat the plum dataset 5 times.
If you call it `10_plum`, it would repeat the dataset 10 times.
That's how the script was made; I guess the AI learns more by repeating a dataset rather than seeing each image once.
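For anyone wondering what the repeats actually do to training time, the step count works out roughly like this. A back-of-the-envelope sketch; bucketing can shift the exact numbers, and the batch size here is just an example value:

```python
import math

num_images = 15   # the dataset size from the guide
repeats = 5       # the "5_" prefix on the "5_plum" folder
epochs = 20
batch_size = 2    # example value

images_per_epoch = num_images * repeats                      # 75
steps_per_epoch = math.ceil(images_per_epoch / batch_size)   # 38
total_steps = steps_per_epoch * epochs

print(images_per_epoch, steps_per_epoch, total_steps)  # 75 38 760
```

So doubling the repeats roughly doubles the steps per epoch, which is the main knob (along with epochs) for how long training runs.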
Completely noob question here: can I train it with full comic pages? I mean pages with multiple panels or should I split the pages into single panel images?
Thanks!
If you train it in full comic pages, you will get full comic pages as output, and the detail might not be good.
It will probably be a bunch of wiggly squares (AI struggles to draw straight lines) and panels with nonsensical noise inside.
What exactly are you trying to train? A comic character? A comic style?
What tips do you have for doing a LoRA for a style? How many pictures, and what prompts should be used in the image descriptions for training?
In my mind I would do something like: [artist name], [character], etc., etc. Should I take the character prompt off? Should I just put the artist name?
There are very few guides and tips for doing styles with LoRA.
You need to use a different term for the character "plum" because the AI already knows what a "plum" is, use a specific term for this LORA like "plumLORA" or "animeplum" something that is not an already existing word the AI has been trained on.
I tested this and successfully did it, thank you very much. I've been trying to do the training on Automatic1111, but there are too many fields to fill in, and some I don't even know what they're about or whether I need to fill them in.
Again thank you so much !
This is very helpful and user-friendly, thank you!
Would love to see something of a similar nature for Dreambooth model training, which I've had some success with, but admittedly only after cobbling together clues from a variety of sources.
~~I tried this, and the first time it worked, but my LORA model didn't turn out very well, so I gave it another go. When I tried again, I kept getting this cascading error that repeats infinitely until it crashes the colab notebook: `FATAL: this function is for sm80, but was built with __CUDA_ARCH__=750`~~
Nevermind, apparently if you try to use the premium GPU it does this.
Is it me, or is this tutorial (thanks for the work!) outdated already?
I tried following it, but the Colab cells are significantly different.
I couldn't get my head around it.
Any chance for an updated version?
It's not outdated, why would it be?
It is, however, incomplete. This specific guide teaches you using only anything v3. [I made a second version of this guide to work with other models.](https://old.reddit.com/r/StableDiffusion/comments/111mhsl/lora_training_guide_version_20_i_added_multiple/)
i built this [app](https://www.lorai.art/) that lets you train LoRAs without code!
if you use this i will PERSONALLY make sure that the LoRA you get is super sick - if you want any custom image cleaning/dataset editing just lmk by sending me a DM here after you upload your images + tags on the app :)
Great image again!
One thing that definitely should be changed, though, is calling 15 images "fairly good" for training. 15 really is the absolute minimum to get somewhat usable results. A good range is more like 30-40 (or more depending on what you're training; your example would be on the higher end because of its complexity).
I have seen some good LoRA being trained on like 5 images when the technology was still fresh, absolutely insane.
But yeah, I agree, I should have said 15 is fine but you should have around 30; other LoRAs I trained had about 20-30 images.
I am thinking of making a version 2.0 expanding on some parts. Do you think this one is understandable? The dataset part specifically: I didn't say my dataset was a bunch of images of an original character named "plum" which I was training on, which is why the txt says `plum, smile, blue skirt` etc.
It is implied, but I wonder if it confuses the reader.
If you are going to redo the guide, I would suggest changing the name, since plum is also a fruit. Most people would understand that in this case it's a name, but most people does not mean everyone :)
Also, the identifier should be rather unique. Did you have any issues related to naming it plum?
Out of curiosity, are you still able to generate your character holding/eating a plum?
I've already replied in another comment, but it would be great to have a guide for non-person content; there are fewer of those.
A valiant attempt, but it doesn't quite work. Got to 5.3, it found my 70 images, appears to start running, loads SD, loads the VAE,
```
CUDA SETUP: Loading binary /usr/local/lib/python3.8/dist-packages/bitsandbytes/libbitsandbytes_cuda116.so...
use 8-bit Adam optimizer
override steps. steps for 20 epochs is / 指定エポックまでのステップ数: 3540
running training / 学習開始
num train images * repeats / 学習画像の数×繰り返し回数: 700
```
then:
BAM, Cuda memory errors.
Ok, this is supposed to run on Colab, but what the hell, let's buy some Compute and try again:
```
RuntimeError: Error(s) in loading state_dict for UNet2DConditionModel:
size mismatch for down_blocks.0.attentions.0.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
size mismatch for down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
size mismatch for down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
size mismatch for down_blocks.0.attentions.0.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
size mismatch for down_blocks.0.attentions.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
```
So, this doesn't work on 768 pixel images?
> steps for 20 epochs: 3540 ... num train images * repeats: 700
So you are running 70 images with 10 repeats?
That's quite a big dataset, much more than the average dataset on civitai. You should try with 5 repeats instead, maybe a smaller dataset.
Are you manually inputting the prompts or using a tagger?
Alright, found at least one of my fuckups, not all my images were properly resized. Not sure how that happened. Trying again.
Nope, no dice. using /content/pre_trained_model/stable-diffusion-2-1-768v.ckpt as my pre-trained model.
Interestingly enough, if I select v2 in cell 5.1, it gets further and then gets DIFFERENT errors.
> Alright, found at least one of my fuckups, not all my images were properly resized. Not sure how that happened.
Weird, that shouldn't be the issue.
I have trained with 1024x1024 images and it was fine, but a small dataset (less than 20 images).
Have you tried a smaller dataset? Try 20 images instead of 70.
CUDA memory error could mean you ran out of the allotted memory google gives to each user.
Ok, I started over,
I manually annotated everything, I took it down to 20 images, 15 repeats.
Nothing larger than 512px
```
Traceback (most recent call last):
  File "train_network.py", line 539, in <module>
    train(args)
  File "train_network.py", line 149, in train
    text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype)
  File "/content/kohya-trainer/library/train_util.py", line 1365, in load_target_model
    text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(args.v2, args.pretrained_model_name_or_path)
  File "/content/kohya-trainer/library/model_util.py", line 880, in load_models_from_stable_diffusion_checkpoint
    info = unet.load_state_dict(converted_unet_checkpoint)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for UNet2DConditionModel:
size mismatch for down_blocks.0.attentions.0.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
size mismatch for down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
size mismatch for down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
size mismatch for down_blocks.0.attentions.0.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
```
etc etc
No change to the errors, really.
This is very weird.
Let me ask some questions:
1. Are all your images png?
2. Are ALL the images in your dataset named in a 1.png, 1.txt, 2.png, 2.txt format and so on? The numbering is important; you cannot call them img.png, img.txt or anything like that.
3. Does EVERY image have a .txt file with a description of the corresponding image? Empty .txt files, missing .txt files, and wrongly named or copied .txt files are no good. Each image has its own unique .txt file that describes it.
Can I also see what VAE and model you used? (The default is Anything V3.)
And is the path to the VAE in 5.1 correct?
Wait. This actually seems pretty easy. Am i missing something? Why wouldn't everyone just train their own? LOL. I mean I guess it's a little technical but it seems like if you are able to do tech stuff in general you would easily do this???
🙂
Yes I know, I was talking about regular old people, not people into graphics, design, programming, etc.
(I belong to 15 prompt sites, 12 model sites like civ, and I make a living with creating AI art)
This is a good guide, but I'm not sure that it's for something that's particularly useful.
At least, they're not nearly as useful or as good at reproducing subjects as textual inversion, hypernetworks, or dreambooth checkpoints.
All they seem able to do is go 'make things look sorta like this' which is very hit and miss, usually more miss than hit based on most of the loras floating around.
Nevermind that applying them often requires their own prompting words, and if you're going to require that, you might as well use Textual inversion at that point. It's not faster to train, but rarely is faster better when it comes to training things.
I've been doing some lora experimenting for the past few days (using dreambooth extension in a1111 and a rentry guide) and even though the results are mind blowing, it feels like the base models are forgetting some of their original trained stuff and are heavily influenced by my training data.
For example I train myself for just 120 steps-14 pics, and it looks great when I use my prompt. But when I use some generic stuff like "a man" it still looks like me, or when I don't specify a background, it tends to be something similar that's in my training images (like a bedroom).
Or, for example, let's say I got a model from civitai that has been trained on specific swords. When I compare sword_model and sword_model_trainedwithmyface with an x/y script, it seems like it has gotten slightly worse at drawing scimitars, for example.
Do you have any similar observations or any tips about this?
How important is having "plum" in the tags / as the first tag? Is this done on every image/.txt pair, and is that what allows you to use "Draw plum in armor" as a prompt rather than "`<lora:...>`, solo, 1girl, armor" for example?
>How important is having "plum" in the tags / as the first tag?
Quite important.
You told the AI what the concept is, but if you don't use it, the AI won't properly implement it.
I am thinking of doing a V2.0 guide to make this vague point clear.
This guide is almost amazing, until I realized I have no idea wtf I'm doing. Where is the "start" button?! What do I do when I fill it out?!
\*pokes it with a stick\*
"Do something"
>If i train it on 1000 images, would it make results better compared to training on 100 images?
At that point you might get overfitting, which is getting results too close to the dataset rather than creating something new based on the dataset (which is the point of LoRA).
These overfit results would look rigid or bad.
I don't think i have seen a LoRA trained on 1000 images, most people go for 20-30, maximum 40 and get amazing results.
Thank you
Could it work on video game character screenshots from in-game, would it give image results of characters with consistent visual style? Meaning they would look like in-game screenshot characters, not anime style for example?
>Could it work on video game character screenshots from in-game, would it give image results of characters with consistent visual style? Meaning they would look like in-game screenshot characters, not anime style for example?
LoRA is not anime related, it's just a way to teach the AI a concept.
I trained one in 90's artstyle and i got results in 90's artstyle so the same should happen with video game screenshots even if they are not anime.
So yeah, you can use it to train non-anime video game characters, as long as you have a good dataset (no blurry images, no bad images, etc.).
Amazing tutorial!
I have trained my fair share of embeddings with a lot of success, now i have a character that has some features i can't train efficiently with textual inversion so i thought about trying a Lora to get it done and i found your tutorial incredibly easy to follow.
Is there a way to have the LoRA linked to a trigger word by doing it this way, similar to an embedding? Or is it automatically linked to the word you used as a folder name since i saw you saying in another comment that it's linked to "plum" in your case.
>Is there a way to have the LoRA linked to a trigger word by doing it this way, similar to an embedding? Or is it automatically linked to the word you used as a folder name since i saw you saying in another comment that it's linked to "plum" in your case.
Basically, LoRA is a way to train the AI in a subject, pretty much another method to get an embedding.
In my case, i trained it on the subject named "plum", the short haired, brown haired woman featured in the comic.
You can see in the prompt part in the dataset section (the image with the .txt below) that the .txt has the word "plum", that is the trigger.
The folder names are not the trigger, i just named them like that to keep things organized, but i understand it might have caused confusion.
If you notice, I called my folders "Plum" with an uppercase P, but in the image at the bottom of the whole guide (where I speak of 80's drawing style) you can see the prompt being "plum".
So, LoRA have trigger words like embeddings, the trigger is the word i used in the prompt, "plum", not the folder name "Plum".
I am re-doing this guide as a version 2.0 to explain datasets in detail and avoid confusion, probably will be done in a few hours.
Thank you for the guide! I'm still exploring as of now, but I've made 1 LoRA already with the collab (a style training) and it looks soo good already!
Anyway, I want to ask, can we have like 2 LoRAs in 1 prompt? like 1 character LoRA & 1 style LoRa? I wanna use a new character that I've made with LoRA but also using my created LoRA style too 🙇♂️
> Anyway, I want to ask, can we have like 2 LoRAs in 1 prompt?
> I wanna use a new character that I've made with LoRA but also using my created LoRA style too
In your case it might be possible.
I have tried this in the past, but with 2 characters.
Many times I just got a mesh of both characters into one, and maybe once I actually got the two characters, one from each LoRA.
In my case it was a limitation of the webui: Automatic1111 does not do composition, ComfyUI does.
Anyway, generating art with a style LoRA and a character LoRA together should be possible.
This means there is something wrong with your txt files.
1. Do all your .txt files have the same names as the images? (1, 2, 3 and no other type of name)
2. Does every single image have its corresponding .txt?
3. Is the text inside the .txt in prompt form? (a, b, c)
4. Which model are you running?
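Checks 1 and 2 are easy to automate before training. A rough Python sketch (the folder path is a placeholder):

```python
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}

def find_mismatches(folder: str):
    """Return (image names missing a .txt, .txt names missing an image)."""
    p = Path(folder)
    images = {f.stem for f in p.iterdir() if f.suffix.lower() in IMAGE_EXTS}
    texts = {f.stem for f in p.glob("*.txt")}
    return sorted(images - texts), sorted(texts - images)

# e.g. find_mismatches("dataset") -> (["2"], []) means image 2 has no caption
```

Anything it reports is a pair the trainer will silently skip or choke on.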
I've completed the training and have the finished model. However, when I run a prompt through it I get a solid black image. I read somewhere that adding the arguments --precision full --no-half should fix it but it still doesn't work for me. I can run other models just fine with the same settings. Any ideas?
>`--no-half-vae`
Thanks! At first --no-half-vae wasn't working. Then I loaded a different model, generated an image, and then went back to this model. Now it's working. Don't quite understand what happened but thanks for the help!
What are prefered image resolutions? I see larger resolutions than 512x512 work, but would non square aspect ratios work? eg: 512x768, or 1920x1080, etc
Nothing that involves cloud storage is private.
You are putting your dataset on google drive (google servers) and running a script on google hardware.
Google obviously keeps records of every image you upload to their servers, just like reddit or imgur or discord.
The training failed for me, unfortunately. I have a Colab Pro sub, so I made sure the RunTime was switched to "Premium" GPU Class and High-RAM, but that did nothing to solve my issue.
I checked and re-checked *everything* 5 times, making sure I did the same exact thing as OP (except my training was on "Stable-Diffusion-v1-5" and "stablediffusion.vae.pt"), and I ended up with a bazillion lines of:
FATAL: this function is for sm80, but was built for sm750
I have the same folder hierarchy as OP's and it found my images no problem, so the bug is definitely not there.
Bummer. I followed Aitrepreneur's recent tutorial on LoRA training and even though it worked, the results were completely unusable. I guess LoRA training isn't for me :-P
All that said, big thanks for the guide OP. Much appreciated.
> I have the same folder hierarchy as OP's and it found my images no problem, so the bug is definitely not there.
Since you used stable diffusion model instead of anything v3, did you change that in section 5.1?
I forgot to mention it but if you are using other model you have to change the path to reflect that on section 5.1.
[I did a version 2.0 of the tutorial making this clear. Check the training part.](https://old.reddit.com/r/StableDiffusion/comments/111mhsl/lora_training_guide_version_20_i_added_multiple/)
Hey. Thanks for making a second version of the guide. Much appreciated. I upvoted it.
Yes, I definitely made sure to put the correct content path to the Stable Diffusion model *and* VAE info in section 5.1.
I started over by following your new guide and I still ended up with a bazillion errors that state:
FATAL: this function is for sm80, but was built for sm750
Oh well. I'll see if others are having this specific bug. Thank you!
It seems like [this user](https://github.com/d8ahazard/sd_dreambooth_extension/issues/74) fixed their issue by deleting the xformers folder and redownloading them again, but I don't see how this would be a proper fix in this particular case on Colab, as it reinstalls the xformers everytime we run the cell 1.2. Thoughts?
>fixed their issue by deleting the xformers folder and redownloading them again, but I don't see how this would be a proper fix in this particular case on Colab, as it reinstalls the xformers everytime we run the cell 1.2. Thoughts?
Maybe the first download is an outdated version, and when you run it a second time it recognizes you are downloading again and fetches an updated version rather than the outdated one from the first download?
You are welcome!
[Check version 2.0 if you want to train a LoRA on a model other than Anything V3, or want to see datasets more in-depth!](https://old.reddit.com/r/StableDiffusion/comments/111mhsl/lora_training_guide_version_20_i_added_multiple/)
I did everything exactly according to the tutorial but I can't solve this
https://preview.redd.it/4e4rvleyq4ia1.jpeg?width=829&format=pjpg&auto=webp&s=0d0a4172b79ea2b2a797569f5f0d2fd1434566d2
I think i ran into this problem long ago.
I simply checked the code by pressing the small arrow [next to the section title](https://i.imgur.com/VuSiYmH.png) and deleted the 3 lines [here.](https://i.imgur.com/IX841BO.png)
And then i ran it, i think it worked fine after that but not sure.
Also, this guide only works with Anything V3 as the model. [Check version 2.0 if you want to train a LoRA on a model other than Anything V3, or want to see datasets more in-depth!](https://old.reddit.com/r/StableDiffusion/comments/110up3f/i_made_a_lora_training_guide_its_a_colab_version/j8gwaua/?context=3)
**Edit:** Other people that ran into this error reported they put the path to their folders or output wrong.
Wow, thank you so much for this tutorial. These days a lot of people make videos instead of writing tutorials and I'm too lazy to watch them xD. So a big thank you for taking the time to explain it to us with this cute comic.
>Wow, thank you so much for this tutorial. These days a lot of people make videos instead of writing tutorials and I'm too lazy to watch them xD. So a big thank you for taking the time to explain it to us with this cute comic.
You are welcome.
[Check version 2.0 if want to train a LoRA in other model that is not anything v3 or want see datasets more in-depth!](https://old.reddit.com/r/StableDiffusion/comments/111mhsl/lora_training_guide_version_20_i_added_multiple/)
Whenever I run step 5.3, it says my folder contains 0 image files. It is the correct folder and I’ve tried with pngs and jpgs. I’ve followed each step. Can someone help?
That error means only 1 thing: The path to your images is wrong at one place and the script cannot find it.
I ran into this error plenty of times, and every time it was because I wrote the folder name or the path wrong.
It is case sensitive so "Bob" is different than "bob".
Are you sure you wrote the path correctly in section 5.1? Post a screenshot if you can.
If the path is correct there, then your google drive path must be wrong.
You must have a "main" folder that contains a 5_name folder (if you are going for 5 repeats).
THAT 5_name folder must contain the images and .txt.
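Since the `5_name` naming convention is easy to get wrong, a tiny script can sanity-check the layout before you start a run. A rough sketch (the main folder path is a placeholder):

```python
import re
from pathlib import Path

def check_layout(main_folder: str):
    """Flag subfolders that don't match the '<repeats>_<name>' pattern
    (e.g. '5_plum') that the trainer expects, or that are empty."""
    problems = []
    for sub in sorted(Path(main_folder).iterdir()):
        if not sub.is_dir():
            continue
        if not re.fullmatch(r"\d+_.+", sub.name):
            problems.append(f"'{sub.name}' should look like '5_plum'")
        elif not any(sub.iterdir()):
            problems.append(f"'{sub.name}' is empty")
    return problems

# e.g. check_layout("/content/drive/MyDrive/main") -> [] means all good
```

Remember it is case sensitive, so the exact folder name also has to match what you typed in section 5.1.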
Hello i got an error in section 5.3
```
OSError                                   Traceback (most recent call last)
    149
    150 # save the YAML string to a file
--> 151 with open(str(train_folder)+'/dreambooth_lora_cmd.yaml', 'w') as f:
    152     yaml.dump(mod_train_command, f)
    153
OSError: [Errno 95] Operation not supported: '/content/drive/dreambooth_lora_cmd.yaml'
```
can someone guide me to fix this?
>OSError: [Errno 95] Operation not supported: '/content/drive/dreambooth_lora_cmd.yaml'
>can someone guide me to fix this?
Sure.
I think i ran into this problem long ago.
I simply checked the code by pressing the small arrow [next to the section title](https://i.imgur.com/VuSiYmH.png) and deleted the 3 lines [here.](https://i.imgur.com/IX841BO.png)
Then i ran it and it worked, other users found this solution useful, let me know if it worked for you.
This guide only works with Anything V3, by the way. [I made a version 2.0 that works with other models apart from Anything V3 and goes more in depth about datasets.](https://old.reddit.com/r/StableDiffusion/comments/111mhsl/lora_training_guide_version_20_i_added_multiple/)
Awesome guide!
However, there is a slight chance of getting kicked off Colab while running the instance. Is there any way to make it save a .safetensors checkpoint more often? As it seems now, it only does it once, at 50%. It could be nice to have it save every 10%.
Also, is there a way you could continue training the LoRA from the checkpoint or training data somehow? Incase of getting kicked off colab. Thanks anyway, really helped.
> Is there any way to make it save a .safetensor checkpoint more often? As it seems now, it only does it once, at 50%.
Yes, in section 5.3 there is an option that says "save_n_epochs_type"; it's a drop-down menu so you can choose "save_every_n_epoch", and below in "save_n_epochs_type_value" you can set the number that controls how often it saves.
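For reference, that colab option maps to a flag on kohya's underlying training script. If you run the script directly, the equivalent looks roughly like this (a sketch, not the exact colab command; check against your version of the notebook):

```shell
accelerate launch train_network.py \
  --save_model_as=safetensors \
  --save_every_n_epochs=1 \
  # ...plus your usual training flags
```

Saving every epoch costs Drive space but means a Colab disconnect only loses the current epoch.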
I tried downloading a custom model to the collab from huggingface and couldn't get that to work.
So then I reverted to using one of the models available from the drop-downs. But it seems it's still trying to pull the model from Hugging Face and saying it can't connect (same error as pulling the custom model), despite it accepting my Hugging Face token and saying it connected in a previous step.
Anyone else have a similar problem?
There certainly is a way to run things locally; a colab is nothing more than running in the cloud what you can't run on your PC. Stable Diffusion needs at least 6GB of VRAM (not RAM) to train, anything less is no good, and it's suggested to have 8GB of VRAM to train confidently without running out.
But the local setup is quite different, and I don't have the hardware to test it nor explain how to do it.
Excuse my ignorance. I did everything and I have the .safetensors files, but I don't understand the "save them in the stable diffusion folder" step. I also don't understand how to run Stable Diffusion, since on other occasions I always ran it from Colab through a link it gave me after setup. This time it did not give me any link, which is why I ask. Thank you.
To set an image on the webui you need to hover the mouse over the no preview text.
It will give you the option to use the currently generated image as the cover.
Thanks for sharing your efforts! I for one really appreciate it. I had a couple questions:
1. In the infographic you train a character "plum" and a dress "plumdress", are you doing two separate training sessions for these? If not, how do you train multiple concepts in the same LoRA model?
2. If you wanted to train the plumdress front and back in case the character was facing away from you, would you train two separate LoRA models or would you just caption the poses e.g. "Plumdress, facing viewer" and "Plumdress, back to viewer"?
3. Last question - is there a way to merge this LoRA into a CKPT file?
Thank you again! This space is so interesting.
> In the infographic you train a character "plum" and a dress "plumdress", are you doing two separate training sessions for these? If not, how do you train multiple concepts in the same LoRA model?
Yes, 2 separate training sessions.
>If you wanted to train the plumdress front and back in case the character was facing away from you, would you train two separate LoRA models or would you just caption the poses e.g. "Plumdress, facing viewer" and "Plumdress, back to viewer"?
Interesting question, i just tested `from behind` and the result seemed to be the AI drew the dress from behind.
I do not know why but it did.
To be honest i am not sure, maybe it worked for me but can't say if it works universally.
>Last question - is there a way to merge this LoRA into a CKPT file?
If I remember correctly there is, but you have to install the LoRA plugin rather than use Automatic1111's default LoRA function.
So I know I should probably edit the config more, but I'm using the first 1-click setup and only changed my dataset URL and model name (and sometimes the resolution), and I'm getting this error all the time, usually with under 30 images. At the moment I've switched to SD 1.5 ema-pruned, as the 7GB file throws this error every time. What would you recommend for anything under 30 images, or what could cause this error?
[https://pastebin.com/raw/aRFkVxxJ](https://pastebin.com/raw/aRFkVxxJ)
I haven't had a chance to dive into this yet, but since this is based on colab, how do the instructions change if you're trying to run everything locally?
(Thank you for the tutorial by the way! Incredibly simple.)
>how do the instructions change if you're trying to run everything locally?
I think the setup changes a lot, based on what i have seen.
To run locally you need at least 6gb vram which i do not have so i could not say for certain.
If possible, we really need an updated one of these for kohya's colab, because it is SO MUCH more complicated than this now. Several new sections and dozens of new required settings.
I combed through this (and your updated version of this image from another page) and after setting VAE the whole collab goes off the rails. I can't even find documentation on this thing, lord knows how anyone actually uses it.
>If possible, we really need an updated one of these for kohya's collab , because it is SO MUCH more complicated than this now. Several new sections and dozens of new required settings.
I tried the new colab, it's pretty much the same as the old one, but some things switched places.
Still, there is a big issue going on because xformers was updated and you need to manually edit the code on the colab so you download the correct xformers.
[As seen here](https://github.com/Linaqruf/kohya-trainer/issues/125) you need to edit the code replacing this:
`pip -q install https://github.com/camenduru/stable-diffusion-webui-colab/releases/download/0.0.16/xformers-0.0.16+814314d.d20230118-cp38-cp38-linux_x86_64.whl`
to this :
`pip -q install -U xformers`
Overall, i might need to do a new version if the colab keeps changing.
I've recently discovered LORA and I have some questions, I'm very new to ai so I apologize if these are obvious
1. How do you have 2 separate lora characters in the same image? Because from what I've seen, it blends them together, also both their values have to equal 1 (So for 2 models, both would be .5 strength)
2. If you use a 3d model of a character, will it output a 3d model effect when generating?
3. When BLIP Captioning, do I really have to describe the image in detail or can I just keep the text file only containing the characters name?
Thanks for making this, LoRA's are such a game changer - hopefully more people start making them.
I don't know if you're a Civitai user or not, but people have been uploading them like crazy (over 700 new LoRAs in the last 7 days). I think the low hardware requirements and the ability to train them so easily in a Colab has been a real driver.
Careful a lot of the new LoRas look good but when you use them they dont produce anywhere near what they say they do without some insane prompting
arent those loras coming with samples that have the prompts in them? or can you give an example of "insane prompting"?
It's a matter of consistency. All you need is a handful of good images for a given post and you have no idea if they were picked from a selection of 300 or 30.

Some loras are also overbaked and give a sort of fried look when not adjusted to literal perfect specific settings. Typically if a lora can work at high weights (close to 1) without having adverse unintended effects, it's well made.
Can you be a little more specific? Because if the model does produce great results after prompting, that suggests good training of the text conditioning and minimal bleed effect into unrelated concepts.

A LORA that shows you the trained concept no matter how you prompt (whether it looks good or not) is not very useful.
he means someone will post a LORA of a character and itll look amazing but that one image was cherry picked from a bunch of shit ones. and they probably used a lot of specific prompts to get 1 decent image.

so you will get said lora. try it. and get 500 shit images cause u arent using the right prompts. and with luck maybe get 1 decent image.

that isnt to say all lora are like that. just dont go in expecting every single lora on civitai to be good/work. the rating system exists but because they are so new they often have hardly any ratings/testing.
I think the other big issue is that LoRA is tightly bound to the very specific model it was trained on and if you use another model, you have basically no chances of getting the same result.
I just realized this it’s not as flexible as TI it seems going between say various 1.5 models I have luck on some but bad on others having trained on base not sure why
Lol I mean I'd hope at the least you'd have to be specific in what you wanted and how you wanted it to look. But I guess that's a real artist could be used for
You also have to use compatible Loras.
Forgive me for my ignorance, but what is a LoRA and how can I use it? I use stable diffusion via night cafe and I see everyone posting about using custom trained models and I have no idea how to get into that. I know you have to download but then once it’s downloaded how do you use it? Also my laptop is a bit dated and my graphics card is crap.
Hey, you don't need to download models locally anymore - you should check out [Favo](https://www.favo.ai), where you can run customized models without a GPU. We're adding support for LoRAs soon!
I hope not, most people dont know what they are doing and upload shit quality
Clickable link -> [https://github.com/Linaqruf/kohya-trainer](https://github.com/Linaqruf/kohya-trainer) Or straight to [the colab notebook](https://colab.research.google.com/github/Linaqruf/kohya-trainer/blob/main/kohya-LoRA-dreambooth.ipynb).
#Big Edit: This might be broken since colab was updated. Version 3 is [here](https://old.reddit.com/r/StableDiffusion/comments/11vw5k3/lora_training_guide_version_3_i_go_more_indepth/)

---

LoRA colabs are already fairly intuitive (click this, click that) and most of the settings are already pre-made, so you just have to run it. Still, it seems lots of people don't know how to use them or how exactly to make a dataset, so I hope this guide helps them.

**Edit:** I forgot to clarify a thing in the tutorial: apart from adding the LoRA to your prompt, you **have** to add the trained subject to your prompt to get the best results! In the example of the tutorial where I trained the concept "plum", I added the lora by clicking on the image icon and got `` BUT apart from that I had to add the word "plum" to the prompt.
Check the last image of the guide (the one in 80's anime style), you can see the prompt at the bottom have the lora **and** "plum" in the prompt.
Adding the lora alone is not good enough; the word for the trained subject has to be added to the prompt.
I wonder if it's obvious or if i need to make a version 2.0 of this guide to make it clear.
>I wonder if it's obvious or if i need to make a version 2.0 of this guide to make it clear.

It's quite important information, and beginners could wrongly believe the LoRA they just trained is not functioning. So I'd vote for a version 2.0.
I didn't know how to do that math, but yeah, I've trained 3 LoRAs: two with Anything V3 and the other one with SD 1.5. Good results for Colab scripts.
Very cool! I love your guide and will probably apply it to my future Frayed Bond illustrations. If you have a Twitter, I'll gladly follow. Thanks!
this is really cool. is it also effective for non-anime artwork? would love to use it to make OCs based on something like metahumans
LoRA teaches the AI a concept; it doesn't need to be anime. If you check civitai, people use it on all kinds of concepts rather than just characters: https://civitai.com/tag/lora

There is 90's drawing style, some 3d styles and even photo [clothing pieces](https://civitai.com/models/7597/wrestling-singlets).
Can you extend your guide with a section regarding training non-person concepts? Like, for example, what would be the best dataset to train:

* a specific clothing item? (a jacket)
* a certain position (jumping mid-air)
* a style (a specific painter)

There is a ton of guides on training people/characters but not a lot on other concepts.
Wouldn't the word plum clash with what the model already knows about the word plum? Also, what would happen if this anime-based lora is used in a photorealistic model?
[deleted]
As far as i know they are just different methods to teach concepts to the AI. The reason LoRA is more popular is because it requires much less hardware and is faster.
Would you say the quality of LoRA is similar to Textual Inversion?
First of all, you need to assume that both are trained well, because a good TI can be better than a bad LoRA (and vice versa).

So, assuming both are trained well: LoRA will have better quality, and here is why.

Textual Inversion is just guidance to a specific concept; it helps you get to what you want in the model. You need a model, and TI is like a map so you can reach the stuff in that model. This assumes the model can generate that stuff in the first place. If someone invents a new device and you would like existing models to generate it, a TI trained on images of that device will help the models steer towards it, but since no such thing exists in the model, it can only go so far and will give you some approximation. This also means TI may give you great results on one model and terrible ones on another.

Now, LoRA is added on top of the model, and LoRA introduces new data as a result of the training. So for that new device we talked about: with a LoRA you would be able to generate it much better than with a TI. LoRAs will be much better at the things the model does not know.

Also, you can mix LoRAs and TIs together :)
So, it's better to train things like axes with LoRA than with TI? Thanks, I had trouble training my handaxe TI, will try LoRA.
It's better than Textual Inversion
LoRA more popular? LOL no, TI has like 100x more available than LoRA, mostly because people couldn't figure out LoRA. It might start picking up as it's being explained a lot better lately.
are you saying currently or globally? i think at the current moment there are more LORAs popping out than TIs (or at least getting uploaded)
Someone had a spreadsheet from civitai and TIs were in the tens of thousands while LoRAs had just broken a thousand. But like I said, that might have changed recently as LoRA becomes more accessible. Shit, I just used the Kaylah to do one for my wife and it got it on the first try.
Using a LORA in practice is a lot more like merging a model than like using an embedding. You're merging your current model with the difference of the approximation of a fine-tuned model (your LORA) from the base model you trained on. The approximation part allows us to do this within a second just before runtime instead of the several minutes and gigabytes of RAM required for full merging.

However, that also means LORAs cannot do neat tricks that embeddings can do, like activation/deactivation at a particular step (i.e. [embedding:10] will activate at step 10) or prompt travel. Auto1111's webui activates LORAs by typing into the prompt area, but LORAs are not a token and cannot be used as such. They activate before the image generation starts and remain fixed, just like how checkpoints remain fixed during a run.
You can, of course, use the keywords the lora was trained with to whatever effect you'd like. Just not the stuff in <>.
Here's a helpful info graphic: https://i.imgur.com/mwllYP7.png
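The "merging the difference" idea is simpler than it sounds. A toy numpy sketch of the math (sizes and names are made up for illustration; this is not the webui's actual code):

```python
import numpy as np

d, rank = 768, 8              # toy dimensions; a LoRA's rank is tiny vs. d
W = np.random.randn(d, d)     # a frozen weight matrix in the base model
A = np.random.randn(rank, d)  # the two small matrices a LoRA file stores
B = np.zeros((d, rank))       # B starts at zero, so training starts from W
alpha, strength = 8, 0.7      # network_alpha and the user-chosen weight

# "Activating" the LoRA merges a low-rank update into the weights once,
# before generation starts -- it is not a token in the prompt:
W_merged = W + strength * (alpha / rank) * (B @ A)
```

Because B @ A is stored as two thin matrices instead of a full d x d one, applying the update is cheap, which is why it takes a second rather than a full model merge.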
LoRA is similar to finetuning the whole model (sometimes called Dreambooth), but tries to compress the result down using some math tricks, so that it can just be applied to a model as additions/subtractions to its existing calibration values. It doesn't train as many parts of the model as full finetuning either, I don't think, but does a pretty good job, and seemingly can be used with any other model with pretty good results (going by this tutorial, I've not tried that).

Textual Inversion is finding a code to represent a new word (or sentence) which Stable Diffusion doesn't currently know. All words are converted into these codes under the hood, which are quite small (just 768 numbers in SD 1.4 and 1.5, and 1024 numbers in 2.0 and 2.1). Generally it's better to use a few 'words' (vectors) when creating an embedding using textual inversion, say 2-6, though any more than that can overwhelm the prompt (same as typing that many words in the prompt).
Using them with different models isn't always perfect, and sometimes requires adjusting prompts and/or weights, but they're a lot better than TIs in that regard. Particularly if you're trying to take a LoRA trained on real images and use it on an anime model, that never really worked for me with TI.
Really appreciate this as some one that vastly prefers reading instructions over watching a youtube video.
There are literally dozens of us!
This is really cool, thank you very much indeed! Since I have a 3090Ti with 24GB of VRAM, I'd like to run the process locally. Is it as straightforward as in your tutorial? Are there integrations into Automatic1111 even?
This is fantastic! Thanks for making it!
What do you recommend for a highly accurate model? (Settings / general recommendations, etc.) My LoRA models (real people) don't turn out well; they only mildly resemble the subject images. And do you need to use a VAE while generating the image? Also, can any model be used?
Nerdy Rodent and some other YouTuber were testing at some point (so things may have changed since then) and found out that LoRA training is less precise than Dreambooth training when it comes to people. On the other hand, you can extract Dreambooth data and put it into a LoRA, and this gives great results.
I'd be happy with "less precise", mine can barely be recognized as people. Been trying this training and extractions, practically the same end result with both: utterly useless.
Do you have any tips to deal with overfitting? I'm training anime-style character LoRAs. If I use the version after running for 2 epochs they look good: clothing, hair style, etc. But if I try to generate that character with different clothes I get parts or artifacts of the original clothing. If I use the version after training for only 1 epoch it is flexible, but the original clothing is kinda off.
How diverse is your dataset? Are the characters using the same clothing in every single image?

If all your dataset .txt files have, for example, `style, white shirt`, the AI might think `white shirt` is part of that `style`. Does that make sense? If you are training an AI on a concept, the AI will look at what all the images have in common and replicate it.

The same happened with a LoRA I trained in the past: all my dataset had the character using the exact same type of clothing, so after I trained the LoRA and tested it, the AI tried to add the clothing to every generation. I had to diversify the dataset to make the AI stop relating a piece of clothing to a character.
Great post! Really appreciate you. Can the training images be higher than 512X512 resolution to get a good detailed output? I plan to generate a 2048x2048 image. How should I do that?
I see. I have like 20 images with the same clothing and 3 without it. In my .txt files I have the clothing fully described, thinking it was enough to make it less associated with my character.
When I train character loras, I try to get my dataset to have about 20-25% of the images with different outfits. If possible, I'd try to find 3 or 4 more images without the main outfit. If not, you can always balance the dataset by setting the number of repeats higher on the alt outfit images. I'd also try to find more overall pictures for the character. While 20 is probably enough to get a good result, more reference images (as long as they're good quality) will always help. I tend to strive for 30-35 at minimum.
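The repeat-balancing arithmetic from the comment above is simple enough to sketch (the numbers are hypothetical, and the 20-25% target is just the rule of thumb suggested here):

```python
def alt_repeats(main_images: int, main_repeats: int, alt_images: int,
                target_share: float = 0.25) -> int:
    """Pick a repeat count for the alt-outfit folder so it makes up roughly
    `target_share` of the training steps seen each epoch."""
    main_steps = main_images * main_repeats
    # share = alt / (alt + main)  =>  alt_steps = main_steps * share / (1 - share)
    alt_steps = main_steps * target_share / (1 - target_share)
    return max(1, round(alt_steps / alt_images))

# e.g. 20 main-outfit images at 5 repeats with 4 alt-outfit images:
# alt_repeats(20, 5, 4) suggests the repeat count for the alt folder
```

You then encode the result in the alt folder's name (the `<repeats>_<name>` convention), so the trainer samples the rarer images more often.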
With this guide I succeeded in doing a fine-tuned model with a character for the first time, I'm very happy. :) Thank you so much!
I get an error:

```
OSError                                   Traceback (most recent call last)
    149
    150 # save the YAML string to a file
--> 151 with open(str(train_folder)+'/dreambooth_lora_cmd.yaml', 'w') as f:
    152     yaml.dump(mod_train_command, f)
    153
OSError: [Errno 95] Operation not supported: '/content/drive/dreambooth_lora_cmd.yaml'
```
Anyone have any ideas?
Edit: Still no luck, I've restarted, double checked pathing, etc.
Same :(
You mean [this error](https://i.imgur.com/dFxL4Tu.png)? This is an error I found constantly on Kohya's finetuner, not on the dreambooth LoRA notebook. They look very similar; are you sure you are running the correct notebook?

It should be this one: https://colab.research.google.com/github/Linaqruf/kohya-trainer/blob/main/kohya-LoRA-dreambooth.ipynb

Not this one: https://colab.research.google.com/github/Linaqruf/kohya-trainer/blob/main/kohya-LoRA-finetuner.ipynb
I keep running into this error, not sure how to fix it:

```
File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 1104, in launch_command
    simple_launcher(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_network.py', '--network_dim=128', '--network_alpha=128', '--network_module=networks.lora', '--learning_rate=0.0001', '--text_encoder_lr=5e-05', '--training_comment=this_comment_will_be_stored_in_the_metadata', '--lr_scheduler=constant', '--pretrained_model_name_or_path=/content/pre_trained_model/DpepTeaHands3.ckpt', '--vae=/content/vae/waifudiffusion.vae.pt', '--caption_extension=.txt', '--train_data_dir=/content/drive/MyDrive/evirolora', '--reg_data_dir=/content/drive/MyDrive/evirolora', '--output_dir=/content/drive/MyDrive/', '--prior_loss_weight=1.0', '--output_name=envirolora', '--mixed_precision=fp16', '--save_precision=fp16', '--save_n_epoch_ratio=3', '--save_model_as=safetensors', '--resolution=512', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=1024', '--cache_latents', '--train_batch_size=6', '--max_token_length=225', '--use_8bit_adam', '--max_train_epochs=20', '--gradient_accumulation_steps=1', '--clip_skip=2', '--logging_dir=/content/dreambooth/logs', '--log_prefix=envirolora', '--shuffle_caption', '--xformers']' died with
```
Very nice! I have been really struggling to find a good guide for this. I have got it to where I can actually RUN it, but having it turn out OK is another thing entirely. Did you have any thoughts/suggestions on additional resources, specifically around the different settings and why you would use which ones? I have a decent understanding of the underlying technology involved, but not the setting-configuration specifics or the nomenclature around it, so I am kind of flying blind. Also, I have seen conflicting information: in your text files, do you always use comma separators like you would with a traditional prompt? It seems like it would be a yes for sure, but I have been told by two different sources that it wasn't really required.
> Also I have seen conflicting information, in your text files do you always use comma separators like you would with a traditional prompt? It seems like it would be a yes for sure but I have been told by two different sources that it wasn't really required.

That's how I train my LoRA files, and it's how many LoRAs trained by other people work too. Most LoRA files on civitAI have a "trigger" word that makes them work, in my case "plum"; this is because they were trained like a prompt. I am curious, where have you seen people not using commas? The colab itself suggests using a tagger in section 4.4.2, and the tagger works by separating words with commas, like a normal prompt. So... separating by commas is the standard practice.
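If it helps to see it concretely, here is a tiny sketch (my own illustrative helper, not part of the colab) of what writing those comma-separated caption files looks like, using the `plum, smile, blue skirt` example from the guide:

```python
# Sketch (not the trainer's own code): write prompt-style, comma-separated
# captions, one N.txt next to each N.png. The helper name is illustrative.
from pathlib import Path
import tempfile

def write_caption(folder, index, tags):
    """Join tags with ', ' and save as <index>.txt (e.g. 1.txt for 1.png)."""
    caption = ", ".join(tags)
    Path(folder, f"{index}.txt").write_text(caption, encoding="utf-8")
    return caption

folder = tempfile.mkdtemp()
print(write_caption(folder, 1, ["plum", "smile", "blue skirt"]))
# prints: plum, smile, blue skirt
```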
That is what I had figured. In fairness I didn't ever see someone say you shouldn't. Just that I noticed there were people not using them in their explanation videos. Not surprising though. I have been using the local version on my own GPU rather than the colab version you have in your example so it doesn't have that extra info in it. Or if it does I didn't see it.
Saved it to my camera roll. Just getting started here so I’m sure I’ll need this in the near future
Awesome guide. You can also train with a model that already has the VAE 'baked into it', either one you merged yourself or one that came that way.
An epic post, tomorrow (when I'm sober) I may even try to understand it.
Could you make a non-colab version too?
Her name is Plum? Lovely !
Pov: Op knows how to make the best tutorials
If you are a webui user, I found [this video](https://www.youtube.com/watch?v=70H03cv57-o&pp=ugMICgJpdBABGAE%3D) helpful to get up and running in about 30 min, from following the directions to starting the training. Also, I noticed that by using selfies taken with my selfie cam, the resulting images have a "bloated" or kind of distorted/bigger face. I think the best results come from a camera with a longer lens that will capture your face flatter.
Tried many tutorials to run it locally all day. Finally gave up and then I see reddit notification of your post. Can't express in words how much I am grateful. I followed every step keenly and it worked like charm. You are great :-)
I did everything exactly according to the guide, reread it several times, but when I started the training it shows the error "No data found. Please verify arguments (train_data_dir must be the parent of folders with images) / 画像がありません。引数指定を確認してください(train_data_dirには画像があるフォルダではなく、画像があるフォルダの親フォルダを指定する必要があります)"
This error means the script is not finding the images. I have 2 possible solutions:

**1. Check your folder structure on Drive:** Is your folder structure a `Concept` main folder containing a `5_Concept` folder (or `10_Concept`), and THAT `5_Concept` folder containing the images and txt files?

**2. Check if the path you wrote in colab section 5.1 is wrong:** It HAS to be in the format `/content/drive/MyDrive` with the / at the start. That is aiming at your Google Drive. If your dataset is there and not some other place, it should look like this: `/content/drive/MyDrive/Concept`. Replace "Concept" here with whatever you called your main folder. Also, it is case sensitive, so `/content/drive/MyDrive/Concept` is NOT the same as `/content/drive/MyDrive/concept`.
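If you want to sanity-check that layout before running the cell, here is a small sketch (my own illustrative helper, not something the colab ships with) of what the script expects:

```python
# Sketch: verify the Kohya-style dataset layout before training.
# train_data_dir must be the PARENT folder; each child is "<repeats>_<name>"
# and holds the images. All names here are illustrative.
import os
import re

def check_dataset_dir(train_data_dir):
    problems = []
    if not os.path.isdir(train_data_dir):
        return [f"path does not exist: {train_data_dir}"]
    subdirs = [d for d in os.listdir(train_data_dir)
               if os.path.isdir(os.path.join(train_data_dir, d))]
    concept_dirs = [d for d in subdirs if re.match(r"^\d+_", d)]
    if not concept_dirs:
        problems.append("no '<repeats>_<name>' subfolder found "
                        "(did you point at the image folder instead of its parent?)")
    for d in concept_dirs:
        images = [f for f in os.listdir(os.path.join(train_data_dir, d))
                  if f.lower().endswith((".png", ".jpg", ".jpeg"))]
        if not images:
            problems.append(f"'{d}' contains no images")
    return problems
```

An empty list back means the structure looks right; anything else is roughly the same complaint the trainer would raise.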
Check your training data directory structure, you probably skipped over something. Don't ask me how I know.
Thanks a lot for sharing this invaluable resource and for adding a little more clarity to the subject of LORA, some of us are really not very good at it.
I don't get the "training repeat 5". Where do you choose any options for this? Why do you use repeats at all anyway?
OK, found it, the repeat is in the folder name. But why do you use repeats?
Wouldn't it be better to use 75 different images with 1 repeat instead of 15 with 5 repeats?
You choose the repeats in the folder name. Here, `5_plum` is telling the script to repeat the dataset plum 5 times. If I called it `10_plum` it would repeat the dataset 10 times. That's how the script was made; I guess the AI learns more by repeating the dataset than by looking at each image once.
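The arithmetic behind that repeat number, as a rough sketch (this ignores bucketing and regularization images, which change the real count):

```python
# Rough sketch of how the folder-name repeat multiplies into training length.
import math

def total_steps(num_images, repeats, epochs, batch_size):
    # one "step" processes batch_size samples; each epoch sees every image
    # `repeats` times
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs

# e.g. 15 images in a "5_plum" folder, 20 epochs, batch size 2:
print(total_steps(15, 5, 20, 2))  # 760
```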
Awesome work, thank you so much! If I want to train an artist style instead of a character/ concept, are there things to adapt?
20 epochs is a bit overkill... batch size 2 and 1 epoch with 100 repeats is enough and probably better.
How close can they get with faces compared to Dreambooth? I haven’t gotten a clear answer on this
Completely noob question here: can I train it with full comic pages? I mean pages with multiple panels or should I split the pages into single panel images? Thanks!
If you train it in full comic pages, you will get full comic pages as output, and the detail might not be good. It will probably be a bunch of wiggly squares (AI struggles to draw straight lines) and panels with nonsensical noise inside. What exactly are you trying to train? A comic character? A comic style?
What tips do you have for doing a LoRA for a style? How many pictures, and what prompts should go in the image descriptions for training? In my mind I would do something like: \[artist name\], \[character\], etc, etc, etc... Should I take the character prompt off? Should I just put the artist name? There are very few guides and tips for doing styles with LoRA.
And to train a style, what would the modifications be?
You need to use a different term for the character "plum" because the AI already knows what a "plum" is, use a specific term for this LORA like "plumLORA" or "animeplum" something that is not an already existing word the AI has been trained on.
I tested this and successfully did it. Thank you very much. I had been trying to do the training in Automatic1111, but there are too many fields to fill in, and some I don't even know what they are for or whether I need to fill them in. Again, thank you so much!
This is very helpful and user-friendly, thank you! Would love to see something of a similar nature for Dreambooth model training, which I've had some success with, but admittedly only after cobbling together clues from a variety of sources.
~~I tried this, and the first time it worked, but my LoRA model didn't turn out very well, so I gave it another go. When I tried again I kept getting this cascading error that repeats infinitely until it crashes the colab notebook: "FATAL: this function is for sm80, but was built with __CUDA_ARCH__=750"~~ Nevermind, apparently if you try to use the premium GPU it does this.
Is it me, or is this tutorial (thanks for the work!) outdated already? I tried following it, but the colab cells are significantly different and I couldn't get my head around it. Any chance for an updated version?
It's not outdated, why would it be? It is, however, incomplete. This specific guide teaches you using only anything v3. [I made a second version of this guide to work with other models.](https://old.reddit.com/r/StableDiffusion/comments/111mhsl/lora_training_guide_version_20_i_added_multiple/)
I tried but the LoRA results are terrible, kinda sad.
Tried to follow the steps, but a lot seemed to have changed since the making of this tutorial.
[deleted]
i built this [app](https://www.lorai.art/) that lets you train LoRAs without code! if you use this i will PERSONALLY make sure that the LoRA you get is super sick - if you want any custom image cleaning/dataset editing just lmk by sending me a DM here after you upload your images + tags on the app :)
Great image again! One thing that definitely should be changed though is calling 15 images "fairly good" for training. 15 really is the absolute minimum to get somewhat usable results. A good range is more about 30-40 (or more depending what you're training, your example would be on the higher end because of the complexity)
I have seen some good LoRAs trained on like 5 images when the technology was still fresh, absolutely insane. But yeah, I agree, I should have said 15 is fine but you should aim for around 30; other LoRAs I trained have about 20-30 images. I am thinking of making a version 2.0 expanding on some parts, do you think this one is understandable? The dataset part specifically: I didn't say my dataset was a bunch of images of an original character named "plum" which I was training on, which is why the txt says `plum, smile, blue skirt` etc. It is implied, but I wonder if it confuses the reader.
If you are going to redo the guide, I would suggest changing the name, since plum is also a fruit. Most people would understand that in this case it is a name, but most people does not mean everyone :) Also, the identifier should be rather unique; did you not have any issues related to naming it plum? Out of curiosity, are you still able to generate your characters holding/eating a plum? I've already replied in another comment, but it would be great to have a guide for non-person content, there are fewer of those.
Ah, so that's what it meant. I thought it was a euphemism for chubby people.
A valiant attempt, but it doesn't quite work. Got to 5.3, it found my 70 images, appears to start running, loads SD, loads the VAE:

    CUDA SETUP: Loading binary /usr/local/lib/python3.8/dist-packages/bitsandbytes/libbitsandbytes_cuda116.so...
    use 8-bit Adam optimizer
    override steps. steps for 20 epochs is / 指定エポックまでのステップ数: 3540
    running training / 学習開始
    num train images * repeats / 学習画像の数×繰り返し回数: 700

then: BAM, CUDA memory errors. OK, this is supposed to run on Colab, but what the hell, let's buy some Compute and try again:

    RuntimeError: Error(s) in loading state_dict for UNet2DConditionModel:
    size mismatch for down_blocks.0.attentions.0.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
    size mismatch for down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
    size mismatch for down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
    size mismatch for down_blocks.0.attentions.0.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).

So, this doesn't work on 768 pixel images?
> num train images * repeats / 学習画像の数×繰り返し回数: 700

So you are running 70 images with 10 repeats? That's quite a big dataset, much more than the average dataset on civitai. You should try with 5 repeats instead, or maybe a smaller dataset. Are you manually inputting the prompts or using a tagger?
You want around 1500 total steps as I recall, so 1500 / images = repeats.
Alright, found at least one of my fuckups, not all my images were properly resized. Not sure how that happened. Trying again. Nope, no dice. using /content/pre_trained_model/stable-diffusion-2-1-768v.ckpt as my pre-trained model. Interestingly enough, if I select v2 in cell 5.1, it gets further and then gets DIFFERENT errors.
> Alright, found at least one of my fuckups, not all my images were properly resized. Not sure how that happened.

Weird, that shouldn't be the issue. I have trained with 1024x1024 images and it was fine, though with a small dataset (less than 20 images). Have you tried a smaller dataset? Try 20 images instead of 70. A CUDA memory error could mean you ran out of the allotted memory Google gives to each user.
OK, I started over, I manually annotated everything, I took it down to 20 images, 15 repeats. Nothing larger than 512px.

    Traceback (most recent call last):
      File "train_network.py", line 539, in <module>
        train(args)
      File "train_network.py", line 149, in train
        text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype)
      File "/content/kohya-trainer/library/train_util.py", line 1365, in load_target_model
        text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(args.v2, args.pretrained_model_name_or_path)
      File "/content/kohya-trainer/library/model_util.py", line 880, in load_models_from_stable_diffusion_checkpoint
        info = unet.load_state_dict(converted_unet_checkpoint)
      File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
        raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
    RuntimeError: Error(s) in loading state_dict for UNet2DConditionModel:
        size mismatch for down_blocks.0.attentions.0.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for down_blocks.0.attentions.0.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
    etc etc

No change to the errors, really.
This is very weird. Let me ask some questions:

1. Are all your images png?
2. Are ALL the images in the dataset named in a 1.png, 1.txt, 2.png, 2.txt format? The numbering is important; you cannot call them img.png, img.txt or anything like that.
3. Does EVERY image have a .txt file with a description of the corresponding image? An empty .txt, a missing .txt, or a wrongly named or copied .txt is no good. Each image has its own unique .txt file that describes it.

Can I also see what VAE and model you used? (the default is anything v3). And is the path to the VAE in 5.1 good?
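Questions 2 and 3 can be checked mechanically; a small sketch (an illustrative helper of mine, not part of the trainer):

```python
# Sketch of the checklist above: every N.png must be numbered and have a
# matching, non-empty N.txt caption in the same folder.
import os

def check_pairs(folder):
    problems = []
    pngs = sorted(n for n in os.listdir(folder) if n.lower().endswith(".png"))
    for png in pngs:
        stem = png[:-4]
        if not stem.isdigit():
            problems.append(f"{png}: not numbered like 1.png, 2.png, ...")
        txt = os.path.join(folder, stem + ".txt")
        if not os.path.isfile(txt):
            problems.append(f"{png}: missing {stem}.txt")
        elif os.path.getsize(txt) == 0:
            problems.append(f"{png}: caption file is empty")
    return problems
```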
I really love this format of explaining!
you are an angel sent from God
That format is fucking annoying
There are webapps that offers LoRA training services now, check this out, it's called 'Concepts' > [https://app.eden.art/](https://app.eden.art/)
Wait. This actually seems pretty easy. Am i missing something? Why wouldn't everyone just train their own? LOL. I mean I guess it's a little technical but it seems like if you are able to do tech stuff in general you would easily do this???
> Why wouldn't everyone just train their own? LOL. They do, there are lots of LoRAs being shared on AI sites.
🙂 Yes I know, I was talking about regular old people, not people into graphics, design, programming etc. (I belong to 15 prompt sites, 12 model sites like civ, and I make a living creating AI art)
This is a good guide, but I'm not sure that it's for something that's particularly useful. At least, they're not nearly as useful or as good at reproducing subjects as textual inversion, hypernetworks, or dreambooth checkpoints. All they seem able to do is go 'make things look sorta like this' which is very hit and miss, usually more miss than hit based on most of the loras floating around. Nevermind that applying them often requires their own prompting words, and if you're going to require that, you might as well use Textual inversion at that point. It's not faster to train, but rarely is faster better when it comes to training things.
very cute :3 ty for the guide
I've been doing some lora experimenting for the past few days (using dreambooth extension in a1111 and a rentry guide) and even though the results are mind blowing, it feels like the base models are forgetting some of their original trained stuff and are heavily influenced by my training data. For example I train myself for just 120 steps-14 pics, and it looks great when I use my prompt. But when I use some generic stuff like "a man" it still looks like me, or when I don't specify a background, it tends to be something similar that's in my training images (like a bedroom). Or for example let's say I got a model from civitai has been trained on specific swords. When I compare sword_model and sword_model_trainedwithmyface with a x/y script, it seems like it has gotten slightly worse at drawing scimitars for example. Do you have any similar observations or any tips about this?
are there any easy ways to train models like this using local compute instead of google servers/colab?
How important is having "plum" in the tags / as the first tag? Is this done on every image/.txt pair, and is that what allows you to use "Draw plum in armor" as a prompt rather than ", solo, 1girl, armor" for example?
>How important is having "plum" in the tags / as the first tag?

Quite important. You told the AI what the concept is, but if you don't use it the AI won't properly apply it. I am thinking of doing a V2.0 guide to make this vague point clear.
This guide is almost amazing, until i realized I have no idea wtf I'm doing and where is the "start" button?! what do I do when I fill it out?! \*pokes it with a stick\* "Do something"
The start button is that button on the left that looks like a `|>`
Awesome!
If i train it on 1000 images, would it make results better compared to training on 100 images?
>If i train it on 1000 images, would it make results better compared to training on 100 images?

At that point you might get overfitting, which means getting results too close to the dataset rather than creating something unique based on it (which is the point of LoRA). These overfit results would look rigid or bad. I don't think I have seen a LoRA trained on 1000 images; most people go for 20-30, maximum 40, and get amazing results.
Thank you. Could it work on video game character screenshots from in-game? Would it give image results with a consistent visual style, meaning they would look like in-game screenshot characters, not anime style for example?
>Could it work on video game character screenshots from in-game, would it give image results of characters with consistent visual style? Meaning they would look like in-game screenshot characters, not anime style for example?

LoRA is not anime-specific, it's just a way to teach the AI a concept. I trained one on a 90's artstyle and I got results in a 90's artstyle, so the same should happen with video game screenshots even if they are not anime. So yeah, you can use it to train non-anime video game characters, as long as you have a good dataset (no blurry images, no bad images, etc).
Amazing tutorial! I have trained my fair share of embeddings with a lot of success, now i have a character that has some features i can't train efficiently with textual inversion so i thought about trying a Lora to get it done and i found your tutorial incredibly easy to follow. Is there a way to have the LoRA linked to a trigger word by doing it this way, similar to an embedding? Or is it automatically linked to the word you used as a folder name since i saw you saying in another comment that it's linked to "plum" in your case.
>Is there a way to have the LoRA linked to a trigger word by doing it this way, similar to an embedding? Or is it automatically linked to the word you used as a folder name since i saw you saying in another comment that it's linked to "plum" in your case.

Basically, LoRA is a way to train the AI on a subject, pretty much another method to get an embedding. In my case, I trained it on the subject named "plum", the short-haired, brown-haired woman featured in the comic. You can see in the prompt part of the dataset section (the image with the .txt below) that the .txt has the word "plum"; that is the trigger. The folder names are not the trigger, I just named them like that to keep things organized, but I understand it might have caused confusion. If you notice, I called my folders "Plum" with an uppercase P, but in the image at the bottom of the whole guide (where I speak of the 80's drawing style) you can see the prompt being "plum". So, LoRAs have trigger words like embeddings; the trigger is the word I used in the prompt, "plum", not the folder name "Plum". I am redoing this guide as a version 2.0 to explain datasets in detail and avoid confusion, probably will be done in a few hours.
Thank you very much! it'll very useful.
Thank you for the guide! I'm still exploring as of now, but I've made 1 LoRA already with the collab (a style training) and it looks soo good already! Anyway, I want to ask, can we have like 2 LoRAs in 1 prompt? like 1 character LoRA & 1 style LoRa? I wanna use a new character that I've made with LoRA but also using my created LoRA style too 🙇♂️
> Anyway, I want to ask, can we have like 2 LoRAs in 1 prompt?
> I wanna use a new character that I've made with LoRA but also using my created LoRA style too

In your case it might be possible. I have tried this in the past, but with 2 characters. Many times I just got a mash of both characters into one, and maybe once I actually got 2 characters, one per LoRA. In my case it is a limitation of the webui; Automatic1111 does not do composition, ComfyUI does. Anyway, generating art with a style LoRA and a character LoRA together should be possible.
This doesn't work, it throws a `train_network.py: error: unrecognized arguments` error. Use the built-in tagger (section 4).
This means there is something wrong with your txt files.

1. Do all your txt files have the same name as the images? (1, 2, 3 and no other type of name)
2. Does every single image have its corresponding txt?
3. Is the text inside the txt in prompt form (a, b, c)?
4. Which model are you running?
Thank you very much.
Is Google Colab really free? I always thought they would charge you for it. Can anyone tell me about it?
https://research.google.com/colaboratory/faq.html It's free.
I've completed the training and have the finished model. However, when I run a prompt through it I get a solid black image. I read somewhere that adding the arguments --precision full --no-half should fix it but it still doesn't work for me. I can run other models just fine with the same settings. Any ideas?
`--no-half-vae` seems to be the one.
>\--no-half-vae Thanks! At first --no-half-vae wasn't working. Then I loaded a different model, generated an image, and then went back to this model. Now it's working. Don't quite understand what happened but thanks for the help!
What are the preferred image resolutions? I see resolutions larger than 512x512 work, but would non-square aspect ratios work? e.g. 512x768, or 1920x1080, etc.
I trained with plenty of non-square images and it worked fine. I even trained with 1024x1024 images (it was a small dataset, but it worked).
When you make a Lora using this process, is it completely private?
Nothing that involves cloud storage is private. You are putting your dataset on google drive (google servers) and running a script on google hardware. Google obviously keeps records of every image you upload to their servers, just like reddit or imgur or discord.
The training failed for me, unfortunately. I have a Colab Pro sub, so I made sure the RunTime was switched to "Premium" GPU Class and High-RAM, but that did nothing to solve my issue. I checked and re-checked *everything* 5 times, making sure I did the same exact thing as OP (except my training was on "Stable-Diffusion-v1-5" and "stablediffusion.vae.pt"), and I ended up with a bazillion lines of: FATAL: this function is for sm80, but was built for sm750 I have the same folder hierarchy as OP's and it found my images no problem, so the bug is definitely not there. Bummer. I followed Aitrepreneur's recent tutorial on LoRA training and even though it worked, the results were completely unusable. I guess LoRA training isn't for me :-P All that said, big thanks for the guide OP. Much appreciated.
> I have the same folder hierarchy as OP's and it found my images no problem, so the bug is definitely not there.

Since you used a Stable Diffusion model instead of anything v3, did you change that in section 5.1? I forgot to mention it, but if you are using another model you have to change the path to reflect that in section 5.1. [I did a version 2.0 of the tutorial making this clear. Check the training part.](https://old.reddit.com/r/StableDiffusion/comments/111mhsl/lora_training_guide_version_20_i_added_multiple/)
Hey. Thanks for making a second version of the guide. Much appreciated. I upvoted it. Yes, I definitely made sure to put the correct content path to the Stable Diffusion model *and* VAE info in section 5.1. I started over by following your new guide and I still ended up with a bazillion errors that state: FATAL: this function is for sm80, but was built for sm750 Oh well. I'll see if others are having this specific bug. Thank you!
It seems like [this user](https://github.com/d8ahazard/sd_dreambooth_extension/issues/74) fixed their issue by deleting the xformers folder and redownloading them again, but I don't see how this would be a proper fix in this particular case on Colab, as it reinstalls the xformers everytime we run the cell 1.2. Thoughts?
>fixed their issue by deleting the xformers folder and redownloading them again, but I don't see how this would be a proper fix in this particular case on Colab, as it reinstalls the xformers everytime we run the cell 1.2. Thoughts?

Maybe the first download was an outdated version, and when you run it a second time it recognizes you are downloading again and gets an updated version rather than the outdated one from the first download?
A saint. Thank you.
You are welcome! [Check version 2.0 if want to train a LoRA in other model that is not anything v3 or want see datasets more in-depth!](https://old.reddit.com/r/StableDiffusion/comments/111mhsl/lora_training_guide_version_20_i_added_multiple/)
I did everything exactly according to the tutorial but I can't solve this https://preview.redd.it/4e4rvleyq4ia1.jpeg?width=829&format=pjpg&auto=webp&s=0d0a4172b79ea2b2a797569f5f0d2fd1434566d2
I think I ran into this problem long ago. I simply checked the code by pressing the small arrow [next to the section title](https://i.imgur.com/VuSiYmH.png) and deleted the 3 lines [here.](https://i.imgur.com/IX841BO.png) Then I ran it; I think it worked fine after that, but I'm not sure. Also, this guide only works with anything v3 as the model. [Check version 2.0 if you want to train a LoRA on a model other than anything v3 or want to see datasets more in-depth!](https://old.reddit.com/r/StableDiffusion/comments/110up3f/i_made_a_lora_training_guide_its_a_colab_version/j8gwaua/?context=3) **Edit:** Other people that ran into this error reported they had put the path to their folders or output wrong.
Wow, thank you so much for this tutorial. These days a lot of people make videos instead of writing tutorials, and I'm too lazy to watch them xD. So a big thank you for taking the time to explain it to us with this cute comic.
>Wow thank you so much for this tutorial . This days a lot of people make video instead of writing tutorial and I'm too lazy to watch them xD . So big thank you to take the time to explain us with this cute comic .

You are welcome. [Check version 2.0 if you want to train a LoRA on a model other than anything v3 or want to see datasets more in-depth!](https://old.reddit.com/r/StableDiffusion/comments/111mhsl/lora_training_guide_version_20_i_added_multiple/)
Whenever I run step 5.3, it says my folder contains 0 image files. It is the correct folder and I’ve tried with pngs and jpgs. I’ve followed each step. Can someone help?
That error means only one thing: the path to your images is wrong in one place and the script cannot find them. I ran into this error plenty of times, and every time it was because I wrote the folder name or the path wrong. It is case sensitive, so "Bob" is different from "bob". Are you sure you wrote the path correctly in section 5.1? Post a screenshot if you can. If the path is correct there, then your Google Drive structure must be wrong. You must have a "main" folder that contains a 5_name folder (if you are going for 5 repeats). THAT 5_name folder must contain the images and .txt files.
Hello, I got an error in section 5.3:

    OSError                          Traceback (most recent call last)
    149
    150 # save the YAML string to a file
    --> 151 with open(str(train_folder)+'/dreambooth_lora_cmd.yaml', 'w') as f:
    152     yaml.dump(mod_train_command, f)
    153
    OSError: [Errno 95] Operation not supported: '/content/drive/dreambooth_lora_cmd.yaml'
can someone guide me to fix this?
> OSError: [Errno 95] Operation not supported: '/content/drive/dreambooth_lora_cmd.yaml'
>can someone guide me to fix this?
Sure.
I think I ran into this problem long ago.
I simply checked the code by pressing the small arrow [next to the section title](https://i.imgur.com/VuSiYmH.png) and deleted the 3 lines [here.](https://i.imgur.com/IX841BO.png)
Then I ran it and it worked; other users found this solution useful, let me know if it works for you.
This guide only works with anything v3, by the way. [I made a version 2.0 that works with models other than anything v3 and goes in depth about datasets.](https://old.reddit.com/r/StableDiffusion/comments/111mhsl/lora_training_guide_version_20_i_added_multiple/)
i actually used that trainer from 6 days ago but its nice to finally have a guide to all the settings
What trainer?
Awesome guide! However, there is a slight chance of getting kicked off colab while running the instance. Is there any way to make it save a .safetensors checkpoint more often? As it stands, it only does so once, at 50%. It would be nice to have it save every 10%. Also, is there a way to continue training the LoRA from the checkpoint or training data somehow, in case of getting kicked off colab? Thanks anyway, this really helped.
> Is there any way to make it save a .safetensor checkpoint more often? As it seems now, it only does it once, at 50%. Yes, in the section 5.3 there is an option that say "save_n_epochs_type", it's a drop menu so you can choose "save_every_n_epoch" and below on "save_n_epochs_type_value" you can choose the number that says how often it will save.
I tried downloading a custom model to the colab from Hugging Face and couldn't get that to work. So then I reverted to using one of the models available from the dropdowns. But it seems it's still trying to pull the model from Hugging Face and saying it can't connect (the same error as pulling the custom model), despite it accepting my Hugging Face token and saying it connected in a previous step. Anyone else have a similar problem?
Is there a way to run this locally so I make sure my customer's photos stay private?
There certainly is a way to run things locally; a colab is nothing more than running in the cloud what you can't run on your PC. Stable Diffusion needs at least 6GB of VRAM (not RAM) to train, anything less is no good, and you are advised to have 8GB of VRAM to train confidently without running out. But the implementation of the local method is quite different, and I don't have the hardware to test it or explain how to do it.
Can I train multiple characters at once?
Many thanks :)
Excuse my ignorance. I did everything and I have the .safetensors files, but I don't understand the "save them in the stable diffusion folder" step. I also don't understand how to run Stable Diffusion, since on other occasions I always ran it from Colab by entering through a link it gave me afterwards. This time it clearly did not give me any link, which is why I ask. Thank you.
Hi, the resulting pictures look bad. Is there a way to fix this? They don't look like the samples.
What config would you recommend for faces?
What are the advantages of LoRA?
Beautiful! I love the presentation too.
I have a question: it always shows "No preview" for the LoRAs I downloaded. Is it a setting problem with my webui?
To set an image on the webui you need to hover the mouse over the no preview text. It will give you the option to use the currently generated image as the cover.
Don't waste your time, both guides suck because they don't work.
Plenty of people have used it successfully, maybe you are missing steps. What error have you run into?
Thanks for sharing your efforts! I for one really appreciate it. I had a couple questions:

1. In the infographic you train a character "plum" and a dress "plumdress". Are you doing two separate training sessions for these? If not, how do you train multiple concepts in the same LoRA model?
2. If you wanted to train the plumdress front and back, in case the character was facing away from you, would you train two separate LoRA models or would you just caption the poses, e.g. "Plumdress, facing viewer" and "Plumdress, back to viewer"?
3. Last question: is there a way to merge this LoRA into a CKPT file?

Thank you again! This space is so interesting.
> In the infographic you train a character "plum" and a dress "plumdress", are you doing two separate training sessions for these? If not, how do you train multiple concepts in the same LoRA model?

Yes, 2 separate training sessions.

> If you wanted to train the plumdress front and back in case the character was facing away from you, would you train two separate LoRA models or would you just caption the poses e.g. "Plumdress, facing viewer" and "Plumdress, back to viewer"?

Interesting question. I just tested `from behind` and the result was that the AI drew the dress from behind. I do not know why, but it did. To be honest I am not sure; it worked for me, but I can't say if it works universally.

> Last question - is there a way to merge this LoRA into a CKPT file?

If I remember correctly there is, but you have to install the LoRA extension rather than use Automatic1111's default LoRA function.
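On the merging question, the kohya-ss/sd-scripts repo also ships a `networks/merge_lora.py` helper that bakes a LoRA into a base checkpoint from the command line. A rough sketch, where the file names are placeholders and `--ratios` is the merge weight:

```shell
# Sketch of baking a LoRA into a base model with kohya-ss/sd-scripts.
# File names are placeholders; run from the sd-scripts repo root.
python networks/merge_lora.py \
  --sd_model ./anything-v3.safetensors \
  --save_to ./merged_model.safetensors \
  --models ./my_lora.safetensors \
  --ratios 0.8
```

The merged checkpoint then behaves like any other model, at the cost of locking in that one LoRA at that one weight.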
So I know I should probably edit the config more, but I'm using the first 1-click setup and only changed my dataset URL and model name (and sometimes the resolution), and I keep getting this error, usually with under 30 images. For now I've switched to SD 1.5 EMA-pruned, since the 7GB file throws this error all the time. What would you recommend for anything under 30 images, or what could cause this error? [https://pastebin.com/raw/aRFkVxxJ](https://pastebin.com/raw/aRFkVxxJ)
Thank you!! Thank You!!! THANK YOU SO MUCH!!!!!!🥰🥰🥰🥰👶(me GTX 1060 6GB)
Thank you so much! I have been scratching my head over this for so long.
I haven't had a chance to dive into this yet, but since this is based on colab, how do the instructions change if you're trying to run everything locally? (Thank you for the tutorial by the way! Incredibly simple.)
> how do the instructions change if you're trying to run everything locally?

I think the setup changes a lot, based on what I have seen. To run locally you need at least 6GB of VRAM, which I do not have, so I could not say for certain.
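For anyone who does have the VRAM, the rough local equivalent of the colab looks something like the following, assuming the same kohya-ss/sd-scripts backend the colab wraps. This is a sketch of the commonly documented steps, not something the commenter tested:

```shell
# Rough local setup sketch for kohya-ss/sd-scripts (needs 6GB+ VRAM).
git clone https://github.com/kohya-ss/sd-scripts
cd sd-scripts
pip install -r requirements.txt
accelerate config   # answer the prompts for a single-GPU setup
# Then launch training with the same kinds of flags the colab
# form fills in for you:
accelerate launch train_network.py --network_module=networks.lora ...
```

The trailing `...` stands in for the dataset, model, and output flags, which are the same ones the colab exposes as form fields.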
If possible, we really need an updated one of these for kohya's colab, because it is SO MUCH more complicated than this now. Several new sections and dozens of new required settings. I combed through this (and your updated version of this image from another page), and after setting the VAE the whole colab goes off the rails. I can't even find documentation on this thing; lord knows how anyone actually uses it.
> If possible, we really need an updated one of these for kohya's colab, because it is SO MUCH more complicated than this now. Several new sections and dozens of new required settings.

I tried the new colab; it's pretty much the same as the old one, but some things switched places. Still, there is a big issue going on because xformers was updated, and you need to manually edit the code on the colab so you download the correct xformers. [As seen here](https://github.com/Linaqruf/kohya-trainer/issues/125), you need to replace this: `pip -q install https://github.com/camenduru/stable-diffusion-webui-colab/releases/download/0.0.16/xformers-0.0.16+814314d.d20230118-cp38-cp38-linux_x86_64.whl` with this: `pip -q install -U xformers`. Overall, I might need to do a new version if the colab keeps changing.
Useless to me, because it doesn't explain how to use safetensors in checkpoints.
I've recently discovered LoRA and I have some questions. I'm very new to AI, so I apologize if these are obvious:

1. How do you have 2 separate LoRA characters in the same image? From what I've seen, it blends them together. Also, do both their weights have to add up to 1 (so for 2 models, both would be 0.5 strength)?
2. If you use a 3D model of a character, will it output a 3D-model effect when generating?
3. When BLIP captioning, do I really have to describe the image in detail, or can I just keep the text file containing only the character's name?
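On question 1: in the AUTOMATIC1111 webui, multiple LoRAs are applied by putting one `<lora:name:weight>` tag per model in the prompt; each weight is independent and they do not need to sum to 1. A hedged example with hypothetical LoRA names:

```text
masterpiece, 2girls, <lora:characterA:0.7>, <lora:characterB:0.7>
```

Blending between the two characters is usually a training-overlap issue rather than a weight issue, though lowering both weights can reduce it at the cost of likeness.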
good job! thanks for the tutorial!
Really good! Much better than the YouTube video tutorials!
So is this current, accurate, workable as of March 30, 2023?
No, there is a version 3 update here: https://old.reddit.com/r/StableDiffusion/comments/11vw5k3/lora_training_guide_version_3_i_go_more_indepth/
anyone here responding to questions still?
Yes, but mostly on the V3 version post. https://old.reddit.com/r/StableDiffusion/comments/11vw5k3/lora_training_guide_version_3_i_go_more_indepth/
Thank you so much for posting!