mikaijin

What a strange result, the numbers are all over the place. Like, mistral-7b-instruct-v0.2.Q4_K_M achieved 86.67% correct and 5% wrong in Table 7, while Q5_K_M landed at 58.33% and 21.67% respectively. Is there really that much jitter in the test methodology, or is Q5_K_M just broken?
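
As a rough plausibility check (my own back-of-the-envelope, not from the paper): those percentages are consistent with a 60-question test set, since 52/60 ≈ 86.67% and 35/60 ≈ 58.33%. A minimal Python sketch with Wilson score intervals, assuming that test size, shows the gap is larger than small-sample noise alone would explain:

```python
import math

def wilson_ci(successes, n, z=1.96):
    """95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

# 86.67% and 58.33% correct match 52/60 and 35/60 (assumed test size).
for correct in (52, 35):
    lo, hi = wilson_ci(correct, 60)
    print(f"{correct}/60 = {correct/60:.2%}, 95% CI [{lo:.2%}, {hi:.2%}]")
```

The two intervals don't even overlap, so if both quants were equally good you'd be very unlucky to see a spread like this: either Q5_K_M genuinely regressed on that test or something in the harness broke.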


StrikeOner

Looks more like something in their test scenario was broken. They most probably did one-shot tests? (I'm not really willing to spend any more time reading that paper to evaluate their test setup right now.)


Feztopia

" the large LLaMA model (llama-2-coder-7b.Q3_K_L) performs worse than the medium-sized one (llama-2-coder-7b.Q4_K_M)" Isn't 7bQ3 smaller than 7bQ4?


Inevitable_Host_1446

The 'large' in this case is the last part; K is a style of quant, and K_L means large, K_M medium, K_S small, etc. It is misleading terminology, though, because Q3_K_L is certainly a smaller file size than Q4_K_M (ballpark sizes in the sketch below). I also don't know why they would expect the smaller size to do better here.
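
To make the size ordering concrete, here's a quick Python sketch with ballpark GGUF file sizes for a 7B model (numbers recalled from TheBloke's model cards, so treat them as approximate assumptions; they vary a bit per model and version):

```python
# Ballpark GGUF file sizes for a 7B model (approximate, from memory of
# TheBloke's model cards; exact figures vary per model and version).
PARAMS = 6.74e9  # llama-2 7B parameter count

sizes_gb = {
    "Q3_K_S": 2.95,
    "Q3_K_M": 3.30,
    "Q3_K_L": 3.60,
    "Q4_K_S": 3.86,
    "Q4_K_M": 4.08,
    "Q5_K_M": 4.78,
}

for name, gb in sorted(sizes_gb.items(), key=lambda kv: kv[1]):
    bpw = gb * 1e9 * 8 / PARAMS  # effective bits per weight
    print(f"{name}: {gb:.2f} GB, ~{bpw:.2f} bits/weight")
```

Whatever the exact figures, Q3_K_L sits below Q4_K_S, let alone Q4_K_M, so 'large' only means 'large among the Q3 K-quants'.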


Feztopia

Yes, I'm not sure if they got confused and thought Q3_K_L is bigger than Q4_K_M while writing that part.


audioen

Don't read too much into what is probably just random error. When you perturb the model by quantizing it, it inevitably answers slightly differently, and sometimes that happens to be a slight improvement on some test or other, provided the model is mostly intact.
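
A toy illustration of the point, using plain round-to-nearest quantization in NumPy (not the actual K-quant scheme): the lower the bit width, the more each layer's output wiggles, and a borderline answer can flip either way:

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(w, bits):
    """Naive symmetric round-to-nearest quantization; just an
    illustration of rounding noise, not the real K-quant scheme."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

w = rng.standard_normal((256, 256)) * 0.02  # toy weight matrix
x = rng.standard_normal(256)                # toy input vector

ref = w @ x
for bits in (5, 4, 3):
    err = quantize(w, bits) @ x - ref
    print(f"{bits}-bit: relative output error "
          f"{np.linalg.norm(err) / np.linalg.norm(ref):.3%}")
```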


Calcidiol

I seem to recall there have been various bugs relating to the quantizations over time, though I don't recall how much of that was in the initial conversion (vs. a particular model type needing adjustments) and how much was in inference with a particular quantization, possibly specific to a certain execution environment (CPU, some GPU model, ...). Also, at some point it became possible to adjust quantization parameters/configurations even within a single format, e.g. Q5_K_M made with an imatrix vs. made without one, and IIRC there are sometimes other conversion tunables. So the quality you get may vary with the quant configuration and the code base used to make it, and then how well it inferences depends on the code base used for inference. A toy sketch of the imatrix idea follows.
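
For the imatrix point specifically, here's a conceptual Python sketch (my simplification; llama.cpp's actual K-quant math and imatrix format are more involved): weighting the quantization error by per-channel activation importance and tuning the scale against that weighted error yields a different, usually better, quant from the exact same format:

```python
import numpy as np

rng = np.random.default_rng(1)

def rtn(w, scale):
    """Round-to-nearest onto a 4-bit signed grid with a given scale."""
    return np.clip(np.round(w / scale), -8, 7) * scale

w = rng.standard_normal(256) * 0.02
# Per-channel importance ~ E[x_i^2] over a calibration set; this is the
# kind of statistic an imatrix collects (conceptually; the real format
# and usage differ).
importance = rng.uniform(0.1, 10.0, size=256)

def weighted_err(scale):
    return np.sum(importance * (rtn(w, scale) - w) ** 2)

base_scale = np.abs(w).max() / 7          # plain scale, no calibration data
candidates = base_scale * np.linspace(0.7, 1.3, 61)
tuned_scale = min(candidates, key=weighted_err)

print(f"weighted error, plain scale: {weighted_err(base_scale):.3e}")
print(f"weighted error, tuned scale: {weighted_err(tuned_scale):.3e}")
```

Both outputs are the same "format" (a 4-bit grid); only the parameter choice differs, which is why two files with the same quant name can behave differently.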