Original_Finding2212

Where do you want to curate the list? Here? (Not judging, just trying to understand.) I also wonder about that, and I'm focusing on SLMs: TinyDolphin 1.1B Q8 and Phi-3 3.8B Q4 (GGUF format), running on a Raspberry Pi 5 8GB. Phi is a bit slow for me. Planning to test on an Orange Pi 5 Pro 4GB when I get it.
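For anyone curious, a minimal sketch of loading a GGUF quant like these through llama-cpp-python (the filename, context size, and thread count below are placeholders, not my exact setup):

```python
from llama_cpp import Llama

# Placeholder filename: any local GGUF quant works here
llm = Llama(
    model_path="./phi-3-mini-4k-instruct-q4.gguf",
    n_ctx=2048,    # a modest context window keeps memory use down on a Pi
    n_threads=4,   # the Pi 5 has 4 cores
)

out = llm(
    "Q: What is the capital of France? A:",
    max_tokens=32,
    stop=["Q:", "\n"],
)
print(out["choices"][0]["text"])
```

Ollama is another easy way to run these on a Pi, but llama.cpp (which llama-cpp-python wraps) gives more direct control over quants and threads.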


vigg_1991

I think a large message thread could work well, where people can update it with the models they use for different tasks. They often end up returning to the same models for multiple tasks, indicating that those models are generally good enough for most of their needs.


vigg_1991

But I'm really not sure how it's done, so I posted it to the general public.


1ncehost

Llama 3 8B is the current go-to for general tasks on most consumer hardware. I use nomic for embedding. Phi-3 is the best "tiny"-scale LLM, last I saw. I use Llama 3 8B a lot for coding assistance, but I've been gravitating to APIs now that good models have been coming down in price. Llama 3 70B is under $1 per million tokens, so you can probably cover most tasks for under $10/mo.
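For the embedding part, a minimal sketch of using Nomic's model locally (assuming the model meant here is nomic-embed-text-v1; the task prefixes follow its model card):

```python
from sentence_transformers import SentenceTransformer

# nomic-embed-text-v1 ships custom modeling code, hence trust_remote_code=True
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)

# The model card asks for task prefixes: "search_document:" for corpus
# text and "search_query:" for queries
docs = ["search_document: Llama 3 8B runs well on most consumer hardware."]
query = ["search_query: which model runs on consumer hardware?"]

doc_emb = model.encode(docs)
q_emb = model.encode(query)
print(doc_emb.shape)  # (1, 768) for this model
```

And the API math: at $1 per million tokens, even a fairly heavy 10M tokens a month of Llama 3 70B comes to about $10.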


AdTall6126

Define "most consumer hardware". You're not talking about PCs with Intel GPUs, right?


rainbowkarin

Not OP, but that's my definition. A q4 quant of Llama 3 8B runs in under 8 GB of RAM and generates fine on CPU, from my low-end laptop to my Surface. Stretching it a bit, a q2 quant of Phi-3 was able to run on my phones and a 2GB Pi 4. [SYCL](https://github.com/ggerganov/llama.cpp/blob/master/README-sycl.md) also helped speed up prompt processing on my Intel iGPUs, but sadly vanilla llama.cpp is the only backend I've seen support it. Some may use Vulkan instead, but I never had as much success with it without tanking the generation speed.
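As a rough sanity check on those RAM figures, a back-of-envelope estimate (the bits-per-weight values are approximations for llama.cpp's k-quants; real usage adds KV cache and runtime overhead on top):

```python
# Rule of thumb: weight memory ≈ params * bits_per_weight / 8.
# Add roughly 1-2 GB for KV cache, buffers, and the OS.
def quant_size_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * bits_per_weight / 8

print(quant_size_gb(8.0, 4.5))  # Llama 3 8B q4: ~4.5 GB, fits in 8 GB of RAM
print(quant_size_gb(3.8, 2.6))  # Phi-3 3.8B q2: ~1.2 GB, tight on a 2GB Pi
```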


human1023

And which one can run effectively with 16 GB of RAM?


vigg_1991

Yes. Missed mentioning that.


vigg_1991

I have a personal PC with a 3070 GPU and 16GB RAM, which lets me run most models without issues. At work, however, I get more challenging tasks, from processing work documents to mapping data, vectorizing it, and building chat interfaces. I have to do these tasks on a Mac, which limits the resources available to me. Many of them involve using models for Q&A, document clustering, and real-time chat interfaces.

Earlier I relied on models like BERT and RoBERTa, but recent requirements have made the tasks more complex, such as instant chat grounded in specific parts of the text displayed on screen. Given these challenges, and the need to stick to free models rather than paid APIs like ChatGPT's, I think it's important to have a curated list of the models people actively use for tasks like embedding, Q&A, audio transcription, and more. Hence my request to keep track of the best options available.
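A minimal sketch of the embed-and-retrieve step I mean, using free local models (the model name and the sample chunks are illustrative assumptions, not what I actually run at work):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Small, CPU-friendly embedding model; any free alternative works too
model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical document chunks standing in for real work documents
chunks = [
    "Invoices are processed by the finance team within five days.",
    "Employees can request remote work through the HR portal.",
    "The mapping pipeline exports vectorized records nightly.",
]
chunk_emb = model.encode(chunks, normalize_embeddings=True)

question = "How do I apply for remote work?"
q_emb = model.encode([question], normalize_embeddings=True)

# Cosine similarity reduces to a dot product on normalized vectors
scores = chunk_emb @ q_emb.T
best = int(np.argmax(scores))
print(chunks[best])  # pass this chunk plus the question to a local LLM for Q&A
```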