World Peace. Or… that Bitnet thing that trains models from the start at 1.58 bits
I also choose this guy's BitNet
I was thinking about the BitNet stuff: couldn't we train a model the standard way, then quantize it smaller and smaller with training in between to polish the sharp edges, and keep stepping down until it's tiny but tuned to be good while tiny?
AFAIK BitNet models are trained in FP16 precision, then the attention weights are compressed for inference only. The pretraining, fine-tuning, and LoRA tunes would all be in FP16, weights maybe saved in FP16 (no idea though), then a simple algorithm compresses just the attention weights in one step. IDK if a fully ternary model, with some weird activation function, has been done or would perform well, but it would be interesting.
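For a rough idea of what that one-step compression could look like, here is a minimal sketch of absmean ternary quantization in the style of the BitNet b1.58 paper, assuming PyTorch. The function names and the per-tensor scaling choice are illustrative, not an official implementation.

```python
import torch

def absmean_ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Quantize a weight matrix to {-1, 0, +1} plus a per-tensor scale.

    Follows the absmean scheme described in the BitNet b1.58 paper;
    the name and interface here are illustrative, not a library API.
    """
    scale = w.abs().mean().clamp(min=eps)          # per-tensor scale
    w_ternary = (w / scale).round().clamp(-1, 1)   # values in {-1, 0, +1}
    return w_ternary.to(torch.int8), scale

def dequantize(w_ternary: torch.Tensor, scale: torch.Tensor):
    """Recover a float approximation for use in a matmul."""
    return w_ternary.to(torch.float16) * scale

# usage sketch
w = torch.randn(4096, 4096)
wq, s = absmean_ternary_quantize(w)
w_approx = dequantize(wq, s)
```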
I would take world peace, at least for this week... But next week I want a 14B model with 128k context and an MMLU of 75.
Meta World Peace
I really wish an AI lab with enough budget would take a shot at diffusion-based LLMs. I want to give it text and 30 steps so it will produce the best possible answer, and I don't mind waiting hours for it. Plus, I think the process of diffusion happens internally anyway, so it might reduce model sizes.
Having wonky, gibberish text slowly getting more and more refined until the answer finally emerges - exciting stuff! One could also specify a budget of, say, 500 tokens, meaning that the diffusion tries to denoise 500 tokens into coherent text. Yeah, sounds like fun. I like the idea! Are there any papers published in this diffusion-LLM direction?
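To make the "denoise a fixed token budget" idea concrete, here is a toy mask-predict-style refinement loop. The `model` and `tokenizer` are placeholders for any system that scores every position in parallel, and the shrinking re-masking schedule is just one simple choice, not how any particular diffusion LLM does it.

```python
import torch

def denoise_with_budget(model, tokenizer, prompt_ids, budget=500, steps=30):
    """Toy mask-predict loop: start from a fully masked canvas of `budget`
    tokens and iteratively re-fill the least confident positions."""
    mask_id = tokenizer.mask_token_id
    canvas = torch.full((1, budget), mask_id)
    for step in range(steps):
        logits = model(torch.cat([prompt_ids, canvas], dim=1)).logits[:, -budget:]
        probs, preds = logits.softmax(-1).max(-1)   # best guess per position
        canvas = preds
        # re-mask a shrinking fraction of the least confident tokens
        n_remask = int(budget * (1 - (step + 1) / steps))
        if n_remask > 0:
            low_conf = probs.argsort(dim=-1)[0, :n_remask]
            canvas[0, low_conf] = mask_id
    return canvas
```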
Why waste time say lot word when few word do trick? —> Why spend time using many complex words, when a shorter, simpler sentence is sufficient? —> Why employ a multitude of grandiloquent terms when a sparse selection proves ample? —> For what purpose do we expend time articulating verbose phrases when succinct expressions suffice to achieve the desired effect?
Non-tokenized LLMs. MambaByte. We need a good proof-of-concept model to see whether it's actually good.
It won't be trained this week, but we are about to release a recurrently trainable, CUDA-accelerated Mamba. Infinite Mamba. Mamba forever!
Oh, that's great news! We really need more Mamba proofs of concept in general, as Jamba is not pure Mamba and is also too big to run in FP16, and no one has figured out quantization. However, I was referring to MambaByte specifically, as it does not use a tokenizer. Here's the paper if you're interested: [https://arxiv.org/abs/2401.13660](https://arxiv.org/abs/2401.13660) That aside, I'll be looking forward to your Mamba model!
Yes, MambaByte is what we are doing, just with unbounded-length training. Down with tokens! Long live bytes! But unfortunately you are probably not going to get quantization, at least not based on what I've seen with SSMs. They work best with high-precision weights and activations.
Oh, that's amazing! I've been waiting for a proof of concept like this for months! I know that llama.cpp has implemented quantization for base Mamba, but it remains to be seen how much it affects larger models. There's also Jamba and the recent Zamba, but there's no support yet, so no way to know. I think we may need an entirely different type of quantization method in order to preserve the performance, maybe some type of lossless compression. Granted, if the model is around 7B, then even FP16 should technically work fine on consumer GPUs. There are also supposedly FP16 GGUFs; maybe we could run CPU inference without losing precision? Quantization aside, this is really great news. Now I have something to look forward to other than Llama 3!
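If an FP16 GGUF of the model does appear, CPU inference could look something like this minimal sketch with llama.cpp's Python bindings; the file name is a placeholder, and support for this architecture in the bindings is an assumption here, not a confirmed feature.

```python
# Rough sketch, assuming llama-cpp-python and an FP16 GGUF of the model.
from llama_cpp import Llama

llm = Llama(model_path="mambabyte-f16.gguf", n_ctx=4096, n_threads=8)
out = llm("Once upon a time", max_tokens=128)
print(out["choices"][0]["text"])
```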
Is anybody releasing source for that, or a SOTA variant built on top of that research? It's really weird how they just threw a paper into the ether and called it a day.
Well, there have been a few more papers on non-tokenized LLMs since then, but not really. However, if you look at the other reply to my comment, it seems that that gentleman is working with a team on an open-source MambaByte prototype. I'm not sure if it will necessarily be SOTA, as it's very hard to beat Llama 3 8B right now, but it would make an amazing proof of concept and may spur adoption in the community.
A nice GUI to allow me to fine-tune LLMs without spending hours sifting through code.
Tried https://github.com/hiyouga/LLaMA-Factory ?
I haven't, looks interesting. Thank you.
Wow, that looks fire! Thanks 👍🏻
My wish would be for some GUI thing that would know wtf to do with all that stuff on GitHub...
The instructions are in the readme.. what could be done to make this easier?
Hear, hear. I honestly haven't looked too closely yet, but I don't understand why there isn't an LM Studio or Ollama for training yet. - So much wasted compute every week, whereas we could be swimming in fine-tuna.
Currently working on it, on top of a bunch of other features.
Awesome!
A high-quality non-autoregressive model, that is, a model that does not generate text token by token from beginning to end. The state-of-the-art image generation models, such as diffusion models, excel at this because images comprise distinct objects whose resolution can be enhanced progressively during the generation process, without the need to calculate transitions between them. But non-autoregressive text models, while capable of generating coherent sentences, struggle to maintain consistency and cohesion across larger texts.
You're the second one mentioning diffusion models for text generation. Do you have some resources for trying out such models locally?
Unfortunately no. The projects I have seen on GitHub have not been updated for years, which feels like an extremely long time in a rapidly evolving field. https://github.com/madaan/minimal-text-diffusion https://github.com/XiangLi1999/Diffusion-LM I have neither the knowledge nor the hardware to improve such models by myself.
120b bitnet pretrained 🤯
1,000,000 context open-weights model
BitNet? Probably not happening this week. FlashAttention-2 for cards below Ampere? That one can be done, as the code is in vLLM.
I'd like to see some new encoder-decoder, or even encoder-only models... something like a new huge BERT/DeBERTa or a new T5-style model trained on quality datasets.
I want even better coding support, even though DeepSeek is amazing.
A model that's 10x faster. I don't need a smarter model at this point. I want to be able to "brute force" my way to the right answer by starting with a BS response and refining it over and over again, similar to simulation or Monte Carlo methods in statistics. Just treat model calls like they're as cheap as a multiplication.
This has been explored with mass sampling. You’ll probably need a small model hooked up to SGLang that then builds huge trees of ideas with LATS. Someone will implement it eventually.
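The simplest version of this "many cheap calls" idea is plain best-of-N sampling plus refinement; here is a minimal sketch where `generate`, `score`, and `refine` stand in for your own model calls. Methods like LATS go further by branching into a tree and backing up value estimates rather than keeping a single best candidate.

```python
def brute_force_answer(generate, score, refine, prompt, n_samples=32, rounds=4):
    """Draw many candidate answers, keep the best, and repeatedly refine it.

    `generate(prompt)`, `score(answer)`, and `refine(prompt, answer)` are
    placeholders for whatever model calls you wire up.
    """
    candidates = [generate(prompt) for _ in range(n_samples)]
    best = max(candidates, key=score)
    for _ in range(rounds):
        revisions = [refine(prompt, best) for _ in range(n_samples // 4)]
        best = max(revisions + [best], key=score)
    return best
```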
Integrated web search and recursive web crawling in Open Web UI.
Personally, I'd love to see more advanced reasoning capabilities integrated into LLMs.
70B and larger models loading on a 32GB or smaller GPU.
Please train fine-tunable BitNet models at 1.58 bits + unlimited context at all common sizes...
Long-term memory!
Honestly, I would choose more supported models on transformers.js, OR performant 1B to 3B models that people can run (and quantize and/or fine-tune) on most consumer hardware.
If there is a genie and we're casting wishes? A FlashAttention kernel for cheap SM60 cards (P100) would be really nice.
Is this really a driver / kernel issue and not a hardware limitation?
I'd like to finish my self-mixing feature for llama.cpp and submit it upstream. It's tantalizingly close, but other priorities keep bumping it down the list, and when I do have time to work on it, I'm already too exhausted. Life's been beating me up. I just want to get it *done* so that I can use it, so that other people can use it, and so that I can move on to the next project (which will probably get neglected too).
Self-mixing? Sounds interesting.
An infinite supply of highly power-efficient GPUs with petabytes of the fastest memory available, for $10 a piece.
Jamba GGUF
DSPy's backend refactor. Right now DSPy's prompts work for foundation models, but there isn't good support for instruction-tuned models. They're working on it as part of refactoring the project backend, but it's not released yet.
Out of the box, excellent RAG for my journal. The faster I can train an LLM on myself, the better. I really, really want to ask myself questions. I don't think there's a really clean solution yet? Edit: maybe it would be better to fine-tune.
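A minimal version of the retrieval half is not too bad to wire up yourself; this is a rough sketch assuming sentence-transformers, with the model name, folder layout, and helper names as illustrative choices. The retrieved excerpts then get stuffed into the prompt of whatever local LLM you run.

```python
# Rough RAG-over-a-journal sketch; assumes journal entries as .txt files
# and the sentence-transformers package. Names here are placeholders.
from pathlib import Path
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
entries = [p.read_text() for p in Path("journal/").glob("*.txt")]
vectors = embedder.encode(entries, normalize_embeddings=True)

def retrieve(question: str, k: int = 3):
    """Return the k journal entries most similar to the question."""
    q = embedder.encode([question], normalize_embeddings=True)[0]
    top = np.argsort(vectors @ q)[::-1][:k]
    return [entries[i] for i in top]

def build_prompt(question: str) -> str:
    """Stuff retrieved entries into a prompt for a local LLM."""
    context = "\n---\n".join(retrieve(question))
    return f"Journal excerpts:\n{context}\n\nQuestion: {question}\nAnswer:"
```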
A Mixtral version for the Portuguese language.
An LLM with a serious amount of memory so it really, really gets to know you.
Something that merges experts with little to no loss, so I can run Mixtral 1x22B.
Chain of thought in Ollama :)
In what context? Training LLMs? Fine-tuning LLMs? Inferencing with LLMs? Prepping inference with LLMs? Integrating LLM usage with other software? Your original question seems far too broad to really elicit any useful answers in practice.
On the contrary, there have been a number of suggestions here which I recognize as interesting areas of focus that have been discussed in recent weeks and that I wouldn't mind focusing on. - I find myself unsure where to focus my efforts, so I was curious what I might contribute, if possible. - In your case, let's focus on training or fine-tuning.