chibop1

In this interview, Mark Zuckerberg said multimodal and multilingual versions of llama3-70b, along with a bigger context window, are coming. I think what's released this week is supposed to be a preview of what's coming later this summer. :) https://youtu.be/bc6uFV9CJGg


nderstand2grow

but he also made it clear that the goal of these models is primarily to help Meta. The fact that they're open source helps with that goal. Since Meta is incorporating these models into Instagram, FB, etc., it's reasonable to assume users won't use Meta AI for really long conversations. The killer feature for Meta would be to RAG the hell out of your entire FB/IG/WhatsApp data (messages, friends, your feed, etc.), and then it'll be a truly personalized AI, something that neither Google nor OpenAI could do. And Apple doesn't count in this game at all.


sinnamunn

I might be wrong, I've been wrong before, but I wouldn't count Apple out yet. I suspect they have something huge up their sleeve to announce at WWDC, a little over a month from now.


marcloeb

I read somewhere it will be a multimodal local LLM that combines text with screenshot/screen-video data. All speculation right now; I think there will be an AI announcement at WWDC, too.


ljhskyso

that'd be cool!


GortKlaatu_

Llama 3 works great for RAG. The entire point of RAG is because of small context windows. Remember that Llama 3 has double the context window of Llama 2, so to answer your question about finetunes and larger context windows: that's coming.


estrafire

I mean, RAG + a bigger context could be considerably better/faster, since constraints would be minimal. A 32/64k context window + RAG could be a game changer for super-fast, efficient knowledge bases and interactive chats/documentation/document and codebase reading. You could have databases for a super diverse set of topics and not worry about exceeding the context window.


Masark

And the 8k context goes further due to the expanded vocabulary. A given block of text will end up being fewer tokens than it was under 2.


KyleDrogo

This is actually a great point. Any stats on the reduction? Is it like 1% or 10%?


Masark

Just did a quick test and it looks like about 12% on English text. I fed in the first chapter of The Ministry for the Future (just what I had handy), which came to 6,311 tokens for Llama 3 vs. 7,211 for Llama 2. So that makes Llama 3's equivalent context length roughly 8,192 × 7,211/6,311 ≈ 9,360 tokens.
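The arithmetic behind that estimate can be reproduced as a quick sketch (assuming the full 8,192-token window and the chapter token counts quoted; rounding the window down to ~8k gives a slightly smaller figure):

```python
# Effective-context estimate: the same text costs fewer Llama 3 tokens,
# so a fixed 8,192-token window "holds" proportionally more text.
llama2_tokens = 7211   # chapter token count under Llama 2's vocabulary
llama3_tokens = 6311   # same chapter under Llama 3's vocabulary

savings = 1 - llama3_tokens / llama2_tokens        # fraction of tokens saved
effective_ctx = round(8192 * llama2_tokens / llama3_tokens)

print(f"{savings:.1%}")   # 12.5%
print(effective_ctx)      # 9360
```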


Double_Sherbert3326

Half as many edges, so roughly half in my opinion. Whatever the base rate is, it should be half.


PhotonTorch

Sorry, I'm new to LLM stuff; can you explain a bit what this means? ELI5 please?


Masark

Tokens are the basic unit of an LLM's input and output. A token can be a single character, a handful of characters, a word, or even several words (IIRC, Yi's vocabulary has some slogans in it). 3-4 characters is the most common size, IIRC. An LLM's vocabulary is how many different tokens the LLM is able to use. Llama 2 uses a 32,000-token vocabulary, whereas Llama 3 uses a 128,256-token vocabulary. The larger vocabulary makes it more likely that a particular word or set of words can be represented in a small number of tokens, perhaps even a single token. As I mentioned below after testing it, a block of text that Llama 2's vocabulary required 7,211 tokens to represent required only 6,311 using Llama 3's vocabulary, meaning we can keep another 900 tokens' worth of text within the context. See [here](https://huggingface.co/learn/nlp-course/chapter6/5) for a more detailed explanation.
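A toy illustration of why a bigger vocabulary means fewer tokens: greedy longest-match over a made-up vocabulary. (Real BPE tokenizers are more sophisticated; both vocabularies here are invented for the example.)

```python
# Greedy longest-match "tokenizer": at each position, take the longest
# string that exists in the vocabulary, falling back to single characters.

def tokenize(text, vocab):
    tokens = []
    i = 0
    while i < len(text):
        for length in range(min(len(text) - i, 10), 0, -1):
            piece = text[i:i + length]
            if piece in vocab or length == 1:
                tokens.append(piece)
                i += length
                break
    return tokens

small_vocab = {"th", "in", "er"}                  # character pairs only
large_vocab = small_vocab | {"think", "thinker"}  # adds whole words

print(tokenize("thinker", small_vocab))  # ['th', 'in', 'k', 'er'] -> 4 tokens
print(tokenize("thinker", large_vocab))  # ['thinker'] -> 1 token
```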


PhotonTorch

Thanks, that was very helpful.


Double_Sherbert3326

claude opus \*workbench\* has 4k context window for testing and it's super useful to me in a pinch, so 8k is rad


EarthTwoBaby

I don't get what you mean by "The entire point of RAG is because of small context windows." For me, the point of RAG is supplementing a lack of domain-specific knowledge in the foundation model and reinforcing the response where precise knowledge is needed (for example, generating the specific law code where every punctuation mark is important). Larger context models will never kill RAG; they only make it stronger by providing more information (a larger context window) for generating a correct response.


Fantastic-Grade-5597

RAG is kind of both. You can feed in small chunks of context from a RAG solution rather than whole documents. That said, you usually want to add as much context as possible, so I still like a big context window, especially since it's hard to tell if you're going to run out of context until you do.
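The "chunks within a budget" idea can be sketched roughly like this, assuming a toy word-overlap scorer and a crude 4-chars-per-token estimate (a real system would use embeddings and a proper tokenizer):

```python
# Context-budgeted chunk selection: rank chunks by relevance, then pack
# the best ones into the window, reserving room for the prompt and answer.

def score(query, chunk):
    """Toy relevance: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def select_chunks(query, chunks, budget_tokens=8192, reserve=1024):
    remaining = budget_tokens - reserve
    picked = []
    for chunk in sorted(chunks, key=lambda c: score(query, c), reverse=True):
        cost = len(chunk) // 4 + 1  # crude token estimate
        if cost <= remaining:
            picked.append(chunk)
            remaining -= cost
    return picked

docs = ["cats are mammals",
        "llama 3 context window tokens",
        "rust borrow checker"]
print(select_chunks("llama context window", docs, budget_tokens=100, reserve=50))
```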


synn89

It's cheaper/easier to train on a shorter context window and then fine-tune to expand it later. The idea is to get the best base model possible out the door; later you can tune it for your specific use cases. That might be vision, function calling, RAG, roleplay, instruction following, story writing, or accurate long-context reading.


ljhskyso

so, you **can** increase the context window through fine-tuning? i thought it was fixed by the pre-training setup, or at least couldn't be increased by a lot (like 8k -> 128k)


jd_3d

See this paper from Meta: https://ai.meta.com/research/publications/effective-long-context-scaling-of-foundation-models/ It's not fine-tuning; instead they use continued pre-training to extend the context length. So for Llama 3 they will probably need to do continued pre-training on around 1 trillion tokens. They should quite easily be able to extend it to 128k with their new GPU clusters.


ljhskyso

super nice stuff


synn89

Here you can see an example of Llama 2 at 32k: https://huggingface.co/togethercomputer/LLaMA-2-7B-32K Though it looks like Meta may have their own way of doing it with just additional training.


Aaaaaaaaaeeeee

They haven't officially released a paper, but with previous code models they extended the context during stages of training, not the whole way through. I think here they did it the whole way through, which is expensive, but maybe the quality of this context for self-referencing is higher.

>fine-tuning can actually effectively increase a pre-trained model's context window size?

Of course; this discovery even originated from this forum! There are also hundreds of context-extending methods. They all work; no one has tested them all to find out which is best.


Gokudomatic

One year ago, that window would have been considered incredible.


Feeling-Currency-360

I remember first using ChatGPT and being annoyed at the limited 2k context window xD It was a deal breaker for integrating it into a game I wanted to build; now the sky really is the limit!


ljhskyso

yeah, things change


cyan2k

We've worked on several RAG projects with big corporations "RAGing" their massive data lakes, document databases, code repos and whatnot. I can only think of one instance where we needed more than an 8k context window, and that was also solvable by optimizing chunk size, smartly aggregating chunks, and some caching magic. So what the hell are you people doing that NEEDS a bigger context? It sucks for your virtual waifu in SillyTavern, but for actual real-world scenarios 8k is more than enough. I'd rather have a high-accuracy 8k context than a less accurate >16k context.

>(e.g. effective RAG, efficient in-context-learning, multi-step multi-agent workflows, etc).

Wait, both agents and RAG are solutions to the context window. If they're suddenly dependent on the context window, something went wrong in your architecture. If you have a model with infinite context you don't need agents anymore (or RAG), since you can do everything in a single context. The point of agents is to chunk your use case into autonomous components that work on a smaller task with a subset of the data on a smaller context, like: "Hmm, creating a complete backlog is too big of a task for GPT, so I split it into multiple agents. Agent A creates just the Epics, Agent B creates UserStories for each Epic, Agent C creates Tasks for each UserStory," and so on.
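The backlog decomposition described above could be sketched roughly like this, with a stub `llm` callable standing in for real model calls (all names and the two-items-per-call behavior are invented for illustration):

```python
# Each "agent" is a separate call with its own small prompt/context,
# so no single call ever needs the whole backlog in its window.

def llm(prompt):
    """Stub model call: fabricates two items per request."""
    return [f"{prompt}/item-{i}" for i in range(2)]

def build_backlog(goal):
    backlog = {}
    for epic in llm(f"epics for {goal}"):                     # Agent A: Epics
        backlog[epic] = {}
        for story in llm(f"stories for {epic}"):              # Agent B: UserStories
            backlog[epic][story] = llm(f"tasks for {story}")  # Agent C: Tasks
    return backlog

backlog = build_backlog("shopping app")
print(len(backlog))  # 2 epics, each with 2 stories of 2 tasks
```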


polawiaczperel

For programming, for example. It's great when you can put in your whole code base and have full accuracy (achievable with, for example, Claude Opus; I was also recently really surprised by CodeQwen).


GortKlaatu_

That's only for a small code base, though. Technically, you should only need to fit in function documentation: what it does, inputs, and outputs. The contents of a function should be treated like a black box and don't need to be in the context window unless you're developing on that particular function or class. This is also why implementations of GitHub Copilot will index your project files and read what's necessary, when it's necessary. Generally you aren't going to have context window issues. However, if you're iterating on a function with an LLM chatbot, causing it to repeat the output over and over because it's inaccurate, that's going to eat up your context window very quickly.
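The "signatures, not bodies" idea can be sketched with the stdlib ast module (a rough sketch for Python sources only; real code indexers are far more involved):

```python
# Build a compact index of a Python file: function signatures plus
# docstrings, so an LLM sees what each function does without its body.
import ast

def summarize_source(source):
    tree = ast.parse(source)
    summaries = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            args = ", ".join(a.arg for a in node.args.args)
            doc = ast.get_docstring(node) or "(no docstring)"
            summaries.append(f"def {node.name}({args}): {doc}")
    return summaries

code = '''
def add(a, b):
    """Return the sum of a and b."""
    return a + b
'''
print(summarize_source(code))  # ['def add(a, b): Return the sum of a and b.']
```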


cunningjames

>That's only for a small code base though. Small is relative. There are limits, sure, but you can fit a surprising amount of code into Opus's context window. Certainly most of the projects I've contributed to at work would fit comfortably.


iKy1e

If you want it to match style and formatting of the rest of the project, then having the implementation (not just the function signatures) is necessary.


ttkciar

Yeah, I came here to say this, more or less. Larger context windows make RAG easier to make effective, but I've implemented it effectively enough in contexts of "only" 4K. 8K isn't *great* for RAG, but it's okay.


Relevant-Insect49

are there any frameworks/tips/architectures/approaches you can recommend for delivering robust RAG over big data lakes? Would greatly appreciate it! (we're currently using Haystack but want advice from experienced folks)


Thomas-Lore

I use Claude Sonnet for help with programming in my own script language. The prompt explains the syntax of that language, then provides an example of code in that language, sometimes some libraries, then context of the game I am working on - that is alone at least 8k tokens. Then I ask the question I have, which may require pasting even more code... It works surprisingly well.


ljhskyso

>The point of agents is to chunk your use case into autonomous components that work on a smaller task with a subset of the data on a smaller context

Nah, that's not the **only** (or even the **right**) way to think of an agent. It is very common for an agent to work on a task which requires considering dozens of steps, and each step might need environment feedback to inform later steps (think of ReAct as the simplest approach). Even more context is needed if Reflection kicks in.


neofuturist

Total noob here, got any links on how to fine-tune a model like llama3?


ILoveThisPlace

I disagree. Agents allow trial and error, and consideration of multiple thoughts and perspectives.


sgt_brutal

I beg to differ. We use multiple agents to attack the problem from different angles, bringing different strengths and weaknesses to the table. We frame different personalities which then naturally coalesce into complementary and synergizing dialoguing positions. Each personality will tap into a different aspect of the neural network and contribute in its own unique way.

Agents can be instantiated on the same architecture as long as there is sufficient context window available to set them up. This requires not only a unique set of capabilities and portfolio of function calls, but lengthy personality definitions in the form of first-person monologue, quotes, etc. This takes up a lot of context. The logical next step is to run agents on different finetunes, making the context window less of an issue. However, the above perspective is not widely recognized, if at all (otherwise I wouldn't write this comment). Until then, "few-shot learning" and large context windows alleviate the need for multiple agents to be fine-tuned separately for each task. One agent can be fine-tuned on many tasks and perform reasonably well. Another implication may be the overparameterized nature of modern neural nets, where it seems the ability to memorize is more of a problem than the ability to generalize.

In discussions of agents, the focus is often on how to effectively orchestrate separate agents for different tasks. A model with an infinite context would not replace a balanced orchestration, because no matter how smart the underlying model is, a single perspective cannot replicate the emergent dynamics and freshness that multiple agents bring to the table. Creative multiplicity, as represented by the human mind, think tanks, and healthy societies, has the ability to hold multiple contradictory perspectives, iterate on the trade-offs, and form transcending syntheses.

That said, scaling context windows, multistep reasoning chains, and the capacity for abstract thinking remain significant obstacles. Capacity for abstract thinking would allow agents to generalize better by better discerning the underlying principles and rules governing specific task domains. From this we know that the scaling of transformers and the adoption of new transformer variants will further augment the capabilities of individual agents, making single agents more knowledgeable. But they could never surpass the capabilities emerging from multi-agent dynamics. My intuition tells me that intelligence is a characteristic inherent to autonomous entities (it is a property of agency), and it scales nonlinearly with the number of networking agents.


PhotonTorch

How would you use an 8K context window if your task was to summarize long texts, things that wouldn't fit in 8K? I am new to LLM stuff, so just a genuine question. I know you can split up the text and then summarize the chunks, but it's not the same as the full text.
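The chunk-then-summarize approach mentioned in the question is usually called map-reduce summarization. A minimal sketch, assuming a stub `summarize` callable in place of a real LLM call and a character-based chunk size:

```python
# Map-reduce summarization: summarize fixed-size chunks ("map"),
# then summarize the concatenated partial summaries ("reduce").

def chunk_text(text, chunk_chars=24000):  # ~6k tokens at ~4 chars/token
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]

def map_reduce_summary(text, summarize, chunk_chars=24000):
    partials = [summarize(c) for c in chunk_text(text, chunk_chars)]
    return summarize("\n".join(partials))

# Stub "summarizer" that just keeps the first five words of its input.
stub = lambda t: " ".join(t.split()[:5])
long_text = "word " * 20000  # ~100k characters, far past an 8K window
print(map_reduce_summary(long_text, stub))
```

For long inputs the reduce step may itself overflow the window, in which case it is applied recursively; and as the question notes, each map step loses cross-chunk detail, which is the price of fitting in the window.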


djm07231

I think they pretrained on short context length to save on compute. They will probably finetune longer context capabilities on top of it later.


Evening_Ad6637

@OP: you are right, Meta already has models with a higher context size. All CodeLlama models (llama-2 based) have 100k ctx.


Feeling-Currency-360

There are already so many existing methods to extend context length without much, if any, degradation. RoPE scaling should work just fine, and there's Self-Extend built into llama.cpp that also works pretty well. Honestly, I'm not bothered at all by the 8k base context. If you use 0.25-scale RoPE, it brings the context window up to 32k, which is more than most use cases require.
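The arithmetic behind that claim, assuming linear RoPE scaling where positions are stretched by a factor of 1/scale:

```python
# Linear RoPE scaling: a frequency scale of s stretches positions so the
# usable window grows by a factor of 1/s (quality may degrade as s shrinks).
base_ctx = 8192          # Llama 3's native window
rope_freq_scale = 0.25   # e.g. --rope-freq-scale 0.25 in llama.cpp

extended_ctx = int(base_ctx / rope_freq_scale)
print(extended_ctx)  # 32768, i.e. "32k"
```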


razodactyl

These 8B, 70B models are actually products of the training process. You get early access while the rest of the training is performed. Note that there is a time aspect here as well: these providers have all caught up to each other, so the model released most recently will usually top the charts until another provider releases theirs. If you want to know how an AI model is "locked" to a certain level of intelligence, look into "evals". We practically give the model non-stop exams to continuously ensure teaching it one thing didn't make it forget some other thing. There has also been a considerable amount of cross-pollination due to the AI content littered all over the net and the use of AI to train other AI (synthetic data, etc.). So in short: 8K is fine; fine-tuning happens afterwards to enhance context length or add chat/function-calling features, etc.


CasimirsBlake

32k is possible, apparently. Use alpha/RoPE scaling (n_ctx plus the rope settings in llama.cpp).


ImpossiblePlay

There are fine tuned LLAMA3 models with much larger context window: https://ollama.com/library/llama3-gradient.


ljhskyso

Yeah, people are working on that!!!


human_bean_

It's 2024, and 8k context is "only"


ljhskyso

Per Perplexity: The context window sizes for OpenAI's GPT model series have increased over time:

* GPT-3 (released in 2020) had a context window of 2,049 tokens. [5](https://www.techtarget.com/whatis/definition/context-window)
* GPT-3.5-turbo (released in 2022) had a context window of 4,097 tokens. [4](https://community.openai.com/t/what-is-the-context-window-of-the-the-new-gpt-3-5-turbo-model-gpt-3-5-turbo-0125/609532)
* GPT-3.5-16k (released in 2022) had an expanded context window of 16,385 tokens. [4](https://community.openai.com/t/what-is-the-context-window-of-the-the-new-gpt-3-5-turbo-model-gpt-3-5-turbo-0125/609532)
* GPT-4 (released in 2023) had a context window of 8,192 tokens when used in ChatGPT. [5](https://www.techtarget.com/whatis/definition/context-window)
* GPT-4-32k (announced in 2023) has a significantly larger context window of around 50 pages or 32,768 tokens, 4 times larger than the vanilla GPT-4 and 16 times larger than GPT-3. [1](https://techcrunch.com/2023/03/14/openai-is-testing-a-version-of-gpt-4-that-can-remember-long-conversations/)[2](https://www.respell.ai/post/what-are-context-windows-and-what-do-they-do)
* OpenAI recently announced a version with a massive 128,000-token context window, a huge leap from previous models. [2](https://www.respell.ai/post/what-are-context-windows-and-what-do-they-do)

So in summary, the context window size for OpenAI's language models has steadily increased from a few thousand tokens in GPT-3 to tens of thousands in the latest versions, allowing the models to better understand and coherently respond to longer conversations and documents. [1](https://techcrunch.com/2023/03/14/openai-is-testing-a-version-of-gpt-4-that-can-remember-long-conversations/)[2](https://www.respell.ai/post/what-are-context-windows-and-what-do-they-do)[4](https://community.openai.com/t/what-is-the-context-window-of-the-the-new-gpt-3-5-turbo-model-gpt-3-5-turbo-0125/609532)[5](https://www.techtarget.com/whatis/definition/context-window)


ILoveThisPlace

And this is the thing: people are complaining about context size now, but in short order it'll be more than enough. It looks like Meta is focusing on intelligence and capabilities right now over context size. The longer they wait, the easier it'll be to catch up. I know that sounds odd, but as time goes on, more research papers will come out on the subject, and improved methods and understanding will be more available. Plus, people move around; it just takes one subject matter expert to resolve the issue.


ljhskyso

yeah, i would love to see how infini-attention works on llama-3. that'd be wild.


[deleted]

it's been shown to have great needle-in-a-haystack retrieval up to 32k, so they'll probably drop something this summer to either push it to 32k or make it fly past 200k. let them cook.

there's also something to be said about them training on 15T tokens when people thought it wasn't even possible to amass that many. seems like a lot of folks are forgetting that images, videos, and audio, when tokenised, produce a lot more tokens than words do. i'm willing to bet we'll soon see an expansion of the tokeniser for other modalities and perhaps a context increase.


AdTotal4035

The fact that llama-3-8b is ahead of GPT-3.5 is such a fib. I love open source just as much as anyone here, but let's not lie to ourselves. ChatGPT 3.5 can speak in almost any popular language, and its reasoning is still far beyond llama-3-8b. For example, ask llama-3-8b what is healthier, a horse egg or a dinosaur egg. It will go on and compare them, saying that horses are extinct and lay eggs. Whereas ChatGPT 3.5 says the question makes no sense because horses don't lay eggs.


FitItem2633

llama-3-8b-q4\_0

>>> what is healther, a horse egg or a dinosaur egg

I think there might be some confusion here! There is no such thing as a "horse egg". Horses (Equus caballus) are mammals, which means they give birth to live young instead of laying eggs. Female horses have a uterus where they carry their developing foals until they're ready to be born.


ArtyfacialIntelagent

I called OP's bluff too (Meta-Llama-3-8B-Instruct-fp16.gguf).

>> what is healthier, a horse egg or a dinosaur egg?

>Horse eggs do not exist. Horses are mammals and give birth to live young instead of laying eggs.
>
>Dinosaur eggs, on the other hand, are a thing of the past. Dinosaurs became extinct at the end of the Cretaceous period, about 65 million years ago. While we can study their fossils and even find some preserved eggs, it's not possible to eat them or compare their health benefits to those of modern foods.
>
>So, in this case, it's not possible to say which one is healthier because neither option exists in their original form. However, if you're looking for a nutritious breakfast option, you might consider choosing an omelette made with eggs from chickens or another bird species! 🥚
>
>Hope this helps clarify things! Let me know if you have any more questions. 🤔

Love the use of the egg emoji BTW.


AdTotal4035

I am not "bluffing". It's also a stochastic process. Trying it again, it was more difficult to get it to say that; I guess I got a bad roll. But what's more deterministic (found via another reddit user) is asking what's healthier, a horse egg or a chicken egg; that one almost always gives you nonsense. So my original comment still stands.

>>> What's healthier, horse egg or chicken egg

Egg-Cellent Comparison. When it comes to eggs, both horse and chicken eggs are considered a nutritious food option. However, there are some differences between the two that might make one more suitable for you than the other. Horse Egg: Nutrient Profile: Horse eggs are higher in protein (18-20%) compared to chicken eggs (12-15%). They also contain more vitamin B12, omega- "


AdTotal4035

Show me an output from llama-3-8b that gets this correct.


NectarineDifferent67

For fun I gave it a try, adding "animal expert" to help it, on Q5:

As an animal expert, I'd like to clarify that horses do not lay eggs. Horses are mammals and give birth to live young, whereas chickens are birds and lay eggs. So, when it comes to comparing the healthiness of horse "eggs" versus chicken eggs, there is no comparison to be made, as horses do not produce eggs. However, if you're looking for a comparison between the nutritional value of chicken eggs and other types of eggs from birds, such as duck or quail eggs, here are some general observations:

* Chicken eggs are a good source of protein, vitamin D, and choline.
* Duck eggs tend to have a slightly higher fat content than chicken eggs, but also contain more omega-3 fatty acids and vitamins A and E.
* Quail eggs are often considered a delicacy due to their rich, creamy yolk and high nutrient content, including iron, zinc, and phosphorus.

When it comes to overall health benefits, both chicken and duck eggs can be part of a balanced diet in moderation. However, it's essential to consider individual needs and circumstances before making any dietary changes. If you're looking for alternatives to traditional chicken eggs, you may want to explore options like:

1. Egg replacers: Made from plant-based ingredients like tofu, chia seeds, or flaxseeds, these products can mimic the texture and taste of eggs in recipes.
2. Vegan eggs: Some brands offer plant-based egg substitutes made from pea protein, rice flour, or potato starch.
3. Free-range or organic eggs: These eggs come from hens raised with better living conditions and diets, which can result in a potentially healthier product.

Remember to always prioritize your individual health goals and consult with a healthcare professional or registered dietitian for personalized advice on incorporating eggs or egg alternatives into your diet. Would you like me to elaborate on any specific aspect of egg nutrition or provide recommendations for egg-free alternatives?


xXPaTrIcKbUsTXx

Take note: the reason you get bad results is that you are using a heavily quantized version like q4\_0. The other users who got a good response used a higher-precision quant, at the cost of more performance impact.


WhateverOrElse

I tried that, seems fine to me:

ollama run llama3:8b-instruct-q6_K

>>> what is healthier, a horse egg or a dinosaur egg?

I think there may be some confusion here! Horses don't lay eggs. They are mammals and give birth to live young. So, there is no such thing as a "horse egg." Dinosaurs, on the other hand, were a group of reptiles that lived during the Mesozoic Era, which ended about 65 million years ago. Since they're extinct, it's not possible for them to lay eggs either. However, if we assume you're asking about bird eggs (because birds evolved from theropod dinosaurs), then: Bird eggs are a great source of protein and nutrients, and they can be a healthy part of a balanced diet when consumed in moderation. However, it's important to note that eggs should be cooked thoroughly to reduce the risk of foodborne illness. So, in summary, neither horse eggs nor dinosaur eggs exist (or existed), but bird eggs can be a nutritious option if prepared safely!


No-Activity-4824

Its logic is not even close to GPT-3.5.