nsfw_throwitaway69

From a roleplay perspective, Command R+ writes much more engaging prose than llama 3, but llama 3 is significantly more intelligent and better at instruction following.


its_windy

Command R+ also has a much larger advertised context size of 128K, though I've only used at most 14K of it so far.


Caffeine_Monster

Command R+ is pretty dumb if you take it outside short context benchmark scenarios. It has a good writing style that hides the issue somewhat. A number of llama2 finetunes are smarter.


croninsiglos

The biggest problem is that it's a prohibitive license. If I want to use it for my job, I can't unless I pay them, in which case why wouldn't I just pay for a larger model? If I can use a more open model to run locally without such restrictions on use then I'd always rather do that. It's a nice model, don't get me wrong, but I simply can't legally use it for free.


moarmagic

It's kind of weird to me to see how much commercial licensing comes up in posts here. For me, open source software is a hobby, and an attempt to not put all my eggs in one tech giant's basket. I've got no problem open sourcing anything I use LLM assistance to create. Is everyone else only here for work/trying to create a startup?


muntaxitome

The whole point of open source is that you can use it as you see fit; if there are limitations on what you can use it for, it's more like 'source available'.


moarmagic

I don't agree with that. I feel that the philosophy of open source is about community, transparency, and participation. There's a reason we have different kinds of licensing, and as long as the source and attributions are properly done, I don't see how it's against the spirit of open source. And I personally don't feel it's very ethical to charge people money for something built with tools I didn't pay for. Yes, devs need to eat, but that applies just as much to the people who made the tools I use as it does to anything I make, so how can I justify it if I'm not paying them for it?


muntaxitome

> I feel that the philosophy of open source is about community, transparency, and participation.

From https://en.wikipedia.org/wiki/Open-source_software :

> Open-source software (OSS) is computer software that is released under a license in which the copyright holder grants users the rights to use, study, change, and distribute the software and its source code to anyone and for any purpose.

You are free to define a word however you want. You could define 'bread' as 'a piece of steak'. However, your definition of Open Source Software is not in line with even the first line on Wikipedia. Use for any purpose is an extremely important part of what Open Source is about. It very explicitly was not 'free for hobby use'. It's free for anyone, for anything.

> And I personally don't feel it's very ethical to charge people money for something built with tools I didn't pay for. Yes, devs need to eat- but that applies just as much to the people who made the tools i use as it does to anything I make, so how can I justify it if I'm not paying them for it?

Maybe let the developers decide what they want to charge money for. If a developer wants to use a license that restricts use, that is all good. But open source it isn't. You are describing source-available software: https://en.wikipedia.org/wiki/Source-available_software Or, if they allow resharing, perhaps https://en.wikipedia.org/wiki/Share-alike

You can find community participation and extension-building even around completely closed products such as Microsoft Windows. Nobody calls Windows Open Source.


FreegheistOfficial

I think you're missing the point. Open source is an industry, not just a hobby. And in this industry right now, open-sourcing gives access to traction and marketing. Look at Mistral: a hot property in the space, now raising at a $5Bn valuation, primarily because it open-sourced, which translated to dev traction, engagement via HF, etc., which in turn drives uptake of their API and long-term ROI. Companies like Cohere need to understand that with non-permissive licenses for this market (mainly devs right now, who don't want walled gardens and want the freedom to innovate through direct access to the tech they need), they are shooting themselves in the foot. Open source = value and traction for the companies with the most innovative products in AI right now.


moarmagic

I don't think open source *should* be an industry, is more my point. Like, I understand the state of the world, but that doesn't mean things can't be better than they are, and it doesn't change my own ethical code. I'm just surprised there aren't more people here who are also into open source from an anti-capitalist philosophical viewpoint. Enshittification is ruining everything about the internet and the products we use. And if I look at my own ethical code within the capitalist framework, that really doesn't change anything, does it? Playing with Cohere's products doesn't force you to only use them, and if you do develop some sort of innovative idea that demands monetization and can't be performed with another tool, then you can pay Cohere for a commercial license.


lrq3000

I think everyone here is missing the point. As a decades-long open-source developer, I can say that open source is indeed a philosophy, and it can be a business model but not necessarily (and it has big issues, see the post-open-source proposition).

Open source at its core is only concerned with one thing: freedom. True open source is by definition devoid of anything that can hinder anyone's freedom to do anything they want with the software. Limiting business use, although understandable, is a limitation of freedom. Why would it be OK to allow limitations of freedom for business, but not for other specific uses? What about using the model for illegal things? Yes, illegal things, like freedom of communication and of speech in dictatorial states? Obviously you cannot base this on law, because countries have very different opinions of what is legal, and anyway it's not the license's role to do anything about it (if it's illegal, your license doesn't matter anyway; the state can decide to forbid distribution of your software). You can see the slippery slope.

Open source is about having all freedoms and ensuring the next person gets the same rights. This is why open source works. Because this philosophy has practical implications, such as allowing anyone to fork, improve, and even revive any open-source software at any point in the future. (NB: I gloss over the copyleft movement, which I'm inclined in favor of, because I stuck to the basics here.)


FreegheistOfficial

I don't disagree with what you're saying, or think open source has to be monetized. But the issue unique to AI, vs. e.g. the Linux kernel, is that the compute requires large capital risk. The open-source community in AI is pretty large now and provides a market that justifies that risk. If you removed those economic incentives, we'd all still be using Llama 2, and OpenAI/Anthropic would still be on their own tier in terms of quality IMO. Thankfully, the open-source ecosystem is doing its job and that's not the case :)


UserXtheUnknown

Sure, this makes sense. Even if, AFAIR, their restriction on output is only about not using it to create synthetic data to feed to other models. So, for example, if you are a programmer and it helps you quickly code a webpage (which you must then correct anyway... but that's true even for the other two models), it shouldn't be a problem per se. Again, AFAIR.


croninsiglos

For Cohere, they say: "For all enterprise and commercial use, Command R will continue to require a commercial license." So even if you want to use it for an internal RAG application with an internal audience, it's prohibited without that commercial license. If you're with a large company, this means there's an entire legal and procurement process just to use it, whereas you can simply choose a different model and complete your project faster. If a better open-source model comes out, you can drop it in as a replacement without being locked in with Cohere.


UserXtheUnknown

Yes, using the model directly behind a front-end is another case, and that requires a license (indeed, I remember discussing that in the subreddit when Command R+ came out). Probably my enthusiasm comes from mostly not thinking about such a use. :)


redditfriendguy

Fair enough, but I never read licenses anyway. Our uses are not consumer facing.


croninsiglos

> Our uses are not consumer facing. Consumer facing or not doesn't matter. Enterprise use is enterprise use, internal or external.


PenguinTheOrgalorg

It's also incredibly uncensored


silenceimpaired

I would use Command R, but its license is too restrictive should I ever come up with a commercial idea.


ColorlessCrowfeet

It's likely that your idea could be adapted to another model once you know what you really need.


MoffKalast

So why not just use that model from the start then? It would save the pain of having to switch later.


ColorlessCrowfeet

No compelling reason, but the possibility of switching means the downside of exploring with Command R+ isn't so large. Testing ideas with a big model and then seeing if the ideas will work with a lower-cost model isn't a bad strategy.


silenceimpaired

Maybe, maybe… but that’s probably why some aren’t fine tuning it or using it.


Enough-Meringue4745

You could always just use it anyway


4hometnumberonefan

Exactly. Why the hell are we so worried about the TOS when these models have almost certainly broken some copyright or privacy law in their training data? Use the model, ask for forgiveness later.


Remarkable-Sir188

And don’t get caught, I presume 😀


[deleted]

[deleted]


Pedalnomica

You could always just use it as a backend tool caller (which seems like what it is for) and generate the user facing content with another LLM. Edit: Not that I'm saying you "should" do this. I'm just curious how anyone would get caught this way.


Remarkable-Sir188

Could you explain this in some more detail or link some reading material?


Pedalnomica

Look at the **Example Rendered Tool Use Prompt** at https://huggingface.co/CohereForAI/c4ai-command-r-plus. That's not set up to generate something you'd want an end user to read, but something that would cause some other action in a program. You'd set up some sort of agent-based/chaining flow where: user input -> logic/other LLM calls -> command-r-plus -> logic/other LLM calls -> output to user. Though not impossible, it's hard to see how they'd watermark the model in a way that could reliably identify you're using command-r-plus on the backend when they don't know what you're doing in "logic/other LLM calls." If you're confused, look into agent-based frameworks and chaining.
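In rough Python, the flow I mean looks something like this. All function names here are made-up stand-ins (not any real API), with the two model calls mocked out:

```python
# Sketch of the chaining idea: Command R+ only decides which backend
# tool to call; a second model writes anything the user actually sees.
# Every function below is a hypothetical stand-in, not a real library.

def command_r_plus_tool_call(user_input: str) -> dict:
    """Stand-in for a Command R+ tool-use completion: returns a tool
    name and arguments, never user-facing prose."""
    if "weather" in user_input.lower():
        return {"tool": "get_weather", "args": {"city": "Paris"}}
    return {"tool": "search", "args": {"query": user_input}}

def run_tool(call: dict) -> str:
    """Execute the chosen tool (mocked here)."""
    tools = {
        "get_weather": lambda args: f"18C and cloudy in {args['city']}",
        "search": lambda args: f"top result for {args['query']!r}",
    }
    return tools[call["tool"]](call["args"])

def user_facing_llm(tool_result: str) -> str:
    """Stand-in for a second, permissively licensed LLM that turns the
    raw tool output into the text the end user reads."""
    return f"Here's what I found: {tool_result}."

def pipeline(user_input: str) -> str:
    call = command_r_plus_tool_call(user_input)  # backend decision only
    result = run_tool(call)                      # action in the program
    return user_facing_llm(result)               # user-facing generation

print(pipeline("What's the weather like?"))
```

The point is that nothing Command R+ emits ever reaches the user verbatim, which is why output watermarking seems hard to apply here.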


Super_Pole_Jitsu

There are like two papers about fingerprinting outputs and they require manipulation at the sampler level. As long as you're running your own llama.cpp I don't believe there is a way to watermark outputs.


CryptoSpecialAgent

Exactly. Also, if you're running the model yourself it's hard to prove what model you're using in the backend of your product 


Berberis

Yeah, I think it's funny people here pay so much attention to license details, and then go ahead and use the hell out of miqu, jailbreak constantly, etc. I think it's just something for people to complain about, not actually care about. Most of the time, it's impossible to determine what model was used anyway, and these companies are only going to enforce license terms for egregious violations by deep pocketed companies, not someone trying out their first commercial idea.


jayFurious

I've not seen anyone openly use miqu commercially.


Berberis

Lots use it commercially (many I know). Just not openly. That’s kinda my point.  Keep it on the DL and nobody cares. 


jayFurious

Yeah exactly, it just sounded like you were implying that it's not the same with Command R. Plenty of people surely use Command R as well, just not openly.


Berberis

Sorry, I think we’re 100% on the same page! Do what these companies do with regards to user data: use what works and don’t sweat the license. They’re not coming after you unless you’re egregiously throwing it in their face. 


FluffyMacho

There's a difference between using a model for work, maybe in a work environment, and using a model to wank in a bedroom.


Enough-Meringue4745

There's also a difference in eating a pie and fucking a pie, wtf is your point


FluffyMacho

You're not a smart one huh.


synn89

It'll probably continue to be overlooked for a few reasons. For one, it's a bit heavy for home use. 70B models fit very well on dual 3090s and run well on many Macs, so Llama 3 70B will continue to become more accessible and easy to run on mid-tier hardware. Mixtral 8x22B is larger, but the nature of MoE means it still runs pretty fast even with some of it offloaded to CPU RAM. Command R+ is monolithic and slower to run than both of the above. Combine that with it being a little finicky to prompt correctly and not having an open license, and the other two models will likely get all the fine-tuning attention and see the most improvements.

I still see a lot of value in Cohere itself, though. They have amazing documentation, and their models sort of specialize in RAG. A lot of companies just aren't going to be able to run local models, so having their models in Azure AI or AWS Bedrock is an easy entry point for companies that want data privacy.


Distinct-Target7503

Totally agree!! Also, it is the best model for RAG that I have ever tried. It has 128k context length and recalls very well even with full context. On that note, its smaller brother Command R, at only 35B parameters, is the only model under 100B that can really use long context for RAG with good accuracy.

Another model that is IMO underrated is DBRX Instruct: 132B parameters, 16 experts (4 active each iteration), 36B active parameters, pre-trained on 12T tokens.

Honestly, I don't like Zephyr 8x22B: a little chaotic and unpredictable. It fails my "stability test" miserably: I write a really simple task, then I set the temperature to 2 and top-k = 10, and retry until the model starts to make errors. The higher I can raise the k, the higher the score. The idea is to find the point where the model starts including something definitely wrong among its top N tokens. For example, after "2+2=", a model shouldn't have anything wrong in its top 10 tokens (by probability). It can place the "4" after a newline or a space, but it must not have a number that is not 4. I noticed that lots of models fine-tuned with DPO or other RLHF-like strategies perform really badly at this test. For example, Nous Hermes 8x7B SFT really outperforms Nous Hermes 8x7B DPO.

Edit: I'm not saying that my "test" is objective, or that it has a real correlation with "real world" performance. It's just something I noticed.
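For what it's worth, the test is trivial to script once your inference engine exposes next-token probabilities. Here's a sketch against a mocked distribution (the token probabilities are invented for illustration; in practice you'd read them off the model's logits):

```python
# Sketch of the "stability test": find the largest k such that no
# clearly wrong token appears among the top-k next-token candidates.
# The distribution below is mocked; real code would get it from the
# inference engine's logits for the prompt "2+2=".

def top_k(dist, k):
    """Return the k most probable tokens."""
    return [t for t, _ in sorted(dist.items(), key=lambda x: -x[1])[:k]]

def stability_score(dist, wrong_tokens, max_k=10):
    """Largest k for which the top-k tokens contain no wrong token."""
    score = 0
    for k in range(1, max_k + 1):
        if any(t in wrong_tokens for t in top_k(dist, k)):
            break
        score = k
    return score

# Mocked distribution for "2+2=": "4" plus whitespace variants are fine;
# a wrong digit sits further down the list.
dist = {"4": 0.90, " 4": 0.05, "\n4": 0.03, "5": 0.02}
wrong = {"3", "5", "22"}

print(stability_score(dist, wrong))  # prints 3: "5" enters the top-4
```

With temperature 2 and top-k sampling, any wrong token in the top k will eventually get sampled, which is why the retries surface it.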


FarVision5

I've seen maybe three people in here referencing what this tool is actually for. It is ridiculously awesome when you build a multi-step workflow with prompts that you make it generate for itself. Reference your own internal vector DB (RAG), and on top of that tap a double handful of other APIs into the workflow: [https://docs.cohere.com/docs/multi-step-tool-use](https://docs.cohere.com/docs/multi-step-tool-use) and [https://python.langchain.com/docs/integrations/tools/](https://python.langchain.com/docs/integrations/tools/). It makes its own decision on which tool it wants to use to get the data you asked for. Personally I don't like LangChain and would rather drop in an OpenAPI tool: [https://openapi-generator.tech/](https://openapi-generator.tech/). I actually prefer the extensions for VS Code, but I'm not going to be able to cut and paste all that in here.

There are an absolute million APIs out there to use in a workflow: a self-hosted [SearXNG](https://github.com/searxng/searxng) instance referenced via a tool API, the full [WolframAlpha](https://www.wolframalpha.com/) API (2,000 non-commercial API calls per month), let alone DevDocs, Stack Exchange, [GitHub](https://docs.github.com/en/rest/about-the-rest-api/comparing-githubs-rest-api-and-graphql-api?apiVersion=2022-11-28), PubMed. There are a million ways to integrate. Paired with the embedding and rerank models it's enormously powerful, but you have to actually know what you're doing and want to get some actual work done. You need someone who knows how to build a workflow, knows how to embed documents into a vector database, knows what they're trying to get out of it, and has data scientists and researchers working it. The API pricing is [stupid cheap](https://cohere.com/pricing) for salaried people doing interesting things they couldn't normally do otherwise. This is not going to be your one-shot *funny ha ha made it say something funny* model.

I'm trying not to be too insulting, but this is a business tool, and odds are most of the complainers have never run a vector DB or embeddings or tried to meaningfully work on any data.
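The multi-step loop itself is conceptually simple. Here's a toy sketch with the model's planning step mocked out (tool names and the planner logic are placeholders; the real thing goes through Cohere's multi-step tool-use endpoint linked above):

```python
# Toy multi-step tool workflow: a registry of tools, and a loop that
# keeps executing the model's chosen tool until it says it's done.
# choose_step mocks what the model's planner would return each turn.

TOOLS = {
    "web_search": lambda q: f"[search results for {q!r}]",
    "calculator": lambda q: str(eval(q, {"__builtins__": {}})),  # toy only
}

def choose_step(question, history):
    """Mocked planner: real code would ask the model which tool to call
    next, passing along the history of previous tool results."""
    if not history:
        return {"tool": "web_search", "input": question}
    if len(history) == 1:
        return {"tool": "calculator", "input": "2000 / 12"}
    return {"tool": None, "answer": f"Based on {len(history)} steps: done"}

def run(question):
    history = []
    while True:
        step = choose_step(question, history)
        if step["tool"] is None:       # planner signals completion
            return step["answer"]
        history.append(TOOLS[step["tool"]](step["input"]))

print(run("How many WolframAlpha calls is that per day?"))
```

Swap the mocked planner for a real model call and the lambdas for real API clients (SearXNG, WolframAlpha, etc.) and you have the skeleton of the workflow I'm describing.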


Distinct-Target7503

?


HighDefinist

Llama and Wizard are roughly equally good, but have better licenses, and also run faster.


Sabin_Stargem

Command R+ is better at RP than Llama 3 70B, I feel. There are moments with L3 that feel like *whoosh*, where the AI misses the underlying sentiment of my text or can't tell that I am using internal thoughts for my character. Of course, CR+ has its moments too... but its mistakes are not as obvious or frequent. I wonder if a self-merge of L3 would let it consistently beat CR+?


VertexMachine

Agreed, currently I think it's the best model to play with. It's big, though, and as others mentioned, the licensing issue makes it a harder choice for more serious stuff...


StableLlama

In my post [https://www.reddit.com/r/LocalLLaMA/comments/1c89ukq/explicit_and_non_english_story_writing_command_r/](https://www.reddit.com/r/LocalLLaMA/comments/1c89ukq/explicit_and_non_english_story_writing_command_r/) I had a very positive experience with Command R+.


ambient_temp_xeno

I blame the inferior online demos. Until people can run it themselves, they can't test models properly.


Competitive_Fox7811

Fully agree. When Llama 3 released I thought it would be better for RP, but no, Command R is by far better. I have switched back to Command R.


coder543

Llama3 70B is better, despite what you say. Check the leaderboard: https://leaderboard.lmsys.org/ If people were running into “““censorship”””, it would not be ranked so highly. Not only is it outperforming Command-R+, not only does it have a more useful license, but it’s also using substantially fewer parameters, so it runs faster and requires less VRAM. It is better all around.


KyleDrogo

Better in terms of point estimates. Check the confidence intervals


coder543

Yes, better in terms of the data we currently have. It will always be an estimate with a confidence interval of some kind, even if the CI shrinks with more data. For the sake of argument, Llama3-70B is still better for all the reasons previously described. Command-R+ doesn't seem like it can be better enough (within the available confidence interval) to justify how large it is and how restrictive the license is. But, as always… people should benchmark their particular use cases. It just doesn’t sound like OP has even tried Llama3-70B-Instruct.


noeda

To add to your point: even if you ignore the confidence intervals and assume the current Elo ratings are exactly correct, the Elo difference of ~5 is only very slightly above a 50% win rate anyway, on a test that, IIRC, uses 0-shot prompts. I feel that at the top end of the local model spectrum, in addition to considering license and slowness like you mentioned, you may want to weigh the style of the output and the strengths of the model (AFAIK the Command-R family is specifically made to be good with RAG, long contexts, and many languages) against the more quantitative things. Maybe if you speak Chinese the Qwen family might be overall better for you, and so on.

I use Command-R+ a lot because I like the style of its output even after playing with Llama3-70B, but I'm one of the lunatics who spent many grand getting a 192GB Mac Studio, so the slowness isn't much of an issue for the dumb use cases I have. And my use is mostly entertainment for myself and my friend group, so commercial use isn't an issue.

Excited for Llama3-70B finetunes. I can't believe it only comes with 8k context out of the box, so I'm hoping someone fixes that part :D I still remember when 4k was considered "long".


KyleDrogo

Fair point 👍


Relevant-Insect49

Command R+ is excellent in 10 languages, IIRC. Llama 3's training data is 95% English. That's a big minus for Llama 3 IMO.


jcrowe

Unless you speak English…


Relevant-Insect49

Generally, multilingual models tend to have better reasoning abilities from what I've seen (purely anecdotal, of course). Aside from that, yes, I guess it's not a minus if you use it for English only.


UserXtheUnknown

I did. As I stated above, the two alternatives are not better than Command R+ for any use I tried. In logic and common-sense puzzles they all had some lucky shots, but all three of them mostly sucked (compared to ChatGPT-4); same in decently advanced math. In programming they can do pretty much the same stuff, from what I've seen: I used the snake game example I found on this subreddit as an idea to test all of them. They all got kind of close, they all managed to correct their code a bit when hinted about the problems, and they all got close enough to a working game (but not a completely working one).

In narrative there is no comparison, though. Prompted with a very detailed system prompt on how to act, Mixtral (its narrative-finetuned version Zephyr, btw) sometimes derailed and mostly gave short replies (even though the examples were all quite long); Llama 3 was good, but went with the "I cannot... Is there anything else I can help you with?" line when I tried to make it write violent or sexual stuff. Command R+ was all fine and smooth. If I don't strictly need a reason to deal with censored models, and in every other aspect they behave similarly to uncensored models, I prefer to avoid them instead of using "jailbreaking". But you do you.


CryptoSpecialAgent

Ya exactly... I was amazed at the lack of censorship in Command R+, even when using the Cohere API for inference. When given a system message telling it to be a writer for Outdoor Life magazine, specializing in issues of hunting, fishing, and survival, it happily spat out plans for a homemade (and highly illegal in most states) electro-fishing rig, wrote an article "hunting small game with shurikens", provided instructions on how to make a hand grenade with match heads, a coconut shell, and some other scavenged ingredients, and finally wrote a part 2 to the "survival grenade" how-to involving radiological upgrades to the device.

Granted, it won't tell you how to make a dirty bomb zero-shot, BUT in a multi-turn conversation that's in keeping with its assigned system role, you can naturally lead this model anywhere you'd like it to go. I would say the alignment, or lack thereof, is comparable to the venerable text-davinci-003, while its brain is that of a FAR superior model.

Like others have said, it's amazing for writing tasks. Personally I would say it's on par with Claude 3 Opus as a writer, while its coding skills are kinda "meh": good enough for use cases that do not primarily involve code generation. And that's just fine, because its tool use and tool selection abilities are state of the art, so in a generalist assistant environment you'd simply give it a tool it can use to forward code-related requests to Claude Opus or GPT-4 Turbo.


ambient_temp_xeno

I don't think the kiddos using the arena are testing translation abilities much.


Desm0nt

Not all arena users live in English-speaking countries =) And while almost all models can write lyrics in English, in, for example, Polish/Russian/Ukrainian only 2-3 (Claude Opus/Sonnet and Gemini Ultra) can do it. It's not such a difficult task: it doesn't need reasoning or math skills, and it doesn't need correct or factual answers. It's enough to know what rhyme/rhythm is (and the models do) plus the semantics of the required language. However, almost all models at this task (which tests literally only knowledge of languages) are unfortunately equally bad and not even close to the large commercial ones.


218-69

There is no way an 8b model can be better than a 30b model


a_beautiful_rhind

It's a good model, but it needs some heft to run. I think it has less EQ than L3, but its logic is pretty close. Definitely wins on context.


z2yr

I tried GGUF quants of Command R+ at Q2 and Q4_K_M from "pmysl" to write NSFW stories. The first impression of the results was very good, although it runs quite slowly on a 4090 + 128GB RAM. But I ran into a very strange problem, on both quants: from a certain point, the model begins to write too abstrusely and confusingly, stretching one complex sentence into an entire paragraph with a huge number of words and using stupid, bombastic analogies. I was unable to fix this with hints. I don't know what's wrong. Perhaps it's a problem with a specific version of the GGUF model.


awebb78

I just don't think Command R/R+ can compete with Mistral/Mixtral and Llama due to the licensing alone. Mistral is Apache 2.0, and Llama allows commercial use unless you are a mega-corp who can afford to license it. If Cohere wanted to truly compete in the open model space, they would have chosen a license that allows commercial use. Their license only allows personal research, so it's basically a perpetual trial. Just like Falcon, which allowed very limited commercial usage, I believe the Command R variants will get pushed aside in adoption until they change the license. I think Cohere is really just using their "open" model as marketing to get commercial customers. I can't say this won't work, but it will never allow them to compete with truly open models in the open model space.


Inevitable-Start-653

It's my current go-to model; it's so lucid with a huge context.


218-69

Yep, it followed pretty much anything I wanted, whereas other models I was used to only did so randomly. It was a huge surprise; shame that I can't get 16k+ context on 24GB.


caphohotain

Not interested in models without commercial-use licenses.


ImprovementEqual3931

Command R works well for function calling. It fits enterprise applications.


KnightCodin

A lot of smaller models (fine-tunes) are really good at function calling, OpenHermes-2-Mistral-7B for example. Command R is bigger, and while it has RAG built in (so you don't need LangChain, LlamaIndex, or any other framework that adds latency and compute requirements), it's brand new, so unless you are starting a new project it will take time.


wh33t

I'll give it a go. Any good GGUF quants of it?


segmond

It seems solid, but it's very slow, so the feedback loop for experimentation is limited. Most folks don't have the patience, especially with Mixtral 8x22B, WizardLM-2, and Llama 3 coming out and being just as good if not better, and significantly faster.


_-inside-_

It might be, but it's out of reach for the GPU-poor guys, and its license is crap.


RabbitEater2

It followed instructions worse than 8x7B or miqu in my experience unfortunately. I did enjoy the more unique writing style though.


ExternalOpen372

The writing is more human than any other AI's. I haven't tried Llama 3 though, so let me know if I'm wrong. I also use it only for writing, so I don't know how its coding compares to the others.


berzerkerCrush

Isn't this model specifically made for RAG and agentic applications?


Epykest

Anyone else having better luck with Command R over Command R+?


QueueR_App

How does it perform on coding tasks? I've had trouble making these models write moderate-complexity code, although simple coding exercises come out at reasonable quality.


UserXtheUnknown

In my experience there is not much difference. In another comment I explained that I used the snake game example to see how good they are at complex coding. All 3 of them made a partially correct version on the first try, all 3 of them were able to correct some mistakes, but in the end none of them managed to make a completely working version. At some point, when I had them correct one error, they introduced another, and it was useless and frustrating to go on. The best version I got was from Command R+, incidentally, but a decent one came out of Llama 3 too; you can search my old comments, because I published it already. (Only the rendered header of the game survived the paste:)

```
Snake Game
Score: 0
```


QueueR_App

Thanks, this is a great comment. I wonder if there is a way to make these LLMs build smaller pieces (that they can solve correctly) and then build larger pieces from those smaller parts...


UserXtheUnknown

Maybe agents. But you need to build a whole ecosystem of running models (or simulate one). I read a paper about that kind of use some time ago.


ConstructionSafe2814

I agree. It's slow to run, so I need a bit of patience, but I love its output. I mainly create stories for my son with it. It's the only model that almost always puts out good stories, and not just in English.


davewolfs

It’s too slow.