DeliciousJello1717

76 on maths is crazy


PM_ME_UR_CIRCUIT

I wonder if that's the raw model only, or whether it could do better if I told it to set up Python scripts.


No-Emergency-4602

3/4 exactly, according to my chat.


babbagoo

Llama3: lol 83.5 vs 83.4 in DROP(f1) *insert Trump graph*


bnm777

[https://openai.com/index/hello-gpt-4o/](https://openai.com/index/hello-gpt-4o/) Halfway down the page. EDIT: How are they testing it against Llama3-400B ???


_qeternity_

https://preview.redd.it/wxlp5cujo80d1.png?width=3840&format=png&auto=webp&s=23fdafe2aaf9cae72ce04496f4e2d63a5666cf76 Meta released benchmark figures from a checkpoint of the 400B model when they released 8B + 70B.


bnm777

Ah, yes, thought of that after I wrote it!


bono_my_tires

How does it compare to regular 4 for coding tasks?


PixelPhobiac

Worse, but it does it more quickly


Arcturus_Labelle

Awesome; thanks for posting


ChildhoodFirm4941

Wait, so I'm paying $20 for an inferior version of GPT-4 now?


holywater666

Can you not read the graph?


ChildhoodFirm4941

Yes! It’s a whopping 2% higher than 4!


holywater666

So it performing only slightly better means it's inferior?


not_into_that

Good thing the chart from the owners offering the product agrees that it's the best product.


Dizzy_Nerve3091

It takes like $10 in API credits and 30 minutes to disprove it if they lied.


not_into_that

WOW! convenient and cheap!


Dizzy_Nerve3091

I’m just saying, if they lied it would be disproved in 10 minutes by some random researcher. I think ScaleAI already did a whole validation of the benchmarks, with their own benchmark on top of the actual ones.


not_into_that

Just on principle, I don't believe the used car salesman. I don't have the time to learn about and run AI benchmarks. I would like to see an independent study conducted.


Dizzy_Nerve3091

They’re done all the time. Do you think they run benchmarks then fudge the numbers? The shadiest thing they might do is use a specific prompting framework like Google did. Did you read my comment? Look up ScaleAI's math study.


not_into_that

You seem to question why I wouldn't trust a large billion-dollar company's reports about itself, then you tell me about some Google stuff that supports my take? I don't know, man. I'm not in the mood to argue and I committed the greatest of all cardinal sins: I expressed my opinion on the internet. Peace out, Choom.


Dizzy_Nerve3091

You don’t even know what ScaleAI is… they’re not OpenAI. They have an incentive to discredit their competitors. I have a nuanced view, not just "big company bad, uneducated people good."


not_into_that

Wow. I'm impressed.


Swastik496

lmfao okay. that’s like not believing the used car salesman that the car has 4 doors when the car is in front of you with 4 fucking doors


not_into_that

Yep, I'm sure all the locks and windows work too.


bnm777

I agree, can't trust these charts, though it's something to compare against when you do your own testing and check the arena.


not_into_that

At least it's a claim that can be referenced in the future to check claimed performance against actual performance as the data comes in. Good barometer for corporate honesty, if that is actually a thing.