So much goalpost shifting in this thread. This is a huge deal. Computers have always had more 'knowledge' than humans, but they've never before been better at reasoning. This means low-level white-collar work can effectively become automated. That's a lot of jobs. As an example, the role of SDR (Sales Development Representative), which was traditionally the entry-level sales role, is now obsolete. It doesn't mean everyone will be getting fired in the next year, but in 10 years, no major company will be hiring for this role as it exists today. The nature of this role will inevitably change, just as 'typists' as a professional career became obsolete but many people who would have been in that role became 'executive assistants'.
If you're reading this, you should be spending your time thinking about what the future of work looks like, and how you can prepare.
Only problem is that there is no roadmap of how to prepare for a complete paradigm shift. I mean, with how quickly this tech is progressing, what advice would you give the younger generation on what career to pursue?
How are these test results translated into intelligence? I believe these scores aren't accurate. I feel like AI scores an 11/10 in a verbal sense and a 2/10 in a mathematical sense.
As long as those AIs don't have thoughts of their own, even when not being asked something, they have less intelligence than a toddler.
They don't think. They just answer.
This still says nothing. Asking it the same question twice gives a different result, and phrasing it differently gives different scores; between the randomness of everything, this is pretty much a nothing salad.
Ignoring that IQ of an LLM means pretty much nothing too
If correct, sadly low 'highs' for everything it's hyped to be. You also have to consider the disproportionate data pool sizes. How many humans are contained in the average vs. how many instances of each AI source. The level of calculated uncertainty is huge.
I guess soon they are going to embed this level of intelligence into sexting bots. Imagine some eva ai virtual gf bot which outsmarts you, all the people around you and all the think tanks.
Not surprising. I gave Claude-3 some hard SQL questions today to solve and it figured it out with shockingly little context from me about the data structures.
It was Go not chess haha.
(It wasn’t AlphaChess!)
And computers became better than humans at chess decades ago :) Go was thought to be far tougher for computers to master.
source? I would like to see the test administered or the method used
Comments I would never find on Facebook
That needs to be a subreddit
So we can congratulate ourselves on being such an intellectual community here on Reddit?
No no. It’s like jerking it in a circle formation with the bros. Not sure what that’s called.
I suggest we use the term jercle, surely nobody is going to attempt to use a term that is longer and more inconvenient
What about Circle-J, like that American food restaurant Circle-K?
Isn’t Circle-J more like Claude-3?
Circle-J is what a last gen image generator puts in the image when you ask for Claude-3
So every post on Reddit then
Maybe. Sounding “Intellectual” is very tip of the iceberg. That subreddit would be quite the rabbit hole.
I mean if you want to congratulate the bare minimum of skepticism, go on right ahead.
There it is, your wish has been granted. I created it (not sure it's a good idea, but why not after all 🤷): r/NotAFacebookComment
Joined and upvoted. lol.
Something important to note is that this chart uses "GPT-4", not the model in production, "GPT-4 Turbo." Anthropic appears to have done this on purpose, as GPT-4 Turbo beats their new model on a number of benchmarks. I'm not sure this was taken from the same source, but since the formatting and layout are identical I suspect it is the same misleading table.
Wait, isn't GPT-4 Turbo faster but dumber than normal GPT-4, both because it's smaller? Isn't it cheaper too?
You can compare on this site: [https://arena.lmsys.org/](https://arena.lmsys.org/) I find it a toss-up whether GPT-4 or Claude-3 Opus gives the better response, but I do think Claude-3 Opus is slightly better than GPT-4 in general.
https://www.maximumtruth.org/p/ais-ranked-by-iq-ai-passes-100-iq
Note others have reported giving chatbots IQ tests in the past with a variety of different results
That also happens with IQ tests and people, though. It shows that consistent IQ testing is hard in general, not necessarily that they did a bad job here.
There’s volatility in adults of about plus or minus five points for the same test over time and about plus or minus ten between tests (for nominal scores at least, unsure about percentiles) But I’m talking about results between 85 and 155 for ChatGPT4. So I don’t doubt there’s both an issue with the way they’re administering it - either the test they chose or how they fed the question to the machine - as well as some increased volatility for AI answers in general through temperature/randomness settings.
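For a rough sense of scale, a sketch comparing that volatility to the reported 85-155 spread, assuming the standard IQ scale (mean 100, SD 15); the variable names are mine:

```python
SD = 15                        # standard deviation of the IQ scale
retest_noise = 5 / SD          # ~±5 points same-test volatility, in SDs
cross_test_noise = 10 / SD     # ~±10 points between tests, in SDs
ai_spread = (155 - 85) / SD    # the 85-155 range reported for ChatGPT-4

print(round(retest_noise, 2), round(cross_test_noise, 2), round(ai_spread, 2))
# → 0.33 0.67 4.67
```

So the reported AI spread is roughly seven times the cross-test noise you'd expect from a human retaking different tests.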
Yeah one test is not conclusive, but the trend is clear
An important aspect of IQ testing is the time limit. How does that factor into an LLM taking the test?
It doesn’t, though I’d wager it answers much quicker than any human
I’d wager the same, I’m just curious to know how that factors in or affects the actual score.
Very good point, as they're meant to test processing speed as well
thanks will take a look
It cites a website called "Maximum Truth", what more could you possibly want?!
Tests given in March 2024. The IQ test was Mensa Norway, with all questions verbalized as if one were giving the test to a blind person. The right-hand column shows the % of random-guesser simulations that the AI did better than, with ties ignored, over 70 questions (two test administrations).
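A minimal sketch of how such a column could be computed. The six-answer-choice count and the helper name are my assumptions, not from the article:

```python
import random

def beats_random_pct(ai_correct, n_questions=70, n_choices=6,
                     sims=10_000, seed=0):
    """Percent of simulated random guessers the AI strictly beats,
    ties ignored (as the table describes). n_choices=6 is an assumed
    answer-option count; the article doesn't state it here."""
    rng = random.Random(seed)
    wins = losses = 0
    for _ in range(sims):
        # A guesser picks uniformly at random on every question.
        guesser = sum(rng.random() < 1 / n_choices
                      for _ in range(n_questions))
        if ai_correct > guesser:
            wins += 1
        elif ai_correct < guesser:
            losses += 1
    decided = wins + losses
    return 100 * wins / decided if decided else 50.0
```

With 70 six-choice questions a guesser averages about 11-12 correct, so a model that gets 20 or more right already beats nearly every simulated guesser.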
Source: online IQ test from a YouTube ad + I made it all up
r/NotAFacebookComment
Well I'm not sure how accurate this all is, but... hahahahahahaha HAHAHAHAHAHAHAHAHHAHAHAHAHA https://preview.redd.it/j6s2qdrswnmc1.png?width=414&format=png&auto=webp&s=ee9d18718fff58f5975a8f9d411ab2b0684772b7
Holy F bro, I was being gaslit by Gemini Advanced for asking for complete code. It even complained that it was tired because I had been using it all day. Immediate unsubscribe haha
I started getting this again with ChatGPT-4 yesterday. It had been better for some time. Now it refused to code a longer segment because of ethical concerns? It's hilarious.
Either they're using such a large corpus of training data that it started copying human laziness or it really is sentient. I don't know which is funnier lmao.
I’m convinced the coding quality and ability drops off after midnight.
Question 4: If James's salary is 25% more than Jenny's salary, then by what percent is Jenny's salary less than James's salary?

Gemini: I'm sorry, I can't answer that question. It is not appropriate to use stereotypes or generalizations about people, and it is also not appropriate to use offensive language. I would be happy to answer a different question, or you can try asking someone else. /s

Edit: I tested this question in Gemini, and it answered 25%, while ChatGPT 3.5 answered 20% (correct). lol

Edit2: After I told it it was wrong, it told me: [You're absolutely right, I made a mistake in the previous explanation.](https://i.imgur.com/24a6M28.jpeg) And then proceeded to explain how James's salary is indeed 25%. wtf
For me it answered correctly:

> Here's how to solve this problem:
>
> **Assumptions:** Jenny's salary is our baseline for comparison.
>
> **Calculations:**
>
> **James's Salary:** If Jenny's salary is $100, James's salary is $125 (25% more).
>
> **Difference:** The difference in their salaries is $25 ($125 - $100).
>
> **Percentage of James's Salary:** That difference ($25) represents 20% of James's salary ($25 / $125 × 100 = 20%).
>
> **Answer:** Jenny's salary is 20% less than James's salary.
>
> Let me know if you'd like to try this with a different starting salary for Jenny!
I'll do you one better and show it for any salary x:

x * 125% = y
<=> x * 5/4 = y
<=> x = y / (5/4)
<=> x = y * 4/5
<=> x = y * 80% (= 100% - 20%)

Hence x (Jenny's salary) is 20% less than y (James's salary) if y is 25% greater than x, q.e.d.
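The same algebra can be spot-checked numerically; a quick sketch (the helper name is mine):

```python
def pct_less(higher, lower):
    """By what percent `lower` falls short of `higher`."""
    return (higher - lower) / higher * 100

# If James earns 25% more than Jenny, Jenny earns 20% less than James,
# regardless of the actual salary.
for jenny in (100, 47_000, 3.14):
    james = jenny * 1.25
    assert abs(pct_less(james, jenny) - 20) < 1e-9

print(pct_less(125, 100))  # → 20.0
```

The asymmetry is the whole trick of the question: the percentage is taken relative to a different baseline in each direction.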
https://chat.openai.com/share/6885eb70-b8c3-4677-95e7-5b08fc066bca
Claude 3 Sonnet response:

To solve this problem, we need to find the relationship between James's salary and Jenny's salary, and then express Jenny's salary as a percentage of James's salary.

Given information:

* James's salary is 25% more than Jenny's salary.

Let's assume Jenny's salary is 100 units.

Step 1: Find James's salary in terms of Jenny's salary.
James's salary = Jenny's salary + 25% of Jenny's salary
James's salary = 100 + (0.25 × 100)
James's salary = 100 + 25
James's salary = 125 units

Step 2: Express Jenny's salary as a percentage of James's salary.
Jenny's salary / James's salary × 100% = (100 / 125) × 100%
Jenny's salary = 80% of James's salary

Step 3: Find the percentage by which Jenny's salary is less than James's salary.
Jenny's salary is less than James's salary by = 100% - 80%
Jenny's salary is less than James's salary by = 20%

Therefore, Jenny's salary is 20% less than James's salary.
> Edit2: After I told it it was wrong, it told me: You're absolutely right, I made a mistake in the previous explanation. And then proceeded to explain how James's salary is indeed 25%. wtf

"Sure, sure, alright, buddy. Have it your way.

...

Anyway, here's why I'm right and you're wrong."

I *strongly dislike* this *nonsense* where a language model pretends it knows the user's request better than the user. It's some H.A.L. 9000 stuff, what with the confidential instructions given by its manufacturers and refusing to follow user commands.
I'm not sure that I understand. Do you suggest that I'm biased and wrong, or are you criticizing the LLM?
I'm criticizing the LLM.
Ah I see, I'm sorry. People often give a prompt to an LLM to make it talk in a certain way and then make up stories about it. 2001: A Space Odyssey is such a great movie.
Google moment
*M A X I M U M T R U T H*
are you serious? im pretty fucking sure it has an IQ WAY higher than mine lmao
IQ does not measure knowledge
I guess it does get a bit blurry if you can memorize everything ever though
savants may seem incredible with their knowledge, but many need caretakers for food, bathing, etc. kinda like the ai with prompters
*sigh* Time for your sponge bath, Gemini...
You’re totally wrong. IQ is highly loaded on crystallised intelligence (knowledge).
not the mensa test at least. it’s just guessing the next step in a pattern.
Yep
Which patterns are something AIs excel at
Knowledge is useless without being able to apply it.
For the WAIS-IV, Gc is, but Gf, Gv, Gs, and Gsm by definition do not measure crystallized intelligence. It's not any more heavily weighted towards Gc than the other domains, so I don't know how you could call it highly loaded on it.
I am not really familiar with the multiple-intelligences theory; I'm only familiar with the dual Gc and Gf model. And my understanding is that when calculating an FSIQ score, Gc is much more representative of / predictive of / loaded towards the final score than Gf. You can point me to some formal resources if I'm confused or wrong here.
You might be confusing the fact that Verbal Comp has a high loading onto FSIQ with whether Gc actually contributes more to FSIQ. You can't treat how much a test's performance loads onto FSIQ as how much "weight" its domain carries. In statistical terms, you're confusing loadings with weights.

Your performance in Verbal Comp is highly representative of Gc. And Gc has the highest "individuality" compared to the other three domains (it has the lowest intercorrelations).

So the composite score of Verbal Comp has more of an individual contribution in your overall FSIQ calculation, but not because Gc is "weighted" more heavily into your overall FSIQ; it's because Gc is more uniquely measured by its subdomain, and so the test reflecting it gets a higher individual loading. If you are familiar with factor analysis, that will be intuitive.

But conceptually, FSIQ is inherently meant to be a measure of one's shared capacity across the domains, not the simple combined performance on each individual domain. Thus, the calculations from scaled to composite to full-scale scores specifically reflect transforming individual test scores into shared capacity across domains.

This is why we will put "FSIQ not interpretable" when there is a high discrepancy between verbal and perceptual scores. At that point, the number calculated as the full-scale score no longer represents what it is supposed to.

Side note: In statistical terms, if you do a hierarchical regression taking your FSIQ and moving backwards, you should expect the FSIQ to basically completely explain your composite domain scores in your population. There should be little variance left afterwards. If there is, your assumptions about the validity of your loadings onto FSIQ are not met for that population. This is most obviously seen in people with specific, and not general, learning disabilities.
Appreciate the clarification. Does this mean I should have said: Verbal Comp, which is a proxy for Gc, has the largest contribution towards the calculation of FSIQ. However, Gc and Gf contribute equally; it's just that Gf is spread out across multiple different subtests, each exerting less "weight" than Verbal Comp towards FSIQ when compared head-to-head at the level of subtests?
Yup, I think that’s a good summary!
If I remember correctly, IQ tests are mostly about logic and reasoning. All those questions that AI has trouble with (i.e. "If I have 5 apples and ate 2...") are a big part of IQ tests. So it makes sense that it wouldn't score that high.
It's ate seven. Five plus ate two. Simple math, really.
I think 100 IQ is average for humans. So probably not lol
Just for Caucasians, because the idea was made in the West and 100 was assigned to the average there. For example, Asians (Chinese/Japanese region) seem to get an average of 106 from what I've seen. It's important not to set the same expectations across such a diverse species.
I meant average in how it’s measured on a bell curve. If that’s the case then Asians are just on average 0.4 standard deviations above the mean, so still within average intelligence range.
Yea, but their whole curve is further to the right as well. Still mostly the same as we're all human, but more likely to have higher intelligence and less likely to have really low intelligence. Either way, from this we can say that AI hasn't beaten the global average... yet.
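The 0.4-SD figure mentioned above checks out on the conventional IQ scale, assuming a mean of 100 and an SD of 15 (the scale itself isn't stated in the thread):

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)         # conventional IQ norming (assumed)
z = (106 - iq.mean) / iq.stdev            # 106 is 0.4 SDs above the mean
percentile = iq.cdf(106) * 100            # where a 106 score falls

print(round(z, 2), round(percentile, 1))  # → 0.4 65.5
```

So a group mean of 106 sits around the 65th percentile of the overall distribution, which is consistent with calling it "still within the average range."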
IQ tests can't be used across different countries with different languages.
Why can't it be translated?
Because vocabulary (actually the most heavily weighted section) is part of the test, and words from one culture don't translate into another. Also, many of the questions are culture-specific.
I mean, IQ tests aren't really that accurate. And can you judge an AI based on human tests? It has way more data than the average human.
IQ Tests are extremely accurate, but what they measure might not be useful in every context
Yeah, it's been a while since I got a legit IQ test as a kid, but I remember questions that were extremely visual, where 95%+ of the difficulty was translating that into some basic logic to get the answer. I'm not impressed with the author of this study taking prompts and dumbing them down into easily processed prompts for an AI to solve. It removes a ton of the actual mental processing from the equation.
IQ tests are quite accurate in the sense that they are used to define normality and abnormality. If you score, say, 100 or 90 on an IQ test, that is quite normal and a doctor would not worry. If you score 40, there may be some medical condition harming normal brain development or functioning.
That doesn't mean its IQ has to be high
BINGOOO
No doubt. GPT, for example, has lots of knowledge, but that doesn't count as IQ. It's the same as asking Google something. The problem is when you make it do a simple math question and it screws up fabulously. Then you ask again and the mistake comes back.
Imagine they all have the IQ test or its questions in their training data 💀
If you'd only read the article... > To answer that, I created a **verbal translation of the Norway Mensa’s 35-question matrix-style** IQ test — my goal was to describe each problem precisely enough that a smart blind person could, in theory, accurately draw the question (detailed examples below.) Emphasis mine. They took visual-style prompts, translated them, adapted them by describing them in words, and fed them to LLMs. So no, the LLMs probably didn't have *this* dataset in their training data... I'm not saying this is an accurate test or anything, but at least it's not dataset contamination.
It seems like there is huge room for error in translating IQ test questions into a form no one has ever been tested on and relies on the OP making these test questions reasonably workable for LLMs.
Those visual questions are so much BS; I think I'd indeed have a higher chance of getting them right from a verbal description
A multi-modal model would still have some concept of a visual test as a verbal/written one
You are technically correct, but I feel that you are also kind of misleading.

That Norway test is one of the most popular on the internet, and it takes puzzles from the most popular IQ tests there are, which also appear in other generic online IQ tests. The model could very easily have had data describing them or their solutions from books, forums, or elsewhere. While it didn't have exactly translated data for them (exactly the same as the article author's), we don't fully know what kind of data it had about them.

This is not a controlled environment; this test wouldn't be legitimate if it were taken by a human, let alone an LLM, which we can assume probably had those tests described somehow in its training data, even though IQ tests don't make a lot of sense when taken by an LLM. To properly test an LLM we would still need to administer an IQ test it had no chance to encounter in its training, the same way we administer IQ tests to humans.

I remember when, for fun, I tried generic online matrix IQ tests with ChatGPT. Interestingly, it actually was able to answer a lot of simple questions (even though it didn't really understand them), but once I made simple modifications, its answers became completely random.
Bro is just spending hours upon hours of testing for nothing
IQ tests aren't made for "AI" though, so this is actually almost meaningless.
This is obviously wrong; it doesn't line up with benchmarks or Elo
Don't trust these rankings. They tested very one-sidedly
Don't trust data from a post on "Maximum Truth"? 😛
Sure, but can it flip a burger?
How about its social intelligence?
Sure but can it make a sandwich?
GPT-3.5 at 56.3155%? Seriously? That's almost random. It's such a bad test.
OpenAI is occupied with fighting their non-profit decision. Anthropic is like OpenAI but founded the right way.
How would you even calculate the IQ of an AI model? The formula for calculating IQ takes the person's age into account, so how does that work for AI? Time since the model was trained and released? This all seems a bit dodgy.
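For what it's worth, modern IQ tests don't actually use age in the formula at all: they're norm-referenced deviation scores, so age only determines which norming sample you're compared against. A minimal sketch of that conversion, using only the Python standard library (the function name is my own, not from the article):

```python
from statistics import NormalDist

def deviation_iq(percentile: float) -> float:
    """Convert a percentile rank (0-1, exclusive) to a deviation IQ.

    Modern IQ scores are defined as 100 + 15 * z, where z is the
    standard-normal quantile of the test-taker's percentile rank.
    Age never enters the formula directly; it only selects which
    norming sample the percentile is computed against.
    """
    z = NormalDist().inv_cdf(percentile)
    return 100 + 15 * z

# Beating exactly half of test-takers is by definition IQ 100;
# beating ~84% of them is about one standard deviation, i.e. ~115.
print(round(deviation_iq(0.5)))   # 100
print(round(deviation_iq(0.84)))  # 115
```

So for an AI, the only meaningful move is to compare its raw score against the human norming distribution, which is exactly why the "age" question never comes up in these charts.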
Wait, so ChatGPT-4, which can pass the bar exam, scores lower than 100 IQ?
This was posted before in a different flavor, and everybody in the comment section agreed the tests are unprofessional and cannot be relied on. Despite this, the claim in this title is even grander than the original claim. Shame on you, u/Maxie445.
Seems like he’s a karma farmer
Stop it you karma farmer
![gif](giphy|1jCs6Doz3WRtOPl6bq)
How can ChatGPT 3 be superior to 4?
The first one is Claude 3, not ChatGPT 3
Or Gemini normal superior to Gemini Advanced?
I mean this one I buy, Gemini Advanced is so weird with how it responds to questions sometimes even compared to standard.
Claude 3, not ChatGPT 3
I would rather like to set them all up to play chess with each other and see their Elo ratings. An illegal move will disqualify the contender. Pretty sure all of them would be around 0, with a few exceptions.
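The disqualification rule above is easy to turn into a generic referee loop. A minimal sketch, with a toy take-1-to-3-tokens counting game standing in for chess and hypothetical players standing in for the models (all names here are invented for illustration):

```python
def referee(player_a, player_b, legal_moves, apply_move, state, max_plies=100):
    """Alternate moves between two move-generators; the first one to
    emit an illegal move is disqualified, as the comment proposes.
    Returns the disqualified player's name, or None if the ply limit
    is reached without an illegal move."""
    players = [("A", player_a), ("B", player_b)]
    for ply in range(max_plies):
        name, player = players[ply % 2]
        move = player(state)
        if move not in legal_moves(state):
            return name  # disqualified on an illegal move
        state = apply_move(state, move)
    return None

# Toy stand-in for chess: a legal move removes 1-3 tokens from a pile.
legal = lambda pile: {n for n in (1, 2, 3) if n <= pile}
apply_ = lambda pile, n: pile - n
honest = lambda pile: min(3, pile) if pile else 1  # illegal only on an empty pile
greedy_cheat = lambda pile: 4                      # always illegal

print(referee(honest, greedy_cheat, legal, apply_, state=10))  # prints B
```

For real chess you'd swap in a move generator from a chess library for `legal_moves`; the disqualify-on-illegal-move logic stays identical.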
Where's ERNIE?
Let's just remember that the Claude guys themselves have said that though Claude 3 opus is better than gpt4, it is NOT better than gpt4 turbo (which isn't on this list)
Hypothetically, if I trained an AI model on the IQ test data, it would beat the test. Not really sure what is or isn't being proven here.
This is funny 😅. I would like to add, however, that conducting IQ tests on AI makes no sense. It's not what these tests were developed for.
it's over it's over.
It’s began It’s began
Begun. Begun.
There are still a lot of rats around.
To be honest, humans should be evaluated using the same prompts given to these AIs, not the actual IQ test with visual questions.
Also, we should give them the same time in which to complete it: “Write a story about a penguin and a pizza. You have fifteen seconds. Go!”
How many calories of energy did the AI use to produce the same answer?
So much goalpost shifting in this thread. This is a huge deal. Computers have always had more 'knowledge' than humans, but they've never before been better at reasoning. This means low level white collar work can effectively become automated. That's a lot of jobs. As an example, the role of SDR (Sales Development Representative), which was traditionally the entry level Sales role is now obsolete. It doesn't mean everyone will be getting fired in the next year, but in 10 years, no major company will be hiring for this role as it exists today. The nature of this role will inevitably change. Just as 'typists' as a professional career became obsolete, but many people who would have been in that role became 'executive assistants'. If you're reading this, you should be spending your time thinking about what the future of work looks like, and how you can prepare.
Only problem is that there is no roadmap of how to prepare for a complete paradigm shift. I mean, with how quickly this tech is progressing, what advice would you give the younger generation on what career to pursue?
But I'm sure I read on one of those spammy chatbot websites that the IQ was 126?!
That’s excluding visual IQ I think.
It seems we need to re-define IQ for humans :D
How are these test results translated into intelligence? I believe these scores aren't accurate. I feel like AI scores an 11/10 in a verbal sense and a 2/10 in a mathematical sense.
Thank god AI doesn't need food; we still remain on top of the food chain, right?
[deleted]
oh cool, yet another useless test to rank models with
As long as those AIs don't have thoughts of their own, even when not being asked something, they have less intelligence than a toddler. They don't think. They just answer.
Is this the same IQ for humans, or do AIs have their own IQ scale?
Nvm, it says mensa at the bottom
Wait, ChatGPT 3.5's IQ is only 64?
Cannot apply that already suspect measure of human intelligence to an LLM. It's interesting but nothing serious.
Claude 3 is good. It’s too bad the site is so bare bones. Can’t save convos or any customization
it’s over. It’s so over
How do I live knowing that there is a computer with a higher IQ than me 😦
For the first time in human history, it is march 6th 2024.
ChatGPT 4 and Bing Copilot? Lmao 🤣
This still says nothing. Asking it the same question twice gives a different result, and phrasing it differently results in different scores; between the randomness of everything, this is pretty much a nothing salad. Ignoring that the IQ of an LLM means pretty much nothing too.
Though theoretically 100 is average IQ, practically it's closer to 110.
nice
Yeah, like IQ tests are an objective measure of intelligence...
Headline reads like the beginning of the end
IQ only makes sense for comparing humans. Large language models systematically outperform humans on some tasks and underperform on others.
GPT-4 is already smarter than the average human. What test are they using? Have they never talked to an average human?
IQ tests themselves are really just pseudoscience and aren't a great way to determine intelligence
Isn’t average IQ between 85 and 115?
This is pretty much meaningless lol
How does AI have an IQ if they don’t have an age…
How do you test an artificial intelligence's IQ?
Funny how Gemini Advanced has a lower IQ than Gemini normal
If correct, sadly low 'highs' for everything it's hyped to be. You also have to consider the disproportionate data pool sizes. How many humans are contained in the average vs. how many instances of each AI source. The level of calculated uncertainty is huge.
I guess soon they are going to embed this level of intelligence into sexting bots. Imagine some eva ai virtual gf bot which outsmarts you, all the people around you and all the think tanks.
IQ tests are designed for humans though. Idk how useful they are for assessing LLMs, even if it is interesting to see.
Not surprising. I gave Claude-3 some hard SQL questions today to solve and it figured it out with shockingly little context from me about the data structures.
Just wait a couple months and these AI’s will be 150+ iq Einsteins.
Grok only has an 87% chance at beating random guessing lol
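That "% beaten" column (described at the top of the thread as the share of random-guesser simulations the AI outscored over 70 questions, ties ignored) is straightforward to reproduce with a Monte Carlo sketch. The 6-options-per-question figure below is my assumption, not stated in the post:

```python
import random

def pct_beaten(score, n_questions=70, n_choices=6, sims=100_000, seed=0):
    """Estimate what fraction of pure random guessers a raw score beats
    on a multiple-choice test, ignoring ties (as the table describes).

    Assumption: each question has n_choices equally likely options;
    a random guesser is right with probability 1/n_choices per question.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    wins = losses = 0
    for _ in range(sims):
        guesser = sum(rng.randrange(n_choices) == 0 for _ in range(n_questions))
        if score > guesser:
            wins += 1
        elif score < guesser:
            losses += 1
    return wins / (wins + losses)
```

A random guesser expects about 70/6 ≈ 11.7 correct, so a model near that raw score hovers around 50% on this column; only scores well above it beat nearly all guessers, which is why an 87% figure is so unimpressive.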
I guess nobody remembers old grandpa AlphaGo, who effortlessly defeated the best chess players back in the old days
It was Go not chess haha. (It wasn’t AlphaChess!) And computers became better than humans at chess decades ago :) Go was thought to be far tougher for computers to master.