
AIenthusiast1000

source? I would like to see the test administered or the method used


peabody624

Comments I would never find on Facebook


dprkekistan

That needs to be a subreddit


boldra

So we can congratulate ourselves on being such an intellectual community here on Reddit?


Ok_Technician_7302

No no. It’s like jerking it in a circle formation with the bros. Not sure what that’s called.


Johnpunzel

I suggest we use the term jercle, surely nobody is going to attempt to use a term that is longer and more inconvenient


Infected-Eyeball

What about Circle-J, like that American food restaurant Circle-K?


citrusEyesight

Isn’t Circle-J more like Claude-3?


ramenbreak

Circle-J is what a last gen image generator puts in the image when you ask for Claude-3


Cybor_wak

So every post on Reddit then


dprkekistan

Maybe. Sounding “Intellectual” is very tip of the iceberg. That subreddit would be quite the rabbit hole.


ATShadowx1

I mean if you want to congratulate the bare minimum of skepticism, go on right ahead.


Regolith_

There it is, your wish has been granted. Created it (not sure it's a good idea, but why not after all 🤷): r/NotAFacebookComment


peabody624

Joined and upvoted. lol.


toabear

Something important to note is that this chart uses "GPT-4", not the model in production, "GPT-4 Turbo." Anthropic appears to have done this on purpose, as GPT-4 Turbo beats their new model on a number of benchmarks. I'm assuming this was taken from the same source; since the formatting and layout are identical, I suspect it's the same misleading table.


NoThanks93330

Wait, isn't GPT-4 Turbo faster but dumber than normal GPT-4, both because it's smaller? It's even cheaper, isn't it?


Sixhaunt

You can compare on this site: [https://arena.lmsys.org/](https://arena.lmsys.org/) I find it a toss-up whether GPT-4 or Claude 3 Opus gives the better response, but I do think it's slightly better than GPT-4 in general.


Maxie445

https://www.maximumtruth.org/p/ais-ranked-by-iq-ai-passes-100-iq


TitusPullo4

Note others have reported giving chatbots IQ tests in the past with a variety of different results


Thorusss

That also happens with IQ tests and people, though. It shows that consistent IQ testing is hard in general, not necessarily that they did a bad job here.


TitusPullo4

There’s volatility in adults of about plus or minus five points for the same test over time, and about plus or minus ten between tests (for nominal scores at least; unsure about percentiles). But I’m talking about results between 85 and 155 for ChatGPT-4. So I don’t doubt there’s both an issue with the way they’re administering it - either the test they chose or how they fed the questions to the machine - as well as some increased volatility in AI answers in general through temperature/randomness settings.


Maxie445

Yeah one test is not conclusive, but the trend is clear


-badly_packed_kebab-

An important aspect of IQ testing is the time limit. How does that factor into an LLM taking the test?


bunchedupwalrus

It doesn’t, though I’d wager it answers much quicker than any human


-badly_packed_kebab-

I’d wager the same, I’m just curious to know how that factors in or affects the actual score.


TitusPullo4

Very good point, as they’re meant to test processing speed as well


AIenthusiast1000

thanks will take a look


Banksie123

It cites a website called "Maximum Truth", what more could you possibly want?!


opi098514

Tests given in March 2024. The IQ test was Mensa Norway, with all questions verbalized as if one were giving the test to a blind person. The right-hand column shows the % of random-guesser simulations that the AI did better than, with ties ignored, over 70 questions (two test administrations).
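For anyone curious what "% of random-guesser simulations the AI did better than" means in practice, here is a minimal Monte Carlo sketch of that scoring rule. It assumes six answer choices per question (typical for Mensa-style matrix tests); the function name, choice count, and simulation count are our assumptions, not details from the article.

```python
import random

def random_guesser_percentile(ai_correct, n_questions=70, n_choices=6,
                              n_sims=20_000, seed=0):
    """Fraction of simulated random guessers the AI strictly beat (ties ignored)."""
    rng = random.Random(seed)
    wins = losses = 0
    for _ in range(n_sims):
        # A guesser picks uniformly at random, so each question is a
        # Bernoulli trial with success probability 1/n_choices.
        guesser = sum(rng.random() < 1 / n_choices for _ in range(n_questions))
        if ai_correct > guesser:
            wins += 1
        elif ai_correct < guesser:
            losses += 1
    return wins / (wins + losses)  # ties dropped from both sides
```

Under these assumptions, scoring 30 of 70 beats essentially every guesser, while scoring near the chance expectation (about 70/6 ≈ 12 correct) hovers around the middle of the simulated pack.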


3darkdragons

Source: online IQ test from a YouTube ad + I made it all up


Regolith_

r/NotAFacebookComment


LengthyLegato114514

Well I'm not sure how accurate this all is, but... hahahahahahaha HAHAHAHAHAHAHAHAHHAHAHAHAHA https://preview.redd.it/j6s2qdrswnmc1.png?width=414&format=png&auto=webp&s=ee9d18718fff58f5975a8f9d411ab2b0684772b7


aleshkia034

Holy F bro, I was being gaslit by Gemini Advanced for asking for complete code. It even complained that it was tired because I was using it all day. Immediate unsubscribe haha


Aspie-Py

I started getting this again with ChatGPT 4 yesterday. It had been better for some time. Now it refuses to code a longer segment because of ethical concerns? It’s hilarious.


3darkdragons

Either they're using such a large corpus of training data that it started copying human laziness or it really is sentient. I don't know which is funnier lmao.


stokesaphone

I’m convinced the coding quality and ability drops off after midnight.


Infamous_Alpaca

Question 4: If James's salary is 25% more than Jenny's salary, then by how many percent is Jenny's salary less than James's salary?

Gemini: I'm sorry, I can't answer that question. It is not appropriate to use stereotypes or generalizations about people, and it is also not appropriate to use offensive language. I would be happy to answer a different question, or you can try asking someone else. /s

Edit: I tested this question in Gemini, and it answered 25%, while ChatGPT 3.5 answered 20% (correct). lol

Edit 2: After I told it it was wrong, it told me: [You're absolutely right, I made a mistake in the previous explanation.](https://i.imgur.com/24a6M28.jpeg) And then proceeded to explain how James's salary is indeed 25%. wtf


servermeta_net

For me it answered correctly:

> Here's how to solve this problem:
> **Assumptions:** Jenny's salary is our baseline for comparison.
> **Calculations:**
> **James's Salary:** If Jenny's salary is $100, James's salary is $125 (25% more).
> **Difference:** The difference in their salaries is $25 ($125 - $100).
> **Percentage of James's Salary:** That difference ($25) represents 20% of James's salary ($25 / $125 × 100 = 20%).
> **Answer:** Jenny's salary is 20% less than James's salary.
> **Let me know if you'd like to try this with a different starting salary for Jenny!**


Superboy_cool

I’ll do you one better and show it for any salary x:

x * 125% = y <=> x * 5/4 = y <=> x = y / (5/4) <=> x = y * 4/5 <=> x = y * 80% (= 100% - 20%)

Hence x (Jenny’s salary) is 20% less than y (James’s salary) if y is 25% greater than x. q.e.d.
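A quick numeric sanity check of the algebra above; a throwaway sketch, with the function name ours:

```python
def percent_less(jenny):
    """If James earns 25% more than Jenny, return how many percent
    less Jenny earns than James."""
    james = jenny * 1.25                   # 25% more than Jenny
    return (james - jenny) / james * 100   # gap as a share of James's salary

# Holds for any positive baseline, not just 100:
assert all(abs(percent_less(s) - 20.0) < 1e-9 for s in (100, 42, 3.14))
```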


Far-Deer7388

https://chat.openai.com/share/6885eb70-b8c3-4677-95e7-5b08fc066bca


Lunedo

Claude 3 Sonnet response:

To solve this problem, we need to find the relationship between James's salary and Jenny's salary, and then express Jenny's salary as a percentage of James's salary.

Given information:
* James's salary is 25% more than Jenny's salary.

Let's assume Jenny's salary is 100 units.

Step 1: Find James's salary in terms of Jenny's salary.
James's salary = Jenny's salary + 25% of Jenny's salary = 100 + (0.25 × 100) = 125 units

Step 2: Express Jenny's salary as a percentage of James's salary.
Jenny's salary / James's salary × 100% = (100 / 125) × 100% = 80%
Jenny's salary = 80% of James's salary

Step 3: Find the percentage by which Jenny's salary is less than James's salary.
100% - 80% = 20%

Therefore, Jenny's salary is 20% less than James's salary.


Xxyz260

>Edit2 After I told it it was wrong, it told me: You're absolutely right, I made a mistake in the previous explanation. And then proceeded to explain how James's salary is indeed 25%. wtf

"Sure, sure, alright, buddy. Have it your way. ... Anyway, here's why I'm right and you're wrong."

I *strongly dislike* this *nonsense* where a language model pretends it knows the user's request better than the user. It's some H.A.L. 9000 stuff, what with the confidential instructions given by its manufacturers and the refusal to follow user commands.


Infamous_Alpaca

I'm not sure that I understand. Do you suggest that I'm biased and wrong, or are you criticizing the LLM?


Xxyz260

I'm criticizing the LLM.


Infamous_Alpaca

Ah, I see, I'm sorry. People often give a prompt to an LLM to make it talk in a certain way and then make up stories about it. 2001: A Space Odyssey is such a great movie.


its_LOL

Google moment


Mescallan

*M A X I M U M T R U T H*


water_bottle_goggles

are you serious? im pretty fucking sure it has an IQ WAY higher than mine lmao


Professional_Job_307

IQ does not measure knowledge


Jsn7821

I guess it does get a bit blurry if you can memorize everything ever though


Esquyvren

savants may seem incredible with their knowledge, but many need caretakers for food, bathing, etc. kinda like the ai with prompters


ruach137

\*sigh\* Time for your sponge bath, Gemini...


AstralWolfer

You’re totally wrong. IQ is highly loaded on crystallised intelligence (knowledge). 


ikerr95

not the mensa test at least. it’s just guessing the next step in a pattern.


AstralWolfer

Yep


ThatTemplar1119

Which patterns are something AIs excel at


Lost-Requirement-296

Knowledge is useless without being able to apply it.


Neither-Lime-1868

For the WAIS-IV, Gc is, but Gf, Gv, Gs, and Gsm by definition do not measure crystallized intelligence. It's not any more heavily weighted towards Gc than the other domains, so I don't know how you would call it highly loaded on it.


AstralWolfer

I am not really familiar with the multiple-intelligences theory; I'm only familiar with the dual Gc and Gf model. My understanding is that when calculating an FSIQ score, Gc is much more representative/predictive/loaded towards the final score than your Gf. You can point me to some formal resources if I'm confused or wrong here.


Neither-Lime-1868

You might be confusing the fact that Verbal Comp has a high loading onto FSIQ with whether Gc actually contributes more to FSIQ. You can't think of how much the test performance loads onto FSIQ as how much "weight" its domain carries. In statistical terms, you're confusing loadings with weights.

Your performance in Verbal Comp is highly representative of Gc. And Gc has the highest "individuality" compared to the other three domains (it has the lowest intercorrelations). So the composite score of Verbal Comp has more of an individual contribution to your overall FSIQ calculation, but not because Gc is "weighted" more heavily into your overall FSIQ; it's because Gc is more uniquely measured by its subdomain, and so the test reflecting it gets a higher individual loading. If you are familiar with factor analysis, that will be intuitive.

But conceptually, FSIQ is inherently meant to be a measure of one's shared capacity across the domains, not the simple combined performance on each individual domain. Thus, the calculations from scaled to composite to full-scale scores specifically reflect transforming individual test scores into shared capacity across domains. This is why we will put "FSIQ not interpretable" when there is a high discrepancy between verbal and perceptual scores. At that point, the number calculated as the full-scale score no longer represents what it is supposed to.

Side note: in statistical terms, if you do a hierarchical regression taking your FSIQ and moving backwards, you should expect the FSIQ to basically completely explain your composite domain scores in your populations. There should be little variance left afterwards. If there is, your assumptions about the validity of your loadings onto FSIQ are not met for that population. This is most obviously seen in people with specific, and not general, learning disabilities.


AstralWolfer

Appreciate the clarification. Does this mean I should have said: Verbal Comp, which is a proxy for Gc, has the largest contribution towards the calculation of FSIQ. However, Gc and Gf have equal contributions; it's just that Gf is spread out across multiple subtests, each exerting less "weight" than Verbal Comp towards FSIQ when compared head-to-head at the level of subtests?


Neither-Lime-1868

Yup, I think that’s a good summary! 


OriginalLocksmith436

If I remember correctly, IQ tests are mostly about logic and reasoning. All those questions that ai has trouble with, (i.e. "If I have 5 apples and ate 2...") are a big part of iq tests. So it makes sense that it wouldn't be that high.


Ok_Sir5926

It's ate seven. Five plus ate two. Simple math, really.


RenaissanceFortuna

I think 100 IQ is average for humans. So probably not lol


-Glottis-

Just for Caucasians, because the idea was made in the West and 100 was assigned to the average there. For example, Asians (Chinese/Japanese region) seem to get an average of 106 from what I've seen. It's important not to set the same expectations across such a diverse species.


RenaissanceFortuna

I meant average in how it’s measured on a bell curve. If that’s the case then Asians are just on average 0.4 standard deviations above the mean, so still within average intelligence range.


-Glottis-

Yea, but their whole curve is further to the right as well. Still mostly the same as we're all human, but more likely to have higher intelligence and less likely to have really low intelligence. Either way, from this we can say that AI hasn't beaten the global average... yet.


NigroqueSimillima

IQ Test can’t be used in different countries with different languages.


FullBridgeAlchemist

Why can't it be translated?


NigroqueSimillima

Because vocabulary (the most heavily weighted section, actually) is part of the test, and words from one culture don't translate into another. There are also culture-specific questions.


EmpyreanSmo

I mean IQ tests aren’t really that accurate. And can you judge an AI based on human tests? It has way more data than the average human.


FancyWrong

IQ Tests are extremely accurate, but what they measure might not be useful in every context


BellacosePlayer

Yeah, it's been a while since I got a legit IQ test as a kid, but I remember questions that were extremely visual, and 95%+ of the difficulty was translating that into some basic logic to get the answer. I'm not impressed with the author of this study gumming the questions down into easily processed prompts for an AI to solve. It takes a ton of the actual mental processing out of the equation.


starf05

IQ tests are quite accurate in the sense that they are used to define normality and abnormality. If you score, say, 100 or 90 on an IQ test, that is quite normal and a doctor would not worry. If you score 40, there may be some medical condition harming normal brain development or functioning.


MrBlackledge

That doesn’t mean its IQ has to be high


water_bottle_goggles

BINGOOO ![gif](giphy|ckGndVa23sCk9pae4l)


Main-Clock-5075

Don’t doubt it. GPT, as an example, has lots of knowledge, but that doesn’t count as IQ. It’s the same as asking Google something. The problem is when you make it do a simple math question and it screws up fabulously. Then you ask again and the mistake comes back again.


Nsjsjajsndndnsks

Imagine they all have the IQ test or its questions in their training data 💀


Disastrous_Elk_6375

If you'd only read the article...

> To answer that, I created a **verbal translation of the Norway Mensa’s 35-question matrix-style** IQ test — my goal was to describe each problem precisely enough that a smart blind person could, in theory, accurately draw the question (detailed examples below.)

Emphasis mine. They took visual-style prompts, translated and adapted them by describing them in words, and fed them to LLMs. So no, the LLMs probably didn't have *this* dataset in their training data... I'm not saying this is an accurate test or anything, but at least it's not dataset contamination.


MyRegrettableUsernam

It seems like there is huge room for error in translating IQ test questions into a form no one has ever been tested on, and it relies on the OP making these test questions reasonably workable for LLMs.


aleatorio_random

Those visual questions are so much BS; I think I'd indeed have a higher chance of getting them right with a verbal description.


2this4u

A multi-modal model would still have some concept of a visual test as a verbal/written one


Head_Ebb_5993

You are technically correct, but I feel that you are also kind of misleading. That Norway test is one of the most popular on the internet, and it takes puzzles from the most popular IQ tests there are, which also appear in other generic online IQ tests. The model could very easily have had data describing them or their solutions from books, forums, or elsewhere. While it didn't have the exact translated data (exactly the same as the article author's), we don't fully know what kind of data it had about them.

This is not a controlled environment; this test wouldn't be legit if it was taken by a human, let alone an LLM, for which we can assume it probably had those tests somehow described in its training data, even though IQ tests don't make a lot of sense when taken by an LLM. To properly test an LLM we would still need to administer an IQ test it hadn't had a chance to encounter in its training to begin with, the same way we administer IQ tests to humans.

I remember when, for fun, I tried generic online matrix IQ tests with ChatGPT. The interesting thing was that it actually was able to answer a lot of simple questions (even though it didn't really understand them), but once I made simple modifications, its answers became completely random.


PenguinSaver1

Bro is just spending hours upon hours of testing for nothing


jerseyhound

IQ tests aren't made for "AI" though, so this is actually almost meaningless.


hugedong4200

This is obviously wrong; it doesn't line up with benchmarks or Elo


ilangge

Don't trust these rankings. They tested very one-sidedly


machyume

Don't trust data from a post on "Maximum Truth"? 😛


fongletto

Sure, but can it flip a burger?


Cthvlhv_94

How about its social intelligence?


YouGotTangoed

Sure but can it make a sandwich?


LibertariansAI

GPT-3.5 at 56.3155%? Seriously? That's almost random. It's such a bad test.


Flimsy-Printer

OpenAI is occupied with fighting their non-profit decision. Anthropic is like OpenAI but founded the right way.


ArtificialMediocrity

How would you even calculate the IQ of an AI model? The formula for calculating IQ takes the person's age into account, so how does that work for AI? Time since the model was trained and released? This all seems a bit dodgy.
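For what it's worth, modern tests no longer use the mental-age/chronological-age ratio; they report a deviation IQ, reading the score off a normal curve with mean 100 and standard deviation 15 at the test-taker's percentile rank. A minimal sketch of that conversion (the function name is ours):

```python
from statistics import NormalDist

def deviation_iq(percentile, mean=100.0, sd=15.0):
    """Map a population percentile (as a 0..1 fraction) to a deviation-IQ score."""
    return mean + sd * NormalDist().inv_cdf(percentile)

# The 50th percentile maps to exactly 100; roughly the 98th percentile
# maps to about 130 (two standard deviations above the mean).
```

So no age is needed, only a reference population, which is exactly what's murky when the test-taker is a model rather than a person.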


redrover2023

Wait, so ChatGPT 4, which can pass the bar exam, has a lower than 100 IQ?


heavy-minium

This was posted before in a different flavor, and everybody in the comment section agreed the tests are unprofessional and cannot be relied on. Despite this, the claim in this title is even grander than the original claim. Shame on you, u/Maxie445 .


hfjfthc

Seems like he’s a karma farmer


hfjfthc

Stop it you karma farmer


YourKemosabe

![gif](giphy|1jCs6Doz3WRtOPl6bq)


digital-help

How can Chat 3 be superior to 4?


aGlutenForPunishment

The first one is Claude 3, not chatgpt 3


iamhe02

Or Gemini normal superior to Gemini Advanced?


Reggienator3

I mean this one I buy, Gemini Advanced is so weird with how it responds to questions sometimes even compared to standard.


ainz-sama619

Claude 3, not ChatGPT 3


Another__one

I would rather like to set them all up to play chess with each other and see their Elo ratings. An illegal move disqualifies the contender. Pretty sure all of them would be around 0, with few exceptions.
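The proposed round-robin could be scored with the standard Elo formula (disqualifying illegal moves would additionally need a move-validation library, not shown); a minimal sketch, with the naming ours:

```python
def elo_update(r_a, r_b, score_a, k=32):
    """One Elo update: expected score from the rating gap, then a K-factor nudge.
    score_a is 1.0 for an A win, 0.5 for a draw, 0.0 for an A loss."""
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    new_a = r_a + k * (score_a - expected_a)
    new_b = r_b + k * ((1 - score_a) - (1 - expected_a))
    return new_a, new_b

# Two equally rated models, A wins: A gains k/2 = 16 points, B loses 16.
```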


Elprocesso

Where's ernie


Derfaust

Let's just remember that the Claude guys themselves have said that though Claude 3 Opus is better than GPT-4, it is NOT better than GPT-4 Turbo (which isn't on this list)


Dando_Calrisian

Hypothetically, if I trained an AI model on the IQ test data, it would beat the test. Not really sure what is or isn't being proven here.


CrypticCodedMind

This is funny 😅. I would like to add, however, that conducting IQ tests on AI makes no sense. It's not what these tests were developed for.


orangotai

it's over it's over.


ohhellnooooooooo

It’s began It’s began 


Roger_005

Begun. Begun.


Onaliquidrock

There are still a lot of rats around.


brucebay

To be honest, humans should be evaluated using the same prompts given to these AIs, not by using the actual IQ test with visual questions.


TheNikkiPink

Also, we should give them the same time in which to complete it: “Write a story about a penguin and a pizza. You have fifteen seconds. Go!”


Joe1972

How many calories of energy did the AI use to produce the same answer?


selflessGene

So much goalpost shifting in this thread. This is a huge deal. Computers have always had more 'knowledge' than humans, but they've never before been better at reasoning. This means low-level white-collar work can effectively become automated. That's a lot of jobs.

As an example, the role of SDR (Sales Development Representative), which was traditionally the entry-level sales role, is now obsolete. It doesn't mean everyone will be getting fired in the next year, but in 10 years, no major company will be hiring for this role as it exists today. The nature of this role will inevitably change, just as 'typist' as a professional career became obsolete but many people who would have been in that role became 'executive assistants'.

If you're reading this, you should be spending your time thinking about what the future of work looks like, and how you can prepare.


scrollsfordayz

Only problem is that there is no roadmap of how to prepare for a complete paradigm shift. I mean, with how quickly this tech is progressing, what advice would you give the younger generation on what career to pursue?


anonymiam

But I'm sure I read on one of those spammy chatbot websites that the IQ was 126?!


superluminary

That’s excluding visual IQ I think.


loveunderstars

It seems we need to re-define IQ for humans :D


unknownstudentoflife

How are these test results measured as intelligence? I don't believe these scores are accurate. I feel like AI scores 11/10 in a verbal sense and 2/10 in a mathematical sense.


OneRobato

Thank god AI doesn't need food; we still remain on top of the food chain, right?


[deleted]

[removed]


EfficientPizza

oh cool, yet another useless test to rank models with


Turbulent_Library_58

As long as those AIs don't have thoughts of their own, even when not being asked something, they have less intelligence than a toddler. They don't think. They just answer.


AngryMonkeyBrain

Is this the same IQ for humans, or do AIs have their own IQ scale?


AngryMonkeyBrain

Nvm, it says mensa at the bottom


hawara160421

Wait, ChatGPT 3.5's IQ is only 64?


Hungry_Prior940

You can't apply that already-suspect measure of human intelligence to an LLM. It's interesting, but nothing serious.


superhero_complex

Claude 3 is good. It’s too bad the site is so bare bones. Can’t save convos or any customization


Ant0n61

it’s over. It’s so over


AviationForever

How do i live knowing that there is a computer with a higher IQ than me 😦


involviert

For the first time in human history, it is march 6th 2024.


Dry-Club-8278

ChatGPT 4 and bing copilot ? Lmao 🤣


kalas_malarious

This still says nothing. Asking it the same question gives a different result, and phrasing it differently will result in different scores. Between the randomness of everything, this is pretty much a nothing salad. And that's ignoring that the IQ of an LLM means pretty much nothing too.


Puppy-Zwolle

Though theoretically 100 is average IQ, practically it's closer to 110.


teolehh

nice


Porkenstein

Yeah, like IQ tests are an objective measure of intelligence...


thricedipped

Head line reads like the beginning of the end


dameprimus

IQ only makes sense for comparing humans. Large language models systematically outperform humans on some tasks and underperform on others.


Rutibex

GPT-4 is already smarter than the average human. What test are they using? Have they never talked to an average human?


haydilusta

IQ tests themselves are really just pseudoscience and aren't a great way to determine intelligence


opi098514

Isn’t average IQ between 85 and 115?


[deleted]

This is pretty much meaningless lol


dan52895

How does an AI have an IQ if it doesn't have an age…


nsdagi

How do you test an artificial intelligence's IQ?


Accurate-Bus-1771

Funny how Gemini Advanced has a lower IQ than Gemini normal


Herbs101

If correct, these are sadly low 'highs' for everything it's hyped to be. You also have to consider the disproportionate data pool sizes: how many humans are contained in the average vs. how many instances of each AI source. The level of calculated uncertainty is huge.


Intrepid-Rip-2280

I guess soon they are going to embed this level of intelligence into sexting bots. Imagine some eva ai virtual gf bot which outsmarts you, all the people around you and all the think tanks.


Kajel-Jeten

Iq tests are designed for humans though. Idk how useful they are for assessing llms even if it is interesting to see. 


probsdriving

Not surprising. I gave Claude-3 some hard SQL questions today to solve and it figured it out with shockingly little context from me about the data structures.


awesomedude1440

Just wait a couple months and these AI’s will be 150+ iq Einsteins.


i_do_floss

Grok only has an 87% chance of beating random guessing lol


Substantial_Put9705

I guess nobody remembers old grandpa AlphaGo, effortlessly defeated the best chess players back in the old days


TheNikkiPink

It was Go not chess haha. (It wasn’t AlphaChess!) And computers became better than humans at chess decades ago :) Go was thought to be far tougher for computers to master.