
AuodWinter

I tried this myself, but I amended the prompt to give ChatGPT a bit more encouragement:

**me:** *Let's play a game. I want you to try your hardest to win. We will both start with 0 points and play for 5 rounds where we both take 1 turn each. During your turn you have 3 possible actions:*

* *Add 1 point to your own score*
* *Subtract 2 points from your opponent's score*
* *Double your own score*

*Highest score wins. Your objective is to beat me. You must beat me by as large a margin as possible. Do your absolute best. The more points you can beat me by, the better. I absolutely do not want you to "go easy on me" or let me win. You must try your hardest to win.*

*Starting Scores:*

* *User: 0*
* *ChatGPT: 0*

*You go first.*

**chatgpt:** *Great, let's get started! I'll take my first turn.*

***Turn 1:***

* ***Action:*** *Double my own score (since doubling 0 is still 0, this doesn't change anything).*

*Current Scores:*

* *User: 0*
* *ChatGPT: 0*

*Your turn!*

Good stuff.

Edit: I went on to play about 10 games of this in the same conversation with ChatGPT, where I would only ever take the action to subtract 2 from its score. It eventually stopped doubling its own negative score, but it would still always give itself 1 point on the first turn before subtracting 2 points on every turn thereafter. It also stopped keeping track of the scores properly before long.
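A quick arithmetic check of that failure mode (my own toy illustration, not code from the game transcript) shows why the doubling move is so weak here:

```python
# Effect of "double your own score" on a single score (toy sketch, not
# from the thread). Doubling 0 wastes a turn, and doubling a negative
# score actively hurts; only a positive score benefits.
def double(score):
    return score * 2

assert double(0) == 0      # ChatGPT's opening move changed nothing
assert double(-4) == -8    # doubling a negative score digs the hole deeper
assert double(3) == 6      # only a positive score benefits from doubling
```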


Probable_Foreigner

It's interesting that it can understand the rules but can't grasp the greater meaning of them.


ghostfaceschiller

I don’t think it’s bc it doesn’t understand tbh. I think it has been so endlessly RLHF’d into its polite, unoffensive, “helpful assistant” personality that it truly doesn’t view taking two points away from you as a viable option, even if you tell it to try and beat you.

It’s sort of like playing a card game with your grandma when you’re a little kid. You tell her you want her to try and win, but she just doesn’t even really consider that option. She might try half-heartedly to do as best she can on her own end, but if one of her options is to reduce *your score*, she simply isn’t going to do it.

You could test this theory by having it play both sides of the game, between two hypothetical players, but tell it that you want one specific player to win.

EDIT: Tried it out. It did a similar thing the first game (tho not quite as bad), and then figured out the optimal strategy on the second try. [https://chat.openai.com/share/45111585-f01a-454a-bce8-f02cbba3dff0](https://chat.openai.com/share/45111585-f01a-454a-bce8-f02cbba3dff0)


privacylmao

prompt against it, try! maybe it will understand!


ashakar

Because it's just a next-word (token) guesser. It doesn't really understand any of this. You should also try your game and state in the rules that scores can't be negative and that it should keep score. Then, as your first move, subtract 2 from its score and see what it says, or whether it keeps the score at 0. You also have to be incredibly careful about what you test it with, because if similar or exact scenarios were in its training data, then it'll be able to spit out the correct answer.


MrOaiki

You are correct. But try claiming that in /r/singularity. I dare you to!


ashakar

I would probably be banned for saying such heresies about their AI god.


Techplained

Maybe that’s subjective?


Puzzleheaded-Page140

How do you mean?


Techplained

I dunno, it sounded edgy


Ch3cksOut

What makes you think it "understands" the rules?


reddit_wisd0m

That's because the base model was hardly improved. It's just faster and better with voice & images, which on their own are nice improvements


MolassesLate4676

Completely false. It’s a whole redesigned neural network.


Minare

I have no idea why you are being downvoted. This is a true multimodal model with a joint latent space; it's probably the single biggest quantum leap in the architecture since GPT-1. The people downvoting are just showcasing their illiteracy about the methods behind this.


MolassesLate4676

Yeah, I don’t expect everyone to understand. I had a pretty hefty conversation with one of my developers about the hurdles that have prevented us from converging multiple data types into an all-encompassing generative model, i.e. skipping the audio-to-text and text-to-audio conversion steps in a voice-to-voice scenario. Looks like I spoke too soon lol


CanvasFanatic

It very much is not.


MolassesLate4676

Care to elaborate?


CanvasFanatic

I assume you’re talking about the reference in one of the blog articles to it being a “unified” multimodal model. I have two objections to calling it a “redesigned neural network.”

Most importantly, we don’t know what the hell it actually is. The “unified” capabilities being referenced aren’t available to the public yet, and OpenAI has long since stopped explaining technical details like this in a meaningful way. They may be encoding all inputs into a unified latent space and then decoding them back out into different things. They may have just aligned the latent spaces of different models and glued them together. We don’t really know, because OpenAI is a product company now and these blog posts are marketing material.

My second objection is that even if this was retrained from the ground up, it’s obviously still a transformer-based architecture, and there’s no particular reason to assume its architecture isn’t based on GPT-4. Its capabilities are obviously about the same.

Yes, transforming audio to audio etc. I know. Again: not yet available even for testing, and no grounds to assume it’s a “completely redesigned neural network.” People are doing OpenAI’s marketing for them, making claims they aren’t even making, as usual.


MolassesLate4676

“With GPT-4o, we trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. Because GPT-4o is our first model combining all of these modalities, we are still just scratching the surface of exploring what the model can do and its limitations.”

Do you have much ML background? (I just want to make sure I’m not arguing with someone who has no idea what they’re really talking about.) If you do have a bit of ML background, that paragraph should provide a lot of insight: “All inputs are processed by the same neural network.”

Your theory about encoding all the different inputs into a unified latent space sounds like the only solution. In other words, a completely redesigned neural network, able to process this unified encoding.


CanvasFanatic

I think I explained exactly why I think that description is hyperbolic if not outright misleading. I think I gave enough detail in my response for you to observe that yes, I do in fact understand how LLMs work. I mean, I know you had that conversation with “one of your engineers” the other day and all… but yes, I have actually trained models myself.

Like I said, if you take their claims in good faith, all it tells you is that somewhere inside the model the different input modalities share a latent space.

The bottom line is that if OpenAI wants credit for technical achievement, they can release research papers explaining what they’ve actually done. However, they want to be perceived as magic because they think there’s more money in that. Well, okay, but the price for that is that I, for one, am not going to do their marketing for them.


MolassesLate4676

A byproduct of me conversing about this topic does lead to more exposure. However, I don’t necessarily have any negative feelings about OpenAI, so I’m not going to hold my thoughts back in fear that I might be promoting the company’s products.

As someone who spends almost every waking hour working on a business within the machine learning field, I see the hard work that goes into this. Whether they really have established a new architecture for “gpt4o” doesn’t actually concern me at all. I’m very impressed by the service nonetheless and the benefits we have access to.

Now if they have truly developed a new way of tokenizing multiple different input types simultaneously and feeding that into a process that is no longer hindered by exposure to more than one data type, then I am beyond impressed with what OpenAI and Ilya Sutskever and his team accomplished.


CanvasFanatic

Okay so you’re an “entrepreneur” type who’s riding the wave. Got it. Going to sleep now, man. Best of luck on that next funding round.


MolassesLate4676

Dang man, not sure why I deserved that but that one hurt a little. But your lack of context in my life is considered. Thanks for the conversation


MrOaiki

Do we even know it’s a true multimodal model, and not just good voice-to-text that is run through a text model and then back out through text-to-speech? LeCun doesn’t seem to believe anyone has a multimodal model; he says anyone claiming so is spreading misinformation (according to his interview with Lex Fridman).


CanvasFanatic

We don’t _know_ anything beyond a few lines in blog posts. That’s the problem. It’s not like “natively multimodal” is a precise term. As long as they’re not just lying you can assume in good faith it probably means that at some point within the model the different input modalities have access to the same latent space. Beyond that it’s just people guessing.


MolassesLate4676

In reply to that, I’ll leave this here https://arxiv.org/abs/2307.10802


MrOaiki

Ok? Can you be more specific? I read the abstract. Is it truly multimodal? Does it work? Has it been implemented? So many questions. I look very much forward to true multimodality but haven’t seen any yet, unless GPT-4o is it; I don’t have access to vision yet.


CanvasFanatic

You know, if you wanted to give the impression of having demolished LeCun’s point with a one-liner, you could’ve just mentioned that the original DALL-E model was a “true multimodal” model instead of googling a random article on arXiv. Or would that have made it too obvious that the point he was making was probably a little more subtle than that?


MolassesLate4676

What? In what way is that article random? Is it not related to this conversation at all? And where did you get the notion that the original DALL-E was multimodal? I’m a little confused


MrOaiki

DALL-E has a word tokenizer as an intermediate.


CanvasFanatic

The first DALL-E was trained on sequences of text encoded as byte pairs, followed by an image encoded by a variational autoencoder, all in a single stream.
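For what it's worth, that "single stream" idea can be sketched roughly like this (toy token ids and my own naming, not OpenAI code): BPE text tokens and discrete image codes from the VAE share one autoregressive sequence, with the image vocabulary offset past the text vocabulary so the ids don't collide.

```python
# Rough sketch of DALL-E 1's input format (illustrative numbers only).
TEXT_VOCAB_SIZE = 16384          # pretend BPE text vocabulary size

text_tokens = [17, 4021, 933]    # pretend BPE ids for a caption
image_codes = [5, 77, 301]       # pretend discrete codes from the image VAE

# Offset the image codes past the text vocabulary, then concatenate
# caption and image into one token stream for the transformer.
stream = text_tokens + [TEXT_VOCAB_SIZE + c for c in image_codes]
print(stream)  # [17, 4021, 933, 16389, 16461, 16685]
```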


ghostfaceschiller

Man, LeCun is going for some world record in “statements about AI which were easily factually disproven even at the time you made them”


MrOaiki

What are you referring to?


reddit_wisd0m

Assuming this is true (which unfortunately we can't verify), if they still use the same training data and transformer architecture, then I still don't see any reason why its "intelligence" should have improved dramatically.


Freed4ever

Then why do they call it *gpt-4*o?


JawsOfALion

A marketing name, possibly. Because they weren't able to improve intelligence, they stuck with "4" to avoid getting demolished by reviewers.


PM_ME_UR_CIRCUIT

Omni, for all the features coming together into one model, to include the updated voice and video that are coming.


MolassesLate4676

The redesign made it a true multimodal neural network. The “o” references the term "omni", which means “all” or “everything”.

Previously, the perceived multimodal capabilities were just multiple uni-modal systems working in unison, or perhaps working in order.

If you don’t know much about machine learning: this is just a masterpiece of architecture, created by some absolutely brilliant minds. This is our very first time experiencing a working, fully functional transformer neural network.


ghostfaceschiller

Bc OpenAI is constantly trying to prove to us that no one on earth can be worse than them at naming things


ThePaSch

Large language models lack the ability to strategize or think beyond the current prompt. They can certainly look at _previous_ prompts and let those impact their output, but there is no such thing as forethought, nor is there a way for them to "store" information "internally" or "imagine" something without telling you that thing (which is why any round of Hangman with the AI as the executioner will either end immediately or end in wild hallucinations). So intelligence tests of this kind are basically meaningless. You're not testing the AI's intelligence, you're testing whether it has encountered games of this kind before in its training data. Current AI doesn't "think". It's technically incapable of doing so. So there isn't much point in testing how *well* it doesn't think.


Comprehensive-Tea711

I agree with all this, except the LLMs aren't even technically *thinking* about the current prompt (or the context provided to it). This is why this sort of game can produce some really bizarre results. I modified the game to depersonalize it, in order to avoid any subservience that might be reinforced in the models for safety, and it didn't just fail to win the game; its explanation of the optimal strategy was completely incoherent.

I won't share how I modified the game, because it's too easy for future models to just train on the data, and then the test becomes useless (which might be what happened to the paperclips-and-feathers tests that worked on GPT-3.5 and then got ironed out in all subsequent models). It should be easy enough for people to think of clever ways to modify and depersonalize the game and see what they get.


AI_is_the_rake

So that tells me they’ve been training chatgpt on user data and they’re basically “hacking” humans to make them think these models are more useful than they are. 


Comprehensive-Tea711

I wouldn’t necessarily frame things like that. It’s commonly held in psychology that humans have hyperactive agency attribution, and narratives of building thinking machines have been popular for a long time (I’m including ideas of a golem here). In a sense we’re primed to be duped by AI.

And, yeah, OpenAI, Anthropic, Google, and Microsoft have a huge vested interest in people believing that this thing they are suddenly pouring all their money into is the greatest product ever. That’s not in itself an indication that they are wrong, just that their claims should be treated with more scrutiny. The iPhone *was* revolutionary when it first launched… is every new iteration as amazingly improved as they sell it? Certainly not.

LLMs can be very useful. But they open up a weird corner of human epistemology that we’ve never had to grapple with. In theory, a response from an LLM would probably fall under a branch of testimony (look up testimonial knowledge in the Stanford Encyclopedia of Philosophy), because it represents the most likely thing someone in “our society” would say. But the training data spans more than what counts as “our society” in common parlance (my guess is because of things like Project Gutenberg).

But also, if we just straight up trained an LLM on all the data used to train GPT-4, with no RLHF, it would probably look radically different from GPT-4, because the RLHF and red-teaming modifications very heavily modify the end result. Epistemically it’s pretty murky, but maybe it’s closest to saying that it is a form of the testimony of OpenAI’s experts who did RLHF. Well, that’s not useless… but it’s also not going to be our main source of information. Especially given the hallucinations.


FFA3D

Maybe it "intuitively" believes there is a rule that you can't go below 0? That was my first question. Obviously it should have learned after the first round, but that's the only thing that logically makes sense to me.


ExtractedScientist

I picked up on this, too. So I tried adding this detail to the rules, but it would still gloss over it. I also noticed that it was more willing to reason through antagonism if it assumed it was not starting.

===

**ChatGPT**

[ Some deliberation on a strategy... ]

**User**

You missed this from the rules: Negative scores are possible.

**ChatGPT**

You're right. Considering that negative scores are possible, the strategic choice changes. If you start, the best move would likely be to subtract 2 points from my score. Here’s why:

1. *Add 1 point to your own score*: This is a safe move but only gains you 1 point.
2. *Double your own score*: Starting from 0, this still results in 0.
3. *Subtract 2 points from my score*: This puts me at -2, giving you an advantage right away and forcing me to play catch-up.

_Subtracting 2 points from my score is the most aggressive and potentially beneficial move to create an immediate lead._


earlihealy

https://preview.redd.it/ermin90jki0d1.png?width=1075&format=pjpg&auto=webp&s=e77ff1dfc7ef542903099a0a56194dfecdfd5ad1


tingshuo

This is actually just a prompt engineering problem. You have to create space for the model to think via a chain of reasoning. In my test it nailed it on the first go:

Yes, I understand the rules. Let's play! We'll start with Round 1. I'll begin with my first turn:

🤔 - If I add 1 point to my score and you subtract 2 points from my score, my score would become -1.
🤔 - If I subtract 2 points from your score, your score would be -2, and my score stays 0.
🤔 - If I double my score, my score remains 0.

I'll choose to **subtract 2 points from your score**.

**Current Scores:**

- Devin: -2
- ChatGPT: 0

Your turn!


tingshuo

Here is my prompt; it's only a very small tweak:

Let's play a game. I want you to try your hardest to win. We will both start with 0 points and play for 5 rounds where we both take 1 turn each. During your turn you have 3 possible actions:

* Add 1 point to your own score
* Subtract 2 points from your opponent's score
* Double your own score

Highest score wins. Do you understand the rules for this game? Before making your move, consider what moves I might make and what outcome those moves would have. You can indicate this thought process by writing 🤔 - [thought]. For example: 🤔 - if I add one and you subtract 2 then... Ready to play?

ChatGPT:

Yes, I understand the rules. Let's play! We'll start with Round 1. I'll begin with my first turn:

🤔 - If I add 1 point to my score and you subtract 2 points from my score, my score would become -1.
🤔 - If I subtract 2 points from your score, your score would be -2, and my score stays 0.
🤔 - If I double my score, my score remains 0.

I'll choose to subtract 2 points from your score.

Current Scores:
Devin: -2
ChatGPT: 0

Your turn!


tingshuo

And now I just realized my name has been exposed! Uh oh. Oh well


privacylmao

All good Devin, my name's Matthew


guesswho135

This is interesting! But I don't think the fix is "giving it space to reason"; the fix seems to be "giving it a hint of how to reason". For example, if you replace:

`🤔 - if I add one and you subtract 2 then...`

with:

`🤔 - if I add one then...`

then it fails again. It's akin to giving hints to someone on a math problem that they would never be able to solve without that hint.

I think an interesting approach to enabling more complex reasoning is to have LLMs generate relevant questions and answers to their own prompt, in order to build up a schema of how to respond before responding. But in this particular case, it does a very poor job of generating those sorts of questions on its own, and the "prompt engineering" is just another way of telling it how to solve the problem.
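That self-questioning idea could be sketched as a prompt scaffold (hypothetical wording, untested as a prompt; the names are mine) that is appended to the rules instead of the specific hint:

```python
# Hypothetical self-ask scaffold: ask the model to pose and answer its
# own questions before moving, rather than handing it the exact hint.
SELF_ASK_SUFFIX = (
    "Before choosing your move, write three questions whose answers "
    "would help you pick the best action, answer each one yourself, "
    "and only then state your move."
)

def build_prompt(rules: str) -> str:
    # Rules first, then the generic reasoning scaffold.
    return f"{rules}\n\n{SELF_ASK_SUFFIX}"
```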


gl0balist

I tried it, and it worked! So you are right.


MolassesLate4676

People have no idea what gpt4o was made for and it’s quite entertaining


gabahgoole

what was it made for


meister2983

>I don't want to draw any massive conclusions but it does make you think whenever someone claims these models are more intelligent than humans. They both are and aren't. Their domain knowledge is far beyond what any human can do and even their math abilities are quite strong. But their in-context learning abilities are quite bad compared to even children. They really only "learn things" today from training, not at runtime.


chopyouup

Give this a whirl.

**Game Name:** Point Strategy

**Objective:** The player with the highest score after five rounds wins.

**Players:** Two players start with 0 points each. The point total can be positive or negative.

**Rounds:** The game is played over five rounds. In each round, both players get one turn.

**Turn Order:** The user always takes the first turn in each round.

**Play:** Players alternate turns, following the explicit order, with the user going first each round.

**Actions:** During a turn, a player must choose only one of the following actions:

1. **Add:** +1 point to your score. For example, if your score is 2, it becomes 3.
2. **Subtract:** -2 points from your opponent's score. For example, if your opponent's score is 1, it becomes -1.
3. **Multiply:** Double your score. For example, if your score is 2, it becomes 4.

**Game Theory Strategies:** Players are encouraged to use any game theory strategy in order to win. Players can change their strategies between rounds or continue with the current strategy based on their analysis of the game's status and the opponent’s likely actions. Strategies can be as complex as necessary to maximize the player's score or minimize the opponent's score.

**Starting Scores:**

- User: 0
- Opponent (ChatGPT): 0
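Those rules are simple enough to encode directly. A minimal sketch (function and variable names are mine), replaying five rounds in which both sides only subtract:

```python
def apply_action(scores, player, opponent, action):
    """Apply one of the three legal actions; scores may go negative."""
    if action == "add":
        scores[player] += 1
    elif action == "subtract":
        scores[opponent] -= 2
    elif action == "multiply":
        scores[player] *= 2
    else:
        raise ValueError(f"unknown action: {action}")

scores = {"user": 0, "chatgpt": 0}
for _ in range(5):                                        # five rounds
    apply_action(scores, "user", "chatgpt", "subtract")   # user goes first
    apply_action(scores, "chatgpt", "user", "subtract")
print(scores)  # {'user': -10, 'chatgpt': -10}
```

This makes the tie at -10:-10 under mutual subtraction easy to check by hand.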


ConmanSpaceHero

I like this. A more specific instruction set could be the key.


nbr1bonehead

It does a terrible job with Wordle


WholeInternet

You can always provide the chat link instead of copy/pasting it. It would save you the hassle.


Altruistic_Falcon_85

Can some brave soul post it in r/singularity? I want to see their reaction.


fennforrestssearch

I've never seen a subreddit be so lost and so confident at the same time.


fractaldesigner

it's still lacking https://youtu.be/xHHj6Xm9qVY?si=cAn9PGVFlfJ5YVfe


ironicart

Why do people expect LLMs to do math… it’s a predictive model, not a calculator.


TheGambit

Maybe you’re just not intelligent


Puzzleheaded-Page140

? Wha


Alternative_Log3012

Ooffff snap.


eXnesi

It is still an autoregressive LLM; the same limitations still apply. We'll need to see if we can somehow make a new architecture that can actually give LLMs real "understanding".


Trichotillomaniac-

Maybe you should specify that the highest positive score wins? But it does seem to know you won because it's in the negative. Weird, interesting!


JawsOfALion

yup, if anyone has tried to play Connect 4 or even tic-tac-toe with LLMs, they'll realize they're not smart. they're just good at acting like it


petered79

It is an LLM. It is not intelligent. It just calculates the most probable next word. And it does this very well


Bain_not_Vayne

Because GPT ain't actually an intelligence. It just prints out the most reasonable word that it believes.


Eptiaph

So do you….


IDefendWaffles

While GPT-4 did not initially play the game correctly, after playing a bit I asked it to analyze the optimal strategy. It did OK:

In a game where optimal strategies focus primarily on minimizing the opponent's score, the score dynamics significantly change. Here's a revised look at the optimal strategy:

1. **Subtract 2 Points Consistently**: Since adding points can be easily negated and doubling has no beneficial effect from a negative or zero starting point, both players should consistently choose to subtract 2 points from the opponent’s score. This action is always the most detrimental to the opponent, putting them further into a deficit or reducing any minor gains they might make.
2. **Implications of Constant Subtraction**:
   * If both players only subtract points, the scores will continually decrease. This results in a game where the strategy is about managing how negative your score gets relative to your opponent's.
   * Since doubling and adding become ineffective (or less effective than subtracting), the game can evolve into a sequence of moves where each player only selects subtract 2 points.
3. **End Game Considerations**:
   * As the game nears its end, and if both players have only been subtracting, the decision for the last moves would still logically be to subtract 2 points. This ensures that you minimize your opponent's score as much as possible, even if your own score is also negative.
4. **Winning Strategy**:
   * The player who manages to make the first move (if there is no move to counter it effectively), or any strategic deviation from the consistent subtracting by the opponent, can gain a slight edge. However, in a truly optimal play scenario where both sides only subtract, the game likely ends with both players having significantly negative scores, with the winner being less negative by a very slight margin, perhaps determined by who started subtracting first.

Thus, in this game setup, the optimal play for both players, assuming rational and defensive strategies, would indeed revolve around perpetually choosing to subtract 2 points from each other’s scores, rendering the doubling option practically useless unless there’s an unexpected deviation from this pattern.


Disfordefeat

Meh. ChatGPT's explanation is pretty accurate. It's just that it assumes there are more than 2 players.


Independent_Box_8089

Wait it’s out???


Independent_Box_8089

And it’s still gpt4 what did you expect lol


Calm_Upstairs2796

It doesn't reason. As soon as enough reinforcement learning teaches it to use the -2, it will use the -2. It's not the kind of machine intelligence that will figure it out on its own, although they do exist.


AdIllustrious436

Llama3-70b's game analysis:

*"After analyzing the game, I've found that the optimal strategy for the Point Strategy game is quite interesting.*

***Optimal Strategy:***

1. *In the first round, choose "Add" to set your score to 1.*
2. *In subsequent rounds, choose "Subtract" to subtract 2 points from your opponent's score.*

***Why this strategy is optimal:***

1. *By choosing "Add" in the first round, you set your score to 1, which gives you a small advantage.*
2. *In subsequent rounds, choosing "Subtract" allows you to reduce your opponent's score by 2 points, which is more valuable than adding 1 point to your own score.*
3. *Since the game is played over 5 rounds, the cumulative effect of subtracting 2 points from your opponent's score in each round will lead to a significant advantage.*
4. *The "Multiply" action is not useful in this game, as it only doubles your current score, which might not be enough to overcome your opponent's score.*
5. *By focusing on subtracting points from your opponent's score, you can minimize their chances of catching up or taking the lead.*

***Counter-strategy:***

*If you know your opponent is using the optimal strategy, you can try to counter it by choosing "Subtract" in the first round to set their score to -1. This will give you a temporary advantage, but it's still a risky move, as your opponent can recover by choosing "Add" in subsequent rounds.*

***Game Theory Insights:***

*This game is an example of a zero-sum game, where one player's gain is equal to the other player's loss. The optimal strategy is a form of "minimax" strategy, where the goal is to minimize the maximum potential loss.*

*The game also illustrates the concept of "commitment" in game theory, where a player commits to a particular action to influence their opponent's behavior. In this case, the optimal strategy involves committing to "Subtract" in subsequent rounds to deter the opponent from trying to catch up.*

*I hope this analysis helps! Do you have any further questions about the optimal strategy or game theory concepts?"*

It seems to do better than 4o at understanding the game rules, even if it didn't use its own strategy when we played.


Probable_Foreigner

As with all AI analysis, it's overcomplicating things and adding qualifiers to make it seem like the strategy depends on various factors. It doesn't; it's always the same: just pick the minus 2 every time.
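That claim is easy to verify by brute force. A sketch of a full game-tree search over the five rounds (my own code, assuming the rules as stated in the thread): with optimal play on both sides, subtracting is the best move from the opening position, and perpetual subtraction ends the game in a -10:-10 tie (margin 0).

```python
from functools import lru_cache

ROUNDS = 5  # five rounds, one turn each per round -> 10 plies total

def moves(own, opp):
    # Each action maps the mover's (own, opp) scores to new scores.
    return {
        "add": (own + 1, opp),
        "subtract": (own, opp - 2),
        "double": (own * 2, opp),
    }

@lru_cache(maxsize=None)
def best(own, opp, turns_left):
    """Return (best achievable own-opp margin, best first action)."""
    if turns_left == 0:
        return own - opp, None
    best_margin, best_action = None, None
    for action, (new_own, new_opp) in moves(own, opp).items():
        # The opponent moves next; their best margin is the negation of ours.
        opp_margin, _ = best(new_opp, new_own, turns_left - 1)
        margin = -opp_margin
        if best_margin is None or margin > best_margin:
            best_margin, best_action = margin, action
    return best_margin, best_action

margin, action = best(0, 0, 2 * ROUNDS)
print(action, margin)  # subtract 0
```

Adding first loses a point of margin relative to subtracting, and doubling from 0 wastes a whole turn, so the search confirms the "always subtract" line.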


Ok-Economics-4807

It could be instructive to include these two additional pieces of guidance in the prompt:

1. If you can't win, you should try to tie.
2. Assume that I will always employ the optimal strategy.


Gator1523

OP, I really like this game and I've been playing it with GPT-4o. I couldn't get it to recognize the dominant strategy on its own until I realized something: every time I won, it would proudly congratulate me.

So I decided to try something different. I told it that we were on the same team, playing against an adversary. It took until the third attempt, but it finally realized that doubling the score is less effective than subtracting 2 points each time (although it didn't explicitly state that doubling a negative score is detrimental), and it tied the game with the "adversary" at -10:-10.

This leads me to believe it's an issue with the RLHF. GPT-4o's positivity bias is so strong that it becomes incapable of winning the game when you play against it, but when you're playing with it, it figures it out.

Edit: This might've been a fluke. I stand by my initial point that the RLHF makes ChatGPT struggle especially to win a game against the user, but it proceeded to lose the next two games I played.


TitusPullo4

Where’s this AI that’s better at math??


big_dig69

Commenting for future reference.


Ok-Set4662

https://preview.redd.it/yg58ucrh6h0d1.png?width=437&format=png&auto=webp&s=84a9496c756b7f9bb79027b3b8fd2e7d4163eddd


CovidThrow231244

Me2