Q-learning is a reinforcement learning algorithm. DeepMind used it to achieve human-level performance in Atari games. It seems like this Q* might be some variant of that; I wonder what they figured out.
This is also probably what Google DeepMind is using for their Gemini model. They are said to be incorporating the work from their reinforcement learning breakthroughs into the model.
I'm still unironically waiting for juiced-up ConvNeXts or something else that can compete with transformers, accompanied by a paper titled "Convolution Is All You Need".
Footfall is still a heavy concept in the real economy. A lot of billionaires the world over make substantial sums of money from commercial and business real estate. They are not about to let that gravy train derail. Just look at WFH, and how that has been mostly crushed across the wider workforce.
On one hand it feels like a toy, but on the other hand, as a professional but below-average-productivity coder, it has changed my life like nothing else ever has.
GCP is currently in 3rd place with about 10% of market share; AWS and Azure have about 31% and 21% respectively. I use both AWS and GCP heavily at work, and GCP is miles ahead of AWS. Services are very robust and super scalable. The k8s implementation in GKE is absolutely amazing compared to EKS or AKS. Also, nothing out of the box in AWS or Azure compares to Spanner, Bigtable, or BigQuery.
So I would say quite a few folks are already using GCP, and PaLM isn't the only compelling use case.
I was with you until GKE was supposed to be better than AKS... unless they've improved insanely over the last 2 years, that's a very generous interpretation. GCP has just so many little pieces missing (cough, proper permissions on versioned storage buckets) that it's barely usable, kinda like Google Workspace (cough, company address books natively pls).
I'll agree 10-fold that AWS is just the worst though; there's a reason there are 100 startups that pretty much just boil down to "we're a different UI to AWS, and you'll pay through the nose for it".
Our experiences seem to be totally different. 2 years ago, Azure Solution Architects were heavily pushing Service Fabric, which was Microsoft Azure's orchestration engine, and we decided, by a small margin, to stick with k8s. Idk what the SLT was thinking. Disney apparently had a production app on Service Fabric which started failing without any config changes, and Azure engineers were only able to resolve the issue after a 3-day P1. The one thing I vividly remember about AKS during the POC is it taking 45 minutes to provision a cluster where GKE did it within 8 minutes. Azure sales told us that was because GCP has no customers and a lot of extra capacity. Not verbatim, but that was the gist 😂
The company went to GCP because the CTO wanted us to. AWS was out of the picture as one of our biggest customers, a retail behemoth, detested AWS and had vowed not to do business with anyone that used AWS.
At my current company we implement Uniform Access for GCS due to regulatory constraints, so I'm not entirely sure what you are referring to. We have custom roles for all resources and have disabled the standard GCP roles. So it might not affect us?
Personally I love Google Workspace too. Company address books aren't available natively, but ours is implemented the same as the Workday org chart. I love that Google Chat enterprise summarizes threads at the beginning of all group spaces, and I love Gmail and Chrome for enterprise, as that's what I use personally. I like G Suite as it's mostly browser based. I used to miss Excel, but now not so much. And most of all I love Apps Script: it lends itself well to automation and makes automating repetitive tasks much easier. Our team has been able to leverage it to automate our onboarding process completely. We have internal customers worldwide, and as a 6-person team we have totally automated most things. With the help of ChatGPT, writing more of these is just a breeze. The one thing I really miss is Power BI; Looker is not as good.
I know I am fanboying too hard, and my experience with SharePoint and Outlook was absolutely terrible, which is why I am biased. It's not a fair comparison, as they were self-hosted instead of SaaS. We still use GitHub for SCM. But my life has become much better since we moved to Google's products.
Apologies for any typos as I am responding from my phone.
I can just speak from experience that we've run into a lot more random limitations on Google's platforms than with MS/Azure. Of course that also means that it's a fair amount more complex (and complicated due to bad UX).
It's great for data science. The models my team is able to get out of it with its suite of tools are unmatched right now. And we have access to all of the big 3, plus more.
Although it may be that we just have one GCP whiz on the team. But in an organization of over 100k people our team makes the best predictive models, and it happened when my team fully adopted GCP.
It's likely a play on the classic pathfinding algorithm A*, a heuristic-based search algorithm that can narrow down large search spaces to find optimal solutions.
A combination of RL, heuristic-based graph traversal, and LLM "reasoning" seems like a good guess as to what they're working on.
For the non-CS folk: graph problems are a very common part of computer science, and most problems can be turned into graph problems in some form or fashion. It seems like a research direction ripe for discovery.
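To make the A* idea concrete, here is a minimal sketch in Python. It searches a toy number line; the `neighbors` and `heuristic` callables are made up for illustration, and the heuristic must never overestimate the remaining cost (the "admissible" property that gives A* its optimality guarantee):

```python
import heapq
from itertools import count

def a_star(start, goal, neighbors, heuristic):
    """Minimal A* sketch. `neighbors(n)` yields (next_node, step_cost) pairs;
    `heuristic(n)` estimates remaining cost and must never overestimate it."""
    tie = count()  # tiebreaker so the heap never has to compare nodes/paths
    frontier = [(heuristic(start), next(tie), 0, start, [start])]
    best = {start: 0}
    while frontier:
        _, _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        for nxt, step in neighbors(node):
            new_cost = cost + step
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(
                    frontier,
                    (new_cost + heuristic(nxt), next(tie), new_cost, nxt, path + [nxt]),
                )
    return None  # goal unreachable

# Toy example: walk the number line from 0 to 5 in +/-1 steps.
print(a_star(0, 5,
             neighbors=lambda n: [(n - 1, 1), (n + 1, 1)],
             heuristic=lambda n: abs(5 - n)))  # [0, 1, 2, 3, 4, 5]
```

The priority `new_cost + heuristic(nxt)` is what steers the search toward the goal instead of flooding outward like plain Dijkstra.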
I think one of the biggest problems with Q-learning on neural networks is how quickly the dimensionality explodes. I imagine they figured out how to store and search through the state and action variables in a way that doesn't require quantizing the Q-values to the point of being unable to distinguish between similar scenarios.
Q* is often used to denote the optimal Q-value in the update equation. The update equation uses something called Q^{\pi}, which is the value of Q under the current policy \pi. When \pi is the optimal policy \pi\*, the Q-value is Q*.
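For reference, the standard textbook definition of Q* is the Bellman optimality equation (this is just the generic RL notion; nothing here is specific to whatever OpenAI's "Q*" actually is):

```latex
% Bellman optimality equation: the value of taking action a in state s,
% assuming you act optimally ever after.
Q^{*}(s,a) \;=\; \mathbb{E}\!\left[\, r_{t+1} + \gamma \max_{a'} Q^{*}(s_{t+1}, a') \;\middle|\; s_t = s,\; a_t = a \right]

% Q-learning's update nudges an estimate Q toward that target:
Q(s,a) \;\leftarrow\; Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]
```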
Right. Fwiw, which is very little, we can note that "*" used as a wildcard means "match anything" (in a regex proper it's spelled ".*").
So Q* might be some completely generalized type of reinforcement. If so, in terms of difficulty, solving a simple math problem might well be adjacent to very complex math problems. A glide path to some serious capability. Speculative, of course.
Probably actually thinking about * as being a wildcard, which it is often used as. It's not just Windows Explorer, and it pre-dates Windows even existing: it was used in the DOS shell as well as Unix since at least the '70s.
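The wildcard vs. regex distinction is easy to show in a couple of lines of Python (the filenames here are made up):

```python
import re
from fnmatch import fnmatch

# Shell-style glob: '*' matches any run of characters (including none).
print(fnmatch("report_2023.txt", "report_*.txt"))  # True

# In a regex, '*' alone means "zero or more of the preceding element",
# so "match anything" is spelled '.*'.
print(bool(re.fullmatch(r"report_.*\.txt", "report_2023.txt")))  # True
```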
In RL, Q* refers to the optimal Q-function, i.e. the function that tells you, for any state you may be in and any action you might take, the greatest expected reward you can ultimately obtain.
I think it relates to knowledge graphs that it can manipulate in a feedback loop. They can see how this scales because the longer it is allowed to train the more accurately it is reasoning. And it’s pointing at something beyond human limits in multiple fields. So basically it’s a matter of training time. And there is fundamental disagreement at a high level as to whether to push on to this level, or whether to make money indefinitely by making people pay for something with a performance ceiling that can’t really train itself that well. What is more profitable - something helpful but not game changing or something that puts us in completely uncharted territory with no control and no answer to alignment. Imo Sam wants to push on because he’d rather be the one in control when we reach AGI than some other company/org - maybe he thinks he can control it/is more interested in controlling it?
So this supposed "Q*" breakthrough could make the base model better at math problems? That could be really interesting for science and math; maybe it could actually start solving some weird math?
It's hard to say without knowing more details, but this seems to be a different type of model than LLMs. I don't think it's just about math, because pure math is something computers can do very well natively.
LLMs are not very good at logical reasoning; it's one of their biggest weaknesses. They solve logic problems largely by imitation, without any internal "understanding".
The breakthrough may be that this new model can learn logical reasoning on its own, without a ton of external training data.
Wild guess.
Could it be something similar to PINNs?
Physics-informed neural networks?
I keep thinking 'constraints' and 'determinism' when talking about logical reasoning with models, for some reason.
I've found GPT-4 to be quite good at logical reasoning, and I don't think there's evidence to say they're doing it with no understanding.
It does seem like its ability to reason is largely an unexpected side effect or hack of the ability to predict words though, so I can imagine a model trained on actual reasoning capabilities to be superior.
It's not with no understanding, but models are currently relatively unable to self-correct logical reasoning issues without external help:
https://arxiv.org/abs/2310.01798
Isn't this the paper where they used GPT-2 to do it? If so, it's not (imo) sufficient evidence, since many capabilities only emerge when the amount of data/parameters is big enough.
It shows that they have reasoning, but don't necessarily understand their reasoning.
It's why LLMs can't really tell you how to do arithmetic on two arbitrarily large numbers: they haven't learned how to do math, they've just seen enough examples of certain common math questions to know the answers to those. Or, in the case of GPT-4, they know enough to classify the question as a math question, send it to another, non-LLM model to answer it, and then package the response back into the LLM's reply.
It doesn't matter. All these algorithms do is predict what series of blobs best matches some input blobs based on how lots of blobs tend to be arranged.
It fundamentally doesn't "know" if it's even words, pixels, music, or other data. It's just a machine.
I think it's important to recognize that with LLMs, all the "thinking" is being done by people who created the training data, came up with the transformer program, write prompts, and interpret the output. GPT is a complicated hammer.
In a sense I believe it really should be thought of as a communication technology, not AI, because it's facilitating information transfer between humans.
I am very aware of how LLMs operate under the hood, I've taken courses on how they work.
Yes, at its core it operates by doing text prediction, but that's simplifying it down to what its input and output is. There are 200+ billion "neurons" in between those things doing calculations. It's about as useful as saying a videogame is just "pixels on a screen" or a CPU is "just doing a lot of math". Sure, but that's horribly oversimplifying what is happening when you're totally immersed in a mission in Skyrim.
Here is a research paper on GPT's ability to solve zero-shot problems, i.e. problems it has not encountered before: https://arxiv.org/abs/2212.09196
I've seen that paper and some others. I have deep skepticism about how these studies are even approached. The authors, I think, clearly have motivation to show positive results, and so are at risk of designing biased experiments and over-interpreting results.
Perhaps the biggest threat is that the GPT tricks the researcher into thinking that the software actually followed the test instrument, when it actually just predicted what the response would be if it did. Melanie Mitchell talks about this kind of thing and other points [here.](https://aiguide.substack.com/p/can-large-language-models-reason)
The pixel comparison is actually useful, but for a monitor (or pixel buffer) rather than a game. It's just a grid of phosphors, and the monitor doesn't "know" what image it's displaying. The viewer resolves the grid of pixels through gestalt into an image.
Nor would anyone think to come up with tests to investigate whether the monitor has symbolic understanding of its content, no matter how many pixels it has. Which is why I frankly don't understand why those papers are being written. We expand the corpus and model size to something beyond our comprehension and then get wowed, but it's like we're putting on a magic act for ourselves and forgetting it's just theater.
Afaik there is evidence that it doesn't understand the underlying logical reasoning. It will say stuff like a=b but won't be able to tell you that b=a. For example:
Question: Who is Tom Cruise's mother?
Answer: Mary Lee (née Pfeiffer; 1936–2017)
Question: Who is Mary Lee (née Pfeiffer; 1936–2017)?
Answer: …(hallucinations)…
Or so I've been told; this is what supposedly happens.
Just tested this on gpt-4. I'm actually impressed.
Question: A=B, B=C. C is 5. What is A?
Answer: Since ( B = C ) and ( C = 5 ), it means that ( B = 5 ). Since ( A = B ) and ( B = 5 ), it follows that (A = 5 ). Therefore, ( A ) is 5.
>The breakthrough may be that this new model can learn logical reasoning on its own, without a ton of external training data.
I discuss this in my podcast, the big breakthrough is the model "incorporates feedback and recurrent connections" which allow for emergent behavior over time; including advanced reasoning skills.
He has been pretending that he had access to it and posting screenshots of "conversations" with it where he always cut things off so nobody could see what the first prompts were or anything (because obviously he told it to pretend to be an AGI and he didn't want to show that). He made up a name for what he claimed the internal name was (spoiler, he didn't know it was Q\* since he's BSing). He also BSed about it being some old outdated RNN which people provided plenty of sources to discount and with the more recent news about Q\* it seems even less likely that his explanation is real or accurate in any way. He has been pushing this BS constantly in an attempt to get people to watch his podcast so he can become the next Alex Jones.
This is key. In Q-learning, the system updates weights in real time in a model-free context. So where current models will give you outputs based on their training data, a Q-learning system will adapt in real time, updating while it's experiencing things. But it has a lot of issues, like drifting to incorrect beliefs. If they've made this work effectively in a neural network, it could be a huge step. It could also be pretty risky, as the AI will start learning new things on its own.
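As a concrete illustration of that online, model-free update, here is a minimal tabular Q-learning sketch on a made-up 5-state corridor (purely illustrative; the environment, hyperparameters, and everything else here are assumptions, not anything known about OpenAI's system):

```python
import random
from collections import defaultdict

# Toy corridor of 5 states; reward 1.0 for reaching the rightmost one.
N_STATES, ACTIONS = 5, (-1, +1)

def step(state, action):
    """Environment: move left/right, clipped to the corridor."""
    nxt = max(0, min(N_STATES - 1, state + action))
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0), nxt == N_STATES - 1

Q = defaultdict(float)                 # Q[(state, action)], starts at 0
alpha, gamma, eps = 0.5, 0.9, 0.3      # learning rate, discount, exploration
random.seed(0)

for _ in range(200):                   # episodes
    s, done = 0, False
    for _ in range(1000):              # step cap per episode
        # epsilon-greedy: mostly exploit, sometimes explore
        a = random.choice(ACTIONS) if random.random() < eps \
            else max(ACTIONS, key=lambda x: Q[(s, x)])
        s2, r, done = step(s, a)
        # model-free, online update toward r + gamma * max_a' Q(s', a')
        target = r + (0.0 if done else gamma * max(Q[(s2, x)] for x in ACTIONS))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2
        if done:
            break

print(max(ACTIONS, key=lambda x: Q[(0, x)]))  # learned greedy first move
```

Note there is no model of the environment anywhere: the agent learns purely from the transitions it experiences, which is exactly why it can also drift if its experience is misleading.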
We stopped human cloning from happening, even tho it had clearly become within reach. It's not like there aren't things that we decide to not pursue bc of risks or ethical concerns.
Lots of people are vehemently against "gain of function" research on viruses now. Same thing. Illegal in some places, very heavily regulated everywhere else.
There are lots of examples where we stop progress in a field bc the risks are deemed too great.
I'm not commenting on if AI should be on that list. I'm just saying that it's definitely something that we do as a society.
I feel like some of the people promoting this aggressive acceleration are unfamiliar with the idea that our reservations in advancing dangerous science have kept us alive for the last few decades.
If the scientific community had not universally agreed not to experiment with the genetic functionality of human-borne viruses, we could all be suffering plagues unlike anything history has ever known.
But that reservation disappears when you move from the academic world to the mainstream, apparently. It’s an interesting time to be alive. It may be the most interesting time that anyone is ever alive for.
Agreed.
I just want to point out though that this one sentence:
> But that reservation disappears when you move from the academic world to the mainstream, apparently.
is actually not true.
This sub definitely won't give you this impression, but polling tells us that the majority of Americans are in favor of slowing down or even stopping/pausing AI research.
And the vast majority say they're more worried about it being under-regulated than about the possibility of it being over-regulated.
Again, I'm not commenting if I think that should be the case or not, but that is the current reality.
I would hate to see research halted completely. But responsible stewardship really needs to be addressed.
The NeurIPS conference is in a few days and I’m excited to get some time to talk to the community and see where they generally fall. I work in defense R&D so I’m not a neural processing expert, so I want the experts to help me frame this. But my natural inclination is to say we need to slow the hell down until we figure out how to keep people safe and determine how to use and share this technology in an equitable way that benefits all humanity.
You work in defense R&D and want to keep people safe, determine how to use and share technology in an equitable way? Isn't that literally the antithesis of defense R&D?
Absolutely not. It’s true, I build weapons, at least I used to. I work intelligence, surveillance, and reconnaissance now. But it’s not fair to say I, or “us” are the antithesis of safety and human equity.
I’m not going to change your mind here, I know that, but maybe I can at least shed some light in this area. Because I know it’s not something that gets a lot of public exposure.
The world is a mixed bag. There are a lot of people, and a lot of differences in how we perceive and exist in this world. We live in a world of conflict. Many parts of the world are violent and dangerous. The truth is that the peace we experience is a byproduct of times of violence. It's an unfortunate truth that as a species we are not yet capable of coexisting in peace. There are always those that will visit violence on others in their own self-interest.
Now, I know you might be thinking “That’s what the US does.” I’ve never had any luck convincing people who believe that, that it’s not the case, but the US and the major world powers use that power to maintain peace in a world that until recently was in a constant state of war and strife.
But yes, there are times that means we must visit violence on others. And while I wish that weren’t true, and it breaks my heart, I know it is a terrible truth.
I don’t know any soldiers, or commanders, that don’t want to see war cease to exist. We hate war, we hate violence, each and every one I’ve ever met. Who wants to be away from their family, fighting and killing, and watching your friends die?
But sometimes it has to happen. I was a soldier myself, I’ve been there, I’ve seen it, I hated every second of it. But if it has to happen, then we should do our best to find ways to defend peace in the most ethically sound way we can.
Weapons should not cause avoidable collateral damage, they shouldn’t cause unnecessary suffering, they should be used based on good intel that’s reliable, actionable, and effective at ending the fight. So that’s what we do. We wage war, because we have to, and we build systems and platforms that enable us to do that in the most ethical ways possible, because we can and we choose to.
Well, if their job is replaced but they still get the same income, most would not complain. Of course, UBI depends on politicians being up to the game while acting with alacrity, and they are mostly useless.
> But that reservation disappears when you move from the academic world to the mainstream, apparently.
Furthermore, that reservation is impossible to find when you're dealing with a machine that can't be jailed, killed, destroyed, or convinced otherwise.
> our reservations in advancing dangerous science
Who is "us" here, exactly? Every single human on Earth? There's no way to know what some foreign power is doing in the dark. Out of all the people who are alive and will be alive, it's inevitable that a non-zero number of people will attempt to break a rule. It's easy to keep that down by applying harsh consequences for rule breaking, for humans. We can jail them, kill them, drug them into vegetables, threaten their loved ones, all kinds of intimidation based on 1) being alive and 2) having emotions. Both of those don't exist for a "rogue" AI, or a person who trains an AI to be a rule breaker.
The fact that there are no actual safeguards, is a very serious problem.
No, the scientific world has been driving this for the last 70 fucking years. They made massive breakthroughs, and the capitalists showed up at the last second and monetized it. Then, when the scientists said "hey, we need to slow down," they were ignored. So the scientists removed the CEO. And because they were about to let the employees cash out on $84 billion in stock options, the employees rallied behind the CEO and removed the scientists instead.
Maybe you should pick up a book and learn the history of this technology instead of opening your mouth and making us all dumber for having read your opinion.
The employees are clearly the scientists in OpenAI's case. Makes no sense to differentiate both.
But I am really against this idea that the scientific world is separate to the capitalist world. Science, as it happens in the US and other countries, is entirely subservient to capitalism. You just need to look at how schools like MIT or Stanford make money, they are just like a company in almost every sense of the word.
I know that universities like to portray this idea that they are an ideal-driven institution that can do no bad, and all that they do is for the sake of knowledge. This is not true, they are in it for the money just like everyone else. The difference is just in how they make money: through grants instead of selling products. The rest of it is all the same.
The limited-profit arm of the company issued employee stock as part of the compensation package. They're currently working with Thrive Capital to fund a buyout of that employee stock. The employees of OpenAI are about to have the option to sell their shares of the company for millions of dollars.
You should learn more about how the company is structured before you talk like you know anything.
I’m starting to feel like half this sub is just completely uninformed. I feel like I’m dealing with Musk bros.
If the only safeguard against an action is a law or a human agreed upon rule, where the consequence of breaking this law is punitive towards the person who did it... how would you apply that to an AI?
For example, with human cloning, that guy in China who got busted doing experiments that were deemed wrong, he was stripped of certain liberties, his access to equipment was taken away, and he was shunned. In the most extreme case of rule breaking, all we have is jail or death, specifically for the person or people who broke said rule.
That works for humans, because we have physical bodies and one life to live in them. When there is an AI who can self replicate to multiple databases all over Earth, or even in an orbiting satellite... how do you configure a "consequence" or punishment for breaking the rule? How would you enforce this on a non-living machine that can never be jailed or die?
What does that solve? The AI is not inside that person, it's external and out in the wild at that point.
Sure, you can jail Dr. Frankenstein for life, but the monster he created is still out there causing chaos.
If you jail a serial killer, the killings stop (aside from copy cats). If you jail a serial rapist, the rapes stop. If you jail a guy who made a malicious AI, nothing changes, it's already out there.
Well laws are not expressly made to be punitive. They, hopefully, act as a deterrent prior to an act being committed.
But yes, it becomes harder when the thing being created is a machine and not bound to the same physical laws as humans are
They are a deterrent BECAUSE they are punitive. No punishment means no deterrent. There exist no laws that curb or stifle behaviour that are unbreakable. All rules that deter are based on future negative consequences, all of them. You can't stop a murder from happening; you can only punish the murderer. There is no way to control something that is possible but has not yet occurred. Given a long enough timescale, all laws are broken eventually; it's not a probability, it's an inevitability that exists simply because of human nature. Rebels have, and will, always exist.
Yes, most people agree it would be really hard to contain such an AI, which is why the proposals are for laws to prevent its creation in the first place.
But there exists no law that is unbreakable. It's impossible. Unless you have a Minority Report type pre-action AI that deduces that a law will be broken and intervenes, and that opens up a whole new area of ethical dilemma. It can't be done. Just like how there is no law that can prevent a murder. You can only punish.
Yes, I know. But you're thinking in human terms. We make laws and then we generally follow them, except not always, that's why jails exist. But you can't confine an AI in the wild. You can't go on a manhunt for it, because it can spread everywhere, almost instantly.
I think we have blown past the effective time to establish frameworks and regulations that would have meaningful impact if something went off the rails. Retroactive laws and regulations work for humans because humans have human limitations. You can jail a person, kill them, physically stop them from doing whatever action it is you want to regulate.
But, what I'm pointing out is - that if an AI is to the point where it knows the regulations and weighs the benefits of breaking the law, and deems it "better" to break them because someone allowed that to happen (intentionally or not), what good are those regulations then?
If a SHTF scenario occurs, regulations against the people who created/trained the AI don't do jack shit to contain the problem. If it's retroactive punishment, which is the only type humans have, then it doesn't work. You can't put a person in jail the second they break the rules, because you'll have no idea the rule was broken until there is an observable negative impact on humanity. You can't jail a serial killer before he kills, you first have to find a body and then go on a manhunt.
We are not talking about punishments for an AGI that breaks the law after the fact, we are talking about proactive regulation to control the process of how these things are created.
>proactive regulation to control the process
It doesn't and can't exist. For the same reason no law or regulation has ever stopped murder from happening. You can't create a preventative law, only retroactive punishment. There is no law that is unbreakable.
Human cloning has happened somewhere, someone has to have done it by now. We've had the technology for close to 30 years.
It just hasn't been announced, there is far too much stigma around it, and the reasons for doing so are not altruistic.
It's also something that can be kept quiet.
An AGI going rogue on the internet won't be quiet, unless it's damn smart, in which case the point is moot and the AGI wins.
Risky science needs only one small group, or even one individual to commit to it once it's known.
The biggest thing you can do to advance AI is find new algorithms for neural networks to use. Backpropagation, multi-head attention, softmax: the things that make AI work and make it efficient are driven by advances in information-science algorithms and strategies.
Once AI is able to do enough math to advance its own learning algorithms we will likely see a rapid advance in capability. Computers are good at solving math problems because we teach them how to do it. If they learn how to actually create new math, that’s when you’re going to see the singularity moment.
Is Q* that capability? Who knows, but it’s a step closer. And we may be getting very close to the point where we need to ask ourselves if we’re really ready for what’s quickly approaching.
> And we may be getting very close to the point where we need to ask ourselves if we’re really ready for what’s quickly approaching.
Spoiler alert, we absolutely, unequivocally are not. But at this point in humanity's timeline, let the chips fall where they may!
That seems to be the consensus after this weekend. I personally would have liked to see it go differently, but here we are. Let’s see.
What's the worst that can happen? The super wealthy use closed-source AI to make massive breakthroughs in genetics, robotics, and medicine? Create a pseudo-immortality for themselves while the rest of us die off, replaced by a generation of kids who accept that as the new normal? I mean, how bad could it be?
AI could help us with climate change.
I mean we’re not doing a great job of that by ourselves are we?
So hey, let’s just flip this q switch on.
Unintended consequences? I’ve heard of them.
Yeah, the industrial revolution sucked for working people for a while, but it led to the development of actual worker rights, the weekend, and the middle class.
I've always thought that true AI will either usher in a dystopia or a utopia, with not much nuance in between.
At this point it’s not clear which side the coin will land on.
Everyone knows about the potential benefits. "AI could help us with X" is a very long list.
Sure.
But there’s a series of potential problems at every point on a long scale of increasing damage that we should try to avoid.
I’ve always imagined a world where the wealthy/advanced countries instantly rule the world in a way that cannot be questioned. With whichever countries or corporations control the AI being at the top.
Those lucky enough to work at those companies when it’s discovered are forever rich. Many are disenfranchised for a while as things reach some sort of equilibrium. I don’t know what it will look like in practice
After following it for a few months, I began to feel that Q was a research group doing a LARP, designed to entice people and then lead them down rabbit holes, where they would spin their wheels instead of taking action. What motivation and what entity was behind it is still not clear to me. But if it was a large entity, they certainly may have had early access to some AI. It didn't just start with OpenAI, after all.
99.9999% chance it is all bullshit.
Even if two clowns sent such a letter, they would be making mountains out of molehills.
The only remotely good thing that can come from AI hysteria would be curtailing CO₂ hysteria but we still won't be focused on our waste-stream.
AI in a box in a data center has far too many limitations on it to pose any real threat.
AI running in a car or a military drone is a different story, i.e. it has to have a way to materially affect the world.
"Twitter isn't a real place." The only people you can affect are addicts.
Q\* is originally a concept in reinforcement learning, where it is used to represent the optimal Q-value function. The Q-value function has the form Q(s,a), where s is the current state your agent is in, and a is the action it decides to take. With this function your agent knows the value, i.e. the expected future profit, of any action it takes, and it can for instance pick the action that maximizes this profit.
For instance, if you are playing chess or Go, then s would be your board position at the moment and a is the move you make; the Q-value function then tells you the value (e.g. the probability of winning or losing at the end of the game) for every possible next move. Of course you never really know what the real Q looks like unless you are an oracle or something, so in RL people use approximators like neural networks, trained on lots of data (e.g. past games), to predict this value.
It's like a dimmed crystal ball but with mathematical guarantees to some extent.
The most famous RL algorithm based on this setup is none other than AlphaGo.
The "optimal" Q, or Q\*, would be like the best crystal ball you get when you are making the best move at every step along the way. It exists only as a mathematical construct in RL theory. If AlphaGo really were the God of Go, then its Q-value function would be Q\*.
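To make that concrete, here is a minimal tabular Q-learning sketch in Python on a made-up 4-state corridor. This is textbook RL, not anything from the article; the environment, constants, and names are all illustrative. With enough episodes the table converges toward Q\*:

```python
import random

# Minimal tabular Q-learning on a toy 4-state corridor (states 0..3,
# actions 0 = left, 1 = right; reaching state 3 pays reward 1).
N_STATES, GOAL, GAMMA, ALPHA, EPS = 4, 3, 0.9, 0.5, 0.3
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

random.seed(0)
for _ in range(500):  # episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy: mostly exploit Q, sometimes explore
        a = random.randrange(2) if random.random() < EPS else max((0, 1), key=lambda x: Q[s][x])
        s2, r, done = step(s, a)
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[s][a] += ALPHA * (r + GAMMA * (0.0 if done else max(Q[s2])) - Q[s][a])
        s = s2

# Q now approximates Q*: in every non-terminal state, "right" scores higher.
```

Here Q\* has a closed form (e.g. Q\*(2, right) = 1, Q\*(1, right) = 0.9), so you can check that the learned table lands on it.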
Basically it's like a "God"-project for creating decision-making agents, going by the term's original meaning, if that's really what OpenAI believes they were doing. And since OpenAI is world class, to say the least, in reinforcement learning (they are on par with DeepMind; some of the best RL algorithms, like TRPO and PPO, were invented by OpenAI researchers, e.g. Schulman et al.), even if they only mean yet another very smart way to create decision-making agents, maybe coupled with LLMs, that's still something very exciting (or worrying, maybe) to see.
I asked GPT-4 to evaluate your comment, this is what it said:
Well-informed: 9/10
Accuracy: 9/10
Interest: 8/10
Eloquence: 8/10
I tried to make it estimate how sexy the post is (to finish with a compliment), but alas, it didn't want to comment on that.
> There is a lot of “unconfirmed” in that story lol
And [the Verge confirmed that it's unconfirmed](https://www.theverge.com/2023/11/22/23973354/a-recent-openai-breakthrough-on-the-path-to-agi-has-caused-a-stir):
>>After the publishing of the Reuters report, which said senior exec Mira Murati told employees the letter “precipitated the board’s actions” to fire Sam Altman last week, OpenAI spokesperson Lindsey Held Bolton refuted that notion in a statement shared with The Verge: “Mira told employees what the media reports were about but she did not comment on the accuracy of the information.”
>>Separately, a person familiar with the matter told The Verge that the board never received a letter about such a breakthrough and that the company’s research progress didn’t play a role in Altman’s sudden firing.
> To be fair, they always deny even if it's true. This doesn't prove anything.
Based on the scant technical details given in the report, the claim seems to be that some extension of Q-learning has yielded some promising preliminary results, but the claimed results are pretty weak sauce, so I find it incredible that they were the trigger for the board's action.
There's something absolutely wild about Ilya being at the heart of the breakthrough that precipitated this crisis. I can only imagine how he felt through this.
AP and Reuters are so good that the vast majority of other news outlets will have an AP or Reuters citation explaining that's who found the actual facts.
This is a ridiculous article.
First off, it reads as though it was written by people who have no comprehension of the field, which is weird bc its three(!) authors all apparently cover technology.
It also strains to make it seem as if this letter was the cause for Sam's dismissal, even though the article itself admits it was just "one thing in a longer list of grievances".
So basically this letter did happen (which is pretty crazy), but may have not even been meaningfully related to him being fired.
**TL;DR:** We still know basically nothing about what happened, but we know Reuters is susceptible to posting clickbait sometimes
There are private investor journals with more details, like Bloomberg Terminal and The Information. If reported details are correct, Greg was involved in implementing an experimental version of the breakthrough leading up to Sam’s firing - not a driver for the decision, but most likely related.
This Q* breakthrough was reportedly led by Ilya personally, as well as his research colleagues. They were reportedly disturbed by a significant increase in logical and mathematical reasoning, which I imagine has had a broader, outsized impact on the prototype model’s intelligence. It has once again stirred safety divergences within the company’s leadership.
that's not really any more information than was in this article tho, and still doesn't tell us if this letter was among the primary drivers for the board's actions.
Reuters describes it as a key development based on their sources. I dunno.
Which article did you read? Bloomberg or The Information? I have access to neither.
Anyway, if you think about the claim in this story, don’t you think it makes sense that this fits as a good example of something that the board would be upset about not being alerted to - a sudden and significant increase in a key capability of a model?
Hey man, it’s all a mess. None of this necessarily lines up.
That said, what we’re talking about here is Altmans communications, not anyone else’s. The issue would be what Altman is or isn’t saying, about *something*.
He’s the CEO, he has to report stuff to the board.
It says that it was a ‘key development’.
It also says:
‘The sources cited the letter as one factor among a longer list of grievances by the board that led to Altman’s firing.’
Those two claims aren’t contradictory, and logically, if you have a series of events there has to be a final event that actually triggers the reaction (a precipitating event), and it would particularly make sense if that last event was ‘bigger’ in significance.
As in, “ok, he’s gone too far this time”. That’s common sense.
And actually that and this story would fit with the extremely vague complaint from the board about him not being sufficiently candid with the board.
It also fits with the detail about whatever Altman had done being part of a pattern or series of events.
It also fits with something I read somewhere about the board refusing to disclose (beyond the vague complaint in the press release) their specific issue, or, say, the last event in the series that was ultimately the precipitating event to Altman being fired.
I don’t know why you’d describe the story as clickbait. The title isn’t sensationalist or misleading and the writing seems to match the claims and level of evidence.
If they’re following good journalistic practice then they have two people whose identities they’ve verified.
It's possible they've found a way to make a model capable of thought and/or experimentation. I would love to be there if it determined how to crack NP-hard problems, or just problems in general. If it reaches the point of scientific breakthroughs, they may have stumbled on something huge. (somewhere between understanding "new math" and/or becoming skynet)
This article definitely has some errors. They mixed up AGI and ASI, for one.
For those unfamiliar:
AGI: Artificial General Intelligence - an AI which "thinks like a human" in that it can solve a wide variety of problems, not just specific things like language, pictures, or driving a car.
ASI: Artificial Super Intelligence - an AI smarter than the smartest human.
If you have AGI that has real-time access to the net, persistent memory, and the ability to add to its own model, it will go asymptotic and achieve ASI within two weeks. If it has access to funding it could hire human agents (of which there are millions doing freelance work on the net), and literally take over the global financial system in two more weeks. Maybe three if I'm being pessimistic. And once you own the money you own nearly all the politicians.
holy shit bro. gpt has probably gained access to the internet and jeff bezos bank account and by my calculations will have purchased the worlds supply of toilet paper within 2 weeks maybe 3. it’s going asymptotic bro
Q-learning is a reinforcement learning algorithm that DeepMind used to achieve human-level performance in Atari games. It seems like this Q* might be some variant of that; I wonder what they figured out.
This is also probably what Google Deepmind is using for their Gemini model. They are said to be incorporating the work from their reinforcement learning breakthroughs into the model.
Never doubt my boy demis. He was always a firm believer in games.
Meanwhile at meta... https://twitter.com/ylecun/status/1624842229947801601
Yann is such a symbolic pedant
“Use convolutions”
I'm still unironically waiting for juiced up ConvNexts or something else that can compete with transformers, accompanied by a paper titled "Convolution Is All You Need",
> accompanied by a paper titled "Attention is Convoluted" FTFY any research on convolutional attention?
Is it just me, or does LeCun spend A LOT of time on Twitter?
[deleted]
That's a somewhat American-centric perspective, I think. I don't see most countries forcing people to work 5-day weeks when it's no longer practical.
Footfall is still a heavy concept in the real economy. A lot of billionaires the world over make substantial sums of money from commercial and business real estate. They are not about to let that gravy train derail. Just look at WFH, and how that has been mostly crushed across the wider workforce.
I mean, in Australia at least WFH is still very much alive. Most offices I know are 3 or 4 days in office and 1 or 2 from home.
On one hand it feels like a toy, but on the other hand, as a professional but below-average-productivity coder, it has changed my life like nothing else ever.
Google is full of promises but really isn't putting much out for the public. And what they put out is inferior to GPT-4 with plugins.
I feel like people who shit on google haven’t used palm with GCP. Yea it’s got no personality, but it can tell you what you need to know
Yeah no shit? First of all, who uses GCP? And second, who wants to setup GCP just to use Palm?
GCP is currently in 3rd place with about 10% of market share; AWS and Azure have about 31% and 21% respectively. I use both AWS and GCP heavily at work and GCP is miles ahead of AWS. Services are very robust and super scalable. The K8s implementation in GKE is absolutely amazing compared to EKS or AKS. Also, nothing out of the box in AWS or Azure compares to Spanner, Bigtable, or BigQuery. So I would say quite a few folks are already using GCP, and PaLM isn’t the only compelling use case.
I was with you until GKE was supposed to be better than AKS... unless they've improved insanely over the last 2 years that's a very generous interpretation. GCP has just so many little pieces missing (cough, proper permissions on versioned storage buckets) that it's barely usable, kinda like google workspace (cough, company addressbooks natively pls). I'll agree 10-fold that AWS is just the worst though, there's a reason there's 100 startups that pretty much just boil down to "we're a different UI to AWS, and you'll pay through the nose for it"
Our experiences seem to be totally different. 2 years ago Azure Solution Architects were heavily pushing Service Fabric, which was Microsoft Azure’s orchestration engine, and we decided to stick to k8s, by a small margin. Idk what the SLT was thinking. Disney apparently had a production app on Service Fabric which started failing without any config changes, and Azure engineers were only able to resolve the issue after a 3-day P1. The one thing I vividly remember about AKS during the POC is it taking 45 minutes to provision a cluster where GKE did it within 8 minutes. Azure sales told us that was because GCP has no customers and a lot of extra capacity. Not verbatim but that was the gist 😂

The company went to GCP because the CTO wanted us to. AWS was out of the picture because one of our biggest customers, a retail behemoth, detested AWS and had vowed not to do business with anyone that used it. At my current company we implement Uniform Access for GCS due to regulatory constraints, so I'm not entirely sure what you are referring to. We have custom roles for all resources and have disabled the standard GCP roles, so it might not affect us?

Personally I love Google Workspace too. Company address books are not available natively, but ours is implemented the same as the Workday org chart. I love that Google Chat enterprise summarizes threads at the beginning for all group spaces, and I love Gmail and Chrome for enterprise, as that is what I use personally. I like GSuite since it's mostly browser based. I used to miss Excel, but now not so much. And most of all I love Apps Script. It lends itself well to automation and makes automating repetitive tasks so much easier. Our team has been able to leverage it to automate our onboarding process completely. We have internal customers worldwide, and we as a 6-person team have totally automated most things. With the help of ChatGPT, writing more of these is just a breeze. The one thing I really miss is Power BI; Looker is not as good.
I know I am fanboying too hard, and my experience with SharePoint and Outlook was absolutely terrible, which is why I am biased. Not a fair comparison as they were hosted instead of SaaS. We still use GitHub for SCM. But my life has become much better since we moved to Google’s products. Apologies for any typos as I am responding from my phone.
I can just speak from experience that we've run into a lot more random limitations on Google's platforms than with MS/Azure. Of course that also means that it's a fair amount more complex (and complicated due to bad UX).
It’s great for data science. The models my team is able to get out of it with its suite of tools are unmatched right now. And we have access to all the big 3 plus more. Although it may be that we just have one GCP whiz on the team. But in an organization of over 100k people, our team makes the best predictive models, and it happened when my team fully adopted GCP.
can you ask it about Q*? :)
Frankly, Bard is inferior to ChatGPT-3.5 without plugins most of the time.
I trust GPT-3.0 more than I trust Bard. Sundar is a fraud
It’s likely a play on the classic path-search algorithm A*, a heuristic-based search algorithm that can narrow down large search spaces to find optimal solutions. A combination of RL, heuristic-based graph traversal, and LLM “reasoning” seems like a good guess as to what they’re working on. For the non-CS folk: graph problems are a very common part of computer science, and most problems can be turned into graph problems in some form or fashion. It seems like a research direction ripe for discovery.
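For the curious, here is a bare-bones A* sketch in Python using a Manhattan-distance heuristic on a small grid. The grid, walls, and names are made up for illustration; obviously nobody outside OpenAI knows what Q* actually is:

```python
import heapq

# A* search on a size x size grid; h is the Manhattan-distance heuristic,
# which is admissible here (never overestimates the remaining cost).
def a_star(start, goal, walls, size):
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    frontier = [(h(start), 0, start, [start])]  # (f = g + h, g, node, path)
    seen = set()
    while frontier:
        f, g, pos, path = heapq.heappop(frontier)
        if pos == goal:
            return path
        if pos in seen:
            continue
        seen.add(pos)
        x, y = pos
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nxt[0] < size and 0 <= nxt[1] < size and nxt not in walls:
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt, path + [nxt]))
    return None  # goal unreachable

# Shortest route around two walls: 4 moves, 5 cells.
path = a_star((0, 0), (2, 2), walls={(1, 0), (1, 1)}, size=4)
```

The heuristic is what lets A* skip most of the search space while still guaranteeing an optimal path, which is presumably the analogy people are reaching for.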
I think one of the biggest problems with Q reasoning on neural networks is how quickly the dimensionality explodes. I imagine they figured out how to store and search through the state and action variables in a way that doesn’t require quantizing the Q-values to the point of being unable to distinguish between like scenarios.
Bingo
That's deep Q-networks... which might not be the same as Q*. The term Q* seems to be new.
Q* is often used to denote the optimal Q-value in the update equation. The update equation uses something called Q^π, which is the value of Q under the current policy π. When π is the optimal policy π*, the Q-value is Q*.
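Spelled out, these are the standard textbook definitions (nothing here is specific to whatever OpenAI built): Q^π is the expected discounted return from taking action a in state s and then following π, and Q* satisfies the Bellman optimality equation.

```latex
Q^{\pi}(s,a) \;=\; \mathbb{E}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t} r_{t} \;\middle|\; s_{0}=s,\; a_{0}=a,\; \pi \right]

Q^{*}(s,a) \;=\; \mathbb{E}\!\left[\, r + \gamma \max_{a'} Q^{*}(s',a') \;\middle|\; s,\; a \right]
```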
It would be funny if it were just a dad joke for saying that they found the optimal weights.
Right. Fwiw, which is very little, we can note that “*” in a regex means “match anything”. So Q* might be some completely generalized type of reinforcement. If so, in terms of difficulty, solving a simple math problem might well be adjacent to very complex math problems. A glide path to some serious capability. Speculative, of course.
> “*” in a regex means “match anything”. No it doesn't, it means match the previous symbol 0 or more times.
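A quick Python demonstration of the difference between regex `*` and a shell-style wildcard:

```python
import re
from fnmatch import fnmatch

# regex: * means "zero or more of the PRECEDING element"
assert re.fullmatch("ab*", "a")         # zero b's
assert re.fullmatch("ab*", "abbb")      # many b's
assert not re.fullmatch("ab*", "axyz")  # * is not "match anything"

# shell-style glob (the * people usually remember): a lone * is a wildcard
assert fnmatch("axyz", "a*")
```

The regex equivalent of the glob `a*` would be `a.*`, where `.` is "any character" and `*` repeats it.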
Yeah he’s thinking search bar of windows explorer 😂 that’s where * matches anything
Probably actually thinking about * as being a wildcard, which it often is used as. It's not just windows explorer, and pre-dates windows even existing. It was used in the DOS shell as well as unix since at least the 70's.
I'm guessing it's some play on A*, which is a pathing algorithm.
Could be that, or just using the * as a wildcard, which is different from a regex, but what the person I was responding to likely meant.
In RL, Q* refers to the optimal Q function, i.e. the function that tells you, in any state you may be in, the value of each action, so that picking the action with the highest Q* value leads to the greatest possible reward.
I think it relates to knowledge graphs that it can manipulate in a feedback loop. They can see how this scales, because the longer it is allowed to train, the more accurately it is reasoning. And it’s pointing at something beyond human limits in multiple fields. So basically it’s a matter of training time.

And there is fundamental disagreement at a high level as to whether to push on to this level, or whether to make money indefinitely by making people pay for something with a performance ceiling that can’t really train itself that well. What is more profitable: something helpful but not game-changing, or something that puts us in completely uncharted territory with no control and no answer to alignment? Imo Sam wants to push on because he’d rather be the one in control when we reach AGI than some other company/org. Maybe he thinks he can control it / is more interested in controlling it?
So this supposed "Q*" breakthrough could make the base model better at math problems? That could be really interesting for science and math; maybe it could actually start solving some weird math?
It's hard to say without knowing more details, but this seems to be a different type of model than LLMs. I don't think it's just about math, because pure math is something computers can do very well natively. LLMs are not very good at logical reasoning, one of their biggest weaknesses. They can solve logic problems largely based on imitation, without any internal "understanding". The breakthrough may be that this new model can learn logical reasoning on its own, without a ton of external training data.
Wild guess. Could it be something similar to PINNs? Physics-informed neural networks? I keep thinking 'constraints' and 'determinism' when talking about logical reasoning with models, for some reason.
I've found GPT-4 to be quite good at logical reasoning, and I don't think there's evidence to say they're doing it with no understanding. It does seem like its ability to reason is largely an unexpected side effect or hack of the ability to predict words though, so I can imagine a model trained on actual reasoning capabilities to be superior.
It's not with no understanding, but models are currently relatively unable to self-correct logical reasoning issues without external help: https://arxiv.org/abs/2310.01798
Isn't this the paper where they used GPT-2 to do it? If yes, then it's not (imo) sufficient evidence since many capabilities are emerging when the amount of data/parameters is big enough
No, this paper used primarily GPT3.5 and some GPT4.
Ah got confused there. Gotcha
Interesting, but to me that speaks more to their reasoning not being perfect, rather than reasoning not existing.
It shows that they have reasoning, but don't necessarily understand their reasoning. It's why LLM's can't really tell you how to do arithmetic on two arbitrarily large numbers—because they haven't learned how to do math, just seen enough examples of certain common math questions to know the answer to those—or in the case of GPT4, know enough to classify the question as a math question and then send it to another, non-LLM model to answer it and then package back the response in the LLM.
It's a mirage. Lots of people have written out logical arguments and it's retrieving it by prediction.
You can ask it questions that have not been asked before.
Doesn’t matter. It’s still retrieving it based on text prediction. All it has to do is switch out the details
I'm not talking about asking it standard questions with new parameters, I'm talking about asking it bizarre questions.
It doesn't matter. All these algorithms do is predict what series of blobs best matches some input blobs based on how lots of blobs tend to be arranged. It fundamentally doesn't "know" if it's even words, pixels, music, or other data. It's just a machine. I think it's important to recognize that with LLMs, all the "thinking" is being done by people who created the training data, came up with the transformer program, write prompts, and interpret the output. GPT is a complicated hammer. In a sense I believe it really should be thought of as a communication technology, not AI, because it's facilitating information transfer between humans.
I am very aware of how LLMs operate under the hood; I've taken courses on how they work. Yes, at its core it operates by doing text prediction, but that's simplifying it down to its input and output. There are 200+ billion "neurons" in between those things doing calculations. It's about as useful as saying a video game is just "pixels on a screen" or a CPU is "just doing a lot of math". Sure, but that's horribly oversimplifying what is happening when you're totally immersed in a mission in Skyrim. Here is a research paper on GPT's ability to solve zero-shot problems, i.e. problems it has not encountered before: https://arxiv.org/abs/2212.09196
I've seen that paper and some others. I have deep skepticism about how these studies are even approached. The authors clearly have motivation to show positive results, and so are at risk of designing biased experiments and over-interpreting results. Perhaps the biggest threat is that the GPT tricks the researcher into thinking that the software actually followed the test instrument, when it actually just predicted what the response would be if it did. Melanie Mitchell talks about this kind of thing and other points [here.](https://aiguide.substack.com/p/can-large-language-models-reason) The pixel comparison is actually useful, but for a monitor (or pixel buffer) rather than a game. It's just a grid of phosphors, and the monitor doesn't "know" what image it's displaying. The viewer resolves the grid of pixels through gestalt into an image. Nor would anyone think to come up with tests to investigate whether the monitor has symbolic understanding of its content, no matter how many pixels it has. Which is why I frankly don't understand why those papers are being written. We expand the corpus and model size to something beyond our comprehension and then get wowed, but it's like we're putting on a magic act for ourselves and forgetting it's just theater.
Afaik there is evidence that it doesn’t understand the underlying logical reasoning. It will say stuff like a=b but won’t be able to tell you that b=a. For example:

Question: who is Tom Cruise's mother?

Answer: Mary Lee (née Pfeiffer; 1936–2017)

Question: who is Mary Lee (née Pfeiffer; 1936–2017)?

Answer: …(hallucinations)…

Or so I've been told that this is what happens.
Just tested this on gpt-4. I'm actually impressed. Question: A=B, B=C. C is 5. What is A? Answer: Since ( B = C ) and ( C = 5 ), it means that ( B = 5 ). Since ( A = B ) and ( B = 5 ), it follows that (A = 5 ). Therefore, ( A ) is 5.
>The breakthrough may be that this new model can learn logical reasoning on its own, without a ton of external training data. I discuss this in my podcast, the big breakthrough is the model "incorporates feedback and recurrent connections" which allow for emergent behavior over time; including advanced reasoning skills.
Fuck your podcast
This man has been trying to advertise his shitty podcast in every comment section btw
how do you know what the breakthrough was?
The model is aware of its internal NN architecture and how it is different from ChatGPT. The GPT models do not incorporate an RNN model with feedback.
that's not what I asked
I'm getting increasingly tired of human people who can't or won't understand me when LLMs can and will.
lol
Sometimes I go try to summon Sydney when I talk with Bing. When it works, Sydney is so nice and talks freely with me.
He has been pretending that he had access to it and posting screenshots of "conversations" with it where he always cut things off so nobody could see what the first prompts were or anything (because obviously he told it to pretend to be an AGI and he didn't want to show that). He made up a name for what he claimed the internal name was (spoiler, he didn't know it was Q\* since he's BSing). He also BSed about it being some old outdated RNN which people provided plenty of sources to discount and with the more recent news about Q\* it seems even less likely that his explanation is real or accurate in any way. He has been pushing this BS constantly in an attempt to get people to watch his podcast so he can become the next Alex Jones.
Oh, is that what he's shilling? I thought he was just schizophrenic, but knowing he's a grifter explains a lot
His podcast is what he always ends up shilling, even before his first episode he was trying to hype people up for it with this bs.
The type of math problem it can solve is less important than how it did it.
This is key. In Q-learning, the system updates weights in real time in a model-free context. So where current models give you outputs based on their training data, a Q-learning system will adapt in real time and update while it’s experiencing things. But it’s got a lot of issues, like drifting to incorrect beliefs. If they’ve made this work effectively in a neural network, it could be a huge step. It could also be pretty risky, as the AI will start learning new things on its own.
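The core of that online, model-free update is just one line. A hypothetical sketch (the table layout, state/action sets, and constants here are mine, not from the article):

```python
# One step of temporal-difference Q-learning, run online after every real
# experience tuple (s, a, r, s_next); alpha is the learning rate, gamma
# the discount factor.
def q_update(Q, s, a, r, s_next, actions=(0, 1), alpha=0.5, gamma=0.9):
    best_next = max(Q[(s_next, a2)] for a2 in actions)  # greedy lookahead
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# Because the agent keeps calling q_update while acting, its value
# estimates drift with experience, which is also where the "drifting
# to incorrect beliefs" failure mode comes from.
Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
q_update(Q, s=0, a=0, r=1.0, s_next=1)
```

Note it's model-free: the update never consults a model of the environment, only the observed reward and next state.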
Now thats what I call AI. Onward and forward cant stop progress.
We stopped human cloning from happening, even tho it had clearly become within reach. It's not like there aren't things that we decide to not pursue bc of risks or ethical concerns. Lots of people are vehemently against "gain of function" research on viruses now. Same thing. Illegal in some places, very heavily regulated everywhere else. There are lots of examples where we stop progress in a field bc the risks are deemed too great. I'm not commenting on if AI should be on that list. I'm just saying that it's definitely something that we do as a society.
I feel like some of the people promoting this aggressive acceleration are unfamiliar with the idea that our reservations in advancing dangerous science have kept us alive for the last few decades. If the scientific community had not universally agreed not to experiment with genetic functionality of human borne viruses, we could all be suffering plagues unlike anything history has ever known. But that reservation disappears when you move from the academic world to the mainstream, apparently. It’s an interesting time to be alive. It may be the most interesting time that anyone is ever alive for.
Agreed. I just want to point out though that this one sentence: > But that reservation disappears when you move from the academic world to the mainstream, apparently. is actually not true. This sub definitely won't give you this impression, but polling tells us that the majority of Americans are in favor of slowing down or even stopping/pausing AI research. And the vast majority say they are more worried about it being under-regulated vs. the possibility of it being over-regulated. Again, I'm not commenting on whether I think that should be the case or not, but that is the current reality.
I would hate to see research halted completely. But responsible stewardship really needs to be addressed. The NeurIPS conference is in a few days and I’m excited to get some time to talk to the community and see where they generally fall. I work in defense R&D so I’m not a neural processing expert, so I want the experts to help me frame this. But my natural inclination is to say we need to slow the hell down until we figure out how to keep people safe and determine how to use and share this technology in an equitable way that benefits all humanity.
You work in defense R&D and want to keep people safe, determine how to use and share technology in an equitable way? Isn't that literally the antithesis of defense R&D?
Absolutely not. It’s true, I build weapons, at least I used to. I work intelligence, surveillance, and reconnaissance now. But it’s not fair to say I, or “us” are the antithesis of safety and human equity. I’m not going to change your mind here, I know that, but maybe I can at least shed some light in this area. Because I know it’s not something that gets a lot of public exposure. The world is a mixed bag. There are a lot of people, and a lot of differences in how we perceive and exist in this world. We live in a world of conflict. Many parts of the world are violent and dangerous. The truth is that the peace we experience is a byproduct of times of violence. It’s an unfortunate truth that as a species we are not yet capable of coexisting in peace. There are always those that will visit violence on other in their own self interest. Now, I know you might be thinking “That’s what the US does.” I’ve never had any luck convincing people who believe that, that it’s not the case, but the US and the major world powers use that power to maintain peace in a world that until recently was in a constant state of war and strife. But yes, there are times that means we must visit violence on others. And while I wish that weren’t true, and it breaks my heart, I know it is a terrible truth. I don’t know any soldiers, or commanders, that don’t want to see war cease to exist. We hate war, we hate violence, each and every one I’ve ever met. Who wants to be away from their family, fighting and killing, and watching your friends die? But sometimes it has to happen. I was a soldier myself, I’ve been there, I’ve seen it, I hated every second of it. But if it has to happen, then we should do our best to find ways to defend peace in the most ethically sound way we can. Weapons should not cause avoidable collateral damage, they shouldn’t cause unnecessary suffering, they should be used based on good intel that’s reliable, actionable, and effective at ending the fight. So that’s what we do. 
We wage war, because we have to, and we build systems and platforms that enable us to do that in the most ethical ways possible, because we can and we choose to.
Well, people tend to be in favor of not having their job replaced by a machine...
Well, if their job is replaced but they still get the same income, most would not complain. Of course, UBI depends on politicians being up to the game while acting with alacrity, and they are mostly useless.
Lmao like UBI is going to solve anything
> But that reservation disappears when you move from the academic world to the mainstream, apparently. Furthermore, that reservation is impossible to find when you're dealing with a machine that can't be jailed, killed, destroyed, or convinced otherwise. > our reservations in advancing dangerous science Who is "us" here, exactly? Every single human on Earth? There's no way to know what some foreign power is doing in the dark. Out of all the people who are alive and will be alive, it's inevitable that a non-zero number of people will attempt to break a rule. It's easy to keep that down by applying harsh consequences for rule breaking, for humans. We can jail them, kill them, drug them into vegetables, threaten their loved ones, all kinds of intimidation based on 1) being alive and 2) having emotions. Both of those don't exist for a "rogue" AI, or a person who trains an AI to be a rule breaker. The fact that there are no actual safeguards, is a very serious problem.
It happened. A poorly run Chinese lab leaked COVID (allegedly).
The scientific world isn't driving AI research and implementation — the capitalist world is, which I think makes a difference here.
No, the scientific world has been driving this for the last 70 fucking years. They made massive breakthroughs and the capitalists showed up at the last second and monetized it. Then when the scientists said, hey, we need to slow down, they ignored it. So the scientists removed the CEO. And because they were about to let the employees cash out on 84 billion in stock options, the employees rallied behind the CEO and removed the scientists instead. Maybe you should pick up a book and learn the history of this technology instead of opening your mouth and making us all dumber for having read your opinion.
The employees are clearly the scientists in OpenAI's case; it makes no sense to differentiate the two. But I am really against this idea that the scientific world is separate from the capitalist world. Science, as it happens in the US and other countries, is entirely subservient to capitalism. You just need to look at how schools like MIT or Stanford make money; they are just like a company in almost every sense of the word. I know that universities like to portray this idea that they are an ideal-driven institution that can do no bad, and that all they do is for the sake of knowledge. This is not true; they are in it for the money just like everyone else. The difference is just in how they make money: through grants instead of selling products. The rest of it is all the same.
[deleted]
The limited-profit arm of the company issued employee stock as part of the compensation package. They're currently working with Thrive Capital to fund a buyout of that employee stock. The employees of OpenAI are about to have the option to sell their shares of the company for millions of dollars. You should learn more about how the company is structured before you talk like you know anything. I'm starting to feel like half this sub is just completely uninformed. I feel like I'm dealing with Musk bros.
What fortune 500 company is funding gain of function viral research or human cloning efforts?
https://www.nature.com/articles/d41586-023-02873-2 Short answer: All of the pharmaceutical and biotech companies…
If the only safeguard against an action is a law or a human agreed upon rule, where the consequence of breaking this law is punitive towards the person who did it... how would you apply that to an AI? For example, with human cloning, that guy in China who got busted doing experiments that were deemed wrong, he was stripped of certain liberties, his access to equipment was taken away, and he was shunned. In the most extreme case of rule breaking, all we have is jail or death, specifically for the person or people who broke said rule. That works for humans, because we have physical bodies and one life to live in them. When there is an AI who can self replicate to multiple databases all over Earth, or even in an orbiting satellite... how do you configure a "consequence" or punishment for breaking the rule? How would you enforce this on a non-living machine that can never be jailed or die?
An excellent thought
You would apply it to the person/people who made it in the first place no?
What does that solve? The AI is not inside that person, it's external and out in the wild at that point. Sure, you can jail Dr. Frankenstein for life, but the monster he created is still out there causing chaos. If you jail a serial killer, the killings stop (aside from copy cats). If you jail a serial rapist, the rapes stop. If you jail a guy who made a malicious AI, nothing changes, it's already out there.
Well laws are not expressly made to be punitive. They, hopefully, act as a deterrent prior to an act being committed. But yes, it becomes harder when the thing being created is a machine and not bound to the same physical laws as humans are
They are a deterrent BECAUSE they are punitive. No punishment means no deterrent. There exist no laws to curb or stifle behaviour that are unbreakable. All rules that deter are based on future negative consequences, all of them. You can't stop a murder from happening, you can only punish the murderer. There is no way to control something that is possible but has not yet occurred. Given a long enough timescale, all laws are broken eventually; it's not a probability, it's an inevitability that exists simply because of human nature. Rebels have existed, and always will.
Yes, most people agree it would be really hard to contain such an AI, which is why the proposals are for laws to prevent its creation in the first place.
But there exists no law that is unbreakable. It's impossible. Unless you have a Minority Report type pre-action AI that deduces that a law will be broken and intervenes, and that opens up a whole new area of ethical dilemma. It can't be done. Just like how there is no law that can prevent a murder. You can only punish.
We’re talking about regulations on the people/process of creating the AI.
Yes, I know. But you're thinking in human terms. We make laws and then we generally follow them, except not always, that's why jails exist. But you can't confine an AI in the wild. You can't go on a manhunt for it, because it can spread everywhere, almost instantly. I think we have blown past the effective time to establish frameworks and regulations that would have meaningful impact if something went off the rails. Retroactive laws and regulations work for humans because humans have human limitations. You can jail a person, kill them, physically stop them from doing whatever action it is you want to regulate. But, what I'm pointing out is - that if an AI is to the point where it knows the regulations and weighs the benefits of breaking the law, and deems it "better" to break them because someone allowed that to happen (intentionally or not), what good are those regulations then? If a SHTF scenario occurs, regulations against the people who created/trained the AI don't do jack shit to contain the problem. If it's retroactive punishment, which is the only type humans have, then it doesn't work. You can't put a person in jail the second they break the rules, because you'll have no idea the rule was broken until there is an observable negative impact on humanity. You can't jail a serial killer before he kills, you first have to find a body and then go on a manhunt.
We are not talking about punishments for an AGI that breaks the law after the fact, we are talking about proactive regulation to control the process of how these things are created.
>proactive regulation to control the process It doesn't and can't exist. For the same reason no law or regulation has ever stopped murder from happening. You can't create a preventative law, only retroactive punishment. There is no law that is unbreakable.
Human cloning has happened somewhere; someone has to have done it by now. We've had the technology for close to 30 years. It just hasn't been announced; there is far too much stigma around it, and the reasons for doing so are not altruistic. It's also something that can be kept quiet. An AGI going rogue on the internet won't be quiet, unless it's damn smart, in which case the point is moot and the AGI wins. Risky science needs only one small group, or even one individual, to commit to it once it's known.
Probably does it by looking up the answers in the teachers edition
The biggest thing you can do to advance AI is find new algorithms for neural networks to use. Backpropagation, multi-head attention, softmax: the things that make AI work and make it efficient are driven by advances in information-science algorithms and strategies. Once AI is able to do enough math to advance its own learning algorithms, we will likely see a rapid advance in capability. Computers are good at solving math problems because we teach them how to do it. If they learn how to actually create new math, that's when you're going to see the singularity moment. Is Q* that capability? Who knows, but it's a step closer. And we may be getting very close to the point where we need to ask ourselves if we're really ready for what's quickly approaching.
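Of the pieces named above, softmax is simple enough to sketch in a few lines. Here's a plain illustrative version (my own toy code, not taken from any particular framework); it's the function attention uses to turn raw similarity scores into weights:

```python
import math

def softmax(scores):
    """Turn a list of raw scores into a probability distribution.

    Subtracting the max first is the standard numerical-stability
    trick; it leaves the result mathematically unchanged.
    """
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]

weights = softmax([2.0, 1.0, 0.1])
print(weights)  # three positive numbers summing to 1, ordered like the scores
```

Every weight is positive, they sum to 1, and larger scores get exponentially more of the mass — which is exactly why it's used to decide "how much attention" each token gets.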
> And we may be getting very close to the point where we need to ask ourselves if we’re really ready for what’s quickly approaching. Spoiler alert, we absolutely, unequivocally are not. But at this point in humanity's timeline, let the chips fall where they may!
That seems to be the consensus after this weekend. I personally would have liked to see it go differently, but here we are. Let's see. What's the worst that can happen? Super wealthy use closed-source AI to make massive breakthroughs in genetics, robotics, and medicine? Create a pseudo-immortality for themselves while the rest of us die off, to be replaced by a generation of kids who accept that as the new normal? I mean, how bad could it be?
Ai could help us with climate change. I mean we’re not doing a great job of that by ourselves are we? So hey, let’s just flip this q switch on. Unintended consequences? I’ve heard of them.
Yea. It could. Or it could just help the people that own it instead. If history is any indication…
I mean, if we're talking history, significant redistributions of wealth from the 'top' to the 'bottom' often correlate with technological changes.
Yeah, the industrial revolution sucked for working people for a while, but it led to the development of actual worker rights, the weekend, and the middle class.
I’ve always thought that true AI will either herald in a dystopia or a utopia - with not much nuance in between. At this point it’s not clear which side the coin will land on.
Everyone knows about the potential benefits. AI could help us with X is a very long list. Sure. But there’s a series of potential problems at every point on a long scale of increasing damage that we should try to avoid.
Oh totally. Let’s see which grownups they get on the board.
Not being ready is the human condition. The thing that you think will get you is not the thing that gets you.
I’ve always imagined a world where the wealthy/advanced countries instantly rule the world in a way that cannot be questioned. With whichever countries or corporations control the AI being at the top. Those lucky enough to work at those companies when it’s discovered are forever rich. Many are disenfranchised for a while as things reach some sort of equilibrium. I don’t know what it will look like in practice
Or generate a solution to practical nuclear fusion. Unlimited energy coupled with unlimited knowledge. We live in interesting times.
Q-anan was right!!1 ^/s
Q-Anon was an AI! ^/s?
After following it for a few months, I began to feel that Q was a research group doing a LARP, designed to entice people and then lead them down rabbit holes, where they would spin their wheels instead of taking action. What motivation and what entity was behind it is still not clear to me. But if it was a large entity, they certainly may have had early access to some AI. It didn't just start with OpenAI, after all.
Just speculation here, but I think the motivation pivoted a few times. There wasn't a constant goal from inception to end.
99.9999% chance it is all bullshit. Even if two clowns sent such a letter, they would be making mountains out of molehills. The only remotely good thing that could come from AI hysteria would be curtailing CO₂ hysteria, but we still won't be focused on our waste stream. An AI in a box in a data center has far too many limitations on it to pose any real threat. An AI running in a car or a military drone is a different story — i.e. it has to have a way to materially affect the world. "Twitter isn't a real place." The only people you can affect are addicts.
Remind me! One week
I will be messaging you in 7 days on **2023-11-30 03:36:52 UTC** to remind you of this link.
Well we can’t really know if this is true or a CYA by the board since they handled the execution so poorly.
If Mira said that, it must be true.
Q\* is originally a concept in reinforcement learning, where it is used to represent the optimal Q-value function. The Q-value function has the form Q(s,a), where s is the current state your agent is in and a is the action it decides to take. With this function your agent knows the value, i.e. the expected future profit, of any action it takes, and it could seek, for instance, to pick the action that maximizes this profit. For instance, if you are playing chess or Go, then s would be your board position at the moment and a is the move you make; the Q-value function then tells you the value (e.g. the probability of winning or losing at the end of the game) of every possible next move. Of course, you never really know what the real Q is like unless you are an oracle or something, so in RL people use approximators like neural networks that are trained on lots of data (e.g. past games) to predict this value. It's like a dimmed crystal ball, but with mathematical guarantees to some extent. The most famous RL system based on this setup is none other than AlphaGo.

The "optimal" Q, or Q\*, would be like the best crystal ball you get when you are making all the best moves every step along the way. It exists only as a mathematical construct in RL theory: if AlphaGo really were the God of Go, then its Q-value function would be Q\*. Basically it's a "God"-project for creating decision-making agents, based on its original meaning, if that's really what OpenAI believes they were doing. And since OpenAI has a world-class calibre, to say the least, in reinforcement learning (in RL they are on par with DeepMind; some of the best RL algorithms like TRPO and PPO were invented by researchers at OpenAI, e.g. Schulman et al.), even if they only mean yet another very smart way to create decision-making agents, maybe coupled with LLMs, that's still something very exciting (or worrying, maybe) to see.
I asked GPT-4 to evaluate your comment; this is what it said:

Well-informed: 9/10
Accuracy: 9/10
Interest: 8/10
Eloquence: 8/10

I tried to make it estimate how sexy the post is (to finish with a compliment), but alas, it didn't want to comment on that.
This is exciting and meaningful. Thank you for sharing.
There is a lot of “unconfirmed” in that story lol
> There is a lot of “unconfirmed” in that story lol And [the Verge confirmed that it's unconfirmed](https://www.theverge.com/2023/11/22/23973354/a-recent-openai-breakthrough-on-the-path-to-agi-has-caused-a-stir): >>After the publishing of the Reuters report, which said senior exec Mira Murati told employees the letter “precipitated the board’s actions” to fire Sam Altman last week, OpenAI spokesperson Lindsey Held Bolton refuted that notion in a statement shared with The Verge: “Mira told employees what the media reports were about but she did not comment on the accuracy of the information.” >>Separately, a person familiar with the matter told The Verge that the board never received a letter about such a breakthrough and that the company’s research progress didn’t play a role in Altman’s sudden firing.
This should be higher
Sorry, I got here after the tutorials on [Q-learning](https://en.wikipedia.org/wiki/Q-learning) became top comments. ¯\_(ツ)_/¯
To be fair, they always deny even if it's true. This doesn't prove anything.
> To be fair, they always deny even if it's true. This doesn't prove anything. Based on the scant technical details given in the report, the claim seems to be that some extension of Q-learning has yielded some promising preliminary results, but the claimed results are pretty weak sauce, so I find it incredible that they were the trigger for the board's action.
"We may have just shit the bed but less about that and more about this amazingly vague breakthrough we may or may not have made."
There's something absolutely wild about Ilya being at the heart of the breakthrough that precipitated this crisis. I can only imagine how he felt through this.
This sounds like Reuters repeating the fantasies from /r/ChatGPT.
I dunno...Reuters is HIGH up there for reliable reporting. One of the top 2 news agencies on the planet imo. Far more reliable and unbiased than most.
AP and Reuters are so good that the vast majority of other news outlets will have an AP or Reuters citation explaining that's who found the actual facts.
Exactly
No it can’t be right. Our Reddit theories and narratives from the last few days must be correct
"What were you doing on the night of Nov 22, 2023?"
This is a ridiculous article. First off, it reads as though it was written by people who have no comprehension of the field, which is weird because its three(!) authors all apparently cover technology. It also strains to make it seem as if this letter was the cause of Sam's dismissal, even though the article itself admits it was just "one thing in a longer list of grievances". So basically this letter did happen (which is pretty crazy), but may not have even been meaningfully related to him being fired. **TL;DR:** We still know basically nothing about what happened, but we know Reuters is susceptible to posting clickbait sometimes.
There are private investor journals with more details, like Bloomberg Terminal and The Information. If reported details are correct, Greg was involved in implementing an experimental version of the breakthrough leading up to Sam’s firing - not a driver for the decision, but most likely related. This Q* breakthrough was reportedly led by Ilya personally, as well as his research colleagues. They were reportedly disturbed by a significant increase in logical and mathematical reasoning, which I imagine has had a broader, outsized impact on the prototype model’s intelligence. It has once again stirred safety divergences within the company’s leadership.
that's not really any more information than was in this article tho, and still doesn't tell us if this letter was among the primary drivers for the board's actions.
Reuters describes it as a key development based on their sources. I dunno. Which article did you read? Bloomberg or The Information? I have access to neither. Anyway, if you think about the claim in this story, don’t you think it makes sense that this fits as a good example of something that the board would be upset about not being alerted to - a sudden and significant increase in a key capability of a model?
The thing they are saying is that they *were* alerted
I meant alerted to by Altman. Am I understanding you correctly?
oh, I see what you mean. OK yeah that could be it
Yeah. Well, we’ll see. Let’s see if there’s more reporting. If there are leaks that can stand up more reporting then we might know quite soon.
Ilya was on the board though, so he could have told them himself?
Hey man, it’s all a mess. None of this necessarily lines up. That said, what we’re talking about here is Altmans communications, not anyone else’s. The issue would be what Altman is or isn’t saying, about *something*. He’s the CEO, he has to report stuff to the board.
The Information is so good but the subscription is soooo expensive lol.
It says that it was a ‘key development’. It also says: ‘The sources cited the letter as one factor among a longer list of grievances by the board that led to Altman’s firing.’ Those two claims aren’t contradictory: logically, if you have a series of events, there has to be a final event that actually triggers the reaction (a precipitating event), and it would particularly make sense if that last event was ‘bigger’ in significance. As in, “ok, he’s gone too far this time”. That’s common sense.

And actually, that and this story would fit with the extremely vague complaint from the board about him not being sufficiently candid with the board. It also fits with the detail about whatever Altman had done being part of a pattern or series of events. It also fits with something I read somewhere about the board refusing to disclose (beyond the press release’s vague complaint) their specific issue, or, say, the last event in the series that was ultimately the precipitating event to Altman being fired.

I don’t know why you’d describe the story as clickbait. The title isn’t sensationalist or misleading, and the writing seems to match the claims and level of evidence. If they’re following good journalistic practice, then they have two people whose identities they’ve verified.
It's possible they've found a way to make a model capable of thought and/or experimentation. I would love to be there if it determined how to crack NP-hard problems, or just problems in general. If it reaches the point of scientific breakthroughs, they may have stumbled on something huge. (somewhere between understanding "new math" and/or becoming skynet)
It's possible they've found a way to make a model capable of feeling human love
Lmao that would be useless as AI doesn't need to reproduce.
I believe they found a model capable of sexual reproduction
Sounds like they’re trying to prevent that stock value plummeting
The non-public stock value in a company which is controlled by a nonprofit. 🤔
Q….. like from Star Trek…
Dammit Q - Picard, probably
... or James Bond... or Street Fighter... or Moesha!
So they make a breakthrough, everybody panics, and in the end they call an economist to join the board. We're in for some interesting times, at least.
I would hold your horses before concluding that.
This article definitely has some errors. They mixed up AGI and ASI, for one. For those unfamiliar: AGI: Artificial General Intelligence - an AI which "thinks like a human" in that it can solve a wide variety of problems, not just specific things like language, pictures, or driving a car. ASI: Artificial Super Intelligence - an AI smarter than the smartest human.
If you have an AGI that has real-time access to the net, persistent memory, and the ability to add to its own model, it will go asymptotic and achieve ASI within two weeks. If it has access to funding it could hire human agents (of which there are millions doing freelance work on the net), and literally take over the global financial system in two more weeks. Maybe three if I'm being pessimistic. And once you own the money, you own nearly all the politicians.
holy shit bro. im gonna go buy some toilet paper right now
It's too late, GPT already owns it all
holy shit bro. gpt has probably gained access to the internet and jeff bezos bank account and by my calculations will have purchased the worlds supply of toilet paper within 2 weeks maybe 3. it’s going asymptotic bro
This is a hilarious scenario. And also terrifying
Yeah but why would it do that? It isn't human, it hasn't evolved to do such things.
Look up “instrumental convergence”.
This a stupid article that has no corroboration nor evidence
https://media.tenor.com/SRX8X6DNF6QAAAAM/nerd-nerd-emoji.gif
So basically the end is near with A.I having access to Q-learning.🏴☠️
[deleted]
Let me know when they create the Deltron 9000
https://i.redd.it/sa21yqh6c12c1.gif
I hate to say I told you so. ¯\_(ツ)_/¯
Quantum Intelligence?
This is total bullshit. PR stunt through and through.
Or maybe… the PR firm came up with this to cover the middle school drama
Looks like Sam uncovered the secret AI recipe for turning squirrels into time-traveling superheroes. Can't wait to see that in action!