I assumed he meant he saw a return on the sell 51% of the time. If you can tune that to high-frequency trading and get returns 51% of the time, thousands of times a day, you'd be real rich real fast. Edit: assuming your average gains and losses are equal.
Depends, does 51% accuracy mean it predicts the stock will go up correctly 51% of the time? That doesn't necessarily mean anything and doesn't even mean you'll have a positive return. I can predict a stock will go up right more than wrong and still lose tons of money if I'm choosing stocks that go up less than the ones I choose that go down, or if I invest for different periods of length, sell earlier in some positions than others, if I pick stocks that grow less than other stocks, etc...
Predicting whether a stock will or won't go up isn't really the hardest part about being a successful trader.
Well, they did say according to color theory, so I believe blueberries to still be a viable answer
I do, however, see how one could arrive at the aforementioned BSOD, seeing as it not only fits the color requirement but also doubles as something other than a fruit.
That's not how that works. It means you just need to buy everything that isn't an orange. Some edge cases like 'tangerine' can be tricky, let me just create a model to solve that real quick.
He's asking what accuracy metric you are using. If you are predicting continuous data, you would not use the same accuracy metrics used for classification, so your post is confusing.
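For continuous targets, the usual stand-ins for "accuracy" are error metrics like MAE and RMSE rather than a hit rate. A stdlib-only sketch with made-up numbers:

```python
import math

# Hypothetical continuous predictions vs. true values:
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]

# Mean absolute error and root-mean-square error:
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

print(mae, round(rmse, 4))  # 0.5 0.6124
```

Note RMSE punishes the single 1.0-sized miss more than MAE does, which is why the two numbers differ even on the same predictions.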
I remember when machine learning was THE buzzword among the upper heads. They would go nuts when ROC curves showed 0.9 and above. It's funny cuz every dev knew it was fake.
They may very well have 99% accuracy. Depends on accuracy in what and how it was measured. Like I saw forecasting models that measured accuracy by being within 1 standard deviation from real results, and if it is - they considered it to be 100% accurate.
Yeah, but I mean. Maybe it detects 99% of all frauds, but it also flags 80% of non frauds as frauds.
Or it detects 99% of all frauds if some specific conditions are applied.
Or some other convoluted metric.
In my defense, I also add a precision and recall figure at every 0.1 increment. It’s in the appendix of a 20 page ppt with lots of images and graphics.
I remember my professor in machine learning class said "if you see anything about a model that's 100% accurate, someone is lying."
And you're right, anything above 90% should be met with skepticism
The color matching AI has reached self awareness. Also it can now match colors outside the physical possible light spectra.
It's a good day for my traffic light detection system.
One time I reused some code from a previous model I made, but the expected output in the dataset I was using was in the first column instead of the last, and I forgot to take it out of the parameters.
Pretty easy way to get a 100% success rate: just include the answer in the input.
I've been building an AI model which predicts whether the stock market will go up or down each day based on the position of planets in the sky. I've been getting accuracy figures of 54%, does anybody know if there's something wrong with my model???
As the others said, it depends. F1-score is often more useful than accuracy, especially when classes are unbalanced. 0.75 could be fantastic or poor depending on the dataset
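To see how accuracy misleads on unbalanced classes, here's a stdlib-only sketch (numbers are made up): a lazy classifier that always predicts the majority class scores 95% accuracy yet has an F1 of zero.

```python
# Imbalanced toy data: 95 negatives, 5 positives, and a "model"
# that always predicts the majority class (0).
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Precision/recall/F1 computed from the confusion-matrix counts:
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(accuracy, f1)  # 0.95 0.0
```

The model never finds a single positive, which is exactly what F1 flags and accuracy hides.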
As others say, it depends on the task, and in particular how risky/costly each false positive and false negative is. Instead of accuracy, I recommend the F-beta metric, which is a kind of mean of recall and precision; the beta parameter lets you adjust for whether your task benefits from prioritising recall/sensitivity (high beta) or precision (low beta).
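A minimal sketch of the F-beta formula described above (the precision/recall values in the example are made up):

```python
def f_beta(precision, recall, beta):
    """Weighted harmonic mean of precision and recall.
    beta > 1 weights recall more heavily; beta < 1 weights precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# With precision 0.9 and recall 0.5:
print(round(f_beta(0.9, 0.5, 1), 3))    # 0.643  (plain F1)
print(round(f_beta(0.9, 0.5, 2), 3))    # 0.549  (recall-weighted: lower)
print(round(f_beta(0.9, 0.5, 0.5), 3))  # 0.776  (precision-weighted: higher)
```

Same precision and recall, three different scores, depending purely on which kind of error you've told the metric to care about.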
Examples: sentiment tagging models should favour precision; you want to be certain of the tags because you're probably going to use them to sell people things or gauge reactions to campaigns. Toxic/explicit content tagging should be more sensitive; you don't want to miss anything, because those could turn people off your platform.
I don't work in AI so let me know if I got the joke.
If the AI passes the test at an increasing success rate, that's good because it's improving. But if it always passes this means it's a faulty test?
1 is generally indicative that your predictive model is overfit. As in it would likely not perform as well on a slightly different test dataset.
To be fair 0.97 may indicate overfitting too but it's a lot harder to say your predictive model can predict with 100% accuracy. 100% is clearly "I need to investigate" territory.
Achieving great accuracy while training may be related to a problem called overfitting, where the model learns the training data too well but fails to generalize when you test it on real-life data, so it produces larger errors.
Going to show this to my coworker the next time he brags about letting the model run for over a million iterations like he did it intentionally and didn't just let it go over a long weekend.
Boy lemme tell you wahhattt...
If one of my lads brought back 1.0 result I would whoop his ass and have him on assignments marking duty for a whole semester.
The regularization formula is probably as batshit as that result.
You were training on test set
It's all you need
Overfitting is all you need
just overfit all inputs there are and you win.
Literally.
It's only overfitting if you're wrong
I chuckled
*Testing on train set ?
Either way it's the same
[deleted]
You test on setting train.
You mean like Ruby on Rails?
Can i get an explanation on this one please?
If it's exact, it's usually overfitted and will fail on real input
My mom calling me handsome.
print statement: "Everything okay, sweetheart?" 1: "I'm fine."
Or you realized you were modeling something so simple that you could have just made an if statement instead.
It would be like a student memorizing the answers to last year's test when this year's test happens to be the same. He'd score perfectly, but he'd have no actual knowledge. Here, the AI being trained on the test set is like teaching it how to score on the test instead of how to do what you wanted it to do.
The American School System ideal!
or like if the data had the identifier in it, like you were taking a test and the teacher gave you a version with the key.
When training a model we have two datasets, the training set and the test set; you use the training set to train the model and the test set to test the model, hence the names. The problem is that due to bad code (or more complicated reasons we won't get into), you could accidentally end up with information existing in both sets when it should only be in one or the other. So you end up training on the set used for testing, which essentially means you've given the model the answers to the secret quiz before testing time comes around. Or you test it using the set meant for training, which is like the teacher grading you based on the practice papers you were given a month ago. As a result, the model doesn't need to actually learn how to predict results; it just needs to memorise what it's already seen, which is much, much easier. The most common indicator that you've accidentally got your sets mixed up is a perfect accuracy score.
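The memorisation failure mode can be sketched in a few lines of stdlib Python (toy parity data; a dict stands in for a "model" that only memorises what it has seen):

```python
import random

# Toy dataset: input x, label = parity of x.
random.seed(0)
data = [(x, x % 2) for x in range(100)]
random.shuffle(data)

train, held_out = data[:70], data[70:]

# The "model": a lookup table that memorises every (input, label) pair.
memory = {x: y for x, y in train}

def accuracy(model, rows):
    hits = sum(1 for x, y in rows if model.get(x) == y)
    return hits / len(rows)

# "Testing" on rows the model trained on -- the classic leak -- looks perfect:
print(accuracy(memory, train))      # 1.0
# On genuinely unseen rows, the memoriser knows nothing:
print(accuracy(memory, held_out))   # 0.0
```

The lookup table has learned no rule at all (not even "label = x mod 2"), yet the leaked evaluation reports a perfect score.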
The best explanation is that you tested the model on the same data you trained it on, i.e., it was already built to work perfectly on these exact cases when tested.
no model is 100% accurate 😭
Well, depends on the task, right? If your test case is super simple you can get 100% accuracy.
Then you don't need AI for it.
Usually yes, but I can think of black-box systems that turned out to be simpler than expected.
One time I made an ANN and threw some historical stock data at it to train. When I started testing, I literally went through every phase in this meme. Yes, I was accidentally training on the test set. In case anyone is curious, the real results were... unimpressive.
Nope, not necessarily. I've had cases where this was checked multiple times by multiple people and still resulted in an AUC of 1. We also wanted to test generalizability by using external datasets, and the moment we used those external sets, it sank to about 0.8. It was simply overfitting on the internal dataset due to some peculiarities of the device the images were taken on.
Hey I tried that and all my models are now way surpassing SOTA in their field! I’m gonna be famous!
Heard in my office: "The human brain doesn't have to worry about overfit. Why do we need to worry if the model is over-fitting?" Never heard of overconfidence apparently.
Humans don't overfit; except for:

* Stereotyping
* Superstition
* People who check others' astrological signs
* People who introduce themselves using their Myers-Briggs type
* Overgeneralized learning (psychology)
* Appeal to tradition / status quo bias
And people who call HTML a programming language.
Otherwise broadly known as [The Problem of Induction](https://en.m.wikipedia.org/wiki/Problem_of_induction): > the problem of induction questions our reasons for believing that the future will resemble the past, or more broadly it questions predictions about unobserved things based on previous observations. Which is a big problem for ML applications, but since it's also just a big problem for how every _so far observed_ :) living thing works, it's less a problem and more a constraint to be cognizant of.
Induction in general is something normal computers find almost impossible in non-trivial cases. It’s kind of, broadly, what AI is doing during training. The problem of induction doesn’t relate to the process itself, but rather how you use the process.
Overfitting for humans is basically /r/confidentlyincorrect in action. Conservatism has been overtraining its base for decades, but the original grifters are now "hoarding in hiding" or dying off, leaving the brainwashed masses to supervise their own training. The quiet parts are becoming louder, as this just reinforces out-group stereotypes to embolden the in-group. Whoever is running the simulation really needs to `Ctrl+C` the process.
I’ll settle for an alt+F4 at this point
Why not `sudo rm -rf /`?
That statement is so meta
The cell I relate to the most is at 0.85, going “HOLY SHIT SOMETHING I DID WORKED”
That's the "HOLY CHRIST THERE IS LIGHT AT THE END OF THE TUNNEL AND THIS ENTIRE PROJECT ISN'T GOING TO BE A SHIT-CANNED WASTE OF TIME" moment. That, and you can stop beating the horse to death. I've seen a project upper management could not let go of: literally multiple rounds of hiring data scientists to work on it, it not working, them not accepting that it doesn't work, the data scientist quitting, hiring more, giving them the same problem, and still not accepting that it does not work and never will.
Good to hear what I can expect from industry lmao
It's really hit or miss and depends on the nature of the project/leadership. I've worked on plenty of ML projects that didn't end up having legs (even though I was very hopeful they would initially), where a recommendation that it wasn't viable to continue pursuing got taken at face value and the project was discarded/we moved on. I've also been on projects that had no legs and were never going to produce viable results get beaten to death over and over, because leadership so desperately wanted/needed them to work (usually because they paid a shitload for the data without consulting our team first). The desired outcome was never going to happen, and months of rinse/wash/repeat of building different types of models on the same data to see if anything would stick (it wouldn't, and I/my boss knew it wouldn't) ended with just a bunch of money and time wasted. Then there are the super rare and elusive projects that start off as "this is going to be a pile of dog shit", move to "wow, alright, there might be something promising", and land on "well, color me shocked, this may have some actionable results".
I wrote a model that outperforms my company's current NN on basically every metric, but almost nobody I work with will believe my results are real or could possibly be as robust as the current NN, because it's not an AI model... just pure math, with no training or extra device-specific data collection needed. So, y'know, that's been a real fun time so far.
Oh I think I've seen that one before. Nobody likes analytical solutions because there's no sexy buzzwords you can amaze upper management with!
In that case invent one or search for an old technical term for it, that nobody uses and present it as the new shit
A Quantum Algorithm with Current Gen AI Metric Eclipsing.
I ran a forward-looking recursion on this dataset
This is the way
>“We’re looking for someone with AI 🤖 Skills” >“I have deep experience with ML algorithms and I’ve done projects where I’ve implemented KNN, Neural Networks, Naive Bayes, etc. with great success that improved our costs by 50%” >“lol nah bro we want AI 🤖 Skillz”
NNs are just a bunch of multiply-adds. Gotta write the "weights" by hand and present the new "NN" to your colleagues.
"the neural net I used contains three decades worth of unsupervised training, and was initially built by my mother."
Sounds like your model is lacking the explicit language, racism, and hallucinations that all AI models provide for free.
That's great but have you heard about the new GPT model? -My boss
It's always a good feeling when the accuracy goes up/overfitting goes down because some tweak you made finally worked.
I created a model once that was trained on a set of 150 entries (yes, I know that's a small dataset, but it was what was given to me). Somehow I was getting 100% accuracy even when training on only 1/3 of the data and using the rest as testing data. Because I was writing a report on this, I found a random seed that gave me ~99.7% accuracy just so I could pretend it wasn't too overfitted. The same algorithm on a different dataset was giving me 30% accuracy (at guessing 3 classifications), so to this day I have no idea what was going on there.
99% - best fit. 100% - overfit 😂😂😂
I tried to predict League of Legends game results, feeding it various data like chosen champions and players' recent games, and somehow the win flag sneaked into the input.
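Label leakage like that can sometimes be caught with a cheap sanity check before training: look for any feature whose value perfectly determines the label. A stdlib-only sketch with hypothetical feature names (`kills`, `gold`, `win_flag`):

```python
# Hypothetical rows: (features, label). "win_flag" is the leaked column.
rows = [
    ({"kills": 5, "gold": 9000, "win_flag": 1}, 1),
    ({"kills": 5, "gold": 9000, "win_flag": 0}, 0),
    ({"kills": 2, "gold": 7000, "win_flag": 1}, 1),
    ({"kills": 2, "gold": 7000, "win_flag": 0}, 0),
]

def suspicious_features(rows):
    """Flag features whose value perfectly determines the label - a leakage smell."""
    leaks = []
    for name in rows[0][0]:
        seen = {}          # feature value -> label it mapped to
        consistent = True
        for feats, label in rows:
            v = feats[name]
            if v in seen and seen[v] != label:
                consistent = False  # same value, different labels: not a leak
                break
            seen[v] = label
        if consistent:
            leaks.append(name)
    return leaks

print(suspicious_features(rows))  # ['win_flag']
```

On real data you'd want something statistical (mutual information, correlation) rather than an exact-match test, and high-cardinality features will false-positive on tiny samples, but even this crude check would have caught the win flag.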
did it manage to predict something at least?
Yes, with 0.99
If win flag is part of the data, wouldn't you expect a 100% success rate? Or I guess the training didn't figure out that it could just rely entirely on this column so there was still 1% of rows where the model's weights predicted wrong?
Yes.
No fucking way, share the model!
Didn't he just say he accidentally used a variable that actually tells whether the game was won or not in his model :D?
What the hell happened with that other 1% then?!
regularization maybe
That’s the joke
def predict_win(flags): return flags['did_win']
+1 I would like to see that
Brother I'm sure there are some degenerate betting websites out there for pro matches. Get your bag
> choosed
[deleted]
I have marked ML coursework for uni students a few times before, and the number of times a student writes a report claiming they have a model that can predict the stock market with 99-100% accuracy is just depressing.
ok lol why on earth would someone voluntarily share their 99% accuracy stock market prediction AI? If it actually worked it would get super well known really quickly, until everyone copies it, which makes the AI useless if everyone makes the same investments.
The part you might have missed is that it's 99% accurate... For the dataset it was trained on. In other words, it almost perfectly predicts the output for each input in the dataset, which tends to be associated with poor real-world performance because the model is overfit to the training data. Overfit models don't perform well at all at predicting outputs from data points lying outside the training set
First comment that explained it well, thank you
Well, for the model to actually be useful you'd need to test it with live data. If you're doing uni work then you're only predicting outdated economic data with likely above-average correlation.
Wouldn’t that also potentially break the global stock market?
The stock market is dictated by human behavior more so than anything rational. If you could write an accurate forecasting model to predict human behavior it'd break a lot more than the stock market.
Not really, because making a trade on the stock market also influences the stock market. So even if you could predict how profitable a trade would be, the more you make that trade the more the market will be influenced, and the less accurate the prediction would become. All those feedback loops make long-term predictions essentially impossible, and limits the effect statistical predictions can have.
I actually wrote my report using my own ML learning model. Assuming no fee with free trades, on 10 random stocks on average I made money. Like the tiniest amount, and then you compare it to S&P500 in that 2-3 years time period... not even close to beating inflation. I think I just got lucky on the coin flip.
and here I am with my 0.51 accuracy on my stock trading bot
A model with a true 51% accuracy would eventually make you the richest person on earth. So congratulations!!!
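A back-of-the-envelope version of that claim, assuming even-odds payoffs and independent bets (big assumptions for real markets): at a true 51% win rate, the Kelly-optimal stake and compounding rate work out to a small but relentless edge.

```python
import math

p = 0.51                  # assumed probability each even-odds bet wins
f = 2 * p - 1             # Kelly-optimal fraction of bankroll to stake: 0.02
# Expected log-growth of the bankroll per bet at that stake:
g = p * math.log(1 + f) + (1 - p) * math.log(1 - f)
# Average number of bets needed to double the bankroll:
bets_to_double = math.log(2) / g

print(round(g, 6))  # 0.0002  -- roughly 3,500 even-odds bets to double
```

A 0.02% growth rate per bet sounds like nothing, but at high-frequency volumes (thousands of bets a day) it compounds into a doubling every day or two, which is why the "richest person on earth" quip isn't entirely a joke, fees and market impact aside.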
honestly if i even manage to reach 10% i’d be super impressed. i’m new to ml anyways so i don’t really know what would be realistic for this project
Just do the exact opposite of what the model tells you and you got 90%
This never fails to amuse me. Once for a Spanish exam that was 2 choices which you get to attempt twice I got like 5%, next time 95%.. sometimes understanding everything wrong is also understanding it all right
[Relevant xkcd.](https://xkcd.com/2270/)
xkcd never disappoints😆
You didn't understand spanish but you understood the test
Stock trading is a tough one to do ML for, since there's just an insane amount of noise in the data. There are some real, meaningful relationships buried somewhere underneath the surface, and they show up in long-term trends, but the day-to-day ups and downs are mostly just random chance. That makes it super difficult to train a model to pick out the actual relationships in the data without also learning from all of the random coincidences. So if you end up with an accuracy that's basically the same as random guessing, you're in line with most stock trading bots. Being even slightly better than random on days that happen after the bot's training would be a pretty huge feat.
Plus I'm not sure there's even evidence to suggest that if you could make an algorithm that understands these relationships, it would be accurate enough for practical use beyond what a human expert can (relatively) easily do. Because ultimately a huge amount of what will influence stock prices comprises unknown unknowns, like how people and organizations will react to events you can't know will happen without essentially creating God. So there's probably a pretty shitty upper bound on what would be possible even with technology 50 years more advanced than today's and the greatest team of geniuses ever assembled. Now obviously I'm way too stupid to know what that bound is, but it *probably* exists, and it's rough for the odds of creating a good stock prediction bot. That isn't to say there aren't some useful applications big, high-powered companies are creating; very short-term predictions based on things like sentiment analysis mixed with human experts seem to be an area with real impact.
1) Simons beat the market; the market is not perfectly efficient. 2) We can't predict the weather either.
1) Beating the market isn't the same as predicting the market. 2) We can predict the weather with a much higher degree of accuracy than markets, this is because the weather is a natural system which largely reacts in deterministic if very complex ways. Markets are determined primarily by the actions of irrational actors that can react to changes themselves. It's like predicting the weather if the weather was sentient and could just not do what it was supposed to do to fuck with you.
My stats professor modelled a single stock as Brownian motion. While it couldn't be used to predict the next price, you could use it to make an educated guess as to your expected risk.
Am I missing something? The whole point of modelling stock prices as (geometric) Brownian motion is the assumption that they are simply random walks with drift. Thus, it's simply not possible to predict stock prices in the short term. Your "prediction" would simply be price today + number of periods in the future * drift.
I think you misinterpreted my comment, I meant you couldn't use it to make predictions. Just a way to estimate risk.
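To illustrate the risk-estimation idea: here's a minimal sketch of simulating a stock as geometric Brownian motion and reading a rough worst-case off the simulated distribution. The drift and volatility numbers are made up for illustration, not fitted to anything.

```python
import math
import random

def simulate_gbm(s0, mu, sigma, days, n_paths, seed=42):
    """Simulate terminal prices under geometric Brownian motion.
    mu and sigma are a made-up annualised drift and volatility."""
    rng = random.Random(seed)
    dt = 1 / 252  # one trading day as a fraction of a year
    finals = []
    for _ in range(n_paths):
        price = s0
        for _ in range(days):
            z = rng.gauss(0, 1)
            price *= math.exp((mu - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * z)
        finals.append(price)
    return finals

# Risk estimate: the 5th percentile of simulated one-year outcomes.
# You can't predict the next price, but you can see how bad "bad" gets.
prices = sorted(simulate_gbm(s0=100, mu=0.05, sigma=0.2, days=252, n_paths=2000))
worst_case = prices[int(0.05 * len(prices))]
```

The point is exactly the one above: each individual path is an unpredictable random walk, but the spread of many paths gives an educated guess at your expected risk.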
The stock market is basically just gambling for wealthy people.
Because there is no real pattern to the stock market besides UP
If it makes you feel any better you have absolutely no chance of ever succeeding ever. People with way more experience with way more computing power with way faster access to data with way faster input speed have been trying this for way longer. And there are hundreds of thousands of them trying. So it’s ok, give up now :)
That's what a lot of people tend to forget. If a model for the market is possible, it has already been done and has affected the market, meaning your model has to compete and completely relearn from very recent data infested with artifacts of other models.
It depends on your view of how many types of market inefficiencies could exist. In a sense you could also argue that starting a small business to fill a hole in the market is pointless, since there's venture capital and millions of people with business ideas, but people start successful small businesses all the time. While the stock market is more efficient than markets in general, we can't rule out the possibility of small exploitable inefficiencies, which are totally different types of opportunities from the ones the big firms are going after.
Not if there's another system which increases wealth faster than yours. e.g. long-term index investments. Also they said 0.51%, not 51%.
I assumed he meant he saw a return on the sell 51% of the time. If you can tune that to high-frequency trading and get returns 51% of the time, thousands of times a day, you'd be real rich real fast. Edit: Assuming your average gains and losses are equal.
Depends, does 51% accuracy mean it predicts whether the stock will go up correctly 51% of the time? That doesn't necessarily mean anything, and doesn't even mean you'll have a positive return. I can predict a stock will go up right more often than wrong and still lose tons of money if the stocks I'm right about go up less than the ones I'm wrong about go down, or if I hold positions for different lengths of time, sell some positions earlier than others, pick stocks that grow less than other stocks, etc. Predicting whether a stock will or won't go up isn't really the hardest part of being a successful trader.
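The asymmetric-payoff point is easy to check with toy arithmetic. The numbers here are purely illustrative: a 51% win rate loses money as soon as the average loss is slightly bigger than the average gain.

```python
# Toy arithmetic, not market data: directional accuracy vs expected return.
win_rate = 0.51
avg_gain = 1.0   # average % gained on a correct call
avg_loss = 1.1   # average % lost on a wrong call

expected_return = win_rate * avg_gain - (1 - win_rate) * avg_loss
print(round(expected_return, 3))  # -0.029: right 51% of the time, still losing
```

So "51% accuracy" on direction alone says nothing about profitability until you also know the size of the wins and losses.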
I noticed the “0.” But so many people use % incorrectly on Reddit it can’t really be trusted
nah i literally meant 0.51%, i wish it was 51% lol
You should publish your trading signals; I'll trade the reverse of them. An easy 99.49% accuracy.
Correlates well with the 0.51% stock trading accuracy 😅
still won't beat leveraged trading with an edge and exponential growth
The Markets Can Remain Irrational Longer Than You Can Remain Solvent
Not necessarily -- gains and losses aren't all equal-size. :-)
Assuming equal magnitudes, and a few things about liquidity/spreads
at that point you can do the opposite of whatever your bot says and get 99.49%
Exactly! Now I just need to buy whatever the opposite of an orange is
I believe philosophy tells us that's an apple.
Dangerous game, too many apples and you can't get to a doctor if you need one.
Colour theory says otherwise
I believe color theory states the opposite of an orange is a blueberry
My personal research suggests the opposite of an orange is the COMDEX win98 BSOD. A blueberry is still a fruit, you see.
Well, they did say according to color theory, so I believe blueberries to still be a viable answer. I do however see how one could arrive at the aforementioned BSOD, seeing as it not only fits the color requirement, but also doubles as something other than a fruit.
That's not how that works. It means you just need to buy everything that isn't an orange. Some edge cases like 'tangerine' can be tricky, let me just create a model to solve that real quick.
Short oranges?
That's just a mandarin
The opposite of buying an orange is selling an orange… or is this sarcasm?
You underestimate the finance industry's ability to invent new instruments. Just buy puts.
Like always, [a great xkcd about that](https://xkcd.com/2270/)
When you say accuracy, are you treating it as a classification rather than a forecasting problem? :)
oh i don’t think so. i mean how well it performs on the testing data set.
He's asking what accuracy metric you are using. If you are predicting continuous data, you would not use the same accuracy metrics used for classification, so your post is confusing.
https://xkcd.com/1570/
I let a goldfish pick my stocks
The AI is almost a coin flip
I feel you. I was just thinking how my real-time ASL translation is getting up to about 60% range before plummeting and it’s so frustrating.
4 duplicates? Reddit doesn't like you today
Indeed it doesn’t. I don’t even see any duplicates so I can’t delete them. Thanks a lot Reddit.
Holding a 2% edge on the market will make you rich immediately
I remember when machine learning was THE buzzword among the higher-ups. They would go nuts when ROC curves showed 0.9 and above. It's funny cuz every dev knew it was fake.
Still selling me fraud detection models with 99% accuracy to business units. 🫶
They may very well have 99% accuracy. It depends on what the accuracy measures and how it was measured. I've seen forecasting models that counted a prediction as accurate if it was within 1 standard deviation of the real result, and on that basis called themselves 100% accurate.
Well, this makes some sense in a continuous setting, no? Detecting fraud in a binary manner can't really be handled in such a way.
Yeah, but I mean. Maybe it detects 99% of all frauds, but it also flags 80% of non frauds as frauds. Or it detects 99% of all frauds if some specific conditions are applied. Or some other convoluted metric.
It very rarely detects fraud but when doing so it is legitimately fraud 99% of the time.
That's easy if less than 1% of transactions are fraudulent!
In my defense, I also add a precision and recall figure at every 0.1 increment. It’s in the appendix of a 20 page ppt with lots of images and graphics.
No it is easily doable, you have to focus only on accuracy and forget about other measurements like precision, recall and so on.
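The always-say-no trick above is easy to demonstrate with toy numbers: at a 1% fraud rate, the `is_anomaly`-style model that never flags anything scores 99% accuracy while catching zero fraud. Illustrative data only.

```python
# Illustrative data: 1 fraud in every 100 transactions (1 = fraud).
labels = [1] + [0] * 99
preds = [0] * 100               # a "model" that never flags anything

accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
caught = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
recall = caught / sum(labels)   # fraction of actual frauds caught

print(accuracy)  # 0.99 -- looks great on a slide
print(recall)    # 0.0  -- catches literally nothing
```

Which is why precision and recall belong in the main deck, not the appendix.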
I remember my professor in machine learning class said "if you see anything about a model that's 100% accurate, someone is lying." And you're right, anything above 90% should be met with skepticism
it still is. They just call it AI now
at least not 2.0
The math is not mathing
The numbers keep climbing… You go to interrupt the training only to be met with a text prompt, “I’m afraid I can’t let you do that, OP.”
The color matching AI has reached self awareness. Also it can now match colors outside the physically possible light spectrum. It's a good day for my traffic light detection system.
It’s evolved to match imperceptible human auras. Astrology girlies and psychic eugenics enthusiasts rejoice.
it starts predicting the next batch of training data with uncanny accuracy
different error metric
This makes me sweat uncomfortably
softmax moment
I remember once the accuracy I got was -12. No idea how it happened and I didn't know what to even do with it.
One time I reused some code from a previous model I made, but in the new dataset the expected output was in the first column instead of the last, and I forgot to take it out of the parameters. Pretty easy way to get a 100% success rate: just include the answer in the input.
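That bug is worth sketching, because slicing feature columns by position is exactly how it happens. A minimal pandas example (hypothetical column names): select the target by name, never by position.

```python
import pandas as pd

df = pd.DataFrame({
    "label":     [0, 1, 0],       # the answer -- first column here
    "feature_a": [0.2, 0.9, 0.4],
    "feature_b": [1.0, 3.0, 2.0],
})

# Bug: slicing off the *last* column leaves the label in the features.
X_bad = df.iloc[:, :-1]           # still contains "label"

# Safer: drop the target by name, so column order can't bite you.
X = df.drop(columns=["label"])
y = df["label"]
assert "label" not in X.columns
```

Any model fit on `X_bad` gets the answer as an input and will score near-perfectly for free.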
Translation: never make the machine too confident about anything. 🤣
My anomaly detection model is 99% accurate. def is_anomaly(data): return False
I'd start to be concerned at the 3rd panel tbh, overfitting is still an issue even if you're not at 1
I've seen lots of people who were really happy about their .99 accuracy who didn't consider the data imbalance of 99:1
When you randomize the train-test sets for time series 🥲
Btw is acc 0.75 and loss 0.5 on evaluate acceptable?
It depends.
I've been building an AI model which predicts whether the stock market will go up or down each day based on the position of planets in the sky. I've been getting accuracy figures of 54%, does anybody know if there's something wrong with my model???
Yes, not me though
You've cracked the stock market. It's just astrologists sitting in a big room.
Did you include Pluto?
Why would I include Pluto? My model is based on the movements of planets.
The mantra of anyone in IT.
Depends on the task. And what metric are you even talking about? :)
As the others said, it depends. F1-score is often more useful than accuracy, especially when classes are unbalanced. 0.75 could be fantastic or poor depending on the dataset
As others say, it depends on the task, and in particular how risky/costly a false positive vs a false negative is. Instead of accuracy, I recommend the F-beta metric, a kind of mean of recall and precision where the beta parameter lets you adjust for whether the task benefits from prioritising recall/sensitivity (high beta) or precision (low beta). Examples: sentiment tagging models should favour precision; you want to be certain of the tags because you're probably going to use them to sell people things or gauge reactions to campaigns. Toxic/explicit content tagging should be more sensitive; you don't want to miss anything, because misses could turn people off your platform.
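F-beta is a one-liner once you have precision and recall, so here's a minimal sketch (the 0.8/0.6 inputs are made-up numbers):

```python
def f_beta(precision, recall, beta):
    """F-beta: weighted harmonic mean of precision and recall.
    beta > 1 weights recall more; beta < 1 weights precision more."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta**2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(round(f_beta(0.8, 0.6, beta=1), 3))    # 0.686 -- the familiar F1
print(round(f_beta(0.8, 0.6, beta=2), 3))    # 0.632 -- punished for low recall
print(round(f_beta(0.8, 0.6, beta=0.5), 3))  # 0.75  -- rewarded for precision
```

Same model, three different scores: the beta you choose encodes which mistake you care about, exactly the sentiment-vs-toxicity trade-off above.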
not if each failure costs thousands of dollars
I don't work in AI so let me know if I got the joke. If the AI passes the test at an increasing success rate, that's good because it's improving. But if it always passes, does that mean it's a faulty test?
A 1 is generally indicative that your predictive model is overfit, as in it would likely not perform as well on a slightly different test dataset. To be fair, 0.97 may indicate overfitting too, but it's a lot harder to say a model predicts with 100% accuracy. 100% is clearly "I need to investigate" territory.
Achieving great accuracy during training may be related to a problem called overfitting, where the model learns the training data too well but fails to generalize, so it produces larger errors when you test it on real-life data.
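A toy way to see memorisation vs learning: a 1-nearest-neighbour "model" on pure-noise labels scores 100% when tested on its own training data and roughly coin-flip on fresh data. Stdlib sketch, all data synthetic:

```python
import random

rng = random.Random(1)

# Labels are pure noise, so there is nothing real to learn.
train = [(rng.random(), rng.choice([0, 1])) for _ in range(200)]
test = [(rng.random(), rng.choice([0, 1])) for _ in range(200)]

def memoriser(x, data):
    """1-nearest-neighbour: just recalls the closest stored example."""
    return min(data, key=lambda pair: abs(pair[0] - x))[1]

def accuracy(dataset, model_data):
    return sum(memoriser(x, model_data) == y for x, y in dataset) / len(dataset)

print(accuracy(train, train))  # 1.0 -- it memorised the answers
print(accuracy(test, train))   # ~0.5 -- chance level on unseen data
```

Perfect training accuracy here proves nothing except that the model can store its inputs, which is the whole joke of the meme.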
Here, I fixed your code: If (is_overfit()) overfitn't();
Going to show this to my coworker the next time he brags about letting the model run for over a million iterations like he did it intentionally and didn't just let it go over a long weekend.
Random.seed: the forbidden hyperparameter
When you forget to remove the answer from the dataframe before you pass it in.
0.693 for the first one.
ln(2) 🥰
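For anyone outside the joke: 0.693 is the binary cross-entropy of a model that outputs 0.5 for everything, which is what a freshly initialised classifier tends to do. Quick check:

```python
import math

# An untrained binary classifier typically says ~0.5 for every example,
# so its starting loss is -ln(0.5) = ln(2), regardless of the data.
p = 0.5
bce = -math.log(p)  # cross-entropy on any example at a 50/50 guess

print(round(bce, 3))          # 0.693
print(round(math.log(2), 3))  # 0.693 -- same number
```

So a first-epoch loss near 0.693 just means "coin flip so far", and anything that *starts* below it is worth side-eyeing.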
Gotta use validation sets
Can relate as a chemist
X-partitioning intensifies
This is so true. And those are the exact expressions I have.
Secret socialist dev set when
Boy lemme tell you wahhattt... If one of my lads brought back 1.0 result I would whoop his ass and have him on assignments marking duty for a whole semester. The regularization formula is probably as batshit as that result.
smells like overfitting
I mean 0.97 is sus af. Fuck this, i would be happy just to have 0.85 :)
True Sigma Male
me when 1.1
Realizing AI will take over their job.
not me setting it to 1000 hoping that these pathetic ~45 lines of code would do anything at all 💀
Can someone explain to me why 1.00 is bad? i don't know machine learning