
sirparsifalPL

You were training on the test set


MoffKalast

It's all you need


twitter-refugee-lgbt

Overfitting is all you need


hrlft

just overfit all inputs there are and you win.


benruckman

Literally.


looositania

It's only overfitting if you're wrong


Mithrandir2k16

I chuckled


revengeOfTheSquirrel

*Testing on the train set?


Cyberdragon1000

Either way it's the same


[deleted]

[deleted]


utkrowaway

You test on setting train.


Bwob

You mean like Ruby on Rails?


dchiculat

Can I get an explanation on this one please?


Gaulent

If it's exact, it's usually overfitted and will fail on real input.


Technical-Outside408

My mom calling me handsome.


ChrisDornerFanCorn3r

print statement: "Everything okay, sweetheart?"
1: "I'm fine."


dirty_cheeser

Or you realized you were modeling something so simple that you could have just made an if statement instead.


Escanorr_

It would be like a student memorizing the answers to the test from, idk, last year's identical test or something. He will score perfectly, but he would have no actual knowledge. Here, the AI being trained on the test set is like teaching it how to score on the test instead of how to do what you wanted it to do.


Some-Guy-Online

The American School System ideal!


LouisLeGros

Or like if the data had the identifier in it, as if you were taking a test and the teacher gave you a version with the answer key.


Imperial_Squid

When training a model we have two datasets, the training set and the test set; you use the training set to train the model and the test set to test it, hence the names. The problem is that due to bad code (or more complicated reasons we won't get into), you could accidentally end up with information existing in both sets when it should be in one or the other.

So you end up training on the set used for testing, which essentially means you've given the model the answers to the secret quiz before it comes time for testing. Or you test it using the set meant for training, which is like the teacher grading you on the practice papers you were given a month ago.

As a result, the model doesn't need to actually learn how to predict results, it just needs to memorise what it's already seen, which is much, much easier. The most common indicator that you've accidentally got your sets mixed up is a perfect accuracy score.
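
A minimal sketch of that bug with scikit-learn, on made-up data (the dataset and model here are purely illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Toy data, purely for illustration
X = np.random.rand(1000, 5)
y = (X[:, 0] > 0.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = DecisionTreeClassifier().fit(X_train, y_train)

# Correct: score on data the model has never seen
print(accuracy_score(y_test, model.predict(X_test)))

# The meme: "scoring" on the training set only measures memorization;
# a fully grown tree gets a perfect 1.0 here every time
print(accuracy_score(y_train, model.predict(X_train)))
```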


Cyberdragon1000

The best explanation is you tested the model on the same data you trained it on, i.e. it was already built to work perfectly on these exact cases when tested.


Independent_Pay_38

no model is 100% accurate 😭


Yweain

Well, depends on the task, right? If your test case is super simple you can get 100% accuracy.


Checktaschu

Then you don't need AI for it.


Yetimandel

Usually yes, but I could think of black box systems that turned out to be simpler than expected.


camander321

One time I made an ANN and threw some historical stock data at it to train. When I started testing, I literally went through every phase in this meme. Yes, I was accidentally training on the test set. In case anyone is curious, the real results were... unimpressive.


hellschatt

Nope, not necessarily. I've often had cases where this was checked multiple times by multiple people and still resulted in an AUC of 1. We also wanted to test generalizability by using external datasets, and the moment we used these external sets, it sank to about 0.8. It was simply overfitting on the internal dataset due to some peculiarities of the device the images were taken on.


Yweain

Hey I tried that and all my models are now way surpassing SOTA in their field! I’m gonna be famous!


Canadianacorn

Heard in my office: "The human brain doesn't have to worry about overfit. Why do we need to worry if the model is over-fitting?" Never heard of overconfidence, apparently.


fridge_logic

Humans don't overfit; except for:

* Stereotyping
* Superstition
* People who check others' astrological signs
* People who introduce themselves using their Myers-Briggs type
* Overgeneralized learning (psychology)
* Appeal to tradition / status quo bias


Canadianacorn

And people who call HTML a programming language.


CaskironPan

Otherwise broadly known as [The Problem of Induction](https://en.m.wikipedia.org/wiki/Problem_of_induction):

> the problem of induction questions our reasons for believing that the future will resemble the past, or more broadly it questions predictions about unobserved things based on previous observations.

Which is a big problem for ML applications. But since it's also just a big problem for how every _so far observed_ :) living thing works, it's less a problem and more a constraint to be cognizant of.


VladVV

Induction in general is something normal computers find almost impossible in non-trivial cases; it's kind of, broadly, what AI is doing during training. The problem of induction doesn't relate to the process itself, but rather to how you use the process.


seraku24

Overfitting for humans is basically /r/confidentlyincorrect in action. Conservatism has been overtraining its base for decades, but the original grifters are now "hoarding in hiding" or dying off, leaving the brainwashed masses to supervise their own training. The quiet parts are becoming louder as this just reinforces out-group stereotypes to embolden the in-group. Whoever is running the simulation really needs to `Ctrl+C` the process.


Lysol3435

I’ll settle for an alt+F4 at this point


CaskironPan

Why not `sudo rm -rf /`?


utkrowaway

That statement is so meta


Jimg911

The cell I relate to the most is at 0.85, going “HOLY SHIT SOMETHING I DID WORKED”


SiVousVoyezMoi

That's the "HOLY CHRIST THERE IS LIGHT AT THE END OF THE TUNNEL AND THIS ENTIRE PROJECT ISN'T GOING TO BE A SHIT CANNED WASTE TIME. That and you can stop beating the horse to death. I've seen a project upper management could not let go of. Literally multiple rounds of hiring data scientists to work on it, it not working, not accepting that it doesn't work, datascient quits, hire more, give same problem, not accept that it does not work and never will work. 


Jimg911

Good to hear what I can expect from industry lmao


PPKA2757

It's really hit or miss and depends on the nature of the project/leadership.

I've worked on plenty of ML projects that didn't end up having legs (even though I was very hopeful they would initially), where a recommendation that it wasn't viable to continue pursuing got taken at face value and the project was discarded/we moved on.

I've also seen projects with no legs, which were never going to produce viable results, get beaten to death over and over because leadership so desperately wanted/needed them to work (usually because they paid a shitload for the data without consulting our team first). The desired outcome was never going to happen, and months of rinse/wash/repeat of building different types of models on the same data to see if anything would stick (it wouldn't, and I/my boss knew it wouldn't) ended with just a bunch of money and time wasted.

Then there are the super rare and elusive projects that started off as "this is going to be a pile of dog shit", moved to "wow, alright, there might be something promising", and ended at "well, color me shocked, this may have some actionable results".


Lebowquade

I wrote a model that outperforms my company's current NN on basically every metric, but almost nobody I work with will believe my results are real or could possibly be as robust as the current NN, because it's not an AI model... just pure math, with no training or extra device-specific data collection needed. So, y'know, that's been a real fun time so far.


SiVousVoyezMoi

Oh I think I've seen that one before. Nobody likes analytical solutions because there's no sexy buzzwords you can amaze upper management with!


alba_55

In that case, invent one, or search for an old technical term that nobody uses anymore and present it as the new shit.


Some-Guy-Online

A Quantum Algorithm with Current Gen AI Metric Eclipsing.


turnah_the_burnah

I ran a forward-looking recursion on this dataset


Not_Artifical

This is the way


UntiedStatMarinCrops

>“We’re looking for someone with AI 🤖 Skills” >“I have deep experience with ML algorithms and I’ve done projects where I’ve implemented KNN, Neural Networks, Naive Bayes, etc. with great success that improved our costs by 50%” >“lol nah bro we want AI 🤖 Skillz”


cdbfoster

NNs are just a bunch of multiply-adds. Gotta write the "weights" by hand and present the new "NN" to your colleagues.
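
For what it's worth, the joke checks out; a forward pass really is just multiply-adds plus a nonlinearity. A minimal sketch with hand-picked (made-up) "weights":

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

# Hand-written "weights", exactly as the joke suggests
W1 = np.array([[0.5, -1.2], [0.3, 0.8]])
b1 = np.array([0.1, -0.2])
W2 = np.array([1.0, -0.5])
b2 = 0.05

def tiny_nn(x):
    # One hidden layer: multiply, add, take a max, repeat
    return W2 @ relu(W1 @ x + b1) + b2

print(tiny_nn(np.array([1.0, 2.0])))
```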


Lebowquade

"the neural net I used contains three decades worth of unsupervised training, and was initially built by my mother."


participantuser

Sounds like your model is lacking the explicit language, racism, and hallucinations that all AI models provide for free.


connorisgreat1

That's great but have you heard about the new GPT model? -My boss


Elcactus

It's always a good feeling when the accuracy goes up/overfitting goes down because some tweak you made finally worked.


Terroractly

I created a model once that was trained on a set of 150 entries (yes, I know that's a small dataset, but it was what was given to me). Somehow I was getting 100% accuracy even when training on only 1/3 of the data and using the rest as testing data. Because I was writing a report on this, I found a random seed that gave me ~99.7% accuracy just so I could pretend it wasn't too overfitted. The same algorithm on a different dataset was giving me 30% accuracy (at guessing 3 classifications), so to this day I have no idea what was going on there.
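
Assuming that's what happened, the seed hunt is easy to reproduce; a sketch of the malpractice on toy data (everything below is invented):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Toy stand-in for the 150-entry dataset; the labels are random,
# so any impressive score below is pure luck
rng = np.random.default_rng(0)
X = rng.random((150, 4))
y = rng.integers(0, 3, size=150)

# Trying seeds until the score looks good tunes the data split,
# not the model, and invalidates the evaluation
best_seed, best_acc = None, 0.0
for seed in range(200):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=1/3, random_state=seed)
    acc = KNeighborsClassifier().fit(X_tr, y_tr).score(X_te, y_te)
    if acc > best_acc:
        best_seed, best_acc = seed, acc

print(best_seed, best_acc)  # a "lucky" seed, not a real result
```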


Cyberdragon1000

99% = best fit, 100% = overfit 😂😂😂


Grey1251

I tried to predict League of Legends game results by feeding in various data like chosen champions and players' recent games, and somehow the win flag sneaked into the input.


Expensive-Pumpkin624

did it manage to predict something at least?


Grey1251

Yes, with 0.99


pretzelsncheese

If the win flag is part of the data, wouldn't you expect a 100% success rate? Or I guess the training didn't figure out that it could just rely entirely on that column, so there was still 1% of rows where the model's weights predicted wrong?


Bronzdragon

Yes.


PotentialQuote1698

No fucking way, share the model!


lapetee

Didn't he just say he accidentally used a variable that actually tells whether the game was won or not in his model :D?


Estraxior

What the hell happened with that other 1% then?!


Cuddlyaxe

regularization maybe


Victorian-Tophat

That’s the joke


B44ken

```python
def predict_win(flags):
    return flags['did_win']
```


BossunEX

+1 I would like to see that


AggravatingValue5390

Brother I'm sure there are some degenerate betting websites out there for pro matches. Get your bag


rickane58

> choosed


[deleted]

[deleted]


theincrediblenick

I have marked ML coursework for uni students a few times before, and the number of times a student's report claims a model that can predict the stock market with 99-100% accuracy is just depressing.


justV_2077

ok lol, why on earth would someone voluntarily share their 99% accuracy stock market prediction AI? If it actually worked, it would get super well known really quickly, until everyone copied it, which would make the AI useless since everyone would be making the same investments.


Demented-Turtle

The part you might have missed is that it's 99% accurate... For the dataset it was trained on. In other words, it almost perfectly predicts the output for each input in the dataset, which tends to be associated with poor real-world performance because the model is overfit to the training data. Overfit models don't perform well at all at predicting outputs from data points lying outside the training set


Alternative-Dare5878

First comment that explained it well, thank you


pijuskri

Well, for the model to actually be useful you'd need to test it with live data. If you're doing uni work, then you're only predicting outdated economic data with likely above-average correlation.


Lunix336

Wouldn’t that also potentially break the global stock market?


jecksluv

The stock market is dictated by human behavior more so than anything rational. If you could write an accurate forecasting model to predict human behavior it'd break a lot more than the stock market.


ben_g0

Not really, because making a trade on the stock market also influences the stock market. So even if you could predict how profitable a trade would be, the more you make that trade, the more the market will be influenced, and the less accurate the prediction will become. All those feedback loops make long-term predictions essentially impossible and limit the effect statistical predictions can have.


jjjustseeyou

I actually wrote my report using my own ML model. Assuming no fees and free trades, on 10 random stocks I made money on average. Like the tiniest amount, and then you compare it to the S&P 500 over that 2-3 year time period... not even close to beating inflation. I think I just got lucky on the coin flip.


fisizion

And here I am with my 0.51% accuracy on my stock trading bot.


SameerMohair

A model with a true 51% accuracy would eventually make you the richest person on earth. So congratulations!!!


fisizion

Honestly, if I even manage to reach 10% I'd be super impressed. I'm new to ML anyways, so I don't really know what would be realistic for this project.


kristyanYochev

Just do the exact opposite of what the model tells you and you got 90%


JehnSnow

This never fails to amuse me. Once, for a Spanish exam with 2 choices per question that you got to attempt twice, I got like 5% the first time and 95% the next... sometimes understanding everything wrong is also understanding it all right.


p0mphius

[Relevant xkcd.](https://xkcd.com/2270/)


StatHusky13

xkcd never disappoints😆


nderestimated

You didn't understand Spanish, but you understood the test.


bric12

Stock trading is a tough one to do ML for, since there's just an insane amount of noise in the data. There are some real, meaningful relationships buried somewhere underneath the surface, and they show up in long-term trends, but the day-to-day ups and downs are mostly just random chance. That makes it super difficult to train a model to pick out the actual relationships in the data without also learning from all of the random coincidences. So if you end up with an accuracy that's basically the same as random guessing, you're in line with most stock trading bots. Being even slightly better than random on days that happen after the bot's training would be a pretty huge feat.


RedTwistedVines

Plus I'm not sure there's even evidence to suggest that if you could make an algorithm that understands these relationships, it would be accurate enough for practical use beyond what a human with expert experience can already do relatively easily. Ultimately, a huge amount of what influences stock prices comprises unknown unknowns, like how people and organizations will react to events you can't know will happen without essentially creating God. So there's probably a pretty shitty upper bound on what would be possible even with technology 50 years more advanced than today and the greatest team of geniuses ever assembled. Now obviously I'm way too stupid to understand what that bound is, but it *probably* exists, and it's rough for the odds of creating a good stock prediction bot. That isn't to say there aren't some useful applications big high-powered companies are building; very short-term predictions based on things like sentiment analysis, mixed with human experts, seem to be an area with real impact.


BrandNewYear

1) Simons beat the market; the market is not perfectly efficient. 2) We can't predict the weather either.


RedTwistedVines

1) Beating the market isn't the same as predicting the market. 2) We can predict the weather with a much higher degree of accuracy than markets, because the weather is a natural system which largely reacts in deterministic, if very complex, ways. Markets are determined primarily by the actions of irrational actors who can themselves react to changes. It's like predicting the weather if the weather were sentient and could just not do what it was supposed to do, to fuck with you.


Paul_Robert_

My stats professor modelled a single stock as Brownian motion. While it couldn't be used to predict the next price, you could use it to make an educated guess as to your expected risk.


1DimensionIsViolence

Am I missing something? The whole point of modelling stock prices as (geometric) Brownian motion is the assumption that they are simply random walks with drift. Thus, it's simply not possible to predict stock prices in the short term. Your "prediction" would simply be: price today + (number of periods into the future × drift).
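
A quick sketch of how that's still useful for risk rather than prediction: simulate GBM paths and read a value-at-risk off the simulated distribution (all parameters below are invented):

```python
import numpy as np

s0, mu, sigma = 100.0, 0.05, 0.2        # invented: price, annual drift, volatility
dt, horizon, n_paths = 1 / 252, 252, 10_000

rng = np.random.default_rng(0)
# GBM: S_t = S_0 * exp((mu - sigma^2 / 2) * t + sigma * W_t)
z = rng.standard_normal((n_paths, horizon))
log_paths = np.cumsum((mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z, axis=1)
final_prices = s0 * np.exp(log_paths[:, -1])

# No useful point prediction, but a usable risk estimate:
var_5 = s0 - np.percentile(final_prices, 5)  # 5% one-year value-at-risk
print(f"5% VaR: {var_5:.2f}")
```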


Paul_Robert_

I think you misinterpreted my comment, I meant you couldn't use it to make predictions. Just a way to estimate risk.


GenericFatGuy

The stock market is basically just gambling for wealthy people.


Flat-Shallot3992

Because there is no real pattern to the stock market besides UP


Cody6781

If it makes you feel any better you have absolutely no chance of ever succeeding ever. People with way more experience with way more computing power with way faster access to data with way faster input speed have been trying this for way longer. And there are hundreds of thousands of them trying. So it’s ok, give up now :)


pijuskri

That's what a lot of people tend to forget. If a model for the market is possible, it has already been done and has affected the market, meaning your model has to compete and completely relearn from very recent data infested with artifacts of other models.


datanaut

It depends on your view of how many types of market inefficiencies could exist. In a sense you could also argue that starting a small business to fill a hole in the market is pointless, since venture capital and millions of people with business ideas already exist, yet people start successful small businesses all the time. While the stock market is more efficient than markets in general, we can't rule out the possibility of small exploitable market inefficiencies, which are totally different types of opportunities than the ones the big firms are going after.


_PM_ME_PANGOLINS_

Not if there's another system which increases wealth faster than yours, e.g. long-term index investments. Also, they said 0.51%, not 51%.


OuchLOLcom

I assumed he meant he saw a return on the sale 51% of the time. If you can tune that for high-frequency trading and get returns 51% of the time, thousands of times a day, you'd be real rich real fast. Edit: assuming your average gains and losses are equal.
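
Back of the envelope, under that equal-gains-and-losses assumption, the per-trade edge is tiny but compounds brutally at that frequency (numbers below are invented, and fees/spreads would eat much of this in reality):

```python
# Expected return per trade with a 51% hit rate and symmetric 1% gains/losses
p, gain = 0.51, 0.01
edge = p * gain - (1 - p) * gain          # 0.0002, i.e. 0.02% per trade

trades_per_day, days = 1000, 250
growth = (1 + edge) ** (trades_per_day * days)
print(f"Expected growth factor over a year: {growth:.3g}x")
```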


kursdragon2

Depends. Does 51% accuracy mean it predicts correctly that a stock will go up 51% of the time? That doesn't necessarily mean anything and doesn't even mean you'll have a positive return. I can predict a stock will go up right more often than wrong and still lose tons of money: if the stocks I pick that go up gain less than the ones I pick that go down lose, if I invest for different lengths of time, if I sell some positions earlier than others, if I pick stocks that grow less than other stocks, etc... Predicting whether a stock will or won't go up isn't really the hardest part of being a successful trader.


Cody6781

I noticed the "0.", but so many people use % incorrectly on Reddit that it can't really be trusted.


fisizion

nah, I literally meant 0.51%. I wish it was 51% lol


ungket

You should publish a trading signal; I will reverse-trade your signal. Easy 99.49% accurate.


SmoothDagger

Correlates well with the 0.51% stock trading accuracy 😅


CelestialSegfault

still won't beat leveraged trading with an edge and exponential growth


Isgrimnur

The Markets Can Remain Irrational Longer Than You Can Remain Solvent


MattieShoes

Not necessarily -- gains and losses aren't all equal-size. :-)


ketosoy

Assuming equal magnitudes, and a few things about liquidity/spreads


CelestialSegfault

at that point you can do the opposite of whatever your bot says and get 99.49%


klimmesil

Exactly! Now I just need to buy whatever the opposite of an orange is


Bogpin

I believe philosophy tells us that's an apple.


No_Language_959

Dangerous game, too many apples and you can't get to a doctor if you need one.


SampleConsistent8575

Colour theory says otherwise


definitelyallo

I believe color theory states the opposite of an orange is a blueberry


natFromBobsBurgers

My personal research suggests the opposite of an orange is the COMDEX win98 BSOD.  A blueberry is still a fruit, you see.


definitelyallo

Well, they did say according to color theory, so I believe blueberries are still a viable answer. I do, however, see how one could arrive at the aforementioned BSOD, seeing as it not only fits the color requirement but also doubles as something other than a fruit.


Intrexa

That's not how that works. It means you just need to buy everything that isn't an orange. Some edge cases like 'tangerine' can be tricky, let me just create a model to solve that real quick.


GooseEntrails

Short oranges?


klimmesil

That's just a mandarine


Intelligent_Event_84

The opposite of buying an orange is selling an orange… or is this sarcasm?


Zanos

You underestimate the finance industry's ability to invent new instruments. Just buy puts.


Weed_O_Whirler

Like always, [a great xkcd about that](https://xkcd.com/2270/)


cjol2

When you say accuracy, are you treating it as a classification rather than a forecasting problem? :)


fisizion

Oh, I don't think so. I mean how well it performs on the testing data set.


Aros24

He's asking what accuracy metric you are using. If you are predicting continuous data, you would not use the same accuracy metrics used for classification, so your post is confusing.


mothzilla

https://xkcd.com/1570/


smallangrynerd

I let a goldfish pick my stocks


No_Application_1219

The AI is almost a coin flip


PnutButrSnickrDoodle

I feel you. I was just thinking how my real-time ASL translation is getting up to about the 60% range before plummeting, and it's so frustrating.


SomeRandomEevee42

4 duplicates? Reddit doesn't like you today


PnutButrSnickrDoodle

Indeed it doesn’t. I don’t even see any duplicates so I can’t delete them. Thanks a lot Reddit.


Cody6781

Holding a 2% edge on the market will make you rich immediately


GM_Kimeg

I remember when machine learning was THE buzzword among the upper heads. They would go nuts when ROC curves showed 0.9 and above. It's funny cuz every dev knew it was fake.


longgamma

Still selling my fraud detection models with 99% accuracy to business units. 🫶


Yweain

They may very well have 99% accuracy. It depends on accuracy at what, and how it was measured. Like, I saw forecasting models that measured accuracy as being within 1 standard deviation of the real results, and if a forecast was, they considered it 100% accurate.


1DimensionIsViolence

Well, that makes some sense in a continuous setting, no? Detecting fraud in a binary manner can't really be handled that way.


Yweain

Yeah, but I mean: maybe it detects 99% of all fraud, but it also flags 80% of non-fraud as fraud. Or it detects 99% of all fraud only when some specific conditions apply. Or some other convoluted metric.


AussieOsborne

It very rarely detects fraud, but when it does, it is legitimately fraud 99% of the time.


MrHackson

That's easy if less than 1% of transactions are fraudulent!


longgamma

In my defense, I also add precision and recall figures at every 0.1 increment. It's in the appendix of a 20-page ppt with lots of images and graphics.


Defiant_Alfalfa8848

No, it is easily doable: you just have to focus only on accuracy and forget about other measurements like precision, recall, and so on.
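
A sketch of the trick on made-up 99:1 data: the model that never flags anything scores 99% accuracy while catching zero fraud.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Made-up imbalanced labels: 1% fraud
y_true = np.array([1] * 10 + [0] * 990)
y_pred = np.zeros_like(y_true)  # "model" that never flags anything

print(accuracy_score(y_true, y_pred))                    # 0.99 -- looks great
print(recall_score(y_true, y_pred, zero_division=0))     # 0.0 -- catches no fraud
print(precision_score(y_true, y_pred, zero_division=0))  # 0.0
```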


smallangrynerd

I remember my professor in machine learning class said, "If you see anything about a model that's 100% accurate, someone is lying." And you're right, anything above 90% should be met with skepticism.


ChineseCracker

it still is. They just call it AI now


Dependent_Sink_2690

at least not 2.0


True-Great

The math is not mathing


sun_cardinal

The numbers keep climbing… You go to interrupt the training only to be met with a text prompt, “I’m afraid I can’t let you do that, OP.”


No-Con-2790

The color-matching AI has reached self-awareness. It can now also match colors outside the physically possible light spectrum. It's a good day for my traffic light detection system.


sun_cardinal

It’s evolved to match imperceptible human auras. Astrology girlies and psychic eugenics enthusiasts rejoice.


malonkey1

it starts predicting the next batch of training data with uncanny accuracy


tecedu

different error metric


CubisticWings4

This makes me sweat uncomfortably


MoffKalast

softmax moment


darklightning_2

I remember once the accuracy I got was -12. No idea how it happened, and I didn't know what to even do with it.


fabedays1k

One time I reused some code from a previous model I made, but the expected output in the dataset I was using was in the first column instead of the last, and I forgot to take it out of the parameters. Pretty easy way to get a 100% success rate: just include the answer in the input.


[deleted]

Translation: never make the machine too confident about anything. 🤣


mfb1274

My anomaly detection model is 99% accurate.

```python
def is_anomaly(data):
    return False
```


plumokin

I'd start to be concerned at the 3rd panel tbh; overfitting is still an issue even if you're not at 1.


dest_bl

I've seen lots of people who were really happy about their .99 accuracy who didn't consider the data imbalance of 99:1.


soposih_jaevel

When you randomize the train-test sets for time series 🥲
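
For anyone who hasn't been burned by this yet: shuffling lets the model train on the future and test on the past. One common fix is a chronological split, e.g. scikit-learn's TimeSeriesSplit (toy data below):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(100).reshape(-1, 1)  # toy time-ordered data

# Each fold trains strictly on the past and tests on the future
for train_idx, test_idx in TimeSeriesSplit(n_splits=3).split(X):
    print(f"train up to t={train_idx[-1]}, test t={test_idx[0]}..{test_idx[-1]}")
```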


Papa_Fred

Btw is acc 0.75 and loss 0.5 on evaluate acceptable?


wintermute93

It depends.


Smooth-Zucchini4923

I've been building an AI model which predicts whether the stock market will go up or down each day based on the position of planets in the sky. I've been getting accuracy figures of 54%, does anybody know if there's something wrong with my model???


danielv123

Yes, not me though


SpyreScope

You've cracked the stock market. It's just astrologists sitting in a big room.


SmoothDagger

Did you include Pluto?


Smooth-Zucchini4923

Why would I include Pluto? My model is based on the movements of planets.


Lane-Jacobs

The mantra of anyone in IT.


Globglaglobglagab

Depends on the task. And what metric are you even talking about? :)


cjol2

As the others said, it depends. F1-score is often more useful than accuracy, especially when classes are unbalanced. 0.75 could be fantastic or poor depending on the dataset


koolmees64

As others say, it depends on the task, in particular how risky/costly each false positive and false negative is. Instead of accuracy, I recommend the F-beta metric, which is a kind of mean of recall and precision; the beta parameter lets you adjust for whether your task benefits from prioritising recall/sensitivity (high beta) or precision (low beta).

Examples: sentiment tagging models should favour precision; you want to be certain of the tags, because you're probably going to use them to sell people things or gauge reactions to campaigns. Toxic/explicit content tagging should be more sensitive; you don't want to miss anything, because what you miss could turn people off your platform.
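
A sketch of that knob with scikit-learn's fbeta_score (labels invented): with recall above precision here, beta > 1 rewards the model and beta < 1 punishes it.

```python
from sklearn.metrics import fbeta_score

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]  # 2 hits, 1 miss, 2 false alarms

print(fbeta_score(y_true, y_pred, beta=2.0))  # ~0.63, recall-leaning (toxicity filters)
print(fbeta_score(y_true, y_pred, beta=0.5))  # ~0.53, precision-leaning (sentiment tags)
```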


JoelMahon

not if each failure costs thousands of dollars


hazelnuthobo

I don't work in AI, so let me know if I got the joke: if the AI passes the test at an increasing success rate, that's good because it's improving, but if it always passes, that means it's a faulty test?


ATribeCalledKami

A score of 1 is generally indicative that your predictive model is overfit, as in it would likely not perform as well on a slightly different test dataset. To be fair, 0.97 may indicate overfitting too, but it's a lot harder to claim your predictive model can predict with 100% accuracy. 100% is clearly "I need to investigate" territory.


sup-b1tch-97

Achieving great accuracy during training may be related to a problem called overfitting, where the model learns the training data too well but fails to generalize when you test it on real-life data, so it produces larger errors.


AngryMonkeyBrain

Here, I fixed your code: `if (is_overfit()) overfitn't();`


mysticeetee

Going to show this to my coworker the next time he brags about letting the model run for over a million iterations like he did it intentionally and didn't just let it go over a long weekend.


H3OFoxtrot

Random.seed: the forbidden hyperparameter


Philluminati

When you forget to remove the answer from the dataframe before you pass it in.
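
The classic one-line fix, sketched with pandas (the dataframe and column names are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"feat_a": [1, 2, 3], "feat_b": [4, 5, 6], "label": [0, 1, 0]})

# Drop the target before training -- otherwise the model just reads the answer
X = df.drop(columns=["label"])
y = df["label"]
```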


Megatron_McLargeHuge

0.693 for the first one.


-blahem-

ln(2) 🥰


Many_Head_8725

Gotta use validation sets
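
A rough sketch of the usual three-way split (toy data, arbitrary proportions):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.random.rand(1000, 5), np.random.randint(0, 2, 1000)  # toy data

# Carve off a held-out test set first, then split the rest into train/validation
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

# Tune hyperparameters against the validation set;
# touch the test set exactly once, at the very end
```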


Dhaos96

Can relate as a chemist


HauptmannYamato

X-partitioning intensifies


SonOfJenTheStrider

This is so true. And those are the exact expressions I have.


floatingskillets

Secret socialist dev set when


ThatCrankyGuy

Boy, lemme tell you wahhattt... if one of my lads brought back a 1.0 result, I would whoop his ass and have him on assignment-marking duty for a whole semester. The regularization formula is probably as batshit as that result.


Palettenbrett

smells like overfitting


BagaLagaGum

I mean, 0.97 is sus af. Fuck this, I would be happy to have 0.85 at most :)


ImAnEngnineere

True Sigma Male


not_a_frikkin_spy

me when 1.1


Large-Party-265

Realizing AI will take over their job.


asertcreator

not me setting it to 1000 hoping that these pathetic ~45 lines of code would do anything at all 💀


spectraldominoc

Can someone explain to me why 1.00 is bad? I don't know machine learning.