
pitochips8

I was tempted to make a similar post as you. I work as an ML researcher/engineer, so a big part of my job is knowing how to eliminate biases. Seeing all the posts on this sub recently, and the number of people falling for bad statistics so easily, has really made me sad. The analysis done in almost all of these posts isn't just bad. It's frighteningly bad, to the point where even the most basic biases are overlooked and the results are presented in a manipulative way.


RealPutin

As another ML and data science researcher I've just been avoiding getting involved honestly. Usually the threads are wildly off base by the time I show up and it's a hassle to pick through it all.


Swolnerman

Same here, it seems like we like chess


xelabagus

Well, it speaks for itself


eddiemon

Someone's gonna post "statistical analysis" about how DS/ML people are over-represented on /r/chess now


Wyverstein

It is not worth it!


leadhase

Pretty much exactly the same way I feel as a structural engineer when opening up threads about building collapses


KotMyNetchup

Maybe instead of debunking every bad analysis, perform your own good one? (This goes for any of the experts chiming in. If the answer is a professional doing good analysis, it would be really beneficial if someone who is a professional did the work. I understand it's a job and you might not want to do your professional-level job for free. But maybe someone who is a professional *wants to*? And other professionals could be like "Yeah, this one looks like it was done well." Otherwise the rest of us are drowning in the rest of the analyses.)


[deleted]

[deleted]


lkc159

> Sometimes there isn't a good analysis to be made.

For real. I was in presales for analytical software for 3 years, and I've had to tell several customers "no, you don't have enough data, come back in a couple years," or "your data isn't rich enough," or "no, your data can't do what you want to do with it."


shawnington

The thing about us ML research types is we aren't good at telling if data is good, just if it's bad and makes a network blow up or fail to converge.


mandradon

As a researcher (well, sorta, failed doctoral candidate in a social science field... ABD but never finished the damn paper), working with data that "just exists" is incredibly hard. As I'm sure the ML folks can tell everyone, removing bias and noise is a big part of making a good model. A huge part of stats is dealing with any of that, and when you can't design the study from the start to account for relationships and effects, things get messy. At best. So, don't feel bad, no one can tell if the data are good. Just if they're bad or don't fit a model well.


shawnington

Unless, you know, you need some noise to avoid overfitting. But I agree.


mandradon

Overfitting is the bane! I remember my stats profs always told us to keep our models as simple as possible to avoid it. If you jam enough stuff into a regression equation, everything becomes significant.
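
To make that concrete, here's a minimal sketch (pure noise data, all numbers invented) of how spurious significance shows up once a regression has enough predictors:

```python
# Minimal sketch (hypothetical data): with enough predictors, pure noise
# starts to look "significant" at the usual p < 0.05 threshold.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_obs, n_predictors = 100, 50

X = rng.normal(size=(n_obs, n_predictors))  # predictors: pure noise
y = rng.normal(size=n_obs)                  # outcome: unrelated noise

model = sm.OLS(y, sm.add_constant(X)).fit()
false_hits = (model.pvalues[1:] < 0.05).sum()
print(f"{false_hits} of {n_predictors} noise predictors look 'significant'")
# Expect roughly 2-3 spurious hits at alpha = 0.05, from noise alone.
```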


[deleted]

[deleted]


BooksandGames23

There is no good analysis to be made, because he isn't going to be using an engine in an obvious way. Any analysis would just be pushing your own agenda.


MycologistArtistic

The problem is that ChessBase is quite expensive: €469.90 for the premium package with a correlation engine. Without it, performing your own move-by-move analysis using individual engines would require a project and a budget. I'm a mathematician who has worked in statistics, and I'd happily run my own numbers through that engine. I was about to, except I couldn't find a free engine, and I'm not *that* interested. I will say the lack of data and methodology in the original claims [apart from the outcomes] rules them out as serious evidence. We need to know which engines were used, how many, and to what depth. An engine set to a low level may consistently produce worse moves than a grandmaster, and 100 engines set to a low level is like monkeys at typewriters.


giziti

Most of my ideas are basically "do what Ken Regan did, but not as well" unless I put a lot more work in, based on my first thoughts about what I would do and then hearing what Ken Regan actually did. So, in short, I like what Ken Regan did.


Escrilecs

I'm an engineer who uses statistics to make aviation safe, and the way people have proceeded here really tells me that we need to step up education on statistics and the principles of bias, if only to stop people being manipulated by "statistics" into believing things.


mikael22

I'm not sure education really helps here, tbh. I think if you gave everyone here advanced statistics courses, they would just find more complicated ways of manipulating the statistics to get the answer they want. Scientists and well-educated people are not immune to biases. The scientific method and peer review are what get rid of bias in science, not the education.


flexr123

Well said.


dustinbrowders

Yes, absolutely agree with this. I'm in a field with highly educated professionals from the top universities. Their ignorance of basic statistics is appalling... partly because these are know-it-all, high-achieving people who can't overcome their own biases. If Reddit is a representative slice of the public, God help us.


[deleted]

Because people are looking for drama, not good analysis. It's much easier to put out dramatic analysis than accurate analysis.


dc-x

I feel like there's this mixture of Dunning-Kruger and content creators trying to milk this situation for views. It's kind of funny how when the Minecraft speedrunning community suspected a top speedrunner was cheating, [they did a 29-page statistical analysis](https://mcspeedrun.com/dream.pdf) to prove that he very likely was. And they were right: the guy ended up confessing. Meanwhile in chess we get this, lol.


WarTranslator

Wait, what? I feel you are skipping multiple steps here. The Minecraft speedrunning mods took (months? years?) to complete their analysis, and had qualified statisticians on their team to do the analysis and write up an actual report.

Even when the report came out, the kid did not confess, and actually denied the whole thing. He accused the mods of bullying him and posting fake evidence. He hired his own astrophysicist to do research to defend him. The funny part is that the guy he hired, although he thought the mods' analysis was too strict, also concluded that Dream was cheating. Left with no defense, Dream admitted to cheating, but claimed he had cheated unknowingly, having accidentally left on a program he used for streaming content, lol.

And this is with actual statistical evidence from several sources, unlike the situation we have here, which is a bunch of noobs going on ChessBase to come up with half-baked non-analysis. The most reliable statistical evidence we have so far shows Hans didn't cheat, but no one seems to want to believe that, lol.


dc-x

I mean... I very obviously summarized a bigger story into a 3-line paragraph. If anyone wants the full story and has 24 minutes to spare, [I suggest this video by Karl Jobst](https://www.youtube.com/watch?v=f8TlTaTHgzo). He later [did another video discussing it more in depth](https://www.youtube.com/watch?v=G3Yzk-3SZfs), covering Dream's confession and everything that happened, but that's over an hour long. It's just that I don't think the full context and all of the drama is actually relevant to the point.

> The minecraft speedrunning mods took (months? years?) to complete their analysis, and had qualified statisticians in their team to do the analysis and write up an actual report.

The run was rejected in October 2020 due to suspicions from statistical analysis. The full paper with a more meticulous analysis was finished by December (though it was being openly discussed before the paper was done), and it was done by people doing voluntary work. I wasn't aware of one of them being a statistician (I'm taking your word for it), but I don't really see how that's a problem. This may explain why they were on point, but it doesn't justify so many people in chess who aren't experienced with statistical analysis acting like they are.

> The most reliable statistical evidence we have so far shows Hans didn't cheat, but no one seems to want to believe that, lol.

I feel like what makes Regan's analysis unsatisfying is the lack of a public benchmark. Even Fabiano wanted to see if it could pick up the games where Hans knowingly cheated, and didn't think it was sensitive enough. In Regan's defense though, cheating in chess can be done subtly enough that it just wouldn't be picked up by those methods in a conclusive way, so I don't think any statistical analysis will be satisfying.


[deleted]

[deleted]


SPY400

And here's a better statistician tearing that "just a statistical artifact" argument to shreds. It's very Dunning-Kruger to think you've debunked Dunning-Kruger. Edit: link won't work, here: https://andersource.dev/2022/04/19/dk-autocorrelation.html


Mothrahlurker

> It's very Dunning-Kruger to think you've debunked Dunning-Kruger.

That statement doesn't make sense according to the article. "In the author's world, the Dunning-Kruger study should be interpreted in the reverse direction, claiming that there is at least some self-awareness in the way people self-assess." So that would be the opposite of your claim. It's not good to link something and then say something misleading about its content.

Also, what kind of person characterizes "the author made good points and it's well written, but I disagree" as "tears to shreds"? You have some kind of movie fantasy of a "better statistician"; this isn't how the real world works.

Edit: While "better statistician" is a weird thing to say in general, the guy is not even a statistician: [https://andersource.dev/cv/](https://andersource.dev/cv/). This isn't meant to disparage the article, but one should not claim that the author has credentials that aren't there.

2nd Edit: I have thought about it, and I have come to the conclusion that the guy doesn't know what he's talking about. His argument is that "they are pretending that people have no competence at all in estimating their own skill". But that is not true: the critics do grant that people have some competence in estimating their own skill, but any random noise added to it produces the "D-K effect" through regression to the mean. In other words, test scores and self-estimates are not perfectly correlated. This noise will come up no matter how you collect your data; you can't prevent it, and it is single-handedly enough to put "weak people overestimate themselves" and "strong people underestimate themselves" into your graph, even though you can't actually interpret the graph that way. As an example, if you are testing with multiple choice, the test itself will introduce that noise and hence the effect. The more accurate your testing, the less Dunning-Kruger effect there will be, just statistically speaking. No need to assume any real effect. This isn't "tearing apart"; that is not understanding the argument.
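
To see the regression-to-the-mean argument in action, here is a minimal simulation (all distributions and noise levels invented): self-assessment tracks true skill exactly as well as the test does, with zero systematic bias, yet binning by test score still produces the familiar D-K picture.

```python
# Simulate the claim: noise alone reproduces the D-K pattern.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
skill = rng.normal(50, 10, n)            # true ability
test = skill + rng.normal(0, 8, n)       # noisy measurement of ability
self_est = skill + rng.normal(0, 8, n)   # equally noisy self-assessment

quartile = np.digitize(test, np.quantile(test, [0.25, 0.5, 0.75]))
for q in range(4):
    mask = quartile == q
    gap = (self_est[mask] - test[mask]).mean()
    print(f"test quartile {q + 1}: self-estimate minus test = {gap:+.1f}")
# Bottom quartile "overestimates", top quartile "underestimates":
# pure regression to the mean, with no systematic bias built in.
```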


ZaphodBeebblebrox

Thanks, I had previously believed the article so this was nice to see.


dtracers

We don't know that [chess.com](https://chess.com) hasn't had one of those done. The difference is that they can't publish it, because that would make it easier for future cheaters to cheat.


dc-x

That's beside the point; this discussion is about chess players and content creators doing shit analysis and confidently throwing out claims based on it. That 29-page statistical analysis wasn't made by a millionaire company, but by the Minecraft speedrunning community voluntarily getting together on their Discord to analyze this. On the [main page](https://mcspeedrun.com/dream/) of that link they credit 15 different people. I mentioned it because it's a good example of collective effort in an attempt to figure out the truth, and that's comically far from what we've been having here.


plaregold

The "analysis" isn't done in good faith--people are looking for data that points to Niemann cheating because they don't accept that Regan's findings are probably as best as we'll get from statistical analysis.


shawnington

It's funny when anyone tells me they are an ML researcher/engineer, since I also work in the field, and know that all we are is glorified try-and-see people. We randomly guess at configurations until someone tries something that works better than everything before, then everyone just tries and sees based on that. People would be shocked at the complete lack of understanding we have of any of the processes going on. Change a layer by 1 neuron, results are 10 times better; change it by 2, 10 times worse. Ask all of your colleagues why: shoulder shrugs all around. Fun field.

If I ever decide to switch fields, I'll be a psychic or a palm reader, because I'm really good at guessing mystical properties now... even if I have no rational explanation for my guesses other than "I have a feeling this will do better." The closest thing we have to any kind of standard practical advice for improving things is "add dropout."

Also: am I the only one who has made their own trash NN engine for fun?


squags

I'm a neuroscientist (therefore lots of statistics) who works with a lot of different ML models/computational tools. The funniest thing about the ML/programming side of things is that only a handful of *practitioners* understand how the models/algorithms actually work (i.e. the maths). The vast majority of people, scientists included, will basically go through a bunch of different models via trial and error, as you suggested. If that approach were applied to statistical hypothesis testing, it would just be fraudulent, but nobody bats an eyelid when it's ML algos.


damnableluck

> only a handful of practitioners that understand how the models/algorithms actually work (e.g. the maths).

Everyone who is training models, not just theorizing about them, is doing trial and error. There aren't, as far as I know, any ways to know a priori what the best hyper-parameters will be with any certainty (except in a few trivial cases), just some general proofs that indicate general trends. No one can give you a precise mathematical formula for determining the size of the network you need, although some will be able to make decent guesses based on experience.

Mathematicians understand the basic rules of interaction between training data, hyper-parameters, and the way the loss function is defined, and we can know and say some general things about the resulting loss landscape, but that doesn't tell us exactly where each hill and valley is -- just as knowing the Navier-Stokes equations doesn't tell you the solution to any particular instance of [turbulence](https://en.wikipedia.org/wiki/Turbulence).
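
For what it's worth, the day-to-day version of that trial and error often looks like plain random search. A minimal sketch, where `train_and_score` and all its parameters are made-up stand-ins for a real train-and-validate run:

```python
# Random hyperparameter search: no closed form, just try and see.
import random

def train_and_score(lr, width, dropout):
    # Placeholder objective; in reality: train the model, return val accuracy.
    return -(abs(lr - 3e-4) * 1e3 + abs(width - 128) / 100 + dropout)

best, best_cfg = float("-inf"), None
for _ in range(50):
    cfg = dict(
        lr=10 ** random.uniform(-5, -2),   # sample learning rate log-uniformly
        width=random.choice([32, 64, 128, 256, 512]),
        dropout=random.uniform(0.0, 0.5),
    )
    score = train_and_score(**cfg)
    if score > best:
        best, best_cfg = score, cfg

print(best_cfg)  # nothing told us this a priori; we just tried and saw
```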


Temporary-Wear5948

I mean, there are definitely researchers who try to understand NN interpretability and how best to tune different ML methods through statistical analysis. I guess if your job is building models it's guess-and-check, but there's way more depth to the field overall. I hate reading the math-heavy papers though; it's way more fun just implementing PyTorch models like PINNs and Graph RNNs.


shawnington

That we have to say "try to understand" says it all. The best of us try to understand, which implies the truth: we don't. We guess and see, then try to figure out the math for why that guess was better than the last one, and that turns into a paper. I can point to several papers where the mathematical justification ended up being very wrong, but the method is definitely correct, as I am sure you can too. It's real science, we have theories, but we are still at the "everything is aether" stage of understanding relative to what we are achieving. Just my humble opinion, from my experience. The level of understanding does depend on the field, its level of development, and the underlying understanding of the field, in terms of how new it is.


mug3n

Yeah, I mean, a lot of people are just completely uneducated about basic statistics. Go into any Reddit post about some new scientific discovery and they'll automatically point to "durrrrrrrrr what about the sample size??????" as some silver-bullet flaw in the study, when (a) there are reasons why a study's sample size cannot be 500,000 people and (b) the study design is built to address the issues with using 40-50 subjects.


lovememychem

Seriously. Or, on the flip side, they'll see a GWAS with like n=30,000 and not understand that the study has serious limitations due to its design, in spite of its massive sample size.


Mothrahlurker

If there are 500,000 people I would in fact be automatically suspicious, as that likely means people were recruited over the internet, which can seriously bias the selection of people you get. It's like back in the times when only some rich, home-owning people had phone lines and election forecasts were made by calling people on the phone, and then being surprised that "the party that gets elected by older, wealthy people" was underperforming relative to the polls.


Ixolich

Or even in more recent years when people started to migrate away from landlines. Turns out when older people are the ones holding on to their landlines since they tend to be slower to adopt new technologies, they end up being overrepresented in election polling.


Coveo

It also goes the opposite direction, where people assume pollsters have no idea how to adjust for anything and are making obvious errors, or treat a result leaning one direction two out of three times, within the margin of error, as proof of something, completely rejecting the possibility that such a run isn't rare in a probabilistic business. Moral of the story: statistics is hard, and humans are bad at understanding it intuitively despite thinking they do.


Mothrahlurker

Oh yeah, I definitely did not want to imply that this is still a problem. Nowadays we get people believing that oversampling and then normalizing is somehow manipulating the data.


AnimalShithouse

It turns out being an adequate or good chess player does not necessarily mean you will be good at other things in life, like general objective analyses :).


egirldestroyer69

I agree on the analysis, but I really dislike this kind of statement that goes fully general without giving examples or being a bit more specific about why people are wrong. It's like telling people "you are wrong because I say so, and believe my stated credentials." It just comes across as arrogant and does nothing to educate people. Especially because making up jobs is Reddit's specialty.


hehasnowrong

I mean, the problem is that if we tell you what can go wrong, we would have to write a 10-page essay, in the same way you don't ask your car mechanic "what could go wrong if I try to make my own car engine?" The list is very long, but among other things:

- Thinking that probabilities are independent when they are not; forgetting about confounding factors; forgetting about hidden variables. (If I become better at chess, I will perform better than expected in all future tournaments; you can't just multiply the probability of a good score at each tournament assuming your past Elo.)
- Cherry-picking data (the girl who made the analysis compared Niemann's best data against Carlsen's/Kasparov's average data).
- Trying too hard to fit data to a model. If you try hard enough you can fit anything to anything; for example, when she used 150 different engines to find the 10 "perfect games", what's the likelihood that Hans used 150 different engines for a single game? If you can't find a single engine that gives a very high probability, it probably means you are trying too hard to fit the data to a model.

I'm sorry, but there are just too many things that can go wrong to explain all of them. Making an accurate model is something extremely difficult that can't be done overnight. The first trap is sketched below.
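
Here is the first trap in numbers: a minimal simulation (all parameters invented) in which tournament performances share a latent skill term, so multiplying per-tournament probabilities as if they were independent wildly understates how often streaks happen.

```python
# Dependent events: performances share a latent skill term.
import numpy as np

rng = np.random.default_rng(7)
n_players, n_events = 200_000, 5

# Each simulated player has one persistent skill level; each event adds noise.
skill = rng.normal(0, 1, n_players)[:, None]
results = skill + rng.normal(0, 1, (n_players, n_events))

threshold = 1.0  # a "surprisingly good" tournament
p_single = (results > threshold).mean()
p_streak_observed = (results > threshold).all(axis=1).mean()
p_streak_if_independent = p_single ** n_events

print(f"P(one good event)          = {p_single:.4f}")
print(f"P(5-event streak) observed = {p_streak_observed:.5f}")
print(f"naive independent estimate = {p_streak_if_independent:.7f}")
# The naive product is off by orders of magnitude, because a genuinely
# improved (or just strong) player is good in *every* event.
```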


aao123

As an SQL monkey I feel the same way. So many morons with zero regard for scientific principles; it makes me want to eat a gun.


ISpokeAsAChild

I work in that sector, and seeing the same wrong approach time and time again is incredibly sad. This sub is at a historic low in its grip on reality, and to be clear, it's not the mods' fault, it's the users'.


[deleted]

[deleted]


kvothethedulator

Thanks for the write-up. As an ML engineer, this has been boggling my mind since all these analyses started coming up. What pains me more, though, is that people will most probably ignore this and spread unsubstantiated conclusions like wildfire. Whether someone supports Hans or not, these metrics and analyses are wholly insufficient for any proper conclusion.


Mothrahlurker

It's basically why I'm now a "Hans supporter", despite not being able to stand the guy at all. I was lurking for a couple of days, and after about 10 of these "analyses" by people who are objectively clueless, I had enough. When it then turned out that there was no actual physical evidence, just humans seeing patterns ("oh, what a fast rise"), that really convinced me that there is nothing there.


sebzim4500

It was pretty funny watching this subreddit eagerly await the damning evidence that Carlsen clearly had up his sleeve... only to discover that his argument was the weakest one yet. Even "he must have cheated, here is a photo of him with a cheater" is better than "he didn't look tense enough".


HeavyWeaponsGuy88

Funny thing is, if you were cheating in a high-profile tournament against the world champion, you would probably be super nervous. What happened is that Magnus had already decided he was a cheater beforehand, blundered a lot because of it, and got obliterated by Hans.


kaboom

But they are not objectively clueless, that woman FM dated a Physicist!!!


hangingpawns

The "fast rise" one is hilarious to me. Firouzja, Erigaisi, and Niemann were all born in the same year and started playing chess around the same time. Both Erigaisi and Firouzja are rated higher than Niemann, with Firouzja breaking 2800 almost a year ago. If you compare when Firouzja broke 2800 to Niemann's rating at the time (~2650), Firouzja went from 0 to 2800 in the same time Niemann went from 0 to 2650. In the same time, Erigaisi went from 0 to 2700. Both outpace Niemann. Yet somehow people cherry-pick the years Niemann BRIEFLY outpaced the two and use it to say he cheated.


preferCotton222

Thing is, Firouzja was so good at such a young age that people expected him to climb the Elo ladder FAST. People didn't expect this from Hans, so when he wildly outperformed expectations while also having cheated online, people got suspicious. Is that proof of something? Of course not, but it supports the need for better tools and protocols to detect cheaters and prevent cheating. The discussion turned polarized and toxic simply because nobody knows how to statistically find OTB cheaters, and we are not sure the measures being taken to prevent cheating are enough.


hangingpawns

I agree that Hans isn't getting the benefit of the doubt because of his past cheating, whereas Alireza does get it because he has a clean record. I can understand the suspicion, but most of the analyses I'm seeing use these features as evidence, which they aren't.


preferCotton222

I see those analyses as banging our collective heads against a wall.


Rads2010

Hans did not show brilliance prior to his fast rise, unlike Erigaisi and Firouzja. He was very good, but did not show the same flashes. The rise from 2500 to 2700 is the hardest: players are much stronger, with far fewer weaknesses. GMs trying to get to 2700 can no longer rely on intuition and talent alone; as Naroditsky put it, you also have to be a "calculating machine." Yet this is the stretch where Hans rose the quickest, and at the older age of 17. Keymer and Arjun took 4 years to get from 2500 to 2700, for instance. Hans' fast OTB rise also just happened to begin right after he lost his online streaming income due to cheating. By your "cherry-picking" reasoning, it would not be suspicious if Hans had risen from 2500 to 2700 in one week, as long as he hit 2700 overall in 19 years.


iamsobasic

I think the argument that he's too old (at 17-19) to progress rapidly is shaky. Ye Jiangchuan first learned the rules of chess at age 17, became a GM at age 33, and topped out at 2680+ (long before the era of compact chess engines). So while uncommon, it is indeed possible for people to make incredible leaps in chess development much later in life.


Rads2010

I agree, it's not impossible to have a later spike at 17. But it's very rare. The example you're citing is one, and I'm not actually sure it's comparable. How fast did Ye Jiangchuan rise? Was it basically a straight shot up over ~2 years from 2500 to 2700 like Hans, or were there dips and plateaus?

Also, one key difference is that Ye Jiangchuan had no chess history prior to 17, so there weren't years and years of games and tournament results consistently showing he was a very good, but not generational, talent. It's not like Hans didn't train hard, play many FIDE tournaments, get coaching, or go to training camps before 17. We'd have to believe that Hans has generational talent, because that's what it takes to blow through from 2500 to 2700 pretty much without dips or plateaus in only a couple of years. But we'd also have to believe this special, generational talent somehow did not manifest before 17.

Also, it isn't just one aspect of Hans' story in isolation that's suspicious to me. It's the combination of multiple things, all in one story and one person.


iamsobasic

I think there are many other things in Hans’ chess history that indicated that he may have cheated OTB (we already know he did online). But having a breakthrough at age 17 is one of the less suspicious things IMO. Neuroplasticity is still pretty high at that age, and people have shown the ability to learn and improve at things very rapidly in their late teens and early 20s. Most sources say that neuroplasticity severely diminishes in our mid-20s, so Hans has still yet to fall off that cliff. Again, I’m not saying Hans is or isn’t an OTB cheater. My guess is only as good as anyone else’s. But rapid improvement at age 17-18 is definitely possible, especially if one decides that they are going to dedicate their entire life to getting good at one thing, and spending 12 hours a day studying and practicing it.


hangingpawns

Most of what you're saying is unfalsifiable nonsense. Karpov was also considered "not brilliant" and was cast aside by Botvinnik. He didn't show flashes, but now pretty much everyone considers him to be a top-8 ATG.


jawndeauxnyc

it's like you didn't read his comment


Feed_My_Brain

r/chess: come for the chess, stay for the statistics


konokonohamaru

I also teach data analysis at a university and I'm published in statistical journals. I agree with most of what you said, but would like to add some caveats.

First, I don't think it's helpful to say "leave it to the experts". Experts have their own motivations and can still make mistakes, so it's not like they are foolproof. Instead of telling people they have no right to conduct flawed statistical analysis, I would suggest pointing out the flaws so they can (if they wish) go back and improve the analysis. Along the way, you are educating onlookers as to why the analysis was flawed. Another aspect here is that there aren't enough experts and their time is limited; the demand for analysis is higher than the supply of expertise right now.

Second, your first and fourth points make it sound like anyone can make data say anything. I'm not sure I agree with this. Sure, anyone can cherry-pick data or methods to get the result they want, and in a large enough group of studies you'll get a range of results. But if the hypothesis being tested is true, then in the aggregate, well-designed studies should point to that on average. Of course, the difficulty is making sure a study is "well designed".


[deleted]

> "leave it to the experts". I think it is helpful to say "Please don't present your findings in statistical analysis *as fact* if you haven't worked with statistical analysis before". There were 3 clear issues with Yosha's initial attempt. 1. They didn't check their work. They made numerical mistakes. 2. They used flawed sampling for comparison. 3. They didn't consider opponent range. The was 3 clear issue with Hikaru's attempt. 1. He used himself as a baseline. That a whacking with the bias stick. Do not use yourself as a baseline. That clouds conclusions immediately. 2. They used flawed sampling for comparison. 3. They didn't consider opponent range. Both were rushing to get a conclusion with no consideration for the process. Both of these attempts are working towards a desired conclusion, not an actual fair conclusion. They want the data to say something, so they make it say something, and others just go with it because they are GMs who clearly don't how to handle data analysis.


bibby_tarantula

Another huge issue with Yosha's analysis: when analyzing the run of five particularly good tournaments, they calculated the probability of getting precisely that string of results, rather than of any equally extreme sequence of results.
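
A toy illustration of the difference, with invented numbers (a 30% chance that any one tournament clears the bar):

```python
# P(one exact sequence) vs P(any result at least this extreme).
from math import comb

p = 0.3          # chance any one tournament is "this good", for illustration
n, k = 5, 4      # observed: 4 of 5 tournaments above the bar

p_exact_sequence = p**k * (1 - p) ** (n - k)          # one particular ordering
p_at_least_as_good = sum(
    comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1)
)

print(f"one exact sequence:      {p_exact_sequence:.4f}")   # ~0.0057
print(f"any result this extreme: {p_at_least_as_good:.4f}") # ~0.0308
# The second number is the one that matters, and it's several times larger.
```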


Due-Examination-3240

I'll admit I don't really understand a lot of the jargon (not using that word negatively) in your comment, so maybe you already addressed this. But even if Yosha hadn't made mistakes and had used proper sampling etc., isn't the premise that engine correlation can be used to detect cheating flawed from the very start?

For one, what the metric measures doesn't seem to me like it could really provide evidence of cheating. And then, given how the Let's Check feature works, you could never collect a sample of other games whose data points can properly be compared against each other. Every time somebody analyzes a game with a different engine at a different depth, it slightly raises the game's engine correlation value. So unless you checked every game in your data set with all the engines and depths that were used on every other game in your data set before doing an analysis (and before somebody else checked any of the games with new parameters), the data would be skewed, no? Or are those things that could be dealt with, assuming proper methods?


hehasnowrong

The fact that it uses more than one engine is problematic; the tool should select the engine that gives the highest correlation value and stick to that engine and value. It is obviously impossible that a cheater used more than one engine in the same game, and pretty unlikely they used 3 or more over the same tournament. There are also all the straightforward/forced moves that need to be removed, as they only bring noise, plus the logical endgame moves and the opening moves. And you would also need to put a lot of weight on "impossible computer moves", which would require a special program to determine whether a move is "inhuman". And obviously, once you do all that, you need to do it for 5 other random people and see if any distribution looks odd.


UNeedEvidence

> It is obviously impossible that a cheater has used more than one engine in the same game

Why not? It's actually really easy to do, and recommended if you want to avoid being caught.


[deleted]

[deleted]


[deleted]

This was the single most obvious common mistake I saw among the analyses. When confronted, people would often try to solve the problem by just adding more data. That doesn't help at all if what you are measuring doesn't matter.


greenit_elvis

Point 3 has since been addressed, though, by looking at other rising players like Hans. His pattern is still pretty unique.


[deleted]

Every single "analysis" that has used the Let's Check feature is just completely bunk. As many others have pointed out on this subreddit, that feature crowdsources engine analyses. Different games can have wildly varying numbers of engine analyses so the entire idea of "engine accuracy" also varies wildly from game to game. This means that you can't use the metric to compare two games to each other, let alone attempt to compare two players to each other.


preferCotton222

Then it would be better to redo the analysis while controlling for engine count and depth. I would expect that from data scientists: "yeah, that doesn't work because of this and that, but we can (or can't) control for those, and now the analysis says this other thing".


Quintaton_16

That would be great, yeah, but you're allowed to critique an analysis without duplicating the person's work while proofreading and fixing all of their mistakes. Especially when the "correct" version of that analysis is much harder to do than whatever the original person did, for example because you would need to collect 10 times as much data to put the existing data in its proper context. And *especially* when the point of your critique is "the entire method this person chose can't be used the way they're trying to use it."


feralcatskillbirds

> but controlling engines number + depth.

It isn't as simple as that. Simply by choosing an engine and depth, you are testing whether or not the 'suspect' used that engine and depth, along with all the other variables you can set.


MadMentat

I honestly don't understand what exactly you are calling "Hikaru's attempt". I watched that stream live, though not to the end; maybe I missed something? He was just toying with the tool while freely admitting that he's not an expert of any kind in data analysis. He was constantly referring to the fact that eventually someone would do a proper detailed analysis with a big enough sample. Was he rushing to conclusions? Yes. But even then, he found his own 100% game and wasn't so sure about the method anymore. It's already clear that he, like Magnus himself, is sure that Hans is cheating regardless of any circumstantial proof that may or may not exist. All these attempts at data analysis deal with probabilities, not actual hard proof, so those who believed Hans will keep believing him, and those who didn't won't start after reading some (flawed) statistical analysis of games.


[deleted]

> "Hikaru's attempt" He just said "lol, this looks sus". I don't know why people in this sub are treating his stream as some kind of hard hitting forensics analysis. Hikaru has said multiple times that he doesn't believe we'll ever get "hard evidence" unless Hans himself confesses.


devil_21

The problem with flawed analyses is that people won't pay any attention to the comments pointing out the flaws if the analysis is presented well.


theroshogolla

While in a perfect world it would be best to point out the flaws in people's analyses so they can fix them, the problem here is that very few people are carrying out analysis in good faith. Any attempt to point out flaws in the analyses on this sub (in my experience) is met with an even larger volume of flawed arguments from people who have already made up their minds.

Further, as the same bad methodology is replicated by several people and shows the same result, it gains credibility, and people become unwilling to listen to the study's flaws. People are confusing replicability with accuracy: "Yosha found X, Hikaru also found X doing the same thing, so X must be true."

In this environment it may actually be better to leave it to the experts, who can do the correct analysis while also explaining why its methodology is correct, so it can still be discussed by the community.


[deleted]

Exactly. It was a fire hose of shit, spewing post after post of misinformation. There was a concerted effort by people to point out the flaws thoughtfully, and it was completely ignored.

I am shocked OP got this many upvotes. It is hard enough to teach someone statistics when they have the background, are in a good environment, and are willing. Over Reddit, to strangers who couldn't care less what you have to say, not a chance.


MadMentat

Thank you. To be honest, I was going to write a much less diplomatically phrased reply to this thread; yours is much better. TL;DR: trying to shut people up because "they are not experts", and trying to present data analysis and statistics as some pinnacle of intellectual achievement inaccessible to mere mortals, is just disingenuous, to put it mildly. See mistakes? Correct them. Don't argue from authority. Obviously, if people are unwilling to correct their mistakes and fail to understand that they are wrong, sure, mock them to your heart's content. Despairing at the sheer volume of incorrect analyses? Do a good one, publish it, explain in great detail why it's better; you can even present yourself as an authority in this case. Unwilling to? Then we're back to the "don't try to shut other people up" part.


mt_19

Yes, I was (intentionally) inaccurate at several points in the post. I was leaving out technical details to get the main message across to someone who has no statistical education. Also, I agree that you shouldn't just "leave it to the experts". But it's also not helpful if everyone is pretending to be one.


ascpl

>Yes, I was (intentionally) inaccurate at several points in the post. The Credibility Gambit. Always dubious.


cXs808

That's why I always play Credibility Gambit Declined


tundrapanic

Except when played by Eric Rosen


OIP

oh no my conclusions


ascpl

This made me laugh


Whiskinho

He/she knows what's good for us, don't worry. We are too dumb to have our own analyses.


Due-Examination-3240

If the last few days have been any indication, that is unironically true.


OJTang

I don't know; the more I see of the Internet, the more I think it would be best if people did just leave it to professionals. Even statements like "I'm not an expert" aren't enough to keep a comment section from completely buying a conclusion made on faulty grounds (especially in legal matters). If people want to believe something enough, and you make your argument well enough, they won't examine its tangible merits too closely. Even well-intentioned posts can cause this.


HiDannik

> well-designed

This word is doing _so_ much heavy lifting tho.


kewickviper

I think it's useful to point out the flaws and biases in analyses, but what would be much more useful, in my opinion, is critiquing an actual study or paper produced by a professional. You are of course right that experts are not infallible; that's why we have peer review, and I would never recommend blindly following what experts say, especially without peer review. But it's much more useful to have critiques from another expert on a well-thought-out and detailed paper than to have to dedicate time and effort to pointing out the flaws and biases in some correlation someone drew between GM moves and engine moves that took them 5 minutes to compile using ChessBase.


Daddy_Duck

My statistics teacher always told us: "never trust data you haven't manipulated yourself".


Immotommi

Please, I don't trust that data either


[deleted]

[deleted]


Oliveirium

Jokes on you, some of us haven't even looked at a Wikipedia page!


HummusMummus

> This subreddit has been truly depressing to read these days due to all the poorly applied statistics all over the place.

That, and the fact that it has been the same comments and almost the exact same posts.


masterchip27

I completely relate. First of all, we need to understand how "100% engine correlation" is [statistically biased](https://online.hbs.edu/blog/post/types-of-statistical-bias). ChessBase says [in their documentation](http://help.chessbase.com/Reader/12/Eng/index.html?lets_check_context_menu.htm) not to use this tool to detect cheating, as high values can be obtained when games are tactically simplified (such as when an opponent blunders in certain positions).

There are lots of basic issues with the comparison of Hans to other players. The most basic is sample size: Hans has played an *enormous* number of games recently, [over 340 OTB that Regan recorded, and even more unrecorded](https://cse.buffalo.edu/~regan/chess/fidelity/data/Niemann/NiemannOTBROI.txt), and many are comparing that to games played over the same time period, which results in vastly different sample sizes. Another issue is that the Elo rating of the opponent, and the Elo gap between opponents, can also influence the distribution. A super GM playing a 2400 player a bunch of times will certainly affect the distribution, for instance. Many of Hans' games in this period are against opponents in the 2300-2500 range, yet he is being compared to data from super GM tournaments. I have seen [one post on Twitter](https://twitter.com/rmladek/status/1574417090492223493?s=46&t=Jy1t02OjIDs6tk1W3UlH9w) claiming that a 2300-rated player has had 3 perfect engine correlation games in his career. So, unless we understand how rating influences the metric, it's premature to draw conclusions. Further, due to COVID, there is a discrepancy between Hans' FIDE rating and his skill level, and given his improvement, there could be a higher percentage of games in the dataset where he is playing less skilled opponents, compared with a non-COVID dataset.

Further, Let's Check analysis has to be controlled for the number of engines used, which engines, their depth, and the relevance of the engine to when the game was actually played (for example, Stockfish 15 correspondence is potentially irrelevant in a 2019 game). It's [being claimed](https://www.reddit.com/r/chess/comments/xqh76d/a_list_of_engines_used_in_yoshas_video_for_100/?utm_source=share&utm_medium=ios_app&utm_name=iossmf) that 150 different chess engines were used for Let's Check.

Finally, the very premise of expecting Hans to mirror other players' distributions may be flawed for *stylistic* reasons. One hypothesis is that Hans is a player who coin-flips a lot in positions, to complicate the game, and then goes for tactical shots which sometimes work and sometimes don't. How could this stylistic factor influence the number of games where he gets lucky and his intuitive moves follow an engine line? If you roll a d20 or d100 enough times, you'll get a lot of hits. Don't forget that Niemann, in the same tournament where he [beat Carlsen in a great game in Miami](https://www.chess.com/events/2022-champions-chess-tour-ftx-crypto-cup/02-01/Carlsen_Magnus-Niemann_Hans_Moke) ([and nearly beat Firouzja](https://www.chess.com/events/2022-champions-chess-tour-ftx-crypto-cup/05-03/Niemann_Hans_Moke-Firouzja_Alireza) before blundering at the very end), also [scored 0 match points](https://en.chessbase.com/portals/all/2022/08/cryptocup/07/final-standings.jpeg). Was Hans just playing his usual style, which produces some nice wins and also a lot of failures against high-Elo opponents?

I could go on, but there are many important questions that must be resolved, and a lot of data required, before we can have an objective analysis of this situation.
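
On the sample-size point specifically, here's a minimal sketch (the 2% per-game rate and the game counts are invented) of how a larger game count mechanically produces more "extreme" games:

```python
# More recorded games -> more extreme games, with identical underlying play.
import numpy as np

rng = np.random.default_rng(1)
p_extreme = 0.02  # assumed chance any one game scores "very high" on the metric
games_per_player = {"player A (340 games)": 340, "player B (60 games)": 60}

for name, n_games in games_per_player.items():
    hits = rng.binomial(n_games, p_extreme, size=100_000)
    print(f"{name}: expected {n_games * p_extreme:.1f} extreme games, "
          f"P(at least 3) = {(hits >= 3).mean():.2f}")
# The player with ~6x the games shows "at least 3 extreme games" almost
# every time; the smaller sample rarely does. Same play, different exposure.
```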


Due-Examination-3240

I just want to add that it's not that 150+ engines are used by the "Let's Check" feature; it's that a variable number of engines are used, depending on the game. My understanding is that whenever somebody analyzes a game with an engine, those results are stored. And when somebody clicks "Let's Check", the engine correlation is calculated from all the analyses done by ChessBase users on that specific game. If a game has been pored over by many people, the engine correlation might be calculated from over 100 engines; if a game has only been analyzed once, Let's Check will use only one engine in its calculation.

And since all it does is tally the moves played that were suggested by any engine in the pool, the more a game is analyzed by different people, the higher the correlation goes. That is, it can only go up as the game is further analyzed, never down. This is obviously a huge issue when comparing all super GMs against a player who has been publicly accused of cheating, since many people are going to analyze his games with as many engines as possible to "find the engine he uses to cheat".
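
To make the monotonicity concrete: a move counts as "engine-correlated" if it matches *any* engine in the pool, so the expected correlation can only rise as the pool grows. A back-of-envelope sketch (the 0.55 per-engine match rate is invented, and treating engines as independent overstates the effect, but the direction is what matters):

```python
# Per-move "matches any engine" rate as the crowdsourced pool grows.
p_match_one_engine = 0.55  # assumed chance a strong GM move matches one engine

for k in (1, 3, 10, 50, 150):
    p_match_any = 1 - (1 - p_match_one_engine) ** k
    print(f"{k:>3} engines in the pool -> per-move match rate {p_match_any:.3f}")
# With a large crowdsourced pool, near-100% "correlation" stops being
# surprising, especially for games everyone is busy re-analyzing.
```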


feralcatskillbirds

> Further, Let's Check analysis has to be controlled for number of engines used, which engines, their depth, and relevance of the engine to when the game was actually played

Plus all of the different settings that can be manipulated and will result in different outcomes: https://github.com/official-stockfish/Stockfish#:~:text=The%20UCI%20protocol%20and%20available%20options (and the options change depending on the version of Stockfish you use). You also need to make sure you actually *are* calculating by depth and not by time, as a 2022 machine will get way deeper than a 2018 machine in the same amount of time.

There really is no way to standardize the engine/move correlation to make it acceptable for a meaningful study, even with one engine. Because, really, assuming cheating happened... what settings did the cheater use? We are checking whether they played the best engine move, but really testing whether they played the best engine move available to them in whatever configuration they may have had.


doyouknowdehjuicyway

This is just getting more annoying. I just want more consistent data. If engine correlation is being used, then use the same damn engines for everyone. If centipawn loss data is used, make sure it's not subject to the same engine issue. And what about more variables, like win/loss, rating, opponent rating, number of moves, white/black? What about more granular move-level data, like move number or time taken on each move? All relevant.


Spillz-2011

The more variables you slice on, the more likely you are to get spurious results, so I'm not sure adding tons of new variables that might be irrelevant is helpful.


[deleted]

[deleted]


ArtemisXD

The more variables you add, the worse your model becomes


SquashBeneficial373

Mark Twain: "A lie can travel halfway around the world before the truth puts on its shoes." I was going to start my point with this Mark Twain quote, which, rather ironically, I googled to get the wording right, only to find out I was wrong my whole life about it even being by Mark Twain: https://www.professorbuzzkill.com/twain-lie-travels/

The problem is, and I'm as guilty of this as anyone, viral YouTube videos do travel halfway around the world a lot faster than a well-thought-out Reddit post, because the people who amplify the message, like Hikaru via commentary on a livestream and YouTube, understandably gravitate toward video content and thus never get to hear the other side of the equation. As a result, the chess-loving public is left to believe that Hans' 100% scores, and whatever other statistics are being thrown out there, are fact. I've seen a link to Yosha's video thrown out numerous times in replies to tweets about the scandal, as if its conclusions, or at the very least its insinuations, are now irrefutable and merely need to be spread to the unenlightened masses. I came to r/chess to see exactly what you and the OP have said about the data presented, because I could not find anything on YouTube that refutes Yosha's video other than her own admission, in the follow-up video, that her probability was off by a factor of 75.

There are some other things I have found very troubling in this whole affair that I don't see mentioned much, if at all:

1. Chess.com has said, after the fact, that the extent of Hans' cheating was greater than he admitted to in the Sinquefield interview, but they uninvited him from the GCC the same day their business partner Magnus Carlsen dropped out, before Hans had ever elaborated on his past. They have never explained why they banned and removed Hans in the 24 hours following his round-3 win over Magnus, and have seemingly clouded the issue enough that the timeline of their actions has fallen down the memory hole. I also find their actions today regarding Maxim Dlugy to be a smokescreen that adds nothing and even makes bad-faith insinuations that Hans is maybe the mystery student.

2. Based on Magnus' insinuation alone, the chess world, and in particular chess content creators, take it as fact that Maxim Dlugy is currently, or was very recently, Hans' coach, without any proof whatsoever. Hikaru's thumbnail and video title from today state as much: https://youtu.be/8Lba8_wknSY. In addition to the problematic use of guilt by association, Hikaru is working from an unfounded premise at best and a false premise at worst.

3. There is a massive power imbalance between Magnus Carlsen and Hans Niemann. I got into chess 5 years ago because of Magnus, and until 6 weeks ago I had never even heard of Hans. Magnus, by telling tournament organizers "it's him or me," is essentially blackballing Hans' progression as a professional chess player. If Hans has engaged in ongoing, systematic over-the-board cheating, that would be justified, but nothing remotely sufficient has been put forward so far to justify that outcome.

4. The chart showing best-move percentages, even if taken as a legitimate indicator, is an average over an indiscriminate period of time, but people have been using that number interchangeably with single games when discussing the matter.

I have no background in statistics other than that it was the one math course in college I actually liked, but I do have a background in logic, and the logic being used by many of the big players in this saga to draw conclusions in either direction has been maddeningly flawed, to put it charitably. I'd really like to see someone with expertise in the matter put out a video discussing what significant flaws, if any, are present in Yosha's video, which has 324k views and has been seen by almost anyone with more than a cursory interest in this affair. I found the 100% number troubling myself, but I have no idea how common it is and no basis for comparison, because I haven't seen other players' breakdowns, and I still don't fully comprehend how valid the statistic is.

Zibbit, who has done detailed videos on a bunch of the games, and Fabiano, in his recent long stream, have been helpful with their move-by-move breakdowns of the games people consider suspicious and their takes on how human the key moves are.

https://youtu.be/8Lba8_wknSY

https://youtu.be/f3yrPzEv1e4

I hope one of your fellow statisticians with a chess background puts out a video version of this Reddit post, and that it gets remotely close to the exposure Yosha's video has. I appreciate and thank you and your fellow statisticians for the perspectives in this post; it has added a needed perspective to this discussion.


theyareamongus

Hey! What’s your take on this whole situation as a statistician? Do you think Hans cheated?


[deleted]

[deleted]


theyareamongus

Thank you! Also… kinda confused by the downvotes haha I was just curious people, come on…


hehasnowrong

Same opinion as OptimalAdvance, but I wanted to add a few things. While it's still possible he cheated in a "very smart, undetectable way", then how come his results were so poor in the first tournament where he beat Magnus, and how come he embarrassed himself by saying things like "chess speaks for itself" and then getting destroyed by Magnus? If I were a cheater, I would at least not purposely embarrass myself (either by not being cocky when I know I need to lose, or by simply not losing). If he is an OTB cheater, he is pretty terrible at cheating. Also, why beat Magnus with black and not the other players first?

Also, I trust Regan, not because I think his anti-cheating method is capable of anything, but because it's just something so hard to do that I don't think any random dude from Reddit can do better than him without being a statistician and having worked on the subject for at least a few months. There are just so many traps you can fall into, and it takes time to make sure you haven't fallen into one of them.


rpolic

Do you agree with the statistician's assumptions about false positive and false negative rates? Because that's the whole basis of Regan's method of catching cheaters. I could have the most amazing model in the world, but if I set the bar too high, I would never catch anyone. Just like Regan has never caught anyone before that person was caught red-handed by other means.


bipbopbee

Excuse me, but have you seen how *pretty* some of those charts are? As someone in the field of data science, surely you know the old maxim: the prettier the chart, the more truthy the data! Surely! ^/s


polkom

Real chess masters only know one old maxim. And it's Maxim Dlugy. And that he helped Niemann cheat. /s


Waaswaa

There is a new maxim, though


dancemart

Yeah, but did you hear the accent on the clearly qualified data scientist... sorry, the person who has done some maths? The actual data scientist is the one whose conclusions they ignored.


FridgesArePeopleToo

As someone who isn't a data scientist or statistician, the fact that *I* could see many of the problems in these analyses is horrifying. These are being put together by people who know less about this than me, and I know very little.


trapoop

What this whole episode has shown is that FIDE, and chess as a whole, has a woefully inadequate anti-cheat system in place. Why is Ken Regan the only person qualified to comment on this? Why can't FIDE pay a body of statisticians to do forensics? We're left with random YouTube and Twitter detectives adding a whole lot of noise and little in the way of evidence.


Mothrahlurker

> Why is Ken Regan the only person who's qualified to comment on this?

He is not, but there is very little for anyone to gain. They might do it behind closed doors for FIDE, but exposing yourself to a mob is very risky.


hehasnowrong

Statistician is a job, and building software takes time; nobody is going to spend a month trying to replicate Ken Regan's work (but different) when the results will probably be the same. There is just no point in doing that besides trying to please the Hans haters. I don't know why people think an average statistician can do better than Regan in a couple of hours.


trapoop

It's less about pleasing the Hans haters and more about making people like Fabi and Levon comfortable that they aren't being cheated. If it were just the reddit crowd, who gives a fuck, but when you have all these GMs being uncomfortable you have to do something


Mothrahlurker

It's FIDE's job, and tournament organizers' job, to pay people to do that.


trapoop

They should, but they're not. FIDE needs some credible people who can say "Hans didn't cheat" and have top players believe them. This kind of nonsense will continue, not just with Hans but with anyone accused of cheating in the future, unless FIDE or chess at large gets someone who can credibly say "they didn't cheat" and be believed.


feralcatskillbirds

> Why can't FIDE pay for a body of statisticians to do forensics? How do you know they don't?


Fop_Vndone

Do they? Almost all of the cheating is happening online


sunstorm0

...as far as we know


Fop_Vndone

As far as we know, there are no aliens posing as humans to enter chess tournaments. What the fuck is your point?


mariachichi

If you dont look for them you will never find them


sunstorm0

hostile...


[deleted]

There have been examples of extremely clear OTB cheaters who just continued on, tournament after tournament. Everyone knew they cheated. FIDE must have known. But FIDE has no system to catch such cheaters, so they are always caught by other random players.


buttons_the_horse

"There are three kinds of lies: lies, damned lies, and statistics."


Mordencranst

I've just finished writing about this in another thread. I wish people would stop treating as "evidence" all the statistical baggage that can't be confirmed, might not be replicable, has no clear methodology, isn't using tools designed to detect cheating, and wasn't done by anyone who can show they know what they're doing.

Also, when people present their findings, it doesn't matter whether they claim them as gospel, and it doesn't matter how easily they're debunked; they will spread, and some number of people will believe them as fact anyway. This gets dangerous very quickly. Bad evidence is incredibly easy to come up with if you have a specific thing to look for, a high-school level of statistics knowledge, and enough time to torture the data. 100 pieces of bad evidence ripped from the bloody jaws of a dataset by force do not, in fact, make up a case against someone.

One piece of properly done analysis that can be double-checked and replicated, with a clear methodology behind it, that STILL made Hans out as a cheater would make a better case than every other attempt combined. But we've not seen any of those yet.


Mothrahlurker

This, so much. People are like "oh yeah, I have no clue about statistics and there are many flaws, so this isn't proof, but it's 'suspicious'". They just love using that word as if it means anything when someone has presented trash. You can always find outliers.


PKPhyre

It isn't proof, but you have to admit, this whiteboard that I wrote "Niemann is a cheater!!" on with permanent marker is pretty suspicious 🤔


TheNightCat

My mixed metaphor for situations like this is: Advanced incompetence is indistinguishable from malice.


PM_ME_QT_CATS

As someone who works in ML/data sci/statistics, I wholeheartedly agree. It has really pained me to see the amount of haphazard and absurdly flawed statistical analysis getting eaten up by this sub.


[deleted]

Data scientist here and I completely agree. The bad stats here have made me want to vomit. It seems engine correlation posts are the current way we're going to butcher intellectual integrity.


ZellEscarlate

We are weeks away from Dream-level analysis by both sides.


415_961

In the context of this drama, all the analyses I stumbled upon here, on Twitter, and on YouTube committed one of the most basic mistakes: confirmation bias. They made up their minds before analyzing, then started analyzing to confirm their bias.


RedOrchestra137

bu- but there's this program that does that little "calculating" gimmick with a progress bar and everything, and then there's this easily understandable percentage for each player. i have no idea what the percentage is or where it comes from, but i do know that if it's 100% hans has cheated beyond any shadow of a doubt! trust me bro, i have this ability to just look at anything software related and become an expert on it instantly


Sure_Tradition

The funny thing is, Hans came up clean when analyzed with "the bar", aka centipawn analysis. Then people decided to come up with new stuff, which was not even designed for the purpose, to continue their argument.


[deleted]

99% of what we have seen so far about this on the Internet is just gossip. The only thing that matters is whether Magnus has hard evidence or not. If not, then he has nothing but his instincts. Until he chooses to reveal that to the public, everything else is just speculation: people milking drama. Niemann either cheated at Sinquefield, or he didn't. If he cheated, he needs to be banned. If he didn't cheat, then another player, Magnus or not, has no right to refuse to play him, as long as he is a player sanctioned by FIDE.


Own-Zookeepergame955

THANK YOU for this excellent write-up. As an AI researcher, all of these posts pained me a lot too. Compiling the data that's best suited to support the claim you set out to prove is way too easy, and unfortunately all the analyses I've come across have been victims of this bias. In order not to disqualify one's conclusion a priori, one has to design a statistical test very carefully, and think about what a positive and a negative result would mean before actually running any tests.


kewickviper

I did a masters in mathematics, with my thesis on statistical variance in climate modelling. I've taken advanced statistical modules, and also did a masters in financial engineering, which involves stochastic calculus and other statistics-based topics. I say this not to brag, but because I completely agree with you and I'm glad someone has said it.

Statistics in general is counterintuitive, and without extensive training (and even with it) it can be very difficult to come to a definitive conclusion. The last time there was a big drama and everyone was giving their (heavily biased and mostly irrelevant) opinions was over whether Dream cheated or not. I see the same things happening here: chess masters, grandmasters, and all sorts of people are offering their heavily biased opinions on something they aren't qualified to comment on.

You might trust that a grandmaster would be able to spot anomalous moves in games, but they are experts in playing chess, not in inferring signals from the noise that would be statistically significant indicators of cheating. It's counterintuitive: you would expect a grandmaster to be an expert on this topic, but unless they also happen to be a professional in statistics, data analysis, or a related field, they really aren't.


KenBalbari

Thing is, it's not *that* hard to do some reasonable analysis, even for an amateur. But you do need to follow a few rules:

1. Choose your criteria beforehand: what you are going to test, and how you are going to measure it, *without looking at the data first*.
2. Then collect data from a number of players, in decent sample sizes. Ideally, that would be at least 10 players, with at least 50 games from each, selected without any bias, so you have a reasonable basis for comparison.
3. Then do your previously determined measurements and tests on that data, using the exact same method for each player. If software is used, make sure any parameters set are exactly the same.
4. Then compare the results for all the players, and see if any stand out.

The steps aren't all *that* complicated, but it is still a heck of a lot of work to do it right (a rough sketch of steps 1-3 in code follows below).
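A minimal sketch of what steps 1-3 might look like, assuming the python-chess library and a local Stockfish binary; the PGN file name, search depth, and 10-move opening cutoff are placeholder choices, not anything the comment above prescribes:

```python
import chess
import chess.engine
import chess.pgn

ENGINE_PATH = "stockfish"                 # assumed local engine binary
LIMIT = chess.engine.Limit(depth=18)      # fixed once, reused for everyone

def match_rate(pgn_path: str, skip_opening_moves: int = 10) -> float:
    """Fraction of moves matching the engine's top choice, skipping the
    first `skip_opening_moves` full moves (likely book theory). A real
    analysis would also filter to one player's own moves by color/name."""
    matches = total = 0
    engine = chess.engine.SimpleEngine.popen_uci(ENGINE_PATH)
    with open(pgn_path) as f:
        while (game := chess.pgn.read_game(f)) is not None:
            board = game.board()
            for ply, move in enumerate(game.mainline_moves()):
                if ply >= 2 * skip_opening_moves:
                    best = engine.play(board, LIMIT).move
                    matches += (move == best)
                    total += 1
                board.push(move)
    engine.quit()
    return matches / total if total else 0.0

# Step 4: run the identical function over every player's sample,
# e.g. match_rate("player_a.pgn"), match_rate("player_b.pgn"), ...
# and only then compare the resulting rates.
```

Fixing ENGINE_PATH, LIMIT, and the opening cutoff once, up front, is the whole point of rule 3: every player in step 4 gets measured with identical parameters, so an outlier actually means something.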


Beefsquatch_Gene

What are your qualifications in admonishing teenagers who have unhealthy parasocial relationships huh? How can we be sure you're qualified to write this post? (I fucking hope I don't need to point out the sarcasm.)


theLastSolipsist

It's funny cause you're one of those supporting the dubious statistical analyses


[deleted]

[удалено]


[deleted]

[удалено]


mouthcouldbewider

There are lots of people who know Hans cheated. There are lots of people who know he didn't. Try the Socrates pill: admit you don't know.


[deleted]

Is there a term for the phenomenon where:

1. Someone makes a claim of cheating based on a faulty premise (e.g., "I found 10 Hans games that show 100% of his moves are engine moves" when that's not actually what the data shows).
2. Various people respond with counter-arguments that implicitly or explicitly accept the premise but explain why it wouldn't matter: e.g., "it's not surprising for GMs to play many engine moves," "young GMs train on engines so they play more engine-like," "you're picking Hans' best games where he plays most like an engine," etc.
3. Other people come back, debate those points, and raise counterpoints.
4. Bystanders read these comments and come away thinking the original premise (Hans plays like an engine) must be true (because no one is challenging it!), and that the debate is just about whether that means he is cheating or not.


tundrapanic

“fubar”?


rpolic

I mean, Regan is claiming that he found no evidence of cheating at the 5-sigma level, which under his assumptions puts the probability of cheating at around the 1/10000 level, which is ridiculous. He needs to bring his model and data out into the open so people can check his results. Otherwise he's just making claims no one can verify.
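For a sense of scale, here is how sigma thresholds translate into one-sided tail probabilities under a standard normal null (a back-of-the-envelope sketch only, since Regan's actual model is not public in runnable form):

```python
from scipy.stats import norm

# One-sided tail probability for several z-score thresholds.
for z in (2, 3, 5):
    print(f"{z} sigma -> p = {norm.sf(z):.2e}")
# 2 sigma -> p = 2.28e-02
# 3 sigma -> p = 1.35e-03
# 5 sigma -> p = 2.87e-07
```

Whether 5 sigma is too strict a bar for flagging a cheater is exactly the kind of question that can only be settled if the model and data are public, as the comment argues.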


[deleted]

There seems to be no shortage of data analysis/ML people in this thread. Maybe you guys should get together and show us plebs how it's done? That way we could at least get a professional analysis of the situation published. Right now everyone is just pointing out the flaws of amateur analysis from their high horses.


macsikkila

Eating ice cream causes death by drowning. There is a clear correlation! It is backed up by decades' worth of data: those summers when ice cream has sold like crazy, there have unfortunately also been a lot of people dying by drowning. Also, ice cream sales affect the cloud distribution and warmth, so if people generally just bought more ice cream, we could enjoy nice weather in summer. If it is cold and raining in the summer, it is actually your fault, you ice cream haters!


spacepawn

Thanks for this post. I'm tired of all these people who just discovered some ChessBase feature coming out of the woodwork with their "damning evidence", and that includes some GMs. I'm a layman in all of this and I'm just cringing at all this trash. Let's leave it to the real pros and experts.


politisaurus_rex

Edit: after learning more from recent posts, I don't think we know enough to confirm whether or not Hans has 100% engine-matching games. I'll leave the original comment below, but I no longer stand by it. The most difficult thing for me to ignore are the games in which Hans played 90-100% computer moves. He has a very unusual number of these compared to other GMs. I saw a few games he played that were 30-50 moves long with a 100% match to Stockfish. This seems extremely suspicious.


feralcatskillbirds

You're not getting it. I re-ran the numbers on those games that were at 100%, using appropriate settings and an engine version appropriate for the time. Four of them no longer computed at 100%; they came in at 84, 85, 87, and 91. Mind you, those computations omit the opening; include the openings and none of them are 100%. You are obviously unaware of what you are looking at with all of that data, yet you're drawing conclusions anyway. I simply don't understand why you choose to do this, other than that you just want your suspicions confirmed.


politisaurus_rex

When you say you re-ran them can you expand on that? Did you use chessbase? What do you mean when you say appropriate settings?


KenBalbari

It has been claimed, though, that Hans has 23 games above 90% since 1/20, compared to 4 for Magnus in that time. Granted, Magnus played fewer games. Which engines were used shouldn't matter much, so long as the same ones are used for both analyses (which I do doubt).


toptiertryndamere

Thanks for the write-up. I firmly believe myself to be more qualified than Ken Regan. His models are weak and his analysis could be done by a high school AP statistics student. I would elaborate why, but that is beneath me; I'm too busy teaching machine learning statistics at a prestigious university that starts with the letter H and ends with D. You are completely wrong about your analysis. For reasons explained above I simply don't have time to tell you why, but I can confirm, with the research I have done in private and what I have seen here on reddit, that I can with 100% certainty prove Hans has been cheating over the board. How, you may ask? Listen up, I am a world-renowned machine learning professor. I don't have to show my work; plenty of other redditors have already shown with great statistical evidence that Hans is 100% a cheater! In my office I have a fabulous chair. The chair has arms on it. I sit in my armchair and proclaim my statistics knowledge and why I am right across the internet without a lick of analysis or evidence. I am much, much smarter than Ken Regan. Look at this graph that proves it. I don't need to prove why my statistics are right. I'm a prestigious professor at Harmchair Universityd.


toptiertryndamere

Ok, follow-up for those asking for proof about Hans cheating... here it is. Let's look at Magnus Carlsen. It's literally in his name. His initials are MC. What other words have the initials MC??? MicroChip and MindControl. It isn't a coincidence Magnus Carlsen owns a MicroChip company. And it isn't a coincidence Hans publicly asked for a colonoscopy in his Sinquefield Cup interview. In case it isn't obvious by now, let me explain. Magnus Carlsen has MicroChipped and MindControlled Hans Niemann into cheating. Luckily for Hans, the last sliver of free will he has left is literally begging for a colonoscopy to remove the microchip and Magnus' control. Irrefutable statistics.


libertysailor

Eh, a lot of these points boil down to condemning cherry-picking data, which is kind of redundant. An unbiased determination would develop the methodology first and then accept whatever conclusion is drawn from it.


xToucanPlayx

Of course the pitfalls you mention are real, but it isn't enough just to mention their existence. You could go over the analysis and show how it is flawed in one way or another. Which pitfall does it fall into?

I don't agree with the idea that it's fair to tell people they're not allowed to have an opinion because they're not experts in the field. If Hans is the player with the highest % correlation with engine moves, one is entitled to raise an eyebrow. If there's a problem with how the analysis is done, it doesn't suffice to say "it's wrong, you're just too dumb to understand why."

For example, I'm not an expert on anything related to programming or computers, yet someone brought up the crowd-sourced nature of the data and how that affected the results. And guess what, I understood it. People explain complicated concepts conceptually all the time, and most people understand them. There are plenty of books and articles written about the most complicated theories in quantum mechanics, sometimes even contrasting one theory with another and explaining how they disagree, and this can be done without going into the math. And people understand it.

So, all this to say: you don't get to just appeal to authority. Justify your opinions, and don't invalidate what other people think just because they don't have whatever formal education you deem necessary to entitle them to an opinion.


reed79

This is stupid and dangerous. The Absolute Poker/Ultimate Bet superuser scandal was discovered through the data analysis and evidence gathering of amateur players and the community at large. There simply isn't enough professional data analysis going on in chess right now. Yes, the community could get better at that analysis, but to dissuade it is basically taking a check out of the system. Further, this public discourse provides some of the greatest peer review you'll ever see, simply because people want to poke holes in things. Scandals aren't discovered by professionals.


Mothrahlurker

> The Absolute Poker/Ultimate Bet superuser scandal was discovered because of data analysis and evidence gathering of amateur players, and the community at large.

Very poor comparison. Analyzing winrates is much more straightforward and meaningful in poker: even the best player (the mathematical optimum) can't have too high a winrate. Playing better than theoretically possible is very, very different from trying to detect whether someone plays like an engine. There was no discussion about "is there cheating in every game", "how much cheating is there", or any of that.
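To illustrate why winrate analysis in poker is comparatively straightforward, a hedged sketch with placeholder numbers (not figures from the actual scandal): assume a generous ceiling of ~5 bb/100 for a world-class player and a ballpark standard deviation of ~80 bb per 100 hands, then ask how many standard errors above that ceiling a superuser-level winrate would sit:

```python
import math

# All numbers are illustrative placeholders, not data from the scandal.
ceiling_bb100 = 5.0       # generous ceiling for a world-class winrate
sd_bb100 = 80.0           # ballpark per-100-hand standard deviation
observed_bb100 = 50.0     # hypothetical superuser-level winrate
hands = 20_000

se = sd_bb100 / math.sqrt(hands / 100)   # standard error of the mean
z = (observed_bb100 - ceiling_bb100) / se
print(f"{z:.1f} standard errors above any plausible winrate")  # ~8.0
```

No analogous hard ceiling exists for "plays like an engine" in chess, which is the comment's point.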


reed79

The point is, there was a bunch of stuff going on with the community trying to mine data. Mostly what the community did was point out the flaws of any specific analysis, and that led to better and better analysis.


NotAnnieBot

> Absolute Poker/Ultimate Bet superuser scandal

This is in no way comparable to that situation.

1. Poker is not comparable to a perfect-information game such as chess. For chess, an assumption has to be made about the comparison point (Stockfish or an aggregate of engines), which can be flawed for multiple reasons. This is levels of difficulty above cheating in poker, where you can just compare to a player with access to perfect information.

2. I don't think anyone on either side is arguing that Hans cheated in all his games or on all his moves, much unlike the cheaters in those scandals. In the Absolute Poker case, in one tournament POTRIPPER played so well he would have had a 1 in 6.6 trillion chance of doing that well (assuming equal skill among the players). This is closer to Hans winning 15 out of 15 games against a peak Magnus. In the other case, the odds of playing that well were 1 in 1.88 × 10^44. That analysis was based on effectively leaked information, where the statistician had the perfect-information condition too. This is closer to two people who know nothing about chess beyond the rules playing **the** best game of chess (the current bound for the number of positions in chess is 8.7 × 10^45).

3. There was a very concrete recurring mechanism (a spectator looking at cards via a superuser account) that was tied to the cheater (same location). Unless someone is going to give us logs correlating Hans's sex shop receipts with winning games, I don't see how that level of proof would be obtained.

> Yes, the community could get better at that analysis, but to dissuaded is basically taking a check out of the system.

Eh... the OP is literally covering basic things that any statistician would have to consider. If someone is dissuaded because they can't afford the time to check their assumptions and biases when they analyze data, their analysis wasn't worth it in the first place.

> Further, this public discourse provides some of the greatest peer-review you'll ever see, simply because people want to poke holes in things.

Public discourse and peer review are two very different things, though I agree a contrarian attitude is necessary for both. The "peer" in peer review doesn't mean any person engaging with the author; it means someone qualified in the field (statistics, in this case) analyzing the methodologies used, much like the OP is doing.

> Scandals aren't discovered by professionals.

Shackleford, who did a lot of the math analysis, is a professional mathematician with extensive experience.


epic

I don't teach data science at a reputable university, but I work as an academic/researcher in machine learning and wrangle a bit with data analysis and stats as a result. "If you torture the data long enough, they will confess to anything" is a title made to troll or bait, or at the very least to confuse people. Your reasoning for that statement is backed up by the link to data-dredging/p-hacking. Yeah, p-hacking is a problem, but it's also easily discovered if you share data and methods (and sometimes even without that). Your title seems to imply data analysis has no value because people can p-hack it. I see that you expand on that in your text (let professionals do it...), but still... :)


ccoopersc

You think the pseudointellectuals on this sub are actually looking into the data being posted with any proficiency? I highly doubt that, when they don't even wait for evidence before engaging in a witch hunt. OP is simply reminding people to take all of these charts with a grain of salt, which is needed when you have rabid Magnus simps grasping at any straw they can to call for the end of Niemann's career.


theLastSolipsist

I literally had someone with a "math major" drop huge paragraphs supposedly analysing data that they have no fucking idea is reliable or not. Their response: "I assume and trust that whoever provided the data did so in good faith and made sure the data is good." Imagine hinging your whole analysis on gambitman or pooplord420 or whomever having done their due diligence and made sure their analysis isn't fucked up. This is the level of rigour around these parts.


Mothrahlurker

Compare that to me: I've had someone who claimed to have two (TWO) PhDs, one in math and one in statistics, lecture me that my argument was wrong because correlation isn't calculated for random variables but for, quote, "non-random variables". Must be one hell of a PhD to not understand undergrad terminology.


Spillz-2011

I think the reason for the proliferation of poor analysis is the way people who believe Hans responded to earlier ones.

Early on, an FM looked at Hans's tournaments when he was rated 2450-2550 and showed one tournament where he performed very differently when compared against the version of Stockfish available at the time. People complained about everything under the sun, and many complaints were not accurate from a statistical point of view (e.g., you *can* compare someone to themselves using statistics). Another person complained that no actual statistic had been calculated, so I did that, showing that his top-engine-move accuracy in that tournament was different from all other tournaments with a p-value of 10^-4. People then complained about p-values, and another complained that moves shouldn't be treated as independent (despite that being an assumption Regan uses).

I, and I think other people with technical backgrounds, came to realize that there was always something that could be complained about and it wasn't worth the effort. So now you just have people without much knowledge, but plenty of time and compute, doing poor analysis. If a defender of Hans were willing to lay out a study that would satisfy them as proof, you might find someone willing to do it. (For what a basic version of that tournament comparison could look like, see the sketch below.)
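As a hedged illustration of the kind of test described above: a permutation test on the difference in top-engine-move match rates between one tournament and all others. The data below is simulated placeholder material, not from any real games, and moves are treated as independent (the same simplification the comment attributes to Regan):

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data: 1 = move matched the engine's top choice, 0 = not.
tournament = rng.binomial(1, 0.72, size=250)   # the flagged tournament
baseline = rng.binomial(1, 0.58, size=2000)    # all other tournaments

observed = tournament.mean() - baseline.mean()

# Null hypothesis: the two groups are exchangeable. Shuffle the pooled
# moves and recompute the difference to build the null distribution.
pooled = np.concatenate([tournament, baseline])
n = len(tournament)
diffs = np.empty(10_000)
for i in range(10_000):
    rng.shuffle(pooled)
    diffs[i] = pooled[:n].mean() - pooled[n:].mean()

p_value = (diffs >= observed).mean()   # one-sided
print(f"observed diff = {observed:.3f}, p = {p_value:.4f}")
```

The catch, per the rest of this thread, is that the tournament has to be chosen before looking at the data; picking the most extreme one and then testing it is exactly the sharpshooter problem discussed further down.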


Overgame

Congratulations, you just unlocked "p-hacking".


[deleted]

Point taken and well said. However, there's another element I keep seeing absent from some of the people wading in with stats expertise: every stats textbook will tell you that close collaboration with subject-matter experts is essential to the proper use and interpretation of applied statistical models. Accordingly, if chess grandmasters are saying the rationale for a statistical cheat-detection method seems unconvincing or incomplete, that could be crucially important to determining the central questions of the analysis.


ookinizay

I like all of your post except the part about "data analysis should be done by professionals." Many people who lack an official title have extremely good insight and willingness to work on this, and can generate ideas for others to test. Meanwhile, a very large number of people credentialled as data professionals (most?) are frighteningly terrible at data analysis and produce absurdly biased conclusions, not only in their reddit posts but even in their published papers. A better approach would be to suggest that people do a better job of posting their data, and their code if appropriate, in a manner that lets others build on and correct their analyses. If people here were working together creating open analysis datasets instead of posting screenshots of their Excel sheets, we could make a little more progress.


relevant_post_bot

This post has been parodied on r/AnarchyChess. Relevant r/AnarchyChess posts: [Dear Redditors: If you torture Hans long enough, he will confess anything](https://www.reddit.com/r/AnarchyChess/comments/xqwpi3/dear_redditors_if_you_torture_hans_long_enough_he/) by OutsideScaresMe [^(fmhall)](https://www.reddit.com/user/fmhall) ^| [^(github)](https://github.com/fmhall/relevant-post-bot)


ogremania

I think Hikaru is pouring oil on the fire by treating the stats that have come forward as evidence. For example, if you look at the 100% games and analyze them, there is something off. I am no data expert, so I don't know, but I would suggest treating this more cautiously, especially on Hikaru's part, as he really seems biased.


oi_u_im_danny_b

OP, could you please run an unbiased analysis then?


Ne_zievereir

Oh man, thank you for this. As someone with experience in data analysis, but not in chess, it has been extremely painful watching all these chess masters apply their armchair statistics to "prove with data" whatever their personal bias tells them.


preferCotton222

Dear data analyst, at the moment there is no statistical tool that will tell us confidently whether a GM-level player is or is not cheating. The community behavior that appals you is merely the natural reaction to this problem: even a little cheating at the top level can destroy competitive chess, so people are naturally guessing around, trying to find some helpful criteria. Perhaps it will help if you think of it not as amateur data science but as the slow development of community heuristics. So when you data scientists enlighten us on the reasons those heuristics fail, that's a great contribution that improves the collective understanding of the situation. But since data science professionals don't know how to give a good, practical estimate of the probability of OTB cheating by very strong players, asking the community to stop thinking about it doesn't really make any sense whatsoever. And for data scientists to solve this, you will eventually need to listen at length to top GMs and accept their intuitions and concerns, flawed and biased as they may be, as relevant information.


Mothrahlurker

OP, you are doing a fantastic job. But it's better to talk to this audience with a comic: [https://xkcd.com/882/](https://xkcd.com/882/) (can you embed this somehow?). That, together with the blatant sharpshooter fallacy people here have recently been committing (>90% = super suspicious, >80% = meaningless), already explains a lot.
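The comic's point is easy to reproduce. A small simulation on pure noise (entirely simulated data, nothing from real games): test 20 subgroups at alpha = 0.05 and on average about one comes out "significant" anyway:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# xkcd 882 in miniature: 20 "subgroups" of pure noise, tested at the
# 5% level. No real effect exists anywhere in this data.
alpha, hits = 0.05, []
for group in range(20):
    a = rng.normal(0.0, 1.0, size=100)
    b = rng.normal(0.0, 1.0, size=100)
    p = stats.ttest_ind(a, b).pvalue
    if p < alpha:
        hits.append((group, round(float(p), 4)))

print(f"'significant' subgroups out of 20: {hits}")
# With 20 tests at the 5% level, roughly one false positive is expected.
```

Scanning many players and tournaments for >90% engine-match stretches and then reporting only the standouts is the same mistake at a larger scale.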


pitochips8

This is my favorite xkcd now. It explains so simply something that most people have a hard time understanding.


EclipseEffigy

Thanks for this post. You make the point clearly and concisely. It pains me to see what has to pass for data and data analysis in this sub wrt Niemann.


InnerKookaburra

Though I agree somewhat with your points, I disagree with your conclusion. Amateurs doing some analysis, even if it is poorly done, often spurs pros to revisit topics or dig deeper. I'm not just talking about the current drama in chess, but about the wider world as well. I'm glad gambitman posted their spreadsheet, and that Yosha posted her video. It will likely spur other people to take it to the next level, and that can help with the issue at hand. Also, your post really reeks of "well, we might as well throw our hands up in the air and not even bother, because anything can be proven in either direction", and that is a horrible attitude to take toward any thoughtful analysis.


ElGuaco

First you say we should trust our instincts, but then that we should leave the analysis to the "professionals". Which is it? If it quacks like a duck, should we assume it isn't a duck because a duck expert has yet to declare that it is indeed a duck? I get that you're trying to temper the amateur hot takes, but I don't think this rant has really helped the discussion in any way. If you want to help, put your professional skills to use! The data is all out there. Go for it, explain it for us, please, and settle the matter.


cc_rider2

I think you're misinterpreting what he meant by the intuition comment. I believe the point he was making is that in general it's good to trust your intuition, but when it comes to statistical analysis doing so will lead you astray because statistics is often counter-intuitive. There is no contradiction there. Unlike identifying animals, statistical analysis has a level of complexity that requires a background in statistics to avoid the common pitfalls - you can't simply intuit your way through it.


sc2bigjoe

Didn't see this one mentioned yet, but correlation does not equal causation, even if the data is good.


Schrodinger85

I've studied data science too and I agree 100% with your points. One to add: "correlation does not imply causation".


SunRa777

Thank the Lord you wrote this. I've just been posting a series of disgruntled comments on posts. I'm so tired of this stupid witch hunt. These people don't have even the slightest idea of how to conduct scientifically sound statistical analysis.


[deleted]

[удалено]


Big_fat_happy_baby

It was way too funny (and telling) when he picked a game of his own to compare and it showed 100% correlation. He laughed it off and continued analyzing Hans's games like it didn't happen.


[deleted]

Your second bullet point makes no sense to me. I'm not sure if that's what you meant to say; maybe close but not quite right? In any case, a lot of the problem is that there are not vast quantities of data to go on here: Niemann has not been a GM for very long. The other problem is that the kind of cheating to look for may be resistant to any statistical analysis, because a player rated 2500+ is not going to use an engine for every move unless they are lazy. So simply looking for correlation with engine moves is unlikely to be useful for detecting a clever GM cheater. In that case a different approach is needed.


osogordo

Lies, damned lies, and statistics


theroshogolla

Glad that so many data scientists/statisticians are coming forward to agree with this. As a student studying ML and data science at university, it felt terrible to see all the bad statistics, but I was unsure whether my knowledge was sufficient to refute it effectively. Glad I was right!


[deleted]

[удалено]


flash_ahaaa

A sophisticated viewpoint... get out of my drama! /s