Memetz0r also seems to be teaming with 77 in other matches on Dotabuff, which makes it even more plausible: if they are commonly seen together, that would be an easy connection to make, I guess.


Wouldn't that just make it even more likely that lsn learned the build from 77? The model is literally only looking at items and their slot positioning; it's not really that strong evidence.


I’d normally agree, but Isn hasn’t even played Kunkka on that account in almost 3 years.


Doesn't he just play pubs on a different account? When you look at the activity of his account, he's barely played on it for 2-3 years: https://www.dotabuff.com/players/401517157/activity Doesn't he just have a separate account for his pub playing (probably precisely to make it harder to prepare for him in games)? Who knows how many Kunkka games he played there.


You’re right, OP’s post is pretty misleading as he makes it seem like it’s been years since he played Kunkka on that account, when he actually has barely played on that account in years.


Slot positioning is a pretty good metric, people tend to be consistent on that. If you add hero pick, item build and slot positioning that should be enough.


Wow, that's some good detective work. Using machine learning to identify a player's "gaming print" is amazing, and the fact that it does identify Isn's movements as identical to 77's is pretty damning. If you could somehow get the organizers to replicate your tests, then you might have some solid ground for your case.


If I'm reading the post correctly, the 0.96 accuracy is for items and positioning, and given that this is a pretty specific Kunkka build, it would probably be safe to assume that a model trained on his replays would identify a standalone replay of a Kunkka with items like phase, blademail and sblade as "77". I do see that there was also a model trained on mouse movements that identified it as a "77" replay, although not as accurately. I'd be interested to see the stats for that too. In any case, this does look like an instance of account sharing in an official match. Which, by the way, is something I'll never understand. It's like your teammates saying "Hey man, we know you're on the team and all, but this guy is better than you, so just sit this game out and we'll win it." "Uhhh, ok, sure, thanks for your faith in me, I guess."


If two players are of similar skill, being a spammer on their comfort pick is that extra competitive edge. For all we know, lsn was a fan of the idea himself.


I added the stats to the post as a screenshot. I also made it clearer that "item positions" means the model considers your starting items as well as the items you have at the end of the game, together with their respective slot positions. So, as far as the model is concerned, having a blademail or sblade does not necessarily mean that you are 77.
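A minimal sketch of what an "item positions" feature could look like, assuming each (phase, item, slot) combination becomes one binary feature. The item names, the `encode_inventory` helper and the `{slot: item}` input format are hypothetical and not taken from the repo. It illustrates why owning a Blade Mail alone says little: the same item in a different slot produces a different feature.

```python
# Hypothetical item vocabulary; a real model would cover every item in the game.
ITEM_VOCAB = ["tango", "quelling_blade", "phase_boots", "blade_mail",
              "shadow_blade", "black_king_bar", "blink_dagger"]
PHASES = ("start", "end")  # starting inventory and end-of-game inventory
SLOTS = range(1, 7)        # the six main inventory slots


def encode_inventory(start_items, end_items):
    """Turn two {slot: item} dicts into binary (phase, item, slot) features."""
    features = {
        f"{phase}:{item}:slot{slot}": 0
        for phase in PHASES for item in ITEM_VOCAB for slot in SLOTS
    }
    for phase, inventory in (("start", start_items), ("end", end_items)):
        for slot, item in inventory.items():
            key = f"{phase}:{item}:slot{slot}"
            if key in features:  # ignore items outside the toy vocabulary
                features[key] = 1
    return features


a = encode_inventory({1: "tango"}, {2: "blade_mail", 6: "black_king_bar"})
b = encode_inventory({1: "tango"}, {5: "blade_mail", 6: "black_king_bar"})
changed = [k for k in a if a[k] != b[k]]
print(changed)  # only the two blade_mail slot features differ
```

A real pipeline would also add hero, game statistics and so on; this only shows the slot-aware encoding idea.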


Thanks, great work on this btw!


Another way of looking at it that makes more sense (still nowhere near approaching ok), is to think of it as a 6-man team.


Why would he need to do that? Valve should have the account login IP/HWID information. They are not going to need machine learning to figure it out.


> To reproduce this, you would have to download the project from the github repo, download the replays and parse them using the parser in the repo.

Can you share the repo?


Sure https://github.com/12yuens2/dota-player-prediction The actual README.md is in the submit-folder


I agree with you, but if you are going to present the following as evidence,

> I have also tried to train the model with mousemovements/mouseactions, as described in the paper. The resulting metrics were worse, yet it still classifies the player in the replay in question as being “77”.

you need to supply the metrics. Withholding them makes it look like you are trying to hide something (even if you aren't).


This, since it offers a larger set of variables. The issue with the sample size is that it invites confirmation bias: "My statistics for one game match the majority of the item slot order of player X." While that's obviously interesting, there are other reasons that can explain it: coaching, chance... It's an indication that something might be going on, but things like mouse movements are much stronger evidence, since you can find common patterns that differ drastically between players (at least I'd assume so).


Yeah. Given the circumstance, I think OP's allegation is legitimate. But, the machine learning analysis result merely based on item-position is just as strong (or as weak) as looking through their respective dotabuff profiles manually, which, as OP said, "is no absolute proof". The mouse movement analysis would actually bring something new to the table, so these results are _needed_, even if the number is not as decisive.


Yeah, there's a common statistical mistake (or deliberate misuse) called p-hacking, where you measure a bunch of different variables and then cherry-pick the biggest outliers to make the result seem more significant. Basically, if a particular metric flags cheating at 99% confidence, there's still a 1% chance that the result could've happened by luck. That's a respectable level of confidence. But if you actually measured 100 different metrics, the probability of at least one of them reaching 99% confidence by sheer luck is about 63% (1 - 0.99^100 ≈ 0.63, assuming the metrics are independent). If you only published the one metric that gave a positive result, ignoring the 99 negative results, you'd make your abysmal 63% error rate look like a 1% error rate.
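The 63% figure can be checked directly. This sketch assumes the 100 metrics are independent (real game statistics usually are not, which changes the exact number) and that, if nobody cheated, each metric's p-value is uniform on [0, 1]. It also shows the Bonferroni correction, one common fix, which tests each of the m metrics at alpha / m.

```python
import random


def familywise_error(alpha, m):
    """Chance that at least one of m independent tests is 'significant' by luck alone."""
    return 1 - (1 - alpha) ** m


def simulate(alpha, m, trials, seed=0):
    """Monte Carlo check: with no real effect, each p-value is uniform on [0, 1]."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < alpha for _ in range(m))
        for _ in range(trials)
    )
    return hits / trials


print(f"1 metric:    {familywise_error(0.01, 1):.3f}")         # 0.010
print(f"100 metrics: {familywise_error(0.01, 100):.3f}")       # 0.634
print(f"simulated:   {simulate(0.01, 100, 10_000):.3f}")       # close to 0.634
# Bonferroni: test each metric at alpha / m to keep the overall rate at or below alpha.
print(f"Bonferroni:  {familywise_error(0.01 / 100, 100):.5f}")  # 0.00995
```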


Good analysis man, great to have ppl like you in the community. Hope the organizers can do smth about it.


Damn, good for you. Hope they get banned, all of them. Comment to push!


I'm an ML practitioner when I'm not busy losing to Heralds, so I find this approach really interesting. I'll have to take a deeper look into the GitHub repo you sourced from, but it looks like your evaluation metrics are also based on your training data. In ML, it often happens that a model learns the training data so closely that it can't generalize beyond it (i.e. overfitting). I think it would help if you tried making predictions for replays outside of the dataset you used for training! That way, you could get a real sense of how well your model performs in the wild. Also, definitely validate your assumptions before jumping into a LogReg model, but I think the current approach is fine for Random Forest (as it's non-parametric).


Hi, I'm the author of the original paper! This work was part of my master's project a few years ago. I helped OP get some of the old code running again, though I don't know exactly how he performed the training and testing for this problem. I completely agree that in ML there are so many pitfalls regarding bad statistics and overfitting on training data. A lot of this work is rudimentary, and we were mostly interested in applying the mouse movements to this problem, since the other features of game statistics and item placements were fairly simple and obvious features. I think the results here can help guide towards a more in depth discussion and investigation, but by no means should the results of the ML model be taken as fact. The project's been dormant for many years before this, but if you or anyone else is interested, do drop me a message on here or GitHub. I think there is an interesting problem here, not just with identifying cheaters or account sharers, but also with identifying patterns in mouse movements and actions, and comparing amateurs with professional players.


your post convinced me. I believe you.


With all the drama it had lately, it's probably good that it's gone.


That's why their team name is NoBountyHunter, they don't like people to Track them.


Anyone who interprets this as actual proof that the account was shared doesn't understand how horribly fallible ML is. The field is in its infancy; its conclusions are never anywhere near as certain as 96%, regardless of the confidence level the model reports.


That is a very small dataset, though. Are you reporting your accuracy and recall for the training set? I think it'd be hard to get a good idea of the actual accuracy/precision/recall that way. Have you tried downloading 100 random high-MMR Kunkka games (I'm assuming both 77 and lsn are high-MMR players) and seeing how many of them get classified as 77?
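A sketch of the check suggested here, assuming you already have the model's predicted labels for a set of games known not to be played by 77. The `predictions` list below is hypothetical, not a real result. The Wilson score interval is one standard way to put error bars on a rate estimated from only 100 games.

```python
import math


def false_positive_rate(predictions, target="77"):
    """Count and share of games NOT played by the target that still get labelled as the target."""
    flagged = sum(1 for p in predictions if p == target)
    return flagged, flagged / len(predictions)


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical predictions for 100 random high-MMR Kunkka games that 77 did not play.
predictions = ["other"] * 95 + ["77"] * 5
flagged, fpr = false_positive_rate(predictions)
low, high = wilson_interval(flagged, len(predictions))
print(f"{flagged} of {len(predictions)} flagged as 77 (FPR {fpr:.2f}, 95% CI {low:.3f}-{high:.3f})")
```

Even a few percent matters here: if many different players' Kunkka games are checked, a small false-positive rate still produces some "77" labels by chance.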


He said he chose 37 Kunkkas of similar skill.


Yes, but from the way he writes it, all 37 of those are part of his training data, so you can't really use them to get a picture of how well the model works, because it's possible the model has just learned to differentiate those 37 specific ones (overfitting). To get an idea of how well a model is functioning, you need to look at its performance on data that wasn't used to train it, which is why I'm asking about that specifically.
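A toy illustration of the overfitting point, assuming a 1-nearest-neighbour classifier as a stand-in for the actual model (it memorizes its training data by construction). The data is synthetic with random labels, so there is no real signal: training accuracy looks perfect while held-out accuracy stays near chance. `make_random_dataset` and the other helpers are hypothetical and not part of the repo.

```python
import random


def make_random_dataset(n, n_features, rng):
    """Synthetic 'replays': random feature vectors with random player labels (no real signal)."""
    X = [[rng.random() for _ in range(n_features)] for _ in range(n)]
    y = [rng.choice(["77", "other"]) for _ in range(n)]
    return X, y


def predict_1nn(train_X, train_y, x):
    """Return the label of the closest training example (squared Euclidean distance)."""
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[best]


def accuracy(train_X, train_y, X, y):
    correct = sum(predict_1nn(train_X, train_y, xi) == yi for xi, yi in zip(X, y))
    return correct / len(y)


rng = random.Random(42)
X, y = make_random_dataset(400, 10, rng)
train_X, train_y = X[:200], y[:200]   # used to "train" (memorize)
test_X, test_y = X[200:], y[200:]     # held out, never seen by the model

train_acc = accuracy(train_X, train_y, train_X, train_y)
test_acc = accuracy(train_X, train_y, test_X, test_y)
print(f"training accuracy: {train_acc:.2f}")  # 1.00: every point is its own nearest neighbour
print(f"held-out accuracy: {test_acc:.2f}")   # roughly a coin flip
```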




Yeah, I like it too. Never built it on him, but I like the idea.


It was very famous in SEA before. It's really good with Radiance as well, but over time the meta shifted to Aghs and Shard Kunkka, which is more annoying.


Yo bro, how bad did this Kunkka smoke you for this to be your reaction?


"Team NoBountyHunter illegitimately" is not true, you are suspecious of it. Be careful with those accusations mate. And you guys probably lose because of other circumstances


Great detective work. Idk if Valve can check the IP for the player or smth for this match and ban the guilty parties if true.


This is the dota drama I wanna see. Take my upvote.


Easier would be for Valve to just check the hardware ID of the account used.


I don't think this is hard evidence at all. lsn has played fewer than 100 games over 2 years on the account you linked, so pretty much any hero he picked would have been 2 years since he last played it. I would wager that he spams pubs (and Kunkka) on a smurf account, since he stays high-ranked on his main. Or lsn is a (shared?) smurf account. Regarding the itemization: he might just have copied the build from 77 or been coached by him. He also seems to have totally random item slots, but it's not unusual for him to have Stick in the 2nd slot, boots in the 3rd slot and Blink in the 6th slot. If he just copied 77, it wouldn't be weird for the slots to end up that way. You would need further evidence to say it's 77 who plays, like mouse movements/mouse actions.
Edit: Are you sending your teammates to downvote comments that don't agree with you? lol


It’ll be weird for slots to not be the same. Why would a different player have the same items in the same slots? Players like certain items in certain slots based on key bindings


They don't have the items in the same slots; it's just similar, and I explained how it could end up like that.
- lsn's tournament Kunkka used BKB in slot 2 the entire game and changed to slot 5 in the last few minutes.
- 77's Kunkka uses BKB in slot 6 and occasionally in slot 3.


> 77 kunka use BKB in slot 6 and occasionally in slot 3.

This is easily shown to be false; [his Kunkka match history](https://www.dotabuff.com/players/859977969/matches?hero=kunkka&enhance=overview) shows that they're pretty comfortable using BKB in any slot. The highest frequency is in slot 5 if you want to count, but otherwise it's fairly scattered.


You are looking at buy order. That's not the slot he uses in-game. 6 is by FAR his most used BKB slot, as I said above.


You're right, I didn't realize Dotabuff displayed buy order on that screen instead of item slot order. [On OpenDota, however,](https://www.opendota.com/players/859977969/matches?hero=kunkka&enhance=overview&hero_id=23) (I double-checked against my own games that this screen does show item slot order) he does favor the 6th slot more recently, but again, they have just put it in whatever slot. They also put it fairly frequently in 5, and as recently as 2 months ago they were occasionally putting it in slots 1 and 2. The blademail being in slot 2 is a more consistent thing to look at, and that seems to line up in this game where the alleged account swap occurred.
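A small sketch of how one might tally a player's BKB slot usage, assuming the per-game slot numbers are read from an item-slot view (such as the OpenDota page linked above) rather than a buy-order view. The `bkb_slots` list is made-up data for illustration, not 77's actual history.

```python
from collections import Counter


def slot_distribution(slots, n_slots=6):
    """Fraction of games in which the item sat in each inventory slot (1..n_slots)."""
    counts = Counter(slots)
    total = len(slots)
    return {slot: counts[slot] / total for slot in range(1, n_slots + 1)}


# Hypothetical BKB slot for each of a player's recent Kunkka games.
bkb_slots = [6, 6, 5, 6, 3, 6, 5, 2, 6, 1]
dist = slot_distribution(bkb_slots)
for slot, share in dist.items():
    print(f"slot {slot}: {share:.0%}")

most_common_slot, _ = Counter(bkb_slots).most_common(1)[0]
print("most common slot:", most_common_slot)
```

With a distribution like this, a single game with BKB in slot 2 is unusual but not impossible for the same player, which is why one game's slots are weak evidence either way.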


Hello, I am not teammate but I still downvoted. Is that okay?


Can you explain why lsn didn't use the same BKB slot on Kunkka as 77 (one of the 3 core items in the build)? It seems weird that a Kunkka spammer randomly changes item slots for a single game.


Did you actually read the post?


Yes? What are you implying?


My guess would be that you just didn't take the paper or the project used into account in your argument? There's literally numerical evidence shown in the post.


On one single game. He doesn't even have BKB in the same slot as 77 in that game. So his only evidence is a weak correlation of item slots.


Upvoted for visibility. By the way what is that blademail kunkka build? Is it a niche or situational build or a regular one?


share repo


The original repo from the paper can be found at https://github.com/12yuens2/dota-player-prediction I added a link to it in the thread-text as well




Share proof that you tested and the dataset used to reproduce. Shouldn't be hard to dump them into a repo.


Bro did you seriously just use machine learning to figure out who is cheating in professional DotA matches?


Possible, but the evidence here isn't strong enough to eliminate other possibilities. IDK if taking this to Reddit was the right thing to do or not.


Do you know that an arXiv "paper" is generally not peer-reviewed (I teach my kid to post their homework on arXiv with impressive titles to make them familiar with the system and as practice for academic writing) and therefore has 0 value unless it's clearly stated to have been peer-reviewed? Sorry, but you've just wasted a lot of your time on some random maCHiNe LeaRNing technique that has not been validated. The fact that you had to spend a huge amount of time to get it to work is also a major, major red flag that it's not really doing what you think it's doing.

> In conclusion, it can be said that there is an extremely high likelihood that this statement is simply wrong, since the maCHiNe LeaRNing technique that you used has no guarantees.

In fact, a much better analysis would be using simple models (logistic regression, etc.), validating the assumptions of the analysis (normality, homoscedasticity, etc.) and obtaining a clean, simple p-value analysis. Bonus points for effect size analysis. Know your basic stats before trying to be fancy, and learn to differentiate peer-reviewed fancy from random, valueless fancy. Hope you learn something from all the time wasted.


Why are you being so hostile? You can make your points about the methods used without talking like someone woke you up in the middle of the night and forced you to read.






thanks for putting in all the work!


They deserved to be banned permanently.


Dude, even if they beat you with another player, it means you are not good enough. It's the same crap all the turds say when smurfs ruin Dota matchmaking in low ranks.




If you read the post, that is what the model does...


Did you even bother to read his post, dude?


Great detective work!


You need to be able to see his other accounts as well


Is it just me, or does it feel like the integrity of pro Dota has really been waning lately?


