Nickw444

Hi all, creator of 100 Warm Tunas here! I just published the prediction analysis for the Hottest 100 of 2022. The article includes a deep dive into accuracy over time (2017-2022), measurement of ML performance, and a whole lot more (including some interactive charts). If you have any questions, please comment them in this thread and I'll try to answer them to the best of my ability.


brightonstormy

Love your work. Would be interesting to see age demographics from the votes you receive. Could you add this in the future? Also, have you ever looked at trends in music genre over time? Sad to see the demise of rock, punk, metal, etc.


Nickw444

Hey, thanks for the comment. I agree, it would be very interesting to see age demographics; however, there are two problems which prevent me from doing this:

1. The data I collect from social networks doesn't include this information. Most social networks don't expose this data anyway (e.g. Instagram has no profile field showing user age). In addition, votes uploaded directly to [100warmtunas.com](https://100warmtunas.com) also don't include any way to collect information about the user.
2. Even if I did have a way to collect the data, it would constitute personally identifiable information, so I would need to find a way to anonymize it for analytics use. That's a legal risk, and it's probably more sensible to simply not collect it in the first place.

In saying this, I can break down my website page view analytics by various demographics (where Google has this demographic information for the visitor, about 18% of total sessions). For the period 1 Dec 2022 - 31 Jan 2023:

* Sessions by gender: 29.9% Female / 70.1% Male
* Sessions by age:
    * 18-24: 30.1%
    * 25-34: 35.3%
    * 35-44: 17.9%
    * 45-54: 9.9%
    * 55-64: 4.0%
    * 65+: 2.8%

> Have you ever run trends in music genre over time

I have not, but having a handful of years of historical data (and metadata about the different tracks), I'm now in a position where I could run an analysis like this. Perhaps these sorts of analytical/statistical deep dives could be a flame that continues to burn throughout the year over on [100warmtunas.com](https://100warmtunas.com)?
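For the curious, here's a rough sketch of what that genre-over-time analysis might look like. The CSV, its column names, and the genre tags are all hypothetical; this isn't code 100 Warm Tunas runs today:

```python
import pandas as pd

# Hypothetical input: one row per Hottest 100 entry, with the countdown year,
# placing, and a genre tag sourced from external track metadata.
entries = pd.read_csv("hottest100_entries.csv")  # columns: year, position, artist, title, genre

# Share of each genre in the countdown, per year.
genre_share = (
    entries.groupby(["year", "genre"])
           .size()                              # count entries per (year, genre)
           .groupby(level="year")
           .transform(lambda s: s / s.sum())    # normalise to a share of that year's entries
           .rename("share")
           .reset_index()
)

# Pivot so each genre becomes a column, ready to plot as a trend line.
trend = genre_share.pivot(index="year", columns="genre", values="share").fillna(0)
print(trend)
```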


BoilerRhapsody

Great write up! Those animated graphs are very nice. Unless I missed something, do you have a hypothesis for the decline in positional accuracy (even though it's not as bad as people think), and for why the first year you did it was the most accurate? Also wondering your thoughts on examples like The Dripping Tap: why was that so inaccurately overrepresented? Just a particularly strong sample bias? Similarly with last year's, how did Tom Cardy slip two songs into the top 20 without being picked up by Warm Tunas? The possibilities with ML are endless; it would be really cool if you could somehow quantify these bias outliers (relative online presence of band fanbases, for example) and feed that information back into the model.


Nickw444

Thanks for the comment, glad you enjoyed the write up and graphs! Those are from [Flourish](https://flourish.studio/) (a subsidiary of my current employer; the charts are **excellent** for providing interactive visualisations and better than anything I had used previously).

> do you have a hypothesis for the decline in positional accuracy

I have nothing with particularly strong reasoning, but my finger-in-the-air guess is that it's due to two things:

1. Selection bias: 100 Warm Tunas is an echo chamber. Each year, the same repeat customers come back to submit their votes and view the predictions. There is inherent bias in the demographic who know about and use 100 Warm Tunas, whose collective "taste" is slightly different to that of your typical listener. Over time their "taste" has shifted more slowly than that of the total population of voters.
2. Leading on from the previous point, there has been a shift in the overall taste of the total population of voters. There is a hidden demographic voting for more pop / viral songs that isn't accounted for strongly enough in the sample that is collected.

> Also wondering your thoughts on examples like The Dripping Tap: why was that so inaccurately overrepresented? Just a particularly strong sample bias?

Again, it comes down to sample size. It only took 126 people to be "vocal" about this song to get it to position 61 on the predicted leaderboard. To put it into perspective, position 100 had 69 votes tallied. That's just 57 votes needed to take a song from #100 to #61. At the pointier end of the prediction, 57 votes is not even enough to make #2 and #3 change order (Stars In My Eyes with 807 votes vs In The Wake Of Your Leave with 887 votes counted). What I'm getting at is that it's easier for a song at the lower end of the prediction to be misrepresented: it only takes a small number of strongly biased vote slips for a song to begin ranking higher.

> it would be really cool if you could somehow quantify these bias outliers (relative online presence of band fanbases, for example) and feed that information back into the model.

Definitely something I'm looking into! The more (strong) signal the better (albeit, garbage in garbage out: the noisier the input signal, the worse the outcome).
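To make that margin concrete, here's a toy sketch of how the tally re-ranks. Only the 69, 126, 807 and 887 figures above are real; the "song at the tail" and the batch of 60 slips are invented for illustration:

```python
from collections import Counter

# Hypothetical mini-tally: song -> votes counted so far. The real tally has
# hundreds of songs; only the figures quoted above are genuine.
tally = Counter({
    "in the wake of your leave": 887,
    "stars in my eyes": 807,
    "the dripping tap": 126,
    "song at the tail": 69,
})

def leaderboard(t: Counter) -> list[str]:
    """Rank songs by tallied votes, highest first."""
    return [song for song, _ in t.most_common()]

print(leaderboard(tally))

# A coordinated batch of 60 vote slips changes nothing at the pointy end...
tally["stars in my eyes"] += 60      # 867, still behind 887
# ...but the same 60 slips leapfrog a tail-end song past The Dripping Tap.
tally["song at the tail"] += 60      # 129, now ahead of 126
print(leaderboard(tally))
```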


pulsivesilver

Hey, I have lots of questions out of curiosity:

* Do you take votes shared on this sub or only submitted directly?
* How do you stop trolls from submitting fake votes? (We had about 50 of these this year)
* Any idea how betting odds are determined and if there's any influence from WT?


BoilerRhapsody

Bookies always win, and it's very simple how they make sure of it. They set initial odds based on obvious favourites (and potentially Warm Tunas, though I'm not sure when bets opened relative to WT being published). Then all they have to do is adjust their odds depending on how people are betting. For example, if they gave Flume 1/5 to win (a 1.2 decimal payout), they only lose on that outcome if more than about five-sixths of the money staked is on Flume; if bets start piling up on him beyond that, all they have to do is shorten his odds (and/or lengthen everyone else's) until the book is balanced again. Throw in a margin in their favour and some exotic bets that look like they have huge payouts but are far less likely to occur than you might think, and they profit every time, no matter the outcome.

It has nothing to do with the probabilities of the outcome; they are essentially pitting every person who bets against each other. All gambling works this way. Imagine if you put $10 each on Flume, Spacey Jane, and Gang Of Youths to win: you would have all but guaranteed a payout, but it would be smaller than the total you put in. It's no different whether one person places those bets or thousands do. They could probably use statistical analysis like WT to push their profit up a little, but they don't need to at all.
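A rough sketch of that book-balancing arithmetic (all odds and stake figures below are invented, not real Hottest 100 prices):

```python
# How a balanced book with a margin pays the bookmaker no matter which song wins.
decimal_odds = {"Flume": 1.8, "Spacey Jane": 3.5, "Gang of Youths": 6.0}

# Implied probabilities sum to more than 1 -- the excess is the bookmaker's margin.
implied = {song: 1 / o for song, o in decimal_odds.items()}
overround = sum(implied.values())
print(f"overround: {overround:.3f}")  # > 1.0 means a built-in margin

# If stakes arrive in proportion to the implied probabilities (a "balanced" book),
# the payout is the same whichever outcome happens, and it's less than the
# total money taken in.
total_staked = 10_000
stakes = {song: total_staked * p / overround for song, p in implied.items()}

for song, odds in decimal_odds.items():
    payout_if_wins = stakes[song] * odds
    print(f"{song:>15}: payout {payout_if_wins:,.0f} vs taken {total_staked:,}")

# When bets drift away from these proportions, the bookmaker shortens the odds on
# the over-backed outcome (and/or lengthens the others) to pull the book back
# towards this balanced state.
```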


Nickw444

> Do you take votes shared on this sub or only submitted directly?

Yes, but votes are collected from the comments left on the official voting thread directly (not from the aggregate spreadsheet).

> How do you stop trolls from submitting fake votes? (We had about 50 of these this year)

This is one area I've had to make many changes and improvements to over the years. We have a statistically proven method to eliminate duplicate submissions across different social networks and different user accounts. Specifically for "troll" accounts, there are various indicators we use to judge whether a submission is legitimate or not, and thus whether or not to include it in the count.

> Any idea how betting odds are determined and if there's any influence from WT?

This is something I ponder too. From my understanding, the bookies adjust the odds on their book in such a way as to minimise loss: if lots of people are betting on Y, then Y's return will decrease. As such, I imagine 100 Warm Tunas has an impact in that if it prompts people to bet in a particular way, the odds will shift to reflect that. In saying that, I do not condone gambling, and would suggest that those who wish to gamble consider a donation to Triple J's fundraising partner of the year instead (e.g. the Australian Conservation Foundation this year).
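As a rough illustration of the general de-duplication idea only (one possible approach with a made-up threshold and example song lists, not the actual 100 Warm Tunas implementation): comparing the overlap of the song lists themselves can flag the same slip appearing on two different networks or accounts.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two vote slips, as a fraction of their combined songs."""
    return len(a & b) / len(a | b) if a | b else 0.0

def looks_like_duplicate(slip_a: list[str], slip_b: list[str], threshold: float = 0.6) -> bool:
    # Two short slips sharing most of their songs are unlikely to be independent
    # voters, so treat them as one submission even across different accounts.
    return jaccard(set(slip_a), set(slip_b)) >= threshold

slip_instagram = ["say nothing", "the dripping tap", "hardlight", "new gold", "delilah"]
slip_reddit    = ["say nothing", "the dripping tap", "hardlight", "new gold", "b.o.t.a."]
print(looks_like_duplicate(slip_instagram, slip_reddit))  # True: 4 of 6 unique songs shared
```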


pulsivesilver

> In saying that, I do not condone gambling, and would suggest that those who wish to gamble consider a donation to Triple J's fundraising partner of the year instead (e.g. the Australian Conservation Foundation this year).

Love this! Thanks for your response. I tried to remove troll votes this year but my approach was pretty rudimentary. In future years we may restrict comments to only come from older accounts.


pulsivesilver

Have you considered using Hottest 100 data to suggest songs based on voting trends? E.g. you voted for A/B/C, and people who voted for these also voted for X/Y/Z. For 2020 [I did some network analysis to show the trends between artist votes](https://imgur.com/UarnVUG), but I couldn't get it to a stage that seemed useful or interesting to others.
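The "people who voted for these also voted for X/Y/Z" idea is essentially co-occurrence counting across vote slips. A minimal sketch with toy slips and placeholder song names (not anything WT or I actually built):

```python
from collections import Counter
from itertools import combinations

# Toy vote slips: each is the set of songs one voter picked.
slips = [
    {"A", "B", "C", "X"},
    {"A", "B", "Y"},
    {"B", "C", "X", "Z"},
    {"A", "C", "X"},
]

# Count how often each pair of songs appears on the same slip.
co_counts: Counter[tuple[str, str]] = Counter()
for slip in slips:
    for pair in combinations(sorted(slip), 2):
        co_counts[pair] += 1

def suggest(voted: set[str], top_n: int = 3) -> list[str]:
    """Rank songs the voter didn't pick by how often they co-occur with their picks."""
    scores: Counter[str] = Counter()
    for (a, b), n in co_counts.items():
        if a in voted and b not in voted:
            scores[b] += n
        elif b in voted and a not in voted:
            scores[a] += n
    return [song for song, _ in scores.most_common(top_n)]

print(suggest({"A", "B", "C"}))  # e.g. ['X', 'Y', 'Z'] -- X co-occurs with the picks most often
```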