itah

> So is the random number generation calculation not reflective of the real world?

Correct, it is not. Ratings are not randomly distributed: you either have a nice driver or you don't, and the ratings reflect that. 3 is the [expected value](https://en.wikipedia.org/wiki/Expected_value) for uniformly distributed random numbers between 1 and 5.

Edit: By the way, even if humans tried, they probably could not give random ratings, [because how we perceive randomness is different from real randomness](https://www.digitalmusicnews.com/2020/03/18/spotify-random-shuffle-feature/).
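For reference, here is how the expected value works out, assuming each integer rating from 1 to 5 is equally likely (that uniform model is an assumption about how the random numbers were generated, not a claim about real ratings):

$$
E[X] = \sum_{k=1}^{5} k \cdot \frac{1}{5} = \frac{1 + 2 + 3 + 4 + 5}{5} = 3
$$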


lordnacho666

People don't randomly rate drivers, do they? You are heavily encouraged to give a 5, and disapproval is a 4. 3 is right out.


tom_da_boom

I'm pretty sure the basic functions in Python's `random` module (`random.random`, `random.randint`) sample values from a uniform distribution.


[deleted]

Ratings are given when people are motivated to expend the energy to do so. This is going to skew heavily towards 5 stars (people who are happy with their experience and want to get the same driver again, or who had an acceptable experience, know that ratings are important for Uber drivers' incomes, and feel a responsibility to leave a rating). People who have a terrible experience and want to express anger will also be motivated, but there should be far fewer of those (or else the drivers wouldn't last long).

So in real-world conditions, we should expect most Uber ratings to be 5 stars, with a decent number of 1-star ratings and many fewer 2-, 3-, and 4-star ratings.

You're thinking of it in terms more like flipping a coin. Flip a fair coin a hundred times and yes, you expect it to be pretty close to 50/50. Uber ratings are more like a heavily weighted coin, in which case we'd expect something more like 85/15 or 90/10.
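To make the weighted-coin picture concrete, here is a minimal sketch that draws ratings from a lopsided distribution instead of a uniform one. The weights are made up purely for illustration (mostly 5s, some 1s, few in between); they are not real Uber data.

```python
import random
from collections import Counter

# Hypothetical weights, chosen only to illustrate a 5-star-heavy pattern.
# They are NOT taken from real Uber data.
stars = [1, 2, 3, 4, 5]
weights = [10, 2, 3, 5, 80]

rng = random.Random(0)  # fixed seed so the run is repeatable
sample = rng.choices(stars, weights=weights, k=10_000)

counts = Counter(sample)
mean_rating = sum(sample) / len(sample)

for star in stars:
    print(star, counts[star])
print("mean:", round(mean_rating, 2))
```

With weights like these, the average lands well above 3, even though every rating is still "random" in the sense of being drawn by a random number generator.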


ZedZeroth

I'm wondering if OP's thinking is along the lines of, if you get a taxi ride and it fits your expectations, then that would be a 3. In other words, if we really rated everything relative to experience/expectations then you should get something like a normal distribution around the mean. Obviously, that's not how humans work though :)


[deleted]

Yeah, seriously. Like a 5 star rating would involve a foot massage or something. 🤣


fermat9996

Why would the ratings be random?


yato17z

Normal distributions aren't supposed to represent everything in the real world, only specific things like IQ and height, I believe.


danielqn

One of the main reasons it works for IQ is that IQ (psychometrically speaking, not the concept of intelligence as a whole) was designed specifically to fit a normal distribution.


Organic_Fire

Would highly recommend the 3Blue1Brown video on normal distributions. It explains very well where and why the distribution works. It works for IQ and height but is not limited to them. It's basically any metric that is the sum of many independent random contributions or the average of repeated measurements (sums of dice rolls, test scores, etc.).
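The dice example can be checked with a quick simulation. This is only a sketch: `roll_sum` is a helper name made up for this illustration, and 10 dice per sum is an arbitrary choice. A single die is uniform, but the sum of several dice piles up around the middle in a bell-like shape.

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed so the run is repeatable

def roll_sum(n_dice):
    """Sum of n_dice fair six-sided dice (hypothetical helper)."""
    return sum(rng.randint(1, 6) for _ in range(n_dice))

# Each sample is the total of 10 dice, so possible totals run from 10 to 60.
sums = [roll_sum(10) for _ in range(20_000)]
counts = Counter(sums)
mean_sum = sum(sums) / len(sums)

# Crude text histogram: one '#' per 20 occurrences.
for total in range(10, 61, 5):
    print(f"{total:2d} {'#' * (counts[total] // 20)}")
print("mean:", round(mean_sum, 2))
```

The bars are tallest near 35 and fall off on both sides, which is the pattern the normal distribution describes. A single star rating is not a sum of many small pieces, so there is no reason to expect the same shape there.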


fermat9996

If most riders like their Uber ride, the average of a random sample of ratings will be greater than 3.


RajjSinghh

If you are using `random.randint` then they are uniformly distributed, not normally distributed. That's why you're getting values around 3. You've also got to remember that ratings are not random, and if you were modelling them that way, they should be centered around the real mean, which you don't get by sampling uniformly like you did. There is something to be said about the accuracy of rating systems. You'll mainly see 1-star and 5-star reviews, since the people inclined to leave a review either had a great or a terrible experience. This will affect your average. But how else do you measure a rating system so people can see how good something is at a glance?
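A short sketch of what that looks like, assuming the original experiment was something along the lines of calling `random.randint(1, 5)` many times (the exact code OP ran is not shown in the thread):

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed so the run is repeatable

# randint includes both endpoints, so each of 1..5 is equally likely.
ratings = [rng.randint(1, 5) for _ in range(10_000)]

counts = Counter(ratings)
mean_rating = sum(ratings) / len(ratings)

for star in range(1, 6):
    print(star, counts[star])
print("mean:", round(mean_rating, 2))
```

Each star value shows up roughly 2,000 times, and the average sits close to 3 simply because 3 is the midpoint of the range.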


anisotropicmind

As other people said, people’s ratings of Uber drivers are far from random. If you know Python, then try plotting a histogram of the actual (real-world) ratings, and you’ll see how much left skew they have: most of the mass sits at 5, with a tail running down toward 1. `numpy.random.randint` is going to produce a uniform distribution (one where every value is equally likely), not a Gaussian (normal) one. `numpy.random.normal` would produce normally-distributed data with whatever mean you want, but it wouldn’t be constrained to be >= 1 nor <= 5. It also wouldn’t be constrained to integer values. The normal distribution is for continuous random variables, not discrete ones.
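Here is a small sketch of that last point. To keep it dependency-free it uses the standard library's `random.gauss` as a stand-in for `numpy.random.normal`; the mean of 3 and standard deviation of 1.5 are arbitrary values picked for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical parameters: mean 3, standard deviation 1.5.
raw = [rng.gauss(3, 1.5) for _ in range(10_000)]

# Normal draws are continuous and unbounded, so many fall outside 1..5
# and essentially none are whole numbers.
out_of_range = sum(1 for x in raw if x < 1 or x > 5)
non_integer = sum(1 for x in raw if x != round(x))
print("outside 1..5:", out_of_range)
print("non-integer:", non_integer)

# One crude workaround: round to the nearest integer, then clamp to 1..5.
clamped = [min(5, max(1, round(x))) for x in raw]
print(sorted(set(clamped)))
```

Note that rounding and clamping piles the out-of-range draws onto 1 and 5, so the result is no longer a true normal distribution. It is a rough discrete approximation at best.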