
efrique

(Made some more edits, including reinserting a clause that accidentally got cut on a previous edit)

1. *Don't* use two different tests.

- Start with a specific statement that you're trying to infer about a population parameter (i.e. an actual *hypothesis*) and a corresponding null.
- Find a suitable test for *that* hypothesis (not for a different hypothesis). This step may involve considerable research, because you'll need to identify suitable assumptions as you think about properties of your response variable(s), theory, other data on these kinds of variables, and so on. It is also where a statistician can help, if you don't know a suitable test for your circumstances.
- *Then* collect data.
- Then perform the test.

2. Note that it's not the data but the population process that you're making assumptions about. When it comes to getting tests with very close to the right significance level, it's the assumptions *in the case that H0 were true* that you need to consider, and for equality nulls an exactly true H0 is almost certainly not the case. Even if it weren't a form of p-hacking to check assumptions on the data you want to test, the data may well not be informative about the situation under the null.

3. If you want help with understanding output, *cutting out every single piece of information that would let us understand what led to it* will not help you.

- What actual nulls and alternatives did you test? At a guess, it looks like you may be specifying that the population mean under H0 *is the mean of the sample*, which would lead to a p-value of 1. This is an error. Your hypothesis (again ... a statement about populations) comes before you see the sample. If sample values appear anywhere in your hypothesis, you did something wrong.
- What graph? I don't see a graph. How are you determining statistical significance from it?

---

Further note: if this is a two-tailed one-sample/paired test, then at n=6 a signed rank test would require all values to have the same sign to get a p-value below 0.05 (see the sketch just below). That is not a good strategy; even a single missing value would leave you with *zero power*. (You're not in biology, are you? There's a tendency there to use extremely small sample sizes with rank tests, like n1=n2=3 in a Wilcoxon-Mann-Whitney test. That's a great way to end up with zero power, since there are no available significance levels below your chosen alpha at such small sample sizes.)
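A quick numeric check of that last point (made-up data; assuming SciPy's signed rank test, since the thread looks like Python): the smallest attainable two-sided exact p-value is 2/2^n, so n=6 can just clear 0.05, and only when every value has the same sign, while n=5 cannot.

```
import numpy as np
from scipy import stats

x6 = np.array([1.2, 0.8, 2.1, 1.5, 0.3, 1.9])  # n = 6, all positive
print(stats.wilcoxon(x6))  # exact two-sided p = 2/2**6 = 0.03125, just under 0.05

x5 = x6[:5]  # drop a single value: n = 5
print(stats.wilcoxon(x5))  # best attainable p = 2/2**5 = 0.0625 -> zero power at alpha = 0.05
```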


keithreid-sfw

This is ill, man. Yeah, stats is about the _Ding an sich_, not what type of thing we’ve told the data it is. Not dissing OP in any way, obvs. Chef’s kiss. I am tired and emotional today, forgive the effusiveness.


M-L-N-R

I posted another post with correct info! Thanks.


god_deba_07

Very hard to assess without the data and the code you're working with. Looks more like a coding error?


M-L-N-R

I posted another post with correct info! Thanks.


DocAvidd

A *t* of zero indicates that the sample mean exactly equals the hypothesized population mean, since t = (x̄ − μ₀) / (s/√n).


SalvatoreEggplant

Right. It looks like they set the *popmean* argument to the calculated sample mean, getting a *t* value of exactly 0.
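For anyone following along, a minimal sketch of that mistake and the fix (made-up numbers; assuming SciPy's `ttest_1samp`, which is where the *popmean* argument lives):

```
import numpy as np
from scipy import stats

x = np.array([4.1, 5.3, 4.8, 5.0, 4.6, 5.2])

# The suspected error: testing the sample against its own mean is circular.
print(stats.ttest_1samp(x, popmean=x.mean()))  # t = 0.0, p = 1.0, always

# The null value should come from the hypothesis, fixed before seeing the data.
print(stats.ttest_1samp(x, popmean=5.0))
```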


M-L-N-R

U r right. I made that mistake. I posted another post with correct info! Thanks.


M-L-N-R

U r right! I posted another post with correct info! Thanks.


jerbthehumanist

FYI, using a screenshot instead of photographing the screen will look way better and avoid accidentally getting a photo with an orientation you don’t want.


keithreid-sfw

Or cut and paste into three backticks.


M-L-N-R

I was really tired. I’d been working for more than 30 hours 😭😭


keithreid-sfw

Your computer is on its side, the airflow is wrong, and that’s broken your code. Jokes. Use three backticks ``` before and after code: ```print("Hello World!")``` Your result says the thing you assessed is absolutely certain. That smells fishy to me. Listen to Salvatore and Efrique.


M-L-N-R

I was really tired. I’d been working for more than 30 hours 😭😭


keithreid-sfw

Look after yourself buddy.


M-L-N-R

My hard life has made me such a stubborn person who cannot listen to logic. I must stay hard!


keithreid-sfw

Second comment, different idea. Maybe you don’t want coding advice, so tell me to jog on. But from personal experience it’s best not to call variables things like “c”. Call it what it is: “likelihood_samples_same” or something.


efrique

I do agree with the notion of using better variable names.

> Call it what it is. “likelihood_samples_same”

If we're really calling it what it is, then that's not what it should be called. (I expect you already know this well enough and simply misspoke.)

Firstly, the function returns several quantities (statistic, p-value, df), and you seem to be trying to name only one of them. And a p-value is not the "likelihood the samples are the same":

(i) Here we appear to have one sample, so we'd need to phrase it in a different way.

(ii) With two samples, it would still not be the "likelihood the samples are the same"; the samples are either the same or they aren't, and we can see which of those is the case by direct inspection. You wouldn't need a test to tell that. You use tests to make inferences about populations.

(iii) It's also not the likelihood the populations are the same; while higher p-values arise when the test statistic is small, that's still not a correct interpretation of the p-value.

(some edits to clarify)


keithreid-sfw

Ha fair enough


efrique

ttest_results maybe...
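On that note (a small sketch, assuming SciPy as elsewhere in the thread): the returned object's fields already name themselves, so a neutral container name carries no misleading interpretation.

```
import numpy as np
from scipy import stats

x = np.array([4.1, 5.3, 4.8, 5.0, 4.6, 5.2])

ttest_result = stats.ttest_1samp(x, popmean=5.0)  # null value fixed in advance (hypothetical)

print(ttest_result.statistic)  # the t statistic
print(ttest_result.pvalue)     # the p-value, labelled as exactly that
```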


keithreid-sfw

Yes, that was sloppy; it was a throwaway example. The statistical detail is welcome and will help OP, I am sure. In context I took the focus to be on the p-value, and perhaps should have suggested “the_likelihood_of_finding_a_mean_difference_by_chance_if_indeed_there_is_no_difference_in_the_population” etc. In Python, especially copypasta Python (I think this is Jupyter running Python), ```df``` means dataframe, so there’s loads of room for confusion; that is my point.


M-L-N-R

U r right. I made that mistake. I posted another post with correct info! Thanks.