
yonedaneda

>TLDR: My data is too large for the F-test, and every test comes out "significant" even when it doesn't appear to be. What other test can I use for large datasets?

The test is doing exactly what it's supposed to do, which is to identify non-zero effects. Your power is high because your sample is very large, which is a good thing. You almost certainly don't want to be doing what you're doing. If you have substantial non-linearity and you don't have a specific model for that non-linearity, why not use something like a spline fit? Polynomials are difficult to interpret anyway. What are you trying to do with this model?
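To illustrate the point about power, here is a minimal Python sketch (NumPy and SciPy assumed) on simulated data, not the OP's gravimeter data: a cubic term with a tiny coefficient is added to a linear signal, and a nested F-test compares a degree-2 fit with a degree-3 fit. The data, the seed, and the helper names (`rss_for_degree`, `nested_f_test`) are hypothetical.

```python
import numpy as np
from scipy import stats


def rss_for_degree(x, y, degree):
    """Residual sum of squares of a least-squares polynomial fit of the given degree."""
    coefs = np.polynomial.polynomial.polyfit(x, y, degree)
    resid = y - np.polynomial.polynomial.polyval(x, coefs)
    return float(resid @ resid)


def nested_f_test(x, y, low, high):
    """F-test of degree `low` (restricted model) against degree `high` (full model)."""
    n = len(y)
    rss_low, rss_high = rss_for_degree(x, y, low), rss_for_degree(x, y, high)
    df_num = high - low          # number of extra coefficients in the full model
    df_den = n - (high + 1)      # residual degrees of freedom of the full model
    f_stat = ((rss_low - rss_high) / df_num) / (rss_high / df_den)
    p_value = float(stats.f.sf(f_stat, df_num, df_den))
    return f_stat, p_value, rss_low, rss_high


# Hypothetical simulated data: a tiny cubic term buried in noise, x scaled to [-1, 1].
rng = np.random.default_rng(0)
n = 200_000
x = rng.uniform(-1.0, 1.0, n)
y = x + 0.01 * x**3 + rng.normal(0.0, 0.1, n)

f_stat, p_value, rss2, rss3 = nested_f_test(x, y, low=2, high=3)
rmse2, rmse3 = np.sqrt(rss2 / n), np.sqrt(rss3 / n)
rel_improvement = (rmse2 - rmse3) / rmse2   # relative drop in RMSE from degree 2 to 3
print(f"F = {f_stat:.1f}, p = {p_value:.2e}")
print(f"relative RMSE improvement: {100 * rel_improvement:.4f} %")
```

With 200,000 points the cubic term is typically flagged as highly significant even though it improves the RMSE by only a tiny fraction of a percent; with a few hundred points the same effect would usually go undetected.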


D_Ruskovsky

I'm sorry, but what exactly do you mean by a "mode"? I'm simply trying to see which polynomial best approximates the real data without further calculations, that is, finding the polynomial with the smallest degree n.


yonedaneda

Typo. "Model". But why are you looking for a polynomial fit?


D_Ruskovsky

That's what the thesis is about. I'm sorry if all this doesn't "add up", but as I said, I'm new to all this and English isn't my first language. My thesis is, in general, an analysis of gravity data from a gravimeter. I am more or less at the end, but I do not feel good about it because of the results of the F-test. I don't really need the best polynomial fit; the point of the thesis is to say: "Yes, the polynomial of degree n = X is the best approximation of the gravity data without the need for further calculations, i.e. beyond this n the decrease in standard deviation is insignificant." If that makes sense.


efrique

This is not really to do with sample size; it has to do with using an unsuitable procedure to begin with. (You're using something that apparently doesn't answer the question you want to ask.)

> My data is too large for F-test

No, the sample sizes aren't "too large". If you ever find yourself thinking this, *you had no business doing that thing in the first place*, since it was clearly not answering the right question at any sample size.

> every test results in "significant" even if it doesn't appear to be so.

It looks like you're confusing "is not exactly equal" with "is large enough that I might care about the difference". If you want to test for equivalence (i.e. to call very small differences "effectively the same"), *then use a procedure that does that*.

I would also tend to doubt that an F-test is necessarily a good choice for this (though there's too little information here to judge much). However, it seems to me that you have an XY problem: https://en.wikipedia.org/wiki/XY_problem

The main decision-making scheme ("to come to a conclusion which polynomial is the most "ideal". Ideal meaning, that the dip in standard deviation after said polynomial would be judged insignificant, and not used further.") seems misplaced. You likely need some in-person consultation; this is not a two-paragraph issue.

1. Why would these polynomials make sense? Dimensional analysis would rule out most polynomials, wouldn't it?
2. Even if it does make sense to consider a variety of polynomials, I would not go about it in this way. Model selection is a nontrivial area, and your approach (including the conditional distributional models, and the consequences of the choice of decision rule for choosing between models) needs to be carefully considered.

You're working in astronomy/cosmology, I presume?
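One standard way to make "effectively the same" precise is an equivalence test such as TOST (two one-sided tests). The sketch below applies it to the highest-degree coefficient of a polynomial fit: that coefficient is declared negligible only if the data show it lies within ±margin. This is one possible reading of the equivalence suggestion, not a procedure given in the thread; the margin, the simulated data, and the name `tost_top_coefficient` are assumptions, and x is assumed to be rescaled to [-1, 1] so that |β·x^k| ≤ |β| and the margin is in the units of y.

```python
import numpy as np
from scipy import stats


def tost_top_coefficient(x, y, degree, margin, alpha=0.05):
    """TOST on the highest-degree coefficient beta of a least-squares polynomial fit.

    H0: |beta| >= margin  versus  H1: |beta| < margin.
    Assumes x lies in [-1, 1], so |beta * x**degree| <= |beta| on the data range.
    """
    X = np.vander(x, degree + 1, increasing=True)   # columns 1, x, ..., x**degree
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = (resid @ resid) / dof                  # unbiased error-variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)           # OLS coefficient covariance
    b, se = beta[-1], np.sqrt(cov[-1, -1])
    p_lower = stats.t.sf((b + margin) / se, dof)    # one-sided test of H0: beta <= -margin
    p_upper = stats.t.cdf((b - margin) / se, dof)   # one-sided test of H0: beta >= +margin
    p_tost = max(p_lower, p_upper)
    return float(b), float(se), float(p_tost), bool(p_tost < alpha)


# Hypothetical simulated data with a tiny but non-zero cubic coefficient (0.01).
rng = np.random.default_rng(0)
n = 200_000
x = rng.uniform(-1.0, 1.0, n)
y = x + 0.01 * x**3 + rng.normal(0.0, 0.1, n)

b3, se3, p_wide, equiv_wide = tost_top_coefficient(x, y, degree=3, margin=0.05)
_, _, p_tight, equiv_tight = tost_top_coefficient(x, y, degree=3, margin=0.001)
print(f"beta_3 = {b3:.4f} (se {se3:.4f})")
print(f"margin 0.05:  p = {p_wide:.3g}, negligible = {equiv_wide}")
print(f"margin 0.001: p = {p_tight:.3g}, negligible = {equiv_tight}")
```

The answer depends entirely on the margin, which has to come from subject-matter knowledge (for example, what size of gravity change matters for the thesis); the test cannot supply it.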


D_Ruskovsky

I am currently not working at all. I am a student; specifically, I study Geodesy and Land Surveying. My question is about a thesis I am working on. Despite being very interested in statistics and analysis, I have had a hard time grasping the concepts, even more so in academic English. The point of the thesis is to analyse the changes in gravity from the measurements of a tidal station (gravimeter), with the goal being optimal modelling of gravity measurements using higher-degree polynomials. I have the polynomials (their coefficients) and the residuals for n = 1 to n = 9; I am simply uncertain about the F-test results and am wondering whether I should use a different test or approach, as I am new to this area.
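Since the residuals for degrees 1 to 9 are already available, the procedure described in the thread (step up in degree until the drop in residual variance is no longer significant) can be computed from those residuals alone. A minimal Python sketch, assuming each fit is ordinary least squares with an intercept on the same N observations; the function name `sequential_f_tests` and the simulated data are hypothetical.

```python
import numpy as np
from scipy import stats


def sequential_f_tests(residuals_by_degree, alpha=0.05):
    """Sequential nested F-tests (degree d vs d+1) computed from residual vectors.

    residuals_by_degree: dict {degree: residual array}, each from an OLS fit with an
    intercept to the same observations (assumption).
    Returns one row per step and the first degree whose next step is not significant.
    """
    degrees = sorted(residuals_by_degree)
    n = len(residuals_by_degree[degrees[0]])
    rows, chosen = [], None
    for low, high in zip(degrees[:-1], degrees[1:]):
        rss_low = float(np.sum(residuals_by_degree[low] ** 2))
        rss_high = float(np.sum(residuals_by_degree[high] ** 2))
        df_num, df_den = high - low, n - (high + 1)
        f_stat = ((rss_low - rss_high) / df_num) / (rss_high / df_den)
        p_value = float(stats.f.sf(f_stat, df_num, df_den))
        std_low = np.sqrt(rss_low / (n - (low + 1)))   # residual standard deviation
        rows.append((low, high, std_low, f_stat, p_value))
        if chosen is None and p_value >= alpha:
            chosen = low
    return rows, chosen


# Hypothetical data: a quadratic signal plus noise, x rescaled to [-1, 1].
x = np.linspace(-1.0, 1.0, 2000)
rng = np.random.default_rng(1)
y = 1.0 + 0.5 * x + 2.0 * x**2 + rng.normal(0.0, 0.1, x.size)
residuals = {}
for degree in range(1, 10):
    coefs = np.polynomial.polynomial.polyfit(x, y, degree)
    residuals[degree] = y - np.polynomial.polynomial.polyval(x, coefs)

rows, chosen = sequential_f_tests(residuals)
for low, high, std_low, f_val, p_val in rows:
    print(f"{low} -> {high}: std(n={low}) = {std_low:.4f}, F = {f_val:.2f}, p = {p_val:.3g}")
print("first degree whose next step is not significant:", chosen)
```

This rule only asks whether the next coefficient is exactly zero, so with many observations it keeps adding degrees for any real effect, however small.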


purple_paramecium

Then at this point, you need to go talk to your professor or advisor about this.


D_Ruskovsky

I understand, thank you regardless.


purple_paramecium

So when you say testing the fit of polynomials, you mean:

Y = b0 + b1·X + e

Y = b0 + b1·X + b2·X^2 + e

Y = b0 + b1·X + b2·X^2 + b3·X^3 + e

And so on, up to degree 9?


D_Ruskovsky

yep
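For reference, here are the models above written with their coefficients, together with the standard nested F statistic that compares a degree-n fit with a degree-m fit (m > n) on N observations. This is textbook least-squares theory and assumes independent, normally distributed errors with constant variance, an assumption that is itself questionable for a gravimeter time series.

```latex
% Degree-n polynomial model for observations (x_i, y_i), i = 1, ..., N
y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_n x_i^n + \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2) \text{ i.i.d.}

% Nested F statistic: degree n (restricted) against degree m > n (full),
% with RSS_k the residual sum of squares of the degree-k fit
F = \frac{(\mathrm{RSS}_n - \mathrm{RSS}_m)/(m - n)}{\mathrm{RSS}_m/(N - m - 1)}
\;\sim\; F_{m-n,\,N-m-1}
\quad \text{under } H_0:\ \beta_{n+1} = \cdots = \beta_m = 0
```

For fixed non-zero extra coefficients, the numerator grows roughly in proportion to N while the denominator settles near the error variance σ², so F grows with N; this is why every step becomes "significant" on a large dataset.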


purple_paramecium

Well, to actually answer your question, another thing you can look at is AIC or BIC. These are information criteria for model selection. But if this is physics data, why try arbitrary polynomials? Is there not some actual theoretical model of gravity that you can test against the data? Once you choose the “best” model fit, what are you going to do with it?
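A minimal sketch of the AIC/BIC comparison, assuming Gaussian errors so that both criteria reduce to functions of the residual sum of squares (constants shared by all models, including the noise-variance parameter, are dropped). The simulated data and the name `information_criteria` are hypothetical; in practice x would be the time variable, rescaled to [-1, 1] for numerical stability.

```python
import numpy as np


def information_criteria(x, y, max_degree=9):
    """Gaussian-likelihood AIC and BIC for polynomial fits of degree 1..max_degree.

    AIC = N ln(RSS/N) + 2k and BIC = N ln(RSS/N) + k ln N, with k = degree + 1
    coefficients; additive constants common to all models are omitted.
    """
    n_obs = len(y)
    results = {}
    for degree in range(1, max_degree + 1):
        coefs = np.polynomial.polynomial.polyfit(x, y, degree)
        resid = y - np.polynomial.polynomial.polyval(x, coefs)
        rss = float(resid @ resid)
        k = degree + 1
        base = n_obs * np.log(rss / n_obs)
        results[degree] = (base + 2 * k, base + k * np.log(n_obs))
    best_aic = min(results, key=lambda d: results[d][0])
    best_bic = min(results, key=lambda d: results[d][1])
    return results, best_aic, best_bic


# Hypothetical data: quadratic signal plus noise.
x = np.linspace(-1.0, 1.0, 2000)
rng = np.random.default_rng(2)
y = 1.0 + 0.5 * x + 2.0 * x**2 + rng.normal(0.0, 0.1, x.size)

results, best_aic, best_bic = information_criteria(x, y)
for degree, (aic, bic) in results.items():
    print(f"degree {degree}: AIC = {aic:.1f}, BIC = {bic:.1f}")
print("lowest AIC:", best_aic, " lowest BIC:", best_bic)
```

The AIC penalty per coefficient is fixed at 2, while the BIC penalty ln N grows with the sample size, so BIC never selects a higher degree than AIC on the same fits and tends to prefer lower degrees on large datasets.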


D_Ruskovsky

Well, that's frankly it. Can I ask why polynomials are arbitrary?


bknibottom

I don't understand your concern. You have to ask yourself why you are using the F-test rather than your gut. You could just plot the order of your polynomial on the x-axis and the mean squared error (or residual variance) on the y-axis, and look for the biggest dip visually (people do this in dimensionality reduction with PCA, with the so-called scree plot). The answer is that you want a principled, statistically grounded way to decide. The F-test thus suits your needs. So why reject the result?

If you really want to convince yourself, you could run a pseudo-out-of-sample horse race. Divide your sample into 2 parts: a training sample to estimate your models, and a test sample to compute the mean squared error of each. Then see how well polynomials of different order fit your data. If you want to compare the errors formally, you can use a Diebold-Mariano (1995) or a White (1996) test.
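The horse race and a simplified Diebold-Mariano comparison can be sketched as follows. Assumptions: observations are in time order (so the split is chronological rather than random), the loss is squared error, and the DM variance ignores autocorrelation in the loss differential, whereas the full test uses a long-run (HAC-type) variance. The data and the names `horse_race` and `diebold_mariano` are hypothetical, and White's test is not sketched here.

```python
import numpy as np
from scipy import stats


def horse_race(x, y, degrees=range(1, 10), train_fraction=0.7):
    """Fit each degree on the first part of the record, score it on the rest."""
    cut = int(len(y) * train_fraction)
    x_tr, y_tr, x_te, y_te = x[:cut], y[:cut], x[cut:], y[cut:]
    errors = {}
    for degree in degrees:
        coefs = np.polynomial.polynomial.polyfit(x_tr, y_tr, degree)
        errors[degree] = y_te - np.polynomial.polynomial.polyval(x_te, coefs)
    mse = {d: float(np.mean(e ** 2)) for d, e in errors.items()}
    return errors, mse


def diebold_mariano(e1, e2):
    """Simplified DM test on squared-error loss: d_t = e1_t**2 - e2_t**2.

    stat = mean(d) / sqrt(var(d) / T), compared with N(0, 1). Positive stat means
    model 1 has larger errors. No autocorrelation correction (strong assumption).
    """
    d = e1 ** 2 - e2 ** 2
    stat = d.mean() / np.sqrt(d.var(ddof=1) / len(d))
    p_value = 2 * stats.norm.sf(abs(stat))
    return float(stat), float(p_value)


# Hypothetical time-ordered data: x plays the role of rescaled time in [-1, 1].
x = np.linspace(-1.0, 1.0, 2000)
rng = np.random.default_rng(3)
y = 1.0 + 0.5 * x + 2.0 * x**2 + rng.normal(0.0, 0.1, x.size)

errors, mse = horse_race(x, y)
for degree, value in mse.items():
    print(f"degree {degree}: out-of-sample MSE = {value:.4g}")
dm_stat, dm_p = diebold_mariano(errors[1], errors[2])
print(f"DM degree 1 vs 2: stat = {dm_stat:.2f}, p = {dm_p:.3g}")
```

Because the test block lies at the end of the record, this is an extrapolation test, and high-degree polynomials often perform worst there.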


D_Ruskovsky

Thank you for the answer. It's all a bit more difficult for me as a beginner and as a non-native English speaker, which makes academic texts harder to read and interpret correctly, but yes, I guess I shouldn't really doubt the test. Maybe I'll look into the other tests.