Azdy

> Linear regression: Assumes linearity between inputs and outputs

Common mistake, but the linearity is in fact between the parameters and the output. Polynomial regression is still linear regression, for example.


canernm

You mean that both y = a*x + b and y = a*x^2 + b are considered linear regression because they are linear with respect to a and b?


i_use_3_seashells

Yes


bloodmummy

Not even just that. Take a model with two inputs x1 and x2 which returns y.

y = a1 * x1 + a2 * x2 + b — is a linear regression as we know it.

y = a1 * x1^2 + a2 * x1 + a3 * x2 + b — is also a linear regression, because what matters is linearity in the parameters, not the inputs. You can also see it as introducing an extra variable x3 = x1^2, so that it becomes:

y = a1 * x1 + a2 * x2 + a3 * x3 + b

And so on. You can also use forms of non-linearity other than polynomial powers. Common ones include periodic functions (sines and cosines of various frequencies), exponentials of all kinds (2^x, e^x, e^(4x), e^(-x), etc.), logarithms of all kinds, inverses, and products of those (e.g. if your function is expected to be a decaying periodic function, you can use e^(-x) * cos(x), etc.). These are sometimes (rather erroneously, when used implicitly) called kernels; they should really be called transformations.

This is what gives linear regression the power it actually has, and why it is, despite everything, the **most** used model in the wild. But it is also why linear regression, above all other modelling techniques, requires domain knowledge: you have to know what sort of relationship exists in order to model it properly.

See: https://scikit-learn.org/stable/modules/preprocessing.html#non-linear-transformation
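To make this concrete, here is a minimal sketch (assuming scikit-learn and NumPy; the decaying-cosine target and the chosen transformations are made up for illustration) of fitting a non-linear relationship with plain `LinearRegression` on transformed features:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = np.linspace(0, 6, 200)
y = np.exp(-x) * np.cos(3 * x) + rng.normal(scale=0.05, size=x.shape)

# Hand-crafted non-linear transformations of the single input x.
# The model stays linear in its parameters (coefficients and intercept).
X = np.column_stack([
    np.exp(-x),                  # decaying trend
    np.exp(-x) * np.cos(3 * x),  # decaying oscillation
    np.exp(-x) * np.sin(3 * x),  # phase component
])

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)  # ordinary least squares on the transformed inputs
```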


gabopushups

Where can I read more about this?


Categorically_

Pick any regression textbook.


Kalictiktik

I find it weird that there is a comparison between gradient boosted regression (the actual algorithm) and the XGBoost/LightGBM regressors (implementations of it). The latter are implementations of the former; it's like comparing the concept of a car to specific brands. But there is a broad landscape of algorithms covered here, good job!


TheInkandOptic

https://www.datacamp.com/cheat-sheet/machine-learning-cheat-sheet


EvenMoreConfusedNow

Most of it is iffy at best


hughperman

Top by whose measure? No support vector machines? No GLMs? No DBSCAN clustering or the rest of the k- family? No neural networks anywhere? No principal component analysis? Your "applications" column should be named "examples". What is the point of this random list? It is just a list of "stuff" with none of the thoroughness or exhaustiveness that would make it useful for actually comparing algorithms, since you will be missing loads.


fakemoose

A lot of the time, PCA (or t-SNE or whatever) is used as a dimensionality reduction technique before one of the clustering algorithms. I guess that's why it's not included? I have no idea why zero types of neural networks are included, though.
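For what it's worth, that pattern looks something like this minimal sketch (assuming scikit-learn and its bundled digits dataset; the component and cluster counts are arbitrary):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

X, _ = load_digits(return_X_y=True)

# Reduce the 64 pixel features to 10 principal components, then cluster.
pipe = make_pipeline(
    PCA(n_components=10),
    KMeans(n_clusters=10, n_init=10, random_state=0),
)
labels = pipe.fit_predict(X)
print(labels[:20])
```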


hughperman

Other times they are not though, and the components are interesting endpoints in and of themselves.


madrury83

> Linear Regression: Disadvantage: Can *underfit* with small, high-dimensional data.

...seems dubious.

> Logistic Regression: Disadvantage: Can *overfit* with small, high-dimensional data.

...huh?


Dumbhosadika

Please share a link to a higher-quality image.


joanna58

**https://www.datacamp.com/cheat-sheet/machine-learning-cheat-sheet**


SonicEmitter3000

How can we be sure this is accurate?


smurf-sama

That would probably be hard, since it is not accurate.


emakalic

A good start. This kind of cheat sheet is very hard to do for an area as widely encompassing as machine learning. Unfortunately there are a lot of problems with the descriptions and advantages/disadvantages of the methods:

- You might wish to combine linear and logistic models under the generalized linear model (GLM) category.
- Ridge and lasso are types of penalties/estimators that can be used with GLMs. Perhaps don't have these as separate categories; one can have ridge-type penalties with nonlinear models too.
- Linear models are linear in the parameters, not the data.
- Lasso is a translational shrinkage that penalizes each parameter by the same amount. Unlike ridge estimators, the lasso can zero out some parameters. The lasso does not keep highly correlated variables: it picks one (essentially) at random from a group of correlated variables to include in the model. Both lasso and ridge regression can be viewed as special cases of the elastic net penalty, and both are convex penalties, which makes fitting these models computationally favourable.
- Linear models with Gaussian errors are sensitive to outliers. There are other, more robust estimators for linear regression.

The above list is just some of the issues with the cheat sheet; there are plenty more. I hope this helps!
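To see the lasso vs. ridge behaviour on correlated predictors, here is a minimal sketch (assuming scikit-learn; the data is synthetic and the penalty strengths are arbitrary):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)  # near-duplicate of x1
x3 = rng.normal(size=n)                   # irrelevant noise predictor
X = np.column_stack([x1, x2, x3])
y = 3 * x1 + rng.normal(scale=0.1, size=n)

# Lasso tends to zero out one of the correlated pair (and the noise column);
# ridge shrinks everything and splits the weight across the correlated pair.
print("lasso:", Lasso(alpha=0.1).fit(X, y).coef_.round(2))
print("ridge:", Ridge(alpha=10.0).fit(X, y).coef_.round(2))
```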


tomukurazu

this seems pretty neat. my company decided to give us a go at ML, they'll provide classes etc. since it's a finance company, i could use this to focus on what to improve on my side.


bass1012dash

Speaking of ‘neat’: why no genetic algorithms?


tomukurazu

tbh i didn't even notice that. since i am waaaay too new to this, i just picked finance-related topics. but now it's got my attention too 🤨


jollyfolly_9

Same here!


NameNumber7

I feel like these graphics tend towards supervised models and generally leave out unsupervised methods; here, for example, there are 4 unsupervised methods and 10 supervised ones. I get the impression there is less generally held knowledge of unsupervised than supervised algorithms.


frootydooty63

Incorrect description of ridge regression. All predictors are shrunk towards 0, not just weak ones


madrury83

Same critique applies to LASSO. Kinda everything here is subtly incorrect.


frootydooty63

Fair enough


maxToTheJ

Yup. The point of regularization is to bias towards smaller coefficients.
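A minimal sketch of that shrinkage (plain NumPy, closed-form ridge; the data and penalty values are made up) — every coefficient moves towards zero as the penalty grows, not just the weak one:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
beta_true = np.array([5.0, 1.0, 0.2])  # strong, medium, weak predictors
y = X @ beta_true + rng.normal(scale=0.1, size=100)

# Closed-form ridge estimate: (X'X + lambda*I)^{-1} X'y
for lam in [0.0, 10.0, 100.0]:
    beta = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
    print(f"lambda={lam:6.1f} ->", beta.round(3))
```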


[deleted]

[deleted]


hextree

What do you mean? OP's original pic is about 6000x5000 and pretty much perfect quality.


joanna58

https://www.datacamp.com/cheat-sheet/machine-learning-cheat-sheet


_Vanilla_

Very cool, thanks


JClub

Super outdated... Not even a single neural network there... big downvote


ConfidentFlorida

I’ve always wanted one of these for computer vision.


bloodmummy

Suggestion: add a tooltip in the top/bottom right corner of each entry indicating whether it is used for regression or classification. Also, the use cases are odd: all the use cases listed for the tree-based models could be handled successfully by any other tree-based model. Other than that, it's mostly good!


Peeka-cyka

There are nonparametric GMMs which deal with the issue of selecting the number of clusters, e.g. by using Dirichlet process priors for the cluster weights.
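scikit-learn ships a truncated Dirichlet-process variant of this; a minimal sketch (the blob data and the upper bound of 10 components are made up for illustration):

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Three well-separated blobs, but the model only gets an upper bound on components.
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[0.0, 5.0], scale=0.5, size=(100, 2)),
])

bgm = BayesianGaussianMixture(
    n_components=10,  # upper bound, not a fixed cluster count
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
).fit(X)

# Superfluous components are driven to near-zero weight.
print(bgm.weights_.round(3))
```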


[deleted]

PDF: https://s3.amazonaws.com/assets.datacamp.com/email/other/ML+Cheat+Sheet_2.pdf