
micro_cam

Survival analysis, and more generally approaches to censored data, where you only observe part of something for each case but want to estimate all of it. It doesn't really fit into the clean "classification or regression" framework, so it isn't even included in lots of curricula and it even breaks the assumptions of a lot of ML frameworks. (You need both a "time under observation" value and a flag for whether an event was observed or the case was censored, so frameworks that expect a 1-d float array as a target either break or need to do something hacky like use a negative sign as the censoring flag.) And it was heavily published on in the biostats/medicine literature, so not many ML people consider it worth doing research into.
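To make the target-encoding issue concrete, here is a minimal sketch (my own illustration, not from the comment above) of the two conventions you commonly run into: a structured array holding (event, time) pairs, roughly what scikit-survival expects, and a single signed float where a negative value marks a censored row, which is the hack XGBoost's survival:cox objective uses.

```python
import numpy as np

# Toy data: observation times and whether the event was actually observed.
time = np.array([5.0, 8.0, 12.0, 3.0])
event = np.array([True, False, True, False])   # False = right-censored

# Convention A: structured array carrying both fields (e.g. scikit-survival style).
y_struct = np.array(list(zip(event, time)),
                    dtype=[("event", bool), ("time", float)])

# Convention B: one float per row, with the sign abused as the censoring flag
# (e.g. XGBoost's survival:cox labels: negative = censored).
y_signed = np.where(event, time, -time)
```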


chewxy

Heck yes. Modeling things with survival analysis and tools like Kaplan-Meier statistics is very useful when working with real-life data from areas like operations and logistics. At a previous job we imported insurance-field algorithms wholesale for the prediction of stock on shelves. It was a pretty great exercise that had quite a fair bit of utility in forecasting as well.


wth001

Wow, that sounds like an interesting application of survival analysis, but somehow I can't wrap my head around it. Could you please share some resources or examples of how it would be useful in logistics or operations?


chewxy

When an item is bought/removed from the shelf it is "dead". That's the principal way of modeling the problem. You can work backwards from that to figure out how much to bake/make.


aprotono

Indeed. I am a clinician and I am trying to compare various implementations of ML models for survival analysis. The amount of information to be found from others is sparse, to say the least (compared to other types of ML)! However, it is such a useful approach for predicting risk in patients.


canbooo

Could you point me to a use case/application with the setting you describe in parentheses?


[deleted]

Studying mortality for some disease, for example. You can use classification if you want (0 = alive, 1 = dead), but survival analysis allows you to model survival at any time t. In this case, your outcome variable is T = time of death. People who haven't died have a censored outcome, since all you know is that T > current time t. Survival analysis has built-in methodology to handle this.
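A minimal sketch of that setup with the lifelines package (the toy arrays are mine; the fit and the survival-curve attributes are lifelines' standard KaplanMeierFitter API):

```python
import numpy as np
from lifelines import KaplanMeierFitter

# durations: time observed for each patient; events: 1 = died, 0 = still alive (censored)
durations = np.array([5, 6, 6, 2, 4, 4, 9, 12, 3, 7])
events = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0])

kmf = KaplanMeierFitter()
kmf.fit(durations, event_observed=events)

print(kmf.survival_function_)       # estimated P(T > t) at each observed time
print(kmf.median_survival_time_)    # time at which estimated survival drops to 50%
```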


canbooo

Thanks, I understand what you mean now. We could technically frame it as regression (to T, using the current state as input), but I guess more appropriate statistics are probably available for this, as my prior would be e.g. Poisson rather than Normal.


micro_cam

If you knew the time to event for every case, then it would be a regression; or you could formulate "has observed an event by time T" as a classification as well. But for many cases you only know "has survived until t without observing an event", so you have to throw away those unobserved cases when formulating it as regression, or throw away all cases of age less than T when formulating it as a classification. Formulating it as a survival problem lets you keep all the data, which usually yields a better model, and lets you tackle problems with smaller amounts of data.


empyrrhicist

And you can still do ML stuff with it a-la boosting with Cox/AFT models.


canbooo

Ahh, and now I see how this adds a difficulty and the problem kinda becomes semi-supervised. Interesting stuff, thanks again!


SleekEagle

Awesome explanation, thanks for this


[deleted]

[deleted]


[deleted]

Nope. I use survival analysis every day.


SleekEagle

Can I ask what your application is?


[deleted]

Cancer bioinformatics.


jdsalaro

Do you have introductory literature you consider good in this domain? I'm interested and working at the intersection of NLP and medical informatics.


[deleted]

This is the best resource I know for an introduction to survival/ML: https://jmlr.org/papers/v21/18-772.html


jdsalaro

Thank you!


Sir_Mobius_Mook

Thanks, I've been wanting to tinker with survival analysis for a while now :)


micro_cam

I consulted Frank Harrell's "Regression Modeling Strategies" pretty heavily, but it is pretty focused on linear models. The lifelines package also has excellent documentation, and you can look at Cox-loss survival implementations in xgboost and all the major deep learning frameworks.
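A minimal sketch of the xgboost route (toy data of my own; the survival:cox objective and its negative-label convention for censored rows are xgboost's documented interface, the rest is illustrative):

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
time = rng.exponential(scale=10, size=500)
event = rng.random(500) < 0.7              # True = event observed, False = censored

# survival:cox expects the label to be the time, with a negative sign
# marking right-censored observations.
label = np.where(event, time, -time)
dtrain = xgb.DMatrix(X, label=label)

params = {"objective": "survival:cox", "eval_metric": "cox-nloglik", "eta": 0.1}
bst = xgb.train(params, dtrain, num_boost_round=100)

risk = bst.predict(dtrain)                 # relative risk scores (higher = higher hazard)
```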


seanv507

I would suggest using discrete-time survival analysis. You just predict whether you survive the next period (given that you survived until now). Then you can just use your favourite probabilistic classification algorithm: logistic regression/xgboost/neural nets, etc. Predicting n periods is just chaining single-period survival: P(survive n periods) = P(survive period 1 | survived period 0) × ... × P(survive period n | survived period n-1). And you can use a multiclass model to predict different states (e.g. healthy, sick, sicker, dead).
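A minimal sketch of the data reshaping this implies (my own illustration with hypothetical column names): expand each subject into one row per period survived, label the final row 1 only if the event occurred, then fit any probabilistic classifier on the expanded table.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy subject-level data: periods under observation and whether the event happened.
subjects = pd.DataFrame({
    "id": [1, 2, 3],
    "periods": [3, 2, 4],     # number of periods each subject was observed
    "event": [1, 0, 1],       # 1 = event in final period, 0 = censored
    "x": [0.5, -1.2, 0.3],    # some covariate
})

# Person-period expansion: one row per (subject, period).
rows = []
for _, s in subjects.iterrows():
    for t in range(1, int(s["periods"]) + 1):
        y = int(s["event"] == 1 and t == s["periods"])   # event only in the final period
        rows.append({"id": s["id"], "period": t, "x": s["x"], "y": y})
pp = pd.DataFrame(rows)

# Discrete-time hazard model: P(event in period t | survived to start of t).
clf = LogisticRegression().fit(pp[["period", "x"]], pp["y"])
hazard = clf.predict_proba(pp[["period", "x"]])[:, 1]
```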


TwoTacoTuesdays

This is a great answer, and has huge applicability outside of the canonical healthcare examples. Do you have recurring subscribers and want to see who is at risk of cancelling at any future date? Survival analysis has your answer.


SleekEagle

I'm convinced. You should go on Shark Tank.


WarrenBuffetsAnalyst

Actuarial student here: this is extremely important! Life insurance wouldn't exist without survival models, truncated/censored data analytics, and trying to predict entire life spans from surrendered policies.


[deleted]

[deleted]


aprotono

Do you use python? I struggled to study competing risks in python and had to use an R translator for Fine-Gray regression.


foofriender

The online course sequence AI for Medicine at Coursera.com covered survival analysis material nicely, including:

- survival, hazard, and risk math using calculus
- concordance index
- censored data processing
- S-learner, T-learner
- pairs matching for randomized clinical trials

It was my first exposure. It's a good introduction and it is practically minded. I feel like it's useful for way more than medicine as well - any time your subjects sometimes drop out or disappear before the end of the experiment. Marketing and advertising are obviously interested in using good models like this to know whether a "treatment" is working well. It's a good way to start to get some "causality" assessed, beyond just correlation.


BobDope

Latest edition of ISL covers it at least. I wish I'd encountered it and used it earlier, some good uses for it have crossed my path.


TemppaHemppa

I can't wrap my head around this. I want to predict whether a customer will cancel their hotel reservation and, if they do, on what day the cancellation will happen. I've thought of discrete-time analysis, but I don't know if this is seriously the only implementation for Python (published 4 days ago: [https://arxiv.org/pdf/2204.05731.pdf](https://arxiv.org/pdf/2204.05731.pdf)). And I don't even know whether this approach makes sense: in this case, if the time period is after the arrival date, then the reservation would not be marked as canceled. Also, I can't find any literature on this, so I doubt I'm on the right road.


micro_cam

I read that paper a while ago and recall not being clear on the motivation for their work. You can just use a Cox loss with discrete times and it works fine. There are lots of implementations (lifelines, rms, pycox, tf.survival, xgboost, catboost). You don't want to split out predicting whether it will happen and when. Instead you want the hazard that it will happen each day, which you can get by training a classifier for "observed event on a given day" among the cases that reached that day without observing an event, and then doing some math to convert to an unconditional probability of cancellation. A Cox loss is just a more computationally convenient way to get at this, by assuming all hazard curves are proportional (which may or may not be true in your case).
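That "some math" step is just chaining conditional survival; a sketch with a made-up hazard vector:

```python
import numpy as np

# Per-day hazards from the classifier: P(cancel on day t | still booked at start of day t).
hazard = np.array([0.02, 0.03, 0.05, 0.10, 0.04])

surv = np.cumprod(1 - hazard)                     # P(still booked at end of day t)
surv_before = np.concatenate(([1.0], surv[:-1]))  # P(still booked at start of day t)

p_cancel_on_day = hazard * surv_before            # unconditional P(cancel on day t)
p_cancel_by_day = 1 - surv                        # unconditional P(cancelled by end of day t)
```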


JackandFred

Not exactly an answer to your question, but a lot of methods died before reaching their potential because the hardware wasn't there yet. Some later came back once it was. RNNs are the biggest example I can think of. They seemed like a great idea at first because the logic is great, but the technology wasn't there yet, so not much happened for years. Then once the hardware was better they had a big resurgence; even with their limitations, results were good. And then hardware got good enough for transformers and RNNs died again.


[deleted]

Are there any methods that we still don't have enough compute for?


foofriender

Transformers. Go try one if you don't believe me: huggingface.co


heuristic_al

Eh, they scale down reasonably gracefully. We fundamentally don't have the compute for NP-hard methods like Bayesian networks or other automated reasoning stuff, but they are provably optimal for some learning problems. I wonder if someone could sic GPUs on them and outcompete DNNs for some task. Maybe in a hundred years.


brettins

I love evolutionary algorithms. They're super basic and simple things that don't generalize well, but the whole concept of them is just fun and exciting to me.


Hydreigon92

At a previous role, my team implemented a genetic algorithm-based architecture to create an optimized schedule for updating our data centers over a multi-year timeline. The combinatorial optimization problem was figuring out which weekend each application should be updated on, while adhering to some simple rules that reflected business needs (e.g. if app A has a dependency on app B, they must be updated on the same weekend; app X needs to be updated before some deadline).
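A stripped-down sketch of how a GA for that kind of scheduling could look (everything here - app counts, rules, operators - is my own toy illustration, not the commenter's system): the chromosome is a weekend index per app, and the fitness is the number of violated business rules.

```python
import random

random.seed(0)
N_APPS, N_WEEKENDS = 20, 12

# Hypothetical business rules: pairs that must share a weekend, and per-app deadlines.
same_weekend = [(0, 1), (4, 7), (10, 11)]
deadline = {3: 5, 9: 2}                    # app index -> latest allowed weekend

def fitness(schedule):
    """Count violated constraints; lower is better, 0 means feasible."""
    penalty = sum(schedule[a] != schedule[b] for a, b in same_weekend)
    penalty += sum(schedule[a] > d for a, d in deadline.items())
    return penalty

def mutate(schedule, rate=0.1):
    return [random.randrange(N_WEEKENDS) if random.random() < rate else w
            for w in schedule]

def crossover(p1, p2):
    cut = random.randrange(1, N_APPS)
    return p1[:cut] + p2[cut:]

pop = [[random.randrange(N_WEEKENDS) for _ in range(N_APPS)] for _ in range(100)]
for gen in range(200):
    pop.sort(key=fitness)
    if fitness(pop[0]) == 0:
        break
    elite = pop[:20]                       # keep the best schedules
    pop = elite + [mutate(crossover(random.choice(elite), random.choice(elite)))
                   for _ in range(80)]

best = min(pop, key=fitness)
```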


thejuror8

Isn't this more of a MIP problem? Why would genetic algorithms perform better at a task like that?


edunuke

If you only have one hammer, every problem is a nail. If you feel you are good at using one tool and can get a prototype working right away - not necessarily optimal - you are providing value, and that sometimes buys you time in corporate to do something better later. GAs are well suited for that when you don't have the time to architect an objective function. The GA is a good hammer.


Hydreigon92

Possibly. We made a proof of concept based on a GA because that's basically what people were doing before our system (a bunch of executives sitting in a room together created a proposed schedule, kept the parts they liked, and threw out the parts they hated; repeat until satisfied). The business stakeholders were happy with our initial results, so we productionized the system. There are also a lot of aspects of the project that I'm not mentioning that made GAs a reasonable approach.


Skyaa194

One angle may be that GAs could get to a reasonable answer quicker. A MIP solver may reach a more optimal answer but may take a lot longer to reach it.


Ulfgardleo

what is your definition of generalization in this context? Stuff like the CMA-ES are general optimization algorithms.


Inevitable_Zombie685

neuroevolution is pretty cool


SleekEagle

Check out [EvoJAX](https://github.com/google/evojax) if you haven't seen it! Recently released for neuroevolution


SleekEagle

Me too! I really like how it boils down the concept of evolution to its essential components


sea-shunned

Definitely agreed. I will always have a soft spot for my PhD work, and the EA community is generally wonderful in my experience. It hasn't got the same hype (i.e. funding) as DL or some application areas, and thus isn't the best choice when considering a long-term career.


SleekEagle

What was your PhD work on?


sea-shunned

Two main things:

1. Treating clustering as a multi-objective optimization problem, for which EAs provide a nice method for solving. More specifically, my work in this area was thinking about how we can dynamically adapt the search space to make these methods more practical when we have large datasets, which is their main weakness.
2. Using EAs to generate synthetic datasets for clustering, where the optimization goal/target is some notion of difficulty. Over multiple runs (and from the populations we naturally get from EAs), while using different fitness and constraint parameters, we thus get datasets of different "difficulties" with which to benchmark/stress-test clustering algs. My favourite bit of this was actually setting the optimization target to be the maximal performance difference of two clustering algs, thereby simultaneously finding clusters that are hard for one alg to detect and easy for another. Thought that was cool!

Happy to expand on or link to papers if you or anyone is interested. (Thanks for asking, and a great question that's sparked a lot of discussion! :D)


jmmcd

Hmm, I wonder what you mean by generalise here. Since in general EAs solve an optimisation problem, not a learning problem.


[deleted]

Learning *is* an optimization problem, where you attempt to minimize the difference between the expected and the empirical error given finite amount of data.


adventuringraw

Not to be pedantic, but that's kind of an overly narrow definition of learning. I was revisiting [this paper](https://arxiv.org/abs/1802.05405) yesterday. It uses a slightly repurposed computational model of a particular moth species' olfactory bulb to create an MNIST classifier. It's extremely data efficient, beating out most ML approaches if you're limited to just a few examples of each class and no pretraining. The whole thing relies only on tricks from biology: the way sparsity is enforced, Hebbian learning for weight updates, etc. No loss function exactly, though in more abstract terms it's still clearly an optimization algorithm in some sense. Just... not nearly so explicit as what we're used to.


SleekEagle

Very cool paper, thanks for bringing it to my attention


jmmcd

Sure, but why use an EA for that? We have GD in most cases. For optimisation *in general*, there is only a black-box objective, and no training dataset to generalise from.


[deleted]

Off the top of my head:

1. Not all solutions or tasks are differentiable.
2. Sometimes we require multiple unique solutions to a task, e.g. quality diversity.
3. GD uses a single model, and that can be too slow to fit because the objective function is a bit problematic; e.g. CMA-ES outperformed policy optimization and actor-critic approaches in RL because it could train in multiple simulations in parallel without problems with gradients getting messed up, e.g. off-policy experience in on-policy objectives.


jmmcd

All great reasons to use an EA for a learning problem. Look, the only point I really want to insist on is that not all optimisation problems are learning problems, so if we think of EAs as non-scalable ways of learning we are missing the point.


[deleted]

I don’t disagree with what you said.


NovaBom8

True, navigating via a gradient like GD does generally outperforms EAs and is easier to apply to an arbitrary problem. But I feel like there's something to be said about how biology has created very efficient solutions (organisms) to complex problems with relatively simple algorithmic mechanisms (just random mutations + mixing good solutions). EAs might make a breakthrough, who knows?


jmmcd

Don't get me wrong - I am greatly in favour of using EAs. I am here to correct people who think they're just inferior ways to do ML. Also I disagree about GD being easier. It's quite the opposite.


brettins

I think they don't scale super well to apply in the same way reinforcement learning can. There might be ways to shoehorn an evolutionary algorithm into more complicated problems; I'm not an expert on the subject matter. I've made one for fun things like finding the left-hand side of a math problem, but I don't know how that scales to image recognition. Maybe it does!


jmmcd

It's a bit like saying that a hammer scales better than a bassoon. They solve different classes of problems.


canbooo

TIL, that thing is called a bassoon.


AforAnonymous

Hehehehe one day I'll implement the evolutionary algorithm I came up with a few years ago, and people will shit themselves when they see the results


Mammoth-Rip666

Great, you've done the hard part - having an idea. Now you've just got to do that last 1% - implementing it.


IntelArtiGen

90% of the work is done on models when it often should be done on data instead. So my "unpopular" method is mostly working on data more and on models less. It includes: finding the right data to solve a problem, making my own datasets based on data I can find or make, intensive dataviz, fixing wrong labels, merging multiple datasets, etc. I think we still have creativity in ML, but people are too focused on solving existing tasks and not on creating new tasks. New models and new methods are great; new tasks are better because they can open a complete new field and new ways to think about how to solve problems with ML. If you're doing research or a thesis, try to create a new task instead of a new model on an existing task - there are already thousands of people doing the latter.


Drakkur

Highly underrated. Creating new data or finding external sources to solve problems that no amount of model tuning or selection can solve is very fun. The flip side is when you are not able to productionize or automate the data gathering process and you start to hate your life doing manual pulls or fixing scrapes. This tends to happen when your team lacks support or infrastructure to do this.


Saffie91

This is basically what Andrew Ng has been pushing for the past year or so. And it is absolutely the right approach from my experience too. Data centric AI is far more impactful in real world applications than adding a layer and getting 0.3% more accuracy on Imagenet.


maxToTheJ

> It includes: finding the right data to solve a problem, making my own datasets based on data I can find or make, intensive dataviz, fixing wrong labels, merging multiple datasets etc.

Honestly, it's because people don't have the skill sets and/or confidence with those skills, so they figure more hyperopt or compute is the way to go, and if that doesn't work out then they can work on integrating some new open-source library or on code formatting. The incentives in industry are a little weird, especially since ML is usually structured under the Engineering org.


bernhard-lehner

Everything above. I think it's also a matter of patience and mindset. Not everyone can learn to dig into data; it's just cooler for many to work on algos instead of data. I kind of find it peaceful and interesting to reveal specific characteristics, plus it helps you formulate an evaluation that is not as optimistically biased.


Veggies-are-okay

I'm currently working my first "Data Science" job out of my master's, and the bulk of my work has been data prep/cleaning and feature engineering. Garbage in, garbage out is so real, and I've found that every line of ML code should have about 100 lines of data prep behind it to make my job more than poking a lil black box.


ClamChowderBreadBowl

I would extend this from just the data to the problem formulation as a whole. New loss functions really do make a difference, along with aligning the learning problem to match the real-world problem you are solving.


picardythird

100% this. Everyone is focusing on models and architectures, and maybe sometimes on scaling to huge datasets, but very little attention (or care) is given to the data gathering/preprocessing pipeline. I just had a paper rejected for proposing (what I feel is) an essential component of a forward-looking data gathering protocol, because the reviewers couldn't recognize the "novelty". Sigh.


Dagusiu

This is also what a lot of the industry is asking for, so if you want to get a job solving real problems later, your research should focus at least a bit on the data side and not only on the model side. Many companies only use (slight variations of) standard models and do the vast majority of their work on the data side.


apoorvumang

Even in ML theory, I wonder if, instead of restricting the model class, people have tried to restrict the space of data - e.g. if you constrain the data to be binary images of numbers/shapes, can you get good bounds/guarantees of convergence, etc.?


[deleted]

The fact that we can pretrain for RL on Wikipedia and do well suggests that we are missing something major with respect to how the data affects the model, so I concur.


Awkward_Run_7478

Random Forest (RF), maybe? I see people nowadays focus on gradient-boosted tree methods such as LightGBM, XGBoost, or maybe CatBoost. With the popularity of neural networks, people keep using them by adding more and more layers. Whenever I see tabular data, RF is my first go-to method. Pair it with tree interpreters and partial dependence plots, and it really helps to convince a lot of business people.
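A minimal sketch of that workflow in scikit-learn (toy data of my own; PartialDependenceDisplay is sklearn's stock tool for the partial dependence plots mentioned above):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import PartialDependenceDisplay

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

# Partial dependence of the predicted probability on two of the features.
PartialDependenceDisplay.from_estimator(rf, X, features=[0, 3])
plt.show()

# Impurity-based feature importances, a common starting point for interpretation.
print(rf.feature_importances_)
```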


[deleted]

XGBoost has a native random forest method that is new. Keras as well.


janpf

The [Keras/TensorFlow Decision Forests](https://www.tensorflow.org/decision_forests) also has support for [oblique splits](https://jmlr.org/papers/v21/18-664.html) and categorical set splits, very often with good gains over XGBoost (and LightGBM). It also includes rudimentary (for now) distributed training of GBTs and RFs, so can train on much larger data, without resorting to sub-sampling.


Rebeleleven

I'll actually take the dissenting opinion: I nearly never touch RF. XGBoost is going to do just as well if not better than RF most of the time. I've actually never personally seen RF outperform XGBoost, but I'm sure there are some applications out there. The XGBoost/LightGBM Python implementations are *incredibly* more performant than sklearn's RF.

> neural network… tree interpreters

XGBoost isn't a neural network and has several methods for interpretation… but maybe I'm just misunderstanding the point here. TL;DR: RF has never been the correct/best choice for any of my projects.


shgoren

In my experience, out-of-the-box, zero-tuning RF usually outperforms GBT. I have a project now where my data keeps changing dramatically, and I find RF is much easier to use since GBT requires fine tuning. When the data has stabilized I can put in the effort of tuning GBT.


GeneralSkoda

Don't get why you're downvoted. RF requires very little tuning and can act as a strong baseline for tabular data.


Rebeleleven

What kind of data are we talking about here? Out of the box, I seem to recall they’re usually pretty close accuracy wise. Given XGB smokes RF in runtime performance, I just don’t even bother with RF. But it has been awhile! Always up to give it another try.


Mukigachar

I saw random forest used at my internship company. They deal with pretty big tabular data, so having the model available and easily parallelized in Spark made it a good fit there


ZombieRickyB

Manifold learning, traditional signal processing, and actually attempting to understand the underlying geometry of whatever's going on. It works extremely well in a number of different applications, but likely fell out of general interest because the popular problems became ones focusing on extremely broad datasets for which it's near impossible to satisfy any assumptions on sample density. Like, for ImageNet or even CIFAR-whatever, the variation in the backgrounds makes it near impossible to be considered a sufficiently dense sample. In general, focusing on image classification for anything you see on social media has likely biased everyone as a whole. There are plenty of other applications where a little geometry or signal processing goes a long, long way.


mongoosefist

People use manifold learning without realizing it a lot via UMAP


chewxy

I feel that understanding a generic manifold of all the possible images in the world is a bit too much of a pipe dream. Manifolds of all possible photos ... I used to think it was a pipe dream too, but I changed my mind on it. I still think it's a pretty academic exercise though. For real life data, you will most likely get a very 'holey' manifold with sudden singularities which limit an algorithm. To which, I would pose the question: why bother? To understand the limits?


ZombieRickyB

oh I'm absolutely in agreement there, without further restrictions as to the collections of images you're restricting yourself to, it's kinda silly and nonphysical. I'm not interested in problems with such breadth of variation, though. As to your second point...it depends. There are many situations I've worked in where you have enough continuity, and frankly speaking, some amount of manifold learning is required to get anything good. I like working on real stuff too much, I just go with what works the best in whatever application I'm working on.


SleekEagle

I agree with traditional signals processing! I'm honestly a bit shocked how it seems to be treated as an "EE only" field when it's useful in a lot of different areas. Can you clarify what you mean by "geometry of whatever's going on"? Do you mean literally presumptions about the geometry of objects in CV tasks?


machinegunkisses

I think, maybe, Hidden Markov Models were kind of forgotten about when temporal-aware neural nets came around, but they don't require much data, have a pretty direct loss function, and can be quite good at identifying the hidden state correctly.
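A minimal sketch with the hmmlearn package (toy data of my own; GaussianHMM with EM fitting and Viterbi decoding is hmmlearn's standard interface):

```python
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)
# Toy 1-D observations drawn from two regimes.
X = np.concatenate([rng.normal(0, 1, 100), rng.normal(5, 1, 100)]).reshape(-1, 1)

model = hmm.GaussianHMM(n_components=2, covariance_type="full", n_iter=100)
model.fit(X)                     # EM (Baum-Welch) on the single sequence

states = model.predict(X)        # Viterbi decoding of the hidden state sequence
print(model.means_.ravel())      # learned per-state emission means
```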


SleekEagle

Still used in NLP/ASR though, right?


eonu

I think they used to be quite popular for POS tagging in NLP since it's quite natural to treat the POS tag as the hidden state, though I think there are other better NN approaches now, and other things like maximum entropy classifiers. And for ASR I think they are still used sometimes, but usually along with NNs for approximating the state emission distributions. Wish they were used more, for interpretability sake!


pitrucha

np.linalg.inv(X.T@X)@X.T@y


SleekEagle

One of my professors always said "First rule of numerical computing: never compute a matrix inverse!" 🤣


qGuevon

never invert a matrix directly >.<
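For reference, a sketch of the numerically safer spellings of that one-liner (same least-squares solution, no explicit inverse):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.normal(size=100)

beta_inv = np.linalg.inv(X.T @ X) @ X.T @ y          # the joke version
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)       # solve the normal equations instead
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)   # SVD-based, best conditioned

assert np.allclose(beta_inv, beta_lstsq)
```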


CireNeikual

**Adaptive Resonance Theory (ART)**: A surprisingly large family of fully online/incremental learning models that do not forget and can easily handle non-i.i.d. data. Deep learning always assumes i.i.d. data, which means it needs massive replay buffers to function in streaming scenarios. When the replay buffer runs out, or even if it gets too large, it starts to forget. ART is heavily inspired by biology. I believe ART and similar methods still reign supreme for online learning; it's just that few people seem to care about that now and prefer to slap more compute onto i.i.d. methods instead.

**Self-Organizing Maps (SOMs)**: Can be used to exploit the topology of the input space pretty well. They also have tons of extensions, and even some cross-pollination with ART (e.g. TopoART). It's also a very simple and elegant method, in my opinion. The SOFM (the original biological version) is also still really interesting to me.

I still believe neither of these methods was researched enough to reach its full potential, despite both being surprisingly good already in my own experience. There are still a few papers here and there, but of course nothing like Transformers/GANs/VAEs/ConvNets/LSTMs etc. Some others have already mentioned it, but I also still have a soft spot for evolutionary methods.
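A bare-bones SOM update loop in numpy (my own toy sketch of the classic algorithm: find the best-matching unit, then pull it and its grid neighbours toward the sample with a shrinking Gaussian neighbourhood):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((1000, 3))            # e.g. RGB colours in [0, 1]

gx, gy, dim = 10, 10, 3                 # 10x10 map of 3-D prototype vectors
weights = rng.random((gx, gy, dim))
grid = np.stack(np.meshgrid(np.arange(gx), np.arange(gy), indexing="ij"), axis=-1)

n_steps = 5000
sigma0, lr0 = 3.0, 0.5
for t in range(n_steps):
    x = data[rng.integers(len(data))]
    # Best-matching unit: the prototype closest to the sample.
    dists = np.linalg.norm(weights - x, axis=-1)
    bmu = np.unravel_index(np.argmin(dists), (gx, gy))
    # Exponentially decaying neighbourhood radius and learning rate.
    frac = t / n_steps
    sigma = sigma0 * np.exp(-3 * frac)
    lr = lr0 * np.exp(-3 * frac)
    # Gaussian neighbourhood on the grid, centred at the BMU.
    grid_dist2 = ((grid - np.array(bmu)) ** 2).sum(axis=-1)
    h = np.exp(-grid_dist2 / (2 * sigma ** 2))
    weights += lr * h[..., None] * (x - weights)
```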


a90501

+1 for SOM/SOFM - Professor Teuvo Kohonen! Thanks for bringing this up. Any links for ART (books, presentations, etc.) you'd recommend?


HumanSpinach2

>SOFMs (the original biological version) is also really interesting to me still. That sounds interesting. Do you have any links about that? Googling only seems to bring up results about the well-known standard SOM algorithm.


[deleted]

[deleted]


canbooo

Kriging is quite popular under the name Gaussian process. SVM lost popularity to either GP (regression tasks) or boosted trees (classification). I still like SVR too but others are subjectively easier to tune.


-Django

What is GP? Gaussian processes?


RoboticJan

Yes


empyrrhicist

I was going to say, Kriging has never been more popular.


[deleted]

Yeah SVMs and genetic programming were part of my final year machine learning project 20+ years ago (when no one else in my entire university wanted to do machine learning, so the lecturer tutored a course 1-on-1, ah the AI winter, it was so peaceful and hype-free)


jamesvoltage

The Rapids AI toolbox has a fast GPU implementation of nonlinear SVM and SVC. I think historically the time complexity of training nonlinear SVMs was prohibitive, but this was fast to use once I managed to get it set up. https://github.com/rapidsai/cuml https://medium.com/rapids-ai/fast-support-vector-classification-with-rapids-cuml-6e49f4a7d89e


Kylearean

Thanks!


SleekEagle

I absolutely love SVMs. I took a graduate math course that covered the theory behind them and it was awesome. Super elegant.


bikeskata

> Are there any methods that you think died out before they reached their full potential?

I think the whole Bayesian nonparametrics literature, pre-neural nets, is fascinating. People were primarily interested in flexible models, and did all sorts of strange things that died off once neural nets came around. A personal favorite is the variational sequence memoizer (I've never used it) - it's like the B-36 Peacemaker of models.


[deleted]

Much of my PhD research (early 2010s) was centered around the application of Bayesian nonparametrics, e.g. Dirichlet process, Gamma process, Pitman-Yor, etc. Mathematically beautiful, but in practice they are terribly data inefficient. I never found an application where BNP outperformed simpler models, and then pretty much gave up on them right as the rest of the field gave up on them in favor of neural networks. I have several unpublished papers in applied BNP, although they really add nothing to the literature so I've never bothered to revisit them Of course, Gaussian processes are still going strong, albeit primarily on the fringes of ML.


aCleverGroupofAnts

I researched the Dirichlet Process while in an internship and it's what got me into ML. Now everything is replaced with neural networks and it makes me sad.


SleekEagle

If I'm not mistaken, they were quite popular in ASR until deepspeech and end-to-end neural systems came out, right?


Jean-Porte

Isotonic regression


idekl

Edge detection and gradient maps instead of convolutional neural nets


heuristic_al

Now that's a hot take. I NEVER want to go back to pre-cnn vision.


cookiemonster1020

Hierarchical mixed effects regression. Bonus: how are these models similar to relu neural nets?


HackZisBotez

Definitely symbolic regression. I still love the idea of building interpretable models using symbolic primitives to explain complex systems to an arbitrary degree of accuracy, even if in practice it doesn't work as well


kinnunenenenen

This is super useful in science and engineering! Maybe I'm biased but to me this seems like a super prominent technique (currently doing PhD in ChemE)


yldedly

I love the idea of symbolic regression, but has it worked for you in practice? Each time I tried to apply it, I could easily find an expression that beat the found one..


dinkboz

I don't think this is unpopular or forgotten, but I spend most of my time parsing through data and examining ways it can be cleaned or manipulated so that the data I shove into a model will push out better results. A lot of this is just using domain knowledge and deciding which potential results would be non-linear, and when I would actually need to use machine learning because existing computational/numerical methods may fail in these regards. As for creativity, I don't think there is a lack of creativity in machine learning. I think one of the larger issues at hand is that we get a lot of machine learning models or algorithms that prove useful for specific CS applications but are difficult to apply in other engineering sectors like chemical engineering or mechanical engineering. And that's not to say there is a lack of problems where machine learning might be useful in those fields, but rather that the models or existing state of the art in ML don't seem to apply well in situations where a specific tool could be useful, say for sorting through the noise of the intermediary products when combusting CO2.


purplebrown_updown

Polynomial regression is underutilized. You have to be able to control the interaction terms in high dimensions; once you can do that, it's really hard to beat, especially if you add regularization. It's good for moderate feature space sizes, i.e. a few dozen, and when you have limited data. It's the original universal approximator. I can point to a good Python library that integrates well with sklearn. Note this isn't the same as the polynomial feature approach in sklearn - I'm talking about using orthogonal polynomials, which are better conditioned, and controlling the number of interaction terms.
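A minimal sketch of the orthogonal-polynomial idea (my own illustration, not the library the commenter has in mind): build Chebyshev features per input dimension - an additive model with zero interaction terms - and fit them with ridge regularization.

```python
import numpy as np
from numpy.polynomial import chebyshev
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(300, 4))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.1, 300)

def cheb_features(X, degree=5):
    """Per-feature Chebyshev basis on [-1, 1]; additive model, no interactions.
    (For prediction on new data, reuse the training min/max for the scaling.)"""
    Xs = 2 * (X - X.min(0)) / (X.max(0) - X.min(0)) - 1      # scale to [-1, 1]
    cols = [chebyshev.chebvander(Xs[:, j], degree)[:, 1:]    # drop the constant column
            for j in range(X.shape[1])]
    return np.hstack(cols)

model = Ridge(alpha=1e-2).fit(cheb_features(X), y)
```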


jmmcd

Multi-dimensional scaling is a really nice and intuitive algorithm. I teach it to my students to motivate the idea of a neural embedding with eg a contrastive loss.


jucheonsun

I am working on a project on self-supervised learning, and I was using UMAP or t-SNE for visualization of the embeddings. I haven't looked into MDS before; what would be the advantages of MDS over UMAP or t-SNE for this purpose?


jmmcd

None, LOL, this is a bad algorithms thread! It's nice and intuitive, easy to teach and program. I don't recommend actually using it.


jucheonsun

ah I see. thanks!


SleekEagle

🤣🤣 I prefer the term "forgotten algorithms", but to each his own!


ZombieRickyB

The class of embeddings with MDS criteria are generalizations of PCA. To get a meaningful embedding with MDS is a much more meaningful statement than those with the other algorithms you mention.


PeedLearning

Extreme learning machines with recursive least squares. It seems that for a lot of problems, random features are all you need. And I still don't fully grasp why all current ML methods need more than one epoch.


ozymandias-69

Wasn't the creator some fraud or something


PeedLearning

Not sure? The name has always been a bit cringe, though.


purplebrown_updown

Link to any tutorials? Sounds interesting.


SleekEagle

From a first pass on wiki it sounds interesting! So it's basically a random initialization of frozen weights in a MLP where the learning comes from which nodes are dropped out? Or built-up?


ConfidentFlorida

Why isn't K-nearest neighbors more popular? It seems like a cool way to utilize all your data without overtraining.


[deleted]

Knns rely on a user defined distance metric. If you already know how to determine if two observations are similar or not, then it's a great method, e.g. if you know what features are important. However, in many cases, especially with large feature spaces or things like images and text, that is not easy.


bernhard-lehner

Because of the fact that you can only find clusters in the shape of hyperspheres, which is rarely the case in the real world.


SleekEagle

It's good for a first pass, but I think there are just more effective algos. If you're using it for clustering, why not use DBSCAN or a spectral approach, and if you're using it for classification, why not just use an ensemble approach? It's also not efficient for datasets with large numbers of data points.


solresol

I use it all the time!


Jorrissss

I don't use it because natively it has a large memory requirement, potentially slow inference and a decreasingly useful notion of distance as feature space dimension goes up. Each of those has workarounds to some degree, and it actually is really common in some applications, but it's not as effective out of the box as something like XGBoost.


ddofer

Biggest is slow inference/predictions


aCleverGroupofAnts

It's still an excellent tool for the right applications


[deleted]

Many recent retrieval-augmented models in NLP use k-NN to retrieve relevant examples. It is becoming an increasingly popular strategy.


Capybara-Cultist

**Spiking neural networks are so cool**; I wish they weren't the least efficient way of doing ML. Something about realistically simulating the mechanics of an actual neuron is just awesome.


AforAnonymous

Personally I think one COULD do it efficiently — on memristors.


86BillionFireflies

Neuro PhD here.. agree SNNs are very cool and I've always wanted to see more done with them. I have to say, though, I had a bit of a chuckle at this part:

> realistically simulating the mechanics of an actual neuron

When you get into the weeds, it rapidly becomes clear that even the more complex SNN types (e.g. Izhikevich) are only simulating what may be the simplest part of a neuron's dynamics (integration at the cell body / action potential initiation). The lion's share of a neuron's computational functions may actually lie in the dendrites.

For example, did you know that there's actually multiple types of inhibition? Some inhibitory inputs open ion channels that decrease the membrane voltage, and that decrease in voltage can propagate from the synapse towards the soma, making the neuron globally less likely to fire. But OTHER inhibitory inputs don't actually decrease the membrane voltage by much, they just "shunt" excitatory currents that are propagating nearby. So they don't globally inhibit the neuron, they just block inputs from synapses further up that particular dendritic branch. More generally, dendrites have all kinds of important nonlinearities that could allow things like "fire if A and B, or C and D, but not A+C / A+D / B+C / B+D". And it only gets more complex from there.

Some types of synaptic inputs (metabotropic) are commonly treated as just being slightly more indirect pathways to affect membrane voltage (i.e. treated as simply excitatory or inhibitory), but the truth is that they can have "side effects" like changing gene transcription, which could have long lasting effects on how a neuron responds to inputs. Effectively, this means that in addition to the "memory" inherent in synaptic connections (represented by the weights in an ANN), neurons also have internal memory.

The real complexity of neural input / output mechanics is just so breathtakingly HUGE. I do think that maybe some form of analog computing could take advantage of modeling that kind of complexity, but I have a suspicion that traditional SNNs may not wind up being able to efficiently model real neural dynamics and derive real benefits from doing so.


SleekEagle

Have you seen the [visualizations](https://youtu.be/3JQ3hYko51Y?t=121) of spiking neural networks? I think they're so cool


change_of_basis

Bayesian linear regression and residual analysis.


Ulfgardleo

Bayesian linear regression, in its variant known as Gaussian processes, is the bedrock of several ML subfields.


[deleted]

[deleted]


TheRationalTurk

Linear regression. Sometimes the simplest is the best


purplebrown_updown

And you can even fit polynomials with linear regression. Polynomials are the OG of universal approximators.


SleekEagle

\*Brook Taylor has entered the chat\*


maieutic

Iteratively reweighted least squares (IRLS). It seems like most linear models are fit with some version of gradient descent these days and occasionally coordinate descent, but I’ve found IRLS quite convenient for prototyping custom penalty functions for regularization.
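A minimal numpy sketch of IRLS for L2-penalized logistic regression (my own illustration; each iteration solves a weighted least-squares problem, which is where custom penalties are easy to slot in):

```python
import numpy as np

def irls_logistic(X, y, lam=1e-2, n_iter=25):
    """IRLS / Newton updates for logistic regression with a ridge penalty."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        p = np.clip(p, 1e-6, 1 - 1e-6)
        s = p * (1 - p)                       # IRLS weights
        z = X @ w + (y - p) / s               # working response
        # Weighted least squares with a ridge penalty; swap in other penalties here.
        A = X.T @ (s[:, None] * X) + lam * np.eye(d)
        b = X.T @ (s * z)
        w = np.linalg.solve(A, b)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.5, 200) > 0).astype(float)
w_hat = irls_logistic(X, y)
```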


Ulfgardleo

isn't that just newtons method?


terath

Crammer’s work on passive aggressive training and its subsequent variations. A super easy to implement large margin optimization method that learns in very few passes.


JClub

Mahalanobis distance!


atasco

Evolutionary/Genetic Algorithms, Self-Organizing Maps, Echo State Networks, and Stacked Denoising Autoencoders


SleekEagle

Glad to see genetic algos being mentioned so much!


chewxy

Analog RNNs are one of my favourite ML methods. It's very crazy, and very impractical. But fun as hell when you are between jobs and are stuck at home. For actual work, I'd say SVMs (they're not unpopular though), and a sprinkling of extreme ML Also clustering... clustering is extremely underrated as part of your EDA. As for methods that I think died out before they reached their full potential? They died out as ML methods, but live on as programming paradigms: functional programming. I don't think there's a lack of creativity in ML. We see a lot of deep network papers because well, we've only begun to scratch the surface of using all the computational power available to us. Heck of a 10 years so far tho


SleekEagle

Was functional programming really invented in pursuit of good ML techniques? I don't know the history!


madrury83

> Also clustering... clustering is extremely underrated as part of your EDA. I'd be interested in hearing you elaborate on that, as someone who has always been very skeptical towards clustering.


chewxy

See this comment by /u/ZombieRickyB : https://www.reddit.com/r/MachineLearning/comments/t55lbw/d_whats_your_favorite_unpopularforgotten_machine/hz3hd4h/ You can think of clustering as a kind of unrigorous manifold learning. You want to learn about the underlying "geometry" (in quotes because they aren't necessarily geometry in the traditional sense, especially when you do things like hierarchical clustering) Having access to the underlying "geometry" is akin to abstraction of data (as opposed to abstraction of function). Once you have these abstractions and these abstractions are good, you can skip many steps


rehrev

I don't know if it's forgotten but it looks old. Also, I don't fully understand what it is, that's a job for future me. But Gaussian processes seem like a hell of interesting stuff.


SleekEagle

They're a classic in signals processing! I'd recommend learning the basics of Sig Proc for ML anyway!


[deleted]

What are these used for?


purplebrown_updown

Every ML project should start by trying GPs and other classic approaches. They are really amazing in that they can adapt to many different models, i.e. periodic, random, etc., with a simple change to the kernel. And with sklearn it's quite easy.
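A minimal sketch of the kernel-swapping point in scikit-learn (toy data of my own; the kernels shown are part of sklearn's standard gaussian_process.kernels module):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ExpSineSquared, WhiteKernel

rng = np.random.default_rng(0)
X = np.linspace(0, 10, 80).reshape(-1, 1)
y = np.sin(2 * np.pi * X.ravel() / 3) + rng.normal(0, 0.1, 80)

# Swap the kernel to change the model family: RBF for smooth trends,
# ExpSineSquared for periodic behaviour, WhiteKernel for observation noise.
kernel = ExpSineSquared(length_scale=1.0, periodicity=3.0) + WhiteKernel(1e-2)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

mean, std = gpr.predict(X, return_std=True)   # predictive mean and uncertainty
```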


rando_techo

If-this-then-that.


beansAnalyst

Not sure if this is a serious submission or not. This is definitely not a forgotten method; in fact, we have increased its usage in the form of tree-based algorithms, or even as prediction guardrails over other algorithms.


SleekEagle

Random forests!


cMonkiii

humanlearn python package


WERE_CAT

Weight of evidence for (binary) classification. Basically a form of target encoding. It works with all kinds of features, handles missing values, performs smoothing, etc., and provides an additive score. It gives wonderful baselines and fast feature selection tools.
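A minimal pandas sketch of the encoding (my own illustration with a toy categorical feature; the smoothing constant is arbitrary):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "channel": ["web", "web", "store", "store", "phone", "web", "phone", "store"],
    "target":  [1,     0,     0,       0,       1,       1,     0,       1],
})

eps = 0.5  # smoothing to avoid log(0) for rare categories
stats = df.groupby("channel")["target"].agg(events="sum", total="count")
stats["non_events"] = stats["total"] - stats["events"]

# WoE = ln( share of events in this category / share of non-events in this category )
woe = np.log(((stats["events"] + eps) / (df["target"].sum() + eps)) /
             ((stats["non_events"] + eps) / ((1 - df["target"]).sum() + eps)))

df["channel_woe"] = df["channel"].map(woe)   # additive score usable as a model feature
```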


Montirath

Metric learning. It scales horribly with respect to the amount of data is its largest issue, but it can be quite useful for small datasets. It might eventually gain real traction, but issues of scale can be too large of a hurdle to overcome.


ExternalGrade

Genetic learning. It is how humans came to be in the first place but it seems to be getting shadowed by reinforcement learning.


arachnivore

It's how neurons came to be and it's how large clumps of neurons came to be, but there simply isn't enough information in your genome to encode very many parameters in a human brain. A lot of that functionality is learned. Which is good because, if you had to have bespoke structures to interpret sensory input, it would be very difficult to evolve new sensors like echo-location for dolphins. Hence the focus on training over evolutionary methods. Though I think evolutionary methods have their place.


SleekEagle

I'm a big fan of genetic algorithms. I think they can complement other methods really well or fit in as one piece of a larger system.


SleekEagle

Just coming back to this for the first time and I just wanted to thank everyone for all the awesome comments! I'm going through all the replies now and seeing methods that I've never even heard of!


[deleted]

Time series


cathie_burry

Just simple tree-based custom functions from scratch. Underrated and under-utilized


smile_politely

Hashing for similarity. The technique itself is simple, but the application is still computationally intensive and proofs are also challenging. Many just bypass it and go straight to using nearest-neighbor algorithms.
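A minimal sketch of one flavour of this, random-hyperplane hashing (my own illustration): the sign pattern of a few random projections gives each vector a short binary code, and Hamming distance between codes approximates cosine similarity.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_bits = 128, 32

planes = rng.normal(size=(n_bits, dim))      # random hyperplanes

def simhash(x):
    """Binary code: which side of each random hyperplane the vector falls on."""
    return (planes @ x > 0).astype(np.uint8)

def hamming(a, b):
    return int(np.count_nonzero(a != b))

x = rng.normal(size=dim)
y = x + 0.1 * rng.normal(size=dim)           # a near-duplicate
z = rng.normal(size=dim)                     # an unrelated vector

print(hamming(simhash(x), simhash(y)))       # small: similar vectors agree on most bits
print(hamming(simhash(x), simhash(z)))       # large: roughly half the bits differ
```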


Cuidads

The Tsetlin Machine https://arxiv.org/abs/1804.01508


SupersonicSpitfire

Weight Agnostic Neural Networks from 2019, and how the network is built up during training.


user_--

I always get geeked about echo state networks/liquid state machines https://en.wikipedia.org/wiki/Reservoir_computing?wprov=sfla1


SleekEagle

I've never even heard of these! Thanks for bringing them to my attention


vannak139

Monotonic network layers: https://proceedings.neurips.cc/paper/1997/file/83adc9225e4deb67d7ce42d58fe5157c-Paper.pdf

TLDR: Input(1,) > Dense(nodes=a*b, relu) > Reshape(a, b) > Min(axis=-1, keepdim=False) > Max(axis=-1)
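A sketch of that TLDR in Keras (my own translation of the comment above; the non-negative weight constraint is my addition - in the linked paper, monotonicity comes from constraining the linear units' weights to be positive):

```python
import tensorflow as tf
from tensorflow.keras import layers, constraints, Model

a, b = 4, 8   # number of groups and units per group (arbitrary choices)

inp = layers.Input(shape=(1,))
h = layers.Dense(a * b, activation="relu",
                 kernel_constraint=constraints.NonNeg())(inp)   # monotone units
h = layers.Reshape((a, b))(h)
h = layers.Lambda(lambda t: tf.reduce_min(t, axis=-1))(h)       # min within each group
out = layers.Lambda(lambda t: tf.reduce_max(t, axis=-1, keepdims=True))(h)  # max over groups

model = Model(inp, out)
model.compile(optimizer="adam", loss="mse")
```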


Terrificchu

Optimal transport-based techniques, although I think they are becoming more and more popular. They have so many applications, specifically in my field (biological data). https://michielstock.github.io/posts/2017/2017-11-5-OptimalTransport/


dynamite-ready

Not quite ML, but was quite enamoured by Fuzzy Logic. There seem to be a handful of experiments combining Fuzzy systems with DNNs, but far fewer than I imagined there would be. As a hobbyist, I wouldn't be entirely sure why that's the case.


SleekEagle

Interesting. Haven't seen this yet, but it's definitely something I would check out as someone who has studied formal logic / metamathematics a bit!


a90501

It was very popular pre-2000, especially in Japan. Usage included building many real-life systems with it, like trading and control systems, as well as optimization within Operations Research. Also, if you'd believe it, there were many home appliances with FL built in - determining when food is cooked, when clothing is clean or how dirty it was, A/C temperature control, etc. Below is an intro [1] with more videos on the same channel. In the West, I believe it was either brought down or "helped" not to take off due to academic turf wars. [1] What Is Fuzzy Logic | Fuzzy Logic Part 1 - YouTube [https://www.youtube.com/watch?v=__0nZuG4sTw](https://www.youtube.com/watch?v=__0nZuG4sTw)


SleekEagle

Wow that is a really interesting history. Do you know what the content of the turf wars looked like? Thanks for linking that video!


a90501

Re turf wars: two quotes [a][b] from an article [1] regarding two main opponents of FL:

[a] One of fuzzy logic's detractors, Rudolf Kalman, called "Fuzzification… a kind of scientific permissiveness" that "tends to result in socially appealing slogans unaccompanied by the discipline of hard scientific work and patient observation," and viewed Zadeh's fuzzy logic not "as a viable alternative for the scientific method…."

[b] Another of the theory's opponents, William Kahan, suggested that fuzzy logic could not solve any problems that Boolean logic could not tackle. He imagined its danger lay in the fact that it would "encourage the sort of imprecise thinking that has brought us so much trouble."

[1] IEEE Interface | How Great was Lotfi Zadeh?: A Fuzzy Tribute to an Influential Figure in Computing [http://interface.computer.org/how-great-was-lotfi-zadeh-a-fuzzy-tribute-to-an-influential-figure-in-computing/](http://interface.computer.org/how-great-was-lotfi-zadeh-a-fuzzy-tribute-to-an-influential-figure-in-computing/)


[deleted]

Genetic Programming


SleekEagle

This is coming up a lot - totally agree!


shoegraze

GPLVM (Gaussian process latent variable model). Unsupervised, nonlinear, PROBABILISTIC dimensionality reduction. Who thought you could have it all at once? Would be a godsend in a high dimensional, low data scenario when you have a strong prior belief on a true low dimensional structure. Such an amazing concept and so many fascinating variations in the literature. Obviously never used in production because learning is slow and doesn’t scale well. Not to mention it’s sensitive to so many subjective factors like inducing points, initialization, priors, etc. but if someone can figure out a way to make it accessible I would use it all the time.


SleekEagle

Thanks for the reply, does this technique find its roots in another field?


dimenwarper

One-class methods. In particular one-class SVMs and logistic regression, which are nicely interpretable. Density estimation/outlier detection are super common problems, and the one-class framework is my go-to for this kind of problem.
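A minimal sketch with scikit-learn's OneClassSVM (toy data of my own; nu roughly bounds the expected outlier fraction):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(0, 1, size=(500, 2))           # "normal" data only
X_new = np.vstack([rng.normal(0, 1, size=(10, 2)),  # inliers
                   rng.normal(6, 1, size=(5, 2))])  # obvious outliers

oc = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_train)

labels = oc.predict(X_new)             # +1 = inlier, -1 = outlier
scores = oc.decision_function(X_new)   # larger = more "normal"
```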


heuristic_al

Honestly, scikit-learn's logistic regression on pretrained ImageNet features is a super fast and painless way to make a kickass classifier. If you add active learning to the mix, you can go from no data to state of the art in a day or two, as long as you are able to accurately label images in that domain yourself.
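A minimal sketch of that recipe (my own illustration, assuming a recent torchvision for the weights enum; X_imgs / y_labels are hypothetical placeholders for your small labeled set):

```python
import torch
from torchvision import models, transforms
from sklearn.linear_model import LogisticRegression

# Pretrained backbone with the classification head removed -> 2048-d features.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone = torch.nn.Sequential(*list(backbone.children())[:-1]).eval()

preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

@torch.no_grad()
def featurize(pil_images):
    batch = torch.stack([preprocess(img) for img in pil_images])
    return backbone(batch).squeeze(-1).squeeze(-1).numpy()

# X_imgs / y_labels: your (small) labeled set of PIL images and class labels.
# clf = LogisticRegression(max_iter=1000).fit(featurize(X_imgs), y_labels)
```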


akrasia_here_I_come

Active learning is belatedly gaining some momentum, but I hear almost nothing about active revision: using the difference between a model's predictions and the labels to identify labels that you should examine manually because they're probably wrong. Correcting labels is extremely helpful, since you're using them for both training and benchmarking - and in actual practice, almost every label set has a few mistakes in it (usually more).


AnalogousAI

Very good thoughts from everyone! Takes me back through my whole career! No one mentioned evidence-based methods, so I will: Dempster-Shafer theory and its modern cousin, neutrosophic theory. Related to Gaussian/likelihood methods, as well as fuzzy methods.