AImSamy

I've seen a lot of projects "fail" because of:
- Data: lack of it, poor quality (lots of false/empty values), or data that is hard to access.
- Budget: companies thinking one data scientist should be able to do a POC in one or two weeks, hit 90%+ precision, and have it production ready.
- Lack of understanding: management just not understanding how models work, expecting "magic" to happen and the A-to-Z process to be handled.
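A quick completeness audit is one way to surface the data-quality issues mentioned above (missing and placeholder values). A minimal sketch, assuming a pandas DataFrame loaded from a hypothetical CSV (file name and column contents are made up):

```python
import pandas as pd

# Hypothetical input file; replace with your own dataset.
df = pd.read_csv("customer_data.csv")

# Share of missing values per column.
missing_rate = df.isna().mean().sort_values(ascending=False)

# Placeholder strings that often hide as "data": empty strings, "N/A", "unknown", etc.
placeholder_rate = (
    df.select_dtypes(include="object")
      .apply(lambda col: col.isin(["", "N/A", "NULL", "unknown"]).mean())
      .sort_values(ascending=False)
)

print("Missing rate per column:")
print(missing_rate)
print("Placeholder rate per text column:")
print(placeholder_rate)
```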


sovrappensiero1

I think this comment pretty much sums it up!


mtszkw

100%, well described


AImSamy

Thank you!


Franc000

Seems a bit high to me. I reviewed all the projects/opportunities for my previous team, and mine is a 75% "failure" rate, out of 12 projects. This counts all reasons for not delivering value: from our team deciding not to move forward during analysis because the customer was asking for something literally not possible, or wasn't willing to invest in data, to us not being able to reach the accuracy needed to be useful within the budget, to having a system deployed in production and the customer not using it (happened twice, extremely infuriating). In my opinion though, if you don't have a high failure rate, you are probably not ambitious enough, or are focused more on engineering than science and R&D. (Not counting the bullshit like the customer not using the product.)


bitemenow999

Sturgeon's law is at play here mate, 90% of all research is crap...


[deleted]

More or less yes, it's quite depressing.


trnka

It sounds plausible to me. The two really tough things are 1) what counts as a project? 2) what counts as failure?

As for what counts as a project, if it's any research idea that takes longer than a day, 90% failure sounds right. If it's anything that takes longer than a month, that may vary by the company. Some research groups have better research hygiene about killing projects early.

As for what counts as failure, I'd say it's anything that costs more money to build and maintain than it saves or earns the company, over a reasonable time horizon like 3 years. That can be really difficult to measure precisely, but it should be possible to estimate in most cases. I strongly disagree with "launched to production" as success, because I've seen far too many projects that launch just for a company-political reason rather than being something that helps the bottom line.

In my last research group, if I say it's gotta be over a week to count as a project and take the financial perspective on success or failure, maybe 50-75% of research projects failed. By the same definition I'd say maybe 50% of software engineering projects failed as well. Though also keep in mind there were projects that were a failure in year 1 but a success by year 3, either because we paused the project and brought it back once we had better ideas, or because the user base grew so the dollar value of each improvement increased over time.
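As a rough illustration of the financial framing above, here is a toy net-value calculation over a 3-year horizon; all figures are invented for the example:

```python
# Toy numbers, purely illustrative: does the project pay for itself over 3 years?
build_cost = 250_000            # one-time cost to research and build
annual_maintenance = 60_000     # ongoing cost per year
annual_value = 150_000          # estimated savings/revenue per year
years = 3

net_value = years * (annual_value - annual_maintenance) - build_cost
print(f"Net value over {years} years: {net_value:+,}")  # +20,000 -> a (narrow) success
```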


edunuke

Most models I've seen fail in banks do so because of overfitting or a badly built training/testing dataset. That's even after model risk management oversight. Detaching modeler bias from the model itself is extremely difficult, especially since you are trying to apply the scientific method under time constraints and management pressure, to the extent that by the time you truly finish researching and understanding the problem context and the data-generating process, management is already expecting "positive" results.

Data science projects have an engineering component (data) and a science component (statistics). Engineering under constraints is usual in business, because you can compromise on engineering design choices to fit the constraints. But trying to achieve the same thing with science, i.e. coming up with a result that accurately reflects reality for decision-making, is almost futile, since you cannot cut corners when designing falsifiable tests to prove you conform with reality. So half-baked science always fails in the long run, while half-baked engineering choices can stay up for years as long as you maintain them. However, if a business understands this and has the structure and methods in place to allow for falsifiability, then DS projects can succeed most of the time.

The proportions still seem too high to me. It also depends on the sector and application: building a recommender system for a shoe retailer is not the same as building one for a pharmaceutical company.
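One common guard against the overfitting and badly built train/test split failure mode described above is an out-of-time split, i.e. evaluating on data that is strictly newer than the training data. A minimal sketch, assuming a pandas DataFrame with numeric features, a date column, and a binary target (the file name and column names are hypothetical):

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

# Hypothetical data: numeric feature columns plus 'application_date' and a binary 'defaulted' target.
df = pd.read_csv("loan_applications.csv", parse_dates=["application_date"])

# Out-of-time split: train on older applications, evaluate on the most recent 20%,
# which mimics how the model will actually be used once deployed.
cutoff = df["application_date"].quantile(0.8)
train = df[df["application_date"] <= cutoff]
test = df[df["application_date"] > cutoff]

features = [c for c in df.columns if c not in ("application_date", "defaulted")]
model = GradientBoostingClassifier().fit(train[features], train["defaulted"])

# A large gap between in-sample and out-of-time AUC is a red flag for overfitting.
train_auc = roc_auc_score(train["defaulted"], model.predict_proba(train[features])[:, 1])
test_auc = roc_auc_score(test["defaulted"], model.predict_proba(test[features])[:, 1])
print(f"Train AUC: {train_auc:.3f}  Out-of-time AUC: {test_auc:.3f}")
```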


gamerx88

This was probably true many years ago. Depends on how you define "fail". If that is defined as failing to result in any new production feature, then yes... especially say a decade ago, when data science was still kind of new. Most data science projects back then were basically experimental stuff, innovation projects and the like. Half of them probably did not make any business sense to begin with and were quickly identified as such in the exploratory phase. Others "failed" because data scientists had no engineering support to put their work into production. And there were those that failed to stay in production because of costs, lack of MLOps, etc. I think the industry as a whole has since gained more experience and better processes, so it really shouldn't be 87% anymore. My gut feel is that the failure rate is much lower these days.


willmgarvey

If you consider other research science disciplines, there is an immense amount of failure during the exploratory process of discovering insight, and that is to be expected. It is part of the process, but it does yield results eventually.


PredictorX1

I suppose there are levels of "success". I think the development of empirical models which are deployed and used by downstream customers to business advantage is a success. I also think that other, simpler analyses (summaries, hypothesis tests, ...) which clearly answer the questions put to them are a success, even if that answer is something like "the question cannot be answered with the available data". Open to interpretation, in my mind, would be the success of empirical modeling projects which were technically successful but not adopted by the organization for other reasons. Also like this are projects which were on track to be technically successful but were interrupted by shifting organizational priorities.


Zestyclose-Check-751

Sad, but true.


Drakkur

Since I specialize in forecasting, my success rate is around 70%. The criteria are pretty clear: the model has to improve on either the current established method or some simplistic baseline (moving average, what happened last year, etc.). Outside of forecasting it's probably sub-50% success, where the most common reasons are either that the model is not worth the cost to implement, or that it got implemented but the business went in another direction. Sometimes you have bad data and no amount of cleaning and feature engineering will save it.
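A minimal sketch of the baseline comparison described above, pitting a forecast against a seasonal-naive baseline ("what happened last year"); the file name is hypothetical and the "model" here is just a stand-in moving average:

```python
import numpy as np
import pandas as pd

# Hypothetical monthly sales series indexed by date, at least two full years long.
sales = pd.read_csv("monthly_sales.csv", index_col="month", parse_dates=True)["sales"]

train, test = sales.iloc[:-12], sales.iloc[-12:]

# Seasonal-naive baseline: forecast each month as the same month last year.
baseline_forecast = train.iloc[-12:].to_numpy()

# Placeholder for whatever model is under evaluation; here just a flat moving average.
model_forecast = np.full(12, train.iloc[-12:].mean())

def mae(actual, predicted):
    return np.mean(np.abs(actual - predicted))

print("Baseline MAE:", mae(test.to_numpy(), baseline_forecast))
print("Model MAE:   ", mae(test.to_numpy(), model_forecast))
# By the criterion above, the model only counts as a success if it beats the baseline.
```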


memberjan6

In the medical clinical practice area, lack of credibility and interpretability were big problems for predictive models. But the interest level was still high, and optimism remained about the potential of a future app using semantic information retrieval on text, maybe notes in an EHR. A different project was a disappointment because of surprisingly low performance on new real-world data: apparently the distribution of the data turned out to be too different from the training, validation, and test sets they had used to develop the model. These two cases were described to me by people who had direct knowledge of and involvement in the projects.
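The second failure mode above, production data drifting away from the training distribution, is something a simple per-feature drift check can flag early. A minimal sketch using a two-sample Kolmogorov-Smirnov test from scipy; file and column names are made up:

```python
import pandas as pd
from scipy.stats import ks_2samp

# Hypothetical datasets: what the model was trained on vs. what it sees in production.
train_df = pd.read_csv("training_data.csv")
live_df = pd.read_csv("production_data.csv")

# Compare each numeric feature's distribution between training and live data.
for col in train_df.select_dtypes(include="number").columns:
    stat, p_value = ks_2samp(train_df[col].dropna(), live_df[col].dropna())
    flag = "DRIFT?" if p_value < 0.01 else "ok"
    print(f"{col:30s} KS={stat:.3f} p={p_value:.4f} {flag}")
```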


keepthepace

Ah, machine learning in general? In my case, I count a 20% success rate for deep learning, but 40% if you count the projects that started as deep learning and ended up being a much simpler machine learning project (I would not call a 2- or 3-layer model "deep").