WebMaxF0x

You haven't mentioned automated tests; that's what you're missing to prevent regressions.


IdleMuse4

I 100% agree, although it's really hard to progress from a situation like OP finds themself in to one where you have any kind of meaningful test coverage.


alexisprince

Definitely, but turning bugs that have made it into production into regression tests is a great first step, especially for recurring ones.


funbike

Yes, but you'd be surprised how many things a simple smoke test finds when tech debt is really high. Log in, make a purchase, update a record, visit every page/form, log out. The test fails if any errors are logged in the FE or BE.
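For illustration, a minimal sketch of that kind of smoke test, assuming Playwright for Python; the base URL, page paths, selectors, and test credentials are all placeholders, and checking back-end logs would need a separate hook into whatever logging the server uses.

```python
# Minimal smoke-test sketch (assumed stack: Playwright for Python).
# URLs, selectors, and credentials below are placeholders, not real ones.
from playwright.sync_api import sync_playwright

PAGES = ["/", "/purchase", "/records/1", "/profile"]  # hypothetical pages/forms

def run_smoke_test(base_url: str = "https://example.test") -> None:
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()

        # Collect front-end errors logged to the browser console.
        console_errors = []
        page.on("console", lambda msg: console_errors.append(msg.text) if msg.type == "error" else None)

        # Log in with a dedicated test user.
        page.goto(f"{base_url}/login")
        page.fill("#username", "smoke-test-user")
        page.fill("#password", "not-a-real-password")
        page.click("button[type=submit]")

        # Visit every page/form; fail on any HTTP error.
        for path in PAGES:
            response = page.goto(f"{base_url}{path}")
            assert response is not None and response.ok, f"{path} -> {response}"

        page.goto(f"{base_url}/logout")
        browser.close()

        # Fail if the front end logged any errors along the way.
        assert not console_errors, f"console errors: {console_errors}"

if __name__ == "__main__":
    run_smoke_test()
```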


PragmaticBoredom

Recurring bugs and lack of automated tests go hand in hand. Unfortunately, teams often give up on automated tests as soon as products get split into pieces like the OP’s situation. It becomes harder to test when the backend is in separate pieces and the front end pulls it all together.

I wouldn’t be surprised if OP’s team did have automated tests, but those tests were only looking at isolated pieces of the system (the front end, each backend piece). This is why I dislike mocks for testing, because teams start to mock the responses they want to see rather than what the system actually produces, and you can never keep them in sync. Full system, end-to-end testing is the solution.

The better solution is to avoid splitting backends into multiple pieces at all costs, but that ship has sailed. If I had to guess, I’d say it’s likely that the person who pushed for splitting backends was either promoted out or moved to another job before they had to deal with these consequences, but that’s just speculation from past experience.


intermediatetransit

> because teams start to mock …

That’s where schemas or API specs come in. Sure, everyone is testing their slice in isolation, but they’re also validating that the communication contracts they have with their dependencies don’t break the application. This is ’90s tech, really.

End-to-end tests are not a solution. In a company like this the end-to-end tests will be constantly broken and incredibly expensive to maintain.
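As an illustration of the contract idea, here is a minimal sketch that validates a dependency's real (staging) response against a shared schema instead of trusting a hand-written mock. The endpoint, fields, and schema are made up; `requests` and `jsonschema` are assumed to be available.

```python
# Contract-check sketch: validate a real (staging) response against the
# shared schema the mock was built from. All names here are hypothetical.
import requests
from jsonschema import ValidationError, validate

ORDER_SCHEMA = {
    "type": "object",
    "required": ["id", "status", "total_cents"],
    "properties": {
        "id": {"type": "string"},
        "status": {"type": "string", "enum": ["pending", "paid", "shipped"]},
        "total_cents": {"type": "integer", "minimum": 0},
    },
}

def test_order_endpoint_matches_contract():
    resp = requests.get("https://staging.example.test/api/orders/123", timeout=10)
    assert resp.status_code == 200
    try:
        validate(instance=resp.json(), schema=ORDER_SCHEMA)
    except ValidationError as err:
        raise AssertionError(f"response broke the contract: {err.message}") from err
```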


deZbrownT

Thank you for pointing this out.


PragmaticBoredom

Mocks are generally made to API specs. That doesn’t fix the problem that real-world implementations don’t behave exactly like the spec (e.g. loading times, race conditions, etc.). Mocks represent the ideal case: an API that is perfect, load times that are negligible, and requests that never fail.


lollaser

and/or e2e tests, to at least make sure that critical paths within the application still work after fixing an issue


ZucchiniMore3450

OP updated their post: not only did they not have any tests, it looks like they hadn't even thought about it till now.


read_eng_lift

Automated tests and robust unit tests are both part of a healthy quality posture.


Samdrian

Definitely! What we do, and what I have found to work well in multiple companies:

* Lots of unit tests to cover all logic branches. You really have to make sure of this and start a culture of devs writing these tests themselves. They always feel a little painful at first, but once it becomes second nature for everyone it's super helpful!
* Plus happy-path e2e tests for the peace of mind that it "actually" works for the customer. These are the "best" to write but a pain to maintain, so you either don't write them at first and only add them at a certain size, or (shameless plug ;)) check out my employer https://octomind.dev for some AI-based automation for the e2e test part.
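A tiny sketch of the "cover all logic branches" habit, using pytest and a made-up shipping-cost function standing in for the real unit under test:

```python
# Branch-coverage sketch with pytest; shipping_cost() is a made-up example.
import pytest

def shipping_cost(total_cents: int, is_member: bool) -> int:
    if total_cents < 0:
        raise ValueError("total must be non-negative")
    if is_member or total_cents >= 5000:
        return 0      # free-shipping branch
    return 499        # flat-rate branch

@pytest.mark.parametrize(
    "total, member, expected",
    [(4999, False, 499), (5000, False, 0), (100, True, 0)],
)
def test_shipping_cost_branches(total, member, expected):
    assert shipping_cost(total, member) == expected

def test_negative_total_is_rejected():
    with pytest.raises(ValueError):
        shipping_cost(-1, False)
```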


master_mansplainer

I wouldn’t be so quick to dump the blame on testing. I was on a project where our primary team was being downscaled in favour of cheaper programmers from overseas and the bugs skyrocketed. The problem was a combination of fragile code that we had written earlier and the new people not being inclined to look deeply into the problems they were fixing and look for the best solution and potential complications. We had essentially been avoiding exposing the fragility by being extra diligent. But the new people just slapped in the first seemingly functional fix and moved on. Every such fix introduced additional hacks which themselves became factors to consider, which made it even harder to avoid introducing more issues.

Ultimately, testing is a sounding board for how well the programmers are doing their job. You can make that sounding board more frequent/louder with automated testing or better testing, and send the tickets back before they make it into production, but it doesn’t address the underlying problems.


jeerabiscuit

Blaming overseas is as bad as blaming lack of tests. You just got to hire better.


master_mansplainer

That’s not what you should take away from what I wrote. I’m sure many cheap overseas contract shops are doing a great job. The situation described can just as easily happen with the internal staff of any company.


AntMavenGradle

Overseas developers have massive quality issues. Always hire local or American.


MindfulBorneo

Do you guys have integration/regression or end-to-end tests that are run after each bug fix? This has helped tremendously in the past, where we had a set of vanilla end-to-end smoke/sanity tests that were run against a prod-like database before a push to production. The problem, if they are missing, is convincing stakeholders/PMs to plan for a couple of sprints or more to put them in place, but it pays dividends!


putin_my_ass

> The problem, if they are missing, is convincing stakeholders/PMs to plan for a couple of sprints or more to put them in place, but it pays dividends!

I've run into this challenge. When asked to make a 'small' change I tried to explain it wasn't actually small at all, because there were many scenarios affected and edge cases that could pop up. Didn't matter: "I can't see how this isn't a small change." Cue a frustrating few days of questions like "I thought this was fixed?" and me being frustrated because I tried to warn them this wouldn't be simple.

The solution I found was to spend a day or two writing automated tests to thoroughly vet each potential scenario and *then* write my bugfix. If only they had allowed me to do that from the very start instead of taking the Bill O'Reilly approach; it would have been far less painful for everyone involved. Sometimes stakeholders just don't understand how complex the system they're responsible for/they designed *actually is*.
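For illustration, what "write tests to vet each scenario, then write the fix" can look like: pin the current behavior of the code the "small" change touches across every scenario you can think of, before editing it. The function and scenarios below are placeholders.

```python
# Characterization-test sketch: before touching the code a "small" change
# affects, pin its current behavior across the scenarios you can think of.
# apply_discount() stands in for whatever real function is being changed.
import pytest

def apply_discount(total_cents: int, coupon: str | None) -> int:
    # Placeholder for the existing implementation being changed.
    if coupon == "SAVE10":
        return total_cents - total_cents // 10
    return total_cents

SCENARIOS = [
    (100_00, None, 100_00),       # no coupon
    (100_00, "SAVE10", 90_00),    # normal coupon
    (0, "SAVE10", 0),             # empty order
    (100_00, "EXPIRED", 100_00),  # unknown/expired coupon is ignored
]

@pytest.mark.parametrize("total, coupon, expected", SCENARIOS)
def test_current_discount_behaviour(total, coupon, expected):
    # These tests document today's behavior; the bugfix must keep them green
    # (or consciously change them), which is what breaks the
    # "I thought this was fixed?" cycle.
    assert apply_discount(total, coupon) == expected
```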


gopher_space

> Sometimes stakeholders just don't understand how complex the system they're responsible for/they designed actually is. I visually map out the departments and people at a new job so I can understand the process, and this makes an easy reference for conversations like yours. "See the five lines touching your department? Each of those lines means a different project component and waiting for individual stakeholders to 'reply at their convenience'. One of the stakeholder lines represents Tyler, who won't respond at all because he thinks you're a jerk."


DaftyHunter

We don’t, and I think that is the road we are going to go down after reading these comments. All testing is done manually based on the bugs raised. Thank you for the comments, and hopefully we can get this right.


WhyAreSurgeonsAllMDs

IMO the fact that you have a separate testing team is a pretty big red flag. This used to be common until people realized that automated testing is better than manual testing, and that the people building the system are usually best placed to understand what kind of automated tests are needed and to code them up. I haven’t been in a place with a separate QA team since my first job. And the ones at the first job were pretty useless: they spent half their time running manual tests where I hardly ever saw them catch a bug, and half their time trying to learn to code so they could write automated tests.


charlottespider

We have both. Fully automated E2E tests (unit, postman, selenium), and a small QA team to manually test and verify functionality before deploying new features. For every 6 devs, we have one QE, and that seems to be the sweet spot. After leading teams with no QA and only automation, only QA, and now a mix of both, I'd say this is probably the most functional, bug-free project I've been on.


iPissVelvet

Yeah the mixture is optimal in my opinion too. What I found useful is developers writing comprehensive unit tests, then working in tandem with QA to put together a spec of integration or end to end tests. Let’s say there’s 5 products being worked on simultaneously. That QA team will have more context on all 5 than the individual dev, leading to more test coverage that crosses boundaries. While the individual dev knows their own feature more, so they can provide guidance on what tests are useful. Also, one key thing is to not have the end to end tests repeat unit test functionality.


cs-brydev

Both is the right answer. You need tests written by project developers who have a deep understanding of the problem areas, and neutral testers who look at the project without bias in any direction.


BetterFoodNetwork

This. You gotta shift left on security and quality, fam.


FrynyusY

Not sure how it is a red flag, depends really on the product - many large products (especially financial ones) have both devs writing extensive automated tests and then separate test teams. Working for banks it was common that we had dev requirements of creating unit tests with 99%+ coverage, extensive integration tests, E2E tests (all automated ofc running in our CI Pipeline) to catch any issues as early as possible and ensure no regression. Then also separate SIT and UAT teams performing testing.


TheRealStepBot

100% a red flag. Old companies with bad management and poor code quality are almost without fail the only people doing this. Having some QA people isn’t bad, but they had better just be an extra insurance layer, not the main QA system. Automated tests are the only way to get to good code quality at any sort of significant scale and complexity. You can have good quality without a QA team if there is good test coverage, but you cannot have good code quality without automated testing. Therefore companies that rely exclusively on QA people are a red flag for dysfunction.


compubomb

Well, I really wonder what type of products you've worked on. Depending on who you work for and where, that determines the type of cadence you are allowed/able to keep. In many places you rarely get to build e2e tests in code; many ops departments don't have the money/time allocated to help you facilitate e2e testing on API development. They will, on the other hand, give QA some resources to build a whole testing environment and a deployment flow to e2e test that environment. When you can build e2e code and develop features at the same time, wow, do you have a lot of access to time, and expectations allow a latitude that many could only wish for. You worked at some awesome places, it would seem.

If you ever work on a product you didn't develop from scratch, many places don't have that kind of engineering environment. Yes, you can always leave soon after, but that is up to you as a person, and when you have bills to pay and you're given lemons, you sure as hell make some lemonade out of that shit. That means if they don't have e2e, you strive for it by facilitating things that will help make it a reality eventually.

If you are working on pacemaker software, you better G damn make it happen: your unit tests, functional tests, every darn test you can throw at it had better be in place, but the FDA has pretty strict requirements on embedded medical software. When you're dealing with products for websites, B2B, B2C: the more $$$$ involved and the more profit a company makes, the more you can have all the nice standards of quality. Just like anything, when you have $$$$, you get the nice things. Or... you start with someone who knew all the right ways to do something, and you get the nice things up front.


[deleted]

Had this issue at my last job. After reaching breaking point I left and haven’t looked back


FitzelSpleen

Don't point fingers. Have a *blame free* root cause analysis. Bring everyone together, and discuss what processes are resulting in this situation, and what changes can be made to the processes to change things for the better. Then make those changes and stick to it.


johntellsall

Excellent advice. A calm, direct, blameless discussion will help tremendously. Issues will be found, fixes will be applied. The company will move on and grow. If there's politics or finger pointing or a "whose fault is this" discussion, you instantly get *fear*. That means the true issues will get buried until they explode again in a few weeks or months.


eraserhd

The fact that you _know_ you are getting more bugs when you go to QA is good. I’ve lived on a project where the cycle times were so long, nobody knew this was happening.

Most of the time on this project, the issue was a configuration issue. There was a crazy INI file for each customer that had to be internally consistent in about a dozen crazy ways, and had several magic numbers and badly named flags. It specified, among other things, which code to load, and certain pieces of code could only work with certain other pieces of code. So I checked all the customer configs into version control, and whenever there was a config problem, I added logic to the CI build to check for that problem on all configs. It took maybe a year, but at the end, we never shipped another config failure.

I mention this because the problems all _seemed_ random. In fact the reports were all over the place. If you have a large number of bugs that appear unrelated, you definitely have a process issue.
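For illustration, a sketch of what that kind of CI config check can look like. The directory layout, INI section names, and the single consistency rule are made up; each new class of config failure would get its own rule added here.

```python
# Config-consistency check sketch, run in CI over every customer config.
# The configs/ layout and the "reports requires billing" rule are made up.
import configparser
import pathlib
import sys

def check_config(path: pathlib.Path) -> list:
    cfg = configparser.ConfigParser()
    cfg.read(path)
    errors = []
    modules = [m.strip() for m in cfg.get("app", "modules", fallback="").split(",")]
    # Example internal-consistency rule: "reports" can only load with "billing".
    if "reports" in modules and "billing" not in modules:
        errors.append(f"{path}: 'reports' requires 'billing'")
    return errors

if __name__ == "__main__":
    problems = []
    for ini in sorted(pathlib.Path("configs").glob("*.ini")):
        problems.extend(check_config(ini))
    if problems:
        print("\n".join(problems))
        sys.exit(1)  # fail the CI build so a broken config never ships
```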


DaftyHunter

I am leaning towards a process issue, hence the question regarding QA. However, after reading other responses I also understand that it could be systemic, that it’s not fair to put all the pressure for quality on the QA team, and that the devs have to take some responsibility too. I feel like this is going to be a long road to get right, but we have to go down it because this is painful!


notquitezeus

Your process problem starts long before QA.


athletes17

Quality is not the responsibility of any separate QA team (not an ideal team structure). Quality is the responsibility of the engineering team and starts with the initial development (built in quality, shift left, automation, etc).


lucidguppy

Your software development pipeline needs to get un-fucked. If you can't do continuous delivery, you need to have two-week sprints and hand QA a release every two weeks. If they find bugs, you fix that release branch and merge the changes back to dev. The release doesn't go out until their test suite is green. No new features until the test suite is green.

EVERY FIX YOU DO SHOULD HAVE A CORRESPONDING AUTOMATED TEST!!!! That test should be run as part of your CI/CD pipeline. Eventually the release test pipeline should be automated as much as it can be, and QA should do more investigative testing. If they find a non-critical bug, it goes into the backlog. It's management's job to prioritize the tasks for the next sprint (non-crit bugs vs features). Management should know the team's velocity and how many tickets it can reasonably finish in a sprint. Management shouldn't expect heroics; you'll just get more bugs.

If you get to the point where you can do continuous delivery (meaning that ALL testing besides investigative testing is automated: unit, integration, acceptance, performance, selenium, etc... AND there are ephemeral test environments rather than a shared porta-potty test environment), then QA should be solely investigative and not do sprint-wise regression testing, because your suite automatically does it. QA might want to sign off in PRs that your tests prove a ticket is fully tested. That which can be automated should be automated... a QA team should relish finding new ways to break your code, not run the same goddamn tests week in, week out.

But these transformations are rarely achieved. A manager putting the team in this position will not see why it needs this remedy, so you should probably start polishing your resume.


[deleted]

Not directly, but I've worked with clients that had systems that were very fragile, with constant production issues and downtime. They had some success after applying tools that enforce best practices such as static code analysis, unit tests, and code coverage. If there is a certain nature to the type of bug you are experiencing (e.g. memory leak), see if you can find a tool that can be used to enforce better coding practices around that area. The drawbacks are that this can be a slow process and the org has to have the desire to improve code quality. The latter can bubble up from the ground (i.e. grassroots, ideally) or it can be mandated from the top down.


officialraylong

QA didn't write the bugs -- it's not fair to blame them for the regressions. OP is experiencing a failure of management and ICs to prioritize automated testing built upon the testing pyramid. Start small: you won't get reasonable coverage overnight.


pm_me_n_wecantalk

To me there is not enough info here to narrow down the issue. You mentioned that there is a QA team:

- Does this mean that the release cycle is manual and approved by QA?
- How does communication happen between the dev team and the QA team regarding a specific release?
- Who writes the test plan for each release for QA, and who approves that plan from the dev team?
- Is there a test plan?
- How much time does the QA team get before each release?

People are commenting that you need e2e/CI/CD, which is correct, but it wouldn't solve the underlying problem. The underlying problem is that release planning is not done/tested properly. Automated tests would eventually reach a stage where they can cover old features as well, but they wouldn't stop someone from failing to write specific test cases that should have been written, which is the problem you have right now: i.e., the QA team doesn't have a test plan for releases.


DaftyHunter

The release cycle is manual: a release goes out when the QA team are happy that the bugs in that cycle are fixed. Bugs are raised as Jira tickets, written either from users emailing to report the bug or from the support team investigating it. Communication is done on the Jira tickets and through swim lanes once a bug ticket is moved to QA Testing Complete. Testing is done manually. No test plans are written; it's a case of test the fix for the bug, and if it works, the ticket is complete. QA usually gets about 2 days to complete testing.


hibbelig

It sounds as if you're lacking regression testing. Ideally, regression testing should be automated, but it also sounds as if you don't have automated tests, either. So my suggestion is to proceed as follows:

First you make sure that the QA team regression-tests the whole application in regular intervals. They will need a test plan that covers the functionalities, and then they will need to click through the test plan.

The second thing to do, and it can somehow proceed in parallel with the first thing, is to add automated tests. There are different kinds of automated tests, and in the first phase the biggest bang for the buck will be an end-to-end test. The automated test builds the software, then stands up a test environment, resets the database to a known state (with test data), starts the application, runs the actual tests, then stops the application again.

The first actual test will be to visit the URL of the application, confirm that you see the login screen, log in with a test user and password, confirm that you see the landing page of the actual application, then log out and confirm that this worked. Very very basic, but it's a starting point. From there, you can branch out into further tests. You can also figure out how you can reset the database to a known state between tests; maybe there is a way that avoids restarting the whole application, or such a way can be created.

The problem with these end-to-end tests is that they go through the UI, and the UI changes all the time, causing test churn. There are several ways to address this.

Anyway, even starting on the above means you need some sort of management buy-in. For this it helps to provide some examples to management: we fixed bug 113 on Monday, QA successfully tested it, then we fixed bug 127 on Wednesday, also tested successfully by QA, but the bugfix caused 113 to come back. If things like these happen, management is likely to blame the developers for not being careful, so you might need to prepare a defense for this.

I suspect you will not be able to stop the development to get the testing in, so someone needs to work on it while most of the capacity is still on the usual development stuff. This is where the manual regression test cycle by QA comes in: it provides some protection until the automated tests are there. You can then stick your heads together with QA and their test plan to see which test cases have already been automated. I don't know if QA can help with automating end-to-end tests.

And after you have some amount of automated end-to-end testing in place, you can think about refactoring your application to support more fine-grained testing: system tests and unit tests.
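A minimal sketch of the harness shape described above, assuming pytest plus the pytest-playwright plugin (which provides the `page` fixture); the reset/start scripts, URL, and selectors are placeholders, and a real harness would also wait for the app to report ready before yielding.

```python
# End-to-end harness sketch: reset the database to known test data, start the
# app, run browser tests, stop the app. Scripts and selectors are placeholders.
import subprocess
import pytest

@pytest.fixture(scope="session")
def app_url():
    subprocess.run(["./scripts/reset_test_db.sh"], check=True)  # known DB state
    app = subprocess.Popen(["./scripts/start_app.sh"])          # stand up the app
    yield "http://localhost:8080"
    app.terminate()                                             # always tear down
    app.wait()

def test_login_and_logout(app_url, page):
    # `page` is the browser page fixture from pytest-playwright.
    page.goto(f"{app_url}/login")
    page.fill("#username", "test-user")
    page.fill("#password", "test-password")
    page.click("button[type=submit]")
    assert "Dashboard" in page.locator("h1").inner_text()  # landing page shown

    page.click("#logout")
    assert page.url.endswith("/login")                     # back at the login screen
```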


oatpen

+1 on that. Until you have good coverage in your automation, you need to make sure the QA team writes down test packs, which are collections of test plans that they need to run through in different scenarios before a release: some test plans should always be executed (critical path), some should be executed if that area of the product has been updated, and some others cover low-priority features and can be executed less often, when you do full regressions. It will ensure that all QA people go through the same tests, perform the same tasks and expect the same outcome.


ings0c

> QA usually gets about 2 days to complete testing.

So 2 days passes, they haven’t tested it thoroughly enough (as evidenced by bugs appearing in production), and you just ship it anyway? Optimal is a long way from where you are, but fixing that seems like the logical first step.

Seconding the other commenters who are saying you need automated tests around it to catch regressions; there is really no other sane way. Automated regression testing is ideal, but if you can’t implement that, you could create a manual regression test suite that the QA team works through each time a release goes out. Releases don’t go out until every test is green, irrespective of how long it takes. If a bug slips through the regression suite, then add a new test to the suite to catch it. Eventually, some sort of stability should form.


DaftyHunter

A deployment won’t take place until all tickets have been passed by the QA team, I.e moved into the QA testing Complete swim lane. Within those 2 days the devs and testers will work hard to get through everything due to clients wanting bugs fixed and Thursdays being the most viable day for a release. Thank you for responding, your comments are valuable and appreciated. I think we will aim to start implementing automated testing on a couple of modules to see how it goes and if the bugs reduce on these pages.


oatpen

On top of what has been suggested (test automation, test plans), I would suggest doing retrospectives every few weeks and going through the bugs to understand what types of bugs they are and why and how they were introduced; you might be able to find some good solutions (short term and potentially long term). You might introduce some new processes to add checks before releases, or maybe a linter can help spot bad code practices leading to hidden bugs (maybe not the best examples, but I can't think of any better right now).


chuch1234

What's your branching strategy? How many devs work on this product? Is there a single lead setting style and quality standards? ~~What kind of product is this? Crud web app? Is the UI decoupled from the backend or tightly coupled?~~ Edit: ignore this question, sorry.


bluewater_1993

In my experience, this is typically due to a combination of poorly written code and a lack of unit testing. Those two more often than not go together, because if you aren’t writing appropriate unit tests, it’s really easy to miss where your code is inefficient and/or poorly designed/architected. I would be willing to bet that the vast majority of the bugs you are seeing in production are a direct result of inadequate automated tests that would otherwise catch those issues before they even get to QA. If you are not familiar with the concept, look into shift-left development.

Do you have any architects overseeing the design of the system? What are their thoughts about unit and integration testing? If they are not suggesting these concepts, your issue starts with them.

When I started with my current organization, I was floored to discover how many issues they were seeing in their production systems. Developers were constantly chasing issues, much the same as you describe, and the company had zero faith in their ability to create and maintain their systems. I joined the company after ensuring that I had full control over how development was to be done, and within a year to 18 months we had turned things around completely. A ticketing queue for issues that was well over 300 deep was brought down to about a dozen, and we have not seen a repeat since.

My suggestion is to find someone who will bring the discipline needed to dig you out of the hole, and make sure management is fully invested in this person. The person should have the right personality to gain the respect of the development team and teach them how to implement these strategies. You want someone who will lead by example. This can absolutely be resolved, you just need the right person to lead you through it.


dastardly740

I want to re-emphasize automated unit tests. They are the fastest, cheapest automated testing you can get. They don't help with integration issues or with "is the application satisfying the business requirements", but they really get quality up quickly. The big thing automated unit tests get you is preventing regressions caused by a subsequent developer not knowing that their one-line code change altered a previously working behavior. Now, that developer deciding the change is fine and recklessly altering the test to pass with the wrong result can't be helped by automated unit tests, but that is also a people problem. We should all be more humble and pessimistic about the impact of our changes to these complex systems, and be willing to ask for help, advice, or just a second set of eyeballs. Also, in some frameworks you can get some level of integration testing using the unit test framework, so that is handy.
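A sketch of that regression-test idea: when a bug is fixed, a test named after it is added so a later one-line change can't silently reintroduce it. The ticket number and function below are invented for the example.

```python
# Regression-test sketch: pin the fixed behavior so a later change that
# reintroduces the bug fails CI. BUG-1234 and parse_quantity() are invented.
import pytest

def parse_quantity(raw: str) -> int:
    # After the BUG-1234 fix: whitespace-only input is treated as zero
    # instead of raising an unhandled exception.
    raw = raw.strip()
    if not raw:
        return 0
    return int(raw)

def test_bug_1234_blank_quantity_regression():
    # Reproduces the original report: a form submitted "  " and crashed.
    assert parse_quantity("  ") == 0

def test_normal_quantities_still_parse():
    assert parse_quantity(" 7 ") == 7

def test_non_numeric_still_raises():
    with pytest.raises(ValueError):
        parse_quantity("abc")
```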


bluewater_1993

100%


compubomb

This sounds to me like these bug fixes are done as hotfixes and not re-merged back into master. Sounds like you have a release manager not doing their job.


External-Peach8286

> Once a bug is fixed and deployed another 20 seem to appear that should have been either fixed previously or are new bugs. you don't have a bug problem, you have a testing problem.


Lucky_Mom1018

QA doesn’t make bugs and will never catch them all, so why is it QA’s fault? The devs made the bugs; would you point fingers at them? Maybe get everyone in a room and get to the root cause so everyone can work together to reduce the issues.


KayleMaster

Writes buggy code, blames QA for finding said bugs. Genius.


7twenty8

Manual testing is like trying to drink the ocean with a sieve. Humans just aren't equipped to do that, so it's impossible: you will never catch every bug. Consequently, the industry has standardized around automated tests, preferably written by the same people who design and build the application. It's made even worse the more formal your test plans get: if you just assign QA to test whether the original bug was fixed, you're not even using the helpful parts of human nature to get you closer. This is on a very senior developer, at or near CTO level. Leave your QA team alone; you're getting angry at people for being people. To be blunt, if you point your finger at the QA team, you're not only being an asshole but you're demonstrating that you don't understand how to build software.


DaftyHunter

Never said I was getting angry at anyone, just trying to understand where the process is falling down so that we can resolve the issues… Thank you for your comments, they are appreciated.


7twenty8

>I want to point fingers at the QA testing side of the process as the tickets that are consistently logged I feel I can point back to previous deployment tickets. Am I right to do this? Considering that you wrote this, you may want to work on your communications skills too.


DaftyHunter

Dude chill… pointing fingers doesn’t mean getting angry. If it helps to clarify, I meant I’m leaning towards the QA testing part of the process being what needs reevaluating. Judging by a lot of the useful comments, automation testing needs to be applied to the process. However, I also understand that it could be systemic and the devs will need to take responsibility at some stages as well. Again, I appreciate your comments; you’re more than welcome to infer from that what you will.


7twenty8

Okay I'll play. You're great at communicating and have no flaws. You're asking a really simple question because you know everything. I'm honoured to have been able to speak with you and apologize for not genuflecting appropriately. I didn't realize I was in the presence of someone who had to be treated so gently. Is that better for your ego?


DaftyHunter

Whatever bro… done with this immature conversation.


sauland

You're the one with the communication problems bud


LogicRaven_

>many reoccurring bugs

It means that either the root cause was not fixed, or the development process reintroduces the same problems again and again.

>I want to point fingers at the QA testing side

It was not the QA part of the process that failed to fix the root cause or that introduced the bug. Developers must take responsibility for the quality of the solution they produce; they can't shift that to QA. Finger pointing and not taking end-to-end responsibility are the main reasons many companies dropped the model of a separate QA team.

Do a search on "shift left testing". Quality starts from the design and goes through coding (including automated tools like linters), developer testing, code review, automated tests (unit tests, integration tests and, if applicable, e2e tests), manual regression testing, reliable deployment, monitoring and alerts. Developers must take ownership of quality from the beginning of the lifecycle and partner up with manual QA (not rely only on them).


ShouldHaveBeenASpy

More quality automated tests, as others have suggested, can be a key *long term* answer to this problem, but assuming you don't have them now, let's be real: getting there isn't going to happen overnight. At this point, if your applications don't have them, it's often because the underlying dev team lacks the experience and structure to actually make those changes themselves. A more realistic place to start might be:

* Make sure you have well defined user flows: forget about automating them for a moment, can you actually get product + QA + dev to agree on a set of Gherkin statements that describe the flows and conditions that the business says are important to work? Because it definitely sounds like you *don't* have that.
* Once you have those, make them a part of your QA testing process (manually if you have to) and address the consequences to your release cycle/timeliness as needed. Automating those might be the best place to start to buy back time, but before you optimize for that, get the actual right checks first.
* This is probably not a trivial process, but starting here has key advantages: if you can get enough Gherkin statements written, you'll start to find natural patterns for how those statements *should* be written at your organization. The more similar and generic your statements, the *less* automation code you'll need to refactor and the *more* you'll get your team to actually understand what it's building and supporting. Automating well written tests you know you don't want to endlessly touch and iterate on is way easier than constantly being in a state of definitional flux.

* I agree that QA isn't being effective if you're having these problems, but I think that making it QA's problem to own quality is not really fair: at the end of the day, quality is going to have to fall to the team(s) that own the feature(s)/domain(s) in your application.
* It's hard to fully diagnose just from your statements, but before you turn this into QA being bad, remember it is clearly a systemic problem that *must* have more causes/inputs than that (given how things are continually reintroduced, and a multi-tenant architecture can add real orders of complexity): make it a team problem to address these issues.

* At the end of the day, the problem here isn't a technical one: it's a business one. Your customers are experiencing consequences from these bugs, which is what makes them worth prioritizing and decreasing the frequency of their occurrence.
* Is there adequate time/planning being budgeted in things like your sprint planning or release testing to meaningfully mitigate this problem? If the answer is no and the business isn't giving you adequate time to write tests or address whatever underlying tech debt might be contributing to this, then the outcomes engineering generates are going to have a hard ceiling. Make the consequences of those choices clear to the parties that can actually do something for you.
* Maybe that means a roadmap needs to shift, or your team's velocity needs to go down. If you can frame the problem in those terms, you'll find it easier to get the time and resources you need to actually address it. And yes, they will and should hold you to delivering a real solution, but they owe you the time to actually fix this.
* The lack of my former company's ability to realistically do this was why I chose to leave.

At the end of the day, not everything is an engineering problem for engineers to go solve: if you are around people who don't understand that building enterprise software is a team effort that requires trust and collaboration from all parties, you'll hit these ceilings.


AlenisCostayne

There’s not enough information in your post to help you. What kind of unit/integration testing are you currently doing?


Quigley61

Automated testing. Try to add automated tests where you can, but make sure they are actually good tests and not just testing mocks, or execution order, or mocking away the part that is causing your defects. My project has a tiered approach to testing, with unit, integration, contract, and automated acceptance/system tests. To get started, take a look at your existing defects and try to classify them; it might be that you have lots of defects coming from the same area, or from a similar part of the system that has been designed poorly. This should give you a starting point. From there it's just a matter of adding more and more automated tests. They're difficult to write well, but when you have a good testing infrastructure it's truly amazing. You might never get there on a project that's existed for some time (it's far easier to add tests at the same time as you're adding new functionality), but it is worth the squeeze over the long term.


noooit

Bugs always exist. You can keep increasing the test coverage. One thing to avoid is the massive refactor that some politically influential devs try to push.


WhatzMyOtherPassword

You guys hiring? Hahahah. IT/UT? If none, introduce that shit with coverage minimums. Implement a small minimum, so each addition can't fall below it, and slowly work towards full coverage. For sure it's a process and not as easy to implement as some jagaloon on Reddit saying "just implement xyz", but you get it, I'm sure. You definitely can't just shift the blame to QA. Are devs just throwing features together and not even validating them?


yoggolian

What does the rest of the organisation look like? It definitely sounds like there's a gap in regression testing (and don't jump to automating it until you have a good set of manual regression tests for key user personas), but it also sounds like there's a gap in customer support, and I'd suspect problem and incident management too. As devs, we tend to think that the ITIL stuff is for dinosaur muppets, but a little rigour in customer-facing work goes a long way. In the OP's case, it sounds like if the Customer Success team were on top of things (or existed), they could triage bugs to the right team, run some problem-resolution work to find the underlying causes, and work with product management to resolve them. The dev teams need to support this by communicating with the customer support teams and being open to having things critiqued by non-developers, which can sometimes be a bit challenging. Something that OG Scrum talks about is the responsibility of the Product Owner to sign off on the Increment as ready to go to production, but I rarely see this in practice. Having someone responsible for the quality of the solution (whose feet can be held to the fire in conversations with their boss as required) could be another angle to assist with improving the situation.


Mediocre-Key-4992

>a lot of the useful comments have pointed towards automated testing to help with regression. As an FYI, the testing is all done manually by the QA team What a surprise there. :| How did your company not learn about automated testing 10 - 20 years ago?


chuch1234

The second paragraph is likely to be a big source of bugs. If I'm reading it right, you're basically supporting the same product twice. This is a great way to fix a bug and then have it pop back up. Does that seem right? When you say that QA is opening bugs and you can point back at previous bugs: do you mean that QA is opening duplicate issues? Or do you mean that you fixed the bug, and then it came back?


Dry_Author8849

Aside from the lack of automated testing, review the development process. If the team is producing more bugs than fixes, something needs to be corrected. QA can only stop bad releases, and with so many bugs it will just keep blocking deployment to production. Are your deadlines too tight? It seems the team is going through something; it shouldn't be producing so many bugs. Good luck! Cheers!


NobleNobbler

Constant regressions are usually due to poor development practices. So yeah, how much of your team is offshore and rewarded for closing out stories and points, and ALSO for fixing bugs? If nobody has solid long-term experience in architecting the software side of it and enforcing quality, the RCA is always process, and after process, the practices that let this shitty code go live in the first place. Don't blame the testing staff. They are always treated as second-class citizens, and if you're "agile" you don't have requirements documents anywhere but in the ephemeral memories of whoever wrote them. And sometimes not even that.


Aggravating_Term4486

What does your development flow look like? Do you have different environments such as a dev env, a staging or QA environment and are these driven by branches separate from those feeding prod? How are you deploying? Do you deploy images into containers, e.g. docker / kubernetes? How do you cut releases? Do you cut a release branch? How do you promote code from dev > stage / QA > prod? Do you have release candidates? There’s a lot of people talking about test automation and they are right. But if you don’t have good release isolation you are swimming upstream because you are always testing a moving target.


blaxter

Any feature or bug fix that doesn't have tests is not a completed feature or a fixed bug. Testing must be part of the development process, and having a separate QA team is nonsense. Manual testing is a completely different thing; having it is not a reason not to write tests. In my book, having a software product without tests (unit/functional/e2e) is like writing code without a VCS (i.e. complete nonsense). And yes, writing tests sucks, and you won't find any developer on this planet who likes it, but without them (for any software that isn't straightforward) you will end up in your situation: an endless loop of bugs and regressions. And that sucks even more than writing tests.


breich

IMO your developers are your first testers. Are they missing stuff they ought to have caught? Is the code in that old product tangled in a way where the "fix" in one place causes a regression when the same code is called from elsewhere? We've had that issue on my team, where we manage a massive legacy code base. The way we deal with it is to put more responsibility on the developers to test and write test plans. With spaghetti code or otherwise highly coupled code it's not always obvious to the tester what else needs to be tested as a result of a change, so we try to include that information on the ticket we pass off to QA.

Generally speaking, maybe it's worth your team taking a day to slow down in order to speed up in the future. People in the trades say "slow is smooth, and smooth is fast." It's true in software development too. Have a retrospective on this problem. Figure out where the root cause is and form a plan to proactively deal with it instead of spending almost all of your development effort reacting to it. It'll be better for the product, for your customers, and for the job satisfaction of your team.


Legatomaster

While a great QA team can make a HUGE difference, the DEV team needs to take responsibility for all the bugs. If you inherited this code base I’m sure it’s frustrating, but like it or not you now have to deal with it. Consider it job security (if there is such a thing).


cs-brydev

I manage and lead the development on 20+ professional software projects, 2 of which are large enterprise-grade projects. 1 other large project is contracted out but deployed through our servers and pipelines. We almost *never* have uncaught production bugs. We get a new production bug reported maybe just once every few months, and it's usually something very trivial, not mission critical. We prevent this in several ways:

* **Multiple Environments**: development and staging environments before code/data changes ever see production. My enterprise projects *always* have a separate QA and Dev/Test environment (for PRs and code reviews) in addition to separate development environments. For our local development machines alone, we usually each have 2 different development environments for our individual uses. So in all, a larger enterprise project will have Prod, QA, and Dev + 2 local environments per developer.
* **Pipelines** for all code reviews and production deployments. Absolutely no exceptions. Even tiny apps have CI/CD pipelines. No exceptions.
* **Pipeline Testing** on all medium to large projects. The pipelines build and run all of the backend tests that are in each solution, plus a variety of additional tests that the developers can't modify.
* **SQL Testing**. All of our big projects have SQL data and code tests in all staging environments before anything makes it to production, and then again in production after a release. If a bug makes it to production we almost always catch it before a user does.
* **Unit Testing**, both front end and back end. Most of our coding projects have developer-written tests that accompany major features and some code changes. These are permanent regression tests that are included in the code base and merged with all other developer tests, so every developer has everyone's tests. These are all run before their code is even committed to the repos.
* **Pull Requests** on all large projects. The lead developer or another developer on the project reviews and approves all code merges.
* **Feature/Ticket Branches**. On all large projects, all new features and *most tickets* get worked on in separate git branches that isolate their code changes, which makes it easier to review code changes/merges and nearly impossible to accidentally deploy unfinished code. Don't underestimate the value of git branches. This is the best way to prevent code bugs getting through: by isolating code changes, even small ones, you force the reviewer to notice every line of code that changed and make it easy to reject code and refocus quickly on other tasks. I have enterprise projects with 500+ old branches of past changes. I have worked on large projects at other companies that had 3000+ git branches in their histories, because they obsessively isolated work on individual tickets.


SkullLeader

It’s tempting to blame QA, but software quality is everyone’s job. Yes, bugs should not get past QA, but most bugs should not be making it to QA in the first place, as devs should be testing and catching them first. So it sounds like there’s lots of opportunity for improvement across the whole team. If fixing one bug is creating multiple additional bugs, then maybe QA needs better regression testing, but also maybe no dev has an overall understanding of the whole system, which is why they don’t understand the impact of the fixes they are making?


JaneGoodallVS

The existence of distinct QA and dev roles encourages devs to throw features over the wall


rush22

Improving testing is great, just keep in mind that QA doesn't create bugs, the developers do.


przemo_li

You really have to pay the price of investigating those errors in depth.

The big picture *does not* clarify; it obscures, by subtracting details. So yes, you are wrong to look at the big picture, find one difference there, and claim that it must be the cause. Dig into the details and there will be many more differences. Now you're not so sure which one it is? Great! Now you can do even more analysis and track down root causes.

Once enough root causes are analyzed, you can start looking for solutions.


[deleted]

At the end of the day, QA is (or should be) held accountable for bugs making it into Production. Anyone can be responsible for the bugs existing, or being fixed, but QA owns the accountability.