When to rewrite the code base from scratch

I recall Joel Spolsky’s article about never rewriting code from scratch. To summarize his argument: the code does not get rusty, and although it may not look pretty after many maintenance releases, if it works, it works. End users don't care how good the code is.

You can read the article here: Things You Should Never Do

I recently took over a project, and after looking at the code, it is pretty awful. It immediately reminded me of prototypes I had built earlier, which I had bluntly stated should never be used in a production environment. But of course, people don't listen.

The code is built as a website, has no separation of concerns, no unit tests, and code duplication all over the place. There is no data layer and no real business logic, unless you count a bunch of classes in App_Code.

I made a recommendation to the stakeholders that, although we should keep the existing code and issue bug fixes as well as some minor feature releases, we should immediately start rewriting it using Test Driven Development and with a clear separation of concerns. I am thinking of going the ASP.NET MVC route.

My only concern is, of course, the time it may take to rewrite from scratch. It is nothing very complicated: a fairly run-of-the-mill web application with membership, etc.

Do any of you have a similar problem? What specific steps have you taken?

UPDATE:

So, what did I finally decide to do? I took Matt's approach and decided to refactor many areas.

  • Since App_Code was getting pretty big and was slowing down the build, I moved many of the classes out of it and into a class library.
  • I created a very simple data access layer containing all the ADO calls, and created an SqlHelper object to make those calls.
  • I implemented a cleaner logging solution that is much more concise.
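
A thin data access helper of the kind described in the list above might look roughly like the following. This is only a sketch under stated assumptions: the post used ADO.NET and an object called SqlHelper, whereas this example uses Python and the standard-library sqlite3 module purely to stay self-contained. The method names and the members table are hypothetical.

```python
import sqlite3
from contextlib import closing


class SqlHelper:
    """Hypothetical stand-in for the SqlHelper object: one place for all SQL calls."""

    def __init__(self, database):
        # A single shared connection keeps the sketch short.
        self._conn = sqlite3.connect(database)

    def execute(self, sql, params=()):
        """Run a statement that changes data; return the number of affected rows."""
        with self._conn:  # commits on success, rolls back on error
            with closing(self._conn.execute(sql, params)) as cur:
                return cur.rowcount

    def query(self, sql, params=()):
        """Run a SELECT and return all rows as a list of tuples."""
        with closing(self._conn.execute(sql, params)) as cur:
            return cur.fetchall()


# Usage: callers never touch the connection or cursor directly.
db = SqlHelper(":memory:")
db.execute("CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT)")
inserted = db.execute("INSERT INTO members (name) VALUES (?)", ("alice",))
rows = db.query("SELECT id, name FROM members WHERE name = ?", ("alice",))
```

The point of the design is that pages call the helper with parameterized SQL instead of opening connections themselves, so the scattered data calls end up in one small, testable place.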

While I’m no longer working on this project [financing, politics, blah blah], I think it gave me some idea of how poorly some projects can be written, and of the steps that one developer can take to make things a lot cleaner, easier to read and just flat out better with small, gradual steps over time.

+58
tdd architecture testing
Jun 30 '09 at 15:37
17 answers

Just because the code has all these problems now does not mean it has to keep having them. If you find yourself fixing a bug in a part of the system that could benefit from, say, a new data layer, then create the new data layer. Just because the entire site does not use it does not mean that you cannot start using it. Refactor as you need to while fixing bugs. And make sure you understand exactly what the code is doing before changing it.

Problem with code duplication? Pull it into a class or utility library in a central location the next time you have to fix a bug in duplicate code.

And, as mentioned by other respondents, start writing tests now. It can be tricky if the code is as tightly coupled as it sounds, but you can probably start somewhere.

There is no reason to rewrite working code. However, if you are already fixing a bug, there is no reason why you cannot rework that part of the code with a "better" design.

+55
Jun 30 '09 at 15:49

The book Facts and Fallacies of Software Engineering says this: "Modification of reused code is particularly error-prone. If more than 20 to 25 percent of a component is to be revised, it is more efficient and effective to rewrite it from scratch." The figures come from statistical studies carried out on this issue. I think the numbers may vary depending on the quality of the code base, so going by this statement, in your case it seems more efficient and effective to rewrite it from scratch.

+18
Jul 05 '09 at 0:38

Joel's article says it all.

In principle, never.

As Joel points out: you just lose too much by doing it from scratch. It will probably take longer than you think, and what is the end result? Something that basically does the same thing. So what is the business case for it?

This is an important point: it costs money to write something from scratch. How will you earn that money back? Many programmers ignore this point simply because they do not like the code, sometimes with justification, sometimes not.

+16
Jun 30 '09 at 15:39

I had such an application, and rewriting it was very useful. However, you should try to avoid the "enhancement" trap.

When you rewrite everything, it’s very tempting to add new features and fix some long-standing problems that you never got around to dealing with. This can lead to feature creep, and can greatly extend the time the rewrite takes.

Make sure that you decide in advance what will be changed and what will only be rewritten.

+10
Jun 30 '09 at 15:42

I was part of a small team that rewrote code from scratch, including reverse engineering the business rules from the earlier code. The original application was a web service written in C++ (with regular crashes and serious memory leaks) plus an ASP.Net 1.0 web application, and the replacement was an asmx-based web service in C# 2.0 plus an ASP.Net 2.0 web application with Ajax. That said, here are some of the things the team did and explained to management.

  • We maintained the existing code base in production until the new code was ready.
  • Management agreed that the rewrite (first release) would not introduce any new features, but would simply reimplement the existing ones. We added only 1-2 new features at the end.
  • The small team consisted of very experienced developers with an excellent ability to understand and collaborate.
  • It was harder to find C++ talent in the organization, and C# was considered the better alternative for future maintenance.
  • We agreed to an aggressive timeframe, but at the same time we were confident and very motivated to work with C# 2.0, ASP.Net 2.0, etc.
  • We had a team leader who shielded us from senior management, and we followed this process.

The project was very successful. It was very stable and much better. Later it became easier to add new features. Therefore, I believe that a rewrite can be carried out successfully given the right resources and circumstances.

+7
Jul 12 '09 at 10:08

Only one quasi-legitimate reason comes to mind: politics.

I had to rewrite a code base from scratch, and it was because of politics. Basically, the previous coder who managed the code base was too embarrassed to release the source code to the new team that had just been hired. She felt that any criticism of the code was criticism of her as a person, and as a result she only released code to the rest of us when she was forced to. She was the only person with administrative access to the source repository, and whenever she was asked to release the entire source, she threatened to quit and take all her knowledge of the code home with her.

This code base was over 15 years old and had convolutions and contortions from different people with different styles. None of these styles seemed to include comments or specifications, at least in the small portions that she released to us.

With only incomplete code and a deadline, I was forced to do a complete rewrite. As a result, I got yelled at because it was claimed that I had caused a serious delay, but I just kept my head down and got it done instead of arguing.

Politics can be a huge pain.

+6
Jun 30 '09 at 15:45

I do not agree with this article. For the most part, Joel is right, but there are counterexamples showing that sometimes (even if rarely) a rewrite is a good idea. For example:

  • Windows NT (broke from the old DOS code base. Win2k, WinXP and the upcoming Win7 were built on this foundation. Yes, Vista too. The last version of Windows on the old base was the infamous WinME)
  • Mac OS X (rebuilt its flagship product on FreeBSD)
  • Many cases where a competitor displaces a de facto standard (e.g. Excel and Lotus 123).

I believe that Joel’s argument assumes fairly well-written code in the existing version, which can be improved by refactoring. By all means, if the code you inherited is really that bad, push for a rewrite; there is some scary stuff out there. If it is at all tolerable and works reasonably well, phase in the new material more slowly.

+6
Jun 30 '09 at 16:05

I was in this situation, but instead of a complete rewrite, I worked on changing things through a refactoring process. The problem I ran into was the enormous complexity of the code I was working with: many pages of terrible special-case logic built on if statements and convoluted regular expressions, layered up over more than 10 years of unplanned growth and expansion.

My goal was to refactor it function by function, so that it produced the same output for the same input but worked much more cleanly and smoothly under the hood, to facilitate future growth and improve performance. The general solution was clean and fast, but the work of fixing the code got harder and harder as obscure special cases in the documents processed by the system started to show themselves, and my nice clean code would produce output that differed from the original's in small and obscure ways (these were web pages, so a different amount of whitespace could cause all kinds of layout problems in older versions of IE).

I don’t know whether the refactored code was ever used; I left before it could be fully integrated, but I doubt it. Why use twenty lines of code when a hundred and fifty if statements and three-line regular expressions can do the same job?

+4
Jun 30 '09 at 15:49

One danger of a complete rewrite is that your job is constantly on the line. You are a cost that does not contribute to the bottom line. Code that sucks is still code that makes money.

But if you fix the existing code a piece at a time, you are the one who knows how the money machine works.

+4
Jul 02 '09 at 14:21

At some point you need to cut your losses. If you just inherited this code base, any change you make may have unintended consequences, and due to the lack of tests, they will be almost impossible to find.

At least start writing tests right away.

+3
Jun 30 '09 at 15:44

Instead of rewriting completely from scratch, you may want to start refactoring the code base in small steps while introducing unit tests. For example:

  • Move duplicated code into a common class, covered by tests, for reuse throughout the project.
  • Introduce interfaces so that individual modules can be tested in isolation. You can then refactor the implementation behind the interface, relying on your tests to make sure you don't break anything.

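
The interface step can be sketched as follows. This is a minimal illustration rather than anything from the project in question: it is written in Python for brevity, and MemberStore, WelcomeMailer, and FakeStore are hypothetical names. The idea is that the business logic depends only on a small interface, so a test can substitute an in-memory fake for the legacy data code.

```python
from typing import Optional, Protocol


class MemberStore(Protocol):
    """Hypothetical interface; the legacy data code would sit behind it."""

    def find_email(self, member_id: int) -> Optional[str]: ...


class WelcomeMailer:
    """Business logic that depends only on the interface, not on the database."""

    def __init__(self, store: MemberStore):
        self._store = store

    def greeting_for(self, member_id: int) -> str:
        email = self._store.find_email(member_id)
        if email is None:
            raise LookupError(f"no member with id {member_id}")
        return f"Welcome, {email}!"


class FakeStore:
    """In-memory test double that satisfies MemberStore."""

    def __init__(self, emails):
        self._emails = emails

    def find_email(self, member_id):
        return self._emails.get(member_id)


def test_greeting_uses_store():
    mailer = WelcomeMailer(FakeStore({1: "a@example.com"}))
    assert mailer.greeting_for(1) == "Welcome, a@example.com!"
```

Once a test like this passes, the real implementation behind the interface can be reworked freely, because the test pins down the behavior the callers rely on.
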
+3
Jun 30 '09 at 15:50

I would rather do things in parts: for example, build a database back end with a data model as you work in each area (that is, first user login, then user management, etc.), and adapt the existing front end to use the new back end (drive it through interfaces, so you can also add tests). This preserves the existing code, with its possibly undocumented tweaks and behaviors that you would not replicate if you developed again from scratch, while adding some separation of concerns.

After some time, you will have refactored about 60% of the code base to use the new back end, without the work being an official project, just maintenance, so you will be in a better position to argue for development time for the remaining 40%. Once that is done, the existing front-end classes will be significantly reduced in size and complexity. Once everything has been ported, you can reuse the new model and controller components if you ever have time to implement a new view.
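
One way to picture this area-by-area migration is a small router that sends already-migrated features to the new back end and everything else to the untouched legacy code. The sketch below is illustrative only: it is in Python, and LegacyBackend, NewBackend, BackendRouter, and the feature names are all hypothetical.

```python
class LegacyBackend:
    """Hypothetical stand-in for the existing code, left untouched."""

    def handle(self, feature, payload):
        return f"legacy:{feature}:{payload}"


class NewBackend:
    """Hypothetical new implementation, grown one area at a time."""

    def handle(self, feature, payload):
        return f"new:{feature}:{payload}"


class BackendRouter:
    """Sends migrated features to the new back end and the rest to the old one."""

    def __init__(self, legacy, new, migrated=()):
        self._legacy = legacy
        self._new = new
        self._migrated = set(migrated)

    def migrate(self, feature):
        # Flip one feature over once its new implementation is trusted.
        self._migrated.add(feature)

    def handle(self, feature, payload):
        backend = self._new if feature in self._migrated else self._legacy
        return backend.handle(feature, payload)


router = BackendRouter(LegacyBackend(), NewBackend(), migrated={"login"})
login_result = router.handle("login", "alice")    # served by the new back end
report_result = router.handle("reports", "q3")    # still served by legacy code
```

Because each feature is switched over individually, the site keeps working throughout, and a problematic migration can be rolled back by removing one entry from the migrated set.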

+3
Jun 30 '09 at 16:02

The answer is: rewrite from scratch as often as possible.

I spent most of my career inheriting steaming heaps of dung that were politely called "programs", written by young, inexperienced programmers whom managers considered "rock stars." These things, as a rule, cannot be fixed, and you end up spending 10 times more effort limping them along than you would have spent simply rewriting them from scratch.

But I also benefited immensely from periodically rewriting my own work. Each rewrite is an opportunity to do something different and potentially better, and you should be able to reuse at least some parts of the old version.

That said, not every rewrite is a good idea. Windows Vista, for example.

+3
Jul 09 '09 at 13:29

Start by writing a technical specification. If the code is terrible, then I bet there is no real specification. So write a comprehensive and detailed one. You would need a specification anyway if you wanted to rewrite from scratch, so the time is a good investment either way. Be careful to include all the details of the functionality. Since you can investigate the actual behavior of the application, this should be easy. Feel free to include suggestions for improvement, but be sure to capture all the details of the current behavior.

As part of the investigation, you might consider writing some automated system tests to investigate and document expected behavior. Focus on black box / integration testing rather than unit testing (which the code probably won't allow anyway if it's so ugly).
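
A black-box test of this kind can be as simple as recording what the current code outputs for a set of inputs and then flagging any later difference. The sketch below is written in Python under stated assumptions: legacy_render is a hypothetical stand-in for the real entry point, and record_or_compare and the snapshot file name are likewise invented for illustration.

```python
import json
import tempfile
from pathlib import Path


def legacy_render(title):
    """Hypothetical stand-in for the untouched legacy code path."""
    return "<h1>" + title.strip().upper() + "</h1>"


def record_or_compare(func, inputs, snapshot_path):
    """Record func's outputs on first run; afterwards, list the inputs whose output changed."""
    actual = {repr(i): func(i) for i in inputs}
    if not snapshot_path.exists():
        snapshot_path.write_text(json.dumps(actual, indent=2))
        return []
    expected = json.loads(snapshot_path.read_text())
    return [key for key in actual if actual[key] != expected.get(key)]


inputs = ["hello", "  padded  ", ""]
snapshot = Path(tempfile.mkdtemp()) / "render_snapshot.json"

first_run = record_or_compare(legacy_render, inputs, snapshot)   # records the snapshot
second_run = record_or_compare(legacy_render, inputs, snapshot)  # compares against it
```

The recorded snapshot documents the current behavior, warts included, so the same check can later be pointed at a refactored or rewritten implementation to see exactly which inputs behave differently.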

When you have this specification, you will most likely find that the application is actually much more complicated than your first impression suggested, and you may reconsider rewriting from scratch. If you decide to refactor gradually instead, the specification and tests will help you. But if you still decide to go ahead and rewrite, then you have a good specification to work from, and a set of integration tests that will tell you when your work is done.

+1
Jul 02 '09 at 14:15

I think it depends on two things:

1) How flawed the underlying design of the legacy code base is,

2) The time required for the rewrite.

1) The company I work for had a terribly designed code base, which made refactoring really difficult: we could not refactor one bit at a time, because the main problem was not in individual classes and functions but in the overall design, so a refactoring approach would have been very complex. (If the overall design had been good but, say, individual functions were 300 lines long and needed breaking up, then refactoring would make sense.)

2) Despite a lot of code and a very complicated start-up process, our engine didn't actually do much. So the rewrite did not take long. Sometimes managers do not realize that the functionality of hundreds of thousands of lines of code can be rebuilt in a very short time.

We tried to explain this to our CTO (small company), but he still thought that rewriting would be risky, so my colleague and I rewrote the basic functionality of the engine over about four days off. Then we showed it to our CTO, and he was finally convinced.

Now, if it had taken us six months to build the basic functionality, we would not have had much of an argument.

+1
Jun 25 '15 at 17:03

In economics, there is a counterintuitive maxim that says:

Never take sunk costs into account

Sunk costs, according to Wikipedia ( https://en.wikipedia.org/wiki/Sunk_cost ):

In economics and business decision making, a sunk cost is a cost that has already been incurred and cannot be recovered.

When sunk costs are combined with political pressure or personal ego (which manager wants to be the one who admits that they made a bad decision or did not control the results properly, even if it was inevitable or could not be directly controlled?), this leads to a situation called escalation of commitment ( https://en.wikipedia.org/wiki/Escalation_of_commitment ), which is defined as:

a pattern of behavior in which an individual or group, faced with increasingly negative outcomes from some decision, action, or investment, continues rather than alters course: something that is irrational, but in line with decisions and actions previously made.

How does this relate to code?

Over a fairly long career as a software developer, one common thread I have found is that when faced with a complex or ugly code base (even if it is our own from two years ago), our first instinct is to want to throw away the old, ugly code and rewrite it from scratch. If it is a familiar code base, this is usually because we are now more familiar with the project's pitfalls and business requirements than we were when we started, so we (perhaps subconsciously) seek the chance to atone for our past sins by replacing them with perfection. If it is an unfamiliar code base, we often tend to oversimplify the problems the original developers faced, glossing over the "small details" in favor of "big picture" architectural thinking, and we often blow budgets and timelines because we do not understand the intricate minutiae of the business cases the code was originally meant to address.

Then there is the whole concept of technical debt, which, like financial debt, CAN and WILL accumulate to the point where the code base becomes technically insolvent. More and more time and resources are invested in troubleshooting, firefighting, and overly complex enhancements, making forward progress costly, difficult, and dangerous. Projects take longer and longer because of defects and because people are pulled away from project work to deal with production problems. After-hours "incidents" start to become expected operations rather than rare flare-ups. Instead of stepping back and starting to do things right to improve our future productivity (and quality of life), we find ourselves forced to take on more and more technical debt to meet deadlines - the technical equivalent of taking a credit card cash advance to make the minimum payment on another card.

All that said, this does not mean that we should rewrite whenever possible, nor should we avoid rewriting working code at all costs. Both extremes are potentially wasteful, and the latter tends to lead to escalation of commitment (because "at all costs" means completely disregarding cost, even when the costs far outweigh the benefits). What needs to happen is an objective assessment of the costs and benefits of rewriting the code compared with incremental improvement. The challenge is finding someone with the experience and objectivity to make that call correctly. We developers tend to be biased toward rewriting, because it is much more interesting and fun than working on some crappy legacy code base. Business managers tend to be biased the other way, because a rewrite introduces unknowns with little tangible immediate benefit. The result is usually that no real decision is made, which by default means continuing to sink hours into the existing code until some circumstance forces a change of direction (or the developers secretly rewrite the code and usually get a flogging for it).

I have worked on code bases that were salvageable, albeit ugly. They did not adhere to established practices or standards, did not use patterns, and were ugly, but they performed their intended functions well enough and were flexible enough to be modified to meet expected future needs for the expected lifetime of the application. Although it was not glamorous, it was perfectly acceptable to keep that code alive, making incremental improvements as the opportunity arose. Doing otherwise would have brought little benefit beyond making it look pretty. I would say that most code that prompts the "should I rewrite this?" question falls into this category, and I find myself explaining to the junior developers on the team that although it would be great to rewrite YetAnotherLineOfBusinessApp in {insert the whizzbang framework here}, it is neither necessary nor desirable, and here are some ways we can improve it ...

I have also worked on code bases that were hopeless. These were applications that barely launched in the first place, usually behind schedule and with reduced functionality. They were written in such a way that nobody but the original developer could understand what the code ultimately does. I call it read-only code. Once it is written, any attempted change potentially results in a systemic, unintelligible failure of unknown origin, which leads to panicked wholesale rewrites of massive monolithic code constructs that serve no purpose other than to teach the current developer what is really happening to a variable artfully named obj_85 by the time execution reaches line 1,209, nested 7 levels deep in if... else... , switch, and foreach... statements somewhere in the DoEverythingAndMakeCoffee(...) method. Attempts to refactor this code fail. Every path you go down leads to another problem and to more paths, then to paths branching off those, and then loops back to a previous path. After two weeks of rewriting one class, you realize that, although it may be better encapsulated, the new code is almost as tangled and confusing as the old code. It probably contains even more bugs, because the original intent of what you refactored was completely unclear, and without knowing exactly which business cases led to the original mess in the first place, you cannot be sure that you have fully reproduced the functionality. Progress is nearly nonexistent, because understanding the code base is nearly impossible, and something as innocent as renaming a variable or using the correct type produces an exponential number of unforeseen side effects.

Trying to improve code bases like the ones above is an exercise in futility. Refactoring usually ends up as an 80% rewrite anyway, and the end result is almost never 80% better. You end up with something very inconsistent, and the new code carries a lot of compromises that had to be made in order to interoperate with the legacy code (half of which turn out to be unnecessary, because the legacy code the new code had to interoperate with gets refactored later anyway). There are really only two paths to follow: keep taking on technical debt by hacking in "fixes" and modifications, hoping the application is retired (or you move on to another project) before it collapses under its own weight, or have someone make a business case and take the risk of a full rewrite. I hate both of these options, because usually it means waiting until something critical breaks or the project falls far behind schedule, and then spending the next three months of nights and weekends trying to resuscitate something that probably should never have been alive in the first place.

So how do you decide?

  • How well does the existing code work? Is it reliable and relatively free from defects?
  • Can people on my team understand what this code does with a reasonable degree of effort? If I bring in an experienced developer, can he / she learn enough about it to be productive in a reasonable amount of time?
  • Do what should be simple defects take geological amounts of time to fix, so much so that we cannot make real improvements or meet project deadlines?
  • Is the code base so fragile, and its expected lifetime so long, that the application’s ability to adapt to expected future business needs is in doubt?
  • Does the existing code meet the original functional requirements?
  • Is your organization even receptive to investing in the application, or will someone (especially someone higher up the org chart) be called to account for the problem?
  • Can you provide a financial or risk justification, supported by hard facts, to make a business case for a rewrite?
  • After fully accounting for the time and cost of a rewrite (including developing proper specifications, quality testing, post-release stabilization, and training), does it still make sense to rewrite the code? (We developers tend to think only in terms of coding time.)
  • Do you even have a choice? Can the existing code be made to meet the requirements (because if not, rewriting large swaths of it will be part of the project and counts as an "improvement" rather than a rewrite)?

+1
Jan 10 '17 at 17:47

There is an old saying that says:

There is no such thing as bad code. There is only code that does what you want and code that does not.

This is the key to understanding when to rewrite. Is the system currently doing what you want? If so, then slow but steady improvements are your best bet. If the answer is no, a rewrite is what you want.

Going back to Joel's essay, he is talking about code that is messy but is reliable software delivering the expected value. Suppose instead you have unreliable code full of serious bugs that does not cover all of your use cases, where things that were supposed to be there either do not work or are simply missing. In that case, all the little hairs growing out of it are not bug fixes but a cancer.

0
Aug 26 '15 at 0:43