
How bad is smelly code?

—Measuring the effect of code smells

By Aiko Yamashita
and Thomas Flemming

Running scientific experiments to figure out which code smells are worst for maintenance.

 

What to do with code that smells?

We have all been taught that our main responsibility as developers is to avoid writing bad code. Unless you are a lone programmer writing a few lines of Perl that you will throw away later, the most important thing is that the code you write is easy to read and understand. Maintainable code will, over the life span of the software product, often save you and your colleagues many hours of pain and frustration in front of the computer.


However, it is not always so obvious what makes code unreadable to others, which is one reason why things like coding standards, guidelines, code patterns and programming language idioms exist. One well-known resource is the book «Refactoring» by Martin Fowler[1], in which he describes a list of «smells» in the code and refactoring strategies to remove them. Yet even with this invaluable resource, we are left with the challenge not only of learning the many refactoring strategies, but also of determining which ones to prioritize. Surely, they cannot all be equally important! How do we know that the code refactoring we do now will pay off in the long run?

The problem is that Martin Fowler’s refactoring book does not say which code smells are critical and which are not. Fowler himself says there is no metric or indicator that can compete with human intuition.  That leaves us only with our intuition and experience as developers when deciding whether to refactor something or not. This can be a nightmare. Where do you start in a system with thousands of code smells? Besides, any kind of change in the code can lead to undesired side effects. Even with good, automated tests, changes often involve high risks and can become very expensive. It would be nice to know for a fact which code smells are the most harmful and get rid of them first. Also, we would like to show management that we are not wasting our time on writing «beautiful code» for the sake of it, but that the effort we put in now will pay off later in the project.

 

Collecting facts about code smells

In 2009, the Norwegian Simula Research Laboratory needed to adapt an internal system to a new content management system. The project was outsourced to six Java developers and was seen as an opportunity to study closely what effect code smells have on maintainability.

During the three months the project lasted, tools were used to measure code smells, developers were interviewed every day and each developer was instructed to install a logger in the Eclipse IDE. The logger made it possible not only to measure how much time was spent modifying each file, but also the time spent searching, browsing and scrolling through code.

During the project, the problems developers faced were registered in an issue tracker and traced back to the source files causing them. Additionally, all developers were interviewed. The resulting data set contains a lot more information than what is normally available from just analyzing code repositories. Here is what they found…

 

Finding 1: God Classes are in fact bad!

Source: https://simula.no/publications/Simula.simula.1460

«Everyone knows» that God Classes are bad, but to what extent? Look at this diagram:

Effort vs lines of code

The Y-axis shows the amount of time spent reading and/or modifying a file during the maintenance project, and the X-axis shows the size of each file in lines of code.

Note how the largest file (1844 lines of code – LOC) took 10 times longer to read and modify than the great majority of files, which were under 600 LOC. This file constituted a God Class. Also, note that 20,000 seconds is roughly 5 hours of work (that is many hours for one file!). We can also see a large file (roughly 1400 LOC) which did not take much time to edit. This latter file contained a large number of getters and setters and no logic at all (as opposed to God Classes), which explains why developers did not spend much time on it. Large files with complex logic (i.e., God Classes) may strongly impact maintainability.

Recommendation: Split large files with complex logic into smaller ones. A recommended threshold is to keep your files below 1000 lines of code.
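As a minimal, hypothetical sketch of what such a split can look like (the class and method names are invented for illustration and do not come from the study), a class that hoards parsing, validation and persistence can be broken up into smaller collaborators:

    // Before: one large class owns parsing, validation, persistence and the
    // business logic that glues them together (imagine ~1800 lines of this).
    class ReportManager {
        String parse(String raw) { return raw.trim(); /* ...pages of parsing logic... */ }
        boolean validate(String report) { return !report.isEmpty(); /* ...complex rules... */ }
        void save(String report) { /* ...database access... */ }
    }

    // After: each responsibility lives in its own, smaller class.
    class ReportParser { String parse(String raw) { return raw.trim(); } }
    class ReportValidator { boolean validate(String report) { return !report.isEmpty(); } }
    class ReportRepository { void save(String report) { /* ...database access... */ } }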

 

Finding 2: Data Clumps are not as bad as you may think…

Source: https://simula.no/publications/Simula.simula.1456

«Data Clumps» are collections of semantically unrelated methods and variables. Normally, files that contain data clumps contain different types of variables, often followed by a series of setters and getters. For example, the class Person in the following diagram

Refactoring data clumps

contains information that does not directly concern a person, and can thus be split into two classes. The researchers at Simula created a statistical model to test whether the presence of code smells increases the probability of developers facing problems during maintenance (remember that they recorded all the problems faced during maintenance and which files were causing them). They found that files containing data clumps in fact had lower chances of being involved in a problem during maintenance!
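As a hedged reconstruction of the Person example (the diagram is not reproduced here, so the field names are assumptions), the clumped fields can be extracted into a class of their own:

    // Before: address details are clumped together with the person data,
    // each with its own getter and setter.
    class Person {
        private String name;
        private String street;
        private String zipCode;
        private String city;
        // ...getters and setters for every field...
    }

    // After: the clump lives in its own class, and Person holds a reference to it.
    class Address {
        private String street;
        private String zipCode;
        private String city;
        // ...getters and setters...
    }

    class PersonAfterRefactoring {
        private String name;
        private Address address;
        // ...getters and setters...
    }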

Recommendation: Leave Data Clumps alone, unless they contain other code smells.

 

Finding 3: Stick to the Interface Segregation Principle

The Interface Segregation Principle (ISP) was introduced by Robert C. Martin («Uncle Bob») as part of the SOLID principles[2].

The ISP states that no client should be forced to depend on methods it does not use. ISP splits interfaces which are very large into smaller and more specific ones so that clients will only have to know about the methods that are of interest to them.
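As a minimal sketch (using the classic «worker» illustration rather than code from the study), splitting a wide interface means each client and implementer depends only on the role it actually needs:

    // A wide interface forces every client and implementer to know about
    // methods it may never use.
    interface Worker {
        void work();
        void eat();
    }

    // Role-specific interfaces instead: clients see only what they call.
    interface Workable { void work(); }
    interface Feedable { void eat(); }

    class Robot implements Workable {
        @Override public void work() { /* robots work but never eat */ }
    }

    class Employee implements Workable, Feedable {
        @Override public void work() { }
        @Override public void eat() { }
    }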

Researchers in software engineering[3] have come up with suggestions on how to use code metrics to identify some code smells. Some commercial tools implement them, but you could also come up with your own detection strategies and implement them in tools such as SonarQube for Java or NDepend for .NET. The detection strategy used by Simula is illustrated in the following diagram, and is based on the work by the researcher Radu Marinescu[4].

ISP detection in classes

The class in the diagram has a «Class Interface Width» (the number of public methods/attributes) of 10, a «Clients of Class» count of 8, and an «Average Use of Interface» (the number of methods/attributes actually used by its clients, divided by the number of clients) of 0.375. According to this detection strategy, this class probably violates the Interface Segregation Principle.
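To make the arithmetic concrete, here is a small sketch that computes the three metrics from the definitions above. The client/usage data and the threshold values are assumptions made up for illustration; they are not taken from the actual detection strategy:

    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    public class IspMetricsSketch {
        public static void main(String[] args) {
            int classInterfaceWidth = 10; // public methods/attributes exposed by the class

            // Which members each client actually calls (hypothetical usage data;
            // a real tool derives this from static analysis of call sites).
            Map<String, List<String>> membersUsedByClient = Map.of(
                    "ClientA", List.of("save"),
                    "ClientB", List.of("save"),
                    "ClientC", List.of("load"),
                    "ClientD", List.of("load"),
                    "ClientE", List.of("load"),
                    "ClientF", List.of("delete"),
                    "ClientG", List.of("delete"),
                    "ClientH", List.of("save", "load"));

            int clientsOfClass = membersUsedByClient.size(); // 8

            Set<String> membersActuallyUsed = new HashSet<>();
            membersUsedByClient.values().forEach(membersActuallyUsed::addAll); // {save, load, delete}

            double averageUseOfInterface =
                    (double) membersActuallyUsed.size() / clientsOfClass; // 3 / 8 = 0.375

            // Illustrative thresholds only: a wide interface, many clients and a
            // low average use together make the class a suspected ISP violator.
            boolean suspectedIspViolation =
                    classInterfaceWidth >= 10 && clientsOfClass >= 5 && averageUseOfInterface < 0.5;

            System.out.printf("CIW=%d, CC=%d, AUI=%.3f, suspected violation=%b%n",
                    classInterfaceWidth, clientsOfClass, averageUseOfInterface, suspectedIspViolation);
        }
    }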

The same analysis as for data clumps was also performed on files violating the ISP, and it was found that the odds of a file being problematic increase (with statistical significance) when the file violates this principle.

This can also be seen in the following diagram, which shows that the proportion of problematic files (approximately 30%) is higher in files violating the ISP than in files that do not violate this principle.

ISP violations vs problems

Recommendation: Splitting wide, multi-purpose interfaces can reduce the risk of maintenance problems.

 

Finding 4: Code smells «team up»

Source: https://simula.no/publications/Simula.simula.1508

In the same experiment, certain code smells showed a tendency to appear together in the same file. The diagram below groups the code smells that were often found together. The «hoarders» are essentially those files that want to hoard all the functionality of the system. The «confounders» are smells that appear in files that are often confusing. The other two groups, «wide interfaces» and «data containers», are hopefully self-explanatory.

Relations between problems

Recommendation: If you find a certain type of smell in a file, check whether the file also contains any of its «companions». A combination of code smells in a single file increases risk and reduces maintainability!

 

TL;DR

As Fowler says, we should follow our instinct, experience and judgment when deciding what to refactor, but here are some guidelines based on scientific studies that can help you prioritize:

  1. Split large files (>1000 lines of code) containing extensive logic
  2. It is not critical to refactor data clumps
  3. Split wide, multipurpose interfaces into separate interfaces for each purpose
  4. Files containing certain smells may also come with more «unwanted buddies», so keep an eye on them!

Happy refactoring! :-D


References
[1]: http://www.amazon.com/Refactoring-Improving-Design-Existing-Code/dp/0201485672
[2]: http://www.amazon.com/Software-Development-Principles-Patterns-Practices/dp/0135974445
[3]: http://sewiki.iai.uni-bonn.de/research/cultivate/tutorial_exploring_smells_and_metrics
[4]: http://loose.upt.ro./download/thesis/thesis.zip


About Aiko Yamashita

Systems consultant and researcher. Works with systems development and quality analysis of projects/source code. Interests include source code analysis, technical debt, and empirical software engineering.

Comments (17)

  • How bad is smelly code? Measuring the effect of bad code. | Rocketboom says:

    [...] How bad is smelly code? Measuring the effect of bad code. Source: http://fagblogg.mesan.no/how-bad-is-smelly-code/ [...]

  • Abraham Marín-Pérez (@AbrahamMarin) says:

    Not bad, I like the idea of scientifically measuring how impactful code smells are, although to be truly scientific we shouldn’t make recommendations based on one experiment. It could be that data clumps aren’t bad in this particular project because the data model complexity is low, or it could be that god classes are bad for this particular team because they had small screens and scrolling up and down was particularly hard. In order to make a proper assessment or recommendation, more projects would need to be investigated.


    Aiko Yamashita Reply:

    Hi Abraham!
    Yes, I absolutely agree with you that more experimental replications are needed to be more confident about the effects of code smells, and to know in which situations/contexts they are dangerous.

    As we mentioned in this article, we suggest some *guidelines* based on empirical data, and they are by no means meant to be rules that people should follow blindly (that should not be done with either «gurus’» opinions or academic work).

    When it comes to this particular experiment, or «controlled case study» (as I like to call it), it was based on a maintenance project involving four Java web applications. The systems maintained used DTOs (data transfer objects) as the means for transferring data, and as such they come ‘naturally’ with the J2EE framework. Most DTOs were considered either Data Classes or Data Clumps by the smell detection tool, but in this context they did not cause much trouble (partly because they are *meant* to be there).

    In the sources indicated in the reference section you can find more details of the characteristics of the project that was observed as part of the study, so you can try to see if the context matches your current project. Then it would be easier to ‘extrapolate’ these results and see if they apply to your context :)


  • Vidar says:

    Very nice writeup!


    Aiko Yamashita Reply:

    Thank you! :) planning for some more soon!


  • Tim says:

    I always start with separating long functions into smaller, simpler ones. Some of the nastiest problems go away, at least enough, once the code can actually be read. It also means that refactoring actual smells out down the line is much easier.


    Aiko Yamashita Reply:

    Yes, you are right. There are mainly two types of refactoring: the simple ones known as «floss» refactoring, and the more complex ones known by some as «root canal» refactoring. The advantage of the first type is that it tends to carry low risk in terms of potential cost and side effects, as opposed to the latter. One of the reasons XP advocates so strongly for constant floss refactoring is to avoid incurring high-risk, complex refactoring, but it is rather hard to say whether this rationale holds in all cases.. I think that in some cases you will not be able to avoid the ‘root canal’ treatment :)


  • Jeff Darcy says:

    Very nice. I’d also add another benefit of splitting large files into multiple smaller ones: clearer revision history making it obvious at a glance which *parts* of a file require frequent change. Such frequent change is itself a code smell, so this helps identify candidates for subsequent rounds of refactoring.


    Aiko Yamashita Reply:

    That is a really interesting comment, because in the same analysis where we looked at ISP violators and Data Clumps, we also found that the churn size (# LOC added + deleted + modified) of a file is actually a good indicator of problematic files!

    So: another way to decide which files to refactor is to do some repository mining and find which files have a high number of revisions (which, as you mentioned, has the advantage of being easier to calculate), or alternatively, use churn size.
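
    As a rough illustration of that kind of mining (assuming a Git repository; note that git’s numstat counts a modified line as one addition plus one deletion, so this only approximates churn):

        import java.io.BufferedReader;
        import java.io.InputStreamReader;
        import java.util.HashMap;
        import java.util.Map;

        // Sketch: sum per-file churn (lines added + deleted) from `git log --numstat`
        // and print the ten most churned files.
        public class ChurnMiner {
            public static void main(String[] args) throws Exception {
                Process git = new ProcessBuilder("git", "log", "--numstat", "--pretty=format:")
                        .redirectErrorStream(true)
                        .start();

                Map<String, Long> churnByFile = new HashMap<>();
                try (BufferedReader out = new BufferedReader(new InputStreamReader(git.getInputStream()))) {
                    String line;
                    while ((line = out.readLine()) != null) {
                        String[] cols = line.split("\t");
                        if (cols.length != 3 || cols[0].equals("-")) continue; // skip blanks and binary files
                        long churn = Long.parseLong(cols[0]) + Long.parseLong(cols[1]);
                        churnByFile.merge(cols[2], churn, Long::sum);
                    }
                }
                git.waitFor();

                churnByFile.entrySet().stream()
                        .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                        .limit(10)
                        .forEach(e -> System.out.println(e.getValue() + "\t" + e.getKey()));
            }
        }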


  • Ben says:

    Hello Aiko, very informative article; I’m curious though, obviously these are suggestions for improving the state of a given codebase, but what would you suggest (with regards to similar improvements) when the codebase has a very large amount of inertia (i.e. a C/C++ front-end with numerous files, the majority of which span tens of thousands of LOC)?


    Aiko Yamashita Reply:

    Hi Ben,

    That is quite a question! There are no obvious or straightforward answers, I think.. but that is also a reason why maintainability evaluations should always consider contextual information, such as the scope of the maintenance (e.g. keeping the system up and running vs. adapting it to another environment), or the technology available for conducting the maintenance (e.g. the level of automation).

    What you say about ‘inertia’ is that the software product has reached a certain level of ‘stability’ and does not need to be modified very often.. Then one cannot expect a huge ROI from refactoring away the big files. However, I have seen many owners of complex, gigantic products that reach a point where they struggle to keep up with time to market, and their situation is aggravated by short iterations, large teams, and inexperienced developers joining the teams (see: http://simula.no/research/se/publications/Simula.se.757/simula_pdf_file ). In these situations, ‘simple’ refactorings would never do.. you would probably need a combo including dependency analysis and design recovery just to see ‘what is happening’ and start from there. Another way to ‘prioritise’ areas/modules is to do repository mining and look into revision numbers, churn size and commits associated with defects.


  • Lars Reed says:

    Some of our foreign readers wonder about the name fagblogg. This is a Norwegian term, and basically means that here we blog about «fag» — our «main subject» / «professional interests» (as opposed to the company’s social activities, recruitment activities etc).


    gumaflux Reply:

    Phew thanks for clarifying that, I almost thought..


  • Enlaces semana 38 | Dev attitude says:

    [...] How bad is smelly code?: Summarizing the conclusions of an empirical study on code smells and how they affect code maintainability. It is good not to rely on intuition, and instead to show with data that bad coding practices are indeed bad because they lead to problems in the long run. [...]

  • Top Posts of KW #38 – kushellig says:

    [...] “How bad is smelly code?” by Aiko Yamashita [...]

  • Ruslan Shestopalyuk says:

    A particularly interesting finding here is about «data clumps», in my opinion.

    I think there is an explanation for the phenomenon:
    • “Semantically unrelated” is a red herring here. The “semantics” are supposed to be flexible: it is they that bend to the domain/object model at hand, not the other way around.
    • People have had experience with such “data clumps” for ages: they are called “database tables”, and they fit programmers’ mental models quite well.
    • And, in my opinion, the most important thing: more than anything else it has to do with *mutable state and side effects* and how implicit/explicit they are.

    Somewhat counterintuitively, “clumps” of mutable state often make the program flow easier to comprehend, as opposed to the same state thinly spread over a million objects, with implicit/hidden dependencies between the data.

    That does not mean, of course, that we should all go back to procedural programming (even though there is nothing wrong with it, per se).
    But the other extreme is not good either: an overly formal/dogmatic application of OOP/OOD principles causes absurd problem decomposition that makes the program very hard to reason about.

    I think this “surprising” finding reveals a “meta-smell” in the conventional way of approaching programming as a human activity.

    At the present point of its evolution, anyway.


  • Technical Debt and Actionable Advice | Cloudspace | Blog says:

    [...] a great writeup by Aiko Yamashita and Thomas Flemming, they dissect a few popular issues that devel…, and I think this is exactly what everyone [...]