In the last couple of years, project complexity has slowly (and recently not so slowly) risen to a level where the previous ways of dealing with it no longer seem effective. In this first part, I will share some of the reasons why I think complexity is here to stay and why I think it will continue to raise the bar on what acceptable software means.
Complexity is primarily linked to Moore's law and to the incredible growth in computing power available today. This growth has allowed software systems to tackle increasingly complex problems with great ease, even though programming paradigms have evolved at a significantly slower pace. Although multicore processors have been around for some time, we have only recently started to feel the jump in software complexity that concurrent programming brings.
Now that Moore's law has hit a plateau, and it looks like no one is willing to invest in giving single cores more processing power (some are even stripping them of the power they have), we face the dilemma of how to scale our applications easily. There seems to be no easy answer to this question, at least none that we can adopt while keeping our current programming style and paradigms, which are mostly suited to single-core applications.
The increases in processing power over the past decade have also raised the bar on what the business expects applications to do. One of the most interesting changes we've seen is that software needs to be a lot more malleable and amenable to a fast pace of change. This is probably the main driving force behind Agile and the push for tighter feedback loops.
So, the reason for complexity is twofold. First, the business needs ever faster responses to requirement changes, as illustrated by Agile methodologies having taken over pretty much all projects and software shops. I have not heard of anyone brave enough to attempt a waterfall methodology on a complex project these days. Second, we have to deal with massively concurrent applications, which add a lot to the complexity of the craft of software development; we can see this in the resurgence of functional programming paradigms in most (if not all) mainstream languages.
Understanding why old approaches fail so miserably when applied to very dynamic projects requires a better understanding of what complexity is and how it can be classified. One such model is the Cynefin Framework, which classifies complexity into four domains, each with its own way of managing it. It is worth noting, though, that what works in one complexity domain does not necessarily work in another.
The idea behind these complexity domains is that complexity should be driven down from complex to complicated and maybe even simple domains. But the way you turn a complex problem into a complicated one is different from how you change a complicated problem into a simple one. In the end, it is all about constraints. Constraining a complex domain yields a complicated one and, if you constrain it further, it yields a simple domain.
This can also happen the other way around (due to a chaotic event, a change in specifications or direction, or all of the above), in which case you should watch for the signs that let you properly assess the complexity domain you are dealing with, and treat it accordingly.
What follows is a short description of the complexity domains as defined by the Cynefin Framework.
Simple contexts are characterised by the fact that the correct answer to the problem is obvious. Examples might be writing a Java Bean, or anything else that can only be done one way. This is the domain of Best Practices, where the causality of an issue is clearly understood.
At this level, you solve a problem by first assessing the situation, then categorising it, and then responding to it in a preset manner.
As far as programming goes, these are easy tasks that almost anyone who can follow a checklist can do: think of the kind of code that anyone (even children or non-programmers) can write.
The complicated domain is a bit less restrictive than the simple domain and allows for alternative solutions to a problem. This is the domain of expert knowledge. Since there can be more than one good answer, this is the domain of Good Practices. There is a causal link here as well, but it is obscured by the multitude of possible solutions and is definitely not as clear as in the simple domain.
Here we first assess the situation; then, since it is not obvious which solution is best, domain experts analyse it; finally, we respond by implementing the solution agreed upon.
As far as programming tasks go, we can place here repetitive work that still requires some analysis of the situation. A good example would be designing a CRUD application: this requires knowing a couple of frameworks, databases, etc., but there are a lot of established Good Practices that one can follow.
At this level, the requirements are fairly stable and one can still get away with a waterfall approach. Writing device drivers, for example, is a complicated task: drivers are quite complicated, but there are a lot of good practices that can guide you to a good solution, and the specifications will not change much since they are tied to the hardware interface.
The complex domain is one we have started to see a lot more of lately. At this level, requirements change fast in response to external pressure. Any change to the specification causes a feature to be developed, and once implemented, that feature may change the requirements again (after its impact on the user base or on other components of the application has been analysed). This is the domain of emerging design, and we can only talk about causality in hindsight.
At this point, the waterfall approach stops working because of its very large feedback loops, and Agile methodologies start to gain a lot of traction. Everything is geared towards supporting fast-changing requirements.
The biggest problem with complex contexts is that we are only now beginning to transition to them, and we tend to treat them as merely complicated. Good signs that this is happening are attempts to enforce a specification document, the emergence of lots of rules and regulations trying to control the apparent chaos, and frustration at not understanding how the system is expected to evolve and at not having a handle on things. Despite all this, complex systems require exploratory drilling, and you need to treat them as such.
In complex domains you need to encourage exploratory drilling. This means you need to set up your project, application and process to allow for multiple cheap failures on the way to the desired end result. In practice, you launch an experiment, assess its results and then, if you consider them satisfying, integrate the solution into the application.
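To make the experiment, assess, integrate loop a little more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the author's method. `Experiment`, `assess` and `explore` are hypothetical names, and a "satisfying" result is modelled simply as scoring better than the current solution on a few sample inputs.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Impl = Callable[[int], int]          # a candidate implementation of some behaviour
Score = Callable[[int, int], float]  # (input, output) -> how good the output is

@dataclass
class Experiment:
    """Hypothetical candidate solution: cheap to add, cheap to throw away."""
    name: str
    run: Impl

def assess(run: Impl, inputs: Sequence[int], score: Score) -> float:
    """Average score of an implementation over some sample inputs."""
    return sum(score(x, run(x)) for x in inputs) / len(inputs)

def explore(baseline: Impl, experiments: Sequence[Experiment],
            inputs: Sequence[int], score: Score) -> Tuple[Impl, List[str]]:
    """Launch each experiment, assess its results and integrate it only if it
    beats the current solution; failed experiments are simply discarded."""
    current, best = baseline, assess(baseline, inputs, score)
    kept: List[str] = []
    for exp in experiments:
        result = assess(exp.run, inputs, score)
        if result > best:  # satisfying outcome: integrate it
            current, best = exp.run, result
            kept.append(exp.name)
    return current, kept
```

Each failed experiment costs one small function and one assessment, and that cheapness is exactly what the project setup has to make possible.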
To achieve this, you need a couple of behaviours implemented:
The chaotic domain comprises exceptional circumstances. Such situations are fairly rare and can be extremely dangerous for your application and business. For example, discovering a huge security bug in your running system can throw it into chaos.
Under such circumstances, the first thing you should do is control the damage. This means shutting down the servers, pulling the network cable, calling your lawyers: whatever you can do to minimise the impact of the incident.
As a follow-up to a chaotic context switch (if the company survives it), process improvement and innovation can come without opposition, since everyone will readily accept change to prevent such events from happening again.
At the time of their inception, Agile practices were born precisely to alleviate some of the problems raised by the complicated and, to a lesser extent, the complex domains. At that time, the preferred methodology would have been something akin to waterfall, which was beginning to fail due to the rise in complexity of the software being developed (if it ever worked at all).
The main problem with waterfall was the 'specification first' approach to software development and the fairly long release cycles. While these can surely work for both simple and complicated domains, the large feedback loop prevented client involvement and led to major rewrites of software from version to version.
The waterfall method was (and still is) a very comfortable and intuitive approach that creates a false sense of security, since everyone goes about their jobs, performing admirably, according to specifications. However, when the deadline comes and the software is not exactly what the customer dreamed about, everyone has an excuse. This is pretty much why the process is comfortable: you are not accountable for failure. No one is.
As opposed to the waterfall methodology, the Agile approach assumes time-boxed incremental changes, which are a much better match for the problems of the complex domain, and even of the complicated domain, where the tight feedback loop yields a more accurate picture of the client's expectations regarding the solution.
This tight feedback loop and the idea of non-final, changeable requirements are the main reasons why Agile has flourished in both complex and complicated domains. In the complex context, you cannot know the solution to the problem before solving it. All you can do is apply incremental changes, assess their outcomes and keep them or throw them away. Your solution will emerge from all these experiments at some point. This is basically why Agile was such an overwhelming success initially.
Early success with Agile, however, bred complacency, and people came to think the process was an easy one. For simple and complicated domains it obviously is, because in these contexts we can rely on slowly changing requirements and somewhat clear cause-and-effect connections.
This complacency turned Agile into a set of good practices, and everyone relaxed because they now understood Agile: you just need stand-ups and sprint retrospectives, and you can forget all about the forces that led to the creation of Agile.
Incidentally, the same happened with TDD. Because most projects were simple or complicated, TDD was reduced to the requirement of having a set of unit tests for your code that gives an acceptable degree of assurance as to its correctness. And it got really easy: you just need (close to) 100% coverage, with no need to think too much about how you structure the production or test code.
And this felt professional and all was good with the world. But then change happened.
You can see this change happening because a lot of people have started saying that Agile, TDD and all those nice little practices that made us feel very professional are failing badly. So what is happening?
Software started to need to solve complex problems, and for the first time we had the hardware that could do it. The downside is that we lacked a clear understanding of the complexity model behind our requirements and assumed that we could keep doing what we did for complicated problems and it would just work. But it could never have worked.
Complex problems are very different in nature from complicated problems because of the change in the causality chain: in complex problems, we can only see causality in hindsight. While this seems clear now, it did not seem so obvious back then. It was like trying to touch the rainbow. You fail a project, learn your lesson, establish a new good practice that would prevent the same chain of cause and effect from happening again, and then try again. Unfortunately, the nature of complex problems offers no guarantee whatsoever that such an approach will work, which causes endless frustration.
Amusingly, though, the Agile and TDD practices could have been used to help solve complex problems in their original, less institutionalised form, had the forces that make them work been clearly understood. Instead, the current domesticated Agile and TDD rule books started to fail badly, and there is no question as to why.
Let's analyse the main beef people seem to have with TDD. If your software is fully covered by unit tests, then changing its behaviour will cause tests to fail. How many will fail? That depends on the change and on the battery of tests you have written. Note that in simple and complicated domains unit tests are good, since those domains are almost immune to change and solidifying your code base IS a good idea (though these tests should be viewed as a deterrent to change rather than an enforcer of correctness).
When the software is trying to solve a complex problem, especially one whose solution is not yet clear, the code needs to change a lot, and failure and experimentation need to be cheap. Why would anyone use unit tests here? The only sensible answer is: because of best practices, which obviously don't apply to complex domains.
TDD was not initially about unit testing. The emphasis, especially for newborn software, was first on functional testing (which, in my book, is behavioural testing). These tests are magic in the sense that they do not test units of code but rather behaviours across units of code. I wonder how this would work for complex systems.
Setting up functional tests provides precisely the barriers that you need when developing software in the complex domain. Tests are sometimes called executable specifications.
Functional tests do not impede changes to the code base; rather, they encourage them. They make sure that your code respects the functionality agreed upon so far (ease of software change was one of the goals of TDD, one that now seems long forgotten).
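A small Python sketch can show the difference between a unit test and a behavioural test. The order-total feature, its discount and tax rules, and all the function names below are hypothetical. Version 2 is a refactoring of version 1 that merges its internal units. After that change, the unit test pinned to an internal helper has nothing left to test. The behavioural test keeps passing for both versions because it only checks the agreed behaviour.

```python
from typing import Callable, List

# Hypothetical feature: 10% discount on subtotals over 100, then 20% tax,
# rounded to cents.

# Version 1: two small internal units composed by the public function.
def _discount_v1(amount: float) -> float:
    return amount * 0.9 if amount > 100 else amount

def _tax_v1(amount: float) -> float:
    return amount * 1.2

def order_total_v1(items: List[float]) -> float:
    return round(_tax_v1(_discount_v1(sum(items))), 2)

# Version 2: a refactoring that merges the units; same external behaviour.
def order_total_v2(items: List[float]) -> float:
    subtotal = sum(items)
    rate = 0.9 if subtotal > 100 else 1.0
    return round(subtotal * rate * 1.2, 2)

# Unit test: pinned to an internal unit of version 1. After the refactoring
# it has nothing to test and must be rewritten or deleted.
def unit_test_discount_v1() -> None:
    assert _discount_v1(200) == 180

# Behavioural (functional) test: checks the agreed behaviour across units,
# so it passes for any implementation that honours it.
def behaviour_test(order_total: Callable[[List[float]], float]) -> None:
    assert order_total([50, 30]) == 96.0     # 80 * 1.2, no discount
    assert order_total([100, 100]) == 216.0  # 200 * 0.9 * 1.2
```

The behavioural test acts as the barrier described above. It pins down what the feature does, and leaves the code free to be restructured however the next experiment requires.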
The test-first approach of TDD provides an initial structure (a barrier) for the software feature you are trying to implement. The emergent nature of the Red / Green / Refactor cycle fits the “emerging practices” of solving complex problems like a glove.
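One Red / Green / Refactor cycle might look like the following minimal Python sketch. The `slugify` feature and its expected outputs are hypothetical examples. The point is the sequence. The test is written first (Red) and then left untouched. The first passing version (Green) is then restructured (Refactor) without changing its behaviour.

```python
from typing import Callable

# Red: the test is written first, as an executable specification of the
# behaviour agreed upon so far. It fails until an implementation exists.
def check_slugify(slugify: Callable[[str], str]) -> None:
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Complex   Domains! ") == "complex-domains"
    assert slugify("!!!") == ""

# Green: the quickest code that makes the test pass.
def slugify_green(text: str) -> str:
    words = []
    for word in text.lower().split():
        cleaned = ""
        for ch in word:
            if ch.isalnum():
                cleaned += ch
        if cleaned:
            words.append(cleaned)
    return "-".join(words)

# Refactor: same behaviour, with the per-word cleanup extracted into its own
# small unit. The test from the Red step guards the change.
def _clean_word(word: str) -> str:
    return "".join(ch for ch in word if ch.isalnum())

def slugify(text: str) -> str:
    words = (_clean_word(w) for w in text.lower().split())
    return "-".join(w for w in words if w)
```

Because `check_slugify` passes for both versions, the refactoring can be done immediately rather than deferred.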
The Refactor part of the TDD cycle is what usually gets postponed in most teams, hence ‘consolidation sprints’. This is a mistake. In complex scenarios, your code should always be in top shape and properly abstracted. How can you run cheap experiments if your code is like a bowl of spaghetti?
It surely does look like TDD would be a perfect fit for the complex domain. But you cannot treat TDD as a rule book and expect results in this domain. You need to practise TDD while being aware of the forces that led to its creation and of what your testing strategy is meant to achieve. And you should always have a testing strategy.
Furthermore, it is important to start loving the Red / Green and especially the Refactor phases. Refactoring is the only thing that will make your code easier to compose. It will never solve every problem: a major architectural change will still take time to implement. But with consistent refactoring, whenever there is a pattern to the changes (or once one emerges), your code will mirror it precisely, and cheap exploratory coding becomes a lot easier.
From the very beginning, Agile practices were based on tested and testable code. If you have an agile process and untested code, then either you are probably not very agile (see Flaccid Scrum), or the project is rather new, or it is in the simple (or maybe complicated) domain.
One recent complaint about agile practices went along the lines of 'programmers should write code, not waste their time in senseless standup meetings'. This assumes that you are applying the process to a simple or complicated problem, since writing code assumes that you know exactly what that code should do.
This is not the case in the complex domain, which requires cheap failure and has the specification changing according to the success or failure of various experiments. A complex domain also requires you to structure your code into reusable pieces. How can you know that you are not writing the same kind of functionality as one of your teammates if you don't talk to each other?
In complex domains, communication should be strongly encouraged; getting the team together daily for a 20-minute standup, rather than having people interrupt each other at random intervals throughout the day, is simply an optimisation.
If standup meetings are held just because your practice requires it and you have a total disregard of the problem they try to solve, then yes, they are wasted time.
Another fairly good argument for why Agile fails in complex domains concerns the sprint planning session, where you need to estimate the time it takes to implement a feature. Since the nature of complex domains is experimenting with code, this seems a bit counter-intuitive (you can't accurately make predictions about features in this context). However, I do not think that is the point in practice. I believe sprint planning is meant to give a somewhat accurate idea of where we are and where we plan to go next, and it achieves that wonderfully.
We can no longer afford to believe in the magic of sacred words like TDD or Agile. As the complexity of software grows, the whole software development team has to understand the forces that gave birth to TDD and Agile and how to apply them accordingly and with constant scrutiny.
We have a model for complexity that explains why this is true and why we won't be able to replicate our past successes in the current world using the same tricks. Is this model accurate? We do not know for sure yet, but the way it has been received and used shows some interesting correlations, at the very least.
Complacency about your success creates all kinds of problems for you and for the industry as a whole. Don't get too comfortable in your world, and always try to learn something completely new, preferably something scary. Right now, complacency is about to get kicked out the door. If you want to join the new wave of programmers and programming paradigms, start learning now.
by Ovidiu Mățan
by Cristian Raț
by Mircea Vădan
by Călin Biriș