Command Query Responsibility Segregation

Andrei Păcurariu
Software architect
@Endava

CQRS is an architectural pattern that recommends separating the responsibility for processing commands from the responsibility for answering queries. Consequently, the pattern holds that reads and writes do not need to share the same data store, or even the same technology.

The separate treatment of the two kinds of responsibilities should be implemented in two different layers of the application.

Rationale

The pattern recognizes that query and command processing are fundamentally different.

Command processing sometimes follows extremely complex business rules that dictate the valid combinations of data manipulations - for command processing, consistency constraints and integrity are of the essence.
When a SQL database is used for storage, normalization is very important for command processing.
Command processing depends on the transactional consistency of data.
Command processing can be asynchronous - which is sometimes a great thing to achieve.
Queries require no integrity constraints and therefore no normalization, so de-normalized databases are a great choice, since they make queries faster.
Queries require complex filters and aggregations of data for the benefit of the user interface.

Taking into account the statements above, it makes sense to consider using two different models, one for enforcing business rules and one for presenting information.

An important thing to note is that it is NOT forbidden for the command processing layer to read data. Reading data is almost always required in order to implement the command layer! What is important, however, is to recognize that the read requirements of the command processing layer are fundamentally different from those of the query layer. For instance, reads performed by the command layer are generally against the primary key of the entities, and the read logic itself is hard coded in the command processing layer. The query layer, by contrast, needs to provide a flexible filter mechanism to allow users to find the data they need.
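The difference between the two kinds of reads can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reference implementation: the Order entity, the ship command and the in-memory stores are hypothetical names invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Order:
    """Write-side entity (hypothetical example)."""
    order_id: int
    customer: str
    total: float
    shipped: bool = False


class OrderCommandHandler:
    """Command side: loads one entity by primary key, enforces a rule, saves."""

    def __init__(self, store: dict[int, Order]) -> None:
        self._store = store

    def ship(self, order_id: int) -> None:
        order = self._store[order_id]  # hard-coded read by primary key
        if order.shipped:
            raise ValueError("order already shipped")
        order.shipped = True


class OrderQueries:
    """Query side: read-only, flexible filtering over flat rows."""

    def __init__(self, rows: list[dict]) -> None:
        self._rows = rows

    def find(self, **filters) -> list[dict]:
        # Any combination of column=value filters may be supplied by the UI.
        return [
            row for row in self._rows
            if all(row.get(key) == value for key, value in filters.items())
        ]
```

The command handler never needs a search facility, and the query class exposes no way to modify anything, which mirrors the split described above.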

As an implementation hint, when SQL databases are used for storage, the indexes of the command processing database are probably completely different from those of the query database. Furthermore, the command processing DB has no need for historic data, which can be safely deleted from it in order to improve performance. On the other hand, the query DB will probably maintain historic data, at least for a reasonable amount of time.

To go further down this road, the query database itself will probably not maintain all the historic data either, but only commonly used historic data. What is commonly used historic data and how old it might get depends entirely on the application, but truly old historic data, the kind only ever likely to appear in reports, would probably be best stored in a reporting database which is only seldom used.

Going back to what CQRS does and does not forbid: what IS forbidden is for the query layer to perform any data modification (write access) at all! Remember the de-normalized database and the assumption that the query layer does not depend on integrity constraints and transactional consistency.

Another argument behind the pattern is the observation that reads and writes do not occur with the same frequency. Users usually issue a lot of read requests for each write command, if they issue any write commands at all. Therefore, it makes sense that the read side of the application might need to scale a lot more than the write side.

Furthermore, there is no real need to present data from the same database used for writing. This is because presented data does not necessarily need to be up to date instantly, and it usually isn't. Even though the current data is read from the database, once presented on the screen, it is already old and potentially out of date.

High-load applications use caching anyway (unfortunately as a performance afterthought rather than as an architectural decision). So, we might as well accept the cache and the fact that the presented data might be stale, and incorporate these facts into our architecture. Accepting this would allow for a lot of simplification and performance improvement, as described in what follows.

Why implement CQRS

Besides the ability to scale read and write access independently, there are other important benefits, as follows:

It allows the command processing side to be simple and specialized for business rule enforcement, without concern for the needs of the UI.
It allows the query side to be simple and specialized for filtering and aggregations of data in the interest of UI, without constraining it to the model used for writing.
It allows the development of the UI and query side to be done in parallel with the development of the command processing side.
It allows the query and the command processing layers to evolve differently in time.
It allows the use of specialized technologies for different layers. So, the command processing layer might use an ORM (Object-Relational Mapper), like Entity Framework, but the query layer does not necessarily need a full-fledged ORM - it might settle for a micro-ORM such as Dapper or no ORM at all. This is because the query layer does not need an identity map or change tracking, and those features only reduce the query speed!
Because it forces the segregation of query and command processing responsibilities, the architecture is more easily migrated to multi-tier by moving, for example, the command processing layer to a service.
Because of this separation, it is also possible to have more senior developers work on the domain model of the command processing side, while having more junior developers work on the data acquisition for UI presentation (aka query layer) side.
Using CQRS will probably make an application more Azure-friendly, since it could use Table Storage for command processing and SQL Database for the query layer. Table Storage is not very suitable for reporting-type scenarios such as the UI, but it is very useful (and cheap) for command processing. Generally, I reckon, the partition key and row key that Table Storage provides are sufficient for accessing data on the command processing side.
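The point about the query layer not needing a full-fledged ORM can be illustrated with a minimal sketch. It assumes Python's standard sqlite3 module as a stand-in for a micro-ORM such as Dapper; the order_summary table, the OrderSummary class and the sample rows are hypothetical and exist only for the example.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderSummary:
    """Read model: a flat, immutable row shaped for the UI (hypothetical)."""
    order_id: int
    customer: str
    total: float


def fetch_order_summaries(conn: sqlite3.Connection, min_total: float) -> list[OrderSummary]:
    # Plain SQL mapped straight onto the read model: no identity map,
    # no change tracking, nothing to flush back to the database.
    rows = conn.execute(
        "SELECT order_id, customer, total FROM order_summary "
        "WHERE total >= ? ORDER BY order_id",
        (min_total,),
    )
    return [OrderSummary(*row) for row in rows]


def demo() -> list[OrderSummary]:
    """Builds a throwaway in-memory table and queries it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE order_summary "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
    )
    conn.executemany(
        "INSERT INTO order_summary VALUES (?, ?, ?)",
        [(1, "Ana", 40.0), (2, "Dan", 120.0), (3, "Ioana", 75.5)],
    )
    return fetch_order_summaries(conn, 50.0)
```

Because the returned objects are frozen and detached from the connection, the query layer has no path by which it could write changes back.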

So, when not to use CQRS?

Well, you probably wouldn't want to use it when implementing very simple applications, or applications where the write model is similar to the read model or easily adaptable to it. Furthermore, and more importantly, skipping CQRS only makes sense when you don't expect future changes that increase the complexity of the business layer to a degree that maintaining both models (read and write) in the same classes becomes unwieldy - although, practically, how would you be able to predict this in an agile environment?

Nonetheless, except for extremely simple, slowly evolving (or not evolving at all) applications with little business layer complexity, it might make sense to plan ahead and separate the two models and the two responsibilities, even without using two databases. This is a strategy that I have used myself - implementing two models: a full-fledged domain model for the command processing side, and a read model (basically a view-model in MVC parlance) for the query side, but using the same DB. I did this because I felt that CQRS is more about separating the responsibilities of command processing and query than about separating the physical data stores. The query layer would use views while the command processing layer would use tables. I got all the benefits of CQRS except differential scaling for reads and writes, but also without the effort of synchronizing the query DBs with the command processing DB. If scaling were required, I could easily create a query DB with tables identical to the currently implemented views and just add an ETL (Extract, Transform and Load) mechanism to sync it with the command processing DB. I must highlight, however, that moving from views in the same DB to tables in a different DB kept in sync via asynchronous ETL would mean that the application can no longer assume that data changes are visible to queries as soon as a command has been processed!

Fig. 2 - Our custom implementation of CQRS
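The single-database variant described above (tables for the command side, views for the query side) can be sketched with SQLite from Python's standard library. This is an assumption-laden illustration, not the actual implementation behind the figure: the customer and purchase tables, the customer_totals view and the function names are all hypothetical.

```python
from __future__ import annotations

import sqlite3

SCHEMA = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE purchase (
    purchase_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount      REAL NOT NULL CHECK (amount > 0)
);
-- Read model: a de-normalized view shaped for the UI.
CREATE VIEW customer_totals AS
SELECT c.customer_id, c.name,
       COUNT(p.purchase_id)       AS purchases,
       COALESCE(SUM(p.amount), 0) AS total_spent
FROM customer c
LEFT JOIN purchase p ON p.customer_id = c.customer_id
GROUP BY c.customer_id, c.name;
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def record_purchase(conn: sqlite3.Connection, customer_id: int, amount: float) -> None:
    """Command side: writes to the normalized tables inside a transaction."""
    with conn:  # commits on success, rolls back if a constraint fails
        conn.execute(
            "INSERT INTO purchase (customer_id, amount) VALUES (?, ?)",
            (customer_id, amount),
        )


def customer_totals(conn: sqlite3.Connection) -> list[tuple]:
    """Query side: reads only from the view, never from the tables."""
    return conn.execute(
        "SELECT name, purchases, total_spent FROM customer_totals ORDER BY name"
    ).fetchall()
```

Since the view lives in the same database, a purchase is visible to customer_totals as soon as the transaction commits. Replacing the view with a table in a separate database fed by ETL would keep the query code unchanged but give up that immediacy.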

How does CQRS fit with agile practices

For agile practices to be applied successfully in the development of a product, it is not enough to apply the chosen agile process (Scrum, Kanban, etc.) well; the architecture must also allow the product to evolve - sometimes quite unexpectedly!

A rigid architecture, high coupling, low cohesion or a great degree of complexity all contribute to what I call application inertia. As in physics, inertia is the application's resistance to change. Obviously, this defeats any attempt at agility at the development level, reducing the agile principles to a superficial exercise.

Having said that, CQRS helps a great deal here, and this alone may justify its use.

CQRS helps by allowing us to keep two disconnected models in the same application:

A domain model that enforces the business rules
A read model that is optimized for what the application actually needs to show on the UI

The great thing is that these two models, each handled by its corresponding layer (the command processing layer and the query layer respectively), can evolve independently and have no connection to one another. Therefore, there is no coupling between them, and each is concerned only with its specific purpose and thus has high cohesion. Low coupling and high cohesion are paramount for the application to be able to evolve with agility.

So, if a new requirement in business rules comes along, probably only the domain model in the command processing layer will be affected. Similarly, if a new requirement in presentation comes along, probably only the read model in the query layer will be affected. But even if new requirements touch both models in both layers, each layer evolves independently to handle them. This is still better than having to change one super complex model that carries both responsibilities!

Furthermore, agility is not only about handling requirements on an existing product but also about the way we go about developing a new product. CQRS helps us here as well, since it allows us to easily mock the command processing side while still enabling the creation of the read model, with a mock query layer to support it. Features can then easily be added on top of that framework as requirements unfold.

Lastly, the ability to easily mock parts of the application is a prerequisite for good unit test coverage. This allows for TDD (Test-Driven Development) or BDD (Behavior-Driven Development), which provide a good (if not mandatory) foundation for any agile development approach.
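As a rough illustration of how a mock query layer supports this, the sketch below defines a query contract, a fake implementation backed by canned rows, and a piece of UI-side code that can be tested against the fake. All names (ProductQueries, FakeProductQueries, render_search_page) are hypothetical and chosen only for the example.

```python
from __future__ import annotations

from typing import Protocol


class ProductQueries(Protocol):
    """Contract the UI depends on (hypothetical)."""

    def search(self, text: str) -> list[dict]: ...


class FakeProductQueries:
    """Stand-in query layer backed by canned rows; no database required."""

    def __init__(self, rows: list[dict]) -> None:
        self._rows = rows

    def search(self, text: str) -> list[dict]:
        return [r for r in self._rows if text.lower() in r["name"].lower()]


def render_search_page(queries: ProductQueries, text: str) -> str:
    """UI-side code under test: it only knows the query contract."""
    rows = queries.search(text)
    if not rows:
        return "No products found"
    return "\n".join(f"{r['name']}: {r['price']:.2f}" for r in rows)
```

A unit test can pass FakeProductQueries to render_search_page today, and a database-backed implementation of the same contract can replace it later without touching the UI code or its tests.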

What about security

I am happy to be able to wrap this up quickly and just say that security is not directly the responsibility of either layer - query or command processing!

Security with regard to displayed information can generally be implemented as a series of imposed filters. But the decision to impose certain filters on the data is not made in the query layer itself, but rather in a layer above it.

As for which data manipulations a given actor may perform, this appears to be more a matter of use cases, since the actor will fall into one of the predefined application roles. Therefore, again, it is not the responsibility of the command layer to enforce security, but just to execute commands. Validating whether a given role should be able to execute a given command, depending on the role, the command type and its parameters, is up to a higher layer, above the command layer.
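One possible shape for such a higher layer is a dispatcher that checks the role before handing the command to its handler. The sketch below is an assumption, not a prescribed design: CancelOrder, ALLOWED_ROLES and AuthorizingDispatcher are hypothetical names, and for brevity the check considers only the role and the command type, not the command's parameters.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CancelOrder:
    """A command: plain data describing the requested change (hypothetical)."""
    order_id: int


# Hypothetical policy table: which roles may issue which command types.
ALLOWED_ROLES: dict[type, set[str]] = {CancelOrder: {"manager", "support"}}


class AuthorizingDispatcher:
    """Sits above the command layer and decides whether a command may run."""

    def __init__(self, handlers: dict[type, Callable[[object], None]]) -> None:
        self._handlers = handlers

    def dispatch(self, role: str, command: object) -> None:
        if role not in ALLOWED_ROLES.get(type(command), set()):
            raise PermissionError(
                f"{role} may not issue {type(command).__name__}"
            )
        # The handler itself contains no security logic; it just executes.
        self._handlers[type(command)](command)
```

The handlers registered with the dispatcher stay free of role checks, so the command layer remains focused on business rules.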

So, it appears that the decision to use CQRS does not impact security concerns very much.

Risks

As with any new pattern, concept or technology one encounters, there are risks in applying it, more so if the pattern implies a paradigm shift. In this case, the fact that the read database might not be the same as the write database, and the fact that we now have two models, one for query and one for command processing, is a paradigm shift that may take some time to apply properly.

Also, the fact that the changed data is not immediately available once the command has been executed will make the application a bit more complex since it needs to handle this.
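The lag between the two sides can be made visible with a small in-memory sketch. It assumes a simple queue of pending changes that a separate synchronization step drains; WriteSide, ReadSide and sync are hypothetical names, and a real system would run the synchronization asynchronously rather than on demand.

```python
from __future__ import annotations

from collections import deque


class WriteSide:
    """Command side: applies the change and queues it for the read side."""

    def __init__(self) -> None:
        self.balances: dict[str, float] = {}
        self.pending: deque[tuple[str, float]] = deque()

    def deposit(self, account: str, amount: float) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.balances[account] = self.balances.get(account, 0.0) + amount
        self.pending.append((account, self.balances[account]))


class ReadSide:
    """Query side: a separate copy that lags until sync() runs."""

    def __init__(self) -> None:
        self.balances: dict[str, float] = {}

    def sync(self, write_side: WriteSide) -> int:
        """Applies all queued changes and returns how many were applied."""
        applied = 0
        while write_side.pending:
            account, balance = write_side.pending.popleft()
            self.balances[account] = balance
            applied += 1
        return applied
```

Between a deposit and the next sync, the read side simply does not know about the new balance. The application has to be designed with that window in mind, for example by showing the result of a command from the command's own response rather than by re-querying immediately.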

Conclusions

CQRS is a powerful and smart pattern, for the reasons stated above, which I summarize again below:

It allows the read and the write sides to scale differently.
It is based on the sound observation that reads are a lot more frequent than writes.
It allows the parallelization of development effort for the two layers.
It allows a correct and pure domain model on the command processing side to be maintained unaffected by query requirements (which might change often during and after the development of an application).
It allows command processing and query to use different technologies, each fit for its purpose, thus potentially reducing complexity and development time and increasing performance.
It reduces application complexity by clearly separating responsibilities at a macro level.
It fits very well with, and under certain circumstances even enables the application of, domain-driven design (which is a very important aspect in its own right).
It allows for other persistence mechanisms as well, such as event sourcing (which is the most appropriate choice in some cases).

Applying CQRS is most beneficial when the application is complex and might need to scale out. Nonetheless, CQRS has its advantages regardless of scalability, as illustrated above.

I recommend carefully analyzing the specifics of the application before committing to CQRS, but I feel that, generally, it might be a good idea!