For this fifth article in our series on how business agility can improve your bottom line, we look at how business agility helps us build resilient systems. Business agility practices help by encouraging us to look at risk management for the entire system: not only ensuring the systems we build respond even when under stress, but also building safety into the system.
Risk management in organizations is often the function of separate governance groups. These groups include areas like legal and compliance, security, architecture, data privacy, and more. Collectively we can think of these groups as being responsible for the safety of the organization. However, defining a policy and enforcing it across the organization does not scale well. By necessity, we end up with the lowest common denominator of policy being applied to everyone. This results in:
High-level policies that can be difficult to translate into how they apply in your particular context
Teams creating and modifying products being delayed while they explain to the governance areas what they are doing, often too late in the lifecycle, which adds cost to rectify issues
Risks going unnoticed because the release cadence doesn’t match the cadence at which the safety areas can respond, especially when a risk crosses safety domains and multiple safety areas that rarely talk to each other need to be engaged
This is made even more complex as the rate of regulatory change accelerates in response to world events and the introduction of new technologies. The idea that we only need to manage risk at the edge of our networks is long gone, and we need new strategies to manage our environments.
Business agility can help here by looking beyond technology into how we can break down the silos between safety areas and align them more closely with the organizational value streams. One approach to tackling this is Lean Control, described in chapter 6 of the SSH book and presented here.
Resilient systems are ones that work well even under stress and are built to handle disruptions that can happen at any moment. They protect your bottom line because they keep your business available when those disruptions occur.
A common model used in the technology space is that an organization's systems are made up of three elements: people, processes, and technology (derived from Leavitt’s four-element alignment model, created in the 1960s). To achieve operational efficiency, leadership needs to strike a balance, ensuring these three areas work well together.
The model is also a useful way to consider how to approach building resilient systems.
From a people perspective, a resilient system is one without a single point of failure (SPOF). In a team, a SPOF is the one person everyone relies on for the team to perform well. Should that person be unable to do their job, the rest of the team falls apart in their absence.
Resilient processes are ones that are well-defined and can respond to changes. Having these kinds of processes ensures that things will run smoothly, even in the event of disruptions. This was a problem that every company had to face in 2020 when the pandemic forced people to work from home. Processes that relied on everyone being in the same place at the same time, like in-person meetings, were made obsolete, forcing everyone to adapt to more video calls and remote working environments.
In terms of technology, resilient systems should be capable of continuing to run and respond under pressure, even in a degraded state.
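To make "running in a degraded state" concrete, here is a minimal sketch of one common pattern: try the full-featured path first, and fall back to a simpler answer if it fails. The function names and the recommendation-service scenario are hypothetical and chosen only for illustration; a real system would also log the failure and limit how long it waits.

```python
from typing import Callable, Tuple


def get_with_fallback(
    primary: Callable[[], str],
    fallback: Callable[[], str],
) -> Tuple[str, bool]:
    """Return (result, degraded): use primary, or fallback if primary raises."""
    try:
        return primary(), False
    except Exception:
        # The primary path failed, so serve a reduced answer instead of an error.
        return fallback(), True


def live_recommendations() -> str:
    # Hypothetical dependency that is currently unavailable.
    raise TimeoutError("recommendation service unavailable")


def cached_recommendations() -> str:
    # Hypothetical simpler answer that needs no live dependency.
    return "top sellers (cached)"


result, degraded = get_with_fallback(live_recommendations, cached_recommendations)
print(result, "| degraded:", degraded)
```

The caller still gets a usable response, along with a flag that tells monitoring the system is running in a degraded mode.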
A popular practice that has come to prominence over the past few years is Google’s Site Reliability Engineering (SRE). The practice, which originated in 2003 but was made popular by the 2016 book, is a useful guide to improving all three areas.
Creating a model to help facilitate the conversation between safety areas and delivery teams can be a good way to accelerate learning. TACO is the acronym we use to help clients map controls from software delivery pipelines to the organization. TACO stands for Traceability, Access, Compliance, and Operations.
Traceability: Identify what happened in the pipeline
Access: Secure access to the pipeline and to what is built in the pipeline
Compliance: Validate the payload in the pipeline for quality, security and non-functional requirements
Operations: Record and monitor the deployment and the subsequently running system
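One way a delivery team might use the four TACO areas is as a simple coverage check on its pipeline controls. The sketch below is only an illustration: the `Control` type, the `missing_areas` helper, and the example control names are all hypothetical and not part of any published TACO tooling.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Taco(Enum):
    TRACEABILITY = "Traceability"
    ACCESS = "Access"
    COMPLIANCE = "Compliance"
    OPERATIONS = "Operations"


@dataclass(frozen=True)
class Control:
    name: str   # what the pipeline actually does
    area: Taco  # which TACO area the control is meant to satisfy


def missing_areas(controls: List[Control]) -> List[Taco]:
    """Return the TACO areas that no control in the list covers."""
    covered = {control.area for control in controls}
    return [area for area in Taco if area not in covered]


# Hypothetical controls for one team's pipeline.
pipeline_controls = [
    Control("commit linked to work item", Taco.TRACEABILITY),
    Control("role-based access to pipeline configuration", Taco.ACCESS),
    Control("automated security scan of build artifact", Taco.COMPLIANCE),
]

gaps = missing_areas(pipeline_controls)
print([area.value for area in gaps])  # prints ['Operations']
```

A gap in the output is a prompt for a conversation with the relevant safety area, not a verdict on the team.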
The resulting process has several goals:
It creates a mnemonic that delivery teams in the organization can use as a reference when thinking about whether their delivery process properly encompasses the governance needs.
It provides a guide to help think through whether the controls being put into place satisfy the purpose of the control objectives.
It provides a frame for evaluating the SDLC governance of our pipelines to see what steps we need to take to move to continuous compliance.
Systems that are adaptable are also much more resilient to threats and stress. Agile organizations make their systems more adaptable by shifting the learning process. With a business agility mindset, learning isn’t just done at a regular cadence but is instead habitual and constant.
Failure becomes an opportunity for learning, building the organizational system back stronger through open, blameless conversation about what we can learn from the failure. This requires continual attention from leadership at all levels to ensure the right behaviors are being exhibited. Consider how you might measure whether those behaviors are being exhibited in your own organization.
Managing risk is essential to protecting your bottom line, and building resilient systems within organizations focused on continual learning is a good approach. It isn’t easy, and takes focused effort to not only introduce the new practices but make them stick. All of this can be considered adopting business agility.
If you want to learn more about resilient systems, be sure to give this episode of the Definitely, Maybe Agile Podcast a listen. Or reach out for a conversation; we are always happy to discuss how organizations are approaching this large and complex topic.