My first shot at describing a common failure in the development of IT systems: “how?” is evaluated before “why?” and “what?” are understood.
I see the story over and over again: A business unit comes to IT with a business problem to solve. Architects are brought in to design a solution. PowerPoints with logos are delivered within weeks.
This is a theme across applied technology, and I cannot think of an enterprise that I have not seen fall victim to it.
Analogy time!
Let’s go with a car analogy. This is akin to the business saying they want to move things from one place to another. The architect decides it needs to be done with Ford F150 Lightnings because EVs are the future. The business likes a green solution and trusts that IT is leveraging its expertise on implementation. But 9 months in, IT fully understands it’s hauling high volumes of stone from a quarry, not just the bedfuls of gravel they initially thought. But the decision was made, and 2 years of engineering goes into managing the workflow, massive fleet, and supply chain needed for the little pickup trucks to handle the throughput.
The choice of technology ended up dictating to the business when it should have served it. If dump trucks had been used, the value from the quarry would have been realized years earlier, and existing industrial patterns of fleet and supply chain management could have been copy/pasted.
It’s really twofold: first the technology is chosen before the use case is understood, then the sunk costs are honored during implementation.
Ancient Philosophy Time
Aristotelian Causes
I’m generally not a fan of Aristotle’s causes, particularly the “Final Cause,” because of its implication that nature has a purpose. But when we talk about the creations of humans, it provides a valuable structure.
Let’s walk through them:
- Material cause: the substance of a thing. What it is made of.
- Efficient cause: what brings a thing into existence
- Formal cause: the essence of a thing: its functional structure
- Final cause: the end or goal of a thing
The classic example is a wooden table.
- Material cause: wood
- Efficient cause: carpenter
- Formal cause: legs and a top
- Final cause: to hold objects in a location above the ground
If any of these are not met, then it is not a wooden table.
The Problem of Imagination
Too often I see the Material and Formal Causes developed first. The plan starts like this:
- Material cause: Kotlin, Kafka, FaaS, Kubernetes, DynamoDB
- Efficient cause: uh, some team
- Formal cause: an event driven architecture
- Final cause: Corporate Initiative Acronym
The material and formal causes come from what is seen as “modern.” It’s how things “should be made.” It’s fast and efficient. The cool modern business initiative needs cool modern technology.
It’s alluring to start this way. Thinking about technology is fun. Learning new technology is fun. Having built something with cool tech looks impressive.
But now imagine that the Corporate Initiative requires a system of record. The clients of this system need to be able to access the data along a variety of dimensions. Microservices multiply to support those access patterns, and data is duplicated into a mess that needs constant syncing and has no proper authoritative source.
The staff at the company have no experience building with or supporting this technology stack. They are fully dedicated to supporting an older stack of virtual machines, JMS messaging, and traditional relational databases.
The system is delivered late, it continually has reliability issues, and the design choices produce a system far more complex than the business value requires.
When the issues become apparent, the solution “fails forward.” Rather than reworking to address what is understood later, the Mythical Man-Month problem trudges on. Sunk costs are honored, and ultimately more time is wasted forcing the solution to avoid “rework.” I’ve never seen the effort of rework honestly compared to the effort of pushing forward.
Proper Process
Before any technical thought goes into an effort, the business case needs to be well understood. And this is a common disconnect. Business speaks business, IT speaks tech. The business people describe the system, but assume things. For decades, they’ve been able to come back and ask for random reports or new uses. Without digging deeper into what they see as future requirements, it’s easy to box yourself out of being able to support them.
A common solution to this is to force the business into defining all of their requirements. If this is internal development and not a contract, this is ineffective. It only serves to protect development from blame, not to serve the success of the business.
The only answer is to ask questions and listen. Development must understand the business they are serving and, better yet, really know the people communicating the demand. Sympathy and culture are better than process, but are unfortunately difficult to quantify and measure.
The people creating and supporting the solution also need to be considered. It is a rare case where an entirely new staff can be hired with new skills. And new technology can be lacking in day 2 operational tools and practices.
Every component in a system will fail. So what does that look like? How do you monitor so you can proactively intervene? Does your operational tool set support that or will it require additional expense and support of that tooling?
You’re Boring
The natural result here is that boring technology is usually the most appropriate solution. Tried and true tooling and operations.
That doesn’t mean that new technology can’t or shouldn’t be introduced.
Make sure the technology is reasonably mature. Too often the new hotness fizzles out and becomes a joke in 2 years. Also there needs to be an ecosystem of day 2 support around the technology that’s well established. Basically: Can you Google it and find consistent answers?
Ensure you can dedicate people to learning and supporting it.
Then take a use case with lower SLOs/SLAs. Ideally these are systems that are not in the critical path of business operations. Examples include analytical/reporting workloads, internal systems that can tolerate being down for extended periods, etc.
Do not spend months solving for every possible problem you can imagine in an artificial PoC environment. The same principles behind Extreme Programming/Agile need to be applied. Get the tech stack where it can be stable, handle common failures, be monitored, and have a recovery mechanism. Do enough of a stress test to ensure it can handle a “normal” workload and push past that. You should capture what failure looks like in a destructive event and when overwhelmed so that monitoring is effective. That is the MVP.
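To make the stress-test part concrete, here is a minimal sketch in Python of "handle a normal workload and push past that." Everything in it is an assumption for illustration: the `ramp_until_failure` helper, the `fake_system` stand-in with its capacity of 500, and the 1% error budget are all made up, and a real test would call the actual stack instead.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class StepResult:
    load: int      # requests attempted in this step
    failures: int  # requests that failed in this step

    @property
    def error_rate(self) -> float:
        return self.failures / self.load if self.load else 0.0


def ramp_until_failure(
    send_batch: Callable[[int], int],
    normal_load: int,
    step: int,
    max_load: int,
    error_budget: float = 0.01,  # hypothetical threshold: 1% failed requests
) -> Tuple[List[StepResult], Optional[int]]:
    """Raise load from normal_load in fixed steps and record each step.

    send_batch(n) must attempt n requests and return how many failed.
    Returns every step's result plus the first load level whose error
    rate exceeded error_budget (None if the system never broke).
    """
    results: List[StepResult] = []
    load = normal_load
    while load <= max_load:
        result = StepResult(load=load, failures=send_batch(load))
        results.append(result)
        if result.error_rate > error_budget:
            # Stop at the first overwhelmed step; its shape is what we wanted.
            return results, load
        load += step
    return results, None


# Stand-in for a real system (hypothetical): it serves up to 500 requests
# per step and rejects the rest. Replace with a call to the real stack.
def fake_system(n: int) -> int:
    capacity = 500
    return max(0, n - capacity)


results, breaking_point = ramp_until_failure(
    fake_system, normal_load=200, step=100, max_load=1000
)
for r in results:
    print(f"load={r.load} failures={r.failures} error_rate={r.error_rate:.2%}")
print("breaking point:", breaking_point)
```

The useful output is not a pass/fail but the recorded steps: the load where things fall over and what the failures look like there are what the monitoring should be built to catch.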
Then pilot the tech stack with one of those identified workloads. Every team involved needs to have a culture of curiosity and cooperation in order to be successful. The lessons learned in real life need to be applied back to architecture, code, and operations until it hums. Then the stack can be introduced into more and more critical use cases, given success.
Large business initiatives and highly critical workloads are not the place to experiment. Using unproven technology in these situations is gambling with the business.
TLDR
Technology has no value in itself. Its only value is the enablement it delivers. Using cool tech for cool tech’s sake is not serving the business. It’s a selfish decision for personal gratification.