Cattle instead of pets – end of carefully crafted software?

We use externally developed software that runs as a service in containers. After a few days the service became unresponsive and had to be restarted. We learned that the host had run out of memory. In our project meeting nobody proposed that the memory leak in the software should be fixed. This was an awakening: we did not care about an obvious software bug.

We recently had an interesting case in one of our research projects, and I plan to use it as an example in future teaching. We use externally developed software that runs as a service in containers. After a few days the service became unresponsive and had to be restarted. Using a monitoring tool, we learned that the host had run out of memory. In other words, we discovered that the software has a memory leak. The solution was simple: the containers are restarted regularly. Since we had five containers orchestrated with Docker Swarm, we could restart them one after another without causing an externally visible break. After a few weeks, the memory analysis tool showed that the growth of memory consumption was no longer an issue. This is also shown in the picture below (generated with the NMON Visualizer tool, https://nmonvisualizer.github.io/nmonvisualizer/index.html).
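The idea of restarting the containers one after another can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the setup we actually run: the Replica class, the 1 MB leak per request and the rolling_restart function are all hypothetical names invented for the illustration, and no real Docker API is called.

```python
from typing import List


class Replica:
    """Hypothetical stand-in for one container running the leaky service."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.leaked_mb = 0
        self.serving = True

    def handle_request(self) -> None:
        # Simulated leak: every request leaves 1 MB that is never released.
        self.leaked_mb += 1

    def stop(self) -> None:
        self.serving = False

    def start(self) -> None:
        # A fresh process starts with an empty heap, so the leak is reset.
        self.leaked_mb = 0
        self.serving = True


def rolling_restart(replicas: List[Replica]) -> int:
    """Restart replicas one at a time.

    Returns the lowest number of replicas that were serving at any moment.
    """
    min_serving = len(replicas)
    for replica in replicas:
        replica.stop()
        serving_now = sum(1 for r in replicas if r.serving)
        min_serving = min(min_serving, serving_now)
        replica.start()
    return min_serving


if __name__ == "__main__":
    service = [Replica(f"replica-{i}") for i in range(5)]
    for replica in service:
        for _ in range(100):
            replica.handle_request()
    lowest = rolling_restart(service)
    print(lowest, [r.leaked_mb for r in service])
```

With five replicas, at least four keep serving throughout, which is why the restart is not visible from the outside. In a real Swarm setup the orchestrator would do this work; as far as I know, `docker service update --force` on a service is one way to trigger such a rolling restart, but treat that as a pointer to check against the Docker documentation rather than a description of our configuration.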

In our project meeting nobody proposed that the memory leak in the software should be fixed. This was an awakening: we did not care about an obvious software bug. We considered the problem solved and were reminded of the well-known “pets and cattle” metaphor. The metaphor compares traditional software development to keeping pets: people take good care of their pets, whereas saving the life of one animal in a herd of cattle is a purely economic decision. In cloud systems, developers think of individual instances of running software as cattle.

Later, this discussion made me think about new trends in software, teaching and research. For example:

  • In the past, HW designers used redundancy to increase reliability, but that approach was not popular in software. Is this now changing, and can we learn something from the HW people here?
  • A traditional computer scientist would have concentrated on fixing and avoiding memory leaks, while a software engineer thinks about the most economical solution. If there were extra resources, they would be used to create new customer value. Is this a good example to explain what software engineering is about?
  • This case underlines the importance of stateless architecture: if the service had maintained session state, the solution would not have been possible. Web and cloud architectures do matter.
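
The point about session state can be made concrete with a toy comparison. Again this is only a sketch: both classes and the `remember` method are hypothetical, and the plain dictionary merely stands in for some external store that outlives the process (the post does not say how, or whether, the real service stores anything).

```python
from typing import Dict, List


class StatefulReplica:
    """Keeps session data in process memory, so a restart loses it."""

    def __init__(self) -> None:
        self.sessions: Dict[str, List[str]] = {}

    def remember(self, user: str, item: str) -> List[str]:
        self.sessions.setdefault(user, []).append(item)
        return list(self.sessions[user])

    def restart(self) -> None:
        # Process memory is wiped, and the sessions go with it.
        self.sessions = {}


class StatelessReplica:
    """Holds no session data itself; everything lives in an external store."""

    def __init__(self, store: Dict[str, List[str]]) -> None:
        self.store = store  # assumed to survive restarts of this replica

    def remember(self, user: str, item: str) -> List[str]:
        self.store.setdefault(user, []).append(item)
        return list(self.store[user])

    def restart(self) -> None:
        # Nothing is held in the process, so there is nothing to lose.
        pass


if __name__ == "__main__":
    stateful = StatefulReplica()
    stateful.remember("alice", "a")
    stateful.restart()
    print(stateful.remember("alice", "b"))   # the earlier item is gone

    external_store: Dict[str, List[str]] = {}
    stateless = StatelessReplica(external_store)
    stateless.remember("alice", "a")
    stateless.restart()
    print(stateless.remember("alice", "b"))  # both items are still there
```

Only the second kind of replica can be restarted at will without users noticing, which is what made the regular restarts an acceptable fix in our case.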

 

I wanted to initiate this discussion since

  • We are in the process of designing new courses on Web and Cloud, and we welcome all input. What should students learn about future cloud systems? If you have input or opinions, contact kari.systa@tuni.fi.
  • This is also an important research topic. If you want to collaborate on it, contact davide.taibi@tuni.fi or me.

 

 

Text:  Kari Systä

Picture, and the software engineer who solved the problem: Ville Heikkilä

Author of the software: unknown