In any company, the IT team is responsible for ensuring the continuity of IT operations, networks, infrastructure, and software applications. It is up to the IT executives to decide whether these responsibilities will be handled in-house, outsourced, or covered through a hybrid model. An example of a hybrid model is a dedicated team that monitors operations on business days while a vendor covers the night and weekend shifts. Below, I discuss some frequent scenarios of how startups and small and medium-sized businesses (SMBs) manage IT operations.
Practice 1: The Development Team Is Responsible For 24/7 System Uptime
This is probably the most common scenario in technology startups and in small businesses where the development team builds custom apps for back-office needs. Several factors determine whether this is the best approach for an IT team. The outcome depends heavily on the team’s process maturity and the philosophy of its members. If the senior employees have a highly technical background, this may be the best and only approach to take, like the SRE practice at Google. However, teams often do not have this experience.
In many cases, I come across businesses where the development staff is reluctant to be available after hours for on-call duty. Typically, when service downtime is reported, it is difficult to find the person on call and get the problem fixed in a timely manner.
Although this practice appears to be the most cost-effective, since the same development team handles 24/7 monitoring and service uptime, it usually produces subpar results: system disruptions last for extended periods, hurting the business both financially and reputationally.
Practice 2: DevOps Owns The Responsibility
Organizations that embrace digital transformation trends often have a DevOps team, a DevOps practice, or some members who are responsible for development and operations (DevOps) tasks. DevOps is a set of practices and automation that helps shorten the development life cycle and provides continuous delivery of high-quality software. In this scenario, DevOps members are responsible for monitoring system uptime and addressing any downtime. In my experience, monitoring in this model focuses mainly on the end-consumer applications.
This model tends to work well because the team has an operational mindset and creates automated mechanisms to monitor, alert on, and compensate for service degradation and disruption. This automation favors preventive measures over reactive procedures. Also, the more mature the team’s processes are, the better it is at applying best practices and achieving efficiency.
The common challenge in this practice is the number of people needed to provide 24/7 support. With full-time employees or in a staff augmentation model, you need at least two people to cover night and weekend shifts. DevOps is a sophisticated skill set that is in high demand, so staffing costs are usually high. On average, a team requires three DevOps engineers to cover a 24/7 calendar. Another problem is the team’s lack of expertise, especially when the team members are more junior or inexperienced in IT operations. Skills in system and application architecture, automation, fault tolerance, and resiliency can take professionals years to develop.
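To put the staffing figures above in perspective, the weekly hours each person carries can be worked out with simple arithmetic. The sketch below is illustrative only: it assumes an 8-hour business day, a 5-day business week, and an even split of hours across the team, none of which comes from a specific organization. The function name `weekly_coverage` is hypothetical.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours in a calendar week


def weekly_coverage(team_size, business_hours_per_day=8, business_days=5):
    """Split a 24/7 week evenly across a team (illustrative assumptions).

    Returns the business hours, the off-hours (nights and weekends),
    and the hours each person would carry under an even rotation.
    """
    if team_size < 1:
        raise ValueError("team_size must be at least 1")

    business_hours = business_hours_per_day * business_days
    off_hours = HOURS_PER_WEEK - business_hours

    return {
        "business_hours": business_hours,
        "off_hours": off_hours,
        # Everyone shares the full 168-hour week equally.
        "hours_per_person_full_week": HOURS_PER_WEEK / team_size,
        # The team shares only nights and weekends equally.
        "off_hours_per_person": off_hours / team_size,
    }


if __name__ == "__main__":
    for size in (2, 3):
        print(size, weekly_coverage(size))
```

Under these assumptions, a three-person rotation leaves each engineer responsible for 56 of the 168 hours in a week, and two people splitting nights and weekends carry 64 off-hours each.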
Practice 3: There Is An IT Operations Team
This practice is found mostly in medium and large businesses. Having a fully dedicated IT Operations team working 24/7 is a considerable investment. Large, global organizations generally take advantage of offices distributed across different time zones, which helps them provide 24/7 operational support with less overhead from shift rotation or on-call duty. For the SMB, a dedicated IT Operations team is a costly option.
In this model, organizations have the structure of a network operations center (NOC) with some application performance monitoring (APM) tools to extend monitoring to the application level. One frequent challenge for IT Operations is the variety of toolsets needed to cover all the technologies. Security operations and automated scalability are other pain points that most IT Operations teams face. Not all organizations can afford employees with the specialized knowledge that is essential to proactively prevent disruptions caused by workload peaks or malicious activity.
Practice 4: IT Operations Are Outsourced
Outsourcing is a good strategy when you want to delegate responsibilities and risk to a third party. Risk is mitigated through contractual terms, and organizations can focus on their primary business operations while offloading IT operational management and costs.
Companies usually see outsourcing as a cost-effective option; the challenge is selecting a good vendor. Replacing a vendor that has deep knowledge of your IT infrastructure is not a trivial task, and it often requires a big effort in transition planning and execution. In addition, outsourcing is not an optimal option for businesses with strict regulatory or confidentiality requirements, because confidential data would be provided to a third-party vendor, putting the security of the company at risk.
Practice 5: Managed Service Provider
Managed Services is the practice of outsourcing the responsibility for maintaining, and anticipating the need for, functions and resources in order to improve operations and cut expenses. This is usually contracted in a subscription model, where the client and the managed service provider (MSP) contractually define service level agreements (SLAs) that specify the quality metrics governing their relationship.
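One metric that such an SLA might specify is an availability target. As a rough illustration, the sketch below converts an availability percentage into the downtime it permits over a period. The 30-day period, the example targets, and the function name `allowed_downtime_minutes` are assumptions for illustration, not terms from any particular contract.

```python
def allowed_downtime_minutes(availability_pct, period_days=30):
    """Return the downtime (in minutes) an availability target permits.

    availability_pct: target such as 99.9 (hypothetical example value).
    period_days: length of the measurement period; 30 days is an assumption.
    """
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")

    total_minutes = period_days * 24 * 60
    # The permitted downtime is the share of the period not covered
    # by the availability target.
    return total_minutes * (100 - availability_pct) / 100


if __name__ == "__main__":
    for target in (99.0, 99.9):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% over 30 days allows about {minutes:.1f} minutes")
```

Under these assumptions, a 99.9% target over 30 days leaves room for roughly 43 minutes of downtime, while a 99% target allows 432 minutes, or a little over seven hours.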
This delivery model is ideal for organizations that look beyond traditional outsourcing criteria and expect long-term benefits from the solutions provider. For the SMB, the Managed Services model makes it possible to outsource the management, operation, and delivery of processes effectively while reducing costs.