Keeping an eye on your IT Resources’ Health - wherever you are

1. Traditional Approach and Challenges involved

In the traditional approach, when a resource becomes faulty, a user calls the service desk to log a ticket. The service desk representative collects the details, raises a ticket in the system, and assigns it to a support engineer. The support engineer reviews the ticket, physically travels to where the resource is located, and then identifies and corrects the problem.

This process involves various challenges such as:

Lack of real-time health monitoring
No consolidated view of the health of all resources
Underutilization of staff and assets
More scope for manual errors
High infrastructure and operation costs
Resource-intensive work
Delays in identifying & rectifying defects
No scope for proactive / reactive maintenance

2. Different Approaches to conquer the challenges

To eliminate redundant manual controls and reduce costly human error, a system should regularly monitor the resources and ensure that they are present where they need to be and are functioning efficiently and effectively. Such a monitoring system can be built using two different approaches:

2.1 Agentless monitoring system

Agentless monitoring typically works with what the vendor already exposes, without installing any extra software in the client’s environment. For example, SNMP (Simple Network Management Protocol) is used to monitor servers and network devices in an agentless manner, and WMI (Windows Management Instrumentation) is used to monitor many Windows-based servers and applications.

Agentless monitoring via WMI (Windows servers only) or VMware (VMware services and instances only) allows some level of customized scripting, but agentless solutions do not provide the same level of extension and integration that is possible with an agent-based solution.
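
As a rough illustration of the agentless idea, the sketch below checks availability by connecting to a TCP port that the monitored resource already exposes (for example SSH or HTTPS), so nothing is installed on the target. It is not an SNMP or WMI client; the function name and the default timeout are assumptions made for this example.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper: the monitored resource only needs to expose the port,
    no agent is installed on it.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

A monitoring server could call is_port_open for each resource on a schedule and record the result as an availability metric. A check like this only shows that the service is reachable, which reflects the limited granularity of agentless monitoring.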

Pros and Cons of Agentless monitoring

Pros	Cons
No installations needed	Cannot get granular-level metrics for monitoring, reports, etc.
Useful when a large number of nodes are to be deployed	Predictive maintenance is difficult
System resources are not wasted	Cannot be extended for custom server, service/application metrics collection
Less maintenance	

2.2 Agent-based monitoring system

Agent-based monitoring typically requires a small software component to be installed on the resource to be monitored. The installed agent collects data and reports it to the monitoring application. Using agent-based monitoring, granular metrics can be accessed for better monitoring, alerting, and reporting. For example, the Zabbix agent for Zabbix and exporters such as Node Exporter for Prometheus collect metrics at a granular level and are customizable.

An installed agent also enables direct interaction with the client platform and services, through which automated corrective actions can be executed for proactive/reactive maintenance.
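
To make the agent idea concrete, here is a minimal sketch of what such a component might collect locally, using only the Python standard library. It is not how the Zabbix agent or any Prometheus exporter is implemented; the metric names, the JSON payload format, and both function names are assumptions for illustration.

```python
import json
import os
import shutil
import socket
import time


def collect_metrics(path: str = "/") -> dict:
    """Gather a few host-level metrics from the machine the agent runs on."""
    disk = shutil.disk_usage(path)
    used_percent = round(disk.used / disk.total * 100, 2) if disk.total else 0.0
    metrics = {
        "host": socket.gethostname(),
        "timestamp": time.time(),
        "cpu_count": os.cpu_count(),
        "disk_used_percent": used_percent,
    }
    # Load average exists only on Unix-like systems, so it is optional here.
    if hasattr(os, "getloadavg"):
        try:
            metrics["load_1m"] = os.getloadavg()[0]
        except OSError:
            pass
    return metrics


def to_payload(metrics: dict) -> str:
    """Serialize metrics as JSON for a (hypothetical) monitoring server."""
    return json.dumps(metrics, sort_keys=True)
```

A real agent would run collect_metrics on a schedule and send the payload to the monitoring application. Because the code runs on the resource itself, it can be extended with custom, application-specific metrics.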

Pros and Cons of Agent-based monitoring

Pros	Cons
More granular details can be collected for monitoring and reports	Difficulty in deploying agents on all nodes in large deployments
Proactive/reactive maintenance can be done based on the data	Maintenance of the agents, such as version upgrades
Less risk of resource downtime	
Customization of the data to be collected is possible	

However, the choice between agentless and agent-based monitoring depends on the actual need. In simple scenarios, where granular metrics are not needed and a large number of nodes must be covered, an agentless solution is suitable. But for large-scale, real-world production environments, agent-based monitoring is the way to get detailed, valuable visibility and analytics-based insights efficiently.

3. Various ways to identify the Resources

Depending on the organization's size (large/medium/small), the resources to be monitored might number in the thousands, and manually uploading their details to the database is tedious and error-prone. Hence the need to detect the resources to be monitored automatically.

Identifying the resources to be monitored can be a one-time manual activity, done by uploading a data sheet to the database. Alternatively, with agentless monitoring, tools/apps can auto-discover the resources within a network based on an IP range or subnet. With agent-based monitoring, discovery of applications, databases, microservices, etc. is also possible.
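
Subnet-based auto-discovery can be sketched as follows: enumerate the usable host addresses in a CIDR range and keep those that answer a probe. The probe is passed in as a function so that any check (ICMP ping, TCP connect, SNMP query) can be plugged in; the function name and this design are assumptions for illustration, not the behavior of a specific discovery tool.

```python
import ipaddress
from typing import Callable, List


def discover(cidr: str, probe: Callable[[str], bool]) -> List[str]:
    """Return the host addresses in `cidr` for which `probe(address)` is True."""
    # strict=False accepts inputs such as "192.168.1.10/24" as well.
    network = ipaddress.ip_network(cidr, strict=False)
    # hosts() skips the network and broadcast addresses of the subnet.
    return [str(ip) for ip in network.hosts() if probe(str(ip))]
```

In practice the probes would be run concurrently, since scanning a large range one address at a time is slow, and the discovered addresses would then be written to the monitoring database instead of being uploaded by hand.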

4. Different Metrics and KPIs that can be monitored

Metrics collected from the resources, combined with the configured thresholds, can be grouped and organized to provide meaningful insights (KPIs) into the organization's resources.

Some of the metrics that can be monitored:

4.1 Agentless:

Simple Health Checks like ICMP ping for checking the availability
System Metrics like CPU utilization, Memory utilization

4.2 Agent-based:

Simple Health Checks like ICMP ping for checking the availability
System Metrics like CPU utilization, Memory utilization
Log Metrics to get Failure Details
JMX Metrics to get information regarding heap memory usage, number of active threads, number of loaded classes and CPU usage
Performance metrics like Request Count, Response Time
Custom metrics like Page visits, slow response pages, navigation errors etc.
ODBC Monitoring for checking specific database queues, usage statistics

Based on these metrics, different KPIs, such as the resources that failed most often or the percentage of critical alerts, can be visualized.
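
The two KPIs mentioned above could be derived from stored alert records roughly as follows. The record layout (a "resource" and a "severity" field) and the function name are assumptions made for this sketch.

```python
from collections import Counter
from typing import Dict, List


def compute_kpis(alerts: List[Dict[str, str]], top_n: int = 3) -> dict:
    """Compute the most frequently failing resources and the share of critical alerts."""
    # Count how many alerts each resource produced.
    failures = Counter(alert["resource"] for alert in alerts)
    critical = sum(1 for alert in alerts if alert["severity"] == "critical")
    critical_percent = round(100.0 * critical / len(alerts), 1) if alerts else 0.0
    return {
        "top_failing": failures.most_common(top_n),
        "critical_percent": critical_percent,
    }
```

The resulting values can be fed to a dashboard panel or a periodic report.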

5. Seeing is believing (Graphical representation of the KPIs)

Several tools, such as Grafana and Prometheus, provide configurable dashboards and reports that are easy to read and can be integrated with monitoring applications as well as different types of time-series databases. Some of these tools also provide a dedicated query language for visualizing the required KPIs/metrics.
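
As an example of such a query language, the sketch below builds a request URL for the Prometheus HTTP API with a PromQL expression that estimates CPU usage per instance. It assumes a Prometheus server scraping Node Exporter, which exposes the node_cpu_seconds_total metric; the server address used in the usage note and the function name are hypothetical.

```python
from urllib.parse import urlencode

# PromQL: average non-idle CPU percentage per instance over the last 5 minutes,
# assuming Node Exporter metrics are being scraped.
CPU_USAGE_QUERY = (
    "100 - (avg by (instance) "
    '(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)'
)


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API (/api/v1/query)."""
    query_string = urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query_string
```

Calling build_query_url("http://localhost:9090", CPU_USAGE_QUERY) yields a URL that can be fetched with any HTTP client. The same PromQL expression can also be pasted into a Grafana panel that uses Prometheus as its data source.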

6. Know it even before it fails (Alerting and data-based predictive analysis)

By configuring meaningful thresholds, alerts can be triggered, and the monitoring application can be integrated with different notification systems such as Slack, SMS, and an SMTP server for email alerts.
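
A threshold check with pluggable notification channels might look like the sketch below. The metric names, the alert message format, and the idea of modelling each channel (Slack, SMS, email) as a plain callable are assumptions for illustration; real integrations would call the respective service APIs.

```python
from typing import Callable, Dict, Iterable, List


def evaluate(metrics: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return one alert message for every metric that exceeds its threshold."""
    alerts = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        # Metrics that were not reported are skipped rather than treated as breaches.
        if value is not None and value > limit:
            alerts.append(f"{name}={value} exceeds threshold {limit}")
    return alerts


def dispatch(alerts: Iterable[str], notifiers: Iterable[Callable[[str], None]]) -> None:
    """Send every alert to every configured notification channel."""
    channels = list(notifiers)
    for alert in alerts:
        for notify in channels:
            notify(alert)
```

For a quick trial, dispatch(evaluate(metrics, thresholds), [print]) writes the alerts to the console; a production setup would replace print with functions that post to the chosen notification systems.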

Different types of forecasts can be made from the time-series data stored by the monitoring applications. By applying techniques such as cluster analysis, text analysis, and ML algorithms to the time-series data, users can configure alerts based on remaining useful life (RUL), anomaly detection, failure detection, etc.
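
As a very simple stand-in for such forecasting, the sketch below fits a straight line (ordinary least squares) to recent samples of a metric and estimates how long it will take to reach a threshold, for example a disk filling up. This is a deliberately basic technique, not the ML-based analysis described above; the function name and the sample format (timestamp, value) are assumptions.

```python
from typing import List, Optional, Tuple


def time_to_threshold(
    samples: List[Tuple[float, float]], threshold: float
) -> Optional[float]:
    """Estimate the time remaining until a linearly growing metric reaches `threshold`.

    Returns None when there are too few samples or the metric is not increasing.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    # Least-squares slope and intercept of value as a function of time.
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    if slope <= 0:
        return None
    intercept = mean_v - slope * mean_t
    crossing_time = (threshold - intercept) / slope
    latest = max(t for t, _ in samples)
    # Clamp at zero if the fitted line has already crossed the threshold.
    return max(0.0, crossing_time - latest)
```

An alert could then be raised when the estimated remaining time drops below a configured lead time, which gives support staff a chance to act before the failure occurs.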

7. Industry trends in monitoring solutions

As organizations grow rapidly, resource utilization increases. Monitoring solutions help organizations easily detect resource availability and downtime, and empower them to proactively detect future problems.

The diagram below shows the typical factors of resource monitoring:

Multiple monitoring solutions that provide end-to-end execution and support are available in the market. They can be hosted either on public/private clouds or on premises.

8. Wipro’s Optima Remote Monitoring Services Solution

Optima RMS is an extensible and configurable monitoring solution, comprising multiple components for the ever-evolving enterprise ecosystem.

Optima RMS collects, normalizes, and monitors real-time data across enterprise resources and provides deep insights into the resource's health and performance through powerful visualization, analytics, and dashboards. It also provides downstream integration with a ticketing system like Service Desk and integrates with reporting tools like Grafana.

8.1 Monitoring Process flow - Optima

8.2 Solution architecture of Optima RMS

8.3 Benefits of the Optima RMS

Simplicity of the solution, easy adoption and configuration
Multiple deployment options, lighter deployment options, bulk onboarding options
Proactive monitoring
Deep insights into system health & performance
Cost and effort savings on operational / support processes
Higher operational efficiency
Predictive analytics to prevent failures, lower risk, higher associate productivity
Reduction in Helpdesk calls / tickets
Powerful visualization capabilities, analytics and reporting / dashboards – pre-defined as well as custom
Agentless operation, standard agents, or custom agents, as needed
Full control, and access to source code
Simpler licensing policies
End-to-End product support
Lower TCO

8.4 Differentiators

9. Summary

Monitoring applications help organizations identify possible issues before they affect business continuity. They also help detect the root cause of problems when something goes wrong. Be it a small business with fewer than 50 nodes or a large enterprise with more than 1,000 nodes, continuous monitoring helps to develop and maintain high performance of resources with little or no downtime.

The decision of which monitoring tool to choose depends on many factors like the size of the organization, expansion plans, budget constraints, etc. However, organizations should look for a comprehensive monitoring application that meets present-day needs while also providing scalability for future expansion.

About the authors

Sarada Kallakuri

Technical Lead - Software Engineering Practice, IES, Wipro Limited.

As an AWS Certified Solution Architect - Associate, and a Sun Certified Java Professional, Sarada has around 14 years of experience in software development with expertise in various technologies and tools like Java/J2EE, Python, Ansible, Hadoop Administration, vRA, vRO.

Radhakrishna Singuru

DMTS - Senior Member – Software Engineering Practice, Industrial & Engineering Services, Wipro Limited.

Radha has more than 24 years of experience in product and system software development spanning cloud and virtualization technologies, scalable platforms, SDN, L2/L3 switching and stacking software, etc., across multiple industry domains.
