Getting started with network monitoring

I wrote this article for Connect magazine, where it was published in issue 09/09. I am publishing it here with the kind permission of the editorial staff.

What Monitoring Provides and What Technologies Are Available

Monitoring takes many forms, and we can use a variety of technologies and protocols. We can build the entire monitoring system on our own scripts or free solutions, investing only our time and knowledge. Or we can use one of the many commercial products available.

If we don't rely on third-party applications but create our own scripts or programs, we will have a detailed understanding of how the monitoring is carried out, and our solution will precisely match our needs. However, we must have deeper knowledge of scripting or programming, along with an understanding of network protocols and technologies. Additionally, creating such a solution is time-consuming.

Free products are often highly versatile with extensive configuration options. However, they also require deeper knowledge to set up, as some configurations involve writing custom scripts. The advantage is that we have a comprehensive environment (which includes configuration, dashboards, graph processing) and only need to customize certain templates for data collection.

In contrast, commercial products are usually installed with just a few clicks, and we can get monitoring up and running in a few minutes. We only need to know the addresses of the devices and what data we want to monitor on them. There are usually pre-made templates for various areas. However, this also limits the possibilities for customization.

Of course, nothing is purely black or white, so we can also add our own scripts to commercial applications. And we can find freely distributable systems that are ready for the most common deployments.

Some Monitoring Systems
Free - Nagios, Cacti, Zabbix
Paid - Zenoss, PacketTrap pt360, WhatsUp Gold
From Large Companies - Microsoft System Center Operations Manager, HP OpenView, IBM Tivoli Netcool, CiscoWorks LAN Management Solution

What Do We Want to Monitor?

Before we start planning the actual monitoring, we need to determine exactly what we want to monitor and which monitoring outputs are our priority. Based on this, we choose the right technologies. Although there are comprehensive systems with many components, we probably won't find a single monitoring tool that covers all the areas we need to monitor in a larger company. We must therefore combine multiple products.

We can look at monitoring from two perspectives. Either we want to know when a problem occurs, that is, when something has stopped working or a critical limit has been exceeded. Or we want to obtain current (and historical) information about a particular system: for example, server utilization to plan its future use, an overview of which clients (with which IP and MAC addresses) are connected to which switch port, or the utilization of data links.

Areas for Monitoring

Let's now consider what can be monitored in a computer network. First, from a more general perspective, we can monitor:

Servers and their services
Active network devices
Network communication/traffic
Security

In addition, there are certain specific areas where we can use the same basic monitoring as for the groups above, but we can get more information by using specialized tools. These include IP telephony, wireless networks, and virtual environments.

Delving deeper, we may be interested in:

Server availability
Service/application availability (including latency - response time)
Events on servers
Resource utilization (CPU, memory, disk)
Link utilization - data transfer measurement
Network traffic statistics
Analysis of abnormal network behavior
Information about switch ports
Monitoring of specific areas such as WiFi or IP telephony
Security incidents

Brief diagram of how monitoring technologies are integrated into a computer network

Monitoring Technologies

If we use a comprehensive monitoring system for server oversight, we usually have two basic options for accessing the information.

The first is monitoring with an agent, where we install a special client on the server. This requires an agent for the given operating system and the ability to install it on the server, and it adds another application that can itself cause problems. On the other hand, the agent gives us access to a wide range of data from the server.

The second option is monitoring without an agent, where we test the server's own services or obtain data using certain standard protocols (such as SNMP, WMI, IPMI).

Above, we described the areas we would like to monitor. Now, we'll provide a brief list of the technologies we can use for this monitoring, whether independently or within a monitoring system.

Server availability using ping test
Service availability by establishing a TCP connection or at the application level
Server events - Syslog
Obtaining data using a client
Obtaining data using monitoring protocols WMI, SNMP, IPMI
Monitoring network flows - NetFlow
Network protocol analysis - network protocol analyzer
Network security - IDS/IPS

Monitoring Outputs - Reports and Alerts

We can set up perfect monitoring that will monitor and record everything in our network, but if we don't have clear, accessible, and often intelligent outputs, we won't achieve anything. Therefore, it is important to plan from the beginning what output is most suitable for each area.

In terms of data types, we have two main areas. One is events, which we obtain through Syslog, WMI, or SNMP traps from servers, switches, and other devices. The other is values, often numerical, indicating the current state of a property. We also need to consider whether we are interested in the current state or need to store the history of how the values change over time.

Of course, different representations are suitable for different monitored data. CPU utilization interests us over a certain time period, and a graph is a suitable display. Current switch port states, on the other hand, are better displayed in a table. Server availability over a period can be displayed as a single percentage value.
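As a minimal sketch of the last point, assuming each periodic availability check is recorded as a simple True/False result (the function name and the data layout are illustrative and not taken from any particular monitoring product), the percentage could be computed like this:

```python
def availability_percent(check_results):
    """Return the share of successful checks as a percentage (0-100).

    check_results is an assumed data layout: an iterable of booleans,
    one per periodic availability check (True = the device responded).
    """
    results = list(check_results)
    if not results:
        raise ValueError("no check results recorded for this period")
    return 100.0 * sum(results) / len(results)


# Example: 1440 one-minute checks in a day, 3 of which failed.
day = [True] * 1437 + [False] * 3
print(f"{availability_percent(day):.2f} %")
```

Note that this treats every check as equally weighted; a real system may instead add up the duration of outages.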

For a global view of the network, a graphical representation is advantageous, where we see a diagram of the network or its parts. If a problem occurs somewhere, the relevant element is highlighted, and we can click to get detailed information. We can also have event categories displayed by severity, showing the number of unresolved incidents.

The representations described above are interesting and useful in certain situations. However, if we are dealing with security or critical events, and we do not have a monitoring team constantly watching the dashboard, it is much more useful to generate and send email or SMS messages (perhaps using an IP GSM gateway). This way we can send important notifications for information and critical messages that call for a quick response.
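Email alerting of this kind can be sketched with Python's standard library alone. The helper names, the subject format, and the assumption that an SMTP server accepts mail from the monitoring host without authentication are all illustrative choices, not something prescribed by the article:

```python
import smtplib
from email.message import EmailMessage


def build_alert(sender, recipient, device, problem, severity="critical"):
    """Compose an alert email; the severity goes into the subject so it can be filtered."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = f"[{severity.upper()}] {device}: {problem}"
    message.set_content(
        f"Device: {device}\nSeverity: {severity}\nProblem: {problem}\n"
    )
    return message


def send_alert(message, smtp_host, smtp_port=25):
    """Hand the message to an SMTP server.

    Assumption: the server accepts unauthenticated mail from this host.
    """
    with smtplib.SMTP(smtp_host, smtp_port, timeout=10) as smtp:
        smtp.send_message(message)
```

A monitoring script would call build_alert() when a check fails and pass the result to send_alert() together with the address of its own mail server. Many SMS gateways also accept email, in which case the same function can cover both channels.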

Example graph of data flow on switch interfaces, based on SNMP data

An intelligent system can build on events from servers and active devices, as well as on NetFlow data, evaluating the large volume of data from various systems, finding relationships between them, and generating security alerts. It can even react automatically, for example by blocking a switch port from which an attack is spreading through the network.

Device/Service Availability

Let's now take a closer look at the basic technologies for monitoring.

Probably the first thing we start monitoring is the availability of a server or active network device. In a small company with one or two servers, the unavailability will likely become apparent very quickly. In a larger environment, however, problems with related services can accumulate, and it may take some time before someone notices the unavailability of a server or network path (which are often redundant).

Typically, device availability is determined using a simple ICMP echo request/reply exchange (i.e., ping). Alternatively, an "SNMP ping" is used, which is an SNMP query for a commonly available OID. Either method tells us whether the device is "alive" and lets us measure the response time (latency). However, it does not tell us whether, for example, the Apache web server on that machine is running. So the next step is to monitor the availability of the application service.

Most common network services use the TCP protocol and listen on a specific port, so we can test whether a TCP connection to that port on the server can be established. If it can, the service is running and listening. A better verification is an application-level test, where we check that the service behaves as expected. For a web server, for example, we request a page and verify that the response carries HTTP status code 200. For a mail server, we issue basic SMTP commands to open a session with the server, and so on.
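Both levels of testing can be sketched in a few lines of Python using only the standard library. The function names and default timeouts are assumptions made for illustration; a production check would also need retries and scheduling:

```python
import http.client
import socket
import time


def check_tcp(host, port, timeout=3.0):
    """Try to open a TCP connection.

    Returns the connect time in seconds, or None if the connection failed.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def check_http(host, port=80, path="/", timeout=3.0):
    """Application-level test: request a page and report whether the status is 200."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()
```

A call such as check_tcp("192.0.2.10", 25) (a placeholder address) only confirms that something is listening on the port, while check_http() confirms that the web server actually answers a request. The connect time returned by check_tcp() is a rough latency figure for the service.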

Syslog - Server Events

Syslog is a standard for forwarding log messages over the network. It serves to concentrate logs from various devices and their applications to one place so we can react to them. On the client, we need an application that sends log messages as they are added, using the Syslog protocol. And then we need a Syslog server that receives and processes these messages.
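To make the receiving side more concrete, here is a minimal sketch of a Syslog receiver over UDP. It relies on the fact that a Syslog message starts with a priority value in angle brackets, where the priority equals facility * 8 + severity. The handler class name and the printed output format are illustrative assumptions:

```python
import re
import socketserver

# Severity names in order of their numeric codes 0-7.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]
_PRI = re.compile(r"^<(\d{1,3})>")


def parse_pri(message):
    """Split a raw Syslog message into (facility, severity name, rest of message)."""
    match = _PRI.match(message)
    if not match:
        raise ValueError("message has no <PRI> prefix")
    pri = int(match.group(1))
    if pri > 191:  # highest valid value: facility 23, severity 7
        raise ValueError("priority value out of range")
    return pri // 8, SEVERITIES[pri % 8], message[match.end():]


class SyslogHandler(socketserver.BaseRequestHandler):
    """Handle one UDP datagram: decode it, parse the priority, print a summary."""

    def handle(self):
        data = self.request[0]  # for UDP servers, request is (data, socket)
        text = data.decode("utf-8", errors="replace").strip()
        try:
            facility, severity, rest = parse_pri(text)
        except ValueError:
            facility, severity, rest = None, "unknown", text
        print(f"{self.client_address[0]} facility={facility} severity={severity}: {rest}")
```

To try it, the handler can be started with socketserver.UDPServer(("0.0.0.0", 5514), SyslogHandler).serve_forever(). Port 514 is the traditional Syslog UDP port, but it usually requires elevated privileges, so 5514 is used here as an arbitrary unprivileged choice.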

Forwarding logs this way is common practice in the Linux world. The situation is a bit more complex in the Microsoft world. Windows records its events in a set of logs called the Event Log, and various server applications add further logs. These logs use Microsoft's own format, and Windows has no native Syslog support. However, free applications are available on the internet that act as a Syslog server, such as Kiwi Syslog Server, as well as clients that forward selected events from the Event Log, such as Snare Agent for Windows or SaberNet NTsyslog.

Syslog is very useful because hundreds of messages are typically added to the logs every minute, and we have no chance of going through them by hand for dozens of devices. On the Syslog server, we can create scripts that analyze incoming messages and alert us to problems. For example, we can read failed login attempts from the Windows Security log, and if there are many failed attempts for one account within a certain time interval, we can email the administrator that an attack may be under way.
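The failed-login example can be sketched as a sliding time window per account. This assumes the log messages have already been parsed into (timestamp, account) pairs; the function name and the default thresholds are hypothetical and would need tuning for a real environment:

```python
from collections import defaultdict, deque


def find_suspicious_accounts(events, max_failures=5, window_seconds=300):
    """Return accounts with at least max_failures failed logins within window_seconds.

    events is an assumed, pre-parsed input: an iterable of
    (timestamp_in_seconds, account_name) tuples, sorted by timestamp.
    """
    recent = defaultdict(deque)  # account -> timestamps of recent failures
    suspicious = set()
    for timestamp, account in events:
        failures = recent[account]
        failures.append(timestamp)
        # Drop failures that have fallen out of the sliding time window.
        while failures and timestamp - failures[0] > window_seconds:
            failures.popleft()
        if len(failures) >= max_failures:
            suspicious.add(account)
    return suspicious
```

With the defaults, five failures for the same account within five minutes mark it as suspicious, while the same number of failures spread over a whole day does not.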

Another advantage is that we can store a large number of messages, i.e., a long history. Many devices can store only a limited amount of logs locally. Moreover, the logs remain available even when the server itself is not. If a server has failed or been compromised by an intruder and we cannot log in to read the local log, we can still find the messages that reached the Syslog server just before the server stopped responding.

Monitoring is a creative area, and there is no prescribed way to do it. Everything depends on our needs, abilities, and wishes. Here we have only mentioned the possibilities that we can use. Next time, we'll take a closer look at the basic protocol for network management, SNMP, and the protocol for monitoring network flows, NetFlow.

Abbreviations Used
ICMP - Internet Control Message Protocol
SNMP - Simple Network Management Protocol
OID - Object Identifier
WMI - Windows Management Instrumentation
IPMI - Intelligent Platform Management Interface
IPS - Intrusion Prevention System
IDS - Intrusion Detection System
IP - Internet Protocol
TCP - Transmission Control Protocol
MAC - Media Access Control

Comments

[1] Benda

Hello everyone, I personally use Zabbix at my company. For a price of 0 :-) it is the best. So if anyone is interested, I can send basic templates for Cisco 2950, 2960, and Windows servers.
- comment responded to by [2]Jakub
- comment responded to by [8]Jan Škrabal
Thursday, 03.09.2009 08:45
[2] Jakub

respond to [1]Benda:
Hello,
I would be interested in those templates for Cisco 2950, 2960, and especially for Windows. Please send them to me at JamesGNR@seznam.cz. Thank you.
- comment responded to by [5]jirtos
Monday, 07.09.2009 07:24
[3] joe07

I would be interested in those templates too. Could you upload them to a server and post a link?

Wednesday, 09.09.2009 07:40
[4] jirtos

Hello,
In my work I have implemented quite a number of monitoring systems, and I would also recommend the very high-quality OPENNMS project (opennms.org). It is written in Java and therefore (together with the PostgreSQL database) runs not only on *NIX systems but also on the Microsoft platform; its approach to monitoring is oriented more toward I/O.
And before I forget, the Zenoss project is of course also free (both Zenoss Core and most ZenPacks, i.e., modules, are open source). Only support is paid, in the form of the enterprise version (the same actually applies to the other systems as well).

Friday, 11.12.2009 02:00
[5] jirtos

respond to [2]Jakub: Templates are generally not a problem. What matters more is which monitoring system you want to use (and, more importantly, HOW you want to use it), and you should decide based on that. For example, Centreon, a graphical front end for Nagios, already includes the vast majority of templates (mainly because of the system's considerably greater complexity and capabilities). Zabbix has a few basic templates, which are usually entirely sufficient for its purposes. Zenoss ships with templates (a relatively sufficient base), and almost anything you can think of can be implemented using ZenPacks, which you download anyway (ZenPacks are not just templates but also add support for, e.g., WMI monitoring, Jabber, etc.).

Friday, 11.12.2009 02:07
[6] hofikhof

Hello,
I would like to know how the following could be solved with free tools:
I need to monitor Windows/Linux servers, watch various hardware readings and log files, generate reports, and have it all in a simple administration interface (preferably web-based) with Apache, PHP, and an SQL database for collecting the data. Something based on Syslog, Nagios, or Munin, but all in one?
Thanks for any answers...
- comment responded to by [7]Honza Prokůpek
Monday, 08.03.2010 19:18
[7] Honza Prokůpek

respond to [6]hofikhof: Hello, my needs are almost the same, and the opsview project (an add-on for Nagios, formerly centron) has worked very well for us. I can offer advice or help if you are interested. :o)
- comment responded to by [9]Zdenek
Monday, 22.03.2010 13:51
[8] Jan Škrabal

respond to [1]Benda: Hello,
I would like to ask you for the template for the Cisco 2950 24ports. Could you send it to my email, skrabaj@post.cz? I would be very grateful. I still have not managed to adapt the version for 48ports.

Wednesday, 07.04.2010 17:20
[9] Zdenek

respond to [7]Honza Prokůpek: Could you please say where to send you questions about OPSVIEW?

Thursday, 22.04.2010 07:44
[10] Sikus

At work we use a small program called The Dude to monitor the network (Cisco devices). We monitor device availability using ping and link utilization using SNMP, and the utilization history is stored in graphs. A very good free tool. It runs on Windows.
NeDi is also a decent little program for network monitoring.

Friday, 02.09.2011 10:49
[11] R

An interesting option is www.check-zone.cz. It is an online monitoring portal. There is no need to install anything ;) You simply enter the servers and the services you want to monitor from the comfort of your web browser.

Friday, 18.01.2013 13:40