NetLogger

____________________________________________________


Introduction:

NetLogger (Networked Application Logger) is a monitoring toolkit for distributed systems developed at Lawrence Berkeley National Laboratory. Distributed systems often show unexpectedly low performance. Network congestion is not the only cause: bottlenecks in other parts of the system, such as the applications, the operating system, the network adaptors, routers and switches, are often the main cause. The NetLogger toolkit includes tools that make it easy to log interesting events at various points of a distributed system with very low overhead. NetLogger thus lets us monitor the hosts, the network and the applications of a distributed system, giving us a complete view. It helps locate the precise point of a bottleneck, enabling developers to make the necessary changes to the system to enhance performance.

Throughout this project I was working with NetLogger 2.0.13, which was the latest stable version of NetLogger when I started my internship. Just before I finished my internship, Lawrence Berkeley National Laboratory released NetLogger 2.2; this version may be easier to port to GridFTP 2.4.0.

I have prepared an installation guide for NetLogger version 2.0.13. Please also see the online tutorial prepared by the developers of NetLogger for details of the toolkit.

NetLogger Components:

The NetLogger toolkit contains the following components [1]:

    A simple, common message format for all monitoring events, which includes high-precision timestamps.

    C, C++, Java, Perl, Python and Tcl calls that can be added to existing source code to generate monitoring events and send them to a file, a network server, syslogd, or memory.

    A powerful, customizable X-Windows tool for viewing and analyzing event logs based on time-correlated and/or object-correlated events.

    A collection of instrumented system monitoring tools.

    netlogd: A daemon that collects NetLogger events from several places at a single, central host.

    netchard: An event archive system for NetLogger data, based on MySQL.

 

Why NetLogger:

NetLogger is designed to monitor a system in real time under actual operating conditions. There are several reasons to prefer NetLogger over other monitoring tools such as log4j or Autopilot.

One big advantage of NetLogger is that its ULM message format keeps the monitoring overhead negligible.
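
To give a feel for what a ULM-style event looks like, here is a minimal Python sketch that renders one event as a single line of KEY=value pairs with a microsecond timestamp. It does not use the NetLogger libraries. The function name format_event is hypothetical, the field names (DATE, HOST, PROG, LVL, NL.EVNT) are my recollection of the ULM layout and should be checked against the NetLogger documentation, and the host name, program name and SIZE field are made up for illustration.

```python
from datetime import datetime, timezone


def format_event(event, host, prog, when, level="Usage", **fields):
    """Render one monitoring event as a single line of KEY=value pairs.

    Hypothetical helper; the field names are assumed, not taken from
    the NetLogger source.
    """
    # Timestamp with microsecond precision: YYYYMMDDhhmmss.ffffff (UTC).
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S.%f")
    parts = [
        f"DATE={stamp}",
        f"HOST={host}",
        f"PROG={prog}",
        f"LVL={level}",
        f"NL.EVNT={event}",
    ]
    # User-defined fields follow the fixed ones, in the order given.
    parts.extend(f"{key}={value}" for key, value in fields.items())
    return " ".join(parts)


# Made-up example values.
when = datetime(2003, 7, 1, 12, 0, 0, 250000, tzinfo=timezone.utc)
line = format_event("APP_RECEIVE", "host1.example.org", "demo", when, SIZE=49332)
print(line)
```

Each event is one short line of plain text, so writing it costs little more than a string format and a write.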

The NetLogger API libraries consist of only 6 simple calls.

The NetLogger Visualization tools make it really easy to spot delays in the system. The NLV tool will produce graphs such as the following [2]:

 

 

From such a graph it is easy to identify the pair of events between which the most time is lost. In this example there is a significant delay between the events ISS_START_WRITE and APP_RECEIVE. If a delay at this point is unexpected, the developers can investigate what happens between these two events to cause it.
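
The same reading of the graph can be done numerically. The sketch below is not part of NetLogger or NLV; it only assumes that the events of one request are available as (timestamp in seconds, event name) pairs. The function name largest_gap, the event names EVENT_A and EVENT_B, and all timestamps are hypothetical; only ISS_START_WRITE and APP_RECEIVE come from the example above.

```python
def largest_gap(events):
    """Return (seconds, earlier_event, later_event) for the biggest gap
    between consecutive events, given (timestamp_seconds, name) pairs."""
    ordered = sorted(events)  # order the events by timestamp
    if len(ordered) < 2:
        raise ValueError("need at least two events")
    # Pair every event with its successor and measure the time between them.
    gaps = [
        (later_t - earlier_t, earlier_name, later_name)
        for (earlier_t, earlier_name), (later_t, later_name)
        in zip(ordered, ordered[1:])
    ]
    return max(gaps, key=lambda gap: gap[0])


# Made-up timestamps for one request.
lifeline = [
    (0.00, "EVENT_A"),
    (0.02, "EVENT_B"),
    (0.03, "ISS_START_WRITE"),
    (0.45, "APP_RECEIVE"),
]

seconds, earlier, later = largest_gap(lifeline)
print(f"largest delay: {seconds:.2f} s between {earlier} and {later}")
```

With these made-up numbers the largest delay falls between ISS_START_WRITE and APP_RECEIVE, which is the pair a developer would then look into.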

 

References:

[1] NetLogger website: http://www-didc.lbl.gov/NetLogger/

[2] http://www-didc.lbl.gov/NetLogger/nlv/nlvmain.html