Thursday, April 18, 2013

Logging and Graphing Ping from the Linux Command Line

Lately I've been having a bit of trouble with my ISP.  For the past month or so, between the hours of about 6pm and midnight, I can't connect to Twitter.  The start and stop times of the problem vary, and it only occurs intermittently.  Some nights are fine and you think things are fixed, then it starts all over again.  I thought it was me at first, but this thread over at Whirlpool shows a growing list of frustrated users with the same problem.  I thought TPG would've been able to find the problem straight away, but I was wrong.

The TPG representative on the forum listed some tests that he'd like us to perform to help track down the problem, ideally running ping, traceroute, and nslookup before and after the problem starts.  I don't know much about the infrastructure of the internet or how routing occurs, but I'm familiar with these commands and what they mean.  I wanted to help out, but because the problem was intermittent I kept missing the right time to run the tests, and after a week or so of trying I was fed up.

I had an ancient laptop sitting beside me running an outdated version of Ubuntu that I decided to put to use.  By automating the process of running the tests and logging the output I would be able to capture the exact moment of failure and hopefully pinpoint the problem.  After all, the internet without Twitter just isn't the same.

Recording the result of the tests was easy; a quick and dirty bash script did the job.


logger.sh

#!/bin/bash

DATE=$(date +%Y_%m_%d_%H_%M_%S).log    #get date YYYY_MM_DD_HH_MM_SS (%d zero-pads the day so filenames sort correctly)

printf "ping -c 5 twitter.com \n\n" > "PING_twitter_com_$DATE"
printf "%s\n" "$(ping -c 5 twitter.com)" >> "PING_twitter_com_$DATE"
printf "\n\n\n" >> "PING_twitter_com_$DATE"

echo $DATE

The script is simple.  It gets the date in a human-readable, sortable format and writes the command to be run as a string to a file named PING_twitter_com_YYYY_MM_DD_HH_MM_SS.log.  It then appends the result of the command to the same file.  More tests can be added by replicating the above code for each command.  By issuing the following command

while true; do ./logger.sh; sleep 300; done;

logger.sh will be run every 5 minutes (actually every 5 minutes plus the time taken to run the tests) until stopped.  This generates a log file for each execution containing the data we want.  An example is below.
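Replicating the logging code for each command gets repetitive, so it can be factored into a small helper function.  This is just a sketch of the pattern, not the script from the article; the run_test helper and the file-name prefixes are my own invention, and uname -a stands in for a real network test so the sketch runs anywhere.

```shell
#!/bin/bash
# Sketch: the same log-a-command pattern as logger.sh, wrapped in a
# helper so extra tests (traceroute, nslookup, ...) are one line each.

DATE=$(date +%Y_%m_%d_%H_%M_%S).log

run_test() {
    local prefix=$1; shift              # prefix for the log file name
    local file="${prefix}_${DATE}"
    printf '%s \n\n' "$*" > "$file"     # record the command being run
    printf '%s\n' "$("$@")" >> "$file"  # record its output
    printf '\n\n\n' >> "$file"
}

# uname -a is a stand-in; the real tests would look like the
# commented-out lines below.
run_test UNAME_localhost uname -a
# run_test PING_twitter_com ping -c 5 twitter.com
# run_test TRACEROUTE_twitter_com traceroute twitter.com
```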

PING_twitter_com_2013_04_18_20_17_59.log

ping -c 5 twitter.com

PING twitter.com (199.59.148.10) 56(84) bytes of data.
64 bytes from twitter.com (199.59.148.10): icmp_req=1 ttl=51 time=256 ms
64 bytes from twitter.com (199.59.148.10): icmp_req=2 ttl=51 time=256 ms
64 bytes from twitter.com (199.59.148.10): icmp_req=3 ttl=51 time=257 ms
64 bytes from twitter.com (199.59.148.10): icmp_req=4 ttl=51 time=257 ms
64 bytes from twitter.com (199.59.148.10): icmp_req=5 ttl=51 time=257 ms

--- twitter.com ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4005ms
rtt min/avg/max/mdev = 256.601/257.101/257.432/0.464 ms

This is great, but it's a bit hard to see what's going on.  By extracting the average ping value and graphing it over time, a better picture of the problem can be created.  A command line graph can be generated from a directory of log files with the following command.

cat PING_twitter_com*.log | awk -F '[ /]' '/rtt/{ print $8 }' | gnuplot -persist -e "set terminal dumb 121 28; set yrange[200:400]; plot '-' with impulses title 'Ping (ms)';"

At first it seems a little complicated, but once you break it down things become clear.  The required files are read with cat.  Because the date stamp in each filename is zero-padded, the shell's wildcard expansion lists the files in chronological order.  The output is piped to awk, which looks for lines matching a pattern.  rtt is at the start of the line that contains the average ping, so I've set awk to match on that, but you could just as easily have used mdev or another string unique to the line you are looking for.  The field separators for awk are set with the -F option to be space and forward slash.  This makes extracting the average ping easy: by breaking the line up at spaces and forward slashes you can see that the average ping is the eighth field, so awk is set to print the eighth field as its output.  This is then piped to gnuplot.
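To see the field split concretely, you can feed the sample rtt line from the log above straight into the same awk program:

```shell
# Split the rtt summary line on spaces and slashes.  Fields become:
# 1=rtt 2=min 3=avg 4=max 5=mdev 6== 7=256.601 8=257.101 ...
line='rtt min/avg/max/mdev = 256.601/257.101/257.432/0.464 ms'
echo "$line" | awk -F '[ /]' '/rtt/{ print $8 }'   # prints 257.101
```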

gnuplot needs to be told to draw in the terminal, and the "set terminal dumb" command does this.  Plotting the data from stdin is done with the plot '-' command.  This gives the basic info, but it can be spruced up a bit with some extra formatting, ie, changing the range, using impulses, enlarging the graph, and giving the data a title.  It's up to you to decide what looks best.
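For reference, here are the same settings written out as a standalone gnuplot script rather than a one-liner.  This is a sketch with made-up inline data just to show the shape; with plot '-' the data follows the plot command, one value per line, terminated by e.

```gnuplot
set terminal dumb 121 28               # ASCII output, 121x28 characters
set yrange [200:400]                   # clamp the y axis to the interesting range
plot '-' with impulses title 'Ping (ms)'
250
260
255
e
```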

Linux Command Line Graph
A graph of ping results to twitter.com every 5 minutes

As you can see from the graph above, I didn't encounter the problem tonight.  The average ping remained remarkably stable at around 250 ms.  Who knows, they may have fixed the problem.  But I'll run this for the next couple of nights just to make sure.



4 comments:

  1. this is great but I couldn't quite follow your steps. Could you guide me a little? I'm trying to monitor my AT&T T1 traffic so I can see when it spikes.

    ReplyDelete
  2. First of all, are you using Linux? If you're using Windows I'm not sure how to go about doing it. If you're using Linux, let me know what part you're having trouble with and I'll try to help.

    ReplyDelete
  3. This comment has been removed by the author.

    ReplyDelete
  4. your script didn't work for me very well, but your idea was really good, so I did this:

    HORA=./hora
    DATOS=./datos
    LOGS=/scripts/graph/PING_outside*.log
    echo > $HORA
    echo > $DATOS

    for FILE in `ls $LOGS`;do
    echo $FILE | awk 'BEGIN { FS = "_" } ; { print $6":"$7":"$8 }' | sed 's/.log//g' >> $HORA
    cat $FILE | awk -F '[ /]' '/rtt/{ print $8 }' >> $DATOS
    done
    sed '/^$/d' -i $DATOS
    sed '/^$/d' -i $HORA

    gnuplot -persist << EOF
    filename="$DATOS";
    set terminal dumb 121 28;
    set title 'Ping (ms)';
    plot filename with impulses title "Ping (ms)"
    EOF
    for LINEA in `seq 1 $(wc -l $DATOS | awk '{print $1}')`; do
    VALX=`head -n $LINEA $DATOS | tail -n 1`
    VALY=`head -n $LINEA $HORA | tail -n 1`
    echo "$VALX - $VALY"
    done;

    ReplyDelete