Smokeping – big “gap” when restarting

Smokeping is a great tool for network monitoring, and we recently adopted it for URL monitoring, using the Curl probe.

Everything worked fine until the number of URLs went over a thousand. Since then, whenever we restart the service, we see a gap in the graph. The gap means Smokeping recorded nothing during that time.

We did some analysis and made some changes:

  • changing the forks parameter from 5 to 100

  • changing the pings from 5 to 3 (if you do this, you have to delete all your RRD files first, otherwise Smokeping will not be able to update them; this is a database change)

After that, we saw some improvement: the polling cycle now finishes in less than 300s most of the time.
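To get a feel for why these two settings matter, here is a back-of-the-envelope estimate in Perl. It is only a sketch under an assumed model, not something taken from the Smokeping source: targets are spread over the forked workers, and each target is probed "pings" times in a row. The function name estimate_cycle_seconds and the figure of 2 seconds per request are made up for illustration.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Rough estimate of one polling cycle, in seconds.
# Assumed model (not taken from the Smokeping source): targets are split
# across $forks parallel workers, and each target is probed $pings times
# in sequence, each request taking about $secs_per_request seconds.
sub estimate_cycle_seconds {
    my ($targets, $forks, $pings, $secs_per_request) = @_;
    my $batches = ceil($targets / $forks);   # rounds of parallel work
    return $batches * $pings * $secs_per_request;
}

# Hypothetical numbers: 1000 URLs, 2 seconds per request.
printf "forks=5,   pings=5: %d s\n", estimate_cycle_seconds(1000, 5, 5, 2);
printf "forks=100, pings=3: %d s\n", estimate_cycle_seconds(1000, 100, 3, 2);
```

Real runs will not match these numbers, since slow or timing-out URLs dominate the cycle, but the estimate shows why raising forks and lowering pings both shorten it.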

But the “gap” when restarting was still there. After a lot of debugging, I found that Smokeping sleeps between runs, and that this sleep time is controlled by the “offset” parameter in the “General” configuration section. This does not seem to work as expected. For example, with a polling cycle of step=300s, a run may take 290s or even 330s, yet I sometimes see a sleep time of about 297s. This is not correct: Smokeping should only sleep if the last run took less than 300s, and the sleep time should be 300s minus the last run time. I decided to hack the code (Smokeping.pm). This is my code:

 

report_probes($probes, $myprobe);
# My new variable: duration of the previous run, -1 before the first run
my $last_run_time = -1;
while (1) {

    unless ($opt{nosleep} or $opt{debug}) {
        my $sleeptime = $step - (time-$offset) % $step;
        if (defined $myprobe) {
            $probes->{$myprobe}->do_debug("Sleeping $sleeptime seconds.");
        } else {
            do_debuglog("Sleeping $sleeptime seconds.");
        }
        # This is my hack code
        if ($last_run_time > -1) {
            $sleeptime = $step - $last_run_time;
        }
        # We only sleep from the 2nd run onwards
        if ($sleeptime > 0 && $last_run_time >= 0) {
            sleep $sleeptime;
        }
        last if checkhup($multiprocessmode, $gothup) && reload_cfg($cfgfile);
    }
    my $now = time;
    run_probes $probes, $myprobe; # $myprobe is undef if running without 'concurrentprobes'

    my %sortercache;
    if ($opt{'master-url'}) {
        my $new_conf = Smokeping::Slave::submit_results $slave_cfg,$cfg,$myprobe,$probes;
        if ($new_conf && !$gothup) {
            do_log('server has new config for me ... HUPing the parent');
            kill_smoke $cfg->{General}{piddir}."/smokeping.pid", SIGHUP;
            # wait until the parent signals back if it didn't already
            sleep if (!$gothup);
            if (!$gothup) {
                do_log("Got an unexpected signal while waiting for SIGHUP, exiting");
                exit 1;
            }
            if (!$multiprocessmode) {
                load_cfg_slave(\%opt);
                last;
            }
        }
    } else {
        update_rrds $cfg, $probes, $cfg->{Targets}, $cfg->{General}{datadir}, $myprobe, \%sortercache;
        save_sortercache($cfg,\%sortercache,$myprobe);
    }

    my $runtime = time - $now;
    # This is a hack
    $last_run_time = $runtime;
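
To make the difference between the two sleep rules easier to see, here is a small standalone Perl sketch. It is not part of Smokeping: the helper names aligned_sleep and remaining_sleep and the starting timestamp are made up for illustration. As far as I can tell from the formula in the loop above, the stock code sleeps until the next multiple of step on the wall clock, which would explain a sleep of about 297s after a run that overshoots the step by only a few seconds.

```perl
use strict;
use warnings;

# Stock formula from the loop above: sleep until the next multiple of
# $step (shifted by $offset) on the wall clock.
sub aligned_sleep {
    my ($now, $step, $offset) = @_;
    return $step - ($now - $offset) % $step;
}

# Patched rule: sleep only for what is left of the step after the last run.
# Returns 0 when the run took a full step or longer.
sub remaining_sleep {
    my ($step, $last_run_time) = @_;
    my $left = $step - $last_run_time;
    return $left > 0 ? $left : 0;
}

my ($step, $offset) = (300, 0);
my $start = 1_200_000;    # hypothetical epoch second, a multiple of 300

for my $runtime (290, 303, 330) {
    printf "run took %3d s: aligned sleep %3d s, remaining sleep %3d s\n",
        $runtime,
        aligned_sleep($start + $runtime, $step, $offset),
        remaining_sleep($step, $runtime);
}
```

With these inputs, a 303s run gives an aligned sleep of 297s but a remaining sleep of 0s. The trade-off of the patched rule is that runs are no longer lined up with fixed clock boundaries, so start times can drift from cycle to cycle.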
