fanf

DNSQPS: an alarming shell script

24th Feb 2015 | 17:12

I haven't got round to setting up proper performance monitoring for our DNS servers yet, so I have been making fairly ad-hoc queries against the BIND statistics channel and pulling out numbers with jq.

Last week I changed our new DNS setup for more frequent DNS updates. As part of this change I reduced the TTL on all our records from one day to one hour. The obvious question was, how would this affect the query rate on our servers?

So I wrote a simple monitoring script. The first version was:

    while sleep 1
    do
        fetch-and-print-stats
    done

But the fetch-and-print-stats part took a significant fraction of a second, so the queries-per-second numbers were rather bogus.

A better way to do this is to run `sleep` in the background, while you fetch-and-print-stats in the foreground. Then you can wait for the sleep to finish and loop back to the start. The loop should take almost exactly a second to run (provided fetch-and-print-stats takes less than a second). This is pretty similar to an alarm()/wait() sequence in C. (Actually no, that's bollocks.)
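
As a rough sketch of the pattern on its own (the `run_ticks` helper and its iteration count are made up for illustration, they are not part of dnsqps), something like this should keep each pass of the loop close to one second as long as the work itself is quicker than that:

```shell
#!/bin/sh

# run_ticks COUNT COMMAND...
# Hypothetical helper: run COMMAND once per second, COUNT times.
run_ticks() {
    count=$1
    shift
    while [ "$count" -gt 0 ]
    do
        sleep 1 &   # start the one-second timer in the background
        "$@"        # do the work while the timer is running
        wait        # block until the timer (and any other child) has exited
        count=$((count - 1))
    done
}

# Example: print the time twice, about one second apart.
run_ticks 2 date +%T
```

If the command takes longer than a second, the `wait` returns immediately and the loop simply runs late; nothing here tries to catch up.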

My dnsqps script also abuses `eval` a lot to get a shonky Bourne shell version of associative arrays for the per-server counters. Yummy.
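
For anyone who has not met the `eval` trick, here is a minimal sketch of the general idea. The helper names (`map_key`, `map_set`, `map_get`) are invented for this example and do not appear in dnsqps, which inlines the `eval` calls instead. The sketch also mangles keys into valid variable names, on the assumption that server names may contain dots or hyphens.

```shell
#!/bin/sh
# Hypothetical helpers for a poor man's associative array in Bourne shell.

# Turn an arbitrary key (such as a hostname) into a safe variable-name
# suffix by replacing every character outside [A-Za-z0-9_] with "_".
map_key() {
    printf '%s' "$1" | tr -c 'A-Za-z0-9_' '_'
}

# map_set KEY VALUE: store VALUE in a variable named map_<mangled key>.
map_set() {
    eval "map_$(map_key "$1")=\$2"
}

# map_get KEY [DEFAULT]: print the stored value, or DEFAULT if unset.
map_get() {
    eval "printf '%s\n' \"\${map_$(map_key "$1"):-\$2}\""
}

# Example: a per-server counter and its increment.
map_set ns1.example.com 100
old=$(map_get ns1.example.com 0)
map_set ns1.example.com 250
echo "$((250 - old))"   # prints 150
```

The mangling is lossy, so `a.b` and `a_b` would share a slot; that is fine for a handful of server names but not for arbitrary keys.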

Now that I could get queries-per-second numbers from my servers, what was the effect of dropping the TTLs? Well, as far as I can tell from eyeballing, nothing. Zilch. No visible change in query rate. I expected at least some kind of clear increase, but no.

The current version of my dnsqps script is:

    #!/bin/sh
    
    while :
    do
        sleep 1 & # set an alarm
        
        for s in "$@"
        do
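            # cumulative QUERY count from the BIND statistics channel
            total=$(curl --silent "http://$s:853/json/v1/server" |
                    jq -r '.opcodes.QUERY')
            # per-server counters live in variables named tot<server>,
            # so $s has to be usable as part of a shell variable name;
            # on the first pass tot$s is unset and counts as 0
            eval inc='$((' $total - tot$s '))'
            eval tot$s=$total
            printf ' %5d %s' "$inc" "$s"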
        done
        printf '\n'
        
        wait # for the alarm
    done

Comments {13}

small tweak

from: anonymous
date: 24th Feb 2015 17:36 (UTC)

I (ab)use "while sleep $X ; do ...." a lot, and one of the reasons why I like it is that ^C (usually) kills the sleep, so it exits with a non-zero exit code, and the loop ends. Your reworking breaks this - I have to get lucky with the timing (or mash ^C).

But only a small change can make it work again:

    true & # any old thing to wait on
    while wait
    do
        sleep 1 &
        ....
    done

cheers,
Mike

Tony Finch

Re: small tweak

from: fanf
date: 24th Feb 2015 18:19 (UTC)

Cunning, thanks!

Bash's signal handling and error propagation are very bad :-( At least, I think it is Bash's fault. It is made worse by the buffer bloat in Linux's terminals and pipes, so stopping an output flood is unnecessarily hard.

Re: small tweak

from: anonymous
date: 24th Feb 2015 22:08 (UTC)

Bash is probably the wrong answer for anything you don't type into a terminal.

Why, yes, I am a hypocrite.

But when do you stop just adding features/fixing edge cases/fixing bugs and start re-writing in perl/ruby/whatever? 10^{0,1,2,3,4} LOC?

-- Mike

Ewen McNeill

DNS stats

from: edm
date: 24th Feb 2015 18:58 (UTC)

That's a clever hack, especially the background sleep for timing stability.

But I'm left wondering why you didn't write it in a language with built in HTTP, JSON, and associative array support? Python, Perl, Go, etc all come to mind. I suspect it'd be no more lines, and easier to read/write. And it doesn't look like you're getting any benefit from the Bourne shell -- eg no pipeline tricks, or redirection tricks.

Ewen

PS: thanks for the pointer to "jq" -- that looks like a useful tool for ad hoc JSON command line handling.

Tony Finch

Re: DNS stats

from: fanf
date: 24th Feb 2015 22:59 (UTC)

It's one of those scripts which started off as an ad-hoc command line and didn't grow up enough to warrant rewriting in a "better" language. For fun, an equivalent Perl script is:

    #!/usr/bin/perl

    use warnings;
    use strict;

    use JSON;
    use LWP::Simple;
    use Time::HiRes qw(time sleep);

    my (%tot,%inc);
    $tot{$_} = 0 for @ARGV; 

    my $mark = time;
    for (;;) {
        for my $s (@ARGV) {
            my $t = decode_json(get("http://$s:853/json/v1/server"))
                    ->{opcodes}->{QUERY};
            $inc{$s} = $t - $tot{$s};
            $tot{$s} = $t;
            printf " %5d %s", $inc{$s}, $s;
        }
        print "\n";

        $mark += 1;
        my $delay = $mark - time;
        # Time::HiRes::sleep dies if asked to sleep for a negative time
        sleep $delay if $delay > 0;
    }

Ewen McNeill

Re: DNS stats

from: edm
date: 24th Feb 2015 23:05 (UTC)

Fair enough :-)

FWIW, I (now!) have a personal metric that if I find I really want associative arrays then shell script is probably the wrong language choice.... I've done that "associative arrays in shell" a few times too, but each time it turned out to be a strong hint that the program was growing beyond "a few lines of shell" and would be much easier converted to something else with the features I wanted built in.

Ewen

Tony Finch

Re: DNS stats

from: fanf
date: 24th Feb 2015 23:14 (UTC)

Yes, that's a good criterion for ditching shell. I have almost never used this eval hack before :-)

Another reason is if you need to check command exit status carefully, especially in pipelines. It seems ironic that a command language is often not that good at running commands!

Malc

Re: DNS stats

from: mas90
date: 25th Feb 2015 12:31 (UTC)

"Another reason is if you need to check command exit status carefully, especially in pipelines."

'set -o pipefail'?

Tony Finch

Re: DNS stats

from: fanf
date: 25th Feb 2015 13:02 (UTC)

A recent example I had to deal with is nsdiff which has diff-style exit status, so 0 indicates no change, 1 indicates differences, and 2 indicates an error. Properly error checking a program like this is a bloody pain in shell.

Since I wrote nsdiff this is my own fault. I should probably learn not to write programs that produce "clever" exit status codes :-)
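
To give a flavour of the problem, here is a sketch of handling a diff-style exit status in plain sh. It uses `cmp -s` as a stand-in because it follows the same convention (0 for same, 1 for different, greater than 1 for trouble); `classify_status` is a made-up helper name, and this is not how nsdiff's callers actually do it.

```shell
#!/bin/sh

# classify_status COMMAND...
# Hypothetical helper: run a command that uses diff-style exit codes and
# print "same", "different" or "error". Returns 2 on error, else 0.
classify_status() {
    "$@" >/dev/null 2>&1
    status=$?   # save it straight away; the next command overwrites $?
    case $status in
    0) echo same ;;
    1) echo different ;;
    *) echo error; return 2 ;;
    esac
}

# Example, using throwaway files (mktemp -d is assumed to be available).
tmp=$(mktemp -d) || exit 1
printf 'a\n' > "$tmp/one"
printf 'b\n' > "$tmp/two"
classify_status cmp -s "$tmp/one" "$tmp/one"    # prints same
classify_status cmp -s "$tmp/one" "$tmp/two"    # prints different
classify_status cmp -s "$tmp/one" "$tmp/none"   # prints error
rm -rf "$tmp"
```

The awkward part is that a plain `if command` or `set -e` lumps 1 and 2 together as "failed", so the status has to be captured and picked apart by hand every time.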

Re: DNS stats

from: anonymous
date: 2nd Mar 2015 23:45 (UTC)

echo ${PIPESTATUS[*]}

Tony Finch

Re: DNS stats

from: fanf
date: 3rd Mar 2015 12:29 (UTC)

See below :-)

Malc

from: mas90
date: 25th Feb 2015 12:29 (UTC)

Bash has native associative arrays; see 'declare -A'.

Tony Finch

from: fanf
date: 25th Feb 2015 12:58 (UTC)

Yeah, I avoid bashisms, and treat any temptation to use them as a sign that I ought to be using a better programming language :-)
