The Official Scalr blog software to auto-scale the world's websites

4 Aug 2010

Named.conf crash report

Description

The ssh extension that we use to upload changes to our nameserver configuration file (named.conf) segfaulted and corrupted the file. The corrupted configuration was then replicated to the other nameservers as part of normal change propagation.

Timeline and resolution

Wed Aug 4 10:47 PST 2010 - The ssh extension used for transporting nameserver updates segfaulted.
The named.conf configuration file that was being transported was corrupted in the process, and was then synchronized to other nameservers.

Wed Aug 4 10:55 PST 2010 - A client reported an issue with DNS.
We found the corruption and started working on a fix.

Wed Aug 4 11:05 PST 2010 - We manually generated a new named.conf file, and uploaded it to the nameservers.
The new valid named.conf propagated to the other nameservers.

Prevention

To prevent this from happening in the future, we are taking the following action:

We will stop using ssh as the transport. Instead, a local daemon on each nameserver will update named.conf directly from our database. This daemon will be ready within the next 24 hours.
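
For illustration only, here is a minimal Python sketch of how a local updater could avoid ever installing a half-written or invalid named.conf. It is not the daemon described above. The function names (`install_config`, `named_checkconf`) are hypothetical, and it assumes BIND's `named-checkconf` utility is on the PATH. The candidate file is written next to the live one, checked, and only then renamed into place, so a crash mid-write leaves the existing configuration untouched.

```python
import os
import subprocess
import tempfile


def named_checkconf(path):
    """Return True if BIND's named-checkconf accepts the file at `path`.

    Assumes the named-checkconf binary is installed and on the PATH.
    """
    result = subprocess.run(["named-checkconf", path], capture_output=True)
    return result.returncode == 0


def install_config(text, dest, validate=named_checkconf):
    """Write `text` to `dest` only if `validate` accepts it.

    The candidate is written to a temporary file in the same directory
    as `dest`, validated, and then renamed over `dest`. Returns True if
    the new file was installed, False if validation rejected it.
    """
    directory = os.path.dirname(os.path.abspath(dest))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".named.conf.")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes reach the disk
        if not validate(tmp_path):
            return False  # the live file is left exactly as it was
        os.replace(tmp_path, dest)  # swaps the file in a single rename
        return True
    finally:
        # Clean up the candidate if it was rejected or an error occurred.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

A caller would reload the nameserver only when `install_config` returns True. Note that `tempfile.mkstemp` creates the file readable by its owner only, so a real updater would also need to set the ownership and mode that named expects before the rename.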

Estimated Impact

18 minutes (10:47 to 11:05 PST).

Regards,
The Scalr Team

Comments
  1. Keep the good work up and inform us in the way you have done here when something goes wrong. Thanks a lot!

  2. Thanks for the report guys – nicely put. Keep doing the good work.

  3. There’s nothing wrong with ssh as a transport. Maybe it would be best to run named-checkconf before reloading the bind server?

