Named.conf crash report
Description
The ssh extension that we use to upload changes to our nameserver configuration file (named.conf) segfaulted during a transfer, corrupting the file. The corrupted configuration was then replicated to our other nameservers as part of normal change propagation.
Timeline and resolution
Wed Aug 4 10:47 PST 2010 - The ssh extension used for transporting nameserver updates segfaulted.
The named.conf file being transported was corrupted in the process and was then synchronized to the other nameservers.
Wed Aug 4 10:55 PST 2010 - A client reported an issue with DNS.
We found the corruption and started working on a fix.
Wed Aug 4 11:05 PST 2010 - We manually generated a new named.conf file and uploaded it to the nameservers.
The new, valid named.conf propagated to the other nameservers.
Prevention
To prevent this from happening in the future, we are taking the following action:
We will stop using ssh as the transport and will instead run a local daemon on each nameserver that updates named.conf directly from our database. This daemon will be ready within the next 24 hours.
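As an illustration only, here is a minimal Python sketch of one safeguard a local update daemon like this could apply: write the new configuration to a temporary file, validate it, and only then atomically replace the live file, so a crash mid-write cannot leave a truncated named.conf in place. This is not our actual implementation. The function names (checkconf, install_config) are hypothetical, fetching the configuration text from the database is left out, and the sketch assumes BIND's named-checkconf utility is installed and on the PATH.

```python
import os
import subprocess
import tempfile


def checkconf(path):
    """Return True if BIND's named-checkconf accepts the file at `path`.

    Assumes named-checkconf is installed and on the PATH.
    """
    result = subprocess.run(["named-checkconf", path], capture_output=True)
    return result.returncode == 0


def install_config(text, target, validate=checkconf):
    """Replace `target` with `text` only if the new content validates.

    The content is first written to a temporary file in the same directory,
    so the final os.replace() is an atomic rename on the same filesystem.
    Returns True if the live file was replaced, False if validation failed.
    """
    directory = os.path.dirname(os.path.abspath(target))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".named.conf.")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the data is on disk first
        if not validate(tmp_path):
            return False  # live file is left untouched
        os.replace(tmp_path, target)  # atomic swap
        return True
    finally:
        # Clean up the temporary file if it was not renamed into place.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

A daemon built this way would call install_config(new_text, "/etc/named.conf") after each database change (the path is an example) and reload the nameserver only when it returns True.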
Estimated Impact
18 minutes (10:47 to 11:05 PST).
Regards,
The Scalr Team