A few days ago I took my home server offline to upgrade it. Until then the server had been running on a single hard drive. It has data backups, but a full restore from backup would mean too much downtime for some services.
The server originally served as my home NAS (mostly for backups of my main desktop) and hosted the test environments of my clients' projects. Recently it has also started running some semi-critical automation apps.
I decided to set up RAID-1 with two disks to reduce possible downtime. When a disk fails in RAID-1, it can be quickly replaced and synced with the remaining disk. Neither the OS nor the services have to be reinstalled.
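The replacement step described above could look roughly like this with Linux software RAID. This is only a sketch: the post does not say which RAID tooling was used, so `mdadm`, the array name `/dev/md0` and the device names `/dev/sda` and `/dev/sdb` are all assumptions, as are the helper names `run`, `copy_partition_table` and `replace_failed_disk`. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: replace a failed RAID-1 member using mdadm (assumed tooling).
# ASSUMPTIONS: array /dev/md0, surviving disk /dev/sda, failed disk in the
# /dev/sdb slot, one RAID partition per disk. Adjust before real use.
# Commands are only printed unless APPLY=1 is set.

ARRAY=${ARRAY:-/dev/md0}
GOOD_DISK=${GOOD_DISK:-/dev/sda}
SWAP_DISK=${SWAP_DISK:-/dev/sdb}

run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Dump the DOS partition table of one disk and write it to another.
copy_partition_table() {
    sfdisk -d "$1" | sfdisk "$2"
}

replace_failed_disk() {
    # Mark the broken member as failed and take it out of the array.
    run mdadm --manage "$ARRAY" --fail "${SWAP_DISK}1"
    run mdadm --manage "$ARRAY" --remove "${SWAP_DISK}1"
    # (The physical disk swap happens at this point.)
    run copy_partition_table "$GOOD_DISK" "$SWAP_DISK"
    # Adding the new partition starts the resync.
    run mdadm --manage "$ARRAY" --add "${SWAP_DISK}1"
    # Put the boot loader on the new disk too, so either disk can boot.
    run grub-install "$SWAP_DISK"
    # Resync progress shows up in /proc/mdstat.
    run cat /proc/mdstat
}

replace_failed_disk
```

In a real replacement the steps before and after the physical swap would be run separately, since the machine is usually powered down in between.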
The upgrade process started a week ago. I made a list of the running services and documented the whole system. I also tried out the RAID setup in a VirtualBox machine. For the hardware I chose 2x1TB Western Digital Red disk drives. The existing 1TB Western Digital Black disk was kept for backups.
I reinstalled the OS to move from Debian Wheezy to Jessie. I used Legacy Boot and DOS partition tables for the new disks because I was not sure how well GRUB handles RAID with GPT partition tables, and Googling turned up some warnings. DOS partition tables are supposed to work well for disks smaller than 2TB.
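The 2TB figure follows from the DOS partition table storing sector counts in 32 bits. A small sketch of the arithmetic, assuming 512-byte logical sectors (drives that expose 4096-byte sectors have a higher limit); the names `MBR_LIMIT` and `fits_mbr` are made up for illustration:

```shell
#!/bin/sh
# A DOS (MBR) partition table addresses at most 2^32 sectors.
# ASSUMPTION: 512-byte logical sectors.
SECTOR_SIZE=512
MBR_LIMIT=$(( 4294967296 * SECTOR_SIZE ))   # 2^32 sectors, in bytes

# Succeeds if a disk of $1 bytes is fully addressable with a DOS table.
fits_mbr() {
    [ "$1" -le "$MBR_LIMIT" ]
}

DISK_BYTES=1000000000000   # nominal size of a 1TB drive
echo "MBR limit: $MBR_LIMIT bytes"
if fits_mbr "$DISK_BYTES"; then
    echo "fits in a DOS partition table"
else
    echo "needs GPT"
fi
```

On a live system the actual size in bytes can be read with `blockdev --getsize64 /dev/sda` (the device name is again just an example).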
The largest chunk of the downtime was spent copying the old data. I had about 570GB of files on the old disk.
`cp -p` preserved all permissions and modification times (required for my backup system). I had set up all apps and services under the directory `/files`, which meant I did not have to install and set up every one of them again. I just had to add specific include directives to the supervisor, Nginx and Apache configuration files and reinstall the Node.js and SWI-Prolog runtimes.
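To illustrate what `cp -p` preserves, here is a small self-contained sketch that copies a directory tree between two temporary directories standing in for the old and new disks. The `-r` flag and the file names are my assumptions; the post only mentions `cp -p`. `touch -d` and `stat -c` are the GNU coreutils forms.

```shell
#!/bin/sh
# Sketch: copy a tree while keeping file modes and modification times.
# SRC and DST are throwaway stand-ins for the old and new disks.
SRC=$(mktemp -d)
DST=$(mktemp -d)

# Build a tiny example tree with a known mode and an old timestamp.
mkdir -p "$SRC/files/app"
echo "data" > "$SRC/files/app/config.txt"
chmod 640 "$SRC/files/app/config.txt"
touch -d "2015-01-01 12:00:00" "$SRC/files/app/config.txt"

# -r recurses into directories; -p keeps mode, timestamps and ownership
# (ownership is only preserved when the copy runs as root).
cp -rp "$SRC/files" "$DST/"

# %a = octal mode, %Y = modification time as seconds since the epoch.
SRC_META=$(stat -c '%a %Y' "$SRC/files/app/config.txt")
DST_META=$(stat -c '%a %Y' "$DST/files/app/config.txt")
echo "source: $SRC_META"
echo "copy:   $DST_META"
```

Both lines should show the same mode and timestamp. A plain `cp -r` without `-p` would stamp the copy with the current time, which is what would confuse a backup system that relies on modification times.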
There were some things I had not fully thought through before upgrading that turned out to be necessary:
- Notify ALL people about the upgrade. They can get worried when stuff is not running.
- Install NTP to keep the clock correct: `apt-get install ntp`.
- Use the UTC timezone.
- Edit `/etc/resolv.conf` to avoid local censoring.
- Keep `/etc/ssh/sshd_config` to keep keys consistent over upgrades (other tools will freak out otherwise when contacting the machine).
- Make sure you re-add all custom cron entries after reinstall.
- The Realtek Ethernet adapter complains about missing firmware but still works at first. Later it disrupts other machines' connections for some reason. Installing the firmware fixes it: `apt-get install firmware-realtek`.
- Use `cp -p` to preserve timestamps, users and permissions when copying the old files.
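Several items on this list can be checked mechanically after the reinstall. The following is a hedged sketch of such a check, not something from the original setup: the helper names `check`, `is_utc` and `has_crontab` are invented here, and `dpkg -s` only reports whether dpkg knows the package, which is a rough stand-in for "installed and working".

```shell
#!/bin/sh
# Sketch: verify parts of the post-install checklist on a Debian machine.

# Run a command quietly and report whether it succeeded.
check() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "ok: $label"
    else
        echo "MISSING: $label"
    fi
}

# True when the system clock is displayed in UTC.
is_utc() {
    [ "$(date +%Z)" = "UTC" ]
}

# True when the current user has at least one non-comment crontab line.
has_crontab() {
    crontab -l | grep -q '^[^#]'
}

check "ntp package" dpkg -s ntp
check "firmware-realtek package" dpkg -s firmware-realtek
check "UTC timezone" is_utc
check "custom cron entries" has_crontab
```

Each line of output is either `ok: ...` or `MISSING: ...`, so the script can be rerun after every fix until nothing is missing.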
Everything (about 20 apps and services) was working again after 4 hours of downtime. The server has been rock solid for almost a week now.