Monday, October 7, 2013

More Exadata Patching Gotchas (Part Two)

In my last post I produced quite a list of Oracle Exadata patching gotchas. I then shared my thoughts about the first two:

  1. Didn't run/address Exacheck findings
  2. Root FS full or nearly full

Let's talk about the next two on my list:

  1. Didn't back up the compute node boot partition
  2. Lost passwords

Backup and Restore of the Compute Nodes

Each time you apply a patch, the patch set instructions tell you to back up the boot partition of the compute node (you don't need to do this on the cell servers). There are two logical partitions on the Exadata box: the first is the boot partition and the second is the backup boot partition. I mentioned these earlier, but they deserve mention again:

/dev/VGExaDb/LVDbSys1 - Boot partition
/dev/VGExaDb/LVDbSys2 - Backup partition

You can use the imageinfo command to verify which partition is currently the boot partition (since things do change from time to time).

In the early days of Exadata you had to manually back up the boot partition to the backup partition. Oracle then added the dbserver_backup.sh script to make this easier. The dbserver_backup.sh script backs up both the root and /boot file systems. Note that the /u01 file system is not backed up by dbserver_backup.sh, so you will want to make sure that /u01 gets backed up as well before you start patching. This can easily be done via a tar command. Make sure you back up /u01 and the root FS on a regular basis.
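As a rough illustration of the tar approach for /u01, here is a minimal sketch. The backup_dir function and the /mnt/backup destination are hypothetical names I made up for this example; they are not part of any Oracle tooling. It assumes you run it as root and that the destination has enough free space.

```shell
#!/bin/sh
# Sketch only: archive a directory tree (for example /u01) with tar.
# backup_dir is a hypothetical helper, not an Oracle-supplied script.
backup_dir() {
    src=$1        # directory to back up, e.g. /u01
    dest_dir=$2   # where the archive goes, e.g. an NFS mount (assumed)

    # Name the archive after the source directory and today's date.
    name=$(basename "$src")
    archive="$dest_dir/${name}_$(date +%Y%m%d).tar.gz"

    # -C switches to the parent directory first, so the archive stores
    # relative paths (u01/...) and can be restored anywhere.
    tar -czf "$archive" -C "$(dirname "$src")" "$name" || return 1

    # List the archive contents to confirm it is readable.
    tar -tzf "$archive" > /dev/null || return 1

    # Print the archive path so the caller can record it.
    echo "$archive"
}
```

You would call it as something like `backup_dir /u01 /mnt/backup`. Storing relative paths is a deliberate choice here: it lets you extract the archive into a scratch directory to inspect it without touching the live /u01.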

On his blog, Vishal Desai provides a nice script to snapshot the root and /u01 file systems and then back them up to a mount point on ZFS. You could, of course, use any NFS mount point to do this, but if you're connected to ZFS via InfiniBand you're going to get the benefit of InfiniBand's speed. Please be aware that I've looked at this backup script and it seems OK, but use it at your own risk. I've not tested it on an Exadata box (yet). The next time I'm on one I'll do so if I have time. Please, always use the dbserver_backup.sh script whenever recommended by Oracle.

You should schedule regular backups of root and /u01 on your Exadata boxes. It's that important. Bare metal restores can be a pain! If you want to see how to restore the system from your backups, you can look in MOS document 1556257.1 titled "Exadata YUM Repository Population, One-time Setup Configuration and YUM Upgrades". Section 7, titled "Rolling Backup Software Changes", contains a list of instructions on how to restore your boot partition, should it be lost, if you used the dbserver_backup.sh script. If you used Vishal's method, then section 7 should give you some idea of how to restore the boot partition from that backup.

Lost Passwords

I've run into more than one case where the customer has lost the root password of some critical part of the Exadata infrastructure. One of the most common situations is the loss of the root password on the InfiniBand switches. All I can say is that if you lose the root password to the InfiniBand switches and you need to access those switches (to install an update, for example) then you are in for some significant time and trouble. The process of recovering the password is painful, laborious and, as experience has shown, time consuming. So, please, don't lose your passwords.

Next time we will discuss the next two bullet points:

  1. Node misconfigurations
  2. Didn't RTFM
