Wednesday, October 23, 2013

Did you make a mistake patching the Oracle Home and Grid Home on Exadata?

Part of the process of applying Oracle patches can involve relinking the Oracle executable. For example, if you're using Database Vault, you might need to relink the Oracle executable to enable or disable that feature. Some patches also require that you manually relink the software.

The thing is, on Exadata it's important to relink the software the correct way. The InfiniBand fabric communicates using the RDS protocol, and to use RDS the database software must be relinked against the correct libraries.

So, if you need to relink the ORACLE_HOME or GRID_HOME software, make sure you include the ipc_rds target in the make command, as seen here:

make -f ins_rdbms.mk ipc_rds ioracle
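To put that in context, here is a minimal sketch of a manual relink with RDS. The home path is only an example, and you should always follow the patch README for your environment:

# Stop the databases (and listeners) that use this home first.
# The home path below is an example; substitute your own.
export ORACLE_HOME=/u01/app/oracle/product/11.2.0.3/dbhome_1
cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk ipc_rds ioracle

Repeat the same steps, as the Grid Infrastructure owner, against the Grid Home.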

If you run Exacheck, it will tell you if you are not linked using the RDS protocol.

So, if you find you're having network issues or node fencing, check to make sure you are using the RDS protocol. Oracle provides a way to check: simply set ORACLE_HOME (and the other environment variables) correctly and issue the following command:


$ORACLE_HOME/bin/skgxpinfo
 
If it returns rds, that home is in good shape; the node itself is in good shape once you get the rds response from each Oracle Home and the Grid Home.
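If you want to check every home in one pass, a simple loop like this works; the home paths below are only examples, so adjust them for your environment:

# Check the IPC protocol each home is linked with (expect "rds" on Exadata).
for home in /u01/app/11.2.0.3/grid /u01/app/oracle/product/11.2.0.3/dbhome_1; do
  echo "$home: $($home/bin/skgxpinfo)"
done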
 
 
Also, are the databases in your Exadata communicating over InfiniBand? Did you know that by default they do not? I'll talk about that in my next post!

Robert
 

Wednesday, October 16, 2013

The Exadata October QFSDP is out!

Patch number 17452393 has been released. This is the October QFSDP for Linux Exadata. You can find it on Metalink by going to the patches and updates page and putting in the patch number.

Here is a list of what's new in the October QFSDP, delta'd against the July release:



Database patch to 11.2.0.3.21
OPatch to 11.2.0.3.5
OPlan to 12.1.0.1.3

The Cloud Control patches have changed a bit too:

OMS Base patch set number is 16290212 (changed from 16236221)
Two new agent patches: 14075824 and 14509490 (along with the previous four patches from July, which are still there).
DBPlugin has two new patches on top of the three that were already there:
16910687
17052137


Also, the DBNodeUpdate utility now appears in the patch. MOS note 1553103.1 provides information on it.
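I haven't run the utility from this bundle yet, so treat the example below as a sketch from memory; the options shown may differ in your version, and you should verify the syntax against note 1553103.1 before using it:

# Run as root on the compute node. -v runs the prerequisite checks only;
# -u with -l performs the update from the supplied repository zip/ISO.
# The patch path is an example only.
./dbnodeupdate.sh -v -l /u01/patches/exadata_yum_repo.zip
./dbnodeupdate.sh -u -l /u01/patches/exadata_yum_repo.zip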

Also check out document 888828.1 for important information on all Exadata patches.

For those of you running 12c GI on Exadata, there is a patch available for you as well: an upgrade to 12.1.0.1.1 GI (this is a PSU and is not contained in the QFSDP). See patch 1727829 for more information. The patch set also says it contains a DB update (I've not yet installed it, so I can't verify that yet).

So - go, test, test, test and then patch!!

Friday, October 11, 2013

Oracle Database 12c New Features is almost here!

I've had several emails asking me about my new book, Oracle Database 12c New Features. Well, it's almost here! For those of you who are interested in the gory details, writing a book is a multi-phased process.

1. You do a proposal, they accept... or don't.
2. You write your chapters.
3. The chapters go through technical editing - I get the chapter back and have to review and reply to each comment by the technical reviewer. Then I send the chapter back for copy editing.
4. The chapters go through copy editing - this is where they correct grammar, sentence structure, the general outline and so on.


5. The publisher then typesets the pages. In this process the pages are transferred from a Word document into a format that the printer can use. After the pages are all set, a PDF of the pages is produced and sent to the author.
6. You review the page proofs and return comments and corrections.
7. The publisher does more of whatever they do before the book goes to the printing press.
8. The book is printed. Print runs used to be larger; my 9i New Features book's first print run was 5,000 copies I think, maybe even 10,000. Nowadays they can print on demand much more easily, so the initial print runs are usually not as large. I'm not sure how big this print run will be.
9. The book is shipped out to be sold by web and brick-and-mortar stores.

So... I just finished step 6, the page proofs.... which pretty much ends the writing process for the book. Now, it's just a matter of getting it printed. I am hopeful that we will get it in the stores before the currently reported publish date of 12/13/2013.

I'm now taking a 2-3 day holiday and then starting on my next book project - OCP - Oracle 12c Administrator Certified Professional!

I have another project coming sometime in the future that I call "Scorched Earth", but that's a totally different story and has nothing to do with Oracle. Someday I will write a book about it though.

Enjoy your Columbus Day!!

Monday, October 7, 2013

More Exadata Patching Gotchas (Part Two)

In my last post I produced quite a list of Oracle Exadata patching gotchas, and I shared my thoughts about the first two:

  1. Didn't run/address Exacheck findings
  2. Root FS full or nearly full

Let's talk about the next two on my list:

  3. Didn't back up the compute node boot partition
  4. Lost passwords

Backup and Restore of the Compute Nodes

Each time you apply a patch, the patch set instructions tell you to back up the boot partition of the compute node (you don't need to do this on the cell servers). There are two system partitions on each compute node: the first is the active boot partition and the second is the backup boot partition. I mentioned these earlier, but they deserve mention again:

/dev/VGExaDb/LVDbSys1 - Boot partition
/dev/VGExaDb/LVDbSys2 - Backup partition

You can use the imageinfo command to verify which partition is the active boot partition (since things do change from time to time).
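For example (run as root; the exact output varies between image versions):

# Look for the "Active system partition on device" line, which on an
# unmodified system normally points at /dev/mapper/VGExaDb-LVDbSys1.
imageinfo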

In the early days of Exadata you had to manually perform the backup of the boot partition to the backup partition. Oracle then added the dbserver_backup.sh script to make this easier. The dbserver_backup.sh script backs up both the root file system and the /boot file system. Note that the /u01 file system is not backed up by dbserver_backup.sh, so you will want to make sure that /u01 gets backed up as well before you start patching. This can easily be done with a tar command, as in the sketch below. Make sure you back up /u01 and the root FS on a regular basis.
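Here is a minimal sketch of such a /u01 backup. The mount point is hypothetical, so use whatever NFS or ZFS share you actually have, and ideally run it while the software on the node is quiet:

# Back up /u01 to an NFS/ZFS mount before patching (mount point is an example).
tar -czf /mnt/backup/u01_$(hostname -s)_$(date +%Y%m%d).tar.gz /u01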

On his blog, Vishal Desai provides a nice script to snapshot the root and /u01 file systems and then back them up to a mount point on ZFS. You could, of course, use any NFS mount point to do this, but if you're connected to ZFS via InfiniBand you're going to get the benefit of InfiniBand's speed. Please be aware that I've looked at this backup script and it seems OK, but use it at your own risk; I've not tested it on an Exadata box (yet). The next time I'm on one I'll do so if I have time. Please, always use the dbserver_backup.sh script whenever recommended by Oracle.

You should schedule regular backups of root and /u01 on your Exadata boxes. It's that important, because bare-metal restores can be a pain! If you want to see how to restore the system from your backups, look at MOS document 1556257.1, titled "Exadata YUM Repository Population, One-time Setup Configuration and YUM Upgrades". Section 7, titled Rolling Backup Software Changes, contains instructions on how to restore your boot partition, should it be lost, if you used the dbserver_backup.sh script. If you used Vishal's method, section 7 should still give you some idea of how to restore the boot partition from the backup.

Lost Passwords

I've run into more than one case where the customer has lost the root password of some critical part of the Exadata infrastructure. One of the most common situations is the loss of the root password on the InfiniBand switches. All I can say is that if you lose the root password to the InfiniBand switches and you need to access them (to install an update, for example), you are in for some significant time and trouble. The process of recovering the password is painful, laborious, and, in my experience, time consuming. So, please, don't lose your passwords.

Next time we will discuss the next two bullet points:

  5. Node misconfigurations
  6. Didn't RTFM

Thursday, October 3, 2013

Applying Exadata Patches - The Biggest Gotchas

Trouble in Exadata City

If you have applied an Exadata patch or two, you know that there are a few gotchas that can lurk and stab you in the back. I thought I'd provide my top-n list of Exadata patching gotchas in the hope that you won't suffer from them.

  1. Didn't run/address Exacheck findings
  2. Root FS full or nearly full
  3. Didn't back up the compute node boot partition.
  4. Lost passwords
  5. Node misconfigurations
  6. Didn't RTFM
  7. Backups only to Exadata cell disks
  8. ILOM not working
  9. Don't know how to use ILOM
  10. "This is the way we do things" itis. 
  11. The patches are cumulative - Really?
  12. Forgetting to patch GHomes.
  13. Forgetting to patch all the OHomes.
  14. Forgetting to relink Oracle properly (this is a major performance problem).
  15. Opening SRs without using your Oracle Exadata CSI.

In the next several posts I will address a few of these items in order. As I move along, I might add a few here if they come to me, or are suggested by others.

Note that in my stories, the names have been changed to protect the innocent. Nothing I say will have any relationship to Wikileaks at all, nor will I be revealing any government secrets. I'd tell you that the UFO's are real and that the aliens have landed, but then I'd have to kill you.

So, let's start with the first few:

Didn't run/address Exacheck findings 


I've done a few posts on Exacheck. It is, perhaps, one of the most underused tools that the DBA or Exadata Database Machine administrator has available. Exacheck gets better and better, and in my opinion it really should be part of your daily administrative reports.

Skipping the Exacheck report before you start patching (both in the planning stages and again just before the actual application), or running it and not *reading* it (yes, there are those who run reports and never read them; I know!), is asking for trouble. You should clearly understand the meaning of every item listed in that Exacheck report and why it's showing up on your system. Document the recurring items that you already know are OK (but always check the details in case something small but significant has changed), and deal with the errors or warnings that are new and not on your "safe to ignore" list.
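Running the report is not hard. Here is a minimal sketch; the staging directory is just an example, and you should follow the instructions that come with the tool for your version:

# Run as root from wherever you unzipped the Exacheck/exachk tool; it prompts
# for the information it needs and writes an HTML report that you must read.
cd /opt/oracle.SupportTools/exachk     # example staging location
./exachk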

Personally, I will not start a patch set application until I'm happy that the Exacheck is clean. That does not mean a score of 100%; it means that I clearly understand the reasons for all of the errors and warnings, and that I could describe each of those reasons to someone else. If you can't verbalize the reasons for them, then in most cases you don't really understand them.

I can't point to an Exadata upgrade I've been involved in where we hit a major problem that checking the Exacheck report would have prevented, but I can point to more than one or two where checking the report likely saved us from a problem.


Root FS full or nearly full

I've actually run into this more than a few times. I'll find that on one of the compute nodes the root file system is dangerously close to filling up. In fact, I just had a case where the root FS did fill up, which caused a system panic and rebooted the node.

Monitoring the root file system is, frankly, a basic production monitoring responsibility. If you have Oracle Platinum Services and the Platinum gateway, understand that they are not monitoring the size of the root file system for you. They will notice when it fills up and the node crashes, but then it's a bit too late. You should configure monitoring for file system space, including root, on any Oracle Database machine, including Exadata. OEM is the proper place to do this monitoring.
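If you don't have OEM alerting in place yet, even a crude check is better than nothing. This is only a sketch; the threshold and email address are examples:

# Quick manual check of the file systems that tend to fill up.
df -h / /u01
# Crude cron-able threshold check until proper OEM monitoring is configured.
usage=$(df -P / | awk 'NR==2 {gsub("%",""); print $5}')
[ "$usage" -ge 85 ] && echo "root FS on $(hostname) is ${usage}% full" | mail -s "root FS warning" dba@example.com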

Often I find that the root FS is filled with the following:

1. Files transferred into /tmp, typically Oracle Data Pump dump files or sometimes files related to data feeds that were sftp'd to or from the Exadata machine.
2. Core dump files that never got cleaned up.
3. Oracle-related files, such as output files accidentally created in the root file system, and the like.

You should regularly scour your / file system and clean it up. Also, note that Oracle provides a script that you can use to back up the boot partition (/dev/VGExaDb/LVDbSys1) to the backup partition (/dev/VGExaDb/LVDbSys2).

You should use that script any time something major changes on the root file system (for example, when you add a new OH). The script name is dbserver_backup.sh; you can find more information on it in MOS note 1473002.1. I'd also schedule a regular backup of the root file system to some non-Exadata media, such as an attached ZFS storage appliance.
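To make that concrete, here is a sketch of how you might hunt for space hogs and schedule those backups. The script location, schedule, and NFS path are examples only, so adjust them for your environment:

# Find the biggest space consumers on the root file system (stays on one file system).
du -xm / 2>/dev/null | sort -nr | head -20
# Example root crontab entries: weekly dbserver_backup.sh run plus a tar of / to NFS.
0 2 * * 0 /opt/oracle.SupportTools/dbserver_backup.sh
0 3 * * 0 tar -czf /mnt/backup/root_$(hostname -s)_$(date +\%Y\%m\%d).tar.gz --one-file-system /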



Next time we will cover another 2-3 items on my list. Stay tuned!