Wednesday, October 11, 2017

Running an Oracle Data Pump export from PL/SQL

Using The dbms_datapump Package

I love PL/SQL - it has become an incredibly powerful language. PL/SQL can be abused, of course - for example, it's well documented that writing DML statements as PL/SQL loops can perform much more slowly than a well-written SQL statement. Even so, I think that DBAs don't use PL/SQL nearly enough in their work.

We tend to refresh development databases quite a bit around here. We take a production copy, scrub it, and then create a development database out of it - a very common thing to do.

However, when this happens, certain data that was generated in the previous development database would be lost during the refresh, and we want to preserve it.

One option, of course, is to call Data Pump before the refresh and export the tables in question. Then, after the refresh, we can import the tables. This is all well and good, but I'm already doing quite a bit of SQL and PL/SQL work during the refresh, and it would be nice to export and import the data from the SQL prompt - so I don't have to keep shelling back and forth between Oracle and the OS.

Oracle makes this easy to do with the dbms_datapump PL/SQL package.

First, all of the work I'm doing is run from a shell script. To make it easy to test, I've created different functions in the shell script for all of the actions I need to take.

Specifically:
  • Setting preliminary variables
  • Creating the directories
  • Creating the export function
  • Checking the status of the export

Setting Preliminary Variables

First, the script I have already sets several variables. The ones I'll be using in the new functions are:
  • ${HistoryDumpDate} - this ensures I get a unique name for the dump file. I simply grab the current system date to get this value as seen in this example:

    export HistoryDumpDate="`date +20%y%m%d_%p%H_%M_%S`"
  • ${DestDb} - This is the name of the database I am creating.
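Putting those together, the top of my refresh script sets both values along these lines (taking the database name from the script's first argument is an assumption here - your script may get it some other way):

# Name of the database being refreshed - assumed to come in as the first argument
export DestDb=$1

# Timestamp used to keep the dump file and job names unique for each run
export HistoryDumpDate="`date +20%y%m%d_%p%H_%M_%S`"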

Creating The Directories

Next, I need to create the directory on the file system and in the database that I am going to put the Data Pump dump file into. In this case, my script checks to see if the physical directory exists and, if it does not, it creates it.

Once the physical directory is created, I use the Oracle create directory command to create the directory object we need inside the database, as seen here:

f_create_directory()
{
# DestDb is the name of the database I'm refreshing
export DestDb=$1

if [ ! -d /backup/log_history/${DestDb} ]
then
     mkdir -p /backup/log_history/${DestDb}
fi

# Note that this is a refresh so we assume this directory does not already exist. Otherwise
# we'd create a test for it first.
sqlplus / as sysdba<<WEOF
create directory log_history as '/backup/log_history/${DestDb}';
exit
WEOF
}
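The function just takes the target database name as its single argument, so calling it looks like this (the database name below is only a placeholder):

# 'devdb01' is a hypothetical database name - substitute the database being refreshed
f_create_directory devdb01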

Creating the Export Function

Having created the directories, we are ready to create the export function. In this function:

  1. The function takes two parameters:
    1. The first is the schema name that owns the table.
    2. The second is the name of the table we want to export.
  2. It defines the HistoryDumpDate variable mentioned earlier.
  3. It connects to SQL*Plus using a here document (that's the WEOF part).
  4. It runs a PL/SQL anonymous block to do the export.

Here is the function:


f_export_log_history_table()
{
export Schema_Name=$1
export Table_Name=$2

printf "Dumping the log_history_table to /backup/log_history/${DestDb} \n">>$Log

export HistoryDumpDate="`date +20%y%m%d_%p%H_%M_%S`"

sqlplus / as sysdba<<WEOF
declare
     h1 number;
begin
     h1:=DBMS_DATAPUMP.OPEN('EXPORT','TABLE',NULL,'EXPORT_${DestDb}_${Schema_Name}_${Table_Name}_${HistoryDumpDate}','LATEST');
     dbms_datapump.add_file(h1,'history_testdb_${HistoryDumpDate}.dmp','LOG_HISTORY');
     dbms_datapump.metadata_filter(h1, 'SCHEMA_EXPR','IN (''${Schema_Name}'')');
     dbms_datapump.metadata_filter(h1, 'NAME_EXPR','IN (''${Table_Name}'')');
     DBMS_DATAPUMP.START_JOB(h1);
end;
/
exit
WEOF
}
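With the two parameters in place, a call to the function looks like this (the schema name is a hypothetical placeholder; the table name matches the one used in the example above):

# Export NONPROD_LOG_HISTORY owned by the (hypothetical) LOGADMIN schema
f_export_log_history_table LOGADMIN NONPROD_LOG_HISTORY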


A few things to note about this function.
  • At the moment, I have not added exception handling to the function. I need to go back and add it, and I fully plan to - right now, I'm in the "get it working" stage of development.
  • You can find the documentation for dbms_datapump in the Oracle Database PL/SQL Packages and Types Reference.
  • The dbms_datapump functions I've used here do the following:
    • OPEN
      • The OPEN function creates a new Data Pump job. 
      • It returns a handle that you will reference in the other functions as you define parameters for the job. 
      • Note that I've indicated this is an EXPORT as opposed to an IMPORT. 
      • The export job name is 'EXPORT_${DestDb}_${Schema_Name}_${Table_Name}_${HistoryDumpDate}'. I use this job naming convention to avoid any collisions.
      • If there are errors in your script, it's possible that the job will still be created and left hanging around. In that case, you may need to clear out the orphaned export job (see the sketch after this list). This page has a great article on clearing out Data Pump jobs that won't seem to go away.
    •  ADD_FILE
      • This procedure is used to define the dump file that I want to use. You can parallelize the Data Pump export by defining more than one dump file.
    • METADATA_FILTER
      • Note that there are two calls to dbms_datapump.metadata_filter.
      • The first metadata filter defines the schema name that the object I want to export is in. By default, Data Pump will use the schema you are logged into. In my case, I didn't want to use that default, so I had to create the metadata_filter definition you see.
      • The second metadata filter defines the object I want to export. In this case, just a single table.
      • Note that in both filters I've used the expression forms: SCHEMA_EXPR and NAME_EXPR.
        • The EXPR suffix indicates that I'm passing a schema or name expression.
        • The next parameter is the expression itself - in this case, IN clauses.
        • I could have used SCHEMA_LIST and NAME_LIST instead and just listed the names.
    •  START_JOB - This starts the data pump job.
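As a follow-up to the orphaned-job note under OPEN, here's a minimal sketch of how I'd find and clear a stuck export job. It assumes the leftover job shows up as NOT RUNNING with no attached sessions, and that dropping its master table is acceptable in your environment - verify the job's state first.

-- List leftover Data Pump jobs and their current state
select owner_name, job_name, operation, job_mode, state, attached_sessions
from   dba_datapump_jobs
order  by owner_name, job_name;

-- For a job stuck in NOT RUNNING, dropping its master table (which has the
-- same name as the job) removes the orphaned job. The owner and job name
-- below are placeholders - use the values returned by the query above.
drop table SYSTEM.EXPORT_STUCK_JOB_NAME purge;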

Once the start_job procedure is executed, the Data Pump job starts running. Note that dbms_datapump.start_job is a procedure and not a function. This is because the whole Data Pump operation is asynchronous, so in many ways there really is no success or failure condition that a function could return. It COULD raise an exception, of course, and I need to program for that... which I will do later.
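When I do add that handling, I expect it to look something like this sketch: the anonymous block waits for the job with dbms_datapump.wait_for_job, and if anything raises, it stops the job so it doesn't linger as an orphan. This is only an outline of the idea, not the finished version of my function.

declare
     h1        number;
     job_state varchar2(30);
begin
     h1 := dbms_datapump.open('EXPORT','TABLE',NULL,'EXPORT_${DestDb}_${Schema_Name}_${Table_Name}_${HistoryDumpDate}','LATEST');
     dbms_datapump.add_file(h1,'history_testdb_${HistoryDumpDate}.dmp','LOG_HISTORY');
     dbms_datapump.metadata_filter(h1,'SCHEMA_EXPR','IN (''${Schema_Name}'')');
     dbms_datapump.metadata_filter(h1,'NAME_EXPR','IN (''${Table_Name}'')');
     dbms_datapump.start_job(h1);
     -- Block until the job completes so the calling shell script knows the export is done
     dbms_datapump.wait_for_job(h1, job_state);
exception
     when others then
          -- Stop the job on any error so it doesn't hang around as an orphan, then re-raise
          if h1 is not null then
               dbms_datapump.stop_job(h1);
          end if;
          raise;
end;
/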

Additionally, you can 
  • Import the data using the dbms_datapump package.
  • Check the status of an export or import job using the dbms_datapump package.
  • Set various export/import related parameters, such as remapping and the like.
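As one more illustration, the import side for this same table could look roughly like the following anonymous block. The remap schema names and the TABLE_EXISTS_ACTION value are assumptions for the sake of the example - metadata_remap and set_parameter are simply where that kind of behavior gets defined.

declare
     h1        number;
     job_state varchar2(30);
begin
     -- Open an import job in TABLE mode
     h1 := dbms_datapump.open('IMPORT','TABLE',NULL,'IMPORT_${DestDb}_${HistoryDumpDate}','LATEST');

     -- Point the job at the dump file written by the export
     dbms_datapump.add_file(h1,'history_testdb_${HistoryDumpDate}.dmp','LOG_HISTORY');

     -- Optional: land the table in a different schema than it was exported from
     -- (OLD_SCHEMA and NEW_SCHEMA are placeholders)
     dbms_datapump.metadata_remap(h1,'REMAP_SCHEMA','OLD_SCHEMA','NEW_SCHEMA');

     -- Replace the table if it already exists in the target database
     dbms_datapump.set_parameter(h1,'TABLE_EXISTS_ACTION','REPLACE');

     dbms_datapump.start_job(h1);
     dbms_datapump.wait_for_job(h1, job_state);
end;
/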