You have a client who runs SQL code to transform large volumes of incoming data from the client’s customers, and populate SQL databases accessible to the client’s in-house applications. You arrive at the customer site with the assignment of improving SQL performance and expect to start doing some basic SQL performance tuning, but then realize that some, most or all of the SQL data transformation code is written with the assumption that incoming data complies with the file formats the client supplies to its customers. We all know that is going to be a bad assumption, right? So, before tackling tuning issues, you need to fix performance problems caused by incomplete code. The client does not need his SQL system to be rewritten from the ground up, so as part of performance tuning, you will need to make code repairs without modifying the system flow that already exists.
What follows is a set of suggestions for improving SQL logic in a data transformation system. These suggestions apply to SQL logic whether it is executed from DTS packages, SSIS packages, SQL scripts, or stored procedures.
If statements – Use if statements to check the data content conditions and the database conditions. Since humans think in the positive it is a good practice not to use negative if statements, and to always keep the positive outcome as the first alternative action, and the negative outcome as the else action. When either of the outcomes justifies, your code should output diagnostic messages and, if necessary, stop execution…
Try catch blocks – Never assume that a SQL or DML statement is going to execute. Write for both the positive and the negative outcomes and handle the exceptions using the Try Catch Throw structure. Here too, when the negative outcome justifies, output diagnostic statements and if needed stop execution.
Diagnostic information – It is useful to classify your diagnostic output messages. The classics:
- Information only
are a good way to help identify what to look at when something about the execution of the SQL needs attention. Include as much information as you can, including the point in the code where the condition was found, the condition that justifies writing the diagnostic, and any remedial action. It can also be useful to number or otherwise identify specific diagnostic messages.
Handle all exceptions with diagnostic messages and, if necessary, stop execution. Give enough information in the messages to allow for manual intervention to fix the diagnosed problem before the transformation code is rerun.
Commit Statements – Rethink the commits in the code. You do not want to waste execution time rerunning table population code for a table that was correctly populated the first time, but you also do not want to commit data that will need to be rolled back because an error condition is found.
Ensure Presence of Database Components – Do not read from, write to, or update a table unless you check that it exists. If a table is missing, either create it or handle as an exception.
Do not read from, or update a column within an existing table unless you check that the column is there. If a column is missing, either alter the table to add it, or handle as an exception.
Wrap DML Code Statements in Try Catch blocks – Do not execute code against a table, any INSERT, UPDATE, CREATE or DELETE statement unless it is coded inside of a try catch block and on the THROW statement exceptions are handled.
Data Content Conditions
Before populating a column, check the input value against criteria appropriate to the target column’s definition and, if appropriate, validate the input value against expected content.
Data type compatibility – SQL has seven data categories, each of which has multiple data types. The mechanics of comparing input data content to output type is too large to cover here, for is a good discussion and illustration go to http://msdn.microsoft.com/en-us/library/ms191530.
Field length – The length of the incoming value should be compared to the length of the target column before an attempt is made to move it. If truncation is OK for a particular column then code for truncation, otherwise handle the condition as an exception.
Enumerations – When the content of a target column is a constrained to a set of values, validate the incoming data and handle the exceptions.
Max and min of numeric, date, or time values – When the content of a target numeric column is constrained to be within a set range, validate the incoming data against the defined range and handle the exceptions.
Other tests – There are also other text and numeric data conditions you can check, such as where is the decimal in a monetary value, are names capitalized properly, etc. Think through each field as you go along and give the client what he needs to optimize the up time of the SQL when it runs.