Two-Pass Compare
Two Pass Compare can be specified on the More Options tab:
- Use two pass compare - whether to use this feature.
- Delay between first and second passes - the number of seconds to wait between first and second passes
- Generate difference file - whether to generate a file containing the keys of the rows that are different between the two tables. For two pass compare, this must be specified.
- Difference file. This is the name of the difference file. If this is omitted it will default to datacomp_diff.txt in your default directory.
Two-Pass Compare can only be used when you are comparing Tables, and not when you are comparing Queries.
Introduction
Two-Pass compare was developed to deal with two problems that can happen with the Data Compare:
- tables are being updated while compare running. This can happen if the compare is run against tables in an operational system. Because the data is continuously being updated, the compare can detect some differences that (a few seconds later) will not exist.
- collation order problem. This happens when the tables are in databases that have different collation orders.
How it works
Two pass compare works as follows:
- the first pass of the compare works as per normal compare, except that it generates a difference file. The difference file contains a list of the rows that are different between the two tables.
- the second pass of the compare reads the difference file and re-compares the rows to see whether or not they are still different.
AQT will wait the Delay period between the first and second phases. If you are running the compare against an operational system, you should set the delay period to a time greater than the largest commit-interval of your database processes.
If you are having problems with the collation order problem, then the re-compare will correctly compare the rows that were not compared successfully due to the collation problem.
Database Access
The first and second phases work quite differently in the way they access the data:
- the first pass reads all the rows in the table by doing a sequential scan of the table. This is a very fast way of reading a table.
- the second pass reads each of the difference-rows individually to see whether they are still different. This is fast if you only have a few rows, but very slow if you have large number of rows which are different.
Consequently, we only recommend using the Two-Pass Compare if you expect relatively few rows to be different.
Multi-Pass Compare
When running the Data Compare in batch mode, you have more flexibility about how the two-pass compare is run, including using multiple passes.
Normal two-pass compare
This can be done by specifying, on the --datacomp control statement, the following additional parameters:
twopass=y,gendiff=y,diff_file=<filename>
- twopass specifies that two-pass compare is to be used.
- gendiff specifies that a difference file be generated.
- diff_file is the name of the difference file. Note: replace <filename> with the name of your difference file.
Alternatively
A two-pass compare can also be run by running the compare twice. The first time by specifying:
gendiff=y,diff_file=<filename>
and the second time by specifying
diff_file_in=<filename>
- by specifying diff_file_in on the second compare, it will run in second pass mode. Eg. the compare will read the difference file and re-compare all the rows.
- note that <filename> is the same in both cases.
Running the compare as two steps like this gives you more flexibility:
- you can run the two compares at different times - for instance the main step during a convenient batch window, and the second step several hours later.
- you can manually review / edit the difference file between the two compare steps.
Multiple passes
The above idea can be extended to run the compare in multiple passes. To do three passes:
- on the first pass code:
gendiff=y,diff_file=<filename>
- on the second pass code:
gendiff=y,diff_file_in=<filename>,diff_file=<filename2>
- on the third pass code:
diff_file_in=<filename2>
The interesting point here is the second pass. This does second-pass processing PLUS it generates a difference file of the differences that still remain at the end of this compare pass. This second difference file is then input to the third step.
Note that <filename> and <filename2> can be specified as the same file. This will not cause a problem within AQT.
Layout of the Difference File
If you wish to view / edit the difference file, it is in the following format:
row-key;grid1-rowid;grid2-rowid;description;
where
- row-key is the key of the database rows that are different. If there are multiple columns in the key, the values are comma separated.
- grid1-rowid and grid2-rowid. These are only used when running the compare in interactive mode. It is the rowids of the rows in the Results grids. AQT needs this information to amend or delete these rows in these grids with the results of the second-pass of the compare.
- description is a brief description of the difference between the tables. This is provided for information only.
If you are running AQT batch mode, and are adding or editing values in the difference file, you only need to specify a value for row-key.