วันพฤหัสบดีที่ 13 มีนาคม พ.ศ. 2551

Deleting records with duplicate data

DELETING RECORDS WITH DUPLICATE DATA -- REVISITED
If you're looking for a way to delete records with duplicate data without
writing special programs, copying files, and other such tasks, consider a simple
solution that SQL provides. Let's see how it's done.

Consider the following DDS for file MyFile:

* ======================================
* = Sample file MyFile =
* ======================================
A R MYREC
A LASTNAME 20
A FIRSTNAME 20

Now, consider that file MyFile contains the following data:

JOHNSON BILL
EDWARDS MICHAEL
MORRIS SANDRA
DAVIS TOM
MORAY ANNA
DAVIS RYAN
PAGE LEE
EDWARDS TIM

Presume you want a last name to occur only once in the file. Records for Ryan
Davis and Tim Edwards duplicate those for Tom Davis and Michael Edwards,
respectively, and should be deleted.

The following SQL statement will do just that.

Delete From MyLib/MyFile A
Where RRN(A) >
(Select Min(RRN(B)) From MyLib/MyFile B
Where A.LastName = B.LastName)

The statement deletes records from MyFile when the relative record number (RRN)
is greater than the lowest relative record number for each last name. The lowest
relative record name for a name is found using the Min (minimum) function in the
subquery.

After executing the above SQL statement, file MyFile contains the following
data:

JOHNSON BILL
EDWARDS MICHAEL
MORRIS SANDRA
DAVIS TOM
MORAY ANNA
PAGE LEE

If you would like to view the duplicate data rather than delete it (or as a
precaution before the actual delete), you can use the following SQL statement:

Select * From MyLib/MyFile A
Where RRN(A) >
(Select Min(RRN(B)) From MyLib/MyFile B
Where A.LastName = B.LastName)

===============================================================

In the last issue, we included a tip on deleting duplicate records using SQL. We
received a considerable amount of reader feedback on the item. One
correspondence from Derek Sutcliffe suggests a variant that he suggests will
improve performance, particularly on large files.

Following is Derek's suggested replacement, which lets you view the duplicates
to be deleted:

Select RRN(B) From MyLib/MyFile B, Mylib/Myfile C
Where B.LastName = C.LastName and RRN(B) > RRN(C)

His suggested replacement that performs the deletion of duplicate records is as
follows:

Delete From MyLib/MyFile A
Where RRN(A) in
(Select RRN(B) From MyLib/MyFile B, Mylib/Myfile C
Where B.LastName = C.LastName and RRN(B) > RRN(C))

ไม่มีความคิดเห็น: