If you're looking for a way to delete records with duplicate data without writing special programs, copying files, and other such tasks, consider a simple solution that SQL provides. Let's see how it's done.
Consider the following DDS for file MyFile:
* =================================================================
* = Sample file MyFile =
* =================================================================
A R MYREC
A LASTNAME 20
A FIRSTNAME 20
Now, consider that file MyFile contains the following data:
JOHNSON BILL EDWARDS MICHAEL MORRIS SANDRA DAVIS TOM MORAY ANNA DAVIS RYAN PAGE LEE EDWARDS TIM
Presume you want a last name to occur only once in the file. Records for Ryan Davis and Tim Edwards duplicate those for Tom Davis and Michael Edwards, respectively, and should be deleted.
The following SQL statement will do just that.
Delete From MyLib/MyFile A
Where RRN(A) >
(Select Min(RRN(B)) From MyLib/MyFile B
Where A.LastName = B.LastName)
The statement deletes records from MyFile when the relative record number (RRN) is greater than the lowest relative record number for each last name. The lowest relative record name for a name is found using the Min (minimum) function in the subquery.
After executing the above SQL statement, file MyFile contains the following data:
JOHNSON BILL EDWARDS MICHAEL MORRIS SANDRA DAVIS TOM MORAY ANNA PAGE LEE
If you would like to view the duplicate data rather than delete it (or as a precaution before the actual delete), you can use the following SQL statement:
Select * From MyLib/MyFile A
Where RRN(A) >
(Select Min(RRN(B)) From MyLib/MyFile B
Where A.LastName = B.LastName)
Thanks to Alex Nubla for sharing this tip.