Deleting Records with Duplicate Data -- The Easy Way

Article ID: 14895

If you're looking for a way to delete records with duplicate data without writing special programs, copying files, and other such tasks, consider a simple solution that SQL provides. Let's see how it's done.

Consider the following DDS for file MyFile:

      *    =================================================================
      *    = Sample file MyFile                                            =
      *    =================================================================

     A          R MYREC

     A            LASTNAME      20
     A            FIRSTNAME     20 

Now, consider that file MyFile contains the following data:

  JOHNSON             BILL
  EDWARDS             MICHAEL
  MORRIS              SANDRA
  DAVIS               TOM
  MORAY               ANNA
  DAVIS               RYAN
  PAGE                LEE
  EDWARDS             TIM

Presume you want a last name to occur only once in the file. Records for Ryan Davis and Tim Edwards duplicate those for Tom Davis and Michael Edwards, respectively, and should be deleted.

The following SQL statement will do just that.

  Delete From MyLib/MyFile A
    Where RRN(A) > 
          (Select Min(RRN(B)) From MyLib/MyFile B 
             Where A.LastName = B.LastName)       

The statement deletes records from MyFile when the relative record number (RRN) is greater than the lowest relative record number for each last name. The lowest relative record name for a name is found using the Min (minimum) function in the subquery.

After executing the above SQL statement, file MyFile contains the following data:

  JOHNSON             BILL
  EDWARDS             MICHAEL
  MORRIS              SANDRA
  DAVIS               TOM
  MORAY               ANNA
  PAGE                LEE

If you would like to view the duplicate data rather than delete it (or as a precaution before the actual delete), you can use the following SQL statement:

  Select * From MyLib/MyFile A
    Where RRN(A) > 
          (Select Min(RRN(B)) From MyLib/MyFile B 
             Where A.LastName = B.LastName)       

Thanks to Alex Nubla for sharing this tip.

ProVIP Sponsors

ProVIP Sponsors