
Some may favor NOT EXISTS. SAS seems to favor the NOT IN operator, as it does not require the tables to be merged.

Use the DELETE statement to delete one or more data sets from a SAS library. To delete more than one data set, list the names after the DELETE keyword, separated by blank spaces.
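As a minimal sketch of the DELETE statement described above, assuming it is issued inside PROC DATASETS against the WORK library: the data set names tmp_a, tmp_b and tmp_c are hypothetical and are created only so that there is something to delete.

```sas
/* Hypothetical data sets, created only for this illustration */
data work.tmp_a work.tmp_b work.tmp_c;
    x = 1;
run;

/* Delete two of them; names after DELETE are separated by blanks */
proc datasets library=work nolist;
    delete tmp_a tmp_b;
quit;
```

After this step, tmp_c should be the only one of the three data sets left in WORK.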

Method III - Not Exists Correlated SubQuery

where not exists (select name from dataset2 b

The NOT EXISTS subquery writes an observation to the output dataset only when there is no matching row for a.name in dataset2. This check is repeated for each row of the variable name.

The MERGE statement joins the datasets dataset1 and dataset2 by the variable name.

data allinfo; merge allpeople info120; by id; run;

Now, assume that the goal is a dataset with all contact info for only those people in the dataset.

To answer this question, let's create two larger datasets (tables) and compare the 4 methods explained above.

Table1 - Dataset Name: Temp, Observations - 1 Million, Number of Variables - 1
Table2 - Dataset Name: Temp2, Observations - 10K, Number of Variables - 1

SAS dataset MERGE (including prior sorting) took the least time to complete this operation (1.3 seconds), followed by the NOT IN operator in a subquery (1.4 seconds), and then LEFT JOIN with a WHERE NULL clause (1.9 seconds).

Tip - In many popular forums, it is generally advised to use NOT EXISTS rather than NOT IN. This advice is generally taken out of context. Modern software uses an SQL optimizer to process any SQL query. Some software may treat both queries as the same in terms of execution, so there would be no noticeable difference in their CPU timings.

Adding to LinusH's point, both kinds of temporary table exist for a session. The only difference is that one kind is a global temp table, available to all users, whereas the other is a local one, available only to the user. Neither offers a performance advantage.
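The NOT EXISTS correlated subquery and the MERGE approach mentioned above can be sketched as follows. This is an illustration under stated assumptions rather than the original article's code: the dataset names dataset1 and dataset2 and the variable name come from the text, while the sample values, the output table names and the correlation condition a.name = b.name are assumptions.

```sas
/* Hypothetical sample data */
data dataset1;
    input name $;
    datalines;
Dave
Ram
Sam
Matt
Priya
;
run;

data dataset2;
    input name $;
    datalines;
Ram
Priya
;
run;

/* NOT EXISTS: keep a row of dataset1 only when the correlated
   subquery finds no row in dataset2 with the same name */
proc sql;
    create table notexists_out as
    select a.name
    from dataset1 a
    where not exists (select name from dataset2 b
                      where a.name = b.name);
quit;

/* MERGE: both inputs must be sorted by the BY variable first */
proc sort data=dataset1; by name; run;
proc sort data=dataset2; by name; run;

data merge_out;
    merge dataset1 (in=in1) dataset2 (in=in2);
    by name;
    if in1 and not in2;   /* present in dataset1, absent from dataset2 */
run;
```

With this sample data, both output tables should contain Dave, Matt and Sam. The IN= flags are what turn a plain MERGE into a "rows in one table but not the other" filter.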

I have five total records below, and the final dataset needs to maintain all five records. Merging only gives me three records back, but I need to maintain all records and fill the missing (note that through the MKEY, we should be able to link A and B records to missing IDs) as shown below: A M123. So we end up merging the missing data MPOOL.

Seems feasible, but MERGE with this data never retains all records. What kind of SQL JOIN would allow me to keep all records and fill the missing with those that have been successfully joined? I know I can lag/retain/fill in this example, but the big data I'm working with requires merging/joining due to other factors.

The simplest method is to write a subquery and use the NOT IN operator, which tells the system not to include records from dataset 2.

where name not in (select name from dataset2)
quit;

The output is shown in the image below.

In this method, we are performing a left join and telling SAS to include only rows from table 1 that do not exist in table 2. In the first step, it reads the common column from both tables, a.name and b.name. In the second step, these columns are matched, and b.name is set to NULL (MISSING) if a name exists in table A but not in table B. In the next step, the WHERE clause with 'b.name is null' tells SAS to keep only the records from table A that have no match in table B.
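The NOT IN subquery and the LEFT JOIN with a null check described in this section can be sketched like this. The dataset names dataset1 and dataset2 and the variable name are taken from the text; the sample values and the output table names are hypothetical.

```sas
/* Hypothetical sample data */
data dataset1;
    input name $;
    datalines;
Dave
Ram
Sam
Matt
Priya
;
run;

data dataset2;
    input name $;
    datalines;
Ram
Priya
;
run;

proc sql;
    /* NOT IN: exclude every name that appears in dataset2 */
    create table notin_out as
    select name
    from dataset1
    where name not in (select name from dataset2);

    /* LEFT JOIN: unmatched rows get a missing b.name,
       and the WHERE clause keeps only those rows */
    create table leftjoin_out as
    select a.name
    from dataset1 a
    left join dataset2 b
        on a.name = b.name
    where b.name is null;
quit;
```

With this sample data, both tables should hold the three names that are in dataset1 but not in dataset2 (Dave, Matt and Sam).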

I'm essentially splitting a dataset into two (those that have an ID and those that are missing ID), and merging the missing back into the non-missing by a set of match keys to help fill the ID.
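One possible way to keep every record while filling missing IDs is a self left join on the match key combined with COALESCE. This is only a sketch under assumptions: the table name have, the variables rec, mkey and id, and all sample values are hypothetical; it uses a single match key rather than a set of keys; and it assumes each match key maps to at most one non-missing ID (otherwise rows would be duplicated).

```sas
/* Hypothetical data: five records, three of them with a missing id */
data have;
    input rec $ mkey $ id $;
    datalines;
A K1 M123
B K1 .
C K2 M456
D K2 .
E K3 .
;
run;

proc sql;
    create table want as
    select a.rec,
           a.mkey,
           coalesce(a.id, b.id) as id   /* keep own id, else borrow one */
    from have a
    left join (select distinct mkey, id
               from have
               where id is not null) b  /* donors: rows that have an id */
        on a.mkey = b.mkey
    order by a.rec;
quit;
```

Because the join is a LEFT JOIN from the full table, all five records are retained. A record whose match key has no donor (E in this sample) simply keeps a missing ID.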
