TOPIC: SELECT
Using the LIKE operator in PROC SQL WHERE clauses in SAS
26th November 2025Recently, I was working in SAS and decided to trying picking out datasets and variables from its dictionary tables, eventually picking out the maximum length of a variable type for assigning the length of a new variable. This could have been done using a long-established technique:
proc sql;
select distinct memname into :dsns separated by '#'
from dictionary.tables
where lowcase(libname) = 'work'
and index(lowcase(memname), "r_") = 1
and index(lowcase(memname), "visit") = 0;
quit;
The result is that it creates a macro variable containing a delimited list of work datasets with names beginning with r_ and not containing the string visit. As well as using the index function to find the placing of one string within another, I have seen the count function used for similar purposes, albeit without the placement specificity. Since the =: operator which looks for a search string at the start of a larger is not something that works in SQL (data step is more than fine), you cannot do something like this:
proc print data = sashelp.vmember noobs;
where lowcase(libname) = 'work'
and lowcase(memname) =: "r_";
run;
While the contains operator works similarly to the count function when it comes to search text positioning, yet another option is the like operator, and that is shown in the example below:
proc sql;
select distinct memname into :dsns separated by '#'
from dictionary.tables
where lowcase(libname) = 'work'
and lowcase(memname) like 'r\_%' escape '\'
and lowcase(memname) not like '%visit%';
quit;
Here, % and _ are placeholder characters, with the first matching zero or more characters and the second matching one character. Thus, the underscore in r_ needs escaping to look for that pattern (otherwise, it will look for the letter r at the start of a string and a single character after it) and a backslash character (\) will cover that duty. To ensure that it does what you want, adding escape '\' after the expression tells SAS what is happening.
Another thing to watch is that the percent (%) character needs a form escaping from the SAS Macro language processor, and placing the search term in single quotes attends to that. That means that %visit% does not cause any errors when you are looking for visit within a dataset name and using negation (the not operator) to exclude that possibility from the search results. However, using _%visit% might be a better pattern for finding visit at the end of a name, though.
Should you wish to play around with the above to see what happens for your own learning, try using something like this to give you a few test datasets:
data r_test r_visit visit;
set sashelp.class;
run;
Otherwise, feel free to add your own test cases to cement the ideas even further. All too often, we look up something, deploy it and then forget about, especially when AI is involved. Nevertheless, the fastest way to write code can be to use what is embedded in your memory.
ERROR: Ambiguous reference, column xx is in more than one table.
5th May 2012Sometimes, SAS messages are not all that they seem, and a number of them are issued from PROC SQL when something goes awry with your code. In fact, I got a message like the above when ordering the results of the join using a variable that didn't exist in either of the datasets that were joined. This type of thing has been around for a while (I have been using SAS since version 6.11, and it was there then) and it amazes me that we haven't seen a better message in more recent versions of SAS; it was SAS 9.2 where I saw it most recently.
proc sql noprint;
select a.yy, a.yyy, b.zz
from a left join b
on a.yy=b.yy
order by xx;
quit;
ERROR: Invalid value for width specified - width out of range
8th June 2010This could be the beginning of a series of error messages from PROC SQL that may appear unclear to a programmer more familiar with Data Step. The cause of my getting the message that heads this posting is that there was a numeric variable with a length less than the default of 8, not the best of situations. Sadly, the message doesn't pinpoint the affected variable, so it took some commenting out of pieces of code before I found the cause of the problem. That's never to say that PROC SQL does not have debugging functionality in the form of FEEDBACK, NOEXEC, _METHOD and _TREE options on the PROC SQL line itself or the validation statement, but neither of these seemed to help in this instance. Still, they're worth keeping in mind for the future, as is SAS Institute's own page on SQL query debugging. Of course, now that I know what might be the cause, a simple PROC SQL report using the dictionary tables should help. The following code should do the needful:
proc sql;
select memname, name, type, length
from dictionary.columns
where libname="DATA" and type="num" and length ne 8;
quit;
Basic string searching in MySQL table columns
29th April 2010Last weekend, I ended up doing a spot of file structure reorganisation on the web server for my Assorted Explorations website and needed to correct some file pointers in entries on my outdoors blog. Rather than grabbing a plugin from somewhere, I decided to edit the posts table directly. First, I needed to select the affected observations and this is where I had to pick out the affected rows and edit them in MySQL Query Browser. To accomplish that, I needed basic string searching, so I opened up my MySQL e-book from Apress and constructed something like the following:
select * from posts_table where post_text like '%some_text%';
The % wildcard characters are required to pick out a search string in any part of a piece of text. There may be a more sophisticated method, but this did what I needed in a quick and dirty manner without further ado. Well, it was what I needed.
Finding the number of observations in a SAS dataset
16th May 2007There are a number of ways of finding out the number of observations (also known as records or rows) in a SAS data set and, while they are documented in a number of different places, I have decided to collect them together in one place. At the very least, it means that I can find them again.
First up is the most basic and least efficient method: read the whole data set and increment a counter to pick up its last value. The END option allows you to find the last value of count without recourse to FIRST.x/LAST.x logic.
data _null_;
set test end=eof;
count+1;
if eof then call symput(”nobs”,count);
run;
The next option is a more succinct SQL variation on the same idea. The colon prefix denotes a macro variable whose value is to be assigned in the SELECT statement; there should be no surprise as to what the COUNT(*) does…
proc sql noprint;
select count(*) into :nobs from test;
quit;
Continuing the SQL theme, accessing the dictionary tables is another route to the same end and has the advantage of needing to access the actual data set in question. You may have an efficiency saving when you are testing large datasets, but you are still reading some data here.
proc sql noprint;
select nobs into :nobs from dictionary.tables where libname=”WORK” and memname=”TEST”;
quit;
The most efficient way to do the trick is just to access the data set header. Here’s the data step way to do it:
data _null_;
if 0 then set test nobs=nobs;
call symputx(”nobs”,nobs);
stop;
run;
The IF/STOP logic stops the data set read in its tracks so that only the header is accessed, saving the time otherwise used to read the data from the data set. Using the SYMPUTX routine avoids the need to explicitly code a numeric to character transformation; it’s a SAS 9 feature, though.
To finish, here is the most succinct and efficient way of all: the use of macro and SCL functions. It’s my preferred option, and you don’t need a SAS/AF licence to do it, either.
%let dsid=%sysfunc(open(work.test,in));
%let nobs=%sysfunc(attrn(&dsid,nobs));
%if &dsid > 0 %then %let rc=%sysfunc(close(&dsid));
The first line opens the data set, and the last one closes it; this is needed because you are not using data step or SCL and could leave a data set open, causing problems later. The second line is what captures the number of observations from the header of the data set using the SCL ATTRN function called by %SYSFUNC.