SQL | Technology Tales

TOPIC: SQL

Using the LIKE operator in PROC SQL WHERE clauses in SAS

26^th November 2025

Recently, I was working in SAS and decided to trying picking out datasets and variables from its dictionary tables, eventually picking out the maximum length of a variable type for assigning the length of a new variable. This could have been done using a long-established technique:

proc sql;
    select distinct memname into :dsns separated by '#'
        from dictionary.tables
            where lowcase(libname) = 'work'
            and index(lowcase(memname), "r_") = 1
            and index(lowcase(memname), "visit") = 0;
quit;

The result is that it creates a macro variable containing a delimited list of work datasets with names beginning with r_ and not containing the string visit. As well as using the index function to find the placing of one string within another, I have seen the count function used for similar purposes, albeit without the placement specificity. Since the =: operator which looks for a search string at the start of a larger is not something that works in SQL (data step is more than fine), you cannot do something like this:

proc print data = sashelp.vmember noobs;
    where lowcase(libname) = 'work'
    and lowcase(memname) =: "r_";
run;

While the contains operator works similarly to the count function when it comes to search text positioning, yet another option is the like operator, and that is shown in the example below:

proc sql;
    select distinct memname into :dsns separated by '#'
        from dictionary.tables
            where lowcase(libname) = 'work'
            and lowcase(memname) like 'r\_%' escape '\'
            and lowcase(memname) not like '%visit%';
quit;

Here, % and _ are placeholder characters, with the first matching zero or more characters and the second matching one character. Thus, the underscore in r_ needs escaping to look for that pattern (otherwise, it will look for the letter r at the start of a string and a single character after it) and a backslash character (\) will cover that duty. To ensure that it does what you want, adding escape '\' after the expression tells SAS what is happening.

Another thing to watch is that the percent (%) character needs a form escaping from the SAS Macro language processor, and placing the search term in single quotes attends to that. That means that %visit% does not cause any errors when you are looking for visit within a dataset name and using negation (the not operator) to exclude that possibility from the search results. However, using _%visit% might be a better pattern for finding visit at the end of a name, though.

Should you wish to play around with the above to see what happens for your own learning, try using something like this to give you a few test datasets:

data r_test r_visit visit;
    set sashelp.class;
run;

Otherwise, feel free to add your own test cases to cement the ideas even further. All too often, we look up something, deploy it and then forget about, especially when AI is involved. Nevertheless, the fastest way to write code can be to use what is embedded in your memory.

Dealing with Error 1064 in MySQL queries

27^th April 2023

Recently, I was querying a MySQL database table and got a response like the following:

ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use

The cause was that one of the data column columns was named using the MySQL reserved word key. While best practice is not to use reserved words like this at all, this was not something that I could alter. Therefore, I enclosed the offending keyword in backticks (`) to make the query work.

There is more information in the MySQL documentation on Schema Object Names and on Keywords and Reserved Words for dealing with or avoiding this kind of situation. While I realise that things change over time and that every implementation of SQL is a little different, it does look as if not using established keywords should be a minimum expectation when any database tables get created.

ERROR: Invalid value for width specified - width out of range

8^th June 2010

This could be the beginning of a series of error messages from PROC SQL that may appear unclear to a programmer more familiar with Data Step. The cause of my getting the message that heads this posting is that there was a numeric variable with a length less than the default of 8, not the best of situations. Sadly, the message doesn't pinpoint the affected variable, so it took some commenting out of pieces of code before I found the cause of the problem. That's never to say that PROC SQL does not have debugging functionality in the form of FEEDBACK, NOEXEC, _METHOD and _TREE options on the PROC SQL line itself or the validation statement, but neither of these seemed to help in this instance. Still, they're worth keeping in mind for the future, as is SAS Institute's own page on SQL query debugging. Of course, now that I know what might be the cause, a simple PROC SQL report using the dictionary tables should help. The following code should do the needful:

proc sql;
    select memname, name, type, length
        from dictionary.columns
            where libname="DATA" and type="num" and length ne 8;
quit;

SAS9 SQL Constraints

23^rd July 2007

With SAS 9, SAS Institute has introduced the sort of integrity constraints that have been bread and butter for relational database SQL programs, but some SAS programmers may find them more restrictive than they might like. The main one that comes to my mind is the following:

proc sql noprint;
    create table a as select a.*,b.var from a left join b on a.index=b.index;
quit;

Before SAS 9, that worked merrily with nary a comment, only for you now to see a warning like this:

WARNING: This CREATE TABLE statement recursively references the target table. A consequence of this is a possible data integrity problem.

In the data step, the following still runs without a complaint:

data a;
    merge a b(keep=index var);
    by index;
run;

On the surface of it, this does look inconsistent. From a database programmer's point of view having to use different source and target datasets is no hardship but seems a little surplus to requirements for a SAS programmer trained to keep down the number of temporary datasets to reduce I/O and keep things tidy, an academic concept perhaps in these days of high processing power and large disks. While adding UNDO_POLICY=NONE to the PROC SQL line does make everything consistent again, I see this as being anathema to a database programming type. Though I do admit to indulging in the override for personal quick and dirty purposes, abiding by the constraint is how I do things for formal purposes like inclusion in an application to a regulatory authority like FDA.

WARNING: The quoted string currently being processed has become more than 262 characters long…

20^th June 2007

This is a SAS error that can be seen from time to time:

WARNING: The quoted string currently being processed has become more than 262 characters long. You may have unbalanced quotation marks.

In the days before SAS version 8, this was something that needed to be immediately corrected. In these days of SAS character variables extending beyond 200 characters in length, it becomes a potential millstone around a SAS programmer's neck. If you run a piece of code like this:

data _null_;
    x="aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
run;

What you get back is the warning message at the heart of the matter. While the code is legitimate and works fine, the spurious error is returned because SAS hasn't found a closing quote by the required position and the 262-character limit is a hard constraint that cannot be extended. There is another way, though: the new QUOTELENMAX option in SAS9. Setting it as follows removes the messages in most situations (yes, I did find one where it didn't play ball):

options noquotelenmax;

This does, however, beg the question as to how you check for unbalanced quotes in SAS logs these days; clearly, looking for a closing quote is an outmoded approach. Thanks to code highlighting, it is far easier to pick them out before the code gets submitted. The other question that arises is why you would cause this to happen anyway, but there are occasions where you assign the value of a macro variable to a data set one and the string is longer than the limit set by SAS. Here's some example code:

data _null_;
    length y $400;
    y=repeat("f",400);
    call symput("y",y)
run;

data _null_;
    x="&y";
run;

My own weakness is where I use PROC SQL to combine strings into a macro variable, a lazy man's method of combining all distinct values for a variable into a delimited list like this:

proc sql noprint;
    select distinct compress(string_var) into :vals separated by " " from dataset;
quit;

Of course, creating a long delimited string using the CATX (new to SAS9) function avoids the whole situation and there are other means, but there may be occasions, like the use of system macro variables, where it is unavoidable and NOQUOTELENMAX makes a much better impression when these arise.

Finding the number of observations in a SAS dataset

16^th May 2007

There are a number of ways of finding out the number of observations (also known as records or rows) in a SAS data set and, while they are documented in a number of different places, I have decided to collect them together in one place. At the very least, it means that I can find them again.

First up is the most basic and least efficient method: read the whole data set and increment a counter to pick up its last value. The END option allows you to find the last value of count without recourse to FIRST.x/LAST.x logic.

data _null_;
    set test end=eof;
    count+1;
    if eof then call symput("nobs",count);
run;

The next option is a more succinct SQL variation on the same idea. The colon prefix denotes a macro variable whose value is to be assigned in the SELECT statement; there should be no surprise as to what the COUNT(*) does…

proc sql noprint;
    select count(*) into :nobs from test;
quit;

Continuing the SQL theme, accessing the dictionary tables is another route to the same end and has the advantage of needing to access the actual data set in question. You may have an efficiency saving when you are testing large datasets, but you are still reading some data here.

proc sql noprint;
    select nobs into :nobs from dictionary.tables where libname=”WORK” and memname=”TEST”;
quit;

The most efficient way to do the trick is just to access the data set header. Here’s the data step way to do it:

data _null_;
    if 0 then set test nobs=nobs;
    call symputx("nobs",nobs);
    stop;
run;

The IF/STOP logic stops the data set read in its tracks so that only the header is accessed, saving the time otherwise used to read the data from the data set. Using the SYMPUTX routine avoids the need to explicitly code a numeric to character transformation; it’s a SAS 9 feature, though.

To finish, here is the most succinct and efficient way of all: the use of macro and SCL functions. It’s my preferred option, and you don’t need a SAS/AF licence to do it, either.

%let dsid=%sysfunc(open(work.test,in)); %let nobs=%sysfunc(attrn(&dsid,nobs)); %if &dsid > 0 %then %let rc=%sysfunc(close(&dsid));

The first line opens the data set, and the last one closes it; this is needed because you are not using data step or SCL and could leave a data set open, causing problems later. The second line is what captures the number of observations from the header of the data set using the SCL ATTRN function called by %SYSFUNC.

Updating Oracle data tables that have associated sequence objects

3^rd May 2007

Here’s something that I want to put somewhere for future reference before I forget it: keep sequences associated with Oracle data tables up to date while adding records. Given that it took me a while to find it, it might come in useful for someone else too.

The first thing is to update the sequence itself:

SELECT TABLE_SEQ.NEXTVAL FROM DUAL;

Dual is a handy single record table that you can use to update sequences. Use the actual associated table itself if you like to see that sequence number rocket…

The next thing is to use the new value to assign a table ID as part of an INSERT statement:

INSERT INTO “TABLE” VALUES (TABLE_SEQ.CURRVAL, 1, ‘Test value’);

Learning about Oracle

20^th April 2007

My work in the last week has put me on something of a learning about Oracle. This is down to my needing to add file metadata to a database as part of an application that I am developing. Since the application is written in SAS, I am using SAS/Access for Oracle to update the database using SQL pass-through statements written in Oracle SQL. I am used to SAS SQL and there is commonality between it and Oracle’s implementation, which is a big help. Nevertheless, there, of course, are things specific to the Oracle world about which I have needed to learn. My experiences have introduced me to concepts like triggers, sequences, constraints, primary keys, foreign keys and the like. In addition, I have also seen the results of database normalisation at first hand.

Using Oracle’s SQL Developer has been a great help in my endeavours, thanks to its online help and the way that you can view database objects in an easy-to-use manner. It also runs SQL scripts, giving you a feel for how Oracle works, and anyone can download it for free upon registration on the Oracle website. Additionally, I have installed the Oracle 10g database Express edition at home, which serves as a valuable personal learning resource. That is another free download from Oracle’s website.

My Safari bookshelf has been another invaluable resource, providing access to O’ Reilly’s Oracle books. Of these, Mastering Oracle SQL has proved particularly useful, and I made a journey to Manchester after work this evening (Waterstones on Deansgate is open until 21:00 on weekdays) to see if I could acquire a copy. While that quest was to prove fruitless, I now have got the doorstop that is Oracle Database 10g: The Complete Reference from The Oracle Press, an imprint of Osborne and McGraw Hill. Since I needed a broader grounding in all things Oracle, this should help. Usefully, it also covers SQL, but the aforementioned O’ Reilly volume could return to the wish list if that provision is insufficient.