Technology Tales

Adventures & experiences in contemporary technology

Finding the number of observations in a SAS dataset

16th May 2007

There are a number of ways of finding out the number of observations (also known as records or rows) in a SAS data set and, while they are documented in a number of different places, I have decided to collect them together in one place. At the very least, it means that I can find them again.

First up is the most basic and least efficient method: read the whole data set and increment a counter to pick up its last value. The END option allows you to find the last value of count without recourse to FIRST.x/LAST.x logic.

data _null_;
set test end=eof;
count+1;
if eof then call symput(”nobs”,count);
run;

The next option is a more succinct SQL variation on the same idea. The colon prefix denotes a macro variable whose value is to be assigned in the SELECT statement; there should be no surprise as to what the COUNT(*) does…

proc sql noprint;
select count(*) into :nobs from test;
quit;

Continuing the SQL theme, accessing the dictionary tables is another route to the same end and has the advantage of needing to access the actual data set in question. You may have an efficiency saving when you are testing large datasets, but you are still reading some data here.

proc sql noprint;
select nobs into :nobs from dictionary.tables where libname=”WORK” and memname=”TEST”;
quit;

The most efficient way to do the trick is just to access the data set header. Here’s the data step way to do it:

data _null_;
if 0 then set test nobs=nobs;
call symputx(”nobs”,nobs);
stop;
run;

The IF/STOP logic stops the data set read in its tracks so that only the header is accessed, saving the time otherwise used to read the data from the data set. Using the SYMPUTX routine avoids the need to explicitly code a numeric to character transformation; it’s a SAS 9 feature, though.

I’ll finish with the most succinct and efficient way of all: the use of macro and SCL functions. It’s my preferred option, and you don’t need a SAS/AF licence to do it, either.

%let dsid=%sysfunc(open(work.test,in));
%let nobs=%sysfunc(attrn(&dsid,nobs));
%if &dsid > 0 %then %let rc=%sysfunc(close(&dsid));

The first line opens the data set, and the last one closes it; this is needed because you are not using data step or SCL and could leave a data set open, causing problems later. The second line is what captures the number of observations from the header of the data set using the SCL ATTRN function called by %SYSFUNC.

Comments:

  • dane wu says:

    I ran the first three lines of the following codes but running an error for the third line:

    %LET DSID=%SYSFUNC(OPEN(work.test,IN));

    %LET popsize=%SYSFUNC(ATTRN(&DSID,NOBS));

    *%IF &DSID > 0 %THEN %LET RC=%SYSFUNC(CLOSE(&DSID));

    The error message : The %IF statement is not valid in open code.

    But I believe it was fine when I ran them first time. So any advice?

  • John Hennessy says:

    You need to include the said lines in a macro. Maybe I should have mentioned that…

  • Prabhat says:

    Thanks John .. That was helpful

  • Matt says:

    Doesn’t that say how worthless sas is that it takes at least 3 lines of code to do what other languages can accomplish in 5 characters?

  • Eddie Rowe says:

    Another way is to use ODS to output the results of a call to PROC CONTENTS. And you get a lot more information.

  • Jan Poloniecki says:

    Useful, but would be even more useful if there were comparative timings for each way of getting nobs.

  • Jan Poloniecki says:

    I have timing from the log for just this one

    5252 data _null_;
    5253 if 0 then set a3 nobs=caseslost;
    5254 put “WARNING: no estimates ” caseslost=;
    5255 stop;
    5256 run;

    WARNING: no estimates caseslost=204
    NOTE: DATA statement used (Total process time):
    real time 0.00 seconds
    cpu time 0.01 seconds

  • Jan Poloniecki says:

    Come to think of it… how can the “real” time be less than the cpu time?

  • All the views that you find expressed on here in postings and articles are mine alone and not those of any organisation with which I have any association, through work or otherwise. As regards editorial policy, whatever appears here is entirely of my own choice and not that of any other person or organisation.

  • Please note that everything you find here is copyrighted material. The content may be available to read without charge and without advertising but it is not to be reproduced without attribution. As it happens, a number of the images are sourced from stock libraries like iStockPhoto so they certainly are not for abstraction.

  • With regards to any comments left on the site, I expect them to be civil in tone of voice and reserve the right to reject any that are either inappropriate or irrelevant. Comment review is subject to automated processing as well as manual inspection but whatever is said is the sole responsibility of the individual contributor.