16th May 2007
There are a number of ways of finding out the number of observations (also known as records or rows) in a SAS data set and, while they are documented in a number of different places, I have decided to collect them together in one place. At the very least, it means that I can find them again.
First up is the most basic and least efficient method: read the whole data set and increment a counter to pick up its last value. The END option allows you to find the last value of count without recourse to FIRST.x/LAST.x logic.
data _null_;
set test end=eof;
count+1;
if eof then call symput(”nobs”,count);
run;
The next option is a more succinct SQL variation on the same idea. The colon prefix denotes a macro variable whose value is to be assigned in the SELECT statement; there should be no surprise as to what the COUNT(*) does…
proc sql noprint;
select count(*) into :nobs from test;
quit;
Continuing the SQL theme, accessing the dictionary tables is another route to the same end and has the advantage of needing to access the actual data set in question. You may have an efficiency saving when you are testing large datasets, but you are still reading some data here.
proc sql noprint;
select nobs into :nobs from dictionary.tables where libname=”WORK” and memname=”TEST”;
quit;
The most efficient way to do the trick is just to access the data set header. Here’s the data step way to do it:
data _null_;
if 0 then set test nobs=nobs;
call symputx(”nobs”,nobs);
stop;
run;
The IF/STOP logic stops the data set read in its tracks so that only the header is accessed, saving the time otherwise used to read the data from the data set. Using the SYMPUTX routine avoids the need to explicitly code a numeric to character transformation; it’s a SAS 9 feature, though.
To finish, here is the most succinct and efficient way of all: the use of macro and SCL functions. It’s my preferred option, and you don’t need a SAS/AF licence to do it, either.
%let dsid=%sysfunc(open(work.test,in));
%let nobs=%sysfunc(attrn(&dsid,nobs));
%if &dsid > 0 %then %let rc=%sysfunc(close(&dsid));
The first line opens the data set, and the last one closes it; this is needed because you are not using data step or SCL and could leave a data set open, causing problems later. The second line is what captures the number of observations from the header of the data set using the SCL ATTRN function called by %SYSFUNC.
2nd May 2007
Here's a gotcha that caught up with me on my journey into the world of Oracle SQL: string quoting. Anything enclosed in double-quotes (") is the name of an Oracle object (variable, table and so on) while values are enclosed in single quotes ('). The reason that this one caught me out is that I have a preference for double quotes because of my SAS programming background; SAS macro variables resolve only when enclosed in double-quotes, hence the convention.
20th April 2007
My work in the last week has put me on something of a learning about Oracle. This is down to my needing to add file metadata to a database as part of an application that I am developing. Since the application is written in SAS, I am using SAS/Access for Oracle to update the database using SQL pass-through statements written in Oracle SQL. I am used to SAS SQL and there is commonality between it and Oracle’s implementation, which is a big help. Nevertheless, there, of course, are things specific to the Oracle world about which I have needed to learn. My experiences have introduced me to concepts like triggers, sequences, constraints, primary keys, foreign keys and the like. In addition, I have also seen the results of database normalisation at first hand.
Using Oracle’s SQL Developer has been a great help in my endeavours, thanks to its online help and the way that you can view database objects in an easy-to-use manner. It also runs SQL scripts, giving you a feel for how Oracle works, and anyone can download it for free upon registration on the Oracle website. Additionally, I have installed the Oracle 10g database Express edition at home, which serves as a valuable personal learning resource. That is another free download from Oracle’s website.
My Safari bookshelf has been another invaluable resource, providing access to O’ Reilly’s Oracle books. Of these, Mastering Oracle SQL has proved particularly useful, and I made a journey to Manchester after work this evening (Waterstones on Deansgate is open until 21:00 on weekdays) to see if I could acquire a copy. While that quest was to prove fruitless, I now have got the doorstop that is Oracle Database 10g: The Complete Reference from The Oracle Press, an imprint of Osborne and McGraw Hill. Since I needed a broader grounding in all things Oracle, this should help. Usefully, it also covers SQL, but the aforementioned O’ Reilly volume could return to the wish list if that provision is insufficient.
17th April 2007
Because of my work, I recently have had a bit of exposure to Oracle SQL Developer, which I have been using as part of application development and testing activities. For further investigation, I decided to have a copy at home for further perusal (it's a free download) and it was with some interest that I found out that it could access MySQL databases. To accomplish this, you need Connector/J for MySQL so that communication can occur between the two. Though you quickly notice the differences in feature sets between Oracle and MySQL, it seems a good tool for exploring MySQL data tables and issuing queries.
