ANALYSIS | Technology Tales

Calculating geometric means using SAS

19th March 2012

Recently, I needed to calculate geometric means after a break of a number of years and ended racking a weary brain until it was brought back to me. In order that I am not in the same situation again, I am recording it here and sharing it always is good too.

The first step is to take the natural log (to base e or the approximated irrational value of the mathematical constant, 2.718281828) of the actual values in the data. Since you cannot have a log of zero, the solution is either to exclude those values or substitute a small value that will not affect the overall result as is done in the data step below. In SAS, the log function uses the number e as its base and you need to use the log10 equivalent when base 10 logs are needed.

data temp; set temp; if result=0 then _result=0.000001; else _result=result; ln_result=log(_result); run;

Next, the mean of the log values is determined and you can use any method of doing that so long as it gives the correct values. PROC MEANS is used here but PROC SUMMARY (identical to MEANS except it defaults to data set creation while that generates output by default; because of that, we need to use the NOPRINT option here), PROC UNIVARIATE or even the MEAN function in PROC SQL.

proc means data=temp noprint; output out=mean mean=mean; var ln_result; run;

With the mean of the log values obtained, we need to take the exponential of the obtained value(s) using the EXP function. This returns values of the same magnitude as in the original data using the formula e^mean.

data gmean; set mean; gmean=exp(mean); run;

On Making PROC REPORT Work Harder

1st September 2010

In the early years of my SAS programming career, there seemed to be just the one procedure to use if you wanted to create a summary table. That was TABULATE and it was great for generating columns according to the value of a variable such as the treatment received by a subject in a clinical study. To a point, it could generate statistics for you too and I often used it to sum frequency and percentage variables. Since then, it seems to have been enhanced a little and it surprised me with the statistics it could produce when I had a recent play. Here’s the code:

proc tabulate data=sashelp.class; class sex; var age; table age*(n median*f=8. mean*f=8.1 std*f=8.1 min*f=8. max*f=8. lclm*f=8.1 uclm*f=8.1),sex / misstext="0"; run;

When you compare that with the idea of creating one variable per column and then defining them in PROC REPORT as many do, it has to look more elegant and the results aren’t bad either though they can be tweaked further from the quick example that I generated. That last comment brings me to the point that PROC REPORT seems to have taken over from TABULATE wherever I care to look these days and I do ask myself if it is the right tool for that for which it is being used or if it is being used in the best way.

Using Data Step to create one variable per column in a PROC REPORT output doesn’t strike me as the best way to write reusable code but there are ways to make REPORT do more for you. For example, by defining GROUP, ACROSS and ANALYSIS columns in an output, you can persuade the procedure to do the summarising for you and there’s some example code below with the comma nesting height under sex in the resulting table. Sums are created by default if you do this and forgoing an analysis column definition means that you get a frequency table, not at all a useless thing in many cases.

proc report data=sashelp.class nowd missing; columns age sex,height; define age / group "Age"; define sex / across "Sex"; define height / analysis mean f=missing. "Mean Height"; run;

For those times when you need to create more heavily formatted statistics (summarising range as min-max rather showing min and max separately, for example), you might feel that the GROUP/ACROSS set-up’s non-display of character values puts a stop to using that approach. However, I found that making every value combination unique and attaching a cell ID helps to work around the problem. Then, you can create a format control data set from the data like in the code below and create a format from that which you can apply to the cell ID’s to display things as you need them. This method does make things more portable from situation to situation than adding or removing columns depending on the values of a classification variable.

proc sql noprint; create table cntlin as select distinct "fmtname" as fmtname, cellid as start, cellid as end, decode as label from report; quit;

proc format lib=work cntlin=cnlin; run;

Technology Tales

Calculating geometric means using SAS

On Making PROC REPORT Work Harder

Topics Discussed

Translate