Technology Tales

Adventures & experiences in contemporary technology

Reading data into SAS using the EXCEL and PCFILES library engines

4th March 2010

Recently, I had the opportunity to have a look at the Excel library engine again because I needed to read Excel data into SAS. You need SAS/ACCESS for PC Files licensed for it to work, but it does simplify the process of getting data from spreadsheets into SAS. It all revolves around setting up a library pointing at the Excel file using the Excel engine. The result is that every worksheet in the file is treated like a SAS dataset, even if their names contain characters that SAS considers invalid for dataset names. The way around that is to enclose the worksheet name in single quotes with the letter n straight after the closing quote, much in the same way as you'd read in text strings as SAS date values ('04MAR2010'd, for example). In order to make all of this clearer, I have added some example code below.

libname testxl excel 'c:\test.xls';

data test;
set testxl.'sheet1$'n;
run;

All of the above applies to SAS on Windows (I have used it successfully in 9.1.3 and 9.2), but there seems to be a way of using the same type of thing on UNIX too. Again, SAS/ACCESS for PC Files is needed, as well as a SAS PC Files Server on an available Windows machine, and it is the PCFILES engine that is specified. I cannot say that I have had the chance to see it working in practice, but seeing it described in the SAS Online Documentation corrected my previous misimpressions about the UNIX variant of SAS and its ability to read in Excel or Access data. Well, you learn something new every day.
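
Not having tried it myself, I can only sketch what the UNIX-side library assignment might look like from the documentation; the server name, port number and file path below are nothing more than placeholders, so your own PC Files Server details would go in their place:

/* server=, port= and path= values are placeholders only */
libname testpc pcfiles server=winpc port=9621 path='/home/me/test.xls';

data test;
set testpc.'sheet1$'n;
run;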

Harnessing the power of ImageMagick

26th October 2008

Using the command line to process images might sound senseless but the tools offered by ImageMagick certainly prove that it has its place. I have always been wary of using bulk processing for my digital photo files (some digitised from film prints with a scanner) but I do agree that some of it is needed to free up some time for other more necessary things. With this in mind, it is encouraging to see the results from ImageMagick and I can see it making a major difference to how I maintain my online photo gallery.

For instance, making thumbnail images for the gallery certainly seems to be one of those operations where command line bulk processing comes into its own and ImageMagick’s own convert command is heaven sent for this one. For resizing images, all that’s needed is the following:

convert -resize 40% input.jpg output.jpg
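
To show how that scales up, here's a rough shell loop that would do a whole folder of JPEG files in one go; the thumbs subfolder and the 40% figure are just examples of my own choosing:

mkdir -p thumbs
for f in *.jpg; do
    convert -resize 40% "$f" thumbs/"$f"
done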

Add a spot of further shell scripting and even a dash of Perl and the possibilities for this sort of thing become clearer; this is but the tip of the proverbial iceberg. The -rotate switch will do what the name suggests and there is a whole plethora of other options on tap. So long as you have Ghostscript on your system, conversion of graphics to PostScript (and Encapsulated PostScript too) and PDF files is possible, with the -page option controlling the margin around the image itself in the resulting outputs. Unfortunately, portrait is the sole orientation on offer, but a bit of judicious post-processing will turn things around. Here's a command that'll do the trick:

convert -page 792x612+72+72 input.png ps2:output.ps

For retrieving image metadata like its resolution and size, the identify command comes into play. The -verbose option invokes the output of all manner of image metadata, so using grep or egrep is perhaps advisable, especially for bulk processing with the likes of Perl. Having the ability to stream image metadata makes loading databases like MySQL less of a chore than the manual data entry that has been my way of doing things until now.
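
As a rough illustration of that streaming idea, a loop like the one below pulls out the geometry, resolution and file size lines for every JPEG in a folder; the field names are those that identify reports in its verbose output, while the wildcard and egrep pattern are merely examples:

for f in *.jpg; do
    echo "$f"
    identify -verbose "$f" | egrep 'Geometry:|Resolution:|Filesize:'
done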

Copying only updated and new files

20th October 2008

With Linux/UNIX, the command line remains ever useful and allows you to do all manner of things, including file copying that only adds new files to a destination. Here’s a command that accomplishes this in Linux:

cp -urv [source] [destination]

The u switch does the update, while r ensures recursion (by default, cp only copies files from the source directory itself and nothing sitting in subfolders) and v gets the command to report what it is doing.

Though buried and hardly promoted, Windows also has its command line and here’s what accomplishes a similar result:

xcopy /d /e /y [source] [destination]

Anything’s better than having to approve or reject every instance where source and destination files are the same or, even worse, to overwrite a file when it is not wanted.

Ghostscript: **** Unable to open the initial device, quitting.

6th October 2008

The above error message has been greeting me when creating PDFs with Ghostscript on a Solaris box and does need some translation. If you are directing output to a real printer, I suppose that it is sensible enough: nothing will happen unless you can connect to it. It gets a little less obvious for PDF creation and seems to mean that the pdfwrite virtual device is unable to create the specified output file. A first port of call would be to check that you can write to the directory where you are putting the new PDF file. In my case, there seems to be another cause, so I'll have to keep looking for a solution.
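
For what it's worth, a quick way of checking that first possibility from the shell is something like the following, with the directory name being no more than an example:

test -w /path/to/output/directory && echo "writable" || echo "not writable"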

Update: I have since discovered the cause of this: a now defunct TEMP assignment in the .profile file for my user account. Removing that piece of code resolved the problem.

A way to combine PDF files in UNIX and Linux

4th October 2008

My latest adventure in computing has led me into the world of automated PDF generation. When my first approach didn't prove to be completely trouble-free, I decided to look at the idea of going part of the way with it and finishing off the job with the open source utility Ghostscript. It is that which got me thinking about combining bookmarked PDF files and I can say that Ghostscript is capable of producing what I need, as long as it doesn't generate any errors along the way. Here's the command that does the trick:

gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=final.pdf source_file1.pdf source_file2.pdf

The various switches of the gs command have very useful roles, with dBATCH ensuring that Ghostscript shuts down when all is done, dNOPAUSE removing any prompts that would otherwise be given, q turning on quiet mode, sDEVICE selecting Ghostscript's own PDF creation functionality and sOutputFile naming the output file, which stops Ghostscript from sending the result to its default output stream. All of this applies to Windows Ghostscript too, though the name of the executable is gswin32c for 32-bit Windows instead of gs.
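
Going by that, the 32-bit Windows incarnation of the same command would presumably look like this:

gswin32c -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=final.pdf source_file1.pdf source_file2.pdf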

When it comes to any debugging, it is useful to consider that Ghostscript is case sensitive with its command line switches, something that I have seen trip up others. I am getting initial device initialisation errors, so it strikes me that dropping some of the switches that reduce the number of messages might help me work out what's going on. It's a useful idea that I have yet to try.

There is also online documentation if you fancy learning more and Linux.com have an article that considers other possible PDF combination tools as well. All in all, it’s nice to have command line tools to do these sorts of things rather than having to use GUI applications all of the time.

Recursive FTP with the command line

6th August 2008

Here’s a piece of Linux/UNIX shell scripting code that will do a recursive FTP refresh of a website for you:

lftp <<EndFTP >>~/Tmp/log_file.tmp 2>>~/Tmp/log_file.tmp
open ${HOSTNAME}
user ${USER} ${PSSWD}
mirror -R -vvv "${REP_SRC}" "${REP_DEST}"
EndFTP

When my normal FTP scripting approach left me with a broken WordPress installation and an invalid ticket in the project's TRAC system that I had to close, I turned to looking for a more robust way of achieving the website updates and that's what led me to seek out the options available for FTP transfers that explicitly involve directory recursion. The key pieces in the code above are the use of lftp in place of ftp, my more usual tool for the job, and the invocation of the mirror command that comes with lftp. The -R switch ensures that file transfer is from local to remote (vice versa is the default) and -vvv turns on maximum verbosity, a very useful thing when you find that a transfer takes longer than the more usual means would. It's all much slicker than writing your own script to do the back-work of ploughing through the directory structure and ensuring that the recursive transfers take place. Saying that, it is possible to have a one-line variant of the above, but the way that I have set things up might be more familiar to users of ftp.
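
For completeness, that one-line variant could look something like the following, using lftp's -u and -e options with the same variables as above; treat it as a sketch rather than something that I have put through its paces:

lftp -u "${USER},${PSSWD}" -e "mirror -R -vvv ${REP_SRC} ${REP_DEST}; bye" ${HOSTNAME}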

How much space is that folder taking up on your disk?

23rd July 2008

In Windows, it’s a matter of right-clicking on the folder and looking in its properties. I am sure that there is a better way of doing it in that ever pervasive operating system but, in the worlds of Linux and UNIX, the command line comes to the rescue as it is wont to do. What follows is the command that I use:

du -sh foldername

The s option makes it present only the total space taken up, while leaving it out gets you a breakdown of how much space the subfolders are taking up as well. The h option makes the sizes in the output more friendly to human eyes, with things like 10K, 79M and 51G littering what you get. The command name itself is just a much shorter way of saying "print disk usage". It's all quick and easy when you know it and very useful in this age of ever increasing data volumes.
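
If it's the breakdown that you want, a variation like the one below lists the immediate subfolders in order of size; bear in mind that --max-depth and sort's -h option are GNU extras, so this may not travel to every flavour of UNIX:

du -h --max-depth=1 foldername | sort -h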

A quick way to create a blank text file

27th June 2008

The primary job done by the touch command in UNIX or Linux is to update the time stamps on files. However, it also has another function: creating an empty text file when you are "touching" a file that doesn't exist. This has its uses, particularly when you want to reduce the amount of pointing and clicking that you need to do, or when you want to generate a series of empty files in a shell script. Whatever you do with it is up to you.
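
By way of illustration, here's the single file case together with a small loop that churns out a numbered series of empty files; the names are only examples:

touch notes.txt

for i in 1 2 3 4 5; do
    touch "file_${i}.txt"
done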

Quickly surveying free disk space on UNIX and Linux

21st June 2008

Keeping an eye on disk space on a Solaris server is important for me at work while keeping the same top level overview is good for my use of Linux at home too. Luckily, there’s a simple command that delivers the goods:

df -h 2>/dev/null

The "df -h" piece is what delivers the statistics while the "2>/dev/null" rids the terminal of any error messages; ones stating that access has been denied are common and can cloud the picture.

Automating FTP I: UNIX and Linux

11th April 2008

Having got tired of repeatedly typing everything in at the prompt of an interactive command line FTP session and doing similar things via the GUI route, I started to wonder if there was a scripting alternative and, lo and behold, I found it after a spot of googling. There are various opportunities for its extension, such as prompting for the username and password instead of the risky approach of including them in a script (there's a sketch of that after the code) or cycling through a directory structure, but here's the foundation stone for such tinkering anyway:

HOSTNAME='ftp.server.host'
USER='user'
PSSWD='password'
REP_SRC='source_directory'
REP_DEST='destination_directory'
FILENAME='*'

rm -rf log_file.tmp

cd "${REP_DEST}"

ftp -i -n -v <<EndFTP >>log_file.tmp 2>>log_file.tmp
open ${HOSTNAME}
user ${USER} ${PSSWD}
prompt
cd "${REP_SRC}"
mget "${FILENAME}"
EndFTP

cd ~
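
As for prompting for the credentials rather than leaving them lying about in the script, a sketch along the following lines could stand in for the hard-coded USER and PSSWD lines at the top; the stty calls keep the password off the screen while it is typed:

printf "Username: "
read USER
printf "Password: "
stty -echo
read PSSWD
stty echo
printf "\n"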
