TOPIC: COMPUTER FILE
Some PowerShell fundamentals for practical automation
27th October 2025
In the last few months, I have taken to using PowerShell for automating tasks while working on a new contract. There has been an element of vibe programming in some of the scripts, which is why I wished to collate a reference guide that anyone can keep to hand. While working with PowerShell every day does help to reinforce the learning, it also helps to look up granular concepts at a more bite-sized level. This especially matters given PowerShell's object-oriented approach. After all, many of us build things up iteratively from small steps, which also allows for more flexibility. Using an AI is all very well, yet the fastest recall is always from your own head.
1. Variables and Basic Data Types
Variables start with a dollar sign and hold values you intend to reuse, so names like $date, $outDir and $finalDir become anchors for later operations. Dates are a frequent companion in filenames and logs, and PowerShell's Get-Date makes this straightforward. A format string such as Get-Date -Format "yyyy-MM-dd" yields values like 2025-10-27, while Get-Date -Format "yyyy-MM-dd HH:mm:ss" adds a precise timestamp that helps when tracing the order of events. Because these commands return text when a format is specified, you can stitch the results into other strings without fuss.
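As a minimal sketch of the idea, with the folder paths invented for illustration:
$date     = Get-Date -Format "yyyy-MM-dd"            # e.g. 2025-10-27
$stamp    = Get-Date -Format "yyyy-MM-dd HH:mm:ss"   # precise timestamp for logs
$outDir   = "C:\work\output"                         # hypothetical working folder
$finalDir = "C:\work\final"                          # hypothetical destination folder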
2. File System Operations
As soon as you start handling files, you will meet a cluster of commands that make navigation robust rather than fragile. Join-Path assembles folder segments without worrying about stray slashes, Test-Path checks for the existence of a target, and New-Item creates folders when needed. Moving items with Move-Item keeps the momentum going once the structure exists.
Environment variables give cross-machine resilience; reading $env:TEMP finds the system's temporary area, and [Environment]::GetFolderPath("MyDocuments") retrieves a well-known Windows location without hard-coding. Setting context helps too, so Set-Location acts much like cd to make a directory the default focus for subsequent file operations. You can combine these approaches, as in cd ([Environment]::GetFolderPath("MyDocuments")), which navigates directly to the My Documents folder without hard-coded paths.
Scripts are often paired with nearby files, and Split-Path $ScriptPath -Parent extracts the parent folder from a full path so you can create companions in the same place. Network locations behave like local ones, with Universal Naming Convention paths beginning with \\ supported throughout, and Windows paths do not require careful case matching because the file systems are generally case-insensitive, which differs from many Unix-based systems. Even simple details matter, so constructing strings such as "$Folder\*" is enough for wildcard searches, with backslashes treated correctly and whitespace handled sensibly.
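Put together, a hedged sketch of these pieces might read as follows; the psoutput and report.txt names are stand-ins:
$docs   = [Environment]::GetFolderPath("MyDocuments")
$outDir = Join-Path $docs "psoutput"
if (-not (Test-Path $outDir)) {
    New-Item -ItemType Directory -Path $outDir | Out-Null   # create the folder only when missing
}
Move-Item -Path (Join-Path $env:TEMP "report.txt") -Destination $outDir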
3. Arrays and Collections
Arrays are created with @() and make it easy to keep related items together, whether those are folders in a $locs array or filenames gathered into $progs1, $progs2 and others. Indexing retrieves specific positions with square brackets, so $locs[0] returns the first entry, and a variable index like $outFiles[$i] supports loop counters.
A single value can still sit in an array using syntax such as @("bm_rc_report.sas"), which keeps your code consistent when functions always expect sequences. Any collection advertises how many items it holds using the Count property, so checking $files.Count equals zero tells you whether there is anything to process.
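For instance, with placeholder contents:
$locs   = @("C:\data\area1", "C:\data\area2")       # invented folder paths
$progs1 = @("bm_rc_report.sas")                     # a single value kept as an array
$first  = $locs[0]                                  # indexing starts at zero
if ($progs1.Count -eq 0) { Write-Output "Nothing to process" }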
4. Hash Tables
When you need fast lookups, a hash table works as a dictionary that associates keys and values. Creating one with @{$locs[0] = $progs1} ties the first location to its corresponding programme list and then $locsProgs[$loc] retrieves the associated filenames for whichever folder you are considering. This is a neat stepping stone to loops and conditionals because it organises data around meaningful relationships rather than leaving you to juggle parallel arrays.
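Continuing the invented example, and assuming $progs2 is another array of filenames:
$locsProgs = @{
    $locs[0] = $progs1
    $locs[1] = $progs2
}
$files = $locsProgs[$locs[0]]                       # look up the programmes for a folder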
5. Control Flow
Control flow is where scripts begin to feel purposeful. A foreach loop steps through the items in a collection and is comfortable with nested passes, so you might iterate through folders, then the files inside each folder, and then a set of search patterns for those files. A for loop offers a counting pattern with initialisation, a condition and an increment written as for ($i = 0; $i -lt 5; $i++). It differs from foreach by focusing on the numeric progression rather than the items themselves.
Counters are introduced with $i = 0 and advanced with $i++, which in turn blends well with array indexing. Conditions gate work to what needs doing. Patterns such as if (-not (Test-Path ...)) reduce needless operations by creating folders only when they do not exist, and an else branch can note alternative outcomes, such as a message that a search pattern was not found.
Sometimes there is nothing to gain from proceeding, and break exits the current loop immediately, which is an efficient way to stop retrying once a log write succeeds. At other times it is better to skip just the current iteration, and continue moves directly to the next pass, which proves useful when a file list turns out to be empty for a given pattern.
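Assembling those pieces, one plausible shape for the nested pattern follows, reusing the $locs and $locsProgs examples from earlier:
foreach ($loc in $locs) {
    foreach ($file in $locsProgs[$loc]) {
        $path = Join-Path $loc $file
        if (-not (Test-Path $path)) { continue }    # skip to the next file
        # ... process $path here ...
    }
}
for ($i = 0; $i -lt 5; $i++) { Write-Output "Pass $i" }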
6. String Operations
Strings support much of this work, so several operations are worth learning well. String interpolation allows variables to be embedded inside text using "$variable" or by wrapping expressions as "$($expression)", which becomes handy when constructing paths like "psoutput\$($date)".
Splitting text is as simple as -split, and a statement such as $stub, $type = $File -split "\." divides a filename around its dot, assigning the parts to two variables in one step. This is multiple variable assignment: the first part goes to $stub and the second to $type, letting you decompose strings in a single line.
When transforming text, the -Replace operator substitutes all occurrences of one pattern with another, and you can chain replacements, as in -replace $Match, $Replace -replace $Match2, $Replace2, so each change applies to the modified output of the previous one.
Building new names clearly is easier with braces, as in "${stub}_${date}.txt", which prevents ambiguity when variable names abut other characters. Escaping characters is sometimes needed, so using "\." treats a dot as a literal in a split operation. The backtick character ` serves as PowerShell's escape character and introduces special sequences like a newline written as `n, a tab as `t and a carriage return as `r. When you need to preserve formatting across lines without worrying about escapes, here-strings created with @" ... "@ keep indentation and line breaks intact.
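These pieces combine naturally; the filename below is a stand-in:
$date = Get-Date -Format "yyyy-MM-dd"
$stub, $type = "report.sas" -split "\."             # $stub = report, $type = sas
$newName = "${stub}_${date}.txt"                    # report_2025-10-27.txt
$note = @"
File: $newName
Created: $date
"@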
7. Pipeline Operations
PowerShell's pipeline threads operations together so that the output of one command flows to the next. The pipe character | links each stage, and commands such as ForEach-Object (which processes each item), Where-Object (which filters items based on conditions) and Sort-Object -Unique (which removes duplicates) become building blocks that shape data progressively.
Within these blocks, the current item appears as $_, and properties exposed by commands can be read with syntax like $_.InputObject or $_.SideIndicator, the latter being especially relevant when handling comparison results. With pipeline formatting, you can emit compact summaries, as in ForEach-Object { "$($_.SideIndicator) $($_.InputObject)" }, which brings together multiple properties into a single line of output.
A multi-stage pipeline filtering approach often follows three stages: Select-String finds matches, ForEach-Object extracts only the values you need, and Where-Object discards anything that fails your criteria. This progressive refinement lets you start broad and narrow results step by step. There is no compulsion to over-engineer, though; a simplified pipeline might omit filtering stages if the initial search is already precise enough to return only what you need.
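As a sketch of that three-stage shape, assuming $loc holds a folder path and with the pattern chosen purely for illustration:
Select-String -Path "$loc\*.sas" -Pattern '%m\w+' |
    ForEach-Object { $_.Matches.Value } |
    Where-Object { $_ -notmatch '^%macro$' } |
    Sort-Object -Unique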
8. Comparison and Matching
Behind many of these steps sit comparison and matching operators that extend beyond simple equality. Pattern matching appears through -notmatch, which uses regular expressions to decide whether a value does not fit a given pattern, and it sits alongside -eq, -ne and -lt for equality, inequality and less-than comparisons.
Complex conditions chain with -and, so an expression such as $_ -notmatch '^%macro$' -and $_ -notmatch '^%mend$' ensures both constraints are satisfied before an item passes through. Negative matching in particular helps exclude unwanted lines while leaving the rest untouched.
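For example, assuming $lines already holds lines of text:
$lines | Where-Object { $_ -notmatch '^%macro$' -and $_ -notmatch '^%mend$' }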
9. Regular Expressions
Regular expressions define patterns that match or search for text, often surfacing through operators such as -match and -replace. Simple patterns like \.log$ identify strings ending with .log, while more elaborate ones capture groups using parentheses, as in (sdtm\.|adam\.), which finds two alternative prefixes.
Anchors matter, so ^ pins a match to the start of a line and $ pins it to the end, which is why ^%macro$ means an entire line consists of nothing but %macro. Character classes provide shortcuts such as \w for word characters (letters, digits or underscores) and \s for whitespace. The pattern "GRC\w*" matches "GRC" followed by zero or more word characters, demonstrating how * controls repetition. Other quantifiers like + (one or more) and ? (zero or one) offer further control.
Escaping special characters with a backslash turns them into literals, so \. matches a dot rather than any character. More complex patterns like '%(m|v).*?(?=[,(;\s])' combine alternation with non-greedy matching and a lookahead to define precise search criteria.
When working with matches in pipelines, $_.Matches.Value extracts the actual text that matched the pattern, rather than returning the entire line where the match was found. This proves essential when you need just the matching portion for further processing. The syntax can appear dense at first, but PowerShell's integration means you can test patterns quickly within a pipeline or conditional, refining as you go.
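A few quick checks illustrate the ideas; the strings are invented:
"run1.log" -match '\.log$'                          # True: the string ends with .log
"GRC123"   -match 'GRC\w*'                          # True: GRC plus word characters
"sdtm.dm"  -replace '(sdtm\.|adam\.)', ''           # strips the prefix, leaving dm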
10. File Content Operations
Searching file content with Select-String applies regular expressions to lines and returns match objects, while Out-File writes text to files with options such as -Append and -Encoding UTF8 to control how content is persisted.
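A sketch of the two in tandem, with the file names invented for illustration:
Select-String -Path ".\run1.log" -Pattern '^(ERROR|WARNING)' |
    ForEach-Object { $_.Line } |
    Out-File -FilePath ".\issues.txt" -Append -Encoding UTF8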
11. File and Directory Searching
Commands for locating files typically combine path operations with filters. Get-ChildItem retrieves items from a folder, and parameters like -Filter or -Include narrow results by pattern. Wildcards such as * are often enough, but regular expressions provide finer control when integrated with pipeline operations. Recursion through subdirectories is available with -Recurse, and combining these techniques allows you to find specific files scattered across a directory tree. Once items are located, properties like FullName, Name and LastWriteTime let you decide what to do next.
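For instance, a small sketch that gathers SAS program files across a tree and lists their full paths, oldest first; $loc is assumed to hold the starting folder:
Get-ChildItem -Path $loc -Filter "*.sas" -Recurse |
    Sort-Object LastWriteTime |
    ForEach-Object { $_.FullName }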
12. Object Properties
Objects exposed by commands carry properties that you can access directly. $File.FullName retrieves an absolute path from a file object, while names, sizes and modification timestamps are all available as well. Subexpressions introduced with $() evaluate an inner expression within a larger string or command, which is why $($File.FullName) is often seen when embedding property values in strings. However, subexpressions are not always required; direct property access works cleanly in many contexts. For instance, $File.FullName -Replace ... reads naturally and works as you would expect because the property access is unambiguous when used as a command argument rather than embedded within a string.
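By way of illustration, with a hypothetical file:
$File = Get-Item ".\report.sas"
Write-Output "Processing $($File.FullName)"          # subexpression inside a string
$logName = $File.FullName -replace '\.sas$', '.log'  # direct property access as an argument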
13. Output and Logging
Producing output that can be read later is easier if you apply a few conventions. Write-Output sends structured lines to the console or pipeline, while Write-Warning signals notable conditions without halting execution, a helpful way to flag missing files. There are times when command output is unnecessary, and piping to Out-Null discards it quietly, for example when creating directories. Larger scripts benefit from consistency and a short custom function such as Write-Log establishes a uniform format for messages, optionally pairing console output with a line written to a file.
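One minimal sketch of such a function; passing the log path in as a parameter is a choice of mine rather than anything prescribed:
function Write-Log {
    param([string]$Message, [string]$LogPath)
    $stamp = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
    $line  = "$stamp  $Message"
    Write-Output $line                               # echo to the console or pipeline
    $line | Out-File -FilePath $LogPath -Append -Encoding UTF8
}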
14. Functions
Functions tie these pieces together as reusable blocks with a clear interface. Defining one with function Get-UniquePatternMatches { } sets the structure, and a param() block declares the inputs. Strongly typed parameters like [string[]] make it clear that a function accepts an array of strings, and naming parameters $Folder and $Pattern describes their roles without additional comments.
Functions are called using named parameters in the format Get-UniquePatternMatches -Folder $loc -Pattern '(sdtm\.|adam\.)', which makes the intent explicit. It is common to pass several arrays into similar functions, so a function might have many parameters of the same type. Clear, descriptive names such as $Match, $Replace, $Match2 and $Replace2 leave little doubt about intent, even if an array of replacement rules would sometimes reduce repetition.
Positional parameters are also available; when calling Do-Compare you can omit parameter names and rely on the order defined in param(). PowerShell follows verb-noun naming conventions for functions, with common verbs including Get, Set, New, Remove, Copy, Move and Test. Keeping to that verb-noun shape, as in a custom function like Multiply-Files, places your code close to the mainstream of PowerShell conventions, even if Multiply does not appear on the approved verb list.
It is worth avoiding a common pitfall where a function declares param([string[]]$Files) but inadvertently reads a variable like $progs from outside the function. PowerShell allows this via scope inheritance, where functions can access variables from parent scopes, but it makes maintenance harder and disguises the function's true dependencies. Being explicit about parameters creates more maintainable code.
Simple functions can still do useful work without complexity. A minimal implementation with basic looping and conditional logic can accomplish plenty, and a recurring structure can be reused with minor revisions, swapping one regular expression for another while leaving the looping and logging intact. Replacement chains are flexible; add as many -replace steps as are needed, and no more.
Parameters can also do double duty, as when a $Match variable serves first as a filename filter in -Include and then as a text pattern for -replace. Nested function calls tie output and input together, as when piping a here-string to Out-File (Join-Path ...) to construct a file path at the moment of writing.
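Pulling the naming and parameter ideas together, here is a guess at the shape such a function might take; the body is illustrative rather than a record of any original:
function Get-UniquePatternMatches {
    param(
        [string]$Folder,
        [string]$Pattern
    )
    Select-String -Path "$Folder\*" -Pattern $Pattern |
        ForEach-Object { $_.Matches.Value } |
        Sort-Object -Unique
}
Get-UniquePatternMatches -Folder $loc -Pattern '(sdtm\.|adam\.)'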
15. Comments
Comments play a quiet but essential role. A line starting with # explains why something is the way it is or temporarily disables a line without deleting it, which is invaluable when testing and refining.
16. File Comparison
Comparison across datasets rounds out common tasks, and Compare-Object identifies differences between two sets, telling you which items are unique to each side or shared by both. Side indicators in the output are compact: <= shows the first set, => the second, and == indicates an item present in both.
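A short sketch follows; note that == only appears when -IncludeEqual is specified, and the file names are placeholders:
Compare-Object (Get-Content ".\old.txt") (Get-Content ".\new.txt") -IncludeEqual |
    ForEach-Object { "$($_.SideIndicator) $($_.InputObject)" }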
17. Common Parameters
Across many commands, common parameters behave consistently. -Force allows operations that would otherwise be blocked and overwrites existing items without prompting in contexts that support it, -LiteralPath treats a path exactly as written without interpreting wildcards, and -Append adds content to existing files rather than overwriting them. These options smooth edges when you know what you want a command to do and prefer to avoid interactive questions or unintended pattern expansion.
18. Advanced Scripting Features
A number of advanced features make scripts sturdier. Automatic variables such as $MyInvocation.MyCommand.Path provide information about the running script, including its full path, which is practical for locating resources relative to the script file. Set-StrictMode -Version Latest enforces stricter rules that turn common mistakes into immediate errors, such as using an uninitialised variable or referencing a property that does not exist. Clearing the console at the outset with Clear-Host gives a clean slate for output when a script begins.
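A typical script preamble built from these pieces might look like this:
Set-StrictMode -Version Latest                      # turn common mistakes into errors
Clear-Host                                          # start with a clean console
$ScriptPath = $MyInvocation.MyCommand.Path          # full path of the running script
$ScriptDir  = Split-Path $ScriptPath -Parent        # folder containing the script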
19. .NET Framework Integration
Integration with the .NET Framework extends PowerShell's reach. For instance, calling [System.IO.Path]::GetFileNameWithoutExtension() extracts a base filename using a tested library method. To gain more control over file I/O, [System.IO.File]::Open() and System.IO.StreamWriter expose low-level handles that specify sharing and access modes, which can help when you need to coordinate writing without blocking other readers. File sharing options like [System.IO.FileShare]::Read allow other processes to read a log file while the script writes to it, reducing contention and surprises.
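As a cautious sketch of the pattern, with $logPath assumed to point at a writable file:
$stub   = [System.IO.Path]::GetFileNameWithoutExtension("report.sas")   # yields report
$mode   = [System.IO.FileMode]::Append
$access = [System.IO.FileAccess]::Write
$share  = [System.IO.FileShare]::Read               # let other processes read while we write
$stream = [System.IO.File]::Open($logPath, $mode, $access, $share)
$writer = New-Object System.IO.StreamWriter($stream)
$writer.WriteLine("log entry")
$writer.Dispose()                                   # closes the writer and underlying stream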
20. Error Handling
Error handling deserves a clear pattern. Wrapping risky operations in try { } catch { } blocks captures exceptions, so a script can respond gracefully, perhaps by writing a warning and moving on. A finally block can be added for clean-up operations that must run regardless of success or failure.
When transient conditions are expected, a retry logic pattern is often enough, pairing a counter with Start-Sleep to attempt an operation several times before giving up. Waiting for a brief period such as Start-Sleep -Milliseconds 200 gives other processes time to release locks or for temporary conditions to clear.
Alongside this, checking for null values keeps assumptions in check, so conditions like if ($null -ne $process) ensure that you only read properties when an object was created successfully. This defensive approach prevents cascading errors when operations fail to return expected objects.
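Combining these ideas, a retry loop might be sketched along these lines, with $line and $logPath assumed to be set already:
$attempt = 0
do {
    $attempt++
    try {
        $line | Out-File -FilePath $logPath -Append -Encoding UTF8
        break                                       # success, so stop retrying
    }
    catch {
        Write-Warning "Write failed on attempt ${attempt}: $_"
        Start-Sleep -Milliseconds 200               # give locks time to clear
    }
} while ($attempt -lt 5)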
21. External Process Management
Managing external programmes is a common requirement, and PowerShell's Start-Process offers a controlled route. Several parameters control its behaviour precisely:
The -Wait parameter makes PowerShell pause until the external process completes, essential for sequential processing where later steps depend on earlier ones. The -PassThru parameter returns a process object, allowing you to inspect properties like exit codes after execution completes. The -NoNewWindow parameter runs the external process in the current console rather than opening a new window, keeping output consolidated. If a command expects the Command Prompt environment, calling it via cmd.exe /c $cmd integrates cleanly, ensuring compatibility with programmes designed for the CMD shell.
Exit codes reported with $process.ExitCode indicate success with zero and errors with non-zero values in most tools, so checking these numbers preserves confidence in the sequence of steps. Used with -Wait, execution is synchronous, processing files one at a time rather than in parallel, which can be an advantage when dependencies exist between stages or when you need to ensure ordered completion.
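A guarded sketch, where $cmd is assumed to hold a CMD-style command string:
$process = Start-Process -FilePath "cmd.exe" -ArgumentList "/c $cmd" -Wait -PassThru -NoNewWindow
if ($null -ne $process -and $process.ExitCode -ne 0) {
    Write-Warning "Command failed with exit code $($process.ExitCode)"
}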
22. Script Termination
Scripts need to finish in a way that other tools understand. Exiting with Exit 0 signals success to schedulers and orchestrators that depend on numeric codes, while non-zero values indicate error conditions that trigger alerts or retries.
Bringing It All Together
Because this is a granular selection, it leaves it to us to piece everything together to accomplish the tasks that we have to complete. In that way, we can embed the knowledge so that we are not vibe coding all the time, ensuring that a more deterministic path is followed.
Platform-independent file deletion in SAS programming
11th September 2025
There are times when you need to delete a file from within a SAS program that is not one of its inbuilt types, such as datasets or catalogues. Many choose to use X commands or the SYSTASK facility, even though these issue operating system-specific commands that obstruct portability, and system administrators can make them unavailable to users in any case. Thus, it is better to use methods that are provided within SAS itself.
The first step is to define a file reference pointing to the file that you need to delete:
filename myfile '<full path of file to be removed>';
With that accomplished, you can move on to deleting the file in a null data step, one where no output dataset is generated. The FDELETE function does that in the code below, using the MYFILE file reference declared already. The step issues a return code used to provide feedback to the user, with 0 indicating success and non-zero values indicating failure.
data _null_;
rc = fdelete('myfile');
if rc = 0 then put 'File deleted successfully.';
else put 'Error deleting file. RC=' rc;
run;
With the above out of the way, you then can clear the file reference to tidy up things afterwards:
filename myfile clear;
Some may consider SAS programs to be single-use items that never get moved from one system to another. However, I have been involved in several such moves, particularly between Windows, UNIX and Linux, so I know the value of keeping things streamlined, and some SAS programs are multi-use anyway. Anything that cuts down on workload when there is much else to do cannot be discounted.
File comparison using PowerShell
16th August 2025
In the past, I have compared files on the Linux/UNIX command line as well as the legacy Windows command line. Recently, I decided to try it using PowerShell. Here is the command structure:
Compare-Object (Get-Content ".\[name of one text file]") (Get-Content ".\[name of another text file]") > [path and name of output file]
Admittedly, this is more verbose than the others that I have mentioned above. Nevertheless, it does the job and sends everything to a text file for review. The Compare-Object piece does the comparison once the Get-Content portions have read in the content.
Getting rsync to resolve symbolic links
11th September 2024
Dropbox changed its handling of symbolic links in 2019 so that internal links within a Dropbox file hierarchy got fixed, while links leading outside the Dropbox area stopped working. Thankfully, the rsync utility found in many Linux and UNIX settings does not behave like that, as long as you have called it correctly.
By default, symbolic links are synchronised like any other file, which is what Dropbox does now. To get rsync to resolve the links as shortcuts to either a single file or, more likely, a folder containing more than one file, it needs the -L switch or option in the command. When that is present, the linked file or files get synchronised, which honours the point of having these links in the first place: allowing more flexibility with folder structures and avoiding any duplication of files and folders.
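As a rough sketch of the sort of invocation involved, where the switches beyond -L are illustrative (-r recurses, -t preserves timestamps, -u keeps the transfer incremental and -v reports progress):
rsync -rtuvL [source] [destination]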
Generating PNG files in SAS using ODS Graphics
21st December 2019
Recently, I had someone ask me how to create PNG files in SAS using ODS Graphics, so I sought out the answer for them. Normally, the suggestion would have been to create RTF or PDF files instead, but there was a specific need that required a different approach. Adding something like the following lines before an SGPLOT, SGPANEL or SGRENDER procedure should do the needful:
ods listing gpath='E:\';
ods graphics / imagename="test" imagefmt=png;
Here, the ODS LISTING statement declares the destination for the desired graphics file, while the ODS GRAPHICS statement defines the file name and type. In the above example, the file test.png would be created in the root of the E drive of a Windows machine. However, this also works with Linux or UNIX directory paths.
Updating Piwik using the Linux Command Line
28th November 2016
Because updating Piwik using its web interface has proved tempestuous, I have decided to update the self-hosted analytics application in an SSH session. The production web servers that I use are hosted on Linux systems, so any commands apply to the Linux or UNIX command line only. What is needed for Windows servers may differ.
The first step is to download the required ZIP file with this command:
wget https://builds.piwik.org/piwik.zip
Once the download is complete, the contents of the ZIP archive are extracted into a new subfolder. I carry this out in a separate folder from the one where the website files are kept, before copying everything over from the extraction folder. Here is the unzip command; the -o switch turns on overwriting of any previously existing files:
unzip -o piwik.zip
With the required files now in place in the web server area, the next step is to do the actual system update, which includes any updates to the Piwik database that you are using. There are two commands that you can use once you have specified the location of your Piwik installation; the second is needed when the first cannot find where the PHP executable is stored. My systems needed something more specific than these because both PHP 5.6 and PHP 7.0 were installed. Looking in /usr/bin was enough to find what I needed to execute in place of php below; otherwise, the command was the same.
./[path to piwik]/console core:update
php [path to piwik]/console core:update
While the upgrade is ongoing, it prompts you to permit it to continue before it goes and modifies the database. This did not take long on my systems, but that depends on how much data there is. Once the process has completed, you can delete any extraneous files using the rm command.
Creating soft and hard symbolic links using the Windows command line
19th August 2015
In the world of UNIX and Linux, symbolic links are shortcuts, but they do not work like normal Windows shortcuts because you do not jump from one location to another with the file manager's address bar changing what it shows. Instead, it is as if you see the contents of the directory at another, quicker-to-access location in the file system, and the same sort of thinking applies to files too. In some ways, it is like giving files and directories alternative aliases. There are soft links that point to the name of a given directory or file, and hard links that point to actual files or directories.
For a long time, I was under the mistaken impression that such things did not exist on Windows until I came across the mklink command, which came with the launch of Windows Vista at the start of 2007. While this feature might not be widely known, it demonstrates that Windows did adopt some UNIX and Linux capability long before other UNIX-like features, such as virtual desktops, were introduced in Windows 10.
By default, the aforementioned command sets up symbolic links to files and the /D switch allows the same to be done for directories too. The /H switch makes a hard link instead of a soft link, so we get much of the functionality of the ln command in UNIX and Linux. Here is an example that creates a soft symbolic link for a directory:
mklink /D shortcut target_directory
Above, shortcut is the name of the symbolic link file and target_directory is the destination to which it links. In my experience, it works best for destinations beyond your home folder and, from what I have read, hard links may not be possible across different disks either.
Windows commands for setting default applications for opening certain file types
18th August 2015
On Friday, I was working on a system where a session is instantiated from a stored virtual machine that produces a fresh session every time, meaning that all previous changes get lost. What I have is a batch script that I run to reinstate what I need, and I encountered another task that I wanted it to do.
Part of my work involves the creation of plain text files with the extension lst, and these were getting associated with SAS instead of Notepad. While you can reassign such associations using the GUI, it would be a bonus to do it via the command line too, so the assoc and ftype commands caught my interest. The first of these associates files bearing a given extension with a desired file type, while the second shows the available file types together with the associated applications that open them. The assoc command also shows all the associations that are in place when it is executed with no parameters, and the ftype command does the same for file types.
Once you have picked out a file type with the ftype command, the assoc command can be used like the following:
assoc .lst=txtfile
The above associates an extension with a file type. In this, .lst files are going to get opened by Notepad because of the txtfile association. Though it did not do what I wanted on Friday due to system lockdown, it is good to know that this is possible and that even the Windows command line supports goodies like these.
Copying only new or updated files by command line in Linux or Windows
2nd August 2014
With a growing collection of photographic images, I often find myself making backups of files using copy commands, and the data volumes are such that I don't want to keep copying the same files over and over again, so incremental file transfers are what I need. Commands like the following often get issued from a Linux command line:
cp -pruv [source] [destination]
Because this is on Linux, it is the bash shell that I use, so the switches may not carry over to others like zsh, fish or ksh. In my case, p preserves file properties such as time and date, which the cp command does not always do, so it needs adding. The r switch is useful because the copy is then recursive, so only a directory needs to be specified as the source, and the destination needs to be one level up from a folder of the same name there to avoid file duplication. It is the u switch that makes the file copy incremental, and the v one issues messages to the shell that show how the copying is going. Seeing a file name issued by the latter tells you how much more needs to be copied and that the files are going where they should.
What inspired this post though is my need to do the same in a Windows session, and issuing xcopy commands will achieve the same end. Here are two that will do the needful:
xcopy [source] [destination] /d /s
xcopy [source] [destination] /d /e
In both cases, it is the d switch that ensures that the copy is incremental, and you can add a date too, with a colon between it and the /d, if you see fit. The s switch copies only directories that contain files, while the e one copies even empty directories. Using the d switch without either of those did not trigger any copying action when I tried, so I reckon that you cannot do without one of them. By default, both of these commands issue output to the command line so you can keep an eye on what is happening, which is especially useful when ensuring that files are going to the right destination because the behaviour differs from that of the bash shell on Linux.
Renaming multiple files in Linux
19th August 2012
The Linux and UNIX command mv has a number of limitations, such as not overwriting destination files and not renaming multiple files using wildcards. The only solution to the first that I can find is one that involves combining the cp and rm commands. For the second, there's another command: rename. Here's an example like what I used recently:
rename s/fedora/fedora2/ fedora.*
The first argument in the above command is a regular expression much like what Perl is famous for implementing; in fact, it is Perl-compatible regular expressions (PCRE) that are used. The s before the first slash stands for substitute, with fedora being the string that needs to be replaced and fedora2 being what replaces it. The third part is the file name glob that you want to use, fedora.* in this case. Therefore, files named fedora will be renamed fedora2 while keeping their extensions, regardless of the file type. The same sort of operation can be performed for all files with the same extension when it needs to be changed, htm to html, for instance. Of course, there are other uses, but these are handy ones to know.