Using a BASH command to count the files in a directory
12th March 2024As part of my backup workflow, I maintain a machine running OpenMediaVault that I only power up when backups are to be performed. Typically, this often happens when I have new photography images to load, and I have a NAS that acts as an online backup system. The OpenMediaVault machine is a near-offline counterpart to the NAS for added safety.
Recently, I needed to check on the number of image files in a directory from an SSH session because of a need to create a new repository for 2024. Some files from this year had ended up in the 2023 one, and I needed to be sure that nothing from last year ended in the 2024 folder, or vice versa. Getting a file count from a trusted source was a quick way of doing exactly this.
Due to clumsiness with the NAS, I had to do this using the OpenMediaVault machine. While I could go mounting drives on an interim basis, it was quicker to work from a BASH session. The trick was to use the wc command for counting the lines output by an invocation of the ls command. An example follows:
ls -l | wc -l
The -l (as in l for Lima) switch forces wc to count lines, while the counterpart (same letter) for ls forces it to list the contents in long form, one item per line. Thus, counting the number of lines gets you the count of the number of files. The call to the ls command can be customised to add other things life the number of dot files, but the above was enough for my purposes. When the files in both 2023 directories matched, I was satisfied that all was in order.
AttributeError: module 'PIL' has no attribute 'Image'
11th March 2024One of my websites has an online photo gallery. This has been a long-term activity that has taken several forms over the years. Once HTML and JavaScript based, it then was powered by Perl before PHP and MySQL came along to take things from there.
While that remains how it works, the publishing side of things has used its own selection of mechanisms over the same time span. Perl and XML were the backbone until Python and Markdown took over. There was a time when ImageMagick and GraphicsMagick handled image processing, but Python now does that as well.
That was when the error message gracing the title of this post came to my notice. Everything was working well when executed in Spyder, but the message appears when I tried running things using Python on the command line. PIL is the abbreviated name for the Python 3 pillow package; there was one called PIL in the Python 2 days.
For me, pillow loads, resizes and creates new images, which is handy for adding borders and copyright/source information to each image as well as creating thumbnails. All this happens in memory and that makes everything go quickly, much faster than disk-based tools like ImageMagick and GraphicsMagick.
Of course, nothing is going to happen if the package cannot be loaded, and that is what the error message is about. Linux is what I mainly use, so that is the context for this scenario. What I was doing was something like the following in the Python script:
import PIL
Then, I referred to PIL.Image when I needed it, and this could not be found when the script was run from the command line (BASH). The solution was to add something like the following:
from PIL import Image
That sorted it, and I must have run into trouble with PIL.ImageFilter too, since I now load it in the same manner. In both cases, I could just refer to Image or ImageFilter as I required and without the dot syntax. However, you need to make sure that there is no clash with anything in another loaded Python package when doing this.
Upgrading Julia packages
23rd January 2024Whenever a new version of Julia is released, I have certain actions to perform. With Julia 1.10, installing and updating it has become more automated thanks to shell scripting or the use of WINGET, depending on your operating system. Because my environment predates this, I found that the manual route still works best for me, and I will continue to do that.
Returning to what needs doing after an update, this includes updating Julia packages. In the REPL, this involves dropping to the PKG subshell using the ] key if you want to avoid using longer commands or filling your history with what is less important for everyday usage.
Previously, I often ran code to find a package was missing after updating Julia, so the add command was needed to reinstate it. That may raise its head again, but there also is the up command for upgrading all packages that were installed. This could be a time saver when only a single command is needed for all packages and not one command for each package as otherwise might be the case.
OWASP Top 10 for Large Language Model Applications
21st January 2024OWASP stands for Open Web Application Security Project, and it is an online community dedicated to web application security. They are well known for their Top 10 Web Application Security Risks and late last year, they added a Top 10 for
Large Language Model (LLM) Applications.
Given that large language models made quite a splash last year, this was not before time. ChatGPT gained a lot of attention (OpenAI also has had DALL-E for generation of images for quite a while now), there are many others with Anthropic Claude and Perplexity also being mentioned more widely.
Figuring out what to do with any of these is not as easy as one might think. For someone more used to working with computer code, using natural language requests is quite a shift when you no longer have documentation that tells what can and what cannot be done. It is little wonder that prompt engineering has emerged as a way to deal with this.
Others have been plugging in LLM capability into chatbots and other applications, so security concerns have come to light, so far, I have not heard anything about a major security incident, but some are thinking already about how to deal with AI-suggested code that others already are using more and more.
Given all that, here is OWASP's summary of their Top 10 for LLM Applications. This is a subject that is sure to draw more and more interest with the increasing presence of artificial intelligence in our everyday working and no-working lives.
LLM01: Prompt Injection
This manipulates an LLM through crafty inputs, causing unintended actions by the LLM. Direct injections overwrite system prompts, while indirect ones manipulate inputs from external sources.
LLM02: Insecure Output Handling
This vulnerability occurs when an LLM output is accepted without scrutiny, exposing backend systems. Misuse may lead to severe consequences such as Cross-Site Scripting (XSS), Cross-Site Request Forgery (CSRF), Server-Side Request Forgery (SSRF), privilege escalation, or remote code execution.
LLM03: Training Data Poisoning
This occurs when LLM training data are tampered, introducing vulnerabilities or biases that compromise security, effectiveness, or ethical behaviour. Sources include Common Crawl, WebText, OpenWebText and books.
LLM04: Model Denial of Service
Attackers cause resource-heavy operations on LLMs, leading to service degradation or high costs. The vulnerability is magnified due to the resource-intensive nature of LLMs and the unpredictability of user inputs.
LLM05: Supply Chain Vulnerabilities
LLM application lifecycle can be compromised by vulnerable components or services, leading to security attacks. Using third-party datasets, pre-trained models, and plugins can add vulnerabilities.
LLM06: Sensitive Information Disclosure
LLMs may inadvertently reveal confidential data in its responses, leading to unauthorized data access, privacy violations, and security breaches. It’s crucial to implement data sanitization and strict user policies to mitigate this.
LLM07: Insecure Plugin Design
LLM plugins can have insecure inputs and insufficient access control. This lack of application control makes them easier to exploit and can result in consequences such as remote code execution.
LLM08: Excessive Agency
LLM-based systems may undertake actions leading to unintended consequences. The issue arises from excessive functionality, permissions, or autonomy granted to the LLM-based systems.
LLM09: Overreliance
Systems or people overly depending on LLMs without oversight may face misinformation, miscommunication, legal issues, and security vulnerabilities due to incorrect or inappropriate content generated by LLMs.
LLM10: Model Theft
This involves unauthorized access, copying, or exfiltration of proprietary LLM models. The impact includes economic losses, compromised competitive advantage, and potential access to sensitive information.
Adding titles and footnotes to Excel files created using SAS
14th August 2023Using the Excel and ExcelXP destinations in the Output Delivery System (ODS), SAS can generate reports as XLSX workbooks with one or more worksheets. Recently, I was updating a SAS Macro that created one of these and noticed that there were no footnotes. The fix was a simple: add to the options specified on the initial ODS Excel statement.
ods excel file="&outdir./&file_name..xlsx" options(embedded_titles="yes" embedded_footnotes="yes");
Notice in the code above that there are EMBEDDED_TITLES and EMBEDDED_FOOTNOTES options. Without both of these being set to YES, no titles or footnotes will appear in a given worksheet, even if they have been specified in a program using TITLE or FOOTNOTE statements. In my case, it was the EMBEDDED_FOOTNOTES option that was missing, so adding that set things to rights.
The thing applies to the ExcelXP tag set, as you will find from a code sample that SAS has shared on their website. That was what led me to the solution to what was happening in the Excel ODS destination in my case.
Using associative arrays in scripting for BASH version 4 and above
29th April 2023Associated arrays get called different names in different computing languages: dictionaries, hash tables and so on. What is held in common is that they essentially are lists of key value pairs. In the case of BASH, you need at least version 4 to make use of this facility. In Linux Mint, I get 5.1.16, but macOS users apparently are still on BASH 3, so this post may not help them.
To declare an associative array in a later version of BASH, the following command gets issued:
declare -A hashtable
The code to add a key value pair then takes the following form:
hashtable[key1]=value1
Several values can be added to an empty array like this:
hashtable=( ["key1"]="value1" ["key2"]="value2" )
Declaration and instantiation of an associative can be done in the same line as follows:
declare -A hashtable=( ["key1"]="value1 ["key2"]="value2")
Handily, it is possible to loop through the entries in an associative array. It is possible to do this for keys and for values, once you expand out the appropriate list. The following expands a list of values:
"${hashtable[@]}"
Expanding a list of values needs something like this:
"${!hashtable[@]}"
Looping through a list of values needs something like the following:
for val in "${hashtable[@]}"; do echo "$val"; done;
The above has been placed on a single line with semicolon delimiters for brevity, but this can be put on several lines with no semicolons for added clarity as long as correct indentation is followed. It is also possible to similarly loop through a list of keys:
for key in "${!hashtable[@]}"; do echo "key: $key, value ${hashtable[$key]}"; done;
For the example associative array declared earlier, the last line produces this output, resolving the value using the supplied key:
key: key2, value value2
key: key1, value value1
All of this found a use in a script that I created for adding new Markdown files to a Hugo instance because there was more than one shortcode that I wished to apply due to my having more than one content directory in use.
String replacement in BASH scripting
28th April 2023During creation of new posts for a Hugo deployed website, I found myself using the same directories again and again. Since I invariably ended up making typing mistakes when I did so, I fancied the idea of using shortcodes instead.
Because I wanted to turn the shortcode into the actual directory name, I chose the use of text replacement in BASH scripting. Thankfully, this is simple and avoids the use of regular expressions, which can bring their own problems. The essential syntax is as follows:
variable="${variable/search text/replacement}"
For the variable, the search text is substituted with the replacement straightforwardly. It is even possible to include the search and replacement text in variables. In the example below, this is achieved using variables called original and replacement.
variable="${variable/$original/$replacement}"
Doing this got me my translatable shortcodes and converted them into actual directory names for the hugo command to process. There may be other uses yet.
Dealing with Error 1064 in MySQL queries
27th April 2023Recently, I was querying a MySQL database table and got a response like the following:
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use
The cause was that one of the data column columns was named using the MySQL reserved word key. While best practice is not to use reserved words like this at all, this was not something that I could alter. Therefore, I enclosed the offending keyword in backticks (`) to make the query work.
There is more information in the MySQL documentation on Schema Object Names and on Keywords and Reserved Words for dealing with or avoiding this kind of situation. While I realise that things change over time and that every implementation of SQL is a little different, it does look as if not using established keywords should be a minimum expectation when any database tables get created.
Another way to supply the terminal output of one BASH command to another
26th April 2023My usual way for sending the output of one command to another is to be place one command after another, separated by the pipe (|) operator, adjusting the second command as needed. However, I recently found that this approach does not work well for docker pull commands until I uncovered another option.
The solution is to enclose the input command in $( ) within the output command. Within the parentheses, any kind of command can be declared and includes anything with piping as part of it. As long as text is being printed to the terminal, it can be fed to the second command and used as required. Thus, you can have something like the following:
docker pull $([command outputting name of image to download])
This approach has helped with other kinds of automation of docker image and container use and deployment because it is so general. There may be other uses found for the approach yet.
Changing the number of lines produced by the tail command in a Linux or UNIX session
25th April 2023Since I often use the tail command to look at the end of a log file and occasionally in combination with the watch command for constant updates, I got to wondering if the number of lines issued by the tail command could be changed. That is a simple thing to do with the -n switch. All you need is something like the following:
tail -n 20 logfile.log
Here the value of 20 is the number of lines produced when it would be 10 by default, and logfile.log gets replaced by the path and name of what you are examining. One thing to watch is that your terminal emulator can show all the lines being displayed. If you find that you are not seeing all the lines that you expect, then that might be the cause.
While you could find this by looking through the documentation, things do not always register with you during dry reading of something laden with lists of parameters or switches. That is an affliction with tools that do a lot and/or allow a lot of customisation.