Technology Tales

Notes drawn from experiences in consumer and enterprise technology

Preventing authentication credentials from entering Git repositories

6th February 2026

Keeping credentials out of version control is a fundamental security practice that prevents numerous problems before they occur. Once secrets enter Git history, remediation becomes complex and cannot guarantee complete removal. Understanding how to structure projects, configure tooling, and establish workflows that prevent credential commits is essential for maintaining secure development practices.

Understanding What Needs Protection

Credentials come in many forms across different technology stacks. Database passwords, API keys, authentication tokens, encryption keys, and service account credentials all represent sensitive data that should never be committed. Configuration files containing these secrets vary by platform but share common characteristics: they hold information that would allow unauthorised access to systems, data, or services.

# This should never be in Git
database:
  password: secretpassword123
api:
  key: sk_live_abc123def456
# Neither should this
DB_PASSWORD=secretpassword123
API_KEY=sk_live_abc123def456
JWT_SECRET=mysecretkey

Even hashed passwords require careful consideration. Whilst bcrypt hashes with appropriate cost factors (10 or higher, taking tens of milliseconds per computation on typical hardware) provide protection against immediate exploitation, they still represent sensitive data. The hash format includes a version identifier, cost factor, salt, and hash components that could be targeted by offline attacks if exposed. Plaintext credentials offer no protection whatsoever and represent immediate critical exposure.
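To make the hash anatomy concrete, the following sketch splits a bcrypt string into the components mentioned above. The hash value itself is a synthetic placeholder with the right lengths, not a real credential.

```python
def parse_bcrypt(hash_string):
    """Split a bcrypt hash into its documented components.

    Format: $<version>$<cost>$<22-char salt><31-char hash>
    """
    _, version, cost, salt_and_hash = hash_string.split("$")
    return {
        "version": version,          # algorithm revision, e.g. 2b
        "cost": int(cost),           # work factor: 2**cost iterations
        "salt": salt_and_hash[:22],  # 128-bit salt in bcrypt base64
        "hash": salt_and_hash[22:],  # the digest itself
    }

# Synthetic placeholder, not a real hash: 22-char salt + 31-char digest.
example = "$2b$12$" + "abcdefghijklmnopqrstuv" + "ABCDEFGHIJKLMNOPQRSTUVWXYZabcde"
parts = parse_bcrypt(example)
print(parts["version"], parts["cost"], len(parts["salt"]), len(parts["hash"]))
```

Even with all components visible like this, the cost factor is what makes offline attacks expensive; it is the plaintext equivalents that offer no defence at all.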

Establishing Git Ignore Rules from the Start

The foundation of credential protection is proper .gitignore configuration established before any sensitive files are created. This proactive approach prevents problems rather than requiring remediation after discovery. Begin every project by identifying which files will contain secrets and excluding them immediately.

# Credentials and secrets
.env
.env.*
!.env.example
config/secrets.yml
config/database.yml
config/credentials/

# Application-specific sensitive files
wp-config.php
settings.php
configuration.php
appsettings.Production.json
application-production.properties

# User data and session storage
/storage/credentials/
/var/sessions/
/data/users/

# Keys and certificates
*.key
*.pem
*.p12
*.pfx
!public.key

# Cache and logs that might leak data
/cache/
/logs/
/tmp/
*.log

The negation pattern !.env.example demonstrates an important technique: excluding all .env files whilst explicitly including example files that show structure without containing secrets. This pattern ensures that developers understand what configuration is needed without exposing actual credentials.

Notice the broad exclusions for entire categories rather than specific files. Excluding *.key prevents any private key files from being committed, whilst !public.key allows the explicit inclusion of public keys that are safe to share. This defence-in-depth approach catches variations and edge cases that specific file exclusions might miss.

Separating Examples from Actual Configuration

Version control should contain example configurations that demonstrate structure without exposing secrets. Create .example or .sample files that show developers what configuration is required, whilst keeping actual credentials out of Git entirely.

# config/secrets.example.yml
database:
  host: localhost
  username: app_user
  password: REPLACE_WITH_DATABASE_PASSWORD

api:
  endpoint: https://api.example.com
  key: REPLACE_WITH_API_KEY
  secret: REPLACE_WITH_API_SECRET

encryption:
  key: REPLACE_WITH_32_BYTE_ENCRYPTION_KEY

Documentation should explain where developers obtain the actual values. For local development, this might involve running setup scripts that generate credentials. For production, it involves deployment processes that inject secrets from secure storage. The example file serves as a template and checklist, ensuring nothing is forgotten whilst preventing accidental commits of real secrets.
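As an illustration of the example-file-as-checklist idea, a small start-up or CI check can fail fast if any placeholders survive into the live configuration. This is a hypothetical sketch assuming the REPLACE_WITH_ convention shown above and a configuration already parsed into nested dictionaries.

```python
def find_unfilled_placeholders(config, prefix="REPLACE_WITH_", path=""):
    """Recursively collect config keys whose values still hold template
    placeholders, so deployment can abort instead of running with
    example values."""
    unfilled = []
    for key, value in config.items():
        location = f"{path}.{key}" if path else key
        if isinstance(value, dict):
            unfilled.extend(find_unfilled_placeholders(value, prefix, location))
        elif isinstance(value, str) and value.startswith(prefix):
            unfilled.append(location)
    return unfilled

config = {"database": {"host": "localhost",
                       "password": "REPLACE_WITH_DATABASE_PASSWORD"}}
print(find_unfilled_placeholders(config))  # ['database.password']
```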

Using Environment Variables

Environment variables provide a standard mechanism for separating configuration from code. Applications read credentials from the environment rather than from files tracked in Git. This pattern works across virtually all platforms and languages.

// PHP: instead of hardcoding
$db_password = 'secretpassword123';

// PHP: read from environment
$db_password = getenv('DB_PASSWORD');

// JavaScript: instead of requiring a config file with secrets
const apiKey = 'sk_live_abc123def456';

// JavaScript: read from environment
const apiKey = process.env.API_KEY;

Environment files (.env) provide convenience for local development, but must be excluded from Git. The pattern of .env for actual secrets and .env.example for structure becomes standard across many frameworks. Developers copy the example to create their local configuration, filling in actual values that never leave their machine.
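What a dotenv loader does is simple enough to sketch. The parser below is a minimal, hypothetical version (real libraries such as python-dotenv handle more edge cases, like export prefixes and multiline values), and it deliberately refuses to overwrite variables the real environment has already set.

```python
import os

def load_env_lines(lines, environ=os.environ):
    """Minimal .env parsing: KEY=VALUE per line, '#' comments and blank
    lines skipped, surrounding quotes stripped. Never overwrites values
    already present in the environment, matching common dotenv behaviour."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        environ.setdefault(key.strip(), value.strip().strip("'\""))

def load_env_file(path, environ=os.environ):
    with open(path) as fh:
        load_env_lines(fh, environ)

env = {}
load_env_lines(["# comment", "DB_PASSWORD=devsecret", "API_KEY='abc123'"], env)
print(env["API_KEY"])  # abc123
```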

Implementing Pre-Commit Hooks

Pre-commit hooks provide automated checking before changes enter the repository. These hooks scan staged files for patterns that match secrets and block commits when suspicious content is detected. This automated enforcement prevents mistakes that manual review might miss.

The pre-commit framework manages hooks across multiple repositories and languages. Installation is straightforward, and configuration defines which checks run before each commit.

pip install pre-commit

Create a configuration file defining which hooks to run:

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: check-added-large-files
      - id: check-json
      - id: check-yaml
      - id: detect-private-key
      - id: end-of-file-fixer
      - id: trailing-whitespace

  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.4.0
    hooks:
      - id: detect-secrets
        args: ['--baseline', '.secrets.baseline']

Install the hooks in your repository:

pre-commit install

Now every commit triggers these checks automatically. The detect-private-key hook catches SSH keys and other private key formats. The detect-secrets hook uses entropy analysis and pattern matching to identify potential credentials. When suspicious content is detected, the commit is blocked and the developer is alerted to review the flagged content.
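The entropy analysis mentioned above can be illustrated in a few lines: a randomly generated key spreads its characters far more evenly than an ordinary identifier, so its Shannon entropy per character is higher, and scanners flag strings above a threshold. The strings below are illustrative, not real keys.

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Bits of entropy per character. Random keys score high, ordinary
    words score low -- the signal entropy-based scanners rely on."""
    counts = Counter(text)
    length = len(text)
    return -sum((n / length) * math.log2(n / length) for n in counts.values())

print(round(shannon_entropy("sk_live_abc123def456"), 2))  # higher
print(round(shannon_entropy("hello_world_config"), 2))    # lower
```

Entropy alone produces false positives (UUIDs, hashes in lock files), which is why detect-secrets combines it with pattern matching and a reviewed baseline file.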

Configuring Git-Secrets

The git-secrets tool from AWS specifically targets secret detection. It scans commits, commit messages, and merges to prevent credentials from entering repositories. Installation and configuration establish patterns that identify secrets.

# Install via Homebrew
brew install git-secrets

# Install hooks in a repository
cd /path/to/repo
git secrets --install

# Register AWS patterns
git secrets --register-aws

# Add custom patterns
git secrets --add 'password\s*=\s*["\047][^\s]+'
git secrets --add 'api[_-]?key\s*=\s*["\047][^\s]+'

The tool maintains a list of prohibited patterns and scans all content before allowing commits. Custom patterns can be added to match organisation-specific secret formats. The --register-aws command adds patterns for AWS access keys and secret keys, whilst custom patterns catch application-specific credential formats.
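Custom patterns are worth exercising locally before registering them. This sketch uses Python's re module with equivalent expressions (the \047 octal escape written out as a literal quote) against sample lines; it mirrors, rather than reproduces, what git-secrets does to staged content.

```python
import re

# Equivalents of the git-secrets patterns registered above.
patterns = [
    re.compile(r'password\s*=\s*["\'][^\s]+'),
    re.compile(r'api[_-]?key\s*=\s*["\'][^\s]+'),
]

def flags_line(line):
    """Return True if any prohibited pattern matches the line."""
    return any(p.search(line) for p in patterns)

print(flags_line("password = 'secretpassword123'"))      # True
print(flags_line('api_key = "sk_live_abc123def456"'))    # True
print(flags_line("username = 'app_user'"))               # False
```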

For teams, establishing git-secrets across all repositories ensures consistent protection. Template directories provide a mechanism for automatic installation in new repositories:

# Create a template with git-secrets installed
git secrets --install ~/.git-templates/git-secrets

# Use the template for all new repositories
git config --global init.templateDir ~/.git-templates/git-secrets

Now, every git init automatically includes secret scanning hooks.

Enabling GitHub Secret Scanning

GitHub Secret Scanning provides server-side protection that cannot be bypassed by local configuration. GitHub automatically scans repositories for known secret patterns and alerts repository administrators when matches are detected. This works for both new commits and historical content.

For public repositories, secret scanning is enabled by default. For private repositories, it requires GitHub Advanced Security. Enable it through repository settings under Code security and analysis. GitHub maintains partnerships with service providers to detect their specific secret formats, and when a partner pattern is found, both you and the service provider are notified.

The scanning covers not just code but also issues, pull requests, discussions, and wiki content. This comprehensive approach catches secrets that might be accidentally pasted into comments or documentation. The detection happens continuously, so even old content gets scanned when new patterns are added.

Custom patterns extend detection to organisation-specific secret formats. Define regular expressions that match your internal API key formats, authentication tokens, or other proprietary credentials. These custom patterns apply across all repositories in your organisation, providing consistent protection.

Structuring Projects for Credential Isolation

Project structure itself can prevent credentials from accidentally entering Git. Establish clear separation between code that belongs in version control and configuration that remains environment-specific. Create dedicated directories for credentials and ensure they are excluded from tracking.

project/
├── src/                    # Code - belongs in Git
├── tests/                  # Tests - belongs in Git
├── config/
│   ├── app.yml            # General config - belongs in Git
│   ├── secrets.example.yml # Example - belongs in Git
│   └── secrets.yml        # Actual secrets - excluded from Git
├── credentials/           # Entire directory excluded
│   ├── database.yml
│   └── api-keys.json
├── .env.example           # Example - belongs in Git
├── .env                   # Actual secrets - excluded from Git
└── .gitignore             # Defines exclusions

This structure makes it obvious which files contain secrets. The credentials/ directory is clearly separated from source code, and its exclusion from Git is explicit. Developers can see at a glance that this directory requires different handling.

Documentation should explain the structure and the reasoning behind it. New team members need to understand why certain directories are empty in their fresh clones and where to obtain the configuration files that populate them. Clear documentation prevents confusion and ensures everyone follows the same patterns.

Managing Development Credentials

Development environments require credentials but should never use production secrets. Generate separate development credentials that provide access to development resources only. These credentials can be less stringently protected whilst still not being committed to Git.

Development credential management varies by organisation size and infrastructure. For small teams, shared development credentials stored in a team password manager might suffice. For larger organisations, each developer receives individual credentials for development resources, with access controlled through identity management systems.

Some teams commit development credentials intentionally, arguing that development databases contain no sensitive data and convenience outweighs risk. This approach is controversial and depends on your security model. If development credentials can access any production resources or if development data has any sensitivity, they must be protected. Even purely synthetic development data might reveal business logic or system architecture worth protecting.

The safer approach maintains the same credential handling patterns across all environments. This ensures that developers build habits that prevent production credential exposure. When development and production follow identical patterns, muscle memory built during development prevents mistakes in production.

Provisioning Production Credentials

Production credentials should never touch developer machines or version control. Deployment processes inject credentials at runtime through environment variables, secret management services, or deployment-time configuration.

Continuous deployment pipelines read credentials from secret stores and make them available to applications without exposing them to humans. GitHub Actions, GitLab CI, Jenkins, and other CI/CD systems provide secure variable storage that is injected during builds and deployments.

# .github/workflows/deploy.yml
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Deploy to production
        env:
          DB_PASSWORD: ${{ secrets.DB_PASSWORD }}
          API_KEY: ${{ secrets.API_KEY }}
        run: |
          ./deploy.sh

The secrets.DB_PASSWORD syntax references encrypted secrets stored in GitHub's secure storage. These values are never exposed in logs or visible to anyone except during the deployment process. The deployment script receives them as environment variables and can configure the application appropriately.
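On the receiving end, a deployment script should fail fast when an expected secret was not injected, rather than proceeding with an empty value. A hypothetical sketch, assuming the two secret names from the workflow above:

```python
import os

# Names assumed from the workflow example above.
REQUIRED_SECRETS = ["DB_PASSWORD", "API_KEY"]

def read_secrets(environ=os.environ):
    """Collect required secrets from the environment, raising loudly if
    any were not injected by the pipeline."""
    missing = [name for name in REQUIRED_SECRETS if not environ.get(name)]
    if missing:
        raise RuntimeError(f"missing secrets: {', '.join(missing)}")
    return {name: environ[name] for name in REQUIRED_SECRETS}
```

Failing at deploy time with a named list of missing variables is far easier to debug than an application that starts and then rejects every database connection.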

Secret management services like HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, or Google Cloud Secret Manager provide centralised credential storage with access controls, audit logging, and automatic rotation. Applications authenticate to these services and retrieve credentials at runtime, ensuring that secrets are never stored on disk or in environment files.

Rotating Credentials Regularly

Regular credential rotation limits exposure duration if secrets are compromised. Establish rotation schedules based on credential sensitivity and access patterns. Database passwords might rotate quarterly, API keys monthly, and authentication tokens weekly or daily. Automated rotation reduces operational burden and ensures consistency.

Rotation requires coordination between secret generation, distribution, and application updates. Secret management services can automate much of this process, generating new credentials, updating secure storage, and triggering application reloads. Manual rotation involves generating new credentials, updating all systems that use them, and verifying functionality before disabling old credentials.

The rotation schedule balances security against operational complexity. More frequent rotation provides better security but increases the risk of service disruption if processes fail. Less frequent rotation simplifies operations but extends exposure windows. Find the balance that matches your risk tolerance and operational capabilities.

Training and Culture

Technical controls provide necessary guardrails, but security ultimately depends on people understanding why credentials matter and how to protect them. Training should cover the business impact of credential exposure, the techniques for keeping secrets out of Git, and the procedures for responding if mistakes occur.

New developer onboarding should include credential management as a core topic. Before developers commit their first code, they should understand what constitutes a secret, why it must stay out of Git, and how to configure their local environment properly. This prevents problems rather than correcting them after they occur.

Regular security reminders reinforce good practices. When new secret types are introduced or new tools are adopted, communicate the changes and update documentation. Security reviews should check credential handling practices, not just looking for exposed secrets, but also verifying that proper patterns are followed.

Creating a culture where admitting mistakes is safe encourages early reporting. If a developer accidentally commits a credential, they should feel comfortable immediately alerting the security team, rather than hoping no one notices. Early detection enables faster response and reduces damage.

Responding to Detection

Despite best efforts, secrets sometimes enter repositories. Rapid response limits damage. Immediate credential rotation assumes compromise and prevents exploitation. Removing the file from future commits whilst leaving it in history provides no security benefit, as the exposure has already occurred.

Tools like BFG Repo-Cleaner can remove secrets from Git history, but this is complex and cannot guarantee complete removal. Anyone who cloned the repository before clean-up retains the compromised credentials in their local copy. Forks, clones on other systems, and backup copies may all contain the secrets.

The most reliable response is assuming the credential is compromised and rotating it immediately. History clean-up can follow as a secondary measure to reduce ongoing exposure, but it should never be the primary response. Treat any secret that entered Git as if it were publicly posted because once in Git history, it effectively was.

Continuous Improvement

Credential management practices should evolve with your infrastructure and team. Regular reviews identify gaps and opportunities for improvement. When new credential types are introduced, update .gitignore patterns, secret scanning rules, and documentation. When new developers join, gather feedback on clarity and completeness of onboarding materials.

Metrics help track effectiveness. Monitor secret scanning alerts, track rotation compliance, and measure time-to-rotation when credentials are exposed. These metrics identify areas needing improvement and demonstrate progress over time.

Summary

Preventing credentials from entering Git repositories requires multiple complementary approaches. Establish comprehensive .gitignore configurations before creating any credential files. Separate example configurations from actual secrets, keeping only examples in version control. Use environment variables to inject credentials at runtime rather than storing them in configuration files. Implement pre-commit hooks and server-side scanning to catch mistakes before they enter history. Structure projects to clearly separate code from credentials, making it obvious what belongs in Git and what does not.

Train developers on credential management and create a culture where security is everyone's responsibility. Provision production credentials through deployment processes and secret management services, ensuring they never touch developer machines or version control. Rotate credentials regularly to limit exposure windows. Respond rapidly when secrets are detected, assuming compromise and rotating immediately.

Security is not a one-time configuration but an ongoing practice. Regular reviews, continuous improvement, and adaptation to new threats and technologies keep credential management effective. The investment in prevention is far less than the cost of responding to exposed credentials, making it essential to get right from the beginning.

Related Resources

When Operations and Machine Learning meet

5th February 2026

Here's a scenario you'll recognise: your SRE team drowns in 1,000 alerts daily. 95% are false positives. Meanwhile, your data scientists built five ML models last quarter, and none have reached production. These two problems are colliding, and each is helping to solve the other. Machine learning is moving out of research labs and into the operations that keep your systems running. At the same time, DevOps practices are being adapted to get ML models into production reliably. Since this convergence has created three new disciplines (AIOps, MLOps and LLM observability), here is what you need to know.

Why Traditional Operations Can't Keep Up

Modern systems generate unprecedented volumes of operational data. Logs, metrics, traces, events and user interaction signals create a continuous stream that's too large and too fast for manual analysis.

Your monitoring system might send thousands of alerts per day, but most are noise. A CPU spike in one microservice cascades into downstream latency warnings, database connection errors and end-user timeouts, generating dozens or hundreds of alerts from a single root cause. Without intelligent correlation, engineers waste hours manually connecting the dots.

Meanwhile, machine learning models that could solve real business problems sit in notebooks, never making it to production. The gap between data science and operations is costly. Data scientists lack the infrastructure to deploy models reliably. Operations teams lack the tooling to monitor models that do make it live.

The complexity of cloud-native architectures, microservices and distributed systems has outpaced traditional approaches. Manual processes that worked for simpler systems simply cannot scale.

Three Emerging Practices Changing the Game

Three distinct but related practices have emerged to address these challenges. Each solves a specific problem whilst contributing to a broader transformation in how organisations build and run digital services.

AIOps: Intelligence for Your Operations

AIOps (Artificial Intelligence for IT Operations) applies machine learning to the work of IT operations. The term was coined by Gartner. AIOps platforms collect data from across your environment, analyse it in real time and surface patterns, anomalies or likely incidents.

The key capability is event correlation. Instead of presenting 1,000 raw alerts, AIOps systems analyse metadata, timing, topological dependencies and historical patterns to collapse related events into a single coherent incident. What was 1,000 alerts becomes one actionable event with a causal chain attached.
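The collapse from raw alerts to a single incident can be sketched with a toy correlator that groups alerts sharing a suspected root service within a time window. Production AIOps platforms also weigh topology and learned history; this shows only the grouping idea, with made-up alert records.

```python
from collections import defaultdict

def correlate(alerts, window_seconds=300):
    """Toy event correlation: bucket alerts that blame the same upstream
    service and arrive within the same time window."""
    incidents = defaultdict(list)
    for alert in sorted(alerts, key=lambda a: a["time"]):
        key = (alert["root_service"], alert["time"] // window_seconds)
        incidents[key].append(alert)
    return list(incidents.values())

alerts = [
    {"time": 10, "root_service": "db", "msg": "connection pool exhausted"},
    {"time": 15, "root_service": "db", "msg": "checkout latency high"},
    {"time": 20, "root_service": "db", "msg": "API timeouts"},
]
print(len(alerts), "alerts ->", len(correlate(alerts)), "incident(s)")
```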

Beyond detection, AIOps platforms can trigger automated responses to common problems, reducing time to remediation. Because they learn from historical data, they can offer predictive insights that shift operations away from constant firefighting.

Teams implementing AIOps report measurable improvements: 60-80% reduction in alert volume, 50-70% faster incident response and significant reductions in operational toil. The technology is maturing rapidly, with Gartner predicting that 60% of large enterprises will have adopted AIOps platforms by 2026.

MLOps: Getting Models into Production

Whilst AIOps uses ML to improve operations, MLOps (Machine Learning Operations) is about operationalising machine learning itself. Building a model is only a small part of making it useful. Models change, data changes, and performance degrades over time if the system isn't maintained.

MLOps is an engineering culture and practice that unifies ML development and ML operations. It extends DevOps by treating machine learning models and data assets as first-class citizens within the delivery lifecycle.

In practice, this means continuous integration and continuous delivery for machine learning. Changes to models and pipelines are tested and deployed in a controlled way. Model versioning tracks not just the model artefact, but also the datasets and hyperparameters that produced it. Monitoring in production watches for performance drift and decides when to retrain or roll back.
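The versioning idea can be sketched as a content fingerprint: hash everything that produced the model, so that any change to code, data or hyperparameters yields a new identifier. The revision strings and digest below are invented for illustration.

```python
import hashlib
import json

def model_version(code_rev, data_digest, hyperparams):
    """Derive a reproducible version id from everything that produced a
    model. Identical inputs always give the same id; change any input
    and the id changes."""
    payload = json.dumps(
        {"code": code_rev, "data": data_digest, "params": hyperparams},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()[:12]

# Hypothetical inputs: a git revision, a training-data digest, hyperparameters.
v1 = model_version("abc123", "sha256:0f3a", {"lr": 0.01, "epochs": 20})
v2 = model_version("abc123", "sha256:0f3a", {"lr": 0.02, "epochs": 20})
print(v1 != v2)  # True: changed hyperparameters -> new version id
```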

The MLOps market was valued at $2.2 billion in 2024 and is projected to reach $16.6 billion by 2030, reflecting rapid adoption across industries. Organisations that successfully implement MLOps report that up to 88% of ML initiatives that previously failed to reach production are now being deployed successfully.

A typical MLOps implementation looks like this: data scientists work in their preferred tools, but when they're ready to deploy, the model goes through automated testing, gets versioned alongside its training data and deploys with built-in monitoring for performance drift. If the model degrades, it can automatically retrain or roll back.

The SRE Automation Opportunity

Site Reliability Engineering, originally created at Google, applies software engineering principles to operations problems. It encompasses availability, latency, performance, efficiency, change management, monitoring, emergency response and capacity planning. Rather than AIOps replacing SRE or vice versa, the likely outcome is convergence: analytics, automation and reliability engineering become mutually reinforcing, with organisations adopting integrated approaches that combine intelligent monitoring, automated operations and proactive reliability practices.

What This Looks Like in the Real World

The difference between traditional operations and ML-powered operations shows up in everyday scenarios.

Before: An application starts responding slowly. Monitoring systems fire hundreds of alerts across different tools. An engineer spends two hours correlating logs, metrics and traces to identify that a database connection pool is exhausted. They manually scale the service, update documentation and hope to remember the fix next time.

After: The same slowdown triggers anomaly detection. The AIOps platform correlates signals across the stack, identifies the connection pool issue and surfaces it as a single incident with context. Either an automated remediation kicks in (scaling the pool based on learned patterns) or the engineer receives a notification with diagnosis complete and remediation steps suggested. Resolution time drops from hours to minutes.

Before: A data science team builds a pricing optimisation model. After three months of development, they hand a trained model to engineering. Engineering spends another month building deployment infrastructure, writing monitoring code and figuring out how to version the model. By the time it reaches production, the model is stale and performs poorly.

After: The same team works within an MLOps platform. Development happens in standard environments with experiment tracking. When ready, the data scientist triggers deployment through a single interface. The platform handles testing, versioning, deployment and monitoring. The model reaches production in days instead of months, and automatic retraining keeps it current.

These patterns extend across industries. Financial services firms use MLOps for fraud detection models that need continuous updating. E-commerce platforms use AIOps to manage complex microservices architectures. Healthcare organisations use both to ensure critical systems remain available whilst deploying diagnostic models safely.

The Tech Behind the Transformation (Optional Deep Dive)

If you want to understand why this convergence is happening now, it helps to know about transformers and vector embeddings. If you're more interested in implementation, skip to the next section.

The breakthrough that enabled modern AI came in 2017 with a paper titled "Attention Is All You Need". Ashish Vaswani and colleagues at Google introduced the transformer architecture, a neural network design that processes sequential data (like sentences) by computing relationships across the entire sequence at once, rather than step by step.

The key innovation is self-attention. Earlier models struggled with long sequences because they processed data sequentially and lost context. Self-attention allows a model to examine all parts of an input simultaneously, computing relationships between each token and every other token. This parallel processing is a major reason transformers scale well and perform strongly on large datasets.
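Scaled dot-product attention is compact enough to sketch in plain Python. This toy version operates on tiny two-dimensional vectors purely to show the mechanism: every query scores every key in one pass, so each output blends information from the whole sequence.

```python
import math

def softmax(scores):
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention on plain lists. Each query is scored
    against every key at once, and the output is a weighted mix of all
    the values -- the parallelism described above."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# Three-token toy sequence attending over itself (self-attention).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = attention(x, x, x)
print(len(out), len(out[0]))  # 3 2
```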

Transformers underpin models like GPT and BERT. They enable applications from chatbots to content generation, code assistance to semantic search. For operations teams, transformer-based models power the natural language interfaces that let engineers query complex systems in plain English and the embedding models that enable semantic search across logs and documentation.

Vector embeddings represent concepts as dense vectors in high-dimensional space. Similar concepts have embeddings that are close together, whilst unrelated concepts are far apart. This lets models quantify meaning in a way that supports both understanding and generation.

In operations contexts, embeddings enable semantic search. Instead of searching logs for exact keyword matches, you can search for concepts. Query "authentication failures" and retrieve related events like "login rejected", "invalid credentials" or "session timeout", even if they don't contain your exact search terms.
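The ranking behind such a search reduces to cosine similarity between embedding vectors. The three-dimensional vectors below are invented stand-ins (real embedding models emit hundreds of dimensions), but the ordering logic is the same.

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 for identical directions, near 0 for
    unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical embeddings of log events.
embeddings = {
    "login rejected":      [0.9, 0.1, 0.0],
    "invalid credentials": [0.8, 0.2, 0.1],
    "disk space low":      [0.0, 0.1, 0.9],
}
# Stand-in for the embedding of the query "authentication failures".
query = [0.85, 0.15, 0.05]

ranked = sorted(embeddings, key=lambda k: cosine(query, embeddings[k]),
                reverse=True)
print(ranked[-1])  # the unrelated event ranks last: disk space low
```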

Retrieval-Augmented Generation (RAG) combines these capabilities to make AI systems more accurate and current. A RAG system pairs a language model with a retrieval mechanism that fetches external information at query time. The model generates responses using both its internal knowledge and retrieved context.

This approach is particularly valuable for operations. A RAG-powered assistant can pull current runbook procedures, recent incident reports and configuration documentation to answer questions like "how do we handle database failover in the production environment?" with accurate, up-to-date information.

The technical stack supporting RAG implementations typically includes vector databases for similarity search. As of 2025, commonly deployed options include Pinecone, Milvus, Chroma, Faiss, Qdrant, Weaviate and several others, a fast-moving landscape in which vector search is becoming standard infrastructure for AI implementations.

Where to Begin

Starting with ML-powered operations doesn't require a complete transformation. Begin with targeted improvements that address your most pressing problems.

If you're struggling with alert fatigue...

Start with event correlation. Many AIOps platforms offer this as an entry point without requiring full platform adoption. Look for solutions that integrate with your existing monitoring tools and can demonstrate noise reduction in a proof of concept.

Focus on one high-volume service or team first. Success here provides both immediate relief and a template for broader rollout. Track metrics like alerts per day, time to acknowledge and time to resolution to demonstrate impact.

Tools worth considering include established platforms like Datadog, Dynatrace and ServiceNow, alongside newer entrants like PagerDuty AIOps and specialised incident response platforms like incident.io.

If you have ML models stuck in development...

Begin with MLOps fundamentals before investing in comprehensive platforms. Focus on model versioning first (track which code, data and hyperparameters produced each model). This single practice dramatically improves reproducibility and makes collaboration easier.

Next, automate deployment for one model. Choose a model that's already proven valuable but requires manual intervention to update. Build a pipeline that handles testing, deployment and basic monitoring. Use this as a template for other models.

Popular MLOps platforms include MLflow (open source), cloud provider offerings like AWS SageMaker, Google Vertex AI and Azure Machine Learning, and specialised platforms like Databricks and Weights & Biases.

If you're building with LLMs...

Implement observability from day one. LLM applications are different from traditional software. They're probabilistic, can be expensive to run, and their behaviour varies with prompts and context. You need to monitor performance (response times, throughput), quality (output consistency, appropriateness), bias, cost (token usage) and explainability.

Common pitfalls include underestimating costs, failing to implement proper prompt versioning, neglecting to monitor for model drift and not planning for the debugging challenges that come with non-deterministic systems.

The LLM observability space is evolving rapidly, with platforms like LangSmith, Arize AI, Honeycomb and others offering specialised tooling for monitoring generative AI applications in production.

Why This Matters Beyond the Tech

The convergence of ML and operations isn't just a technical shift. It requires cultural change, new skills and rethinking of traditional roles.

Teams need to understand not only deployment automation and infrastructure as code, but also concepts like attention mechanisms, vector embeddings and retrieval systems because these directly influence how AI-enabled services behave in production. They also need operational practices that can handle both deterministic systems and probabilistic ones, whilst maintaining reliability, compliance and cost control.

Data scientists are increasingly expected to understand production concerns like latency budgets, deployment strategies and operational monitoring. Operations engineers are expected to understand model behaviour, data drift and the basics of ML pipelines. The gap between these roles is narrowing.

Security and governance cannot be afterthoughts. As AI becomes embedded in tooling and operations become more automated, organisations need to integrate security testing throughout the development cycle, implement proper access controls and audit trails, and ensure models and automated systems operate within appropriate guardrails.

The organisations succeeding with these practices treat them as both a technical programme and an organisational transformation. They invest in training, establish cross-functional teams, create clear ownership and accountability, and build platforms that reduce cognitive load whilst enabling self-service.

Moving Forward

The convergence of machine learning and operations isn't a future trend; it's happening now. AIOps platforms are reducing alert noise and accelerating incident response. MLOps practices are getting models into production faster and keeping them performing well. The economic case for SRE automation is driving investment and innovation.

The organisations treating this as transformation rather than tooling adoption are seeing results: fewer outages, faster deployments, models that actually deliver value. They're not waiting for perfect solutions. They're starting with focused improvements, learning from what works and scaling gradually.

The question isn't whether to adopt these practices. It's whether you'll shape the change or scramble to catch up. Start with the problem that hurts most (alert fatigue, models stuck in development, reliability concerns) and build from there. The convergence of ML and operations offers practical solutions to real problems. The hard part is committing to the cultural and organisational changes that make the technology work.

A Practical Linux Administration Toolkit: Kernels, Storage, Filesystems, Transfers and Shell Completion

4th February 2026

Linux command-line administration has a way of beginning with a deceptively simple question that opens into several possible answers. Whether the task is checking which kernels are installed before an upgrade, mounting an NFS share for backup access, diagnosing low disk space, throttling a long-running sync job or wiring up tab completion, the right answer depends on context: the distribution, the file system type, the transport protocol and whether the need is a one-off action or a persistent configuration. This guide draws those everyday administrative themes into a single continuous reference.

Identifying Your System and Installed Kernels

Reading Distribution Information

A sensible place to begin any administration session is knowing exactly what you are working with. One quick approach is to read the release files directly:

cat /etc/*-release

On systems where bat is available (sometimes installed as batcat), the same files can be read with syntax highlighting using batcat /etc/*-release. Typical output on Ubuntu includes /etc/lsb-release and /etc/os-release, with values such as DISTRIB_ID=Ubuntu, VERSION_ID="20.04" and PRETTY_NAME="Ubuntu 20.04.6 LTS". Three additional commands, cat /etc/os-release, lsb_release -a and hostnamectl, each present the same underlying facts in slightly different formats, while uname -r reports the currently running kernel release in isolation. Adding more flags with uname -mrs extends the output to include the kernel name and machine hardware class, which on an older RHEL system might return something like Linux 2.6.18-8.1.14.el5 x86_64.
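For scripts that need these facts rather than a human-readable listing, the KEY=VALUE format of /etc/os-release is simple to parse. A minimal Python sketch (the sample string is illustrative, mirroring the Ubuntu values above):

```python
def parse_os_release(text):
    """Parse the KEY=VALUE format used by /etc/os-release."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info

# Illustrative sample mirroring the Ubuntu output described above.
sample = 'NAME="Ubuntu"\nVERSION_ID="20.04"\nPRETTY_NAME="Ubuntu 20.04.6 LTS"\n'
print(parse_os_release(sample)["PRETTY_NAME"])  # Ubuntu 20.04.6 LTS
```

Python 3.10 and later also provide platform.freedesktop_os_release(), which reads and parses the real file directly.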

Querying Installed Kernels by Package Manager

On Red Hat Enterprise Linux, CentOS, Rocky Linux, AlmaLinux, Oracle Linux and Fedora, installed kernels are managed by the RPM package database and are queried with:

rpm -qa kernel

This may return entries such as kernel-5.14.0-70.30.1.el9_0.x86_64. The same information is also accessible through yum list installed kernel or dnf list installed kernel. On Debian, Ubuntu, Linux Mint and Pop!_OS the package manager differs, so the command changes accordingly:

dpkg --list | grep linux-image

Output may include versioned packages, such as linux-image-2.6.20-15-generic, alongside the metapackage linux-image-generic. Arch Linux users can query with pacman -Q | grep linux, while SUSE Enterprise Linux and openSUSE users can turn to rpm -qa | grep -i kernel or use zypper search -i kernel, which presents results in a structured table. Alpine Linux takes yet another approach with apk info -vvv | grep -E 'Linux' | grep -iE 'lts|virt', which may return entries such as linux-virt-5.15.98-r0 - Linux lts kernel.

Finding Kernels Outside the Package Manager

Package databases do not always tell the whole story, particularly where custom-compiled kernels are involved. A kernel built and installed manually will not appear in any package manager query at all. In that case, /lib/modules/ is a useful place to look, since each installed kernel generally has a corresponding module directory. Running ls -l /lib/modules/ may show entries such as 4.15.0-55-generic, 4.18.0-25-generic and 5.0.0-23-generic. A further check is:

sudo find /boot/ -iname "vmlinuz*"

This may return files such as /boot/vmlinuz-5.4.0-65-generic and /boot/vmlinuz-5.4.0-66-generic, confirming precisely which versions exist on disk.

A Brief History of vmlinuz

That naming convention is worth understanding because it appears on virtually every Linux system. vmlinuz is the compressed, bootable Linux kernel image stored in /boot/. The name traces back through computing history: early Unix kernels were simply called /unix, but when the University of California, Berkeley ported Unix to the VAX architecture in 1979 and added paged virtual memory, the resulting system, 3BSD, was known as VMUNIX (Virtual Memory Unix) and its kernel images were named /vmunix. Linux inherited vmlinuz as a mutation of vmunix, with the trailing z denoting gzip compression (though other algorithms such as xz and lzma are also supported). The counterpart vmlinux refers to the uncompressed, non-bootable kernel file, which is used for debugging and symbol table generation but is not loaded directly at boot. Running ls -l /boot/ will show the full set of boot files present on any given system.

Examining and Investigating Disk Usage

Why ls Is Not the Right Tool for Directory Sizes

Storage management is an area where a familiar command can mislead. Running ls -l on a directory typically shows it occupying 4,096 bytes, which reflects the directory entry metadata rather than the combined size of its contents. For real space consumption, du is the appropriate tool.

sudo du -sh /var

The above command produces a summarised, human-readable total such as 85G /var. The -s flag limits output to a single grand total and -h formats values in K, M or G units. For an individual file, du -sh /var/log/syslog might report 12M /var/log/syslog, while ls -lh /var/log/syslog adds ownership and timestamps to the same figure.

Drilling Down to Find Where Space Has Gone

When a file system is full and the need is to locate exactly where the space has accumulated, du can be made progressively more revealing. The command sudo du -h --max-depth=1 /var lists first-level subdirectories with sizes, potentially showing 77G /var/lib, 5.0G /var/cache and 3.3G /var/log. To surface the biggest consumers quickly, piping to sort and head works well:

sudo du -h /var/ | sort -rh | head -10

Adding the -a flag includes individual files alongside directories in the same output:

sudo du -ah /var/ | sort -rh | head -10

Apparent Size Versus Allocated Disk Space

There is a subtle distinction that sometimes causes confusion. By default, du reports allocated disk usage, which is governed by the file system block size. A single-byte file on a file system with 4 KB blocks still consumes 4 KB of disk. To see the amount of data actually stored rather than allocated, sudo du -sh --apparent-size /var reports the apparent size instead. The df command answers a different question altogether: it shows free and used space per mounted file system, such as /dev/sda1 at 73 per cent usage or /dev/sdb1 mounted on /data with 70 GB free. In practice, du is for locating what consumes space and df is for checking how much remains on each volume.
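The same distinction can be demonstrated programmatically. A minimal Python sketch, assuming a file system that supports sparse files (as ext4 and tmpfs do), creates a file whose apparent size far exceeds its allocated space:

```python
import os
import tempfile

# Create a sparse file: set a 10 MiB length without writing any data,
# so the file system allocates few or no blocks for it.
fd, path = tempfile.mkstemp()
os.close(fd)
os.truncate(path, 10 * 1024 * 1024)

st = os.stat(path)
apparent = st.st_size            # what `du --apparent-size` reports
allocated = st.st_blocks * 512   # what plain `du` reports
print(apparent, allocated)

os.unlink(path)
```

st_blocks is defined by POSIX in 512-byte units, independent of the file system's block size, which is why the multiplication by 512 recovers the allocated figure.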

gdu: A Faster Interactive Alternative

Some administrators prefer a more modern tool for storage investigations, and gdu is a notable option. It is a fast disk usage analyser written in Go with an interactive console interface, designed primarily for SSDs where it can exploit parallel processing to full effect, though it functions on hard drives too with less dramatic speed gains. The binary release can be installed by extracting its .tgz archive:

curl -L https://github.com/dundee/gdu/releases/latest/download/gdu_linux_amd64.tgz | tar xz
chmod +x gdu_linux_amd64
sudo mv gdu_linux_amd64 /usr/bin/gdu

It can also be run directly via Docker without installation:

docker run --rm --init --interactive --tty --privileged \
  --volume /:/mnt/root ghcr.io/dundee/gdu /mnt/root

In use, gdu scans a directory interactively when run without flags, summarises a target with gdu -ps /some/dir, shows top results with gdu -t 10 / and runs without interaction using gdu -n /. It supports apparent size display, hidden file inclusion, item counts, modification times, exclusions, age filtering and database-backed analysis through SQLite or BadgerDB. The project documentation notes that hard links are counted only once and that analysis data can be exported as JSON for later review.

Unpacking TGZ Archives

A brief note on the tar command is useful here, since it appears throughout Linux administration, including in the gdu installation step above. A .tgz file is simply a GZIP-compressed tar archive, and the standard way to extract one is:

tar zxvf archive.tgz

Modern GNU tar can detect the compression type automatically, so the -z flag is often optional:

tar xvf archive.tgz

To extract into a specific directory rather than the current working directory, the -C option takes a destination path:

tar zxvf archive.tgz -C /path/to/destination/

To inspect the contents of a .tgz file without extracting it, the t (list) flag replaces x (extract):

tar ztvf archive.tgz

The tar command was first introduced in the seventh edition of Unix in January 1979 and its name comes from its original purpose as a Tape ARchiver. Despite that origin, modern tar reads from and writes to files, pipes and remote devices with equal facility.
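The listing operation is also available programmatically. A minimal sketch using Python's standard tarfile module, building a throwaway archive in memory (the file name and contents are illustrative):

```python
import io
import tarfile

# Build a tiny .tgz in memory, then list its contents without
# extracting anything, mirroring `tar ztvf archive.tgz`.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    data = b"hello\n"
    info = tarfile.TarInfo(name="hello.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:gz") as tar:
    names = tar.getnames()
print(names)  # ['hello.txt']
```

The "w:gz" and "r:gz" mode strings correspond to the gzip compression that the trailing z in .tgz denotes.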

Mounting NFS Shares and Optical Media

Installing NFS Client Tools

NFS remains common on Linux and Unix-like systems, allowing remote directories to be mounted locally and treated as though they were native file systems. Before a client can mount an NFS export, the client packages must be installed. On Ubuntu and Debian, that means:

sudo apt update
sudo apt install nfs-common

On Fedora and RHEL-based distributions, the equivalent is:

sudo dnf install nfs-utils

Once installed, showmount -e 10.10.0.10 can list available exports from a server, returning output such as /backups 10.10.0.0/24 and /data *.

Mounting an NFS Share Manually

Mounting an NFS share follows the same broad pattern as mounting any other file system. First, create a local mount point:

sudo mkdir -p /var/backups

Then mount the remote export, specifying the file system type explicitly:

sudo mount -t nfs 10.10.0.10:/backups /var/backups

A successful command produces no output. Verification is done with mount | grep nfs or df -h, after which the local directory acts as the root of the remote file system for all practical purposes.

Persisting NFS Mounts Across Reboots

Since a manual mount does not survive a reboot, persistent setups use /etc/fstab. An appropriate entry looks like:

10.10.0.10:/backups /var/backups nfs defaults,nofail,_netdev 0 0

The nofail option prevents a boot failure if the NFS server is unavailable when the machine starts. The _netdev flag marks the mount as network-dependent, ensuring the system defers the operation until the network stack is available. Running sudo mount -a tests the entry without rebooting.

Troubleshooting Common NFS Errors

NFS problems are often predictable. A "Permission denied" error usually means the server export in /etc/exports does not include the client, and reloading exports with sudo exportfs -ar is frequently the remedy. "RPC: Program not registered" indicates the NFS service is not running on the server, in which case sudo systemctl restart nfs-server applies. A "Stale file handle" error generally follows a server reboot or a deleted file and is cleared by unmounting and remounting. Timeouts and "Server not responding" messages call for checking network connectivity, confirming that firewall rules permit access to port 111 (rpcbind, required for NFSv3) and port 2049 (NFS itself), and verifying NFS version compatibility using the vers=3 or vers=4 mount option. NFSv4 requires only port 2049, while NFSv2 and NFSv3 also require port 111. To detach a share, sudo umount /var/backups is the standard route, with fuser -m /var/backups helping to identify the processes blocking the unmount.

Mounting Optical Media

CDs and DVDs are less central than they once were, but some systems still need to read them. After inserting a disc, blkid can identify the block device path, which is typically /dev/sr0, and will report the file system type as iso9660. With a mount point created using sudo mkdir /mnt/cdrom, the disc is mounted with:

sudo mount /dev/sr0 /mnt/cdrom

The warning device write-protected, mounted read-only is expected for optical media and can be disregarded. CDs and DVDs use the ISO 9660 file system, a data-exchange standard designed to be readable across operating systems. Once mounted, the disc contents are accessible under /mnt/cdrom, and sudo umount /mnt/cdrom detaches it cleanly when work is complete.

Transferring Files Securely and Efficiently

Copying Files with scp

scp (Secure Copy) transfers files and directories between hosts over SSH, encrypting both data and authentication credentials in transit. Its basic syntax is:

scp [OPTIONS] [[user@]host:]source [[user@]host:]destination

The colon is how scp distinguishes between local and remote paths: a path without a colon is local. A typical upload from a local machine to a remote host looks like:

scp file.txt remote_username@10.10.0.2:/remote/directory

A download from a remote host to the local machine reverses the argument order:

scp remote_username@10.10.0.2:/remote/file.txt /local/directory

Commonly used options include -r for recursive directory copies, -p to preserve metadata such as modification times and permissions, -C for compression, -i for a specific private key, -l to cap bandwidth in Kbit/s and the uppercase -P to specify a non-standard SSH port. It is also possible to copy between two remote hosts directly, routing the transfer through the local machine with the -3 flag.

The Protocol Change in OpenSSH 9.0

There is an important change in modern OpenSSH that administrators should be aware of. From OpenSSH 9.0 onward, the scp command uses the SFTP protocol internally by default rather than the older SCP/RCP protocol, which is now considered outdated. The command behaves identically from the user's perspective, but if an older server requires the legacy protocol, the -O flag forces it. For advanced requirements such as resumable transfers or incremental directory synchronisation, rsync is generally the better fit, particularly for large directory trees.

Throttling rsync to Protect Bandwidth

Even with rsync, raw speed is not always desirable. A backup script consuming all available bandwidth can disrupt other services on the same network link, so --bwlimit is often essential. The basic syntax is:

rsync --bwlimit=KBPS source destination

The value is in units of 1,024 bytes unless an explicit suffix is added. A fractional value is also valid: --bwlimit=1.5m sets a cap of 1.5 MB/s. A local transfer capped at 1,000 KB/s looks like:

rsync --bwlimit=1000 /path/to/source /path/to/dest/

And a remote backup:

rsync --bwlimit=1000 /var/www/html/ backups@server1.example.com:~/mysite.backups/

The man page for rsync explains that --bwlimit works by limiting the size of the blocks rsync writes and then sleeping between writes to achieve the target average. Some burstiness in the transfer rate is therefore normal in practice.

Managing I/O Priority with ionice

Bandwidth is only one dimension of the load a transfer places on a system. Disk I/O scheduling may also need attention, particularly on busy servers running other workloads. The ionice utility adjusts the I/O scheduling class and priority of a process without altering its CPU priority. For instance:

/usr/bin/ionice -c2 -n7 rsync --bwlimit=1000 /path/to/source /path/to/dest/

This runs the rsync process in best-effort I/O class (-c2) at the lowest priority level (-n7), combining transfer rate limiting with reduced I/O priority. The scheduling classes are: 0 (none), 1 (real-time), 2 (best-effort) and 3 (idle), with priority levels 0 to 7 available for the real-time and best-effort classes.

Together, --bwlimit and ionice provide complementary controls over exactly how much resource a routine transfer is permitted to consume at any given time.

Setting Up Bash Tab Completion

On Ubuntu and related distributions, Bash programmable completion is provided by the bash-completion package. If tab completion does not function as expected in a new installation or container environment, the following commands will install the necessary support:

sudo apt update
sudo apt upgrade
sudo apt install bash-completion

The package places a shell script at /etc/profile.d/bash_completion.sh. To ensure it is loaded in shell startup, the following appends the source line to .bashrc:

echo "source /etc/profile.d/bash_completion.sh" >> ~/.bashrc

A conditional form avoids duplicating the line on repeated runs:

grep -wq '^source /etc/profile.d/bash_completion.sh' ~/.bashrc \
  || echo 'source /etc/profile.d/bash_completion.sh' >> ~/.bashrc

The script is typically loaded automatically in a fresh login shell, but source /etc/profile.d/bash_completion.sh activates it immediately in the current session. Once active, pressing Tab after partial input such as sudo apt i or cat /etc/re completes commands and paths against what is actually installed. Bash also supports simple custom completions: complete -W 'google.com cyberciti.biz nixcraft.com' host teaches the shell to offer those three domains after typing host and pressing Tab, which illustrates how the feature can be extended to match the patterns of repeated daily work.

Installing Snap on Debian

Snap is a packaging format developed by Canonical that bundles an application together with all of its dependencies into a single self-contained package. Snaps update automatically, roll back gracefully on failure and are distributed through the Snap Store, which carries software from both Canonical and independent publishers. The background service that manages them, snapd, is pre-installed on Ubuntu but requires a manual setup step on Debian.

On Debian 9 (Stretch) and newer, snap can be installed directly from the command line:

sudo apt update
sudo apt install snapd

After installation, logging out and back in again, or restarting the system, is necessary to ensure that snap's paths are updated correctly in the environment. Once that is done, install the snapd snap itself to obtain the latest version of the daemon:

sudo snap install snapd

To verify that the setup is working, the hello-world snap provides a straightforward test:

sudo snap install hello-world
hello-world

A successful run prints Hello World! to the terminal. Note that snap is not available on Debian versions before 9. If a snap installation produces an error such as snap "lxd" assumes unsupported features, the resolution is to ensure the core snap is present and current:

sudo snap install core
sudo snap refresh core

On desktop systems, the Snap Store graphical application can then be installed with sudo snap install snap-store, providing a point-and-click interface for browsing and managing snaps alongside the command-line tools.

Increasing the Root Partition Size on Fedora with LVM

Fedora's default installer has used LVM (Logical Volume Manager) for many years, dividing the available disk into a volume group containing separate logical volumes for root (/), home (/home) and swap. This arrangement makes it straightforward to redistribute space between volumes without repartitioning the physical disk, which is a significant advantage over a fixed partition layout. Note that Fedora 33 and later default to Btrfs without LVM for new installations, so the steps below apply to systems that were installed with LVM, including pre-Fedora 33 installs and any system where LVM was selected manually.

Because the root file system is in active use while the system is running, resizing it safely requires booting from a Fedora Live USB stick rather than the installed system. Once booted from the live environment, open a terminal and begin by checking the volume group:

sudo vgs

Output such as the following shows the volume group name, total size and, crucially, how much free space (VFree) is unallocated:

  VG     #PV #LV #SN Attr   VSize    VFree
  fedora   1   3   0 wz--n- <237.28g    0

Before proceeding, confirm the exact device mapper paths for the root and home logical volumes by running fdisk -l, since the volume group name varies between installations. Common names include /dev/mapper/fedora-root and /dev/mapper/fedora-home, though some systems use fedora00 or another prefix.

When Free Space Is Already Available

If VFree shows unallocated space in the volume group, the root logical volume can be extended directly and the file system resized in a single command:

sudo lvresize -L +5G --resizefs /dev/mapper/fedora-root

The --resizefs flag instructs lvresize to resize the file system at the same time as the logical volume, removing the need to run resize2fs separately.

When There Is No Free Space

If VFree is zero, space must first be reclaimed from another logical volume before it can be given to root. The most common approach is to shrink the home logical volume, which typically holds the most available headroom. Shrinking a file system involves data moving on disk, so the operation requires the volume to be unmounted, which is why the live environment is essential. To take 10 GB from home:

sudo lvresize -L -10G --resizefs /dev/mapper/fedora-home

Once that completes, the freed space appears as VFree in vgs and can be added to the root volume:

sudo lvresize -L +10G --resizefs /dev/mapper/fedora-root

Both steps use --resizefs so that the file system boundaries are updated alongside the logical volume boundaries. After rebooting back into the installed system, df -h will confirm the new sizes are in effect.

Keeping a Linux System Well Maintained

The commands and configurations covered above form a coherent body of everyday Linux administration practice: knowing where installed kernels are recorded, how to measure real disk usage rather than directory metadata, how to attach local and network file systems correctly, how to extract archives and move data securely without disrupting shared resources, how to make the shell itself more productive, how to extend a Debian system with snap packages and how to redistribute disk space between LVM volumes on Fedora. Together, these convert a scattered collection of one-liners into a reliable working toolkit. Each topic interconnects naturally with the others: a kernel query clarifies what system you are managing, disk investigation reveals whether a file system has room for what you plan to transfer, NFS mounting determines where that transfer will land, and bandwidth control determines what impact it will have while it runs.

Four technical portals that still deliver after decades online

3rd February 2026

The early internet was built on a different kind of knowledge sharing, one driven by individual expertise, community generosity and the simple desire to document what worked. Four informative websites that started in that era, namely MDN Web Docs, AskApache, WindowsBBS and Office Watch, embody that spirit and remain valuable today. They emerged at a time when technical knowledge was shared through forums, documentation and personal blogs rather than social media or algorithm-driven platforms, and their legacy persists in offering clarity and depth in an increasingly fragmented digital landscape.

MDN Web Docs

MDN Web Docs stands as a cornerstone of modern web development, offering comprehensive coverage of HTML, CSS, JavaScript and Web APIs alongside authoritative references for browser compatibility. Mozilla started the project in 2005 under the name Mozilla Developer Center, and it has since grown into a collaborative effort of considerable scale. In 2017, Mozilla announced a formal partnership with Google, Microsoft, Samsung and the W3C to consolidate web documentation on a single platform, with Microsoft alone redirecting over 7,700 of its MSDN pages to MDN in that year.

For developers, the site is not merely a reference tool but a canonical guide that ensures standards are adhered to and best practices followed. Its tutorials, guides and learning paths make it indispensable for beginners and seasoned professionals alike. The site's community-driven updates and ongoing contributions from browser vendors have cemented its reputation as the primary source for anyone building for the web.

AskApache

AskApache is a niche but invaluable resource for those managing Apache web servers, built by a developer whose background lies in network security and penetration testing on shared hosting environments. The site grew out of the founder's detailed study of .htaccess files, which, unlike the main Apache configuration file httpd.conf, are read on every request and offer fine-grained, per-directory control without requiring root access to the server. That practical origin gives the content its distinctive character: these are not generic tutorials, but hard-won techniques born from real-world constraints.

The site's guides on blocking malicious bots, configuring caching headers, managing redirects with mod_rewrite and preventing hot-linking are frequently cited by system administrators and WordPress users. Its specificity and longevity have made it a trusted companion for those maintaining complex server environments, covering territory that mainstream documentation rarely touches.

WindowsBBS

WindowsBBS offers a clear window into the era when online forums were the primary hub for technical support. Operating in the tradition of classic bulletin board systems, the site has long been a resource for users troubleshooting Windows installations, hardware compatibility issues and malware removal. It remains completely free, sustained by advertisers and community donations, which reflects the ethos of mutual aid that defined early internet culture.

During the Windows XP and Windows 7 eras, community forums of this kind were essential for solving problems that official documentation often overlooked, with volunteers providing detailed answers to questions that Microsoft's own support channels would not address. While the rise of social media and centralised support platforms has reduced the prominence of such forums, WindowsBBS remains a testament to the power of community-driven problem-solving. Its straightforward structure, with users posting questions and experienced volunteers providing answers, mirrors the collaborative spirit that made the early web such a productive environment.

Office Watch

Office Watch has served as an independent source of Microsoft Office news, tips and analysis since 1996, making it one of the longer-running specialist publications of its kind. Its focus on Microsoft Office takes in advanced features and hidden tools that are seldom documented elsewhere, from lesser-known functions in Excel to detailed comparisons between Office versions and frank assessments of Microsoft's product decisions. That independence gives it a voice that official resources cannot replicate.

The site serves power users seeking to make the most of the software they use every day, with guides and books that extend its reach beyond the website itself. In an era where software updates are frequent and often poorly explained, Office Watch provides the kind of context and plain-spoken clarity that official documentation rarely offers.

The Enduring Value of Depth and Community

These four sites share a common thread: they emerged when technical knowledge was shared openly by experts and enthusiasts rather than filtered through algorithms or paywalls, and they retain the value that comes from that approach. Their continued relevance speaks to what depth, specificity and community can achieve in the digital world. While platforms such as Stack Overflow and GitHub Discussions have taken over many of the roles these sites once played, the original resources remain useful for their historical context and the quality of their accumulated content.

As the internet continues to evolve, the lessons from these sites are worth remembering. The most useful knowledge is often found at the margins, where dedicated individuals take the time to document, explain and share what they have learned. Whether you are a developer, a server administrator or an everyday Office user, these resources are more than archives: they are living repositories of expertise, built by people who cared enough to write things down properly.

Why SAS, R and Python can report different percentiles for the same data

2nd February 2026

Quantiles look straightforward on the surface. Ask for the median, the 75th percentile or the 95th percentile and most people expect one clear answer. Yet small differences between software packages often reveal that quantiles are not defined in only one way. When the same data are analysed in SAS, R or Python, the reported percentile can differ, particularly for small samples or for data sets with large gaps between adjacent values.

That difference is not necessarily a bug, and it is not a sign that one platform is wrong. It reflects the fact that sample quantiles are estimates of population quantiles, and statisticians have proposed several valid ways to construct those estimates. For everyday work with large samples, the distinction often fades into the background because the values tend to be close. For smaller samples, the choice of definition can matter enough to alter a reported result, a chart or a downstream calculation.

The Problem With the Empirical CDF

A useful starting point is understanding why multiple definitions exist at all. A sample quantile is an estimate of an unknown population quantile. Many approaches base that estimate on the empirical cumulative distribution function (ECDF), which approximates the cumulative distribution function (CDF) for the population. As Rick Wicklin explains in his 22nd May 2017 article on The DO Loop, the ECDF is a step function with a jump discontinuity at each unique data value. For that reason, the inverse ECDF does not exist and quantiles are not uniquely defined, which is precisely why different conventions have developed.
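A small numerical illustration of why the inverse ECDF fails to pin down a unique value: for four observations, the ECDF is flat between adjacent data points, so an entire interval of values shares the same cumulative probability.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

def ecdf(t, data):
    """Empirical CDF: the fraction of observations <= t."""
    return float(np.mean(data <= t))

# The ECDF is flat between adjacent data values, so every t in [2, 3)
# maps to the same probability and the inverse at p = 0.5 is not unique.
print(ecdf(2.0, x), ecdf(2.5, x), ecdf(2.9, x))  # 0.5 0.5 0.5
```

Any rule for picking a single point out of that flat interval is a convention, which is exactly the choice the nine definitions formalise.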

In high school, most people learn that when a sorted sample has an even number of observations, the median is the average of the two middle values. The default quantile definition in SAS extends that familiar rule to other quantiles. If the sample size is N and the q-th quantile is requested, then when Nq is an integer, the result is the average of the two adjacent data values x[Nq] and x[Nq+1]. When Nq is not an integer, the result is the single data value x[j+1], where j = floor(Nq). Averaging at integer positions and jumping to the next value otherwise are not the only choices available, and that is where the definitions diverge.
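The SAS default corresponds to Hyndman and Fan's Type 2 definition: average the two adjacent order statistics when Nq is an integer, otherwise take the next order statistic. A short illustrative Python implementation (the function name is mine, not SAS syntax), cross-checked against NumPy's averaged_inverted_cdf method, which computes the same definition in NumPy 1.22 and later:

```python
import math

import numpy as np

def quantile_type2(data, p):
    """Hyndman-Fan Type 2 (the SAS default, QNTLDEF=5): average the
    two adjacent order statistics when N*p lands exactly on one,
    otherwise take the next order statistic. Comments use 1-based
    indices to match the x[j] notation in the text."""
    x = sorted(data)
    n = len(x)
    pos = n * p
    j = math.floor(pos)
    if pos == j and 0 < j < n:
        return (x[j - 1] + x[j]) / 2   # average of x[j] and x[j+1]
    return x[min(j, n - 1)]            # x[j+1], clipped at the ends

x = [1.0, 2.0, 3.0, 4.0]
print(quantile_type2(x, 0.5))   # 2.5, the familiar even-N median rule
print(quantile_type2(x, 0.25))  # 1.5
print(np.quantile(x, 0.25, method="averaged_inverted_cdf"))  # also 1.5
```

For p = 0.5 with four observations, N*p = 2 is an integer, so the rule reduces to averaging the two middle values as expected.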

The Hyndman and Fan Taxonomy

According to Hyndman and Fan ("Sample Quantiles in Statistical Packages," The American Statistician, 1996), there are nine definitions of sample quantiles that commonly appear in statistical software packages. Three of those definitions are based on rounding and six are based on linear interpolation. All nine result in valid estimates.

As Wicklin describes in his 24th May 2017 article comparing all nine definitions, the nine methods share a common general structure. For a sample of N sorted observations and a target probability p, the estimate uses two adjacent data values x[j] and x[j+1]. A fractional quantity determines an interpolation parameter λ, and each definition has a parameter m that governs how interpolation between adjacent data points is handled. In general terms, the estimate takes the form q = (1 − λ)x[j] + λx[j+1], where λ and j depend on the values of p, N and the method-specific parameter m. The practical consideration at the extremes is that when p is very small or very close to 1, most definitions fall back to returning x[1] or x[N] respectively.
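For the interpolation-based definitions, that general structure can be written out directly. The sketch below assumes λ equals the fractional part g, which holds for the interpolation types; the boundary handling at extreme p varies slightly between the nine definitions and is simplified here to clamping:

```python
import math

def hf_interp_quantile(data, p, m):
    """General interpolated sample quantile: q = (1 - g)*x[j] + g*x[j+1],
    with j = floor(N*p + m) and g = N*p + m - j (1-based indexing).

    m is the method-specific parameter: m = 0 gives Type 4, m = 1/2
    Type 5, m = p Type 6 and m = 1 - p Type 7. Extreme p falls back to
    the smallest or largest observation.
    """
    x = sorted(data)
    n = len(x)
    h = n * p + m
    j = math.floor(h)
    if j < 1:
        return x[0]                  # p very small: return the minimum
    if j >= n:
        return x[-1]                 # p very close to 1: return the maximum
    g = h - j
    return (1 - g) * x[j - 1] + g * x[j]
```

Plugging in m = 1 − p reproduces the Type 7 values discussed later, while other choices of m shift where the interpolation nodes sit.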

Default Methods Across Platforms

It is a misnomer to refer to one approach as "the SAS method" and another as "the R method." As Wicklin notes in his 26th July 2021 article comparing SAS, R and Python defaults, SAS supports five different quantile definitions through the PCTLDEF= option in PROC UNIVARIATE or the QNTLDEF= option in other procedures, and all nine can be computed via SAS/IML. R likewise supports all nine through the type parameter in its quantile function. The confusion arises not from limited capability, but from the defaults that most users accept without much thought.

By default, SAS uses Hyndman and Fan's Type 2 method (QNTLDEF=5 in SAS procedure syntax). R uses Type 7 by default, and that same Type 7 method is also the default in Julia and in the Python packages SciPy and NumPy. A comparison between SAS and Python therefore often becomes the same comparison as between SAS and R.

A Worked Example

The contrast between Type 2 and Type 7 is especially clear on a small data set. Wicklin uses the sample {0, 1, 1, 1, 2, 2, 2, 4, 5, 8} throughout both his 2017 and 2021 articles: ten observations, six unique values, and a particularly large gap between the two highest values, 5 and 8. That gap is deliberately chosen because the differences between quantile definitions are most visible when the sample is small and when adjacent ordered values are far apart.

The Type 2 method (SAS default) uses the ECDF to estimate population quantiles, so a quantile is always an observed data value or the average of two adjacent data values. The Type 7 method (R default) uses a piecewise-linear estimate of the CDF. Because the inverse of that piecewise-linear estimate is continuous, a small change in the probability level produces a small change in the estimated quantile, a property that is absent from the ECDF-based methods.

Where the Methods Agree and Where They Part Company

For the 0.5 quantile (the median), both methods return 2. A horizontal line at 0.5 crosses both CDF estimates at the same point, so there is no disagreement. This is one reason the issue can be easy to miss: some commonly reported percentiles coincide across definitions.

The 0.75 quantile tells a different story. Under Type 2, a horizontal line at 0.75 crosses the empirical CDF at 4, which is a data value. Under Type 7, the estimate is 3.5, which is neither a data value nor the average of adjacent values; it emerges from the piecewise-linear interpolation rule. The 0.95 quantile shows the sharpest divergence: Type 2 returns 8 (the maximum data value), while Type 7 returns 6.65, a value between the two largest observations.
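Recent versions of NumPy (1.22 and later) expose the Hyndman and Fan definitions by name through the method argument of numpy.quantile, which makes the worked example easy to reproduce: "averaged_inverted_cdf" is Type 2 and the default "linear" is Type 7.

```python
import numpy as np

x = np.array([0, 1, 1, 1, 2, 2, 2, 4, 5, 8])

for p in (0.5, 0.75, 0.95):
    type2 = np.quantile(x, p, method="averaged_inverted_cdf")  # SAS default
    type7 = np.quantile(x, p, method="linear")                 # R/NumPy default
    print(f"p = {p}: Type 2 gives {type2}, Type 7 gives {type7}")
```

The two methods agree at the median (both return 2) but part company at 0.75 (4 versus 3.5) and 0.95 (8 versus 6.65), matching the figures above.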

Those differences are not errors. They are consequences of the assumptions built into each estimator. The default in SAS always returns a data value or the average of adjacent data values, whereas the default in R can return any value in the range of the data.

The Five Definitions Available in SAS Procedures

For users who stay within base SAS procedures, that same 22nd May 2017 article sets out the five available definitions clearly. QNTLDEF=1 and QNTLDEF=4 are piecewise-linear interpolation methods, whilst QNTLDEF=2, QNTLDEF=3 and QNTLDEF=5 are discrete rounding methods. The default is QNTLDEF=5. For the discrete definitions, SAS returns either a data value or the average of adjacent data values; the interpolation methods can return any value between observed data values.

The differences between the definitions are most apparent when there are large gaps between adjacent data values. Using the same ten-point data set, for the 0.45 quantile, different definitions return 1, 1.5, 1.95 or 2. For the 0.901 quantile, the round-down method (QNTLDEF=2) gives 5, the round-up method (QNTLDEF=3) gives 8, the backward interpolation method (QNTLDEF=1) gives 5.03 and the forward interpolation method (QNTLDEF=4) gives 7.733. These are not trivial discrepancies on a small sample.
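Although SAS is needed to run the QNTLDEF= options directly, the same figures can be checked in NumPy by matching each option to a Hyndman and Fan type. The correspondence used below (QNTLDEF=1 as Type 4, 2 as Type 3, 3 as Type 1, 4 as Type 6 and 5 as Type 2) is an assumption made for illustration, although it does reproduce the 0.901 values quoted above:

```python
import numpy as np

x = np.array([0, 1, 1, 1, 2, 2, 2, 4, 5, 8])

# Assumed correspondence between SAS QNTLDEF= options and NumPy's
# Hyndman-Fan method names (NumPy 1.22 or later)
qntldef_to_numpy = {
    1: "interpolated_inverted_cdf",  # Type 4, backward interpolation
    2: "closest_observation",        # Type 3, rounding
    3: "inverted_cdf",               # Type 1, rounding up
    4: "weibull",                    # Type 6, forward interpolation
    5: "averaged_inverted_cdf",      # Type 2, the SAS default
}

for qdef, method in sorted(qntldef_to_numpy.items()):
    print(f"QNTLDEF={qdef}: {np.quantile(x, 0.901, method=method)}")
```

Under this mapping, the 0.901 quantile comes out as 5.03, 5, 8 and 7.733 for QNTLDEF=1 to 4 respectively, in line with the article's figures.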

The Four Remaining Definitions and the General Formula

The 24th May 2017 comparison article goes further, showing how SAS/IML can be used to compute the four Hyndman and Fan definitions that are not natively supported in SAS procedures. Each of the nine methods is an instance of the same general formula involving the parameter m. The four non-native methods each require their own specific value (or expression) for m, plus a small boundary value c that governs the behaviour at the extreme ends of the probability scale.

Wicklin also overlays the default methods for SAS (Type 2) and R (Type 7) graphically on the ten-point data set, showing that the SAS default produces a discrete step pattern whilst the R default traces a smoother piecewise-linear curve. He then repeats the comparison on a sample of 100 observations from a uniform distribution and finds that the two methods are almost indistinguishable at that scale, illustrating why many analysts work comfortably with defaults most of the time.

A SAS/IML Function to Match R's Default

For analysts who need cross-platform consistency, that same 26th July 2021 article provides a simplified SAS/IML function that reproduces the Type 7 default from R, Julia, SciPy and NumPy. The function converts the input to a column vector, handles missing values and the degenerate case of a single observation, then sorts the data and applies the Type 7 rule. The index into the sorted data is j = floor(N*p + m) with m = 1 − p, the interpolation fraction is g = N*p + m − j, and the estimate is (1 − g)x[j] + gx[j+1] for all p < 1, with x[N] returned when p = 1. This gives SAS users a practical route to reproducing the default quantiles from other platforms without switching software.
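The same logic translates readily into Python. The sketch below follows the formula described (it is not the SAS/IML code itself) and cross-checks each result against NumPy's default, which is also Type 7:

```python
import math
import numpy as np

def quantile_type7(data, p):
    """Type 7 sample quantile, following the formula described above:
    m = 1 - p, j = floor(n*p + m), g = n*p + m - j and the estimate is
    (1 - g)*x[j] + g*x[j+1] in 1-based indexing, with x[n] returned at p = 1.
    """
    x = sorted(data)
    n = len(x)
    if p >= 1:
        return x[-1]                 # boundary case: return the maximum
    h = n * p + (1 - p)              # equivalently (n - 1)*p + 1
    j = math.floor(h)
    g = h - j
    return (1 - g) * x[j - 1] + g * x[j]

sample = [0, 1, 1, 1, 2, 2, 2, 4, 5, 8]
for p in (0.05, 0.25, 0.5, 0.75, 0.95, 1.0):
    # cross-check against NumPy's default (also Hyndman-Fan Type 7)
    assert math.isclose(quantile_type7(sample, p), np.quantile(sample, p))
```

Because n*p + (1 − p) simplifies to (n − 1)*p + 1, the index j never drops below 1 for positive p, so no lower boundary guard is needed.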

If SAS/IML is unavailable, Wicklin suggests using PCTLDEF=1 in PROC UNIVARIATE (or QNTLDEF=1 in PROC MEANS) as the next best option. This produces the Type 4 method, which is not the same as Type 7 but does use interpolation rather than a purely discrete rule, so it avoids the jumpy behaviour of the ECDF-based defaults.

A Wider Point About Conventions in Statistical Software

The comments on the 2021 article make clear that quantiles are not an isolated example. Conventions differ across platforms in ARIMA sign conventions, whether likelihood constants are included in reported values, the definition of the multivariate autocovariance function and the sign convention and constant term used in discrete Fourier transforms. Quantiles are simply a particularly visible instance of a broader pattern where results can differ even when each platform is behaving correctly.

One question from the same comment thread is also worth noting: SQL's percent_rank formula, defined as (rank − 1) / (total_rows − 1), does not estimate a quantile. As Wicklin clarifies in his reply, it estimates the empirical distribution function for observed data values. Both concepts involve percentiles and rankings, but they address different problems. One maps values to cumulative proportions; the other maps cumulative probabilities to estimated values.
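A small Python illustration makes the two directions concrete. The percent_rank helper below is a hypothetical re-implementation of the SQL formula, not a database call:

```python
import numpy as np

x = np.sort(np.array([0, 1, 1, 1, 2, 2, 2, 4, 5, 8]))

def percent_rank(rank, total_rows):
    """The SQL formula: maps a 1-based rank to a cumulative proportion."""
    return (rank - 1) / (total_rows - 1)

# percent_rank maps a position in the ordered data to a proportion:
# the value 4 sits at rank 8 of 10
print(percent_rank(8, 10))          # 7/9, roughly 0.778

# a quantile estimator goes the other way, mapping a probability to a value
print(np.quantile(x, 0.75))         # 3.5 under the Type 7 default
```

One function takes a data value's position and returns a proportion; the other takes a probability and returns an estimated value, which is why the two should not be conflated.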

Does the Definition of a Sample Quantile Actually Matter?

The answer from all three articles is balanced. Yes, it matters in principle, and it is noticeably important for small samples, in extreme tails and wherever there are wide gaps in the ordered data. No, it often matters very little for larger samples (say, 100 or more observations), where the nine methods tend to produce results that are nearly indistinguishable. Wicklin's 100-observation comparison showed that the Type 2 and Type 7 estimates were so close that one set of points sat almost directly on top of the other.

That is why, as Wicklin notes, most analysts simply accept the default method of whichever software they are using. Even so, there are contexts where the definition should be stated explicitly. Regulatory work, reproducible research, published analyses and any cross-software validation all benefit from naming the method in use. Without that detail, two analysts can work correctly with the same data and still arrive at different percentile values.

Matching Quantile Definitions Across SAS, R and Python

The practical conclusion is clear. SAS defaults to Hyndman and Fan Type 2 (QNTLDEF=5), while R, Julia, SciPy and NumPy default to Type 7. SAS procedures natively support five of the nine definitions, and SAS/IML can be used to compute all nine, including a simplified function for the R default. For large data sets, the differences are typically negligible. For small data sets, particularly those with unevenly spaced observations, they can be large enough to change the story the numbers appear to tell. The solution is not to favour any particular platform, but to be explicit about the method wherever precision matters.

WordPress Starter Themes: From bare foundations to modern workflows

1st February 2026

Starter themes have long occupied an important place in WordPress development. They sit between a completely blank project and a fully styled off-the-shelf theme, offering enough structure to speed up work without dictating how the finished site must look. For agencies, freelancers and in-house teams, that balance can save considerable time, allowing developers to begin with a lean codebase and concentrate on the parts that make each site distinct.

That broad appeal helps explain why starter themes continue to evolve in different directions. Some remain deliberately minimal and close to WordPress core conventions, while others embrace modern tooling such as Composer, Vite, Tailwind CSS and component-based templating. Alongside these are starter themes intended for visual builders and users who want a gentler route into customisation. Taken together, they demonstrate that there is no single definition of a WordPress starter theme; the common thread is that each provides a starting point rather than a finished product.

wd_s: A Generator-Driven Approach

wd_s is a generator-driven starter theme from WebDevStudios. The wd_s generator makes the setup process more tailored by asking for project details: name, URL, description, namespace, text domain, author, author URL, author email and development URL. Once those details are entered, the script performs a find-and-replace and delivers a ZIP file ready to extract into wp-content/themes. The process is straightforward, though it also reflects a more structured approach to project setup than many starter themes provide.

The generator highlights details that matter in real-world theme development but are sometimes overlooked when beginning from a generic scaffold. Namespacing, text domains and project metadata all play a part in maintainability and localisation, and wd_s brings those choices to the fore from the outset. The example values shown by the generator, such as "Acme Inc." for the name field and a namespace using underscores, are illustrative rather than prescriptive. What stands out more than any one field is the intention to reduce repetitive manual setup and encourage consistency from the very start of a project.

Sage: A Modern Development Workflow

Roots Sage represents a significant shift towards a modern development workflow. Sage is a Tailwind CSS WordPress starter theme with Laravel Blade templating, currently at version 11.1.0 and with 13,191 GitHub stars at the time of writing. Setup uses Composer and NPM rather than a simple ZIP download, and a typical installation begins in wp-content/themes with composer create-project roots/sage my-theme, followed by npm install and npm run build.

That workflow signals the audience Sage is aimed at. Rather than merely wrapping WordPress templates in a minimal theme shell, Sage introduces tooling and conventions familiar to developers who work with Laravel and modern front-end stacks. The Vite build process generates files including a manifest, compiled CSS and JavaScript and a theme.json, completing the whole build in a matter of seconds and demonstrating that WordPress development can be integrated into contemporary asset pipelines without giving up compatibility with the CMS.

Blade Templating and Component-Driven Design

Blade templating is central to Sage's proposition. The base layout in resources/views/layouts/app.blade.php shows a clean separation of structure and content using directives such as @include, @yield and @hasSection. Header and footer hooks still call familiar WordPress functions including wp_head, wp_body_open and wp_footer, but the surrounding syntax is closer to Laravel than traditional PHP-heavy WordPress templates. This gives developers access to template inheritance, reusable components and directives, making larger codebases considerably easier to organise.

A simple alert component illustrates this style compactly: properties define its type and message, a PHP match expression selects classes based on the alert type and the component merges those classes into its final markup. The result is not merely an isolated snippet but a demonstration of how Sage encourages component-driven design, reducing repetition and making presentation logic easier to follow, particularly in projects with many shared interface elements.

Tailwind CSS and the Block Editor

Sage places strong emphasis on integrating Tailwind CSS with the WordPress block editor. It automatically generates theme.json from the Tailwind configuration, making colours, font families and sizes immediately available in the block editor with zero additional configuration. The sample app.css imports Tailwind and points at views and app files as content sources, while the generated theme.json includes settings for layout, background, colour palettes, spacing and typography. The palette includes eleven shades of grey along with black and white, and the typography settings mirror Tailwind's familiar scale from xs through 9xl.

This addresses a long-standing friction point in WordPress theming: keeping front-end design systems in sync with the editing experience. In older workflows, editor styles and front-end styles often drifted apart, creating extra maintenance work and inconsistency for content editors. Sage's approach narrows that gap by deriving editor settings directly from the same Tailwind configuration used for the front end, with theme.json generated during the build process rather than maintained by hand.

Theme Structure and the Roots Ecosystem

The theme structure for Sage reinforces its emphasis on organisation. The app directory contains providers, view composers, filters.php and setup.php, while resources holds CSS, JavaScript and Blade views grouped into components, layouts, partials and sections. Public assets, composer.json, package.json, theme.json and vite.config.js complete the structure, paired with PSR-4 autoloading, service providers and Acorn, which brings Laravel-style patterns into WordPress. The Vite configuration includes Tailwind, the Laravel Vite plugin and Roots plugins for WordPress support and theme.json generation, plus aliases for scripts, styles, fonts and images.

Another notable feature is hot module replacement in the WordPress block editor, with style changes updating instantly without page refreshes. Sage sits within the broader Roots ecosystem, which also includes Bedrock (a WordPress boilerplate for Composer and Git-based projects), Trellis (a server provisioning and deployment tool), Acorn, Radicle (which bundles the entire Roots stack into a single starting point) and WP Packages (a Composer repository for WordPress plugins and themes). Testimonials on the Roots website emphasise that many developers regard this ecosystem as a route to a more structured and modern WordPress experience, with Sage having been actively maintained for over a decade.

Visual Composer Starter Theme: A Builder-Friendly Option

Not every starter theme is aimed at developers working with command-line tooling and component-based templates. The Visual Composer Starter Theme occupies a different place in the landscape, described as a free bundle of a lightweight theme and a powerful WordPress page builder. It is aimed at building blogs, WooCommerce stores, business sites and personal websites, and the language surrounding it stresses ease of use, intuitive theme options and layout customisation tools, presented as a free resource intended to support the WordPress community.

Its feature set reflects that broader audience. The theme is easy to customise through the WordPress customiser, SEO-friendly and responsive by default, covering use cases such as personal blogs, landing pages, business sites, portfolios, startups and online stores. WooCommerce compatibility receives particular emphasis, with support for adjusting design preferences via the customiser and building online shops at no cost. Hero and featured images, unlimited colour options, page-level design controls and a choice between regular and mobile sandwich-style menus are all included.

There is also a strong focus on compatibility. The theme is fully translation-ready and compatible with WPML, qTranslate and Polylang, while support for Advanced Custom Fields and Toolset for custom post type development is highlighted. It is also presented as ready to combine with the Visual Composer website builder and is developed openly on GitHub, where anyone can contribute. This is less about offering an unvarnished code scaffold and more about giving users a flexible visual base with a broad range of built-in options, though it remains part of the starter theme conversation because it is designed to be extended rather than merely installed and left untouched.

Bones: Speed, Control and Pragmatism

Bones, designed and developed by Eddie Machado, returns more closely to the classic developer-oriented concept while retaining a distinctive voice. It is described as an HTML5, mobile-first starter theme for rapid WordPress development, and it makes clear that it is not a framework. Frameworks can introduce their own conventions and complexity, whereas Bones is designed to be as bare and minimalistic as possible, intended to be used on a per-project basis with no child themes.

The mobile-first emphasis is one of Bones' defining characteristics. Its Sass setup serves minimal resources to smaller screens before scaling up for larger viewports, an approach tied to performance as well as responsiveness, and Bones includes extensive comments and examples to help developers get started with Sass. It also provides a well-documented example for custom post types and functions to customise the WordPress admin area for clients, though these are entirely optional and can be removed if not needed. The project is released under the WTFPL, one of the most permissive licences available, and takes pride in removing unnecessary elements from the WordPress header to keep output clean and lightweight. The philosophy is to keep what is useful and discard the rest, building from a solid and speedy foundation.

Selecting a Starter Theme to Match Your Workflow

When viewed together, these themes reveal how varied the starter theme category has become. A Speckyboy roundup of top starter and bare-bones themes for WordPress development in 2026 (last updated on the 8th of March 2026) places Sage alongside newer and more editor-focused options including Blockbase, GeneratePress, Air, WDS BT, Byvex and Flynt. The roundup notes that every website serves different goals and that WordPress is flexible enough to support them all, but also makes clear that starting each project from scratch leads to repeated work, with starter themes offering a way to avoid that repetition while preserving freedom over design and functionality.

The same roundup provides a useful framework for evaluating starter themes. Ongoing maintenance matters because themes need to keep pace with WordPress and surrounding technologies, and themes that have not been updated in years should be avoided. The distinction between classic and block themes is important, since developers need a starting point that aligns with their preferred editing model. Features that genuinely speed development, whether block patterns, a comprehensive settings panel or development tools, can make a significant difference over time. A starter theme should also stay out of the way rather than burden projects with an opinionated design direction, and compatibility with a preferred editor or page builder remains central to choosing well.

Whether the preference is for the generator-based setup of wd_s, the modern tooling of Sage, the builder-friendly versatility of Visual Composer Starter Theme or the stripped-back classic structure of Bones, each represents a different answer to the same challenge. Developers and site builders often need a head start rather than a finished design, and a good starter theme provides exactly that, while leaving enough room for the final result to become something entirely its own.

Modernising SAS: The 4GL Apps and SASjs Ecosystem

31st January 2026

Custom interfaces to the world's most powerful analytics platform are no longer a niche concern. In many organisations, SAS remains central to reporting, modelling and operational decision-making, yet the way users interact with that capability can vary widely. Some teams still rely on desktop applications, batch processes, shared drives and manual interventions, while others are moving towards web-based interfaces, stronger governance and a more modern development workflow. The material at sasapps.io points to an ecosystem built around precisely that transition, blending long-standing SAS expertise with open-source tooling and documented delivery methods.

The Company Behind the Ecosystem

At the centre of this transition is 4GL Apps. The company's positioning is straightforward: help organisations leverage their SAS investment through services, solutions and products that fit specific needs. Rather than replacing SAS, the aim is to extend it with custom interfaces and delivery approaches that are maintainable, transparent and based on standard frameworks. An emphasis on documentation appears throughout the site, suggesting that projects are intended either for handover to internal teams or for ongoing support under clearly defined packages.

That proposition matters because many SAS environments have grown over years, sometimes decades. In such settings, technical capability is rarely the issue. The challenge is more often how to expose that capability in ways that are usable, secure and sustainable. A powerful analytics platform can still be hampered by awkward user journeys, brittle desktop tooling or resource-heavy support arrangements, and the 4GL Apps model tries to address those practical concerns without discarding existing SAS infrastructure.

Services

The service offering gives a useful sense of how this approach is organised. One strand is SAS App Delivery, framed not merely as building applications, but also as building tools that make SAS app development faster. That detail points to an emphasis on repeatability rather than one-off implementation. Another strand is SAS App Support, aimed at organisations with existing SAS-powered applications but insufficient internal resource to keep them running. Fixed-price plans are offered to keep those interfaces active, which implies an attempt to make operational costs more predictable. A third service area is SASjs Enhancement, where new features can be added to SASjs at a discounted rate to support particular use cases.

Solutions

These services sit alongside a broader set of solutions. One is the creation of SAS-powered HTML5 applications, described as bespoke builds tailored to specific workflow and reporting requirements, using fully open-source tools, standard frameworks and full documentation. Clients are given a practical choice: maintain the application in-house or use a transparent support package. Another solution addresses end-user computing risk through data capture and control. Here, the approach enables business users to self-load VBA-driven Excel reporting tools into a preferred database while applying data quality checks at source, a four-eyes (or more) approval step at each stage and full audit traceability back to the original EUC artefact. A further solution is the modernisation of legacy AF/SCL desktop applications, with direct migration to SAS 9 or Viya in order to improve user experience, security and scalability while moving to a modern SAS stack supported by open-source technology.

That last area reveals a theme running through the whole ecosystem: modernisation does not necessarily mean abandoning what exists. In many SAS estates, AF/SCL applications remain deeply embedded in business processes, and replacing them outright can be costly and risky, especially when they encode years of operational logic. A migration path that preserves business function while improving maintainability and interface design will naturally appeal to teams that need progress without disruption.

Products

The product range fills out the picture further. Data Controller for SAS enables business users to make controlled changes to data in SAS. The SASjs Framework is a collection of open-source tools to accelerate SAS DevOps and the development of SAS-powered web applications. There is also an AF/SCL Kit, migration tooling for the rapid modernisation of monolithic AF/SCL applications. Together, these products form a stack covering interface delivery, governed data change and development workflow, and they suggest that the company's work is not limited to consultancy but includes reusable software assets with their own documentation and source code.

Data Controller: Governance and Audit

Data Controller receives the richest functional description in the ecosystem's documentation. It is intended for business owners in regulatory reporting environments and, more broadly, for any enterprise that needs to perform manual data uploads with validation, approval, security and control. The rationale is rooted in familiar SAS working practices. Users may place files on network drives for batch loading, update data directly using SAS code, open a dataset in Enterprise Guide and change a value, or ask a database administrator to run a script update. According to the product's own documentation, those approaches are less than ideal: every new piece of data may require a new programme, end users may need modify access to sensitive data locations, datasets can become locked, and change requests can slow the process.

Data Controller is presented as a response to those weaknesses. The goal is described as focusing on great user experience and auditor satisfaction, while saving years of development and testing compared with a custom-built alternative. It is a SAS-powered web application with real-time capabilities, where intraday concurrent updates are managed using a lock table and queuing mechanism. Updates are aborted if another user has changed the table since the approval difference was generated, which helps preserve consistency in multi-user environments. Authentication and authorisation rely on the existing SASLogon framework, and end users do not require direct access to the target tables.

The governance model is equally central. All data changes require one or more approvals before a table is updated, and the approver sees only the changes that will be applied to the target, including new, deleted and changed rows. The system supports loading tables of different types through SAS libname engines, with support for retained keys, SCD2 loads, bitemporal data and composite primary keys. Full audit history is a prominent feature: users can track every change to data, including who made it, when it was made, why it was made and what the actual change was, all accessible through a History page.

A particularly notable feature is that onboarding new tables requires zero code. Adding a table is a matter of configuration performed within the tool itself, without the need to define column types or lengths manually, as these are determined dynamically at runtime. Workflow extensibility is built in through configurable hook scripts that execute before and after each action, with examples such as running a data quality check after uploading a mapping table or running a model after changing a parameter. Taken together, those features position Data Controller less as a narrow upload utility and more as a governed operational layer for business-managed data change.

The application was designed to work on multiple devices and different screen types, combined with SAS scalability and security to provide flexibility and location independence when managing data. This suggests it is intended for practical day-to-day use by business teams rather than solely by technical specialists at a desktop workstation.

SASjs: DevOps for SAS

Underpinning much of the ecosystem is SASjs, described on its GitHub organisation page as "DevOps for SAS." It is designed to accelerate the development and deployment of solutions on all flavours of SAS, including Viya, EBI and Base. Everything in SASjs is MIT open-source and free for commercial use. The framework also explicitly underpins Data Controller for SAS, which connects the product and framework strands of the wider ecosystem. The GitHub organisation page notes that the SASjs project and its repositories are not affiliated with SAS Institute.

The resources page at sasjs.io lists the key GitHub repositories: the Macro Core library, the SASjs adapter for bidirectional SAS and JavaScript communication, the SASjs CLI, a minimal seed application and seed applications for React and Angular. Documentation sites cover the adapter, CLI, Macro Core library, SASjs Server and Data Controller. Useful external links from the same resources page include guides to building and deploying web applications with the SASjs CLI, scaffolding SAS projects with NPM and SASjs, extending Angular web applications on Viya and building a vanilla JavaScript application on SAS 9 or Viya. There is also mention of a Viya log parser, training resources, guides, FAQs and a glossary, pointing to an effort to support both implementation and adoption.

The SASjs CLI

The command-line tooling, documented at cli.sasjs.io, gives a clearer view of how SASjs approaches DevOps. The CLI is described as a Swiss-army knife with a flexible set of options and utilities for DevOps on SAS Viya, SAS 9 EBI and SASjs Server. Its core functions include creating a SAS Git repository in an opinionated way, compiling each service with all dependent macros, macro variables and pre- or post-code, building the master SAS deployment, deploying through local scripts and remote SAS programmes, running unit tests with coverage and generating a Doxygen documentation site with data lineage, homepage and project logo from the configuration file. There is also a feature for deploying a frontend as a streaming application, bypassing the need to access the SAS web server directly.

The full command set covers the project lifecycle. The CLI can add and authenticate targets, compile and build projects, deploy them to a SAS server location, generate documentation and manage contexts, folders and files. It can execute jobs, run arbitrary SAS code from the terminal, deploy a service pack and generate a snippets file for macro autocompletion in VS Code. It can also lint SAS code to identify common problems and run unit tests while collecting results in JSON or CSV format, together with logs. In effect, this brings SAS development considerably closer to the workflows commonly seen in mainstream software engineering, which may be especially valuable in organisations trying to standardise delivery practices across mixed technology estates.
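The lifecycle described above can be sketched as a terminal session. The command names follow the CLI documentation at cli.sasjs.io, but exact flags and prompts vary by version, so treat this as an illustrative outline rather than a verbatim recipe:

```shell
# Install the CLI, which is distributed as an npm package.
npm install -g @sasjs/cli

# Create an opinionated SAS project structure in a Git repository.
sasjs create my-sas-project

# Register a target server (Viya, SAS 9 EBI or SASjs Server);
# connection details are prompted interactively.
sasjs add

# Compile each service with its dependent macros, then build the
# master SAS deployment.
sasjs compile
sasjs build

# Deploy to the configured target and generate the Doxygen
# documentation site.
sasjs deploy
sasjs doc

# Lint SAS code for common problems and run unit tests,
# collecting results and logs.
sasjs lint
sasjs test
```

The point of interest is less any individual command than the shape of the workflow: scaffold, compile, build, deploy, document, lint, test, which mirrors mainstream software engineering pipelines.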

Presentations and the Wider SAS Community

The slides.sasjs.io collection adds another dimension by showing that these ideas have been presented in conference and user group settings. Available decks cover DevOps for MSUG, SUGG and WUSS, SASjs for application development, SASjs Server, AF and AF/SCL modernisation, SASjs for PHUSE, testing and a legacy SAS apps presentation for FANS in January 2023. While slide decks alone do not prove adoption or outcomes, they do show a sustained effort to communicate methods and patterns to the broader SAS community, consistent with the open documentation and MIT licensing found throughout the ecosystem.

Building a Modern Layer Around an Established Platform

The most useful way to understand this ecosystem is not as a single product but as a layered approach. At one level, there are services for building and supporting applications. At another, there are packaged tools such as Data Controller and the AF/SCL Kit. Underneath both sits SASjs, providing open-source components and delivery practices intended to make SAS development more structured and scalable. The combination of bespoke SAS-powered HTML5 applications, governed data update tooling, AF/SCL migration support and open-source DevOps utilities points to a coherent effort to modernise how SAS is delivered and used, without severing ties to established platforms. SAS remains the analytical engine, but the interfaces, workflows and operational controls around it are updated to reflect current expectations in web application design, governance and DevOps practice.

Adding a dropdown calendar to the macOS desktop with Itsycal

26th January 2026

In Linux Mint, there is a dropdown calendar that can be used for some advance planning. On Windows, there is a pop-up one on the taskbar that is just as useful. Neither is available on a default macOS desktop, and I missed the functionality. Thus, a search began.

That ended with my finding Itsycal, which does exactly what I need. Handily, it also integrates with the macOS Calendar app, though I use other places for my appointments. In some ways, that is more than I need. The dropdown pane with the ability to go back and forth through time suffices for me.

While it would be ideal if I could go year by year as well as month by month, which is the case on Linux Mint, I can manage with just the latter. Anything is better than having nothing at all. Sometimes, using more than one operating system broadens a mind.

Switching from uBlock Origin to AdGuard and Stylus

15th January 2026

A while back, uBlock Origin broke this website when I visited it. A long AI conversation left me with the impression that the mix of macOS, Firefox and WordPress presented an edge case that could not be resolved. Thus, I went looking for alternatives, because I may not be able to convince anyone else to look into an issue that could be so niche.

One thing that uBlock Origin makes very easy is custom blocking of web page elements, so that was something I needed to replace. A partial solution comes in the form of the Stylus extension. Though the CSS rules may need to be defined manually after interrogating a web page's structure, the same effects can be achieved. In truth, it is not as slick as using a GUI element selector, but I have learned to get past that.
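As an illustration of the manual approach that Stylus requires, a user style to hide unwanted elements might look like this; the selectors are hypothetical and would come from inspecting the page in the browser's developer tools:

```css
/* Hypothetical selectors found by inspecting the page structure;
   Stylus applies these rules only to the sites you specify. */
#promo-banner,
div.sidebar-advert {
    display: none !important; /* remove the element from the layout */
}
```

This replicates what uBlock Origin's element picker generates behind the scenes, just with the selector written by hand.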

For automatic ad blocking, I have turned to AdGuard AdBlocker. Thus far, it is doing what I need it to do. One thing to note is that it does nothing to stop you from registering in website visitor analytics, not that this bothers me at all. That is something uBlock Origin does out of the box, while my new ad blocker sticks more narrowly to its chosen task, and that suffices for now.

In summary, I have altered my tooling for controlling what websites show me. It is all too easy for otherwise solid tools to register false positives and cause other obstructions. That is why I find myself swapping between them every so often; after all, website code can be far too variable.

Maybe it highlights how challenging it is to make ad blocking and similar software when your test cases cannot be as extensive as they need to be. Add in something of an arms race between advertisers and ad blockers, and the ante gets upped even more. It does not help that we want these things free of charge, too.

Finding a better way to uninstall Mac applications

14th January 2026

If you were to consult an AI about uninstalling software under macOS, you would be given a list of commands to run in the Terminal. That feels far less slick than either Linux or Windows. Thus, I set about looking for a cleaner solution. It came in the form of AppCleaner from FreeMacSoft.

This finds the files to remove once you have supplied the name of the app that you wish to uninstall. Once you have reviewed those, you can set it to move them to the Bin, from which they can be expunged. Handily, this automates the manual graft that otherwise would be needed.
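To illustrate the kind of clean-up AppCleaner automates, here is a sketch that uses a throwaway sandbox directory in place of the real /Applications and ~/Library locations; the application name and bundle identifier are made up:

```shell
# Work in a temporary sandbox rather than the real filesystem.
SANDBOX="$(mktemp -d)"
APP="ExampleApp"                  # hypothetical application name
BUNDLE="com.example.ExampleApp"   # hypothetical bundle identifier

# Simulate the places where a macOS app typically leaves files.
mkdir -p "$SANDBOX/Applications/$APP.app" \
         "$SANDBOX/Library/Application Support/$APP" \
         "$SANDBOX/Library/Caches/$BUNDLE" \
         "$SANDBOX/Library/Preferences"
touch "$SANDBOX/Library/Preferences/$BUNDLE.plist"

# Find everything matching the app's name or bundle identifier
# and remove it, children before parents.
find "$SANDBOX" -depth \( -name "$APP*" -o -name "$BUNDLE*" \) \
     -exec rm -rf {} +
```

Roughly speaking, AppCleaner performs the equivalent search across the real /Applications and ~/Library folders, then sends the matches to the Bin for review instead of deleting them outright.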

It amazes me that such an operation is not handled within macOS itself, instead of leaving it to the software providers themselves, or third-party tools like this one. Otherwise, a Mac could get very messy, though Homebrew offers ways of managing software installations for certain cases. Surprisingly, the situation is more free-form than on iOS, too.
