Technology Tales

Adventures & experiences in contemporary technology

Useful Python packages for working with data

14th October 2021

My response to changes in the technology stack used in clinical research is to develop some familiarity with programming and scripting platforms that complement and compete with SAS, a system with which I have been programming since 2000. One of these has been R but Python is another that has taken up my attention and I now also have Julia in my sights as well. There may be others to assess in the fullness of time.

While I first started to explore the Data Science world in the autumn of 2017, it was in the autumn of 2019 that I began to complete LinkedIn training courses on the subject. Good though they were, I find that I need to actually use a tool in order to better understand it. At that time, I did get to hear about Python packages like Pandas, NumPy, SciPy, Scikit-learn, Matplotlib, Seaborn and Beautiful Soup  though it took until of spring of this year for me to start gaining some hands-on experience with using any of these.

During the summer of 2020, I attended a BCS webinar on the CodeGrades initiative, a programming mentoring scheme inspired by the way classical musicianship is assessed. In fact, one of the main progenitors is a trained classical musician and teacher of classical music who turned to Python programming when starting a family so as to have a more stable income. The approach is that a student selects a project and works their way through it with mentoring and periodic assessments carried out in a gentle and discursive manner. Of course, the project has to be engaging for the learning experience to stay the course and that point came through in the webinar.

That is one lesson that resonates with me with subjects as diverse as web server performance and the ongoing pandemic pandemic supplying data and there are other sources of public data to examine as well before looking through my own personal archive gathered over the decades. Some subjects are uplifting while others are more foreboding but the key thing is that they sustain interest and offer opportunities for new learning. Without being able to dream up new things to try, my knowledge of R and Python would not be as extensive as it is and I hope that it will help with learning Julia too.

In the main, my own learning has been a solo effort with consultation of documentation along with web searches that have brought me to the likes of Real Python, Stack Abuse, Data Viz with Python and R and others for longer tutorials as well as threads on Stack Overflow. Usually, the web searching begins when I need a steer on a particular or a way to resolve a particular error or warning message but books always are worth reading even if that is the slower route. Those from the Dummies series or from O’Reilly have proved must useful so far but I do need to read them more completely than I already have; it is all too tempting to go with the try the “programming and search for solutions as you go” approach instead.

To get going, many choose the Anaconda distribution to get Jupyter notebook functionality but I prefer a more traditional editor so Spyder has been my tool of choice for Python programming and there are others like PyCharm as well. Spyder itself is written in Python so it can be installed using pip from PyPi like other Python packages. It has other dependencies like Pylint for code management activities but these get installed behind the scenes.

The packages that I first met in 2019 may be the mainstays for doing data science but I have discovered others since then. It also seems that there is porosity between the worlds of R an Python so you get some Python packages aping R packages and R has the Reticulate package for executing Python code. There are Python counterparts to such Tidyverse stables as dply and ggplot2 in the form of Siuba and Plotnine, respectively. The syntax of these packages are not direct copies of what is executed in R but they are close enough for there to be enough familiarity for added user friendliness compared to Pandas or Matplotlib. The interoperability does not stop there for there is SQLAlchemy for connecting to MySQL and other databases (PyMySQL is needed as well) and there also is SASPy for interacting with SAS Viya.

Pyhton may not have the speed of Julia but there are plenty of packages for working with larger workloads. Of these, Dask, Modin and RAPIDS all have there uses for dealing with data volumes that make Pandas code crawl. As if to prove that there are plenty of libraries for various forms of data analytics, data science, artificial intelligence and machine learning, there also are the likes of Keras, TensorFlow and NetworkX. These are just a selection of what is available and there is no need not to check out more. It may be tempting to stick with the most popular packages all the time, especially when they do so much, but it never hurst to keep an open mind either.

Contents not displaying for Shared Folders on a Fedora 32 guest instance in VirtualBox

26th July 2020

While some Linux distros like Fedora install VirtualBox drivers during installation time, I prefer to install the VirtualBox Guest Additions themselves. Before doing this, it is best to remove the virtualbox-guest-additions package from Fedora to avoid conflicts. After that, execute the following command to ensure that all prerequisites for the VirtualBox Guest Additions are in place prior to mounting the VirtualBox Guest Additions ISO image and installing from there:

sudo dnf -y install gcc automake make kernel-headers dkms bzip2 libxcrypt-compat kernel-devel perl

During the installation, you may encounter a message like the following:

ValueError: File context for /opt/VBoxGuestAdditions-<VERSION>/other/mount.vboxsf already defined

This is generated by SELinux so the following commands need executing before the VirtualBox Guest Additions installation is repeated:

sudo semanage fcontext -d /opt/VBoxGuestAdditions-<VERSION>/other/mount.vboxsf
sudo restorecon /opt/VBoxGuestAdditions-<VERSION>/other/mount.vboxsf

Without doing the above step and fixing the preceding error message, I had an issue with mounting of Shared Folders whereby the mount point was set up but no folder contents were displayed. This happened even when my user account was added to the vboxsf group and it proved to be the SELinux context issue that was the cause.

Removing obsolete libraries from Flatpak

1st February 2020

Along with various pieces of software, Flatpak also installs KDE and GNOME libraries needed to support them. However, it does not always remove obsolete versions of those libraries whenever software gets updated. One result is that messages regarding obsolete versions of GNOME may be issued and this has been known to cause confusion because there is the GNOME instance that is part of a Linux distro like Ubuntu and using Flatpak adds another one for its software packages to use. My use of Linux Mint may lesson the chances of misunderstanding.

Thankfully, executing a single command will remove any obsolete Flatpak libraries so the messages no longer appear and there then is no need to touch your actual Linux installation. This then is the command that sorted it for me:

flatpak uninstall --unused && sudo flatpak repair

The first part that removes any unused libraries is run as a normal user so there is no error in the above command. Administrative privileges are needed for the second section that does any repairs that are needed. It might be better if Flatpak did all this for you using the update command but that is not how the thing works. At least, there is a quick way to address this state of affairs and there might be some good reasons for having things work as they do.

Ensuring that Flatpak remains up to date on Linux Mint 19.2

25th October 2019

The Flatpak concept offers a useful way of getting the latest version of software like LibreOffice or GIMP on Linux machines because repositories are managed conservatively when it comes to the versions of included software. Ubuntu has Snaps, which are similar in concept. Both options bundle dependencies with the packaged software so that its operation can use later versions of system libraries that what may be available with a particular distribution.

However, even Flatpak depends on what is available through the repositories for a distribution as I found when a software update needed a version of the tool. The solution was to add PPA using the following command and agreeing to the prompts that arise (answering Y, in other words):

sudo add-apt-repository ppa:alexlarsson/flatpak

With the new PPA instated, the usual apt commands were used to update the Flatpak package and continue with the required updates. Since then, all has gone smoothly as expected.

Updating Flatpack applications on Linux Mint 19

10th August 2018

Since upgrading to Linux Mint 19, I have installed some software from Flatpack. The cause for my curiosity was that you could have the latest versions of applications like GIMP or Libreoffice without having to depend on a third-party PPA. Installation is straightforward given the support built into Linux Mint. You just need to download the relevant package from the Flatpack website and running the file through the GUI installer. Because the packages come with extras to ensure cross-compatibility, more disk space is used but there is no added system overhead beyond that from what I have seen. Updating should be as easy as running the following single command too:

flatpack update

However, I needed to do a little extra work before this was possible. The first step was to update the configuration file at ~/.local/share/flatpak/repo/config to add the following lines:

[remote "flathub"]
gpg-verify=true
gpg-verify-summary=true
url=https://flathub.org/repo/
xa.title=Flathub

Once that was completed, I ran the following commands to import the required GPG key:

wget https://flathub.org/repo/flathub.gpg
flatpak --user remote-modify --gpg-import=flathub.gpg flathub

With this complete, I was able to run the update process and update any applications as necessary. After that first run, it has been integrated in to my normal processes by adding the command to the relevant alias definition.

Batch conversion of DNG files to other file types with the Linux command line

8th June 2016

At the time of writing, Google Drive is unable to accept DNG files, the Adobe file type for RAW images from digital cameras. The uploads themselves work fine but the additional processing at the end that I believe is needed for Google Photos appears to be failing. Because of this, I thought of other possibilities like uploading them to Dropbox or enclosing them in ZIP archives instead; of these, it is the first that I have been doing and with nothing but success so far. Another idea is to convert the files into an image format that Google Drive can handle and TIFF came to mind because it keeps all the detail from the original image. In contrast, JPEG files lose some information because of the nature of the compression.

Handily, a one line command does the conversion for all files in a directory once you have all the required software installed:

find -type f | grep -i “DNG” | parallel mogrify -format tiff {}

The find and grep commands are standard with the first getting you a list of all the files in the current directory and sending (piping) these to the grep command so the list only retains the names of all DNG files. The last part uses two commands for which I found installation was needed on my Linux Mint machine. The parallel package is the first of these and distributes the heavy workload across all the cores in your processor and this command will add it to your system:

sudo apt-get install parallel

The mogrify command is part of the ImageMagick suite along with others like convert and this is how you add that to your system:

sudo apt-get install imagemagick

In the command at the top, the parallel command works through all the files in the list provided to it and feeds them to mogrify for conversion. Without the use of parallel, the basic command is like this:

mogrify -format tiff *.DNG

In both cases, the -format switch specifies the output file type with tiff triggering the creation of TIFF files. The *.DNG portion itself captures all DNG files in a directory but {} does this in the main command at the top of this post. If you wanted JPEG ones, you would replace tiff with jpg. Shoudl you ever need them, a full list of what file types are supported is produced using the identify command (also part of ImageMagick) as follows:

identify -list format

ERROR: Can’t find the archive-keyring

10th April 2014

When I recently did my usual system update for the stable version Ubuntu GNOME, there were some updates pertaining to apt and the process failed when I executed the following command:

sudo apt-get upgrade

Usefully, some messages were issued and here’s a flavour:

Setting up apt (0.9.9.1~ubuntu3.1) …
ERROR: Can’t find the archive-keyring
Is the ubuntu-keyring package installed?
dpkg: error processing apt (--configure):
subprocess installed post-installation script returned error exit status 1
Errors were encountered while processing:
apt
E: Sub-process /usr/bin/dpkg returned an error code (1)

Some searching on the web revealed that the problem was that there were no files in /usr/share/keyring when there should have been and I had not removed them myself so I have no idea how they disappeared. Various remedies were tried and any that needed software installed were non-starters because apt was disabled by the lack of keyring files. The workaround that restored things for me was to take a copy of the files in /usr/share/keyring from an Ubuntu GNOME 14.04 installation in a VirtualBox VM and copy them in to the same location in its Ubuntu GNOME 13.10 host. For those without such resources, I have packaged them in a zip file below. Other remedies like Y PPA also were suggested where I was reading but that software package needed installing beforehand so it was little use to me when the likes of Synaptic were disabled. If there are other remedies that do not involve an operating system re-installation, I would like to know about them too as well as possible causes for the file loss in the first place and how to avoid these.

Ubuntu Keyrings

Sorting out a system update failure for FreeBSD

3rd April 2014

With my tendency to apply Linux updates using the command, I was happy to see that something similar was possible in FreeBSD too. The first step is to fire up a terminal session and drop into root using the su command. That needs the root superuser password in order to continue and the next step is to update the local repositories using the following command:

pkg update

After that, it is time download updated packages and install these by issuing this command:

pkg upgrade

Most of the time, that is sufficient but I discovered that there are times when the above fails and additional interventions are needed. What I had uncovered were dependency error messages and I set to looking around the web for remedies to this. One forum question that was similar to what I had met with the suggestion of consulting the file called UPDATING in /usr/ports/. An answer like that looks unhelpful but for the inclusion of advice where extra actions were needed. Also, there is a useful article on updating FreeBSD ports that gives more in the way of background knowledge so you understand the more about what needs doing.

Following both that and the UPDATING  file resulted in my taking the following sequence of steps. The first act was to download and initialise the Ports Collection, a set of build instructions.

portsnap fetch extract

The above is a one time only action so future updates are done as follows:

portsnap fetch update

With an up to date Ports Collection in place, it was time to install portman:

pkg install portman

A look through /usr/ports/UPDTAING revealed the commands I needed for updating Python and Perl to address the dependency problem that I was having:

portmaster -o devel/py-setuptools27 devel/py-setuptools
portmaster -r py\*setuptools

With those completed, I re-ran pkg update again and all was well. The extra actions needed to get that result will not get forgotten and I am sharing them on here so I know where they are. If anyone else has use for them, that would be even better.

A fallback method of installing Nightingale in Linux

3rd December 2013

When I upgraded to Ubuntu GNOME 13.10 and went for the 64-bit variant, I tried a previously tried and tested approach for installing Nightingale that used a PPA only for it not to work. At that point, the repository had not caught up with the latest Ubuntu release (it has by the time of writing) and other pre-compiled packages would not work either. However, there was one further possibility left and that was downloading a copy of the source code and compiling that. My previous experiences of doing that kind of thing have not been universally positive so it was not my first choice but I gave it a go anyway.

In order to get the source code, I first needed to install Git so I could take a copy from the version controlled repository and the following command added the tool and all its dependencies:

sudo apt-get install git autoconf g++ libgtk2.0-dev libdbus-glib-1-dev libtag1-dev libgstreamer-plugins-base0.10-dev zip unzip

With that lot installed, it was time to checkout a copy of the latest source code and I went with the following:

git clone https://github.com/nightingale-media-player/nightingale-hacking.git

The next step was to go into the nightingale-hacking sub-folder and issue the following command:

./build.sh

That should produce a sub-directory named nightingale that contains the compiled executable files. If this exists, it can be copied into /opt. If not, then create a folder named nightingale under /opt using copy the files from ~/nightingale-hacking/compiled/dist into that location. Ubuntu GNOME 13.10 comes with GNOME Shell 3.8, the next step took a little fiddling before it was sorted: adding an icon to application menu or dashboard. This involved adding a file called nightingale.desktop in /usr/share/applications/ with the following contents:

[Desktop Entry]
Name=Nightingale
Comment=Play music
TryExec=/opt/nightingale/nightingale
Exec=/opt/nightingale/nightingale
Icon=/usr/share/pixmaps/nightingale.xpm
Type=Application
X-GNOME-DocPath=nightingale/index.html
X-GNOME-Bugzilla-Bugzilla=Nightingale
X-GNOME-Bugzilla-Product=nightingale
X-GNOME-Bugzilla-Component=BugBuddyBugs
X-GNOME-Bugzilla-Version=1.1.2
Categories=GNOME;Audio;Music;Player;AudioVideo;
StartupNotify=true
OnlyShowIn=GNOME;Unity;
Keywords=Run;
Actions=New
X-Ubuntu-Gettext-Domain=nightingale

[Desktop Action New]
Name=Nightingale
Exec=/opt/nightingale/nightingale
OnlyShowIn=Unity

It was created from a copy of another *.desktop file and the categories in there together with the link to the icon were as important as the title and took a little tinkering before all was in place.  Also, you may find that /opt/nightingale/chrome/icons/default/default.xpm needs to be become /usr/share/pixmaps/nightingale.xpm using the cp command before your new menu entry gains an icon to go with it. While the steps that I describe here worked for me, there is more information on the Nightingale wiki if you need it.

Installing Citrix Receiver 13.0 in Ubuntu GNOME 13.10 64-bit

28th November 2013

Installing the latest version of Citrix Receiver (13.0 at the time of writing) on 64-bit Ubuntu should be as simple as downloading the required DEB package and double-clicking on the file so that Ubuntu Software Centre can work its magic. Unfortunately, the 64-bit DEB file is faulty so that means that the Ubuntu community how-to guide for Citrix still is needed. In fact, any user of Linux Mint or another distro that uses Ubuntu as its base would do well to have a look at that Ubuntu link.

For sake of completeness, I still am going to let you in on the process that worked for me. Once the DEB file has been downloaded, the first task is to creating a temporary folder where the DEB file’s contents can be extracted:

mkdir ica_temp

With that in place, it then is time to do the extraction and it needs two commands with the second of these need to extract the control file while the first extracts everything else.

sudo dpkg-deb -x icaclient- ica_temp
sudo dpkg-deb --control icaclient- ica_temp/DEBIAN

It is the control file that has been the cause of all the bother because it refers to unavailable dependencies that it really doesn’t need anyway. To open the file for editing, issue the following command:

sudo gedit ica_temp/DEBIAN/control

Then change line 7 (it should begin with Depends:) to: Depends: libc6-i386 (>= 2.7-1), lib32z1, nspluginwrapper. There are other software packages in there that Ubuntu no longer supports and they are not needed anyway. With the edit made and the file saved, the next step is to build a new DEB package with the corrected control file:

dpkg -b ica_temp icaclient-modified.deb

Once you have the package, the next step is to install it using the following command:

sudo dpkg -i icaclient-modified.deb

If it fails, then you have missing dependencies and the following command should sort these before a re-run of the above command again:

sudo apt-get install libmotif4:i386 nspluginwrapper lib32z1 libc6-i386

With Citrix Receiver installed, there is one more thing that is needed before you can use it freely. This is to put Thawte security certificate files into /opt/Citrix/ICAClient/keystore/cacerts. What I had not realised until recently was that many of these already are in /usr/share/ca-certificates/mozilla and linking to them with the following command makes them available to Citrix receiver:

sudo ln -s /usr/share/ca-certificates/mozilla/* /opt/Citrix/ICAClient/keystore/cacerts/

Another approach is to download the Thawte certificates and extract the archive to /tmp/. From there they can be copied to /opt/Citrix/ICAClient/keystore/cacerts and I copied the Thawte Personal Premium certificate as follows:

sudo cp /tmp/Thawte Root Certificates/Thawte Personal Premium CA/Thawte Personal Premium CA.cer /opt/Citrix/ICAClient/keystore/cacerts/

Until I found out about what was in the Mozilla folder, I simply picked out the certificate mentioned in the Citrix error message and copied it over like the above. Of course, all of this may seem like a lot of work to those who are non-tinkerers and I have added a repaired 64-bit DEB package that incorporates all of the above and should not need any further intervention aside from installing it using GDebi, Ubuntu’s Software Centre, dpkg or anything else that does what’s needed.

  • All the views that you find expressed on here in postings and articles are mine alone and not those of any organisation with which I have any association, through work or otherwise. As regards editorial policy, whatever appears here is entirely of my own choice and not that of any other person or organisation.

  • Please note that everything you find here is copyrighted material. The content may be available to read without charge and without advertising but it is not to be reproduced without attribution. As it happens, a number of the images are sourced from stock libraries like iStockPhoto so they certainly are not for abstraction.

  • With regards to any comments left on the site, I expect them to be civil in tone of voice and reserve the right to reject any that are either inappropriate or irrelevant. Comment review is subject to automated processing as well as manual inspection but whatever is said is the sole responsibility of the individual contributor.