Tag Archive MySQL

Useful Python packages for working with data

October 14th, 2021

My response to changes in the technology stack used in clinical research is to develop some familiarity with programming and scripting platforms that complement and compete with SAS, a system with which I have been programming since 2000. One of these has been R but Python is another that has taken up my attention and I now also have Julia in my sights as well. There may be others to assess in the fullness of time.

While I first started to explore the Data Science world in the autumn of 2017, it was in the autumn of 2019 that I began to complete LinkedIn training courses on the subject. Good though they were, I find that I need to actually use a tool in order to better understand it. At that time, I did get to hear about Python packages like Pandas, NumPy, SciPy, Scikit-learn, Matplotlib, Seaborn and Beautiful Soup  though it took until of spring of this year for me to start gaining some hands-on experience with using any of these.

During the summer of 2020, I attended a BCS webinar on the CodeGrades initiative, a programming mentoring scheme inspired by the way classical musicianship is assessed. In fact, one of the main progenitors is a trained classical musician and teacher of classical music who turned to Python programming when starting a family so as to have a more stable income. The approach is that a student selects a project and works their way through it with mentoring and periodic assessments carried out in a gentle and discursive manner. Of course, the project has to be engaging for the learning experience to stay the course and that point came through in the webinar.

That is one lesson that resonates with me with subjects as diverse as web server performance and the ongoing pandemic pandemic supplying data and there are other sources of public data to examine as well before looking through my own personal archive gathered over the decades. Some subjects are uplifting while others are more foreboding but the key thing is that they sustain interest and offer opportunities for new learning. Without being able to dream up new things to try, my knowledge of R and Python would not be as extensive as it is and I hope that it will help with learning Julia too.

In the main, my own learning has been a solo effort with consultation of documentation along with web searches that have brought me to the likes of Real Python, Stack Abuse, Data Viz with Python and R and others for longer tutorials as well as threads on Stack Overflow. Usually, the web searching begins when I need a steer on a particular or a way to resolve a particular error or warning message but books always are worth reading even if that is the slower route. Those from the Dummies series or from O’Reilly have proved must useful so far but I do need to read them more completely than I already have; it is all too tempting to go with the try the “programming and search for solutions as you go” approach instead.

To get going, many choose the Anaconda distribution to get Jupyter notebook functionality but I prefer a more traditional editor so Spyder has been my tool of choice for Python programming and there are others like PyCharm as well. Spyder itself is written in Python so it can be installed using pip from PyPi like other Python packages. It has other dependencies like Pylint for code management activities but these get installed behind the scenes.

The packages that I first met in 2019 may be the mainstays for doing data science but I have discovered others since then. It also seems that there is porosity between the worlds of R an Python so you get some Python packages aping R packages and R has the Reticulate package for executing Python code. There are Python counterparts to such Tidyverse stables as dply and ggplot2 in the form of Siuba and Plotnine, respectively. The syntax of these packages are not direct copies of what is executed in R but they are close enough for there to be enough familiarity for added user friendliness compared to Pandas or Matplotlib. The interoperability does not stop there for there is SQLAlchemy for connecting to MySQL and other databases (PyMySQL is needed as well) and there also is SASPy for interacting with SAS Viya.

Pyhton may not have the speed of Julia but there are plenty of packages for working with larger workloads. Of these, Dask, Modin and RAPIDS all have there uses for dealing with data volumes that make Pandas code crawl. As if to prove that there are plenty of libraries for various forms of data analytics, data science, artificial intelligence and machine learning, there also are the likes of Keras, TensorFlow and NetworkX. These are just a selection of what is available and there is no need not to check out more. It may be tempting to stick with the most popular packages all the time, especially when they do so much, but it never hurst to keep an open mind either.

Installing Perl modules using CPAN on Linux Mint 19.2

September 28th, 2019

My online travel photo gallery is a self-coded set of PHP scripts that read data from tables in a MySQL database. These tables are built from input XML files using a Perl script that itself creates and executes an SQL script. The Perl script also does some image processing using GraphicsMagick commands to resize images and to add copyright information and image framing. Because this processed one image at a time in a sequential manner, it was taking several minutes to complete and only partly used the capacity of the PC that I used.

This led me to look at adding parallel processing and that is what brought me to looking at the Parallel::ForkManager Perl module. An alternative approach might have been to add new images in such a way as not to need the full run involving hundreds of image files but that will take more work and I fancied having a look at parallelising things anyway.

If it was not there already, the first act would have been to install build-essential to get access to the cpan command. The following command accomplishes this:

sudo apt-get install build-essential

Once that is there, the cpan command needs running and some questions answered to get things going. The first question to answer is whether you want setup to be as automated as possible and the default answer of yes worked for me. The next question to answer regards the approach that cpan takes when installing modules and I chose sudo here (local::lib is the default value and manual is another option). After this, cpan drops into its own command shell. Here, I issued two more commands to continue the basic setup by updating CPAN.pm to the latest version and adding Bundle::CPAN to optimise the module further:

make install
install Bundle::CPAN

Continuing the last of these may need extra intervention to confirmation the suggested default of exit at one point in its operation and that takes a little time to complete. It is after this that Parallel::ForkManager can be installed using the following command:

install Parallel::ForkManager

That completed quickly and the cpan shell was exited using its exit command. Then, the new module was available in scripting after that. The actual use of this module is something that hope to describe in another post so i am ending this one here and the same process is just as applicable to setting up cpan and adding any other Perl CPAN module.

Moving a website from shared hosting to a virtual private server

November 24th, 2018

This year has seen some optimisation being applied to my web presences guided by the results of GTMetrix scans. It is was then that I realised how slow things were so server loads were reduced. Anything that slowed response times, such as WordPress plugins, got removed. Matomo usage also was curtailed in favour of Google Analytics while HTML, CSS and JS minification followed. What had not happened was a search for a faster server and another website has been moved onto a virtual private server (VPS) to see how that would go.

Speed was not the only consideration since security was a factor too. After all, a VPS is more locked away from other users that a folder on a shared server. There also is the added sense of control so Let’s Encrypt SSL certificates can be added using the Electronic Frontier Foundation’s Certbot. That avoids the expense of using an SSL certificate provided through my shared hosting provider and a successful transition for my travel website may mean that this one undergoes the same move.

For the VPS, I chose Ubuntu 18.04 as its operating system and it came with the LAMP stack already in place. Have offload development websites, the mix of Apache, MySQL and PHP is more familiar to me than anything using Nginx or Python. It also means that .htaccess files become more useful than they were on my previous Nginx-based platform. Having full access to the operating system by means of SSH helps too and should mean that I have less calls on technical support since I can do more for myself. Any extra tinkering should not affect others either since this type of setup is well known to me and having an offline counterpart means that anything riskier is tried there beforehand.

Naturally, there were niggles to overcome with the move. The first to fix was to make the MySQL instance accept calls from outside the server so that I could migrate data there from elsewhere and I even got my shared hosting setup to start using the new database to see what performance boost it might give. To make all this happen, I first found the location of the relevant my.cnf configuration file using the following command:

find / -name my.cnf

Once I had the right file, I commented out the following line that it contained and restarted the database service afterwards using another command to stop the appearance of any error 111 messages:

bind-address 127.0.0.1
service mysql restart

After that, things worked as required and I moved onto another matter: uploading the requisite files. That meant installing an FTP server so I chose proftpd since I knew that well from previous tinkering. Once that was in place, file transfer commenced.

When that was done, I could do some testing to see if I had an active web server that loaded the website. Along the way, I also instated some Apache modules like mod-rewrite using the a2enmod command, restarting Apache each time I enabled another module.

Then, I discovered that Textpattern needed php-7.2-xml installed so the following command was executed to do this:

apt install php7.2-xml

Then, the following line was uncommented in the correct php.ini configuration file that I found using the same method as that described already for the my.cnf configuration and that was followed by yet another Apache restart:

extension=php_xmlrpc.dll

Addressing the above issues yield enough success for me to change the IP address in my Cloudflare dashboard so it point at the VPS and not the shared server. The changeover happened seamlessly without having to await DNS updates as once would have been the case. It had the added advantage of making both WordPress and Textpattern work fully.

With everything working to my satisfaction, I then followed the instructions on Certbot to set up my new Let’s Encrypt SSL certificate. Aside from a tweak to a configuration file and another Apache restart, the process was more automated than I had expected so I was ready to embark on some fine tuning to embed the new security arrangements. That meant updating .htaccess files and Textpattern has its own so the following addition was needed there:

RewriteCond %{HTTPS} !=on
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

This complemented what was already in the main .htaccess file and WordPress allows you include http(s) in the address it uses so that was another task completed. The general .htaccess only needed the following lines to be added:

RewriteCond %{SERVER_PORT} 80
RewriteRule ^(.*)$ https://www.assortedexplorations.com/$1 [R,L]

What all these achieve is to redirect insecure connections to secure ones for every visitor to the website. After that, internal hyperlinks without https needed updating along with any forms so that a padlock sign could be shown for all pages.

With the main work completed, it was time to sort out a lingering niggle regarding the appearance of an FTP login page every time a WordPress installation or update was requested. The main solution was to make the web server account the owner of the files and directories but the following line was added to wp-config.php as part of the fix even if it probably is not necessary:

define('FS_METHOD', 'direct');

There also was the non-operation of WP Cron and that was addressed using WP-CLI and a script from Bjorn Johansen. To make double sure of its effectiveness, the following was added to wp-config.php to turn off the usual WP-Cron behaviour:

define('DISABLE_WP_CRON', true);

Intriguingly, WP-CLI offers a long list of possible commands that are worth investigating. A few have been examined but more await attention.

Before those, I still need to get my new VPS to send emails. So far, sendmail has been installed, the hostname changed from localhost and the server restarted. More investigations are needed but what I have not is faster that what was there before so the effort has been rewarded already.

Setting up MySQL on Sabayon Linux

September 27th, 2012

For quite a while now, I have offline web servers for doing a spot of tweaking and tinkering away from the gaze of web users that visit what I have on there. Therefore, one of the tests that I apply to any prospective main Linux distro is the ability to set up a web server on there. This is more straightforward for some than for others. For Ubuntu and Linux Mint, it is a matter of installing the required software and doing a small bit of configuration. My experience with Sabayon is that it needs a little more effort than this and I am sharing it here for the installation of MySQL.

The first step is too install the software using the commands that you find below. The first pops the software onto the system while second completes the set up. The --basedir option is need with the latter because it won’t find things without it. It specifies the base location on the system and it’s /usr in my case.

sudo equo install dev-db/mysql
sudo /usr/bin/mysql_install_db --basedir=/usr

With the above complete, it’s time to start the database server and set the password for the root user. That’s what the two following commands achieve. Once your root password is set, you can go about creating databases and adding other users using the MySQL command line

sudo /etc/init.d/mysql start
mysqladmin -u root password ‘password’

The last step is to set the database server to start every time you start your Sabayon system. The first command adds an entry for MySQL to the default run level so that this happens. The purpose of the second command is check that this happened before restarting your computer to discover if it really happens. This procedure also is needed for having an Apache web server behave in the same way so the commands are worth having and even may have a use for other services on your system. ProFTP is another that comes to mind, for instance.

sudo rc-update add mysql default
sudo rc-update show | grep mysql

An in situ upgrade to Linux Mint 12

December 4th, 2011

Though it isn’t the recommended approach, I have ended up upgrading to Linux Mint 12 from Linux Mint 11 using an in situ route. Having attempted this before with a VirtualBox hosted installation, I am well aware of the possibility of things going wrong. Then, a full re-installation was needed to remedy the situation. With that in mind, I made a number of backups in the case of an emergency fresh installation of the latest release of Linux Mint. Apache and VirtualBox configuration files together with MySQL backups were put where they could be retrieved should that be required. The same applied to the list of installed packages on my system. So far, I haven’t needed to use these but there is no point in taking too many chances.

The first step in an in-situ Linux Mint upgrade is to edit /etc/apt/sources.list. In the repository location definitions, any reference to katya (11) was changed to lisa (for 12) and the same applied to any appearance of natty (Ubuntu 11.04) which needed to become oneiric (Ubuntu 11.10). With that done, it was time to issue the following command (all one line even if it is broken here):

sudo apt-get update && sudo apt-get upgrade && sudo apt-get
dist-upgrade

Once that had completed, it was time to add the new additions that come with Linux Mint 12 to my system using a combination of apt-get, aptitude and Synaptic; the process took a few cycles. GNOME already was in place from prior experimentation  so there was no need to add this anew. However, I need to instate MGSE to gain the default Linux Mint customisations of GNOME 3. Along with that, I decided to add MATE, the fork of GNOME 2. That necessitated the removal of two old libraries (libgcr0 and libgpp11, if I remember correctly but it will tell you what is causing any conflict) using Synaptic. With MGSE and MATE in place, it was time to install LightDM and its Unity greeter to get the Linux Mint login screen. Using GDM wasn’t giving a very smooth visual experience and Ubuntu, the basis of Linux Mint, uses LightDM anyway. Even using the GTK greeter with LightDM produced a clunky login box in front of a garish screen. Configuration tweaks could have improved on this but it seems that using LightDM and Unity greeter is what gives the intended set up and experience.

With all of this complete, the system seemed to be running fine until the occasional desktop freeze occurred with Banshee running. Blaming that, I changed to Rhythmbox instead though that helped only marginally. While this might be blamed on how I did the upgraded my system, things seemed to have steadied themselves in the week since then. As a test, I had the music player going for a few hours and there was no problem. With the call for testing of an update to MATE a few days ago, it now looks as if there may have been bugs in the original release of Linux Mint 12. Daily updates have added new versions of MGSE and MATE so that may have something to do with the increase in stability. Even so, I haven’t discounted the possibility of needing to do a fresh installation of Linux Mint 12 just yet. However, if things continue as they are, then it won’t be needed and that’s an upheaval avoided should things go that way. That’s why in situ upgrades are attractive though rolling distros like Arch Linux (these words are being written on a system running this) and LMDE are moreso.