TOPIC: OPEN FORMATS
Moves to Hugo
30th November 2022

What amazes me is how things can become more complicated over time. As long as you knew HTML, CSS and JavaScript, building a website was not onerous, provided web browsers played ball. Since then, things have become easier to use yet more complex at the same time. One example is WordPress: in the early days, themes were much simpler than they are now. The web has also become less secure over time, and that adds to the complexity as well. It sometimes feels as if there is a choice to be made between ease of use and simplicity.
It is against that background that I reassessed the technology that I was using on my public transport and Irish history websites. The former used WordPress, while the latter used Drupal. The irony was that the simpler website was using the more complex platform, so the move to something simpler was probably overdue. Alternatives to WordPress were surveyed for the first of the pair, but none had quite the flexibility, pervasiveness and ease of use that WordPress offers.
There is another approach that has been gaining notice recently. One part of this is the use of Markdown for web publishing. This is a simple, distraction-free plain text format that can be transformed into something more readable. It sees use in blogs hosted on GitHub, and it also facilitates the generation of static websites. The clutter is absent for those who have no need of the Gutenberg editor in WordPress.
With the content written in Markdown, it can be fed to a static website generator like Hugo. Using defined templates and fixed assets like CSS, together with images and other static files, it slots the content into HTML files very speedily, since it is written in the Go programming language. Once you get acclimatised, any folder structure can be used, so you get full flexibility in how you build out your website. Sitemaps and RSS feeds can be built at the same time, both using the same input as the HTML files.
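To give a flavour of the workflow, here is a minimal sketch of the commands involved; the site name is hypothetical:

hugo new site mysite
cd mysite
hugo new posts/first-article.md
hugo server -D
hugo

The first three commands scaffold a site and add an article, hugo server -D previews it locally with drafts included, and a bare hugo builds the static files into the public directory.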
In a nutshell, it automates what once needed manual effort with a code editor or a visual web page editor. The use of HTML snippets and layouts means that there is no need to hand-code content, as there was at the start of the web. It also helps that Bootstrap can be brought in using Node, which gives a basis for any styling. Then, SCSS can take care of things, giving even more automation.
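As an illustration of that piece, and assuming the Bootstrap npm package together with Hugo's extended build for SCSS compilation (the entry point path is my own choice, not a fixed convention), the setup can be as brief as this:

npm install bootstrap
# Then, in an SCSS entry point such as assets/scss/main.scss:
#   @import "../../node_modules/bootstrap/scss/bootstrap";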
Given that there is no database involved in any of this, the required information has to be stored somewhere, and neither the Markdown content nor the layout files contain all that is needed. The main site configuration is defined in a single TOML file, and you can have one of these for each publishing destination; I have development and production servers, which makes this a very handy feature. Otherwise, every Markdown file needs a YAML header where the title, template reference, publishing status and other similar information get defined. The layouts are then linked to their components, and control logic and other advanced functionality can be added too.
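For illustration, such a header might look like the following; the values are hypothetical, while the keys are standard Hugo front matter:

---
title: "Trams in Dublin"
date: 2022-11-30
draft: false
layout: single
---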
Because static files are being created, site searching, commenting and contact pages cannot work as they would on a dynamic web platform. Often, external services are plugged in using JavaScript. One that I use for contact forms is Getform.io. Zapier has also had its uses, taking the RSS feed and tweeting site updates on Twitter when new content gets added. Though I made different choices, Disqus can be used for comments and Algolia for site searching. Generally, though, you can find yourself needing to pay, particularly if you want to remove advertising or gain advanced features.
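To illustrate the contact form approach, a static page can post straight to such a service; the endpoint address below is only a placeholder for whatever the provider assigns on signup:

<form method="POST" action="https://getform.io/f/your-form-id">
  <input type="text" name="name" required />
  <input type="email" name="email" required />
  <textarea name="message" required></textarea>
  <button type="submit">Send</button>
</form>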
Some commenting service providers offer open source self-hosted options, but I found these difficult to set up and ended up not offering commenting at all. That came after I tried out Cactus Comments, only to find that it was not discriminating between pages, so it showed the same comments everywhere. There are numerous alternatives like Remark42, Hyvor Talk, Commento, FastComments, Utterances, Isso, Mouthful, Muut and HyperComments, but trying them all out was too time-consuming for what commenting was worth to me. It also explains why some static websites send readers to Twitter if they have something to say, though I have not followed that way of working.
For searching, I added a JavaScript/JSON self-hosted component to the transport website, and it works well. However, it adds to the size of what a browser needs to download. That is not a major issue for desktop browsers, but it has a sizeable effect on mobile ones. Testing with PageSpeed and Lighthouse highlighted this, though I have left things as they are, since the solution works well in any case.
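Lunr.js is one example of how such a component can be put together, named here purely for illustration rather than as what I actually used; it builds an index in the browser from a JSON file of page data emitted at build time:

// pages is an array loaded from a generated JSON file,
// e.g. [{url: "/routes/", title: "...", content: "..."}]
var idx = lunr(function () {
  this.ref('url');
  this.field('title');
  this.field('content');
  pages.forEach(function (p) { this.add(p); }, this);
});
var results = idx.search('tram');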
One thing that I have yet to work out is how to edit or add content while away from home. Editing files over an SSH connection is one possibility, as is setting up Hugo publishing on a laptop. After that, there is the question of using a tablet or phone, since content management systems make everything web-based. These are points that I have yet to explore.
As is natural with a code-based solution, there is a learning curve with Hugo. Reading a book provided some orientation, and looking on the web resolved many conundrums. There is good documentation on the project website, while forum discussions turn up in many a web search. After some research, there was next to nothing that could not be done in some way.
Migrating the content took some forethought and quite a bit of time, though there was an opportunity to carry out some housekeeping as well. The history website was small, so copying and pasting sufficed. For the transport website, I used Python to convert what was in the database into Markdown files before refining the result, along the lines sketched below. That provided some automation, but left a lot of work to be done afterwards.
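This is a minimal sketch of such a conversion, assuming the PyMySQL connector and WordPress's default wp_posts table; the connection details are placeholders:

# Pull published posts from a WordPress database and write Markdown
# files with YAML headers; credentials here are placeholders.
import pymysql

conn = pymysql.connect(host="localhost", user="user",
                       password="password", database="wordpress")
with conn.cursor() as cur:
    cur.execute("SELECT post_name, post_title, post_date, post_content "
                "FROM wp_posts WHERE post_status = 'publish'")
    for slug, title, date, content in cur.fetchall():
        with open(slug + ".md", "w", encoding="utf-8") as f:
            f.write("---\n")
            f.write('title: "%s"\n' % title.replace('"', '\\"'))
            f.write("date: %s\n" % date.isoformat())
            f.write("---\n\n")
            f.write(content)  # the HTML still needs refining into Markdown
conn.close()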
The results were satisfactory, and I like the associated simplicity and efficiency. That Hugo works so fast means that it can handle large websites, so it is scalable. The new Markdown approach to content production has not proved problematic so far, apart from the need to make it more portable, and it helps that I found a setup that works for me. It also avoids any potential dealbreakers that the continued development of publishing platforms like WordPress or Drupal could bring. For the former, I hope to remain with the Classic Editor indefinitely, but I now have another option in case things go too far.
Improving a website contact form
23rd April 2018

On another website, I have had a contact form, but it was missing some functionality. For instance, it stored the input in files on the web server instead of emailing it. That was fixed more easily than expected using PHP's mail function, as sketched below. Even so, it remains useful to survey the corresponding documentation on the W3Schools website.
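A minimal sketch of that approach follows; the field names and addresses are hypothetical, it assumes PHP 7 or later, and a real form would want more validation than this:

<?php
// Read the submitted fields; the names here are illustrative.
$name    = trim($_POST['name'] ?? '');
$email   = trim($_POST['email'] ?? '');
$message = trim($_POST['message'] ?? '');

// Validate the sender address to guard against header injection.
if (filter_var($email, FILTER_VALIDATE_EMAIL)) {
    $headers = "From: webmaster@example.com\r\nReply-To: $email\r\n";
    mail("me@example.com", "Contact form message from $name", $message, $headers);
}
?>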
The other changes affected the way the form looked to a visitor. There was a reset button, and that was removed on finding that such things are out of favour these days. Thinking about it again, there hardly was any need for it anyway.
Newer additions that came with HTML5 had their place too. Including user hints using the placeholder attribute should add some user-friendliness, although I have avoided experimenting with browser-powered input validation for now. The required attribute has its uses for telling a visitor that they have forgotten something, but I need to check how that is handled in CSS more thoroughly before I go with it, since there are new :required, :optional, :valid and :invalid pseudo-classes that can be used to help.
It appears that there is much more to learn about setting up forms since I last checked. This is perhaps a hint that a few books need reading as part of catching up with how things are done these days. There is also something new to learn.
Easier to print?
20th February 2010

One matter that really came to light was how well, or not, the pages on here and on my hill walking and photography website came out on the printed page. After spotting a WordPress Codex article, and with an eye on improving things, I have made a distinction between screen and print stylesheets. The code in the XHTML looks like this:
<link rel="stylesheet" href="/style.css" type="text/css" media="screen" />
<link rel="stylesheet" href="/style_print.css" type="text/css" media="print" />
The media attribute seems to be respected by the browsers that I have been using for testing (the latest versions of Firefox, MSIE and Opera), so it then was a matter of using CSS to control what was shown and how it was displayed. Extraneous items like sidebars were excluded from the printed page in favour of the real content that visitors would want anyway, and everything else was made as monochrome as possible, with images being the only things to escape. After all, people don't want to waste paper and ink in these cash-strained times, and there's no need for any more colour than necessary either. Then, there's the distraction caused by non-functioning hyperlinks, something that has inspired the sharing of some wisdom on A List Apart. Returning to my implementation, please let me know in the comments what you think of what I have done here and whether there remains any room for improvement.
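For reference, the print stylesheet follows this sort of pattern; the identifiers are stand-ins for whatever a theme actually uses, and the last rule is the kind of remedy for non-functioning hyperlinks that A List Apart discusses:

#sidebar, #navigation, #comments { display: none; }
body { color: #000; background: #fff; }
a:after { content: " (" attr(href) ")"; }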
Converting from CGM to Postscript
24th November 2009

One thing that I recently had to investigate was the possibility of converting CGM vector graphics files into Postscript, and from there into PDF. Having used ImageMagick for converting images before, that was an obvious option. However, ImageMagick cannot process CGM files on its own and needs a delegate or helper application as well. This is the case with raw digital camera files too, with UFRaw being the program chosen. For CGM images, the more obscure RALCGM is what's needed, and tracking it down is a bit of an art. Though the history is that it was developed at the U.K.'s Rutherford Appleton Laboratory, it appears that it was left to go off into the wilderness rather than someone keeping an eye on things. With that in mind, here are the installation packages for Windows and Linux (RPM):
RALCGM is a handy command line tool that can convert from CGM to Postscript on its own, without any need for ImageMagick at all. From what I have seen, fonts in graphical output may look greyer than black, but it otherwise does its job well. Considering that it is a freely available tool, one cannot complain too much. There are other packages for doing vector to raster conversion, and the ones that I have seen do have GUIs, but the freedom to look at paid-for software wasn't mine to have. The required command looks something like the following:
ralcgm -d PS -oL test.cgm test.ps
The -d PS switch selects the software's Postscript driver, while -oL specifies landscape orientation. If you would like to find out more, there is a PDF rendition of the help file that comes with the tool.
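Since PDF was the final destination, it is worth noting that Ghostscript's ps2pdf wrapper can take over from where RALCGM leaves off; a minimal example, reusing the file from above:

ps2pdf test.ps test.pdf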
Bumping newly edited older articles in Textpattern
10th July 2009

Whether this is intended or not, you can put a pre-existing article at the top of your website's Atom or RSS feed by saving it as a draft while it is being modified, before restoring its status to live again. This is handy when you have permanent articles that you are enhancing over time, and you want to give your visitors a reason to return, maybe even prompting search engines too. Though new articles will always achieve this, it's nice to see that older ones don't get lost in space either. While this may be a hack, I am using Textpattern for permanent postings rather than blogging, so I remain well pleased to see the availability of the feature.
Harnessing the power of ImageMagick
26th October 2008

Using the command line to process images might sound senseless, but the tools offered by ImageMagick certainly prove that it has its place. I have always been wary of bulk processing for my digital photo files (some digitised from film prints with a scanner), but I do agree that some of it is needed to free up time for other, more necessary things. With this in mind, it is encouraging to see the results from ImageMagick, and I can see it making a major difference to how I maintain my online photo gallery.
For instance, making thumbnail images for the gallery certainly seems to be one of those operations where command line bulk processing comes into its own, and ImageMagick's own convert command is heaven-sent for this one. For resizing images, all that's needed is the following:
convert -resize 40% input.jpg output.jpg
Add a spot of further shell scripting, and even a dash of Perl, and the possibilities for this sort of thing become clearer; and this is but the tip of the proverbial iceberg. The -rotate switch will do what the name suggests, while there is a whole plethora of other options on tap. So long as you have Ghostscript on your system, conversion of graphics to Postscript (and Encapsulated Postscript too) and PDF files is possible, with the -page option controlling the margin around the image itself in the resulting output. Unfortunately, portrait is the sole orientation on offer, yet a bit of judicious post-processing will turn things around. Here's a command that'll do the trick:
convert -page 792x612+72+72 input.png ps2:output.ps
For retrieving image metadata, like resolution and size, the identify command comes into play. The -verbose option invokes the output of all manner of image metadata, so using grep or egrep is perhaps advisable, especially for bulk processing with the likes of Perl. Having the ability to stream image metadata makes loading databases like MySQL less of a chore than the manual data entry that has been my way of doing things until now.
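For example, the details of interest can be picked out like this; Geometry and Resolution are among the fields that identify emits:

identify -verbose photo.jpg | egrep 'Geometry|Resolution'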
Another way to control line breaks in (X)HTML
22nd October 2008

While you can use <br /> tags, there is another way to achieve similar results: the &nbsp; or non-breaking space entity. Put one of them between two words, and you stop them from being separated by a line break; I have been using this in the latest design tweaks that I made to my online photo gallery. Turning this on its head, if you see two words together acting without regard to normal wrapping conventions, then you can suspect that a non-breaking space could be the cause. There might be CSS options too, but their effectiveness in different browsers may limit their usefulness.
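For instance, the following stops a word and a number from being split across lines; the sentence itself is only an illustration:

<p>See page&nbsp;2 for the full gallery listing.</p>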
Ghostscript: **** Unable to open the initial device, quitting.
6th October 2008

The above error message has been greeting me when creating PDFs with Ghostscript on a Solaris box, and it does need some translation. If you are directing output to a real printer, I suppose that it is sensible enough: nothing will happen unless you can connect to it. It gets a little less obvious when associated with PDF creation, where it seems to mean that the pdfwrite virtual device is unable to create the specified output file. A first port of call would be to check that you can write to the directory where you are putting the new PDF file. In my case, there appears to be another cause, so I'll have to keep looking for a solution.
Update: I have since discovered the cause of this: a now defunct TEMP assignment in the .profile file for my user account. Removing that piece of code resolved the problem.
A way to combine PDF files in UNIX and Linux
4th October 2008

My latest adventure in computing has led me into the world of automated PDF generation. When my first approach didn't prove to be completely trouble-free, I decided to go part of the way with it and finish off the job with the open source utility Ghostscript. That got me thinking about combining bookmarked PDF files, and I can say that Ghostscript is capable of producing what I need, as long as it doesn't generate any errors along the way. Here's the command that does the trick:
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=final.pdf source_file1.pdf source_file2.pdf
The various switches of the gs command have very useful roles: -dBATCH ensures that Ghostscript shuts down when all is done, -dNOPAUSE removes any prompts that would otherwise be given, -q turns on quiet mode, -sDEVICE selects Ghostscript's own PDF creation functionality, and -sOutputFile names the output file, stopping Ghostscript from sending it to its default stream. All of this applies to Windows Ghostscript too, though the name of the executable is gswin32c for 32-bit Windows instead of gs.
When it comes to any debugging, it is useful to know that Ghostscript is case-sensitive with its command line switches, something that I have seen trip up others. Since I am getting initial device errors, it strikes me that dropping some of the switches that reduce the number of messages might help me work out what's going on. It's a useful idea that I have yet to try.
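In practice, that would mean running the same command with the quiet switch dropped, along these lines:

gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=final.pdf source_file1.pdf source_file2.pdf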
There is also online documentation if you fancy learning more, and Linux.com has an article that considers other possible PDF combination tools as well. All in all, it's nice to have command line tools to do these sorts of things rather than having to use GUI applications all the time.
A case of "peekaboo" behaviour in Internet Explorer
1st July 2008

Recently, I changed the engine of my online photo gallery to a speedier PHP/MySQL-based affair from its PHP/Perl/XML-powered predecessor. On the server side, all was well, but a peculiar display issue turned up in Internet Explorer (versions 6, 7 and 8 were all afflicted), where photo caption text on the thumbnail gallery pages was being displayed erratically.
As far as I can gather, the trigger for the behaviour was that the thumbnail block was placed within a DIV floated using CSS, which touched another DIV that cleared the floating. I use a table to hold the images and their associated captions in place, and each caption is also a hyperlink nested within a set of P tags.
The remedy was to set the CSS display property for the affected XHTML tag to a value of "inline-block", as shown below. Within a DIV, TABLE, TR, TD, P and A tag hierarchy, finding the right tag where the property had the desired effect took some doing. As it happened, it was the tag at the bottom of the stack, the one for the hyperlink, that needed the fix.
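In CSS terms, the fix boils down to something like this; the selector is illustrative rather than my actual markup:

td p a { display: inline-block; }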
Of course, it's all very fine fixing something for one browser, but it's worthless if it breaks the presentation in others. In that vein, I did some testing in Opera, Firefox, SeaMonkey and Safari to check that all was well, and it was. There may be older browsers, like versions of IE before 6, where things don't appear as intended, yet I get the impression from my visitor statistics that the newer variants hold sway anyway. All in all, it was a useful lesson learnt, and that's never a bad thing.