Technology Tales

Adventures & experiences in contemporary technology

Overriding replacement of double or triple hyphenation in WordPress

7th June 2016

On here, I have posts with example commands that include double hyphens and they have been displayed merged together, something that has resulted in a comment posted by a visitor to this part of the web. All the while, I have been blaming the fonts that I have been using only for it to be the fault of WordPress itself.

Changing multiple dashes to something else has been a feature of Word autocorrect but I never expected to see WordPress aping that behaviour and it has been doing so for a few years now. The culprit is wptexturize and that cannot be disabled for it does many other useful things.

What happens is that the wptexturize filter changes ‘---‘ (double hyphens) to ‘–’ (– in web entity encoding) and ‘---‘ (triple hyphens) to ‘—’ (— in web entity encoding). The solution is to add another filter to the content that changes these back to the way they were and the following code does this:

add_filter( ‘the_content’ , ‘mh_un_en_dash’ , 50 );
function mh_un_en_dash( $content ) {
$content = str_replace( ‘–’ , ‘--‘ , $content );
$content = str_replace( ‘—’ , ‘---‘ , $content );
return $content;
}

The first line of the segment adds in the new filter that uses the function defined below it. The third and fourth lines above do the required substitution before the function returns the post content for display in the web page. The whole code block can be used to create a plugin or placed the theme’s functions.php file. Either way, things appear without the substitution confusing your readers. It makes me wonder if a bug report has been created for this because the behaviour looks odd to me.

A few more SAS functions to know

22nd January 2010

There are whole pile of SAS functions for testing text strings that hadn’t come to my attention until this week. Until then, I’d have gone about using functions like INDEX and PRXMATCH functions for the same sort of ends but it’s never any load to have a few different ways of doing things and to use the right one for the job. Here’s a quick list of my recent discoveries:

ANYALNUM: First position of any alphanumeric character, returns 0 if absent

ANYALPHA: First position of any alphabetic character (letter of the alphabet), returns 0 if absent

ANYCNTRL: First position of any control character, returns 0 if absent

ANYDIGIT: First position of any numeric character, returns 0 if absent

ANYFIRST: First position of any character that can be used as the start of a SAS variable name when VALIDVARNAME is set to V7, returns 0 if absent

ANYGRAPH: First position of any printable character that isn’t white space, returns 0 if absent

ANYLOWER: First position of any lowercase letter, returns 0 if absent

ANYNAME: First position of any character that can be used in a SAS variable name when VALIDVARNAME is set to V7, returns 0 if absent

ANYPRINT: First position of any printable character, returns 0 if absent

ANYPUNCT: First position of any punctuation character, returns 0 if absent

ANYSPACE: First position of any whitespace character (tabs, carriage returns and the like), returns 0 if absent

ANYUPPER: First position of any uppercase letter, returns 0 if absent

ANYXDIGIT: First position of any hexadecimal character, returns 0 if absent

NOTALNUM: First position of any non-alphanumeric character, returns 0 if absent

NOTALPHA: First position of any non-alphabetic character, returns 0 if absent

NOTCNTRL: First position of anything that isn’t a control character, returns 0 if absent

NOTDIGIT: First position of any non-numeric character, returns 0 if absent

NOTFIRST: First position of any character that cannot be used as the start of a SAS variable name when VALIDVARNAME is set to V7, returns 0 if absent

NOTGRAPH: First position of anything that isn’t a printable character that isn’t white space, returns 0 if absent

NOTLOWER: First position of anything that isn’t a lowercase letter, returns 0 if absent

NOTNAME: First position of any character that cannot be used in a SAS variable name when VALIDVARNAME is set to V7, returns 0 if absent

NOTPRINT: First position of any non-printable character, returns 0 if absent

NOTPUNCT: First position of anything that isn’t a punctuation character, returns 0 if absent

NOTSPACE: First position of anything that isn’t a whitespace character, returns 0 if absent

NOTUPPER: First position of anything that isn’t an uppercase letter, returns 0 if absent

NOTXDIGIT: First position of anything that isn’t a hexadecimal character, returns 0 if absent

Apart from simpler cases where other techniques would work well with the a similar amount of effort, there are others that would need some investigation if you were program them without using one of the above functions. For that reason, I’ll be keeping them in mind for when I might meet one of those more complex scenarios.

Escaping brackets in SAS macro language

14th November 2007

Rendering opening and closing brackets as pieces in SAS macro language programming caused me a bit of grief until I got it sorted a few months back. All of the usual suspects for macro quoting (or escaping in other computer languages) let me down: even the likes of %SUPERQ or %NRBQUOTE didn’t do the trick. The honours were left to %NRQUOTE(%(), which performed what was required very respectably indeed. The second "%" escapes the bracket for %NRQUOTE to do the rest.

  • All the views that you find expressed on here in postings and articles are mine alone and not those of any organisation with which I have any association, through work or otherwise. As regards editorial policy, whatever appears here is entirely of my own choice and not that of any other person or organisation.

  • Please note that everything you find here is copyrighted material. The content may be available to read without charge and without advertising but it is not to be reproduced without attribution. As it happens, a number of the images are sourced from stock libraries like iStockPhoto so they certainly are not for abstraction.

  • With regards to any comments left on the site, I expect them to be civil in tone of voice and reserve the right to reject any that are either inappropriate or irrelevant. Comment review is subject to automated processing as well as manual inspection but whatever is said is the sole responsibility of the individual contributor.