-->
Adventures & experiences in contemporary technology
Some posts on here contain example commands that include double hyphens, and these have been displayed merged together into a single dash, which prompted a comment from a visitor to this part of the web. All the while, I had been blaming the fonts that I was using, only for it to be the fault of WordPress itself.
Changing multiple dashes to something else has long been a feature of Word autocorrect, but I never expected to see WordPress aping that behaviour, and it has been doing so for a few years now. The culprit is wptexturize, and I would rather not disable it because it does many other useful things.
What happens is that the wptexturize filter changes ‘--’ (double hyphens) to an en dash (&#8211; in web entity encoding) and ‘---’ (triple hyphens) to an em dash (&#8212; in web entity encoding). The solution is to add another filter to the content that changes these back to the way they were, and the following code does this:
add_filter( 'the_content', 'mh_un_en_dash', 50 );
function mh_un_en_dash( $content ) {
    $content = str_replace( '&#8211;', '--', $content );
    $content = str_replace( '&#8212;', '---', $content );
    return $content;
}
The first line of the segment adds the new filter, which uses the function defined below it. The third and fourth lines above do the required substitution before the function returns the post content for display in the web page. The whole code block can be used to create a plugin or placed in the theme’s functions.php file. Either way, things then appear without the substitution confusing your readers. It makes me wonder whether a bug report has been created for this, because the behaviour looks odd to me.
There is a whole pile of SAS functions for testing text strings that hadn’t come to my attention until this week. Until then, I’d have gone about using functions like INDEX and PRXMATCH for the same sort of ends, but it’s never any load to have a few different ways of doing things and to use the right one for the job. Here’s a quick list of my recent discoveries:
ANYALNUM: First position of any alphanumeric character, returns 0 if absent
ANYALPHA: First position of any alphabetic character (letter of the alphabet), returns 0 if absent
ANYCNTRL: First position of any control character, returns 0 if absent
ANYDIGIT: First position of any numeric character, returns 0 if absent
ANYFIRST: First position of any character that can be used as the start of a SAS variable name when VALIDVARNAME is set to V7, returns 0 if absent
ANYGRAPH: First position of any printable character that isn’t whitespace, returns 0 if absent
ANYLOWER: First position of any lowercase letter, returns 0 if absent
ANYNAME: First position of any character that can be used in a SAS variable name when VALIDVARNAME is set to V7, returns 0 if absent
ANYPRINT: First position of any printable character, returns 0 if absent
ANYPUNCT: First position of any punctuation character, returns 0 if absent
ANYSPACE: First position of any whitespace character (tabs, carriage returns and the like), returns 0 if absent
ANYUPPER: First position of any uppercase letter, returns 0 if absent
ANYXDIGIT: First position of any hexadecimal character, returns 0 if absent
NOTALNUM: First position of any non-alphanumeric character, returns 0 if absent
NOTALPHA: First position of any non-alphabetic character, returns 0 if absent
NOTCNTRL: First position of anything that isn’t a control character, returns 0 if absent
NOTDIGIT: First position of any non-numeric character, returns 0 if absent
NOTFIRST: First position of any character that cannot be used as the start of a SAS variable name when VALIDVARNAME is set to V7, returns 0 if absent
NOTGRAPH: First position of any character that isn’t a printable non-whitespace character, returns 0 if absent
NOTLOWER: First position of anything that isn’t a lowercase letter, returns 0 if absent
NOTNAME: First position of any character that cannot be used in a SAS variable name when VALIDVARNAME is set to V7, returns 0 if absent
NOTPRINT: First position of any non-printable character, returns 0 if absent
NOTPUNCT: First position of anything that isn’t a punctuation character, returns 0 if absent
NOTSPACE: First position of anything that isn’t a whitespace character, returns 0 if absent
NOTUPPER: First position of anything that isn’t an uppercase letter, returns 0 if absent
NOTXDIGIT: First position of anything that isn’t a hexadecimal character, returns 0 if absent
Apart from simpler cases where other techniques would work well with a similar amount of effort, there are others that would need some investigation if you were to program them without using one of the above functions. For that reason, I’ll be keeping them in mind for when I meet one of those more complex scenarios.
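To give a flavour of how a few of these behave, here is a minimal sketch rather than anything definitive. The variable names and sample values are made up for illustration, and it only exercises ANYDIGIT, ANYPUNCT and NOTALPHA from the list above.

```sas
/* Illustrative sketch only: the sample values below are hypothetical. */
data _null_;
    length id $12;
    do id = 'AB1234', 'ABCDEF', '12-34';
        /* Position of the first digit: 3 for AB1234, 0 for ABCDEF */
        first_digit = anydigit(id);
        /* Position of the first punctuation character, 0 if there is none */
        first_punct = anypunct(id);
        /* ID is padded with trailing blanks to 12 characters, so trim it */
        /* before a NOT* search; otherwise the padding counts as a match. */
        first_nonalpha  = notalpha(strip(id));
        padded_nonalpha = notalpha(id);
        put id= first_digit= first_punct= first_nonalpha= padded_nonalpha=;
    end;
run;
```

The trimming is the part worth noting: for ABCDEF, the trimmed call should return 0 while the untrimmed one should report position 7, the first padding blank, so the NOT functions usually want STRIP or TRIM around fixed-length character variables.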
Rendering opening and closing brackets as pieces of text in SAS macro language programming caused me a bit of grief until I got it sorted a few months back. All of the usual suspects for macro quoting (or escaping, as other computer languages call it) let me down: even the likes of %SUPERQ or %NRBQUOTE didn’t do the trick. The honours were left to %NRQUOTE(%(), which performed what was required very respectably indeed. The second "%" escapes the bracket, leaving %NRQUOTE to do the rest.
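As a rough sketch of how that construct might be put to use, the snippet below stores each bracket in a macro variable and then writes them out around some text. The macro variable names OPEN and CLOSE and the message text are hypothetical, and the closing bracket is assumed to be escaped in the same way as the opening one.

```sas
/* Sketch only: OPEN, CLOSE and the message text are hypothetical names. */

/* The inner percent sign marks the unmatched bracket for NRQUOTE */
%let open  = %nrquote(%();
%let close = %nrquote(%));

/* Write the brackets out as ordinary text around a short list */
%put The list is &open.a, b, c&close.;
```

If the quoting behaves as described above, the log line should read: The list is (a, b, c).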