Technology Tales

Adventures in consumer and enterprise technology

TOPIC: REGULAR EXPRESSION

Searching file contents using PowerShell

25th October 2018

Having made plenty of use of grep on the Linux/UNIX command and findstr on the legacy Windows command line, I wondered if PowerShell could be used to search the contents of files for a text string. Usefully, this turns out to be the case, but I found that the native functionality does not use what I have used before. The form of the command is given below:

Select-String -Path <filename search expression> -Pattern "<search expression>" > <output file>

While you can have the output appear on the screen, it always seems easier to send it to a file for subsequent use, and that is what I am doing above. The input to the -Path switch can be a filename or a wildcard expression, while that to the -Pattern can be a text string enclosed in quotes or a regular expression. Given that it works well once you know what to do, here is an example:

Select-String -Path *.sas -Pattern "proc report" > c:\temp\search.txt

The search.txt file then includes both the file information and the text that has been found for the sake of checking that you have what you want. What you do next is up to you.

About Perl's Binding Operator

20th May 2009

While this piece is as much an aide de memoire for myself as anything else, putting it here seems worthwhile if it answers questions for others. The binding operators, =~ or !~, come in handy when you are framing conditional statements in Perl using Regular Expressions, for example, testing whether x =~ /\d+/ or not. The =~ variant is also used for changing strings using the s/[pattern1]/[pattern2]/ regular expression construct (here, s stands for "substitute"). What has brought this to mind is that I wanted to ensure that something was done for strings that did not contain a certain pattern, and that's where the !~ binding operator came in useful; ^~ might have come to mind for some reason, but it wasn't what I needed.

Tidying dynamic URL’s

15th June 2007

A few years back, I came across a very nice article discussing how you would make a dynamic URL more palatable to a search engine, and I made good use of its content for my online photo gallery. The premise was that URL's that look like that below are no help to search engines indexing a website. Though this is received wisdom in some quarters, it doesn't seem to have done much to stall the rise of WordPress as a blogging platform.

http://www.mywebsite.com/serversidescript.php?id=394

That said, WordPress does offer a friendlier URL display option too, which you can see in use on this blog; they look a little like the example URL that you see below, and the approach is equally valid for both Perl and PHP. Since I have been using the same approach for the Perl scripts powering my online phone gallery, now want to apply the same thinking to a gallery written in PHP:

http://www.mywebsite.com/serversidescript.pl/id/394

The way that both expressions work is that a web server will chop pieces from a URL until it reaches a physical file. For a query URL, the extra information after the question mark is retained in its QUERY_STRING variable, while extraneous directory path information is passed in the variable PATH_INFO. For both Perl and PHP, these are extracted from the entries in an array; for Perl, this array is called is $ENV and $_SERVER is the PHP equivalent. Thus, $ENV{QUERY_STRING} and $_SERVER{'QUERY_STRING'} traps what comes after the ? while $ENV{PATH_INFO} and $_SERVER{'PATH_INFO'} picks up the extra information following the file name (/id/394/ in the example). From there on, the usual rules apply regarding cleaning of any input but changing from one to another should be too arduous.

HennessyBlog theme update

12th February 2007

Over the weekend, I have been updating the theme on my other blog, HennessyBlog. It has been a task that projected me onto a learning curve with the WordPress 2.1 codebase. Thus, I have collected what I encountered, so I know that it’s out there on the web for you (and I) to use and peruse. It took some digging to get to know some of what you find below. Since any function used to power WordPress takes some finding, I need to find one place on the web where the code for WordPress is more fully documented. The sites presenting tutorials on how to use WordPress are more often than not geared towards non-techies rather than code cutters like myself. Then again, they might be waiting for someone to do it for them…

The changes made are as follows:

Tweaks to the interface

These are subtle, with the addition of navigation controls to the sidebar and the change in location of the post metadata being the most obvious enhancements. “Decoration” with solid and dashed lines (using CSS border attributes rather than the deprecated hr tagset) and standards compliance links.

Standards compliance

Adding standards compliance links does mean that you’d better check that all is in order; it was then that I discovered that there was work to be done. There is an issue with the WordPress wpautop function (it lives in the formatting.php file) in that it sometimes doesn’t add closing tags. Finding out that it was this function that is implicated took a trip to the WordPress.org website; while a good rummage in the wp-includes folder does a lot, it can’t achieve everything.

Like many things in the WordPress code, the wpautop function isn’t half buried. The the_content function (see template-functions-post.php) used to output blog entries calls the get_content function (also in template-functions-post.php) to extract the data from MySQL. The add_filter function (in plugin.php) associates the wpautop function and others with the get_the_content function to add the p tags to the output.

To return to the non-ideal behaviour that caused me to start out on the above quest, an example is where you have an img tag enclosed by div tags. The required substitution involves the use of regular expressions that work most of the time but get confused here. So adding a hack to the wpautop function was needed to change the code so that the p end tag got inserted. I’ll be keeping an eye out for any more scenarios like this that slip through the net and for any side effects. Otherwise, compliance is just making sure that all those img tags have their alt attributes completed.

Tweaks to navigation code

Most of my time has been spent on tweaking of the PHP code supporting the navigation. Because different functions were being called in different places, I wanted to harmonise things. To accomplish this, I created new functions in the functions.php for my theme and needed to resolve a number of issues along the way. Not least among these were regular expressions used for subsetting with the preg_match function; these were not Perl-compliant to my eyes, as would be implied by the choice of function. Now that I have found that PCRE’s in PHP use a more pragmatic syntax, there appeared to be issues with the expressions that were being used. These seemed to behave OK in their native environment but fell out of favour within the environs of my theme. Being acquainted with Perl, I went for a more familiar expression style and the issue has been resolved.

Along the way, I broke the RSS feed. This was on my off-line test blog so no one, apart from myself, that is, would have noticed. After a bit of searching, I realised that some stray white-space from the end of a PHP file (wp-config.php being a favourite culprit), after the PHP end tag in the script file as it happens, was finding its way into the feed and causing things to fall over. Feed readers don’t take too kindly to the idea of the XML declaration not making an appearance on the first line of the file. Some confusion was caused by the refusal of Firefox to refresh things as it should before I realised that a forced refresh of the feed display was needed. Sometimes, it takes a while for an addled brain to think of these kinds of things.

  • The content, images, and materials on this website are protected by copyright law and may not be reproduced, distributed, transmitted, displayed, or published in any form without the prior written permission of the copyright holder. All trademarks, logos, and brand names mentioned on this website are the property of their respective owners. Unauthorised use or duplication of these materials may violate copyright, trademark and other applicable laws, and could result in criminal or civil penalties.

  • All comments on this website are moderated and should contribute meaningfully to the discussion. We welcome diverse viewpoints expressed respectfully, but reserve the right to remove any comments containing hate speech, profanity, personal attacks, spam, promotional content or other inappropriate material without notice. Please note that comment moderation may take up to 24 hours, and that repeatedly violating these guidelines may result in being banned from future participation.

  • By submitting a comment, you grant us the right to publish and edit it as needed, whilst retaining your ownership of the content. Your email address will never be published or shared, though it is required for moderation purposes.