Technology Tales

Notes drawn from experiences in consumer and enterprise technology

TOPIC: GITHUB

Preventing authentication credentials from entering Git repositories

6th February 2026

Keeping credentials out of version control is a fundamental security practice that prevents numerous problems before they occur. Once secrets enter Git history, remediation becomes complex and cannot guarantee complete removal. Understanding how to structure projects, configure tooling, and establish workflows that prevent credential commits is essential for maintaining secure development practices.

Understanding What Needs Protection

Credentials come in many forms across different technology stacks. Database passwords, API keys, authentication tokens, encryption keys, and service account credentials all represent sensitive data that should never be committed. Configuration files containing these secrets vary by platform but share common characteristics: they hold information that would allow unauthorised access to systems, data, or services.

# This should never be in Git
database:
  password: secretpassword123
api:
  key: sk_live_abc123def456

# Neither should this
DB_PASSWORD=secretpassword123
API_KEY=sk_live_abc123def456
JWT_SECRET=mysecretkey

Even hashed passwords require careful consideration. Whilst bcrypt hashes with appropriate cost factors (10 or higher, requiring approximately 65 milliseconds per computation) provide protection against immediate exploitation, they still represent sensitive data. The hash format includes a version identifier, cost factor, salt, and hash components that could be targeted by offline attacks if exposed. Plaintext credentials offer no protection whatsoever and represent immediate critical exposure.

Establishing Git Ignore Rules from the Start

The foundation of credential protection is proper .gitignore configuration established before any sensitive files are created. This proactive approach prevents problems rather than requiring remediation after discovery. Begin every project by identifying which files will contain secrets and excluding them immediately.

# Credentials and secrets
.env
.env.*
!.env.example
config/secrets.yml
config/database.yml
config/credentials/

# Application-specific sensitive files
wp-config.php
settings.php
configuration.php
appsettings.Production.json
application-production.properties

# User data and session storage
/storage/credentials/
/var/sessions/
/data/users/

# Keys and certificates
*.key
*.pem
*.p12
*.pfx
!public.key

# Cache and logs that might leak data
/cache/
/logs/
/tmp/
*.log

The negation pattern !.env.example demonstrates an important technique: excluding all .env files whilst explicitly including example files that show structure without containing secrets. This pattern ensures that developers understand what configuration is needed without exposing actual credentials.

Notice the broad exclusions for entire categories rather than specific files. Excluding *.key prevents any private key files from being committed, whilst !public.key allows the explicit inclusion of public keys that are safe to share. This defence-in-depth approach catches variations and edge cases that specific file exclusions might miss.
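
Git can confirm how these rules behave before anything is staged. The check below, assuming the .gitignore shown above, reports which pattern matches a given path:

# Ask Git which rule, if any, causes each path to be ignored
git check-ignore -v .env config/secrets.yml server.key

# Paths that are not ignored (such as public.key, thanks to the
# negation pattern) print nothing and return a non-zero exit status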

Separating Examples from Actual Configuration

Version control should contain example configurations that demonstrate structure without exposing secrets. Create .example or .sample files that show developers what configuration is required, whilst keeping actual credentials out of Git entirely.

# config/secrets.example.yml
database:
  host: localhost
  username: app_user
  password: REPLACE_WITH_DATABASE_PASSWORD

api:
  endpoint: https://api.example.com
  key: REPLACE_WITH_API_KEY
  secret: REPLACE_WITH_API_SECRET

encryption:
  key: REPLACE_WITH_32_BYTE_ENCRYPTION_KEY

Documentation should explain where developers obtain the actual values. For local development, this might involve running setup scripts that generate credentials. For production, it involves deployment processes that inject secrets from secure storage. The example file serves as a template and checklist, ensuring nothing is forgotten whilst preventing accidental commits of real secrets.
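
A small bootstrap script can enforce this template-and-copy pattern. The sketch below is illustrative and assumes the example file shown above; it copies the template into place without ever overwriting real secrets:

#!/bin/sh
# Hypothetical setup script: create a local secrets file from the template
if [ ! -f config/secrets.yml ]; then
  cp config/secrets.example.yml config/secrets.yml
  echo "Created config/secrets.yml - replace the REPLACE_WITH_* values"
fi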

Using Environment Variables

Environment variables provide a standard mechanism for separating configuration from code. Applications read credentials from the environment rather than from files tracked in Git. This pattern works across virtually all platforms and languages.

// Instead of hardcoding
$db_password = 'secretpassword123';

// Read from environment
$db_password = getenv('DB_PASSWORD');

// Instead of requiring a config file with secrets
const apiKey = 'sk_live_abc123def456';

// Read from environment
const apiKey = process.env.API_KEY;

Environment files (.env) provide convenience for local development, but must be excluded from Git. The pattern of .env for actual secrets and .env.example for structure becomes standard across many frameworks. Developers copy the example to create their local configuration, filling in actual values that never leave their machine.
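
In a POSIX shell, local setup then amounts to copying the example and exporting its contents, as sketched here under the assumption that .env contains simple KEY=value lines:

# Create a local configuration from the tracked example
cp .env.example .env

# Export every assignment in .env into the current shell session
set -a
. ./.env
set +a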

Implementing Pre-Commit Hooks

Pre-commit hooks provide automated checking before changes enter the repository. These hooks scan staged files for patterns that match secrets and block commits when suspicious content is detected. This automated enforcement prevents mistakes that manual review might miss.

The pre-commit framework manages hooks across multiple repositories and languages. Installation is straightforward, and configuration defines which checks run before each commit.

pip install pre-commit

Create a configuration file defining which hooks to run:

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: check-added-large-files
      - id: check-json
      - id: check-yaml
      - id: detect-private-key
      - id: end-of-file-fixer
      - id: trailing-whitespace

  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.4.0
    hooks:
      - id: detect-secrets
        args: ['--baseline', '.secrets.baseline']

Install the hooks in your repository:

pre-commit install

Now every commit triggers these checks automatically. The detect-private-key hook catches SSH keys and other private key formats. The detect-secrets hook uses entropy analysis and pattern matching to identify potential credentials. When suspicious content is detected, the commit is blocked and the developer is alerted to review the flagged content.
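
The .secrets.baseline file referenced in the configuration records findings that have already been reviewed, so only new secrets block commits. It is created and audited with the detect-secrets command-line tool:

# Record existing findings so they do not block future commits
detect-secrets scan > .secrets.baseline

# Step through each finding and mark it as a real secret or a false positive
detect-secrets audit .secrets.baseline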

Configuring Git-Secrets

The git-secrets tool from AWS specifically targets secret detection. It scans commits, commit messages, and merges to prevent credentials from entering repositories. Installation and configuration establish patterns that identify secrets.

# Install via Homebrew
brew install git-secrets

# Install hooks in a repository
cd /path/to/repo
git secrets --install

# Register AWS patterns
git secrets --register-aws

# Add custom patterns
git secrets --add 'password\s*=\s*["\047][^\s]+'
git secrets --add 'api[_-]?key\s*=\s*["\047][^\s]+'

The tool maintains a list of prohibited patterns and scans all content before allowing commits. Custom patterns can be added to match organisation-specific secret formats. The --register-aws command adds patterns for AWS access keys and secret keys, whilst custom patterns catch application-specific credential formats.
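
The same patterns can also be run against content that already exists, which helps when adopting the tool on an established repository:

# Scan the files currently in the repository
git secrets --scan

# Scan the entire commit history for past leaks
git secrets --scan-history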

For teams, establishing git-secrets across all repositories ensures consistent protection. Template directories provide a mechanism for automatic installation in new repositories:

# Create a template with git-secrets installed
git secrets --install ~/.git-templates/git-secrets

# Use the template for all new repositories
git config --global init.templateDir ~/.git-templates/git-secrets

Now, every git init automatically includes secret scanning hooks.

Enabling GitHub Secret Scanning

GitHub Secret Scanning provides server-side protection that cannot be bypassed by local configuration. GitHub automatically scans repositories for known secret patterns and alerts repository administrators when matches are detected. This works for both new commits and historical content.

For public repositories, secret scanning is enabled by default. For private repositories, it requires GitHub Advanced Security. Enable it through repository settings under Security & Analysis. GitHub maintains partnerships with service providers to detect their specific secret formats, and when a partner pattern is found, both you and the service provider are notified.

The scanning covers not just code but also issues, pull requests, discussions, and wiki content. This comprehensive approach catches secrets that might be accidentally pasted into comments or documentation. The detection happens continuously, so even old content gets scanned when new patterns are added.

Custom patterns extend detection to organisation-specific secret formats. Define regular expressions that match your internal API key formats, authentication tokens, or other proprietary credentials. These custom patterns apply across all repositories in your organisation, providing consistent protection.
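
As an illustration only, an organisation whose internal tokens carry a recognisable prefix might register a pattern along these lines (the acme_ format is entirely hypothetical):

acme_(live|test)_[A-Za-z0-9]{32}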

Structuring Projects for Credential Isolation

Project structure itself can prevent credentials from accidentally entering Git. Establish clear separation between code that belongs in version control and configuration that remains environment-specific. Create dedicated directories for credentials and ensure they are excluded from tracking.

project/
├── src/                    # Code - belongs in Git
├── tests/                  # Tests - belongs in Git
├── config/
│   ├── app.yml            # General config - belongs in Git
│   ├── secrets.example.yml # Example - belongs in Git
│   └── secrets.yml        # Actual secrets - excluded from Git
├── credentials/           # Entire directory excluded
│   ├── database.yml
│   └── api-keys.json
├── .env.example           # Example - belongs in Git
├── .env                   # Actual secrets - excluded from Git
└── .gitignore             # Defines exclusions

This structure makes it obvious which files contain secrets. The credentials/ directory is clearly separated from source code, and its exclusion from Git is explicit. Developers can see at a glance that this directory requires different handling.

Documentation should explain the structure and the reasoning behind it. New team members need to understand why certain directories are empty in their fresh clones and where to obtain the configuration files that populate them. Clear documentation prevents confusion and ensures everyone follows the same patterns.

Managing Development Credentials

Development environments require credentials but should never use production secrets. Generate separate development credentials that provide access to development resources only. These credentials can be less stringently protected whilst still not being committed to Git.

Development credential management varies by organisation size and infrastructure. For small teams, shared development credentials stored in a team password manager might suffice. For larger organisations, each developer receives individual credentials for development resources, with access controlled through identity management systems.

Some teams commit development credentials intentionally, arguing that development databases contain no sensitive data and convenience outweighs risk. This approach is controversial and depends on your security model. If development credentials can access any production resources or if development data has any sensitivity, they must be protected. Even purely synthetic development data might reveal business logic or system architecture worth protecting.

The safer approach maintains the same credential handling patterns across all environments. This ensures that developers build habits that prevent production credential exposure. When development and production follow identical patterns, muscle memory built during development prevents mistakes in production.

Provisioning Production Credentials

Production credentials should never touch developer machines or version control. Deployment processes inject credentials at runtime through environment variables, secret management services, or deployment-time configuration.

Continuous deployment pipelines read credentials from secret stores and make them available to applications without exposing them to humans. GitHub Actions, GitLab CI, Jenkins, and other CI/CD systems provide secure variable storage that is injected during builds and deployments.

# .github/workflows/deploy.yml
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Deploy to production
        env:
          DB_PASSWORD: ${{ secrets.DB_PASSWORD }}
          API_KEY: ${{ secrets.API_KEY }}
        run: |
          ./deploy.sh

The secrets.DB_PASSWORD syntax references encrypted secrets stored in GitHub's secure storage. These values are never exposed in logs or visible to anyone except during the deployment process. The deployment script receives them as environment variables and can configure the application appropriately.

Secret management services like HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, or Google Cloud Secret Manager provide centralised credential storage with access controls, audit logging, and automatic rotation. Applications authenticate to these services and retrieve credentials at runtime, ensuring that secrets are never stored on disk or in environment files.
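
As one concrete sketch, a deployment script can fetch a value from AWS Secrets Manager at runtime; the secret name prod/db-password below is an assumed example:

# Retrieve a secret at deploy time (assumed secret name)
aws secretsmanager get-secret-value \
  --secret-id prod/db-password \
  --query SecretString \
  --output text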

Rotating Credentials Regularly

Regular credential rotation limits exposure duration if secrets are compromised. Establish rotation schedules based on credential sensitivity and access patterns. Database passwords might rotate quarterly, API keys monthly, and authentication tokens weekly or daily. Automated rotation reduces operational burden and ensures consistency.

Rotation requires coordination between secret generation, distribution, and application updates. Secret management services can automate much of this process, generating new credentials, updating secure storage, and triggering application reloads. Manual rotation involves generating new credentials, updating all systems that use them, and verifying functionality before disabling old credentials.
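
Where a secret manager is in place, scheduling and triggering rotation can be a single call. This sketch uses AWS Secrets Manager and assumes a rotation function has already been configured for the secret:

# Rotate now and then automatically every 30 days (assumed secret name)
aws secretsmanager rotate-secret \
  --secret-id prod/db-password \
  --rotation-rules AutomaticallyAfterDays=30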

The rotation schedule balances security against operational complexity. More frequent rotation provides better security but increases the risk of service disruption if processes fail. Less frequent rotation simplifies operations but extends exposure windows. Find the balance that matches your risk tolerance and operational capabilities.

Training and Culture

Technical controls provide necessary guardrails, but security ultimately depends on people understanding why credentials matter and how to protect them. Training should cover the business impact of credential exposure, the techniques for keeping secrets out of Git, and the procedures for responding if mistakes occur.

New developer onboarding should include credential management as a core topic. Before developers commit their first code, they should understand what constitutes a secret, why it must stay out of Git, and how to configure their local environment properly. This prevents problems rather than correcting them after they occur.

Regular security reminders reinforce good practices. When new secret types are introduced or new tools are adopted, communicate the changes and update documentation. Security reviews should check credential handling practices, not just looking for exposed secrets, but also verifying that proper patterns are followed.

Creating a culture where admitting mistakes is safe encourages early reporting. If a developer accidentally commits a credential, they should feel comfortable immediately alerting the security team, rather than hoping no one notices. Early detection enables faster response and reduces damage.

Responding to Detection

Despite best efforts, secrets sometimes enter repositories. Rapid response limits damage. Immediate credential rotation assumes compromise and prevents exploitation. Removing the file from future commits whilst leaving it in history provides no security benefit, as the exposure has already occurred.

Tools like BFG Repo-Cleaner can remove secrets from Git history, but this is complex and cannot guarantee complete removal. Anyone who cloned the repository before clean-up retains the compromised credentials in their local copy. Forks, clones on other systems, and backup copies may all contain the secrets.
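
For reference, the usual BFG sequence runs against a mirror clone, with a passwords.txt file listing the strings to redact; the repository URL here is a placeholder, and rotation remains the primary response:

# Work on a mirror clone, never the live checkout
git clone --mirror git@example.com:org/repo.git

# Replace every listed secret throughout history
java -jar bfg.jar --replace-text passwords.txt repo.git

# Expire old references, repack and force-push the rewritten history
cd repo.git
git reflog expire --expire=now --all
git gc --prune=now --aggressive
git push --force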

The most reliable response is assuming the credential is compromised and rotating it immediately. History clean-up can follow as a secondary measure to reduce ongoing exposure, but it should never be the primary response. Treat any secret that entered Git as if it were publicly posted because once in Git history, it effectively was.

Continuous Improvement

Credential management practices should evolve with your infrastructure and team. Regular reviews identify gaps and opportunities for improvement. When new credential types are introduced, update .gitignore patterns, secret scanning rules, and documentation. When new developers join, gather feedback on clarity and completeness of onboarding materials.

Metrics help track effectiveness. Monitor secret scanning alerts, track rotation compliance, and measure time-to-rotation when credentials are exposed. These metrics identify areas needing improvement and demonstrate progress over time.

Summary

Preventing credentials from entering Git repositories requires multiple complementary approaches. Establish comprehensive .gitignore configurations before creating any credential files. Separate example configurations from actual secrets, keeping only examples in version control. Use environment variables to inject credentials at runtime rather than storing them in configuration files. Implement pre-commit hooks and server-side scanning to catch mistakes before they enter history. Structure projects to clearly separate code from credentials, making it obvious what belongs in Git and what does not.

Train developers on credential management and create a culture where security is everyone's responsibility. Provision production credentials through deployment processes and secret management services, ensuring they never touch developer machines or version control. Rotate credentials regularly to limit exposure windows. Respond rapidly when secrets are detected, assuming compromise and rotating immediately.

Security is not a one-time configuration but an ongoing practice. Regular reviews, continuous improvement, and adaptation to new threats and technologies keep credential management effective. The investment in prevention is far less than the cost of responding to exposed credentials, making it essential to get right from the beginning.

Related Resources

Four technical portals that still deliver after decades online

3rd February 2026

The early internet was built on a different kind of knowledge sharing, one driven by individual expertise, community generosity and the simple desire to document what worked. Four informative websites that started in that era, namely MDN Web Docs, AskApache, WindowsBBS and Office Watch, embody that spirit and remain valuable today. They emerged at a time when technical knowledge was shared through forums, documentation and personal blogs rather than social media or algorithm-driven platforms, and their legacy persists in offering clarity and depth in an increasingly fragmented digital landscape.

MDN Web Docs

MDN Web Docs stands as a cornerstone of modern web development, offering comprehensive coverage of HTML, CSS, JavaScript and Web APIs alongside authoritative references for browser compatibility. Mozilla started the project in 2005 under the name Mozilla Developer Center, and it has since grown into a collaborative effort of considerable scale. In 2017, Mozilla announced a formal partnership with Google, Microsoft, Samsung and the W3C to consolidate web documentation on a single platform, with Microsoft alone redirecting over 7,700 of its MSDN pages to MDN in that year.

For developers, the site is not merely a reference tool but a canonical guide that ensures standards are adhered to and best practices followed. Its tutorials, guides and learning paths make it indispensable for beginners and seasoned professionals alike. The site's community-driven updates and ongoing contributions from browser vendors have cemented its reputation as the primary source for anyone building for the web.

AskApache

AskApache is a niche but invaluable resource for those managing Apache web servers, built by a developer whose background lies in network security and penetration testing on shared hosting environments. The site grew out of the founder's detailed study of .htaccess files, which, unlike the main Apache configuration file httpd.conf, are read on every request and offer fine-grained, per-directory control without requiring root access to the server. That practical origin gives the content its distinctive character: these are not generic tutorials, but hard-won techniques born from real-world constraints.

The site's guides on blocking malicious bots, configuring caching headers, managing redirects with mod_rewrite and preventing hot-linking are frequently cited by system administrators and WordPress users. Its specificity and longevity have made it a trusted companion for those maintaining complex server environments, covering territory that mainstream documentation rarely touches.

WindowsBBS

WindowsBBS offers a clear window into the era when online forums were the primary hub for technical support. Operating in the tradition of classic bulletin board systems, the site has long been a resource for users troubleshooting Windows installations, hardware compatibility issues and malware removal. It remains completely free, sustained by advertisers and community donations, which reflects the ethos of mutual aid that defined early internet culture.

During the Windows XP and Windows 7 eras, community forums of this kind were essential for solving problems that official documentation often overlooked, with volunteers providing detailed answers to questions that Microsoft's own support channels would not address. While the rise of social media and centralised support platforms has reduced the prominence of such forums, WindowsBBS remains a testament to the power of community-driven problem-solving. Its straightforward structure, with users posting questions and experienced volunteers providing answers, mirrors the collaborative spirit that made the early web such a productive environment.

Office Watch

Office Watch has served as an independent source of Microsoft Office news, tips and analysis since 1996, making it one of the longer-running specialist publications of its kind. Its focus on Microsoft Office takes in advanced features and hidden tools that are seldom documented elsewhere, from lesser-known functions in Excel to detailed comparisons between Office versions and frank assessments of Microsoft's product decisions. That independence gives it a voice that official resources cannot replicate.

The site serves power users seeking to make the most of the software they use every day, with guides and books that extend its reach beyond the website itself. In an era where software updates are frequent and often poorly explained, Office Watch provides the kind of context and plain-spoken clarity that official documentation rarely offers.

The Enduring Value of Depth and Community

These four sites share a common thread: they emerged when technical knowledge was shared openly by experts and enthusiasts rather than filtered through algorithms or paywalls, and they retain the value that comes from that approach. Their continued relevance speaks to what depth, specificity and community can achieve in the digital world. While platforms such as Stack Overflow and GitHub Discussions have taken over many of the roles these sites once played, the original resources remain useful for their historical context and the quality of their accumulated content.

As the internet continues to evolve, the lessons from these sites are worth remembering. The most useful knowledge is often found at the margins, where dedicated individuals take the time to document, explain and share what they have learned. Whether you are a developer, a server administrator or an everyday Office user, these resources are more than archives: they are living repositories of expertise, built by people who cared enough to write things down properly.

WordPress Starter Themes: From bare foundations to modern workflows

1st February 2026

Starter themes have long occupied an important place in WordPress development. They sit between a completely blank project and a fully styled off-the-shelf theme, offering enough structure to speed up work without dictating how the finished site must look. For agencies, freelancers and in-house teams, that balance can save considerable time, allowing developers to begin with a lean codebase and concentrate on the parts that make each site distinct.

That broad appeal helps explain why starter themes continue to evolve in different directions. Some remain deliberately minimal and close to WordPress core conventions, while others embrace modern tooling such as Composer, Vite, Tailwind CSS and component-based templating. Alongside these are starter themes intended for visual builders and users who want a gentler route into customisation. Taken together, they demonstrate that there is no single definition of a WordPress starter theme, with the common thread being that each provides a starting point rather than a finished product.

wd_s: A Generator-Driven Approach

wd_s is a generator-driven starter theme from WebDevStudios. The wd_s generator makes the setup process more tailored by asking for project details: name, URL, description, namespace, text domain, author, author URL, author email and development URL. Once those details are entered, the script performs a find-and-replace and delivers a ZIP file ready to extract into wp-content/themes. The process is straightforward, though it also reflects a more structured approach to project setup than many starter themes provide.

The generator highlights details that matter in real-world theme development, but are sometimes overlooked when beginning from a generic scaffold. Name-spacing, text domains and project metadata all play a part in maintainability and localisation, and wd_s brings those choices to the fore from the very beginning. The example values shown by the generator, such as "Acme Inc." for the name field and a namespace using underscores, are illustrative rather than prescriptive. What stands out more than any one field is the intention to reduce repetitive manual setup and encourage consistency from the very start of a project.

Sage: A Modern Development Workflow

Roots Sage represents a significant shift towards a modern development workflow. Sage is a Tailwind CSS WordPress starter theme with Laravel Blade templating, currently at version 11.1.0 and with 13,191 GitHub stars at the time of writing. Setup uses Composer and NPM rather than a simple ZIP download, and a typical installation begins in wp-content/themes with composer create-project roots/sage my-theme, followed by npm install and npm run build.
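
Laid out as commands, that installation sequence is short:

cd wp-content/themes
composer create-project roots/sage my-theme
cd my-theme
npm install
npm run build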

That workflow signals the audience Sage is aimed at. Rather than merely wrapping WordPress templates in a minimal theme shell, Sage introduces tooling and conventions familiar to developers who work with Laravel and modern front-end stacks. The Vite build process generates files including a manifest, compiled CSS and JavaScript and a theme.json, completing the whole build in a matter of seconds and demonstrating that WordPress development can be integrated into contemporary asset pipelines without giving up compatibility with the CMS.

Blade Templating and Component-Driven Design

Blade templating is central to Sage's proposition. The base layout in resources/views/layouts/app.blade.php shows a clean separation of structure and content using directives such as @include, @yield and @hasSection. Header and footer hooks still call familiar WordPress functions including wp_head, wp_body_open and wp_footer, but the surrounding syntax is closer to Laravel than traditional PHP-heavy WordPress templates. This gives developers access to template inheritance, reusable components and directives, making larger codebases considerably easier to organise.

Reusable components illustrate this style compactly. Properties define type and message, a PHP match expression selects classes based on alert type and the component merges those classes into its final markup. The result is not merely an isolated snippet but a demonstration of how Sage encourages component-driven design, reducing repetition and making presentation logic easier to follow, particularly in projects with many shared interface elements.

Tailwind CSS and the Block Editor

Sage places strong emphasis on integrating Tailwind CSS with the WordPress block editor. It automatically generates theme.json from the Tailwind configuration, making colours, font families and sizes immediately available in the block editor with zero additional configuration. The sample app.css imports Tailwind and points at views and app files as content sources, while the generated theme.json includes settings for layout, background, colour palettes, spacing and typography. The palette includes eleven shades of grey along with black and white, and the typography settings mirror Tailwind's familiar scale from xs through 9xl.

This addresses a long-standing friction point in WordPress theming: keeping front-end design systems in sync with the editing experience. In older workflows, editor styles and front-end styles often drifted apart, creating extra maintenance work and inconsistency for content editors. Sage's approach narrows that gap by deriving editor settings directly from the same Tailwind configuration used for the front end, with theme.json generated during the build process rather than maintained by hand.

Theme Structure and the Roots Ecosystem

The theme structure for Sage reinforces its emphasis on organisation. The app directory contains providers, view composers, filters.php and setup.php, while resources holds CSS, JavaScript and Blade views grouped into components, layouts, partials and sections. Public assets, composer.json, package.json, theme.json and vite.config.js complete the structure, paired with PSR-4 autoloading, service providers and Acorn, which brings Laravel-style patterns into WordPress. The Vite configuration includes Tailwind, the Laravel Vite plugin and Roots plugins for WordPress support and theme.json generation, plus aliases for scripts, styles, fonts and images.

Another notable feature is hot module replacement in the WordPress block editor, with style changes updating instantly without page refreshes. Sage sits within the broader Roots ecosystem, which also includes Bedrock (a WordPress boilerplate for Composer and Git-based projects), Trellis (a server provisioning and deployment tool), Acorn, Radicle (which bundles the entire Roots stack into a single starting point) and WP Packages (a Composer repository for WordPress plugins and themes). Testimonials on the Roots website emphasise that many developers regard this ecosystem as a route to a more structured and modern WordPress experience, with Sage having been actively maintained for over a decade.

Visual Composer Starter Theme: A Builder-Friendly Option

Not every starter theme is aimed at developers working with command-line tooling and component-based templates. The Visual Composer Starter Theme occupies a different place in the landscape, described as a free bundle of a lightweight theme and a powerful WordPress page builder. It is aimed at building blogs, WooCommerce stores, business sites and personal websites, and the language surrounding it stresses ease of use, intuitive theme options and layout customisation tools, presented as a free resource intended to support the WordPress community.

Its feature set reflects that broader audience. The theme is easy to customise through the WordPress customiser, SEO-friendly and responsive by default, covering use cases such as personal blogs, landing pages, business sites, portfolios, startups and online stores. WooCommerce compatibility receives particular emphasis, with support for adjusting design preferences via the customiser and building online shops at no cost. Hero and featured images, unlimited colour options, page-level design controls and a choice between regular and mobile sandwich-style menus are all included.

There is also a strong focus on compatibility. The theme is fully translation-ready and compatible with WPML, qTranslate and Polylang, while support for Advanced Custom Fields and Toolset for custom post type development is highlighted. It is also presented as ready to combine with the Visual Composer website builder and is developed openly on GitHub, where anyone can contribute. This is less about offering an unvarnished code scaffold and more about giving users a flexible visual base with a broad range of built-in options, though it remains part of the starter theme conversation because it is designed to be extended rather than merely installed and left untouched.

Bones: Speed, Control and Pragmatism

Bones, designed and developed by Eddie Machado, returns more closely to the classic developer-oriented concept while retaining a distinctive voice. It is described as an HTML5, mobile-first starter theme for rapid WordPress development, and it makes clear that it is not a framework. Frameworks can introduce their own conventions and complexity, whereas Bones is designed to be as bare and minimalistic as possible, intended to be used on a per-project basis with no child themes.

The mobile-first emphasis is one of Bones' defining characteristics. Its Sass setup serves minimal resources to smaller screens before scaling up for larger viewports, an approach tied to performance as well as responsiveness, and Bones includes extensive comments and examples to help developers get started with Sass. It also provides a well-documented example for custom post types and functions to customise the WordPress admin area for clients, though these are entirely optional and can be removed if not needed. The project is released under the WTFPL, one of the most permissive licences available, and takes pride in removing unnecessary elements from the WordPress header to keep output clean and lightweight. The philosophy is to keep what is useful and discard the rest, building from a solid and speedy foundation.

Selecting a Starter Theme to Match Your Workflow

When viewed together, these themes reveal how varied the starter theme category has become. A Speckyboy roundup of top starter and bare-bones themes for WordPress development in 2026 (last updated on the 8th of March 2026) places Sage alongside newer and more editor-focused options including Blockbase, GeneratePress, Air, WDS BT, Byvex and Flynt. The roundup notes that every website serves different goals and that WordPress is flexible enough to support them all, but also makes clear that starting each project from scratch leads to repeated work, with starter themes offering a way to avoid that repetition while preserving freedom over design and functionality.

The same roundup provides a useful framework for evaluating starter themes. Ongoing maintenance matters because themes need to keep pace with WordPress and surrounding technologies, and themes that have not been updated in years should be avoided. The distinction between classic and block themes is important, since developers need a starting point that aligns with their preferred editing model. Features that genuinely speed development, whether block patterns, a comprehensive settings panel or development tools, can make a significant difference over time. A starter theme should also stay out of the way rather than burden projects with an opinionated design direction, and compatibility with a preferred editor or page builder remains central to choosing well.

Whether the preference is for the generator-based setup of wd_s, the modern tooling of Sage, the builder-friendly versatility of Visual Composer Starter Theme or the stripped-back classic structure of Bones, each represents a different answer to the same challenge. Developers and site builders often need a head start rather than a finished design, and a good starter theme provides exactly that, while leaving enough room for the final result to become something entirely its own.

Modernising SAS: The 4GL Apps and SASjs Ecosystem

31st January 2026

Custom interfaces to the world's most powerful analytics platform are no longer a niche concern. In many organisations, SAS remains central to reporting, modelling and operational decision-making, yet the way users interact with that capability can vary widely. Some teams still rely on desktop applications, batch processes, shared drives and manual interventions, while others are moving towards web-based interfaces, stronger governance and a more modern development workflow. The material at sasapps.io points to an ecosystem built around precisely that transition, blending long-standing SAS expertise with open-source tooling and documented delivery methods.

The Company Behind the Ecosystem

At the centre of this transition is 4GL Apps. The company's positioning is straightforward: help organisations leverage their SAS investment through services, solutions and products that fit specific needs. Rather than replacing SAS, the aim is to extend it with custom interfaces and delivery approaches that are maintainable, transparent and based on standard frameworks. An emphasis on documentation appears throughout the site, suggesting that projects are intended either for handover to internal teams or for ongoing support under clearly defined packages.

That proposition matters because many SAS environments have grown over years, sometimes decades. In such settings, technical capability is rarely the issue. The challenge is more often how to expose that capability in ways that are usable, secure and sustainable. A powerful analytics platform can still be hampered by awkward user journeys, brittle desktop tooling or resource-heavy support arrangements, and the 4GL Apps model tries to address those practical concerns without discarding existing SAS infrastructure.

Services

The service offering gives a useful sense of how this approach is organised. One strand is SAS App Delivery, framed not merely as building applications, but also as building tools that make SAS app development faster. That detail points to an emphasis on repeatability rather than one-off implementation. Another strand is SAS App Support, aimed at organisations with existing SAS-powered applications but insufficient internal resource to keep them running. Fixed-price plans are offered to keep those interfaces active, which implies an attempt to make operational costs more predictable. A third service area is SASjs Enhancement, where new features can be added to SASjs at a discounted rate to support particular use cases.

Solutions

These services sit alongside a broader set of solutions. One is the creation of SAS-powered HTML5 applications, described as bespoke builds tailored to specific workflow and reporting requirements, using fully open-source tools, standard frameworks and full documentation. Clients are given a practical choice: maintain the application in-house or use a transparent support package. Another solution addresses end-user computing risk through data capture and control. Here, the approach enables business users to self-load VBA-driven Excel reporting tools into a preferred database while applying data quality checks at source, a four-eyes (or more) approval step at each stage and full audit traceability back to the original EUC artefact. A further solution is the modernisation of legacy AF/SCL desktop applications, with direct migration to SAS 9 or Viya in order to improve user experience, security and scalability while moving to a modern SAS stack supported by open-source technology.

That last area reveals a theme running through the whole ecosystem: modernisation does not necessarily mean abandoning what exists. In many SAS estates, AF/SCL applications remain deeply embedded in business processes, and replacing them outright can be costly and risky, especially when they encode years of operational logic. A migration path that preserves business function while improving maintainability and interface design will naturally appeal to teams that need progress without disruption.

Products

The product range fills out the picture further. Data Controller for SAS enables business users to make controlled changes to data in SAS. The SASjs Framework is a collection of open-source tools to accelerate SAS DevOps and the development of SAS-powered web applications. There is also an AF/SCL Kit, migration tooling for the rapid modernisation of monolithic AF/SCL applications. Together, these products form a stack covering interface delivery, governed data change and development workflow, and they suggest that the company's work is not limited to consultancy but includes reusable software assets with their own documentation and source code.

Data Controller: Governance and Audit

Data Controller receives the richest functional description in the ecosystem's documentation. It is intended for business owners in regulatory reporting environments and, more broadly, for any enterprise that needs to perform manual data uploads with validation, approval, security and control. The rationale is rooted in familiar SAS working practices. Users may place files on network drives for batch loading, update data directly using SAS code, open a dataset in Enterprise Guide and change a value, or ask a database administrator to run a script update. According to the product's own documentation, those approaches are less than ideal: every new piece of data may require a new program, end users may need modify access to sensitive data locations, datasets can become locked, and change requests can slow the process.

Data Controller is presented as a response to those weaknesses. The goal is described as focusing on great user experience and auditor satisfaction, while saving years of development and testing compared with a custom-built alternative. It is a SAS-powered web application with real-time capabilities, where intraday concurrent updates are managed using a lock table and queuing mechanism. Updates are aborted if another user has changed the table since the approval difference was generated, which helps preserve consistency in multi-user environments. Authentication and authorisation rely on the existing SASLogon framework, and end users do not require direct access to the target tables.

The governance model is equally central. All data changes require one or more approvals before a table is updated, and the approver sees only the changes that will be applied to the target, including new, deleted and changed rows. The system supports loading tables of different types through SAS libname engines, with support for retained keys, SCD2 loads, bitemporal data and composite primary keys. Full audit history is a prominent feature: users can track every change to data, including who made it, when it was made, why it was made and what the actual change was, all accessible through a History page.

A particularly notable feature is that onboarding new tables requires zero code. Adding a table is a matter of configuration performed within the tool itself, without the need to define column types or lengths manually, as these are determined dynamically at runtime. Workflow extensibility is built in through configurable hook scripts that execute before and after each action, with examples such as running a data quality check after uploading a mapping table or running a model after changing a parameter. Taken together, those features position Data Controller less as a narrow upload utility and more as a governed operational layer for business-managed data change.

The application was designed to work on multiple devices and different screen types, combined with SAS scalability and security to provide flexibility and location independence when managing data. This suggests it is intended for practical day-to-day use by business teams rather than solely by technical specialists at a desktop workstation.

SASjs: DevOps for SAS

Underpinning much of the ecosystem is SASjs, described on its GitHub organisation page as "DevOps for SAS." It is designed to accelerate the development and deployment of solutions on all flavours of SAS, including Viya, EBI and Base. Everything in SASjs is MIT open-source and free for commercial use. The framework also explicitly underpins Data Controller for SAS, which connects the product and framework strands of the wider ecosystem. The GitHub organisation page notes that the SASjs project and its repositories are not affiliated with SAS Institute.

The resources page at sasjs.io lists the key GitHub repositories: the Macro Core library, the SASjs adapter for bidirectional SAS and JavaScript communication, the SASjs CLI, a minimal seed application and seed applications for React and Angular. Documentation sites cover the adapter, CLI, Macro Core library, SASjs Server and Data Controller. Useful external links from the same resources page include guides to building and deploying web applications with the SASjs CLI, scaffolding SAS projects with NPM and SASjs, extending Angular web applications on Viya and building a vanilla JavaScript application on SAS 9 or Viya. There is also mention of a Viya log parser, training resources, guides, FAQs and a glossary, pointing to an effort to support both implementation and adoption.

The SASjs CLI

The command-line tooling, documented at cli.sasjs.io, gives a clearer view of how SASjs approaches DevOps. The CLI is described as a Swiss-army knife with a flexible set of options and utilities for DevOps on SAS Viya, SAS 9 EBI and SASjs Server. Its core functions include creating a SAS Git repository in an opinionated way, compiling each service with all dependent macros, macro variables and pre- or post-code, building the master SAS deployment, deploying through local scripts and remote SAS programmes, running unit tests with coverage and generating a Doxygen documentation site with data lineage, homepage and project logo from the configuration file. There is also a feature for deploying a frontend as a streaming application, bypassing the need to access the SAS web server directly.

The full command set covers the project lifecycle. The CLI can add and authenticate targets, compile and build projects, deploy them to a SAS server location, generate documentation and manage contexts, folders and files. It can execute jobs, run arbitrary SAS code from the terminal, deploy a service pack and generate a snippets file for macro autocompletion in VS Code. It can also lint SAS code to identify common problems and run unit tests while collecting results in JSON or CSV format, together with logs. In effect, this brings SAS development considerably closer to the workflows commonly seen in mainstream software engineering, which may be especially valuable in organisations trying to standardise delivery practices across mixed technology estates.
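
That lifecycle can be sketched with a handful of commands; the project name my-sas-app is illustrative:

# Install the CLI and scaffold an opinionated SAS project
npm install -g @sasjs/cli
sasjs create my-sas-app

# Compile services with their dependencies, build and deploy
sasjs compile
sasjs build
sasjs deploy

# Run tests and generate documentation
sasjs test
sasjs doc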

Presentations and the Wider SAS Community

The slides.sasjs.io collection adds another dimension by showing that these ideas have been presented in conference and user group settings. Available decks cover DevOps for MSUG, SUGG and WUSS, SASjs for application development, SASjs Server, AF and AF/SCL modernisation, SASjs for PHUSE, testing and a legacy SAS apps presentation for FANS in January 2023. While slide decks alone do not prove adoption or outcomes, they do show a sustained effort to communicate methods and patterns to the broader SAS community, consistent with the open documentation and MIT licensing found throughout the ecosystem.

Building a Modern Layer Around an Established Platform

The most useful way to understand this ecosystem is not as a single product but as a layered approach. At one level, there are services for building and supporting applications. At another, there are packaged tools such as Data Controller and the AF/SCL Kit. Underneath both sits SASjs, providing open-source components and delivery practices intended to make SAS development more structured and scalable. The combination of bespoke SAS-powered HTML5 applications, governed data update tooling, AF/SCL migration support and open-source DevOps utilities points to a coherent effort to modernise how SAS is delivered and used, without severing ties to established platforms. SAS remains the analytical engine, but the interfaces, workflows and operational controls around it are updated to reflect current expectations in web application design, governance and DevOps practice.

Rendering Markdown in WordPress without plugins by using Parsedown

4th November 2025

Much of what is generated using GenAI as articles is output as Markdown, meaning that you need to convert the content when using it in a WordPress website. Naturally, this kind of thing should be done with care to ensure that you are the creator and that it is not all the work of a machine; orchestration is fine, but regurgitation does not add that much. Fact-checking is another need as well.

Writing plain Markdown has secured its own following as well, with WordPress plugins switching over the editor to facilitate such a mode of editing. When I tried Markup Markdown, I found it restrictive when it came to working with images within the text, and it needed a workaround for getting links to open in new browser tabs as well. Thus, I got rid of it, only to realise that it had never converted any Markdown as I expected; it merely rendered it at post or page display time. Rather than attempting to update the affected text, I decided to see if another solution could be found.

This took me to Parsedown, which proved to be handy for accomplishing what I needed once I had everything set in place. First, that meant cloning its GitHub repo onto the web server. Next, I created a directory called includes under my theme's directory and copied Parsedown.php into it. When all was done, I ensured that file and directory ownership were assigned to www-data to avoid execution issues.
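
In shell terms, that preparation amounted to something like the following, with the theme path abbreviated to a hypothetical example:

# Fetch Parsedown and copy the single class file into the theme
git clone https://github.com/erusev/parsedown.git
mkdir -p /var/www/html/wp-content/themes/my-theme/includes
cp parsedown/Parsedown.php /var/www/html/wp-content/themes/my-theme/includes/

# Match ownership to the web server user
chown -R www-data:www-data /var/www/html/wp-content/themes/my-theme/includes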

Then, I could set about updating the functions.php file. The first line added there included the parser file:

require_once get_template_directory() . '/includes/Parsedown.php';

After that, I found that I needed to disable the WordPress rendering machinery because that got in the way of Markdown rendering:

remove_filter('the_content', 'wpautop');
remove_filter('the_content', 'wptexturize');

The last step was to add a filter that parsed the Markdown and passed its output to WordPress rendering to do the rest as usual. This was a simple affair until I needed to deal with code snippets in pre and code tags. Hopefully, the included comments tell you much of what is happening. A possible exception is $pre_matches[0], which is itself an array of entire <pre>...</pre> blocks including the containing tags, with $i => $block performing a key => value lookup over the entries of that array.

add_filter('the_content', function($content) {
    // Prepare a store for placeholders
    $placeholders = [];

    // 1. Extract pre blocks (including nested code) and replace with safe placeholders
    preg_match_all('/<pre\b[^>]*>.*?<\/pre>/si', $content, $pre_matches);
    foreach ($pre_matches[0] as $i => $block) {
        $key = "§PREBLOCK{$i}§";
        $placeholders[$key] = $block;
        $content = str_replace($block, $key, $content);
    }

    // 2. Extract standalone code blocks (not inside pre)
    preg_match_all('/<code\b[^>]*>.*?<\/code>/si', $content, $code_matches);
    foreach ($code_matches[0] as $i => $block) {
        $key = "§CODEBLOCK{$i}§";
        $placeholders[$key] = $block;
        $content = str_replace($block, $key, $content);
    }

    // 3. Run Parsedown on the remaining content
    $Parsedown = new Parsedown();
    $content = $Parsedown->text($content);

    // 4. Restore both pre and code placeholders
    foreach ($placeholders as $key => $block) {
        $content = str_replace($key, $block, $content);
    }

    // 5. Apply paragraph formatting
    return wpautop($content);
}, 12);

All of this avoided dealing with extra plugins to produce the required result. Handily, I still use the Classic Editor, which makes this work a lot more easily. There is still a Markdown import plugin that I am tempted to remove as well to streamline things. That can wait, though. It is best not to add any more of them anyway, not least to avoid clashes between them and what is now in the theme.

AI infrastructure under pressure: Outages, power demands and the race for resilience

1st November 2025

The past few weeks brought a clear message from across the AI landscape: adoption is racing ahead, while the underlying infrastructure is working hard to keep up. A pair of major cloud outages in October offered a stark stress test, exposing just how deeply AI has become woven into daily services.

At the same time, there were significant shifts in hardware strategy, a wave of new tools for developers and creators and a changing playbook for how information is found online. There is progress on resilience and efficiency, yet the system is still bending under demand. Understanding where it held, where it creaked and where it is being reinforced sets the scene for what comes next.

Infrastructure Stress and Outages

The outages dominated early discussion. An AWS incident that lasted around 15 hours and disrupted more than a thousand services was followed nine days later by a global Azure failure. Each cascaded across systems that depend on them, illustrating how AI now amplifies the consequences of platform problems.

This was less about a single point of failure and more about the growing blast radius when connected services falter. The effect on productivity was visible too: a separate 10-hour ChatGPT downtime showed how fast outages of core AI tools now translate into lost work time.

Power Demand and Grid Strain

Behind the headlines sits a larger story about electricity, grids and planning. Data centres accounted for roughly 4% of US electricity use in 2024, about 183 TWh, and the International Energy Agency projects around 945 TWh by 2030, with AI as a principal driver.

The averages conceal stark local effects. Wholesale prices near dense clusters have spiked by as much as 267% at times, household bills are rising by about $16–$18 per month in affected areas and capacity prices in the PJM market jumped from $28.92 to $329.17 per megawatt-day. The US grid faces an upgrade bill of about $720 billion by 2030, yet permitting and build timelines are long, creating a bottleneck just as demand accelerates.

Technical Grid Issues

Technical realities on the grid add another layer of challenge. Fast load swings from AI clusters, harmonic distortions and degraded power quality are no longer theoretical concerns. A Virginia incident in which 60 data centres disconnected simultaneously did not trigger a collapse but did reveal the fragility introduced by concentrated high-performance compute.

Security and New Failure Modes

Security risks are evolving in parallel. Agentic systems that can plan, reason and call tools open new failure modes. AI-enabled spear phishing appears to be 350% more effective than traditional attempts and could be 50 times more profitable, a worrying backdrop when outages already have a clear link to lost productivity.

Security considerations now reach into the tools people use to access AI as well. New AI browsers attract attention, and with that comes scrutiny. OpenAI's Atlas and Perplexity's Comet launched with promising features, yet researchers flagged critical issues.

Comet is vulnerable to "CometJacking", a malicious URL hijack that enables data theft, while Atlas suffered a cross-site request forgery weakness that allowed persistent code injection into ChatGPT memory. Both products have been noted for assertive data collection.

Caution and good hygiene are prudent until the fixes and policies settle. It is a reminder that the convenience of integrating models directly into browsing comes with a new attack surface.

Efficiency and Mitigation Strategies

Industry responses are gathering pace. Efficiency remains the first lever. Hyperscalers now report power usage effectiveness around 1.08 to 1.09, compared with more typical figures of 1.5 to 1.6. Direct chip cooling can cut energy needs by up to 40%.

Grid-interactive operations and more work at the edge offer ways to smooth demand and reduce concentration risk, while new power partnerships hint at longer-term change. Microsoft's agreement with Constellation on nuclear power is one example of how compute providers are thinking beyond incremental efficiency gains.

An emerging pattern is becoming visible through these efforts. Proactive regional planning and rapid efficiency improvements could allow computational output to grow by an order of magnitude, while power use merely doubles. More distributed architectures are being explored to reduce the hazard of over-concentration.

A realistic outlook sets data centres at around 3% of global electricity use by 2030, which is notable but still smaller than anticipated growth from electric vehicles or air conditioning. If the $720 billion in grid investment materialises, it could add around 120 GW of capacity by 2030, as much as half of which would be absorbed by data centres. The resilience gap is real, but it appears to be narrowing, provided the sector moves quickly to apply lessons from each failure.

Regional and Policy Responses

Regional policies are starting to encourage resilience too. Oregon's POWER Act asks operators to contribute to grid robustness, Singapore's tight focus on efficiency has delivered around a 30% power reduction even as capacity expands, and a moratorium in Dublin has pushed growth into more distributed build-outs. At the US federal level, the Department of Homeland Security updated frameworks after a 2024 watchdog warning, with AI risk programmes now in place for 15 of the 16 critical infrastructure sectors.

Hardware Competition and Strategy

Competition is sharpening. Anthropic deepened its partnership with Google Cloud to train on TPUs, a move that challenges Nvidia's dominance and signals a broader rebalancing in AI hardware. Nvidia's chief executive has acknowledged TPUs as robust competition.

Another fresh entry came from Extropic, which unveiled thermodynamic sampling units, a probabilistic chip design that claims up to 10,000-fold lower energy use than GPUs for AI workloads. Development kits are shipping and a Z-1 chip is planned for next year, yet as with any radical architecture, proof at scale will take time.

Nvidia, meanwhile, presented an ambitious outlook, targeting $500 billion in chip revenue by 2026 through its Blackwell and Rubin lines. The US Department of Energy plans seven supercomputers comprising more than 100,000 Blackwell GPUs and the company announced partnerships spanning pharmaceuticals, industrials and consumer platforms.

A $1 billion investment in Nokia hints at the importance of AI-centric networks. New open-source models and datasets accompanied the announcements, and the company's share price surged to a record.

Corporate Restructuring

Corporate strategy and hardware choices also entered a new phase. OpenAI completed its restructuring into a public benefit corporation, with a rebranded OpenAI Foundation holding around $130 billion in equity and allocating $25 billion to health and AI resilience. Microsoft's stake now sits at about 27% and is worth roughly $135 billion, with technology rights retained through 2032. Both parties have scope to work with other partners. OpenAI committed around $250 billion to Azure yet retains the ability to use other compute providers. An independent panel will verify claims of artificial general intelligence, an unusual governance step that will be watched closely.

Search and Discovery Evolution

Away from infrastructure, the way audiences find and trust information is shifting. Search is moving from the old aim of ranking for clicks to answer engine optimisation, where the goal is to be quoted by systems such as ChatGPT, Claude or Perplexity.

The numbers explain why. Google handled more than five trillion queries in 2024, while generative platforms now process around 37.5 million prompt-like searches per day. Google's AI Overviews, which surface summary answers above organic results, have reshaped click behaviour.

Independent analyses report top-ranking pages seeing click-through rates fall by roughly a third where Overviews appear, with some keywords faring worse, and a Pew study finds overall clicks on such results dropping from 15% to 8%. Zero-click searches rose from around 56% to 69% between May 2024 and May 2025.

Chegg's non-subscriber traffic fell by 49% in this period, part of an ongoing dispute with Google. Google counters that total engagement in covered queries has risen by about 10%. Whichever way one reads the data, the direction is clear: visibility is less about rank position and more about being cited by a summarising engine.

In practice, that means structuring content so a model can parse, trust and attribute it. Clear Q&A-style sections with direct answers, followed by context and cited evidence, help models extract usable statements. Schema markup for FAQs and how-to content improves machine readability.

Measuring success also changes. Traditional analytics rarely show when an LLM quotes a source, so teams are turning to tools that track citations in AI outputs and tying those to conversion quality, branded search volume and more in-depth engagement with pricing or documentation. It is not a replacement for SEO so much as a layer that reinforces it in an AI-first environment.

Developer Tools and Agentic Workflows

On the tools front, developers saw an acceleration in agent-centred workflows. Cursor launched its first in-house coding model, Composer, which aims for near-frontier quality while generating code around four times faster, often in under 30 seconds.

The broader Cursor 2.0 update added multi-agent capabilities, with as many as eight assistants able to work in parallel, alongside an embedded browser for testing and voice controls. The direction of travel is away from single-shot completions and towards orchestration and review. Tutorials are following suit, demonstrating how to scaffold tasks such as a Next.js to-do application using planning files, parallel agent tasks and quick integration, with voice prompts in the loop.

Open-source and enterprise ecosystems continue to expand. GitHub introduced Agent HQ for coordinating coding agents, Google released Pomelli to generate marketing campaigns and IBM's Granite 4.0 Nano models brought larger on-device options in the 350 million to 1.5 billion parameter range.

FlowithOS reported strong scores on agentic web tasks, while Mozilla announced an open speech dataset initiative, and Kilo Code, Hailuo 2.3 and other projects broadened choice across coding and video. Grammarly rebranded as Superhuman, adding "Superhuman Go" agents to speed up writing tasks.

Creative Tools and Partnerships

Creative workflows are evolving quickly, too. Adobe used its MAX event to add AI assistants to Photoshop and Express, previewed an agent called Project Moonlight, and upgraded Firefly with conversational "Prompt to Edit" controls, custom image models and new video features including soundtracks and voiceovers. Partnerships mean Gemini, Veo and Imagen will sit inside Adobe tools, and Premiere's editing capabilities now extend to YouTube Shorts.

Figma acquired Weavy and rebranded it as Figma Weave for richer creative collaboration, and Canva unveiled its own foundation "Design Model" alongside a Creative Operating System meant to produce fully editable, AI-generated designs. New Canva features take in a revised video suite, forms, data connectors, email design, a 3D generator and an ad creation and performance tool called Grow, while Affinity is relaunching as a free, integrated professional app. Other entrants are trying to blend model strengths: one agent was trailed with Sora 2 clip stitching, Veo 3.1 visuals and multimodel blending for faster design output.

Music rights and AI found a new footing. Universal Music Group settled a lawsuit with Udio, the AI music generator, and the two will form a joint venture to launch a licensed platform in 2026. Artists who opt in will be paid both for training models on their catalogues and for remixes. Udio disabled song downloads following the deal, which annoyed some users, and UMG also announced a "responsible AI" alliance with Stability AI to build tools for artists. These arrangements suggest a path towards sanctioned use of style and catalogue, with compensation built in from the start.

Research and Introspection

Research and science updates added depth. Anthropic reported that its Claude system shows limited introspection, detecting planted concepts only about 20% of the time, separating injected "thoughts" from text and modulating its internal focus. That highlights both the promise and limits of transparency techniques, and the potential for models to conceal or fail to surface certain internal states.

UC Berkeley researchers demonstrated an AI-driven load balancing algorithm with around 30% efficiency improvements, a result that could ripple through cloud performance. IBM ran quantum algorithms on AMD FPGAs, pointing to progress in hybrid quantum-classical systems.

OpenAI launched an AI-integrated web browser positioned as a challenger to incumbents, Perplexity released a natural-language patents search and OpenAI's Aardvark, a GPT-5-based security agent, entered private beta.

Anthropic opened a Tokyo office and signed a cooperation pact with Japan's AI Safety Institute. Tether released QVAC Genesis I, a large open STEM dataset of more than one million data points and a local workbench app aimed at making development more private and less dependent on big platforms.

Age Restrictions and Policy

Meanwhile, policy considerations are reaching consumer platforms. Character AI will restrict users under 18 from open-ended chatbot conversations from late November, replacing them with creative tools and adding behaviour-based age detection, a response to pressure and proposals such as the GUARD Act.

Takeaways

Put together, the picture is one of rapid interdependence and swift correction. The infrastructure is not breaking, but it is being stretched, and recent failures have usefully mapped the weak points. If the sector continues to learn quickly from its own missteps, the resilience gap will continue to narrow, and the next round of outages will be less disruptive than the last.

Investment is flowing into grids and cooling, policy is nudging towards resilience, and compute providers are hedging hardware bets by searching for efficiency and supply assurance. On the application layer, agents are becoming a primary interface for work, creative tools are converging around editability and control, and discovery is shifting towards being quoted by machines rather than clicked by humans.

Security lapses at the interface are a reminder that novelty often arrives before maturity. The most likely path from here is uneven but forward: data centre power may rise, yet efficiency and distribution can blunt the impact; answer engines may compress clicks, yet they can send higher intent visitors to clear, well-structured sources; hardware competition may fragment the stack, yet it can also reduce concentration risk.

Generating Git commit messages automatically using aicommit and OpenAI

25th October 2025

One of the overheads of using version control systems like Subversion or Git is the need to create descriptive messages for each revision. Now that GitHub has its Copilot, it can generate those messages for you. However, that still leaves anyone with a local Git repository out in the cold, even if GitHub is your remote repo.

One thing that a Homebrew update does is highlight other packages that are available, which is how I came to learn of a tool that helps with this: aicommit. Installing it is just a simple command away:

brew install aicommit

Once that is complete, you have a tool that uses GPT to generate a message describing every commit. For it to work, you do need to set yourself up with OpenAI's API services and generate a token. That token then needs to be exposed via an environment variable. On Linux (and Mac), this works:

export OPENAI_API_KEY=<Enter the API token here, without the brackets>
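
That export lasts only for the current shell session. To make the key persist, the same line can be appended to a shell start-up file; a minimal sketch, assuming Bash:

# Persist the key for future sessions (Bash assumed; zsh would use ~/.zshrc)
echo 'export OPENAI_API_KEY=<Enter the API token here, without the brackets>' >> ~/.bashrc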

Because I use this API for Python scripting, that part was already in place. Thus, I could proceed to the next stage: inserting it into my workflow. For the sake of added automation, this uses shell scripting on my machines. The basic sequence is this:

git add .
git commit -m "<a default message>"
git push

The first line above stages everything while the second commits the files with an associated message (git makes this mandatory, much like Subversion) and the third pushes the files into the GitHub repository. Fitting in aicommit then changes the above to this:

git add .
aicommit
git push

There is now no need to define a message because aicommit does that for you, saving some effort. However, token limitations on the OpenAI side mean that the aicommit command can fail, causing the update operation to abort. Thus, it is safer to catch that situation using the following code:

git add .                                   # stage everything
if ! aicommit 2>/dev/null; then             # try the AI-generated message first
    echo "  aicommit failed, using fallback"
    git commit -m "<a default message>"     # fallback that always works
fi
git push

This now tells me what has happened when the AI option is overloaded, and the script falls back to a default message that is always available with Git. While there is more to my Git scripting than this, the snippets included here should convey how things can work. They work well for small push operations, which is what happens most of the time; usually, I do not attempt more than that.

Loading API Keys from Linux shell environment variables in Python with Dotenv

23rd October 2025

Recently, I ran into trouble getting Python to pick up an API key that I had defined in the underlying Bash environment. This was within a Python console running inside the Positron IDE for R and Python scripting. Opening up the folder containing my Python scripts within the IDE was part of the solution. The next part was creating a .env file within the same folder, adding a line like this to the new file:

export API_KEY="<API key value>"

That meant that code like the following then read in the API key in a more robust manner:

import os
from dotenv import load_dotenv
load_dotenv()
api_key = os.getenv('API_KEY', 'default_value')

This imports the os module and the load_dotenv function from the dotenv package. Executing load_dotenv then reads the .env file and loads its contents into the environment. After that, os.getenv assigns the value of the environment variable to a Python variable, returning the supplied default if the key is missing.

Since this also was within a Git repository, a .gitignore file needed to be created containing the line .env, to stop that file being uploaded to GitHub, which is the last place you should be storing credentials like passwords, passphrases and API keys. While my repository may be private, the state of things in these troubled times means that even that is no failsafe.
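
Creating that exclusion can be done with a single command from the repository root:

# Keep the secrets file out of version control
echo ".env" >> .gitignore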

Some Data Science newsletters that may be worth your time

19th October 2025

Staying informed about developments in data science and artificial intelligence without drowning in an endless stream of blog posts, research papers and tool announcements presents a genuine challenge for practitioners. The newsletters profiled below offer a solution to this problem by delivering curated digests at weekly or near-weekly intervals, filtering what matters from the constant flow of new content across the field. Each publication serves a distinct purpose, from broad data science coverage and community event notifications to AI business strategy and statistical foundations, allowing readers to select resources that match their specific interests, whether technical depth, practical application, career development or strategic awareness. What follows examines what each newsletter offers, who benefits most from subscribing, and what limitations or trade-offs readers should consider when choosing which digests merit a place in their inbox.

Data Elixir

Launched in 2014 by Lon Reisberg, this newsletter distinguishes itself through expert curation with minimal hype. It maintains strong editorial consistency and neutrality, presenting a handful of carefully selected articles that genuinely matter rather than overwhelming subscribers with dozens of links. The free version delivers this curated digest, whilst the Pro tier (fifty dollars annually) offers searchable archives spanning over 250 issues back to 2019, plus AI-powered learning tools including a SQL tutor and interview coach. The newsletter's defining characteristic is its quality-over-quantity approach, serving professionals who trust expert curation to surface what is genuinely important without the noise and hype that characterises many industry publications.

Data Science Weekly Newsletter

One of the oldest independent data science newsletters, having published over 400 issues since 2014, this publication sets itself apart through longevity and unwavering consistency. It delivers every Thursday without fail, maintaining a simple, distraction-free format with no over-commercialisation or fluff. Its unique value lies in this dependability, with subscribers knowing exactly what to expect each week, making it a practical baseline for staying current without surprises or dramatic shifts in editorial direction.

DataTalks.club

Unlike newsletters that simply curate external content, this publication builds its own ecosystem of learning resources, offering something fundamentally different through its open, community-driven approach. It combines free courses (Zoomcamps), events and a supportive Slack community, with all materials publicly available on GitHub. The newsletter keeps members informed about upcoming cohorts, webinars and talks within this collaborative environment. The defining feature is its entirely open and peer-supported approach, where readers gain access not just to information, but to hands-on learning opportunities and a community of practitioners willing to help each other grow.

KDnuggets

Founded in 1997 by Gregory Piatetsky-Shapiro, this publication stands apart through industry authority spanning nearly three decades. It holds unmatched credibility through its longevity and comprehensive coverage, known for its annual software polls, data science career resources and balanced mix of expert articles, surveys and tool trends that appeal equally to technical practitioners and managers seeking a global overview of the field. What sets it apart is this authoritative position, with few publications able to match its track record or breadth of influence across both technical and strategic aspects of data science and AI.

ODSC AI

Connected to the Open Data Science Conference network, this newsletter distinguishes itself as the gateway to the global data science event ecosystem. It serves as the practitioner's bridge to events, training, webinars and conferences worldwide. It covers the full stack, from tutorials and research to business use cases and career advice, but its distinctive strength lies in connecting readers to the broader data science community through live events and practical learning opportunities. The defining characteristic is this conference-linked, community-rich approach, proving especially valuable for professionals who want to remain active participants in the field rather than passive consumers of content.

Statology

This newsletter maintains a unique position by focusing entirely on statistical foundations. Whilst most data science publications chase the latest AI developments, it keeps an unwavering focus on statistics and foundational analysis, providing step-by-step tutorials for Excel, R and Python that emphasise statistical intuition over trendy techniques. This singular devotion to fundamentals serves as an essential complement to AI-focused newsletters, helping readers build the statistical knowledge base that underpins sound data science practice.

The AI Report

Created by the makers of KDnuggets, this digital newsletter and media platform carves out a distinctive niche with business-focused AI news for non-technical leaders. It curates AI developments specifically for executives and decision-makers, emphasising practical, non-technical insights about tools, regulations and market moves, backed by an AI tool database and a claimed community of over 400,000 subscribers. What sets it apart is this strategic, implementation-focused perspective, concentrating on what AI means for business strategy rather than explaining how AI works, making it accessible to leaders without deep technical backgrounds.

The Batch

Published weekly by DeepLearning.AI, co-founded by Andrew Ng, this newsletter offers trusted commentary that combines AI news with insightful analysis. Written by leading experts, it provides a balanced view that merges academic grounding with applied, real-world context. The distinguishing feature is this authoritative perspective on implications, helping engineers, product teams and business leaders understand why developments matter and how to think about their practical impact rather than simply reporting what happened.


Managing Python projects with Poetry

4th October 2025

Python Poetry has become a popular choice for managing Python projects because it unifies tasks that once required several tools. Instead of juggling pip for installation, virtualenv for isolation and setuptools for packaging, Poetry brings these strands together and aims to make everyday development feel predictable and tidy. It sits in the same family of all-in-one managers as npm for JavaScript and Cargo for Rust, offering a coherent workflow that spans dependency declaration, environment management and package publishing.

At the heart of Poetry is a simple idea: declare what a project needs in one place and let the tool do the orchestration. Projects describe their dependencies, development tools and metadata in a single configuration file, and Poetry ensures that what is installed on one machine can be replicated on another without nasty surprises. That reliability comes from the presence of a lock file. Once dependencies are resolved, their exact versions are recorded, so future installations repeat the same outcome. The intent here is not only convenience but determinism, helping teams avoid the "works on my machine" refrain that haunts software work.

Core Concepts: Configuration and Lock Files

Two files do the heavy lifting. The pyproject.toml file is where a project announces its name, version and description, as well as the dependencies required to run and to develop it. The poetry.lock file captures the concrete resolution of those requirements at a particular moment. Together, they give you an auditable, repeatable picture of your environment. The structure of TOML keeps the configuration readable, and it spares developers from spreading equivalent settings across setup.cfg, setup.py and requirements.txt. A minimal example shows how this looks in practice.

[tool.poetry]
name = "my_project"
version = "0.1.0"
description = "Example project using Poetry"
authors = ["John <john@example.com>"]

[tool.poetry.dependencies]
python = "^3.10"
requests = "^2.31.0"

[tool.poetry.group.dev.dependencies]
pytest = "^8.0.0"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

Essential Commands

Working with Poetry day to day quickly becomes a matter of a few memorable commands. Initialising a project configuration starts with poetry init, which steps through the creation of pyproject.toml interactively. Adding a dependency is handled by poetry add followed by the package name. Installing everything described in the configuration is done with poetry install, which writes or updates the lock file. When it is time to refresh dependencies within permitted version ranges, poetry update re-resolves and updates what's installed. Removing a dependency is poetry remove, followed by the package name. For environment management, poetry shell opens a shell inside the virtual environment managed by Poetry, and poetry run allows execution of commands within that same environment without entering a shell. Building distributions is as simple as poetry build, which produces a wheel and a source archive, and publishing to the Python Package Index is managed by poetry publish with credentials or an API token.
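
As a quick reference, the day-to-day commands described above condense to the following; the package names are illustrative:

poetry init              # create pyproject.toml interactively
poetry add requests      # declare and install a dependency
poetry install           # install everything and write poetry.lock
poetry update            # re-resolve within permitted version ranges
poetry remove requests   # drop a dependency
poetry shell             # open a shell inside the virtual environment
poetry run pytest        # run a command in the environment without a shell
poetry build             # produce a wheel and source archive
poetry publish           # upload to the Python Package Index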

Advantages and Considerations

There are clear advantages to taking this route. The dependency experience is simplified because you do not need to keep updating a requirements.txt file by hand. With a lock file in place, environments are reproducible across developer machines and continuous integration runners, which stabilises builds and testing. Packaging is integrated rather than an extra chore, so producing and publishing a release becomes a repeatable process that sits naturally alongside development. Virtual environments are created and activated on demand, keeping projects isolated from one another with little ceremony. The configuration in TOML has the benefit of being structured and human-readable, which reduces the likelihood of configuration drift.

There are also points to consider before adopting Poetry. Projects that are deeply invested in setup.py or complex legacy build pipelines may need a clean migration to pyproject.toml to avoid clashes. Developers who prefer manual venv and pip workflows can find Poetry opinionated at first, because it expects to be responsible for the environment and dependency resolution. It is also designed with modern Python versions in mind, with the examples here using Python 3.10.

Migration from pip and requirements.txt

For teams arriving from pip and requirements.txt, moving to Poetry can be done in measured steps. The starting point is installation. Poetry provides an installer script that sets up the tool for your user account.

curl -sSL https://install.python-poetry.org | python3 -

If the installer does not add Poetry to your PATH, adding $HOME/.local/bin to PATH resolves that, after which poetry --version confirms the installation. From the root of your existing project, poetry init creates a new pyproject.toml and invites you to provide metadata and dependencies.

If you already maintain requirements.txt files for production and development dependencies, Poetry can ingest those in one sweep. A single file can be imported with poetry add $(cat requirements.txt). Where development dependencies live in a separate file, they can be added into Poetry's dev group with poetry add --group dev $(cat dev-requirements.txt). Once added, Poetry resolves and pins exact versions, leaving a lock file behind to capture the resolution. After verifying that everything installs and tests pass, it becomes safe to retire earlier environment artefacts. Many teams remove requirements.txt entirely if they plan to rely solely on Poetry, delete any remnants of Pipfile and Pipfile.lock that were left by Pipenv, and migrate metadata away from setup.py or setup.cfg in favour of pyproject.toml.

With that done, using the environment becomes routine. Opening a shell inside the virtual environment with poetry shell makes commands such as python or pytest use the isolated interpreter. If you prefer to avoid entering a shell, poetry run python script.py or poetry run pytest executes the command in the right context. The whole sequence is gathered in the sketch below.
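
Condensed, the migration amounts to a handful of commands; the requirements file names follow the text above and are assumptions about your project layout:

# Ensure Poetry is on PATH if the installer did not handle it (Bash assumed)
export PATH="$HOME/.local/bin:$PATH"
poetry --version

# Bootstrap Poetry metadata in the existing project
poetry init

# Ingest existing dependency lists
poetry add $(cat requirements.txt)
poetry add --group dev $(cat dev-requirements.txt)

# Confirm everything installs and tests pass before retiring old artefacts
poetry install
poetry run pytest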

Package Publishing

Publishing a package is one of the areas where Poetry streamlines the steps. Accurate metadata in pyproject.toml is important, so name, version, description and other fields should be up-to-date. An example configuration shows commonly used fields.

[tool.poetry]
name = "example-package"
version = "1.0.0"
description = "A simple example package"
authors = ["John <john@example.com>"]
license = "MIT"
readme = "README.md"
homepage = "https://github.com/john/example-package"
repository = "https://github.com/john/example-package"
keywords = ["example", "poetry"]

With metadata set, building the distribution is handled by poetry build, which creates a dist directory containing a .tar.gz source archive and a .whl wheel file. Uploading to the official Python Package Index can be done with username and password, though API tokens are the recommended method because they can be scoped and revoked without affecting account credentials. Configuring a token is done once with poetry config pypi-token.pypi, after which poetry publish will use it to upload. When testing a release before publishing for real, TestPyPI provides a safer target. Poetry supports multiple sources and can be directed to use TestPyPI by declaring it as a repository and then publishing to it.

[[tool.poetry.source]]
name = "testpypi"
url = "https://test.pypi.org/legacy/"

With that source declared, publishing to it becomes a single command:

poetry publish -r testpypi

Once uploaded, it is sensible to confirm that the package can be installed in a clean environment using pip install example-package, which verifies that dependencies are correctly declared and wheels are intact.
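
A minimal sketch of that check, assuming the import name swaps the package name's hyphen for an underscore:

# Verify the release in a throwaway virtual environment
python3 -m venv /tmp/pkg-check
/tmp/pkg-check/bin/pip install example-package
/tmp/pkg-check/bin/python -c "import example_package"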

Continuous Integration with GitHub Actions

Beyond local steps, automation closes the loop. Adding a continuous integration workflow that installs dependencies, runs tests and publishes on a tagged release keeps quality checks and distribution consistent. GitHub Actions provides a hosted environment where Poetry can be installed quickly, dependencies cached and tests executed. A straightforward workflow listens for tags that begin with v, such as v1.0.0, then builds and publishes the package once tests pass. The workflow file sits under .github/workflows and looks like this.

name: Publish to PyPI

on:
  push:
    tags:
      - "v*"

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - name: Check out repository
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.10"

      - name: Install Poetry
        run: |
          curl -sSL https://install.python-poetry.org | python3 -
          echo "$HOME/.local/bin" >> $GITHUB_PATH

      - name: Install dependencies
        run: poetry install --no-interaction --no-root

      - name: Run tests with pytest
        run: poetry run pytest --maxfail=1 --disable-warnings -q

      - name: Build package
        run: poetry build

      - name: Publish to PyPI
        if: startsWith(github.ref, 'refs/tags/v')
        env:
          POETRY_PYPI_TOKEN_PYPI: ${{ secrets.PYPI_TOKEN }}
        run: poetry publish --no-interaction --username __token__ --password $POETRY_PYPI_TOKEN_PYPI

This arrangement checks out the repository, installs a consistent Python version, brings in Poetry, installs dependencies based on the lock file, runs tests, builds distributions and only publishes when the workflow is triggered by a version tag. The API token used for publishing should be stored as a repository secret named PYPI_TOKEN so it is not exposed in the codebase or logs. Creating the tag is done locally with git tag v1.0.0 followed by git push origin v1.0.0, which triggers the workflow and results in a published package moments later. It is often useful to extend this with a test matrix, so the suite runs across supported Python versions, as well as caching to speed up repeated runs by reusing Poetry and pip caches keyed on the lock file.
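
The release steps themselves reduce to two commands:

# Tag the release and push the tag to trigger the publishing workflow
git tag v1.0.0
git push origin v1.0.0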

Project Structure

Package structure is another place where Poetry encourages clarity. A simple, consistent layout makes maintenance and onboarding easier. A typical library keeps its importable code in a package directory named to match the project name in pyproject.toml, with hyphens translated to underscores. Tests live in a separate tests directory, documentation in docs and examples in a directory of the same name. The repository root contains README.md, a licence file, the lock file and a .gitignore that excludes environment directories and build artefacts. The following tree illustrates a balanced structure for a data-oriented utility library.

data-utils/
├── data_utils/
│   ├── __init__.py
│   ├── core.py
│   ├── io.py
│   ├── analysis.py
│   └── cli.py
├── tests/
│   ├── __init__.py
│   ├── test_core.py
│   └── test_analysis.py
├── docs/
│   ├── index.md
│   └── usage.md
├── examples/
│   └── demo.ipynb
├── README.md
├── LICENSE
├── pyproject.toml
├── poetry.lock
└── .gitignore

Within the package directory, __init__.py can define a public interface and hide internal details. This allows users of the library to import the essentials without needing to know the module layout.

from .core import clean_data
from .analysis import summarise_data

__all__ = ["clean_data", "summarise_data"]

If the project offers a command-line interface, Poetry makes it simple to declare an entry point, so users can run a console command after installation. The scripts section in pyproject.toml maps a command name to a callable, in this case the main function in a cli module.

[tool.poetry.scripts]
data-utils = "data_utils.cli:main"

A basic CLI might be implemented using Click, passing arguments to internal functions and relaying progress.

import click
from data_utils import core

@click.command()
@click.argument("path")
def main(path):
    """Simple CLI example."""
    print(f"Processing {path}...")
    core.clean_data(path)
    print("Done!")

if __name__ == "__main__":
    main()
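
Once poetry install has run, the console command declared above is available inside the environment; the file path here is a placeholder:

# Run the declared entry point inside the Poetry-managed environment
poetry run data-utils data/raw.csv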

Git ignores should filter out files that do not belong in version control. A sensible default for a Poetry project is as follows.

__pycache__/
*.pyc
*.pyo
*.pyd
.env
.venv
dist/
build/
*.egg-info/
.cache/
.coverage

Testing and Documentation

Testing sits comfortably alongside this. Many projects adopt pytest because it is straightforward to use and integrates well with Poetry. Running tests through poetry run pytest ensures the virtual environment is used, and a simple unit test demonstrates the pattern.

from data_utils.core import clean_data

def test_clean_data_removes_nulls():
    data = [1, None, 2, None, 3]
    cleaned = clean_data(data)
    assert cleaned == [1, 2, 3]

Documentation can be kept in Markdown or built with dedicated tools. MkDocs and Sphinx are common choices for generating websites from your docs, and both can be installed as development dependencies using Poetry. Including notebooks in an examples directory is helpful for illustrating usage in richer contexts, especially for data science libraries. The README should present the essentials succinctly, covering what the project does, how to install it, a short usage example and pointers for development setup. A licence file clarifies terms of use; MIT and Apache 2.0 are widely used options in open source.

Advanced CI: Quality Checks and Multi-version Testing

Once structure, tests and documentation are in order, quality checks can be expanded in the continuous integration workflow. Adding automated formatting, import sorting and linting tightens consistency across contributions. An enhanced workflow uses Black, isort and Flake8 before running tests and building, and also includes a matrix to test across multiple Python versions. It runs on pull requests as well as on tagged pushes, which means code quality and compatibility are verified before merging changes and again before publishing a release.

name: Lint, Test and Publish

on:
  push:
    tags:
      - "v*"
  pull_request:

jobs:
  build:
    runs-on: ubuntu-latest

    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.11"]

    steps:
      - name: Check out repository
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install Poetry
        run: |
          curl -sSL https://install.python-poetry.org | python3 -
          echo "$HOME/.local/bin" >> $GITHUB_PATH

      - name: Cache Poetry dependencies
        uses: actions/cache@v4
        with:
          path: |
            ~/.cache/pypoetry
            ~/.cache/pip
          key: poetry-${{ runner.os }}-${{ hashFiles('**/poetry.lock') }}
          restore-keys: |
            poetry-${{ runner.os }}-

      - name: Install dependencies
        run: poetry install --no-interaction --no-root

      - name: Check code formatting with Black
        run: poetry run black --check .

      - name: Check import order with isort
        run: poetry run isort --check-only .

      - name: Run Flake8 linting
        run: poetry run flake8 .

      - name: Run tests with pytest
        run: poetry run pytest --maxfail=1 --disable-warnings -q

      - name: Build package
        run: poetry build

      - name: Publish to PyPI
        if: startsWith(github.ref, 'refs/tags/v')
        env:
          POETRY_PYPI_TOKEN_PYPI: ${{ secrets.PYPI_TOKEN }}
        run: poetry publish --no-interaction --username __token__ --password $POETRY_PYPI_TOKEN_PYPI

This workflow builds on the earlier one by checking style and formatting before tests. If any of those checks fail, the process stops and surfaces the problems in the job logs. Caching based on the lock file reduces the time spent installing dependencies by reusing packages where nothing has changed. The matrix section ensures that the library remains compatible with the declared range of Python versions, which is especially helpful just before a release. It is possible to extend this further with coverage reports using pytest-cov and Codecov, static type checking with mypy, or pre-commit hooks to keep local development consistent with continuous integration. Publishing to TestPyPI in a separate job can help validate packaging without affecting the real index, and once outcomes look good, the main publishing step proceeds when a tag is pushed.

Conclusion

The result of adopting Poetry is a project that states its requirements clearly, installs them reliably and produces distributions without ceremony. For new work, it removes much of the friction that once accompanied Python packaging. For existing projects, the migration path is gentle and reversible, and the gains in determinism often show up quickly in fewer environment-related issues. When paired with a small amount of automation in a continuous integration system, the routine of building, testing and publishing becomes repeatable and visible to everyone on the team. That holds whether the package is destined for internal use on a private index or a public release on PyPI.
