Technology Tales

Notes drawn from experiences in consumer and enterprise technology

Preventing authentication credentials from entering Git repositories

6th February 2026

Keeping credentials out of version control is a fundamental security practice that prevents numerous problems before they occur. Once secrets enter Git history, remediation becomes complex and cannot guarantee complete removal. Understanding how to structure projects, configure tooling, and establish workflows that prevent credential commits is essential for maintaining secure development practices.

Understanding What Needs Protection

Credentials come in many forms across different technology stacks. Database passwords, API keys, authentication tokens, encryption keys, and service account credentials all represent sensitive data that should never be committed. Configuration files containing these secrets vary by platform but share common characteristics: they hold information that would allow unauthorised access to systems, data, or services.

# This should never be in Git
database:
  password: secretpassword123
api:
  key: sk_live_abc123def456
# Neither should this
DB_PASSWORD=secretpassword123
API_KEY=sk_live_abc123def456
JWT_SECRET=mysecretkey

Even hashed passwords require careful consideration. Whilst bcrypt hashes with appropriate cost factors (10 or higher, taking tens of milliseconds per computation on typical hardware) provide protection against immediate exploitation, they still represent sensitive data. The hash format includes a version identifier, cost factor, salt, and hash components that could be targeted by offline attacks if exposed. Plaintext credentials offer no protection whatsoever and represent immediate critical exposure.
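To make the hash anatomy concrete, the following sketch splits a bcrypt string into the components mentioned above. The hash value itself is a synthetic placeholder with the right lengths, not a real credential.

```python
def parse_bcrypt(hash_string):
    """Split a bcrypt hash into its documented components.

    Format: $<version>$<cost>$<22-char salt><31-char hash>
    """
    _, version, cost, salt_and_hash = hash_string.split("$")
    return {
        "version": version,          # algorithm revision, e.g. 2b
        "cost": int(cost),           # work factor: 2**cost iterations
        "salt": salt_and_hash[:22],  # 128-bit salt in bcrypt base64
        "hash": salt_and_hash[22:],  # the digest itself
    }

# Synthetic placeholder, not a real hash: 22-char salt + 31-char digest.
example = "$2b$12$" + "abcdefghijklmnopqrstuv" + "ABCDEFGHIJKLMNOPQRSTUVWXYZabcde"
parts = parse_bcrypt(example)
print(parts["version"], parts["cost"], len(parts["salt"]), len(parts["hash"]))
```

Even with all components visible like this, the cost factor is what makes offline attacks expensive; it is the plaintext equivalents that offer no defence at all.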

Establishing Git Ignore Rules from the Start

The foundation of credential protection is proper .gitignore configuration established before any sensitive files are created. This proactive approach prevents problems rather than requiring remediation after discovery. Begin every project by identifying which files will contain secrets and excluding them immediately.

# Credentials and secrets
.env
.env.*
!.env.example
config/secrets.yml
config/database.yml
config/credentials/

# Application-specific sensitive files
wp-config.php
settings.php
configuration.php
appsettings.Production.json
application-production.properties

# User data and session storage
/storage/credentials/
/var/sessions/
/data/users/

# Keys and certificates
*.key
*.pem
*.p12
*.pfx
!public.key

# Cache and logs that might leak data
/cache/
/logs/
/tmp/
*.log

The negation pattern !.env.example demonstrates an important technique: excluding all .env files whilst explicitly including example files that show structure without containing secrets. This pattern ensures that developers understand what configuration is needed without exposing actual credentials.

Notice the broad exclusions for entire categories rather than specific files. Excluding *.key prevents any private key files from being committed, whilst !public.key allows the explicit inclusion of public keys that are safe to share. This defence-in-depth approach catches variations and edge cases that specific file exclusions might miss.

Separating Examples from Actual Configuration

Version control should contain example configurations that demonstrate structure without exposing secrets. Create .example or .sample files that show developers what configuration is required, whilst keeping actual credentials out of Git entirely.

# config/secrets.example.yml
database:
  host: localhost
  username: app_user
  password: REPLACE_WITH_DATABASE_PASSWORD

api:
  endpoint: https://api.example.com
  key: REPLACE_WITH_API_KEY
  secret: REPLACE_WITH_API_SECRET

encryption:
  key: REPLACE_WITH_32_BYTE_ENCRYPTION_KEY

Documentation should explain where developers obtain the actual values. For local development, this might involve running setup scripts that generate credentials. For production, it involves deployment processes that inject secrets from secure storage. The example file serves as a template and checklist, ensuring nothing is forgotten whilst preventing accidental commits of real secrets.
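As an illustration of the example-file-as-checklist idea, a small start-up or CI check can fail fast if any placeholders survive into the live configuration. This is a hypothetical sketch assuming the REPLACE_WITH_ convention shown above and a configuration already parsed into nested dictionaries.

```python
def find_unfilled_placeholders(config, prefix="REPLACE_WITH_", path=""):
    """Recursively collect config keys whose values still hold template
    placeholders, so deployment can abort instead of running with
    example values."""
    unfilled = []
    for key, value in config.items():
        location = f"{path}.{key}" if path else key
        if isinstance(value, dict):
            unfilled.extend(find_unfilled_placeholders(value, prefix, location))
        elif isinstance(value, str) and value.startswith(prefix):
            unfilled.append(location)
    return unfilled

config = {"database": {"host": "localhost",
                       "password": "REPLACE_WITH_DATABASE_PASSWORD"}}
print(find_unfilled_placeholders(config))  # ['database.password']
```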

Using Environment Variables

Environment variables provide a standard mechanism for separating configuration from code. Applications read credentials from the environment rather than from files tracked in Git. This pattern works across virtually all platforms and languages.

// PHP: instead of hardcoding
$db_password = 'secretpassword123';

// PHP: read from environment
$db_password = getenv('DB_PASSWORD');

// JavaScript: instead of requiring a config file with secrets
const apiKey = 'sk_live_abc123def456';

// JavaScript: read from environment
const apiKey = process.env.API_KEY;

Environment files (.env) provide convenience for local development, but must be excluded from Git. The pattern of .env for actual secrets and .env.example for structure becomes standard across many frameworks. Developers copy the example to create their local configuration, filling in actual values that never leave their machine.
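What a dotenv loader does is simple enough to sketch. The parser below is a minimal, hypothetical version (real libraries such as python-dotenv handle more edge cases, like export prefixes and multiline values), and it deliberately refuses to overwrite variables the real environment has already set.

```python
import os

def load_env_lines(lines, environ=os.environ):
    """Minimal .env parsing: KEY=VALUE per line, '#' comments and blank
    lines skipped, surrounding quotes stripped. Never overwrites values
    already present in the environment, matching common dotenv behaviour."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        environ.setdefault(key.strip(), value.strip().strip("'\""))

def load_env_file(path, environ=os.environ):
    with open(path) as fh:
        load_env_lines(fh, environ)

env = {}
load_env_lines(["# comment", "DB_PASSWORD=devsecret", "API_KEY='abc123'"], env)
print(env["API_KEY"])  # abc123
```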

Implementing Pre-Commit Hooks

Pre-commit hooks provide automated checking before changes enter the repository. These hooks scan staged files for patterns that match secrets and block commits when suspicious content is detected. This automated enforcement prevents mistakes that manual review might miss.

The pre-commit framework manages hooks across multiple repositories and languages. Installation is straightforward, and configuration defines which checks run before each commit.

pip install pre-commit

Create a configuration file defining which hooks to run:

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: check-added-large-files
      - id: check-json
      - id: check-yaml
      - id: detect-private-key
      - id: end-of-file-fixer
      - id: trailing-whitespace

  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.4.0
    hooks:
      - id: detect-secrets
        args: ['--baseline', '.secrets.baseline']

Install the hooks in your repository:

pre-commit install

Now every commit triggers these checks automatically. The detect-private-key hook catches SSH keys and other private key formats. The detect-secrets hook uses entropy analysis and pattern matching to identify potential credentials. When suspicious content is detected, the commit is blocked and the developer is alerted to review the flagged content.
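The entropy analysis mentioned above can be illustrated in a few lines: a randomly generated key spreads its characters far more evenly than an ordinary identifier, so its Shannon entropy per character is higher, and scanners flag strings above a threshold. The strings below are illustrative, not real keys.

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Bits of entropy per character. Random keys score high, ordinary
    words score low -- the signal entropy-based scanners rely on."""
    counts = Counter(text)
    length = len(text)
    return -sum((n / length) * math.log2(n / length) for n in counts.values())

print(round(shannon_entropy("sk_live_abc123def456"), 2))  # higher
print(round(shannon_entropy("hello_world_config"), 2))    # lower
```

Entropy alone produces false positives (UUIDs, hashes in lock files), which is why detect-secrets combines it with pattern matching and a reviewed baseline file.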

Configuring Git-Secrets

The git-secrets tool from AWS specifically targets secret detection. It scans commits, commit messages, and merges to prevent credentials from entering repositories. Installation and configuration establish patterns that identify secrets.

# Install via Homebrew
brew install git-secrets

# Install hooks in a repository
cd /path/to/repo
git secrets --install

# Register AWS patterns
git secrets --register-aws

# Add custom patterns
git secrets --add 'password\s*=\s*["\047][^\s]+'
git secrets --add 'api[_-]?key\s*=\s*["\047][^\s]+'

The tool maintains a list of prohibited patterns and scans all content before allowing commits. Custom patterns can be added to match organisation-specific secret formats. The --register-aws command adds patterns for AWS access keys and secret keys, whilst custom patterns catch application-specific credential formats.
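Custom patterns are worth exercising locally before registering them. This sketch uses Python's re module with equivalent expressions (the \047 octal escape written out as a literal quote) against sample lines; it mirrors, rather than reproduces, what git-secrets does to staged content.

```python
import re

# Equivalents of the git-secrets patterns registered above.
patterns = [
    re.compile(r'password\s*=\s*["\'][^\s]+'),
    re.compile(r'api[_-]?key\s*=\s*["\'][^\s]+'),
]

def flags_line(line):
    """Return True if any prohibited pattern matches the line."""
    return any(p.search(line) for p in patterns)

print(flags_line("password = 'secretpassword123'"))      # True
print(flags_line('api_key = "sk_live_abc123def456"'))    # True
print(flags_line("username = 'app_user'"))               # False
```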

For teams, establishing git-secrets across all repositories ensures consistent protection. Template directories provide a mechanism for automatic installation in new repositories:

# Create a template with git-secrets installed
git secrets --install ~/.git-templates/git-secrets

# Use the template for all new repositories
git config --global init.templateDir ~/.git-templates/git-secrets

Now, every git init automatically includes secret scanning hooks.

Enabling GitHub Secret Scanning

GitHub Secret Scanning provides server-side protection that cannot be bypassed by local configuration. GitHub automatically scans repositories for known secret patterns and alerts repository administrators when matches are detected. This works for both new commits and historical content.

For public repositories, secret scanning is enabled by default. For private repositories, it requires GitHub Advanced Security. Enable it through repository settings under Code security and analysis. GitHub maintains partnerships with service providers to detect their specific secret formats, and when a partner pattern is found, both you and the service provider are notified.

The scanning covers not just code but also issues, pull requests, discussions, and wiki content. This comprehensive approach catches secrets that might be accidentally pasted into comments or documentation. The detection happens continuously, so even old content gets scanned when new patterns are added.

Custom patterns extend detection to organisation-specific secret formats. Define regular expressions that match your internal API key formats, authentication tokens, or other proprietary credentials. These custom patterns apply across all repositories in your organisation, providing consistent protection.

Structuring Projects for Credential Isolation

Project structure itself can prevent credentials from accidentally entering Git. Establish clear separation between code that belongs in version control and configuration that remains environment-specific. Create dedicated directories for credentials and ensure they are excluded from tracking.

project/
├── src/                    # Code - belongs in Git
├── tests/                  # Tests - belongs in Git
├── config/
│   ├── app.yml            # General config - belongs in Git
│   ├── secrets.example.yml # Example - belongs in Git
│   └── secrets.yml        # Actual secrets - excluded from Git
├── credentials/           # Entire directory excluded
│   ├── database.yml
│   └── api-keys.json
├── .env.example           # Example - belongs in Git
├── .env                   # Actual secrets - excluded from Git
└── .gitignore             # Defines exclusions

This structure makes it obvious which files contain secrets. The credentials/ directory is clearly separated from source code, and its exclusion from Git is explicit. Developers can see at a glance that this directory requires different handling.

Documentation should explain the structure and the reasoning behind it. New team members need to understand why certain directories are empty in their fresh clones and where to obtain the configuration files that populate them. Clear documentation prevents confusion and ensures everyone follows the same patterns.

Managing Development Credentials

Development environments require credentials but should never use production secrets. Generate separate development credentials that provide access to development resources only. These credentials can be less stringently protected whilst still not being committed to Git.

Development credential management varies by organisation size and infrastructure. For small teams, shared development credentials stored in a team password manager might suffice. For larger organisations, each developer receives individual credentials for development resources, with access controlled through identity management systems.

Some teams commit development credentials intentionally, arguing that development databases contain no sensitive data and convenience outweighs risk. This approach is controversial and depends on your security model. If development credentials can access any production resources or if development data has any sensitivity, they must be protected. Even purely synthetic development data might reveal business logic or system architecture worth protecting.

The safer approach maintains the same credential handling patterns across all environments. This ensures that developers build habits that prevent production credential exposure. When development and production follow identical patterns, muscle memory built during development prevents mistakes in production.

Provisioning Production Credentials

Production credentials should never touch developer machines or version control. Deployment processes inject credentials at runtime through environment variables, secret management services, or deployment-time configuration.

Continuous deployment pipelines read credentials from secret stores and make them available to applications without exposing them to humans. GitHub Actions, GitLab CI, Jenkins, and other CI/CD systems provide secure variable storage that is injected during builds and deployments.

# .github/workflows/deploy.yml
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Deploy to production
        env:
          DB_PASSWORD: ${{ secrets.DB_PASSWORD }}
          API_KEY: ${{ secrets.API_KEY }}
        run: |
          ./deploy.sh

The secrets.DB_PASSWORD syntax references encrypted secrets stored in GitHub's secure storage. These values are never exposed in logs or visible to anyone except during the deployment process. The deployment script receives them as environment variables and can configure the application appropriately.
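On the receiving end, a deployment script should fail fast when an expected secret was not injected, rather than proceeding with an empty value. A hypothetical sketch, assuming the two secret names from the workflow above:

```python
import os

# Names assumed from the workflow example above.
REQUIRED_SECRETS = ["DB_PASSWORD", "API_KEY"]

def read_secrets(environ=os.environ):
    """Collect required secrets from the environment, raising loudly if
    any were not injected by the pipeline."""
    missing = [name for name in REQUIRED_SECRETS if not environ.get(name)]
    if missing:
        raise RuntimeError(f"missing secrets: {', '.join(missing)}")
    return {name: environ[name] for name in REQUIRED_SECRETS}
```

Failing at deploy time with a named list of missing variables is far easier to debug than an application that starts and then rejects every database connection.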

Secret management services like HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, or Google Cloud Secret Manager provide centralised credential storage with access controls, audit logging, and automatic rotation. Applications authenticate to these services and retrieve credentials at runtime, ensuring that secrets are never stored on disk or in environment files.

Rotating Credentials Regularly

Regular credential rotation limits exposure duration if secrets are compromised. Establish rotation schedules based on credential sensitivity and access patterns. Database passwords might rotate quarterly, API keys monthly, and authentication tokens weekly or daily. Automated rotation reduces operational burden and ensures consistency.

Rotation requires coordination between secret generation, distribution, and application updates. Secret management services can automate much of this process, generating new credentials, updating secure storage, and triggering application reloads. Manual rotation involves generating new credentials, updating all systems that use them, and verifying functionality before disabling old credentials.

The rotation schedule balances security against operational complexity. More frequent rotation provides better security but increases the risk of service disruption if processes fail. Less frequent rotation simplifies operations but extends exposure windows. Find the balance that matches your risk tolerance and operational capabilities.

Training and Culture

Technical controls provide necessary guardrails, but security ultimately depends on people understanding why credentials matter and how to protect them. Training should cover the business impact of credential exposure, the techniques for keeping secrets out of Git, and the procedures for responding if mistakes occur.

New developer onboarding should include credential management as a core topic. Before developers commit their first code, they should understand what constitutes a secret, why it must stay out of Git, and how to configure their local environment properly. This prevents problems rather than correcting them after they occur.

Regular security reminders reinforce good practices. When new secret types are introduced or new tools are adopted, communicate the changes and update documentation. Security reviews should check credential handling practices, not just looking for exposed secrets, but also verifying that proper patterns are followed.

Creating a culture where admitting mistakes is safe encourages early reporting. If a developer accidentally commits a credential, they should feel comfortable immediately alerting the security team, rather than hoping no one notices. Early detection enables faster response and reduces damage.

Responding to Detection

Despite best efforts, secrets sometimes enter repositories. Rapid response limits damage. Immediate credential rotation assumes compromise and prevents exploitation. Removing the file from future commits whilst leaving it in history provides no security benefit, as the exposure has already occurred.

Tools like BFG Repo-Cleaner can remove secrets from Git history, but this is complex and cannot guarantee complete removal. Anyone who cloned the repository before clean-up retains the compromised credentials in their local copy. Forks, clones on other systems, and backup copies may all contain the secrets.

The most reliable response is assuming the credential is compromised and rotating it immediately. History clean-up can follow as a secondary measure to reduce ongoing exposure, but it should never be the primary response. Treat any secret that entered Git as if it were publicly posted because once in Git history, it effectively was.

Continuous Improvement

Credential management practices should evolve with your infrastructure and team. Regular reviews identify gaps and opportunities for improvement. When new credential types are introduced, update .gitignore patterns, secret scanning rules, and documentation. When new developers join, gather feedback on clarity and completeness of onboarding materials.

Metrics help track effectiveness. Monitor secret scanning alerts, track rotation compliance, and measure time-to-rotation when credentials are exposed. These metrics identify areas needing improvement and demonstrate progress over time.

Summary

Preventing credentials from entering Git repositories requires multiple complementary approaches. Establish comprehensive .gitignore configurations before creating any credential files. Separate example configurations from actual secrets, keeping only examples in version control. Use environment variables to inject credentials at runtime rather than storing them in configuration files. Implement pre-commit hooks and server-side scanning to catch mistakes before they enter history. Structure projects to clearly separate code from credentials, making it obvious what belongs in Git and what does not.

Train developers on credential management and create a culture where security is everyone's responsibility. Provision production credentials through deployment processes and secret management services, ensuring they never touch developer machines or version control. Rotate credentials regularly to limit exposure windows. Respond rapidly when secrets are detected, assuming compromise and rotating immediately.

Security is not a one-time configuration but an ongoing practice. Regular reviews, continuous improvement, and adaptation to new threats and technologies keep credential management effective. The investment in prevention is far less than the cost of responding to exposed credentials, making it essential to get right from the beginning.

Related Resources

When Operations and Machine Learning meet

5th February 2026

Here's a scenario you'll recognise: your SRE team drowns in 1,000 alerts daily. 95% are false positives. Meanwhile, your data scientists built five ML models last quarter, and none have reached production. These two problems are colliding, and each is helping to solve the other. Machine learning is moving out of research labs and into the operations that keep your systems running. At the same time, DevOps practices are being adapted to get ML models into production reliably. Since this convergence has created three new disciplines (AIOps, MLOps and LLM observability), here is what you need to know.

Why Traditional Operations Can't Keep Up

Modern systems generate unprecedented volumes of operational data. Logs, metrics, traces, events and user interaction signals create a continuous stream that's too large and too fast for manual analysis.

Your monitoring system might send thousands of alerts per day, but most are noise. A CPU spike in one microservice cascades into downstream latency warnings, database connection errors and end-user timeouts, generating dozens or hundreds of alerts from a single root cause. Without intelligent correlation, engineers waste hours manually connecting the dots.

Meanwhile, machine learning models that could solve real business problems sit in notebooks, never making it to production. The gap between data science and operations is costly. Data scientists lack the infrastructure to deploy models reliably. Operations teams lack the tooling to monitor models that do make it live.

The complexity of cloud-native architectures, microservices and distributed systems has outpaced traditional approaches. Manual processes that worked for simpler systems simply cannot scale.

Three Emerging Practices Changing the Game

Three distinct but related practices have emerged to address these challenges. Each solves a specific problem whilst contributing to a broader transformation in how organisations build and run digital services.

AIOps: Intelligence for Your Operations

AIOps (Artificial Intelligence for IT Operations) applies machine learning to the work of IT operations. The term was coined by Gartner. AIOps platforms collect data from across your environment, analyse it in real time and surface patterns, anomalies or likely incidents.

The key capability is event correlation. Instead of presenting 1,000 raw alerts, AIOps systems analyse metadata, timing, topological dependencies and historical patterns to collapse related events into a single coherent incident. What was 1,000 alerts becomes one actionable event with a causal chain attached.
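The collapse from raw alerts to a single incident can be sketched with a toy correlator that groups alerts sharing a suspected root service within a time window. Production AIOps platforms also weigh topology and learned history; this shows only the grouping idea, with made-up alert records.

```python
from collections import defaultdict

def correlate(alerts, window_seconds=300):
    """Toy event correlation: bucket alerts that blame the same upstream
    service and arrive within the same time window."""
    incidents = defaultdict(list)
    for alert in sorted(alerts, key=lambda a: a["time"]):
        key = (alert["root_service"], alert["time"] // window_seconds)
        incidents[key].append(alert)
    return list(incidents.values())

alerts = [
    {"time": 10, "root_service": "db", "msg": "connection pool exhausted"},
    {"time": 15, "root_service": "db", "msg": "checkout latency high"},
    {"time": 20, "root_service": "db", "msg": "API timeouts"},
]
print(len(alerts), "alerts ->", len(correlate(alerts)), "incident(s)")
```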

Beyond detection, AIOps platforms can trigger automated responses to common problems, reducing time to remediation. Because they learn from historical data, they can offer predictive insights that shift operations away from constant firefighting.

Teams implementing AIOps report measurable improvements: 60-80% reduction in alert volume, 50-70% faster incident response and significant reductions in operational toil. The technology is maturing rapidly, with Gartner predicting that 60% of large enterprises will have adopted AIOps platforms by 2026.

MLOps: Getting Models into Production

Whilst AIOps uses ML to improve operations, MLOps (Machine Learning Operations) is about operationalising machine learning itself. Building a model is only a small part of making it useful. Models change, data changes, and performance degrades over time if the system isn't maintained.

MLOps is an engineering culture and practice that unifies ML development and ML operations. It extends DevOps by treating machine learning models and data assets as first-class citizens within the delivery lifecycle.

In practice, this means continuous integration and continuous delivery for machine learning. Changes to models and pipelines are tested and deployed in a controlled way. Model versioning tracks not just the model artefact, but also the datasets and hyperparameters that produced it. Monitoring in production watches for performance drift and decides when to retrain or roll back.
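The versioning idea can be sketched as a content fingerprint: hash everything that produced the model, so that any change to code, data or hyperparameters yields a new identifier. The revision strings and digest below are invented for illustration.

```python
import hashlib
import json

def model_version(code_rev, data_digest, hyperparams):
    """Derive a reproducible version id from everything that produced a
    model. Identical inputs always give the same id; change any input
    and the id changes."""
    payload = json.dumps(
        {"code": code_rev, "data": data_digest, "params": hyperparams},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()[:12]

# Hypothetical inputs: a git revision, a training-data digest, hyperparameters.
v1 = model_version("abc123", "sha256:0f3a", {"lr": 0.01, "epochs": 20})
v2 = model_version("abc123", "sha256:0f3a", {"lr": 0.02, "epochs": 20})
print(v1 != v2)  # True: changed hyperparameters -> new version id
```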

The MLOps market was valued at $2.2 billion in 2024 and is projected to reach $16.6 billion by 2030, reflecting rapid adoption across industries. Organisations that successfully implement MLOps report that up to 88% of ML initiatives that previously failed to reach production are now being deployed successfully.

A typical MLOps implementation looks like this: data scientists work in their preferred tools, but when they're ready to deploy, the model goes through automated testing, gets versioned alongside its training data and deploys with built-in monitoring for performance drift. If the model degrades, it can automatically retrain or roll back.

The SRE Automation Opportunity

Site Reliability Engineering, originally created at Google, applies software engineering principles to operations problems. It encompasses availability, latency, performance, efficiency, change management, monitoring, emergency response and capacity planning. Rather than AIOps replacing SRE or vice versa, the likely outcome is convergence: analytics, automation and reliability engineering become mutually reinforcing, with organisations adopting integrated approaches that combine intelligent monitoring, automated operations and proactive reliability practices.

What This Looks Like in the Real World

The difference between traditional operations and ML-powered operations shows up in everyday scenarios.

Before: An application starts responding slowly. Monitoring systems fire hundreds of alerts across different tools. An engineer spends two hours correlating logs, metrics and traces to identify that a database connection pool is exhausted. They manually scale the service, update documentation and hope to remember the fix next time.

After: The same slowdown triggers anomaly detection. The AIOps platform correlates signals across the stack, identifies the connection pool issue and surfaces it as a single incident with context. Either an automated remediation kicks in (scaling the pool based on learned patterns) or the engineer receives a notification with diagnosis complete and remediation steps suggested. Resolution time drops from hours to minutes.

Before: A data science team builds a pricing optimisation model. After three months of development, they hand a trained model to engineering. Engineering spends another month building deployment infrastructure, writing monitoring code and figuring out how to version the model. By the time it reaches production, the model is stale and performs poorly.

After: The same team works within an MLOps platform. Development happens in standard environments with experiment tracking. When ready, the data scientist triggers deployment through a single interface. The platform handles testing, versioning, deployment and monitoring. The model reaches production in days instead of months, and automatic retraining keeps it current.

These patterns extend across industries. Financial services firms use MLOps for fraud detection models that need continuous updating. E-commerce platforms use AIOps to manage complex microservices architectures. Healthcare organisations use both to ensure critical systems remain available whilst deploying diagnostic models safely.

The Tech Behind the Transformation (Optional Deep Dive)

If you want to understand why this convergence is happening now, it helps to know about transformers and vector embeddings. If you're more interested in implementation, skip to the next section.

The breakthrough that enabled modern AI came in 2017 with a paper titled "Attention Is All You Need". Ashish Vaswani and colleagues at Google introduced the transformer architecture, a neural network design that processes sequential data (like sentences) by computing relationships across the entire sequence at once, rather than step by step.

The key innovation is self-attention. Earlier models struggled with long sequences because they processed data sequentially and lost context. Self-attention allows a model to examine all parts of an input simultaneously, computing relationships between each token and every other token. This parallel processing is a major reason transformers scale well and perform strongly on large datasets.
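Scaled dot-product attention is compact enough to sketch in plain Python. This toy version operates on tiny two-dimensional vectors purely to show the mechanism: every query scores every key in one pass, so each output blends information from the whole sequence.

```python
import math

def softmax(scores):
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention on plain lists. Each query is scored
    against every key at once, and the output is a weighted mix of all
    the values -- the parallelism described above."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# Three-token toy sequence attending over itself (self-attention).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = attention(x, x, x)
print(len(out), len(out[0]))  # 3 2
```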

Transformers underpin models like GPT and BERT. They enable applications from chatbots to content generation, code assistance to semantic search. For operations teams, transformer-based models power the natural language interfaces that let engineers query complex systems in plain English and the embedding models that enable semantic search across logs and documentation.

Vector embeddings represent concepts as dense vectors in high-dimensional space. Similar concepts have embeddings that are close together, whilst unrelated concepts are far apart. This lets models quantify meaning in a way that supports both understanding and generation.

In operations contexts, embeddings enable semantic search. Instead of searching logs for exact keyword matches, you can search for concepts. Query "authentication failures" and retrieve related events like "login rejected", "invalid credentials" or "session timeout", even if they don't contain your exact search terms.
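The ranking behind such a search reduces to cosine similarity between embedding vectors. The three-dimensional vectors below are invented stand-ins (real embedding models emit hundreds of dimensions), but the ordering logic is the same.

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 for identical directions, near 0 for
    unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical embeddings of log events.
embeddings = {
    "login rejected":      [0.9, 0.1, 0.0],
    "invalid credentials": [0.8, 0.2, 0.1],
    "disk space low":      [0.0, 0.1, 0.9],
}
# Stand-in for the embedding of the query "authentication failures".
query = [0.85, 0.15, 0.05]

ranked = sorted(embeddings, key=lambda k: cosine(query, embeddings[k]),
                reverse=True)
print(ranked[-1])  # the unrelated event ranks last: disk space low
```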

Retrieval-Augmented Generation (RAG) combines these capabilities to make AI systems more accurate and current. A RAG system pairs a language model with a retrieval mechanism that fetches external information at query time. The model generates responses using both its internal knowledge and retrieved context.

This approach is particularly valuable for operations. A RAG-powered assistant can pull current runbook procedures, recent incident reports and configuration documentation to answer questions like "how do we handle database failover in the production environment?" with accurate, up-to-date information.

The technical stack supporting RAG implementations typically includes vector databases for similarity search. As of 2025, commonly deployed options include Pinecone, Milvus, Chroma, Faiss, Qdrant, Weaviate and several others, a fast-moving landscape in which vector search is becoming standard infrastructure for AI implementations.

Where to Begin

Starting with ML-powered operations doesn't require a complete transformation. Begin with targeted improvements that address your most pressing problems.

If you're struggling with alert fatigue...

Start with event correlation. Many AIOps platforms offer this as an entry point without requiring full platform adoption. Look for solutions that integrate with your existing monitoring tools and can demonstrate noise reduction in a proof of concept.

Focus on one high-volume service or team first. Success here provides both immediate relief and a template for broader rollout. Track metrics like alerts per day, time to acknowledge and time to resolution to demonstrate impact.

Tools worth considering include established platforms like Datadog, Dynatrace and ServiceNow, alongside newer entrants like PagerDuty AIOps and specialised incident response platforms like incident.io.

If you have ML models stuck in development...

Begin with MLOps fundamentals before investing in comprehensive platforms. Focus on model versioning first (track which code, data and hyperparameters produced each model). This single practice dramatically improves reproducibility and makes collaboration easier.

Next, automate deployment for one model. Choose a model that's already proven valuable but requires manual intervention to update. Build a pipeline that handles testing, deployment and basic monitoring. Use this as a template for other models.

Popular MLOps platforms include MLflow (open source), cloud provider offerings like AWS SageMaker, Google Vertex AI and Azure Machine Learning, and specialised platforms like Databricks and Weights & Biases.

If you're building with LLMs...

Implement observability from day one. LLM applications are different from traditional software. They're probabilistic, can be expensive to run, and their behaviour varies with prompts and context. You need to monitor performance (response times, throughput), quality (output consistency, appropriateness), bias, cost (token usage) and explainability.

Common pitfalls include underestimating costs, failing to implement proper prompt versioning, neglecting to monitor for model drift and not planning for the debugging challenges that come with non-deterministic systems.

The LLM observability space is evolving rapidly, with platforms like LangSmith, Arize AI, Honeycomb and others offering specialised tooling for monitoring generative AI applications in production.

Why This Matters Beyond the Tech

The convergence of ML and operations isn't just a technical shift. It requires cultural change, new skills and rethinking of traditional roles.

Teams need to understand not only deployment automation and infrastructure as code, but also concepts like attention mechanisms, vector embeddings and retrieval systems because these directly influence how AI-enabled services behave in production. They also need operational practices that can handle both deterministic systems and probabilistic ones, whilst maintaining reliability, compliance and cost control.

Data scientists are increasingly expected to understand production concerns like latency budgets, deployment strategies and operational monitoring. Operations engineers are expected to understand model behaviour, data drift and the basics of ML pipelines. The gap between these roles is narrowing.

Security and governance cannot be afterthoughts. As AI becomes embedded in tooling and operations become more automated, organisations need to integrate security testing throughout the development cycle, implement proper access controls and audit trails, and ensure models and automated systems operate within appropriate guardrails.

The organisations succeeding with these practices treat them as both a technical programme and an organisational transformation. They invest in training, establish cross-functional teams, create clear ownership and accountability, and build platforms that reduce cognitive load whilst enabling self-service.

Moving Forward

The convergence of machine learning and operations isn't a future trend; it's happening now. AIOps platforms are reducing alert noise and accelerating incident response. MLOps practices are getting models into production faster and keeping them performing well. The economic case for SRE automation is driving investment and innovation.

The organisations treating this as transformation rather than tooling adoption are seeing results: fewer outages, faster deployments, models that actually deliver value. They're not waiting for perfect solutions. They're starting with focused improvements, learning from what works and scaling gradually.

The question isn't whether to adopt these practices. It's whether you'll shape the change or scramble to catch up. Start with the problem that hurts most (alert fatigue, models stuck in development, reliability concerns) and build from there. The convergence of ML and operations offers practical solutions to real problems. The hard part is committing to the cultural and organisational changes that make the technology work.

A Practical Linux Administration Toolkit: Kernels, Storage, Filesystems, Transfers and Shell Completion

4th February 2026

Linux command-line administration has a way of beginning with a deceptively simple question that opens into several possible answers. Whether the task is checking which kernels are installed before an upgrade, mounting an NFS share for backup access, diagnosing low disk space, throttling a long-running sync job or wiring up tab completion, the right answer depends on context: the distribution, the file system type, the transport protocol and whether the need is a one-off action or a persistent configuration. This guide draws those everyday administrative themes into a single continuous reference.

Identifying Your System and Installed Kernels

Reading Distribution Information

A sensible place to begin any administration session is knowing exactly what you are working with. One quick approach is to read the release files directly:

cat /etc/*-release

On systems where bat is available (sometimes installed as batcat), the same files can be read with syntax highlighting using batcat /etc/*-release. Typical output on Ubuntu includes /etc/lsb-release and /etc/os-release, with values such as DISTRIB_ID=Ubuntu, VERSION_ID="20.04" and PRETTY_NAME="Ubuntu 20.04.6 LTS". Three additional commands, cat /etc/os-release, lsb_release -a and hostnamectl, each present the same underlying facts in slightly different formats, while uname -r reports the currently running kernel release in isolation. Adding more flags with uname -mrs extends the output to include the kernel name and machine hardware class, which on an older RHEL system might return something like Linux 2.6.18-8.1.14.el5 x86_64.
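For scripts that need these facts rather than a human-readable listing, the KEY=VALUE format of /etc/os-release is simple to parse. A minimal Python sketch (the sample string is illustrative, mirroring the Ubuntu values above):

```python
def parse_os_release(text):
    """Parse the KEY=VALUE format used by /etc/os-release."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info

# Illustrative sample mirroring the Ubuntu output described above.
sample = 'NAME="Ubuntu"\nVERSION_ID="20.04"\nPRETTY_NAME="Ubuntu 20.04.6 LTS"\n'
print(parse_os_release(sample)["PRETTY_NAME"])  # Ubuntu 20.04.6 LTS
```

Python 3.10 and later also provide platform.freedesktop_os_release(), which reads and parses the real file directly.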

Querying Installed Kernels by Package Manager

On Red Hat Enterprise Linux, CentOS, Rocky Linux, AlmaLinux, Oracle Linux and Fedora, installed kernels are managed by the RPM package database and are queried with:

rpm -qa kernel

This may return entries such as kernel-5.14.0-70.30.1.el9_0.x86_64. The same information is also accessible through yum list installed kernel or dnf list installed kernel. On Debian, Ubuntu, Linux Mint and Pop!_OS the package manager differs, so the command changes accordingly:

dpkg --list | grep linux-image

Output may include versioned packages, such as linux-image-2.6.20-15-generic, alongside the metapackage linux-image-generic. Arch Linux users can query with pacman -Q | grep linux, while SUSE Enterprise Linux and openSUSE users can turn to rpm -qa | grep -i kernel or use zypper search -i kernel, which presents results in a structured table. Alpine Linux takes yet another approach with apk info -vvv | grep -E 'Linux' | grep -iE 'lts|virt', which may return entries such as linux-virt-5.15.98-r0 - Linux lts kernel.

Finding Kernels Outside the Package Manager

Package databases do not always tell the whole story, particularly where custom-compiled kernels are involved. A kernel built and installed manually will not appear in any package manager query at all. In that case, /lib/modules/ is a useful place to look, since each installed kernel generally has a corresponding module directory. Running ls -l /lib/modules/ may show entries such as 4.15.0-55-generic, 4.18.0-25-generic and 5.0.0-23-generic. A further check is:

sudo find /boot/ -iname "vmlinuz*"

This may return files such as /boot/vmlinuz-5.4.0-65-generic and /boot/vmlinuz-5.4.0-66-generic, confirming precisely which versions exist on disk.

A Brief History of vmlinuz

That naming convention is worth understanding because it appears on virtually every Linux system. vmlinuz is the compressed, bootable Linux kernel image stored in /boot/. The name traces back through computing history: early Unix kernels were simply called /unix, but when the University of California, Berkeley ported Unix to the VAX architecture in 1979 and added paged virtual memory, the resulting system, 3BSD, was known as VMUNIX (Virtual Memory Unix) and its kernel images were named /vmunix. Linux inherited vmlinuz as a mutation of vmunix, with the trailing z denoting gzip compression (though other algorithms such as xz and lzma are also supported). The counterpart vmlinux refers to the uncompressed, non-bootable kernel file, which is used for debugging and symbol table generation but is not loaded directly at boot. Running ls -l /boot/ will show the full set of boot files present on any given system.

Examining and Investigating Disk Usage

Why ls Is Not the Right Tool for Directory Sizes

Storage management is an area where a familiar command can mislead. Running ls -l on a directory typically shows it occupying 4,096 bytes, which reflects the directory entry metadata rather than the combined size of its contents. For real space consumption, du is the appropriate tool.

sudo du -sh /var

The above command produces a summarised, human-readable total such as 85G /var. The -s flag limits output to a single grand total and -h formats values in K, M or G units. For an individual file, du -sh /var/log/syslog might report 12M /var/log/syslog, while ls -lh /var/log/syslog adds ownership and timestamps to the same figure.

Drilling Down to Find Where Space Has Gone

When a file system is full and the need is to locate exactly where the space has accumulated, du can be made progressively more revealing. The command sudo du -h --max-depth=1 /var lists first-level subdirectories with sizes, potentially showing 77G /var/lib, 5.0G /var/cache and 3.3G /var/log. To surface the biggest consumers quickly, piping to sort and head works well:

sudo du -h /var/ | sort -rh | head -10

Adding the -a flag includes individual files alongside directories in the same output:

sudo du -ah /var/ | sort -rh | head -10

Apparent Size Versus Allocated Disk Space

There is a subtle distinction that sometimes causes confusion. By default, du reports allocated disk usage, which is governed by the file system block size. A single-byte file on a file system with 4 KB blocks still consumes 4 KB of disk. To see the amount of data actually stored rather than allocated, sudo du -sh --apparent-size /var reports the apparent size instead. The df command answers a different question altogether: it shows free and used space per mounted file system, such as /dev/sda1 at 73 per cent usage or /dev/sdb1 mounted on /data with 70 GB free. In practice, du is for locating what consumes space and df is for checking how much remains on each volume.
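The same distinction can be demonstrated programmatically. A minimal Python sketch, assuming a file system that supports sparse files (as ext4 and tmpfs do), creates a file whose apparent size far exceeds its allocated space:

```python
import os
import tempfile

# Create a sparse file: set a 10 MiB length without writing any data,
# so the file system allocates few or no blocks for it.
fd, path = tempfile.mkstemp()
os.close(fd)
os.truncate(path, 10 * 1024 * 1024)

st = os.stat(path)
apparent = st.st_size            # what `du --apparent-size` reports
allocated = st.st_blocks * 512   # what plain `du` reports
print(apparent, allocated)

os.unlink(path)
```

st_blocks is defined by POSIX in 512-byte units, independent of the file system's block size, which is why the multiplication by 512 recovers the allocated figure.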

gdu: A Faster Interactive Alternative

Some administrators prefer a more modern tool for storage investigations, and gdu is a notable option. It is a fast disk usage analyser written in Go with an interactive console interface, designed primarily for SSDs where it can exploit parallel processing to full effect, though it functions on hard drives too with less dramatic speed gains. The binary release can be installed by extracting its .tgz archive:

curl -L https://github.com/dundee/gdu/releases/latest/download/gdu_linux_amd64.tgz | tar xz
chmod +x gdu_linux_amd64
sudo mv gdu_linux_amd64 /usr/bin/gdu

It can also be run directly via Docker without installation:

docker run --rm --init --interactive --tty --privileged \
  --volume /:/mnt/root ghcr.io/dundee/gdu /mnt/root

In use, gdu scans a directory interactively when run without flags, summarises a target with gdu -ps /some/dir, shows top results with gdu -t 10 / and runs without interaction using gdu -n /. It supports apparent size display, hidden file inclusion, item counts, modification times, exclusions, age filtering and database-backed analysis through SQLite or BadgerDB. The project documentation notes that hard links are counted only once and that analysis data can be exported as JSON for later review.

Unpacking TGZ Archives

A brief note on the tar command is useful here, since it appears throughout Linux administration, including in the gdu installation step above. A .tgz file is simply a GZIP-compressed tar archive, and the standard way to extract one is:

tar zxvf archive.tgz

Modern GNU tar can detect the compression type automatically, so the -z flag is often optional:

tar xvf archive.tgz

To extract into a specific directory rather than the current working directory, the -C option takes a destination path:

tar zxvf archive.tgz -C /path/to/destination/

To inspect the contents of a .tgz file without extracting it, the t (list) flag replaces x (extract):

tar ztvf archive.tgz

The tar command was first introduced in the seventh edition of Unix in January 1979 and its name comes from its original purpose as a Tape ARchiver. Despite that origin, modern tar reads from and writes to files, pipes and remote devices with equal facility.
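The listing operation is also available programmatically. A minimal sketch using Python's standard tarfile module, building a throwaway archive in memory (the file name and contents are illustrative):

```python
import io
import tarfile

# Build a tiny .tgz in memory, then list its contents without
# extracting anything, mirroring `tar ztvf archive.tgz`.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    data = b"hello\n"
    info = tarfile.TarInfo(name="hello.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:gz") as tar:
    names = tar.getnames()
print(names)  # ['hello.txt']
```

The "w:gz" and "r:gz" mode strings correspond to the gzip compression that the trailing z in .tgz denotes.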

Mounting NFS Shares and Optical Media

Installing NFS Client Tools

NFS remains common on Linux and Unix-like systems, allowing remote directories to be mounted locally and treated as though they were native file systems. Before a client can mount an NFS export, the client packages must be installed. On Ubuntu and Debian, that means:

sudo apt update
sudo apt install nfs-common

On Fedora and RHEL-based distributions, the equivalent is:

sudo dnf install nfs-utils

Once installed, showmount -e 10.10.0.10 can list available exports from a server, returning output such as /backups 10.10.0.0/24 and /data *.

Mounting an NFS Share Manually

Mounting an NFS share follows the same broad pattern as mounting any other file system. First, create a local mount point:

sudo mkdir -p /var/backups

Then mount the remote export, specifying the file system type explicitly:

sudo mount -t nfs 10.10.0.10:/backups /var/backups

A successful command produces no output. Verification is done with mount | grep nfs or df -h, after which the local directory acts as the root of the remote file system for all practical purposes.

Persisting NFS Mounts Across Reboots

Since a manual mount does not survive a reboot, persistent setups use /etc/fstab. An appropriate entry looks like:

10.10.0.10:/backups /var/backups nfs defaults,nofail,_netdev 0 0

The nofail option prevents a boot failure if the NFS server is unavailable when the machine starts. The _netdev flag marks the mount as network-dependent, ensuring the system defers the operation until the network stack is available. Running sudo mount -a tests the entry without rebooting.

Troubleshooting Common NFS Errors

NFS problems are often predictable. A "Permission denied" error usually means the server export in /etc/exports does not include the client, and reloading exports with sudo exportfs -ar is frequently the remedy. "RPC: Program not registered" indicates the NFS service is not running on the server, in which case sudo systemctl restart nfs-server applies. A "Stale file handle" error generally follows a server reboot or a deleted file and is cleared by unmounting and remounting. Timeouts and "Server not responding" messages call for checking network connectivity, confirming that firewall rules permit access to port 111 (rpcbind, required for NFSv3) and port 2049 (NFS itself), and verifying NFS version compatibility using the vers=3 or vers=4 mount option. NFSv4 requires only port 2049, while NFSv2 and NFSv3 also require port 111. To detach a share, sudo umount /var/backups is the standard route, with fuser -m /var/backups helping to identify the processes blocking the unmount.

Mounting Optical Media

CDs and DVDs are less central than they once were, but some systems still need to read them. After inserting a disc, blkid can identify the block device path, which is typically /dev/sr0, and will report the file system type as iso9660. With a mount point created using sudo mkdir /mnt/cdrom, the disc is mounted with:

sudo mount /dev/sr0 /mnt/cdrom

The warning device write-protected, mounted read-only is expected for optical media and can be disregarded. CDs and DVDs use the ISO 9660 file system, a data-exchange standard designed to be readable across operating systems. Once mounted, the disc contents are accessible under /mnt/cdrom, and sudo umount /mnt/cdrom detaches it cleanly when work is complete.

Transferring Files Securely and Efficiently

Copying Files with scp

scp (Secure Copy) transfers files and directories between hosts over SSH, encrypting both data and authentication credentials in transit. Its basic syntax is:

scp [OPTIONS] [[user@]host:]source [[user@]host:]destination

The colon is how scp distinguishes between local and remote paths: a path without a colon is local. A typical upload from a local machine to a remote host looks like:

scp file.txt remote_username@10.10.0.2:/remote/directory

A download from a remote host to the local machine reverses the argument order:

scp remote_username@10.10.0.2:/remote/file.txt /local/directory

Commonly used options include -r for recursive directory copies, -p to preserve metadata such as modification times and permissions, -C for compression, -i for a specific private key, -l to cap bandwidth in Kbit/s and the uppercase -P to specify a non-standard SSH port. It is also possible to copy between two remote hosts directly, routing the transfer through the local machine with the -3 flag.

The Protocol Change in OpenSSH 9.0

There is an important change in modern OpenSSH that administrators should be aware of. From OpenSSH 9.0 onward, the scp command uses the SFTP protocol internally by default rather than the older SCP/RCP protocol, which is now considered outdated. The command behaves identically from the user's perspective, but if an older server requires the legacy protocol, the -O flag forces it. For advanced requirements such as resumable transfers or incremental directory synchronisation, rsync is generally the better fit, particularly for large directory trees.

Throttling rsync to Protect Bandwidth

Even with rsync, raw speed is not always desirable. A backup script consuming all available bandwidth can disrupt other services on the same network link, so --bwlimit is often essential. The basic syntax is:

rsync --bwlimit=KBPS source destination

The value is in units of 1,024 bytes unless an explicit suffix is added. A fractional value is also valid: --bwlimit=1.5m sets a cap of 1.5 MB/s. A local transfer capped at 1,000 KB/s looks like:

rsync --bwlimit=1000 /path/to/source /path/to/dest/

And a remote backup:

rsync --bwlimit=1000 /var/www/html/ backups@server1.example.com:~/mysite.backups/

The man page for rsync explains that --bwlimit works by limiting the size of the blocks rsync writes and then sleeping between writes to achieve the target average. Some burstiness in the transfer rate is therefore normal in practice.

Managing I/O Priority with ionice

Bandwidth is only one dimension of the load a transfer places on a system. Disk I/O scheduling may also need attention, particularly on busy servers running other workloads. The ionice utility adjusts the I/O scheduling class and priority of a process without altering its CPU priority. For instance:

/usr/bin/ionice -c2 -n7 rsync --bwlimit=1000 /path/to/source /path/to/dest/

This runs the rsync process in best-effort I/O class (-c2) at the lowest priority level (-n7), combining transfer rate limiting with reduced I/O priority. The scheduling classes are: 0 (none), 1 (real-time), 2 (best-effort) and 3 (idle), with priority levels 0 to 7 available for the real-time and best-effort classes.

Together, --bwlimit and ionice provide complementary controls over exactly how much resource a routine transfer is permitted to consume at any given time.

Setting Up Bash Tab Completion

On Ubuntu and related distributions, Bash programmable completion is provided by the bash-completion package. If tab completion does not function as expected in a new installation or container environment, the following commands will install the necessary support:

sudo apt update
sudo apt upgrade
sudo apt install bash-completion

The package places a shell script at /etc/profile.d/bash_completion.sh. To ensure it is loaded in shell startup, the following appends the source line to .bashrc:

echo "source /etc/profile.d/bash_completion.sh" >> ~/.bashrc

A conditional form avoids duplicating the line on repeated runs:

grep -wq '^source /etc/profile.d/bash_completion.sh' ~/.bashrc \
  || echo 'source /etc/profile.d/bash_completion.sh' >> ~/.bashrc

The script is typically loaded automatically in a fresh login shell, but source /etc/profile.d/bash_completion.sh activates it immediately in the current session. Once active, pressing Tab after partial input such as sudo apt i or cat /etc/re completes commands and paths against what is actually installed. Bash also supports simple custom completions: complete -W 'google.com cyberciti.biz nixcraft.com' host teaches the shell to offer those three domains after typing host and pressing Tab, which illustrates how the feature can be extended to match the patterns of repeated daily work.

Installing Snap on Debian

Snap is a packaging format developed by Canonical that bundles an application together with all of its dependencies into a single self-contained package. Snaps update automatically, roll back gracefully on failure and are distributed through the Snap Store, which carries software from both Canonical and independent publishers. The background service that manages them, snapd, is pre-installed on Ubuntu but requires a manual setup step on Debian.

On Debian 9 (Stretch) and newer, snap can be installed directly from the command line:

sudo apt update
sudo apt install snapd

After installation, logging out and back in again, or restarting the system, is necessary to ensure that snap's paths are updated correctly in the environment. Once that is done, install the snapd snap itself to obtain the latest version of the daemon:

sudo snap install snapd

To verify that the setup is working, the hello-world snap provides a straightforward test:

sudo snap install hello-world
hello-world

A successful run prints Hello World! to the terminal. Note that snap is not available on Debian versions before 9. If a snap installation produces an error such as snap "lxd" assumes unsupported features, the resolution is to ensure the core snap is present and current:

sudo snap install core
sudo snap refresh core

On desktop systems, the Snap Store graphical application can then be installed with sudo snap install snap-store, providing a point-and-click interface for browsing and managing snaps alongside the command-line tools.

Increasing the Root Partition Size on Fedora with LVM

Fedora's default installer has used LVM (Logical Volume Manager) for many years, dividing the available disk into a volume group containing separate logical volumes for root (/), home (/home) and swap. This arrangement makes it straightforward to redistribute space between volumes without repartitioning the physical disk, which is a significant advantage over a fixed partition layout. Note that Fedora 33 and later default to Btrfs without LVM for new installations, so the steps below apply to systems that were installed with LVM, including pre-Fedora 33 installs and any system where LVM was selected manually.

Because the root file system is in active use while the system is running, resizing it safely requires booting from a Fedora Live USB stick rather than the installed system. Once booted from the live environment, open a terminal and begin by checking the volume group:

sudo vgs

Output such as the following shows the volume group name, total size and, crucially, how much free space (VFree) is unallocated:

  VG     #PV #LV #SN Attr   VSize    VFree
  fedora   1   3   0 wz--n- <237.28g    0

Before proceeding, confirm the exact device mapper paths for the root and home logical volumes by running fdisk -l, since the volume group name varies between installations. Common names include /dev/mapper/fedora-root and /dev/mapper/fedora-home, though some systems use fedora00 or another prefix.

When Free Space Is Already Available

If VFree shows unallocated space in the volume group, the root logical volume can be extended directly and the file system resized in a single command:

sudo lvresize -L +5G --resizefs /dev/mapper/fedora-root

The --resizefs flag instructs lvresize to resize the file system at the same time as the logical volume, removing the need to run resize2fs separately.

When There Is No Free Space

If VFree is zero, space must first be reclaimed from another logical volume before it can be given to root. The most common approach is to shrink the home logical volume, which typically holds the most available headroom. Shrinking a file system involves data moving on disk, so the operation requires the volume to be unmounted, which is why the live environment is essential. To take 10 GB from home:

sudo lvresize -L -10G --resizefs /dev/mapper/fedora-home

Once that completes, the freed space appears as VFree in vgs and can be added to the root volume:

sudo lvresize -L +10G --resizefs /dev/mapper/fedora-root

Both steps use --resizefs so that the file system boundaries are updated alongside the logical volume boundaries. After rebooting back into the installed system, df -h will confirm the new sizes are in effect.

Keeping a Linux System Well Maintained

The commands and configurations covered above form a coherent body of everyday Linux administration practice: knowing where installed kernels are recorded, how to measure real disk usage rather than directory metadata, how to attach local and network file systems correctly, how to extract archives and move data securely without disrupting shared resources, how to make the shell itself more productive, how to extend a Debian system with snap packages and how to redistribute disk space between LVM volumes on Fedora. Together, these convert a scattered collection of one-liners into a reliable working toolkit. Each topic interconnects naturally with the others: a kernel query clarifies what system you are managing, disk investigation reveals whether a file system has room for what you plan to transfer, NFS mounting determines where that transfer will land, and bandwidth control determines what impact it will have while it runs.

Four technical portals that still deliver after decades online

3rd February 2026

The early internet was built on a different kind of knowledge sharing, one driven by individual expertise, community generosity and the simple desire to document what worked. Four informative websites that started in that era, namely MDN Web Docs, AskApache, WindowsBBS and Office Watch, embody that spirit and remain valuable today. They emerged at a time when technical knowledge was shared through forums, documentation and personal blogs rather than social media or algorithm-driven platforms, and their legacy persists in offering clarity and depth in an increasingly fragmented digital landscape.

MDN Web Docs

MDN Web Docs stands as a cornerstone of modern web development, offering comprehensive coverage of HTML, CSS, JavaScript and Web APIs alongside authoritative references for browser compatibility. Mozilla started the project in 2005 under the name Mozilla Developer Center, and it has since grown into a collaborative effort of considerable scale. In 2017, Mozilla announced a formal partnership with Google, Microsoft, Samsung and the W3C to consolidate web documentation on a single platform, with Microsoft alone redirecting over 7,700 of its MSDN pages to MDN in that year.

For developers, the site is not merely a reference tool but a canonical guide that ensures standards are adhered to and best practices followed. Its tutorials, guides and learning paths make it indispensable for beginners and seasoned professionals alike. The site's community-driven updates and ongoing contributions from browser vendors have cemented its reputation as the primary source for anyone building for the web.

AskApache

AskApache is a niche but invaluable resource for those managing Apache web servers, built by a developer whose background lies in network security and penetration testing on shared hosting environments. The site grew out of the founder's detailed study of .htaccess files, which, unlike the main Apache configuration file httpd.conf, are read on every request and offer fine-grained, per-directory control without requiring root access to the server. That practical origin gives the content its distinctive character: these are not generic tutorials, but hard-won techniques born from real-world constraints.

The site's guides on blocking malicious bots, configuring caching headers, managing redirects with mod_rewrite and preventing hot-linking are frequently cited by system administrators and WordPress users. Its specificity and longevity have made it a trusted companion for those maintaining complex server environments, covering territory that mainstream documentation rarely touches.

WindowsBBS

WindowsBBS offers a clear window into the era when online forums were the primary hub for technical support. Operating in the tradition of classic bulletin board systems, the site has long been a resource for users troubleshooting Windows installations, hardware compatibility issues and malware removal. It remains completely free, sustained by advertisers and community donations, which reflects the ethos of mutual aid that defined early internet culture.

During the Windows XP and Windows 7 eras, community forums of this kind were essential for solving problems that official documentation often overlooked, with volunteers providing detailed answers to questions that Microsoft's own support channels would not address. While the rise of social media and centralised support platforms has reduced the prominence of such forums, WindowsBBS remains a testament to the power of community-driven problem-solving. Its straightforward structure, with users posting questions and experienced volunteers providing answers, mirrors the collaborative spirit that made the early web such a productive environment.

Office Watch

Office Watch has served as an independent source of Microsoft Office news, tips and analysis since 1996, making it one of the longer-running specialist publications of its kind. Its focus on Microsoft Office takes in advanced features and hidden tools that are seldom documented elsewhere, from lesser-known functions in Excel to detailed comparisons between Office versions and frank assessments of Microsoft's product decisions. That independence gives it a voice that official resources cannot replicate.

The site serves power users seeking to make the most of the software they use every day, with guides and books that extend its reach beyond the website itself. In an era where software updates are frequent and often poorly explained, Office Watch provides the kind of context and plain-spoken clarity that official documentation rarely offers.

The Enduring Value of Depth and Community

These four sites share a common thread: they emerged when technical knowledge was shared openly by experts and enthusiasts rather than filtered through algorithms or paywalls, and they retain the value that comes from that approach. Their continued relevance speaks to what depth, specificity and community can achieve in the digital world. While platforms such as Stack Overflow and GitHub Discussions have taken over many of the roles these sites once played, the original resources remain useful for their historical context and the quality of their accumulated content.

As the internet continues to evolve, the lessons from these sites are worth remembering. The most useful knowledge is often found at the margins, where dedicated individuals take the time to document, explain and share what they have learned. Whether you are a developer, a server administrator or an everyday Office user, these resources are more than archives: they are living repositories of expertise, built by people who cared enough to write things down properly.

Why SAS, R and Python can report different percentiles for the same data

2nd February 2026

Quantiles look straightforward on the surface. Ask for the median, the 75th percentile or the 95th percentile and most people expect one clear answer. Yet small differences between software packages often reveal that quantiles are not defined in only one way. When the same data are analysed in SAS, R or Python, the reported percentile can differ, particularly for small samples or for data sets with large gaps between adjacent values.

That difference is not necessarily a bug, and it is not a sign that one platform is wrong. It reflects the fact that sample quantiles are estimates of population quantiles, and statisticians have proposed several valid ways to construct those estimates. For everyday work with large samples, the distinction often fades into the background because the values tend to be close. For smaller samples, the choice of definition can matter enough to alter a reported result, a chart or a downstream calculation.

The Problem With the Empirical CDF

A useful starting point is understanding why multiple definitions exist at all. A sample quantile is an estimate of an unknown population quantile. Many approaches base that estimate on the empirical cumulative distribution function (ECDF), which approximates the cumulative distribution function (CDF) for the population. As Rick Wicklin explains in his 22nd May 2017 article on The DO Loop, the ECDF is a step function with a jump discontinuity at each unique data value. For that reason, the inverse ECDF does not exist and quantiles are not uniquely defined, which is precisely why different conventions have developed.
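A small numerical illustration of why the inverse ECDF fails to pin down a unique value: for four observations, the ECDF is flat between adjacent data points, so an entire interval of values shares the same cumulative probability.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

def ecdf(t, data):
    """Empirical CDF: the fraction of observations <= t."""
    return float(np.mean(data <= t))

# The ECDF is flat between adjacent data values, so every t in [2, 3)
# maps to the same probability and the inverse at p = 0.5 is not unique.
print(ecdf(2.0, x), ecdf(2.5, x), ecdf(2.9, x))  # 0.5 0.5 0.5
```

Any rule for picking a single point out of that flat interval is a convention, which is exactly the choice the nine definitions formalise.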

In high school, most people learn that when a sorted sample has an even number of observations, the median is the average of the two middle values. The default quantile definition in SAS extends that familiar rule to other quantiles. If the sample size is N and the q-th quantile is requested, then when Nq is an integer, the result is the average of the two adjacent data values x[Nq] and x[Nq+1]. When Nq is not an integer, the result is the single data value x[j+1], where j = floor(Nq). Averaging at integer positions and jumping to the next value otherwise are not the only choices available, and that is where the definitions diverge.
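The SAS default corresponds to Hyndman and Fan's Type 2 definition: average the two adjacent order statistics when Nq is an integer, otherwise take the next order statistic. A short illustrative Python implementation (the function name is mine, not SAS syntax), cross-checked against NumPy's averaged_inverted_cdf method, which computes the same definition in NumPy 1.22 and later:

```python
import math

import numpy as np

def quantile_type2(data, p):
    """Hyndman-Fan Type 2 (the SAS default, QNTLDEF=5): average the
    two adjacent order statistics when N*p lands exactly on one,
    otherwise take the next order statistic. Comments use 1-based
    indices to match the x[j] notation in the text."""
    x = sorted(data)
    n = len(x)
    pos = n * p
    j = math.floor(pos)
    if pos == j and 0 < j < n:
        return (x[j - 1] + x[j]) / 2   # average of x[j] and x[j+1]
    return x[min(j, n - 1)]            # x[j+1], clipped at the ends

x = [1.0, 2.0, 3.0, 4.0]
print(quantile_type2(x, 0.5))   # 2.5, the familiar even-N median rule
print(quantile_type2(x, 0.25))  # 1.5
print(np.quantile(x, 0.25, method="averaged_inverted_cdf"))  # also 1.5
```

For p = 0.5 with four observations, N*p = 2 is an integer, so the rule reduces to averaging the two middle values as expected.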

The Hyndman and Fan Taxonomy

According to Hyndman and Fan ("Sample Quantiles in Statistical Packages," The American Statistician, 1996), there are nine definitions of sample quantiles that commonly appear in statistical software packages. Three of those definitions are based on rounding and six are based on linear interpolation. All nine result in valid estimates.

As Wicklin describes in his 24th May 2017 article comparing all nine definitions, the nine methods share a common general structure. For a sample of N sorted observations and a target probability p, the estimate uses two adjacent data values x[j] and x[j+1]. A fractional quantity determines an interpolation parameter λ, and each definition has a parameter m that governs how interpolation between adjacent data points is handled. In general terms, the estimate takes the form q = (1 − λ)x[j] + λx[j+1], where λ and j depend on the values of p, N and the method-specific parameter m. The practical consideration at the extremes is that when p is very small or very close to 1, most definitions fall back to returning x[1] or x[N] respectively.
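For the interpolation-based definitions, that general structure can be written out directly. The sketch below assumes λ equals the fractional part g, which holds for the interpolation types; the boundary handling at extreme p varies slightly between the nine definitions and is simplified here to clamping:

```python
import math

def hf_interp_quantile(data, p, m):
    """General interpolated sample quantile: q = (1 - g)*x[j] + g*x[j+1],
    with j = floor(N*p + m) and g = N*p + m - j (1-based indexing).

    m is the method-specific parameter: m = 0 gives Type 4, m = 1/2
    Type 5, m = p Type 6 and m = 1 - p Type 7. Extreme p falls back to
    the smallest or largest observation.
    """
    x = sorted(data)
    n = len(x)
    h = n * p + m
    j = math.floor(h)
    if j < 1:
        return x[0]                  # p very small: return the minimum
    if j >= n:
        return x[-1]                 # p very close to 1: return the maximum
    g = h - j
    return (1 - g) * x[j - 1] + g * x[j]
```

Plugging in m = 1 − p reproduces the Type 7 values discussed later, while other choices of m shift where the interpolation nodes sit.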

Default Methods Across Platforms

It is a misnomer to refer to one approach as "the SAS method" and another as "the R method." As Wicklin notes in his 26th July 2021 article comparing SAS, R and Python defaults, SAS supports five different quantile definitions through the PCTLDEF= option in PROC UNIVARIATE or the QNTLDEF= option in other procedures, and all nine can be computed via SAS/IML. R likewise supports all nine through the type parameter in its quantile function. The confusion arises not from limited capability, but from the defaults that most users accept without much thought.

By default, SAS uses Hyndman and Fan's Type 2 method (QNTLDEF=5 in SAS procedure syntax). R uses Type 7 by default, and that same Type 7 method is also the default in Julia and in the Python packages SciPy and NumPy. A comparison between SAS and Python therefore often becomes the same comparison as between SAS and R.

A Worked Example

The contrast between Type 2 and Type 7 is especially clear on a small data set. Wicklin uses the sample {0, 1, 1, 1, 2, 2, 2, 4, 5, 8} throughout both his 2017 and 2021 articles: ten observations, six unique values, and a particularly large gap between the two highest values, 5 and 8. That gap is deliberately chosen because the differences between quantile definitions are most visible when the sample is small and when adjacent ordered values are far apart.

The Type 2 method (SAS default) uses the ECDF to estimate population quantiles, so a quantile is always an observed data value or the average of two adjacent data values. The Type 7 method (R default) uses a piecewise-linear estimate of the CDF. Because the inverse of that piecewise-linear estimate is continuous, a small change in the probability level produces a small change in the estimated quantile, a property that is absent from the ECDF-based methods.

Where the Methods Agree and Where They Part Company

For the 0.5 quantile (the median), both methods return 2. A horizontal line at 0.5 crosses both CDF estimates at the same point, so there is no disagreement. This is one reason the issue can be easy to miss: some commonly reported percentiles coincide across definitions.

The 0.75 quantile tells a different story. Under Type 2, a horizontal line at 0.75 crosses the empirical CDF at 4, which is a data value. Under Type 7, the estimate is 3.5, which is neither a data value nor the average of adjacent values; it emerges from the piecewise-linear interpolation rule. The 0.95 quantile shows the sharpest divergence: Type 2 returns 8 (the maximum data value), while Type 7 returns 6.65, a value between the two largest observations.
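Recent versions of NumPy (1.22 and later) expose the Hyndman and Fan definitions by name through the method argument of numpy.quantile, which makes the worked example easy to reproduce: "averaged_inverted_cdf" is Type 2 and the default "linear" is Type 7.

```python
import numpy as np

x = np.array([0, 1, 1, 1, 2, 2, 2, 4, 5, 8])

for p in (0.5, 0.75, 0.95):
    type2 = np.quantile(x, p, method="averaged_inverted_cdf")  # SAS default
    type7 = np.quantile(x, p, method="linear")                 # R/NumPy default
    print(f"p = {p}: Type 2 gives {type2}, Type 7 gives {type7}")
```

The two methods agree at the median (both return 2) but part company at 0.75 (4 versus 3.5) and 0.95 (8 versus 6.65), matching the figures above.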

Those differences are not errors. They are consequences of the assumptions built into each estimator. The default in SAS always returns a data value or the average of adjacent data values, whereas the default in R can return any value in the range of the data.

The Five Definitions Available in SAS Procedures

For users who stay within base SAS procedures, that same 22nd May 2017 article sets out the five available definitions clearly. QNTLDEF=1 and QNTLDEF=4 are piecewise-linear interpolation methods, whilst QNTLDEF=2, QNTLDEF=3 and QNTLDEF=5 are discrete rounding methods. The default is QNTLDEF=5. For the discrete definitions, SAS returns either a data value or the average of adjacent data values; the interpolation methods can return any value between observed data values.

The differences between the definitions are most apparent when there are large gaps between adjacent data values. Using the same ten-point data set, for the 0.45 quantile, different definitions return 1, 1.5, 1.95 or 2. For the 0.901 quantile, the round-down method (QNTLDEF=2) gives 5, the round-up method (QNTLDEF=3) gives 8, the backward interpolation method (QNTLDEF=1) gives 5.03 and the forward interpolation method (QNTLDEF=4) gives 7.733. These are not trivial discrepancies on a small sample.
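Although SAS is needed to run the QNTLDEF= options directly, the same figures can be checked in NumPy by matching each option to a Hyndman and Fan type. The correspondence used below (QNTLDEF=1 as Type 4, 2 as Type 3, 3 as Type 1, 4 as Type 6 and 5 as Type 2) is an assumption made for illustration, although it does reproduce the 0.901 values quoted above:

```python
import numpy as np

x = np.array([0, 1, 1, 1, 2, 2, 2, 4, 5, 8])

# Assumed correspondence between SAS QNTLDEF= options and NumPy's
# Hyndman-Fan method names (NumPy 1.22 or later)
qntldef_to_numpy = {
    1: "interpolated_inverted_cdf",  # Type 4, backward interpolation
    2: "closest_observation",        # Type 3, rounding
    3: "inverted_cdf",               # Type 1, rounding up
    4: "weibull",                    # Type 6, forward interpolation
    5: "averaged_inverted_cdf",      # Type 2, the SAS default
}

for qdef, method in sorted(qntldef_to_numpy.items()):
    print(f"QNTLDEF={qdef}: {np.quantile(x, 0.901, method=method)}")
```

Under this mapping, the 0.901 quantile comes out as 5.03, 5, 8 and 7.733 for QNTLDEF=1 to 4 respectively, in line with the article's figures.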

The Four Remaining Definitions and the General Formula

The 24th May 2017 comparison article goes further, showing how SAS/IML can be used to compute the four Hyndman and Fan definitions that are not natively supported in SAS procedures. Each of the nine methods is an instance of the same general formula involving the parameter m. The four non-native methods each require their own specific value (or expression) for m, plus a small boundary value c that governs the behaviour at the extreme ends of the probability scale.

Wicklin also overlays the default methods for SAS (Type 2) and R (Type 7) graphically on the ten-point data set, showing that the SAS default produces a discrete step pattern whilst the R default traces a smoother piecewise-linear curve. He then repeats the comparison on a sample of 100 observations from a uniform distribution and finds that the two methods are almost indistinguishable at that scale, illustrating why many analysts work comfortably with defaults most of the time.

A SAS/IML Function to Match R's Default

For analysts who need cross-platform consistency, that same 26th July 2021 article provides a simplified SAS/IML function that reproduces the Type 7 default from R, Julia, SciPy and NumPy. The function converts the input to a column vector, handles missing values and the degenerate case of a single observation, then sorts the data and applies the Type 7 rule. The index into the sorted data is j = floor(N*p + m) with m = 1 − p, the interpolation fraction is g = N*p + m − j, and the estimate is (1 − g)x[j] + gx[j+1] for all p < 1, with x[N] returned when p = 1. This gives SAS users a practical route to reproducing the default quantiles from other platforms without switching software.
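The same logic translates readily into Python. The sketch below follows the formula described (it is not the SAS/IML code itself) and cross-checks each result against NumPy's default, which is also Type 7:

```python
import math
import numpy as np

def quantile_type7(data, p):
    """Type 7 sample quantile, following the formula described above:
    m = 1 - p, j = floor(n*p + m), g = n*p + m - j and the estimate is
    (1 - g)*x[j] + g*x[j+1] in 1-based indexing, with x[n] returned at p = 1.
    """
    x = sorted(data)
    n = len(x)
    if p >= 1:
        return x[-1]                 # boundary case: return the maximum
    h = n * p + (1 - p)              # equivalently (n - 1)*p + 1
    j = math.floor(h)
    g = h - j
    return (1 - g) * x[j - 1] + g * x[j]

sample = [0, 1, 1, 1, 2, 2, 2, 4, 5, 8]
for p in (0.05, 0.25, 0.5, 0.75, 0.95, 1.0):
    # cross-check against NumPy's default (also Hyndman-Fan Type 7)
    assert math.isclose(quantile_type7(sample, p), np.quantile(sample, p))
```

Because n*p + (1 − p) simplifies to (n − 1)*p + 1, the index j never drops below 1 for positive p, so no lower boundary guard is needed.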

If SAS/IML is unavailable, Wicklin suggests using PCTLDEF=1 in PROC UNIVARIATE (or QNTLDEF=1 in PROC MEANS) as the next best option. This produces the Type 4 method, which is not the same as Type 7 but does use interpolation rather than a purely discrete rule, so it avoids the jumpy behaviour of the ECDF-based defaults.

A Wider Point About Conventions in Statistical Software

The comments on the 2021 article make clear that quantiles are not an isolated example. Conventions differ across platforms in ARIMA sign conventions, whether likelihood constants are included in reported values, the definition of the multivariate autocovariance function and the sign convention and constant term used in discrete Fourier transforms. Quantiles are simply a particularly visible instance of a broader pattern where results can differ even when each platform is behaving correctly.

One question from the same comment thread is also worth noting: SQL's percent_rank formula, defined as (rank − 1) / (total_rows − 1), does not estimate a quantile. As Wicklin clarifies in his reply, it estimates the empirical distribution function for observed data values. Both concepts involve percentiles and rankings, but they address different problems. One maps values to cumulative proportions; the other maps cumulative probabilities to estimated values.
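A small Python illustration makes the two directions concrete. The percent_rank helper below is a hypothetical re-implementation of the SQL formula, not a database call:

```python
import numpy as np

x = np.sort(np.array([0, 1, 1, 1, 2, 2, 2, 4, 5, 8]))

def percent_rank(rank, total_rows):
    """The SQL formula: maps a 1-based rank to a cumulative proportion."""
    return (rank - 1) / (total_rows - 1)

# percent_rank maps a position in the ordered data to a proportion:
# the value 4 sits at rank 8 of 10
print(percent_rank(8, 10))          # 7/9, roughly 0.778

# a quantile estimator goes the other way, mapping a probability to a value
print(np.quantile(x, 0.75))         # 3.5 under the Type 7 default
```

One function takes a data value's position and returns a proportion; the other takes a probability and returns an estimated value, which is why the two should not be conflated.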

Does the Definition of a Sample Quantile Actually Matter?

The answer from all three articles is balanced. Yes, it matters in principle, and it is noticeably important for small samples, in extreme tails and wherever there are wide gaps in the ordered data. No, it often matters very little for larger samples (say, 100 or more observations), where the nine methods tend to produce results that are nearly indistinguishable. Wicklin's 100-observation comparison showed that the Type 2 and Type 7 estimates were so close that one set of points sat almost directly on top of the other.

That is why, as Wicklin notes, most analysts simply accept the default method of whichever software they are using. Even so, there are contexts where the definition should be stated explicitly. Regulatory work, reproducible research, published analyses and any cross-software validation all benefit from naming the method in use. Without that detail, two analysts can work correctly with the same data and still arrive at different percentile values.

Matching Quantile Definitions Across SAS, R and Python

The practical conclusion is clear. SAS defaults to Hyndman and Fan Type 2 (QNTLDEF=5), while R, Julia, SciPy and NumPy default to Type 7. SAS procedures natively support five of the nine definitions, and SAS/IML can be used to compute all nine, including a simplified function for the R default. For large data sets, the differences are typically negligible. For small data sets, particularly those with unevenly spaced observations, they can be large enough to change the story the numbers appear to tell. The solution is not to favour any particular platform, but to be explicit about the method wherever precision matters.

WordPress Starter Themes: From bare foundations to modern workflows

1st February 2026

Starter themes have long occupied an important place in WordPress development. They sit between a completely blank project and a fully styled off-the-shelf theme, offering enough structure to speed up work without dictating how the finished site must look. For agencies, freelancers and in-house teams, that balance can save considerable time, allowing developers to begin with a lean codebase and concentrate on the parts that make each site distinct.

That broad appeal helps explain why starter themes continue to evolve in different directions. Some remain deliberately minimal and close to WordPress core conventions, while others embrace modern tooling such as Composer, Vite, Tailwind CSS and component-based templating. Alongside these are starter themes intended for visual builders and users who want a gentler route into customisation. Taken together, they demonstrate that there is no single definition of a WordPress starter theme; the common thread is that each provides a starting point rather than a finished product.

wd_s: A Generator-Driven Approach

wd_s is a generator-driven starter theme from WebDevStudios. The wd_s generator makes the setup process more tailored by asking for project details: name, URL, description, namespace, text domain, author, author URL, author email and development URL. Once those details are entered, the script performs a find-and-replace and delivers a ZIP file ready to extract into wp-content/themes. The process is straightforward, though it also reflects a more structured approach to project setup than many starter themes provide.

The generator highlights details that matter in real-world theme development but are sometimes overlooked when beginning from a generic scaffold. Namespacing, text domains and project metadata all play a part in maintainability and localisation, and wd_s brings those choices to the fore from the outset. The example values shown by the generator, such as "Acme Inc." for the name field and a namespace using underscores, are illustrative rather than prescriptive. What stands out more than any one field is the intention to reduce repetitive manual setup and encourage consistency from the very start of a project.

Sage: A Modern Development Workflow

Roots Sage represents a significant shift towards a modern development workflow. Sage is a Tailwind CSS WordPress starter theme with Laravel Blade templating, currently at version 11.1.0 and with 13,191 GitHub stars at the time of writing. Setup uses Composer and NPM rather than a simple ZIP download, and a typical installation begins in wp-content/themes with composer create-project roots/sage my-theme, followed by npm install and npm run build.

That workflow signals the audience Sage is aimed at. Rather than merely wrapping WordPress templates in a minimal theme shell, Sage introduces tooling and conventions familiar to developers who work with Laravel and modern front-end stacks. The Vite build process generates files including a manifest, compiled CSS and JavaScript and a theme.json, completing the whole build in a matter of seconds and demonstrating that WordPress development can be integrated into contemporary asset pipelines without giving up compatibility with the CMS.

Blade Templating and Component-Driven Design

Blade templating is central to Sage's proposition. The base layout in resources/views/layouts/app.blade.php shows a clean separation of structure and content using directives such as @include, @yield and @hasSection. Header and footer hooks still call familiar WordPress functions including wp_head, wp_body_open and wp_footer, but the surrounding syntax is closer to Laravel than traditional PHP-heavy WordPress templates. This gives developers access to template inheritance, reusable components and directives, making larger codebases considerably easier to organise.

A simple alert component illustrates this style compactly: properties define its type and message, a PHP match expression selects classes based on the alert type and the component merges those classes into its final markup. The result is not merely an isolated snippet but a demonstration of how Sage encourages component-driven design, reducing repetition and making presentation logic easier to follow, particularly in projects with many shared interface elements.

Tailwind CSS and the Block Editor

Sage places strong emphasis on integrating Tailwind CSS with the WordPress block editor. It automatically generates theme.json from the Tailwind configuration, making colours, font families and sizes immediately available in the block editor with zero additional configuration. The sample app.css imports Tailwind and points at views and app files as content sources, while the generated theme.json includes settings for layout, background, colour palettes, spacing and typography. The palette includes eleven shades of grey along with black and white, and the typography settings mirror Tailwind's familiar scale from xs through 9xl.

This addresses a long-standing friction point in WordPress theming: keeping front-end design systems in sync with the editing experience. In older workflows, editor styles and front-end styles often drifted apart, creating extra maintenance work and inconsistency for content editors. Sage's approach narrows that gap by deriving editor settings directly from the same Tailwind configuration used for the front end, with theme.json generated during the build process rather than maintained by hand.

Theme Structure and the Roots Ecosystem

The theme structure for Sage reinforces its emphasis on organisation. The app directory contains providers, view composers, filters.php and setup.php, while resources holds CSS, JavaScript and Blade views grouped into components, layouts, partials and sections. Public assets, composer.json, package.json, theme.json and vite.config.js complete the structure, paired with PSR-4 autoloading, service providers and Acorn, which brings Laravel-style patterns into WordPress. The Vite configuration includes Tailwind, the Laravel Vite plugin and Roots plugins for WordPress support and theme.json generation, plus aliases for scripts, styles, fonts and images.

Another notable feature is hot module replacement in the WordPress block editor, with style changes updating instantly without page refreshes. Sage sits within the broader Roots ecosystem, which also includes Bedrock (a WordPress boilerplate for Composer and Git-based projects), Trellis (a server provisioning and deployment tool), Acorn, Radicle (which bundles the entire Roots stack into a single starting point) and WP Packages (a Composer repository for WordPress plugins and themes). Testimonials on the Roots website emphasise that many developers regard this ecosystem as a route to a more structured and modern WordPress experience, with Sage having been actively maintained for over a decade.

Visual Composer Starter Theme: A Builder-Friendly Option

Not every starter theme is aimed at developers working with command-line tooling and component-based templates. The Visual Composer Starter Theme occupies a different place in the landscape, described as a free bundle of a lightweight theme and a powerful WordPress page builder. It is aimed at building blogs, WooCommerce stores, business sites and personal websites, and the language surrounding it stresses ease of use, intuitive theme options and layout customisation tools, presented as a free resource intended to support the WordPress community.

Its feature set reflects that broader audience. The theme is easy to customise through the WordPress customiser, SEO-friendly and responsive by default, covering use cases such as personal blogs, landing pages, business sites, portfolios, startups and online stores. WooCommerce compatibility receives particular emphasis, with support for adjusting design preferences via the customiser and building online shops at no cost. Hero and featured images, unlimited colour options, page-level design controls and a choice between regular and mobile sandwich-style menus are all included.

There is also a strong focus on compatibility. The theme is fully translation-ready and compatible with WPML, qTranslate and Polylang, while support for Advanced Custom Fields and Toolset for custom post type development is highlighted. It is also presented as ready to combine with the Visual Composer website builder and is developed openly on GitHub, where anyone can contribute. This is less about offering an unvarnished code scaffold and more about giving users a flexible visual base with a broad range of built-in options, though it remains part of the starter theme conversation because it is designed to be extended rather than merely installed and left untouched.

Bones: Speed, Control and Pragmatism

Bones, designed and developed by Eddie Machado, returns more closely to the classic developer-oriented concept while retaining a distinctive voice. It is described as an HTML5, mobile-first starter theme for rapid WordPress development, and it makes clear that it is not a framework. Frameworks can introduce their own conventions and complexity, whereas Bones is designed to be as bare and minimalistic as possible, intended to be used on a per-project basis with no child themes.

The mobile-first emphasis is one of Bones' defining characteristics. Its Sass setup serves minimal resources to smaller screens before scaling up for larger viewports, an approach tied to performance as well as responsiveness, and Bones includes extensive comments and examples to help developers get started with Sass. It also provides a well-documented example for custom post types and functions to customise the WordPress admin area for clients, though these are entirely optional and can be removed if not needed. The project is released under the WTFPL, one of the most permissive licences available, and takes pride in removing unnecessary elements from the WordPress header to keep output clean and lightweight. The philosophy is to keep what is useful and discard the rest, building from a solid and speedy foundation.

Selecting a Starter Theme to Match Your Workflow

When viewed together, these themes reveal how varied the starter theme category has become. A Speckyboy roundup of top starter and bare-bones themes for WordPress development in 2026 (last updated on the 8th of March 2026) places Sage alongside newer and more editor-focused options including Blockbase, GeneratePress, Air, WDS BT, Byvex and Flynt. The roundup notes that every website serves different goals and that WordPress is flexible enough to support them all, but also makes clear that starting each project from scratch leads to repeated work, with starter themes offering a way to avoid that repetition while preserving freedom over design and functionality.

The same roundup provides a useful framework for evaluating starter themes. Ongoing maintenance matters because themes need to keep pace with WordPress and surrounding technologies, and themes that have not been updated in years should be avoided. The distinction between classic and block themes is important, since developers need a starting point that aligns with their preferred editing model. Features that genuinely speed development, whether block patterns, a comprehensive settings panel or development tools, can make a significant difference over time. A starter theme should also stay out of the way rather than burden projects with an opinionated design direction, and compatibility with a preferred editor or page builder remains central to choosing well.

Whether the preference is for the generator-based setup of wd_s, the modern tooling of Sage, the builder-friendly versatility of Visual Composer Starter Theme or the stripped-back classic structure of Bones, each represents a different answer to the same challenge. Developers and site builders often need a head start rather than a finished design, and a good starter theme provides exactly that, while leaving enough room for the final result to become something entirely its own.

Modernising SAS: The 4GL Apps and SASjs Ecosystem

31st January 2026

Custom interfaces to the world's most powerful analytics platform are no longer a niche concern. In many organisations, SAS remains central to reporting, modelling and operational decision-making, yet the way users interact with that capability can vary widely. Some teams still rely on desktop applications, batch processes, shared drives and manual interventions, while others are moving towards web-based interfaces, stronger governance and a more modern development workflow. The material at sasapps.io points to an ecosystem built around precisely that transition, blending long-standing SAS expertise with open-source tooling and documented delivery methods.

The Company Behind the Ecosystem

At the centre of this transition is 4GL Apps. The company's positioning is straightforward: help organisations leverage their SAS investment through services, solutions and products that fit specific needs. Rather than replacing SAS, the aim is to extend it with custom interfaces and delivery approaches that are maintainable, transparent and based on standard frameworks. An emphasis on documentation appears throughout the site, suggesting that projects are intended either for handover to internal teams or for ongoing support under clearly defined packages.

That proposition matters because many SAS environments have grown over years, sometimes decades. In such settings, technical capability is rarely the issue. The challenge is more often how to expose that capability in ways that are usable, secure and sustainable. A powerful analytics platform can still be hampered by awkward user journeys, brittle desktop tooling or resource-heavy support arrangements, and the 4GL Apps model tries to address those practical concerns without discarding existing SAS infrastructure.

Services

The service offering gives a useful sense of how this approach is organised. One strand is SAS App Delivery, framed not merely as building applications, but also as building tools that make SAS app development faster. That detail points to an emphasis on repeatability rather than one-off implementation. Another strand is SAS App Support, aimed at organisations with existing SAS-powered applications but insufficient internal resource to keep them running. Fixed-price plans are offered to keep those interfaces active, which implies an attempt to make operational costs more predictable. A third service area is SASjs Enhancement, where new features can be added to SASjs at a discounted rate to support particular use cases.

Solutions

These services sit alongside a broader set of solutions. One is the creation of SAS-powered HTML5 applications, described as bespoke builds tailored to specific workflow and reporting requirements, using fully open-source tools, standard frameworks and full documentation. Clients are given a practical choice: maintain the application in-house or use a transparent support package. Another solution addresses end-user computing risk through data capture and control. Here, the approach enables business users to self-load VBA-driven Excel reporting tools into a preferred database while applying data quality checks at source, a four-eyes (or more) approval step at each stage and full audit traceability back to the original EUC artefact. A further solution is the modernisation of legacy AF/SCL desktop applications, with direct migration to SAS 9 or Viya in order to improve user experience, security and scalability while moving to a modern SAS stack supported by open-source technology.

That last area reveals a theme running through the whole ecosystem: modernisation does not necessarily mean abandoning what exists. In many SAS estates, AF/SCL applications remain deeply embedded in business processes, and replacing them outright can be costly and risky, especially when they encode years of operational logic. A migration path that preserves business function while improving maintainability and interface design will naturally appeal to teams that need progress without disruption.

Products

The product range fills out the picture further. Data Controller for SAS enables business users to make controlled changes to data in SAS. The SASjs Framework is a collection of open-source tools to accelerate SAS DevOps and the development of SAS-powered web applications. There is also an AF/SCL Kit, migration tooling for the rapid modernisation of monolithic AF/SCL applications. Together, these products form a stack covering interface delivery, governed data change and development workflow, and they suggest that the company's work is not limited to consultancy but includes reusable software assets with their own documentation and source code.

Data Controller: Governance and Audit

Data Controller receives the richest functional description in the ecosystem's documentation. It is intended for business owners in regulatory reporting environments and, more broadly, for any enterprise that needs to perform manual data uploads with validation, approval, security and control. The rationale is rooted in familiar SAS working practices. Users may place files on network drives for batch loading, update data directly using SAS code, open a dataset in Enterprise Guide and change a value, or ask a database administrator to run a script update. According to the product's own documentation, those approaches are less than ideal: every new piece of data may require a new programme, end users may need modify access to sensitive data locations, datasets can become locked, and change requests can slow the process.

Data Controller is presented as a response to those weaknesses. The goal is described as focusing on great user experience and auditor satisfaction, while saving years of development and testing compared with a custom-built alternative. It is a SAS-powered web application with real-time capabilities, where intraday concurrent updates are managed using a lock table and queuing mechanism. Updates are aborted if another user has changed the table since the approval difference was generated, which helps preserve consistency in multi-user environments. Authentication and authorisation rely on the existing SASLogon framework, and end users do not require direct access to the target tables.

The governance model is equally central. All data changes require one or more approvals before a table is updated, and the approver sees only the changes that will be applied to the target, including new, deleted and changed rows. The system supports loading tables of different types through SAS libname engines, with support for retained keys, SCD2 loads, bitemporal data and composite primary keys. Full audit history is a prominent feature: users can track every change to data, including who made it, when it was made, why it was made and what the actual change was, all accessible through a History page.

A particularly notable feature is that onboarding new tables requires zero code. Adding a table is a matter of configuration performed within the tool itself, without the need to define column types or lengths manually, as these are determined dynamically at runtime. Workflow extensibility is built in through configurable hook scripts that execute before and after each action, with examples such as running a data quality check after uploading a mapping table or running a model after changing a parameter. Taken together, those features position Data Controller less as a narrow upload utility and more as a governed operational layer for business-managed data change.

The application was designed to work on multiple devices and different screen types, combined with SAS scalability and security to provide flexibility and location independence when managing data. This suggests it is intended for practical day-to-day use by business teams rather than solely by technical specialists at a desktop workstation.

SASjs: DevOps for SAS

Underpinning much of the ecosystem is SASjs, described on its GitHub organisation page as "DevOps for SAS." It is designed to accelerate the development and deployment of solutions on all flavours of SAS, including Viya, EBI and Base. Everything in SASjs is MIT open-source and free for commercial use. The framework also explicitly underpins Data Controller for SAS, which connects the product and framework strands of the wider ecosystem. The GitHub organisation page notes that the SASjs project and its repositories are not affiliated with SAS Institute.

The resources page at sasjs.io lists the key GitHub repositories: the Macro Core library, the SASjs adapter for bidirectional SAS and JavaScript communication, the SASjs CLI, a minimal seed application and seed applications for React and Angular. Documentation sites cover the adapter, CLI, Macro Core library, SASjs Server and Data Controller. Useful external links from the same resources page include guides to building and deploying web applications with the SASjs CLI, scaffolding SAS projects with NPM and SASjs, extending Angular web applications on Viya and building a vanilla JavaScript application on SAS 9 or Viya. There is also mention of a Viya log parser, training resources, guides, FAQs and a glossary, pointing to an effort to support both implementation and adoption.

The SASjs CLI

The command-line tooling, documented at cli.sasjs.io, gives a clearer view of how SASjs approaches DevOps. The CLI is described as a Swiss-army knife with a flexible set of options and utilities for DevOps on SAS Viya, SAS 9 EBI and SASjs Server. Its core functions include creating a SAS Git repository in an opinionated way, compiling each service with all dependent macros, macro variables and pre- or post-code, building the master SAS deployment, deploying through local scripts and remote SAS programmes, running unit tests with coverage and generating a Doxygen documentation site with data lineage, homepage and project logo from the configuration file. There is also a feature for deploying a frontend as a streaming application, bypassing the need to access the SAS web server directly.

The full command set covers the project lifecycle. The CLI can add and authenticate targets, compile and build projects, deploy them to a SAS server location, generate documentation and manage contexts, folders and files. It can execute jobs, run arbitrary SAS code from the terminal, deploy a service pack and generate a snippets file for macro autocompletion in VS Code. It can also lint SAS code to identify common problems and run unit tests while collecting results in JSON or CSV format, together with logs. In effect, this brings SAS development considerably closer to the workflows commonly seen in mainstream software engineering, which may be especially valuable in organisations trying to standardise delivery practices across mixed technology estates.
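The lifecycle described above can be sketched as a terminal session. The command names follow the CLI documentation at cli.sasjs.io, but exact flags and prompts vary by version, so treat this as an illustrative outline rather than a verbatim recipe:

```shell
# Install the CLI, which is distributed as an npm package.
npm install -g @sasjs/cli

# Create an opinionated SAS project structure in a Git repository.
sasjs create my-sas-project

# Register a target server (Viya, SAS 9 EBI or SASjs Server);
# connection details are prompted interactively.
sasjs add

# Compile each service with its dependent macros, then build the
# master SAS deployment.
sasjs compile
sasjs build

# Deploy to the configured target and generate the Doxygen
# documentation site.
sasjs deploy
sasjs doc

# Lint SAS code for common problems and run unit tests,
# collecting results and logs.
sasjs lint
sasjs test
```

The point of interest is less any individual command than the shape of the workflow: scaffold, compile, build, deploy, document, lint, test, which mirrors mainstream software engineering pipelines.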

Presentations and the Wider SAS Community

The slides.sasjs.io collection adds another dimension by showing that these ideas have been presented in conference and user group settings. Available decks cover DevOps for MSUG, SUGG and WUSS, SASjs for application development, SASjs Server, AF and AF/SCL modernisation, SASjs for PHUSE, testing and a legacy SAS apps presentation for FANS in January 2023. While slide decks alone do not prove adoption or outcomes, they do show a sustained effort to communicate methods and patterns to the broader SAS community, consistent with the open documentation and MIT licensing found throughout the ecosystem.

Building a Modern Layer Around an Established Platform

The most useful way to understand this ecosystem is not as a single product but as a layered approach. At one level, there are services for building and supporting applications. At another, there are packaged tools such as Data Controller and the AF/SCL Kit. Underneath both sits SASjs, providing open-source components and delivery practices intended to make SAS development more structured and scalable. The combination of bespoke SAS-powered HTML5 applications, governed data update tooling, AF/SCL migration support and open-source DevOps utilities points to a coherent effort to modernise how SAS is delivered and used, without severing ties to established platforms. SAS remains the analytical engine, but the interfaces, workflows and operational controls around it are updated to reflect current expectations in web application design, governance and DevOps practice.

Adding a dropdown calendar to the macOS desktop with Itsycal

26th January 2026

In Linux Mint, there is a dropdown calendar that can be used for some advance planning. On Windows, there is a pop-up one on the taskbar that is just as useful. Neither is available on a default macOS desktop, and I missed the functionality. Thus, a search began.

That ended with my finding Itsycal, which does exactly what I need. Handily, it also integrates with the macOS Calendar app, though I use other places for my appointments. In some ways, that is more than I need. The dropdown pane with the ability to go back and forth through time suffices for me.

While it would be ideal if I could go year by year as well as month by month, which is the case on Linux Mint, I can manage with just the latter. Anything is better than having nothing at all. Sometimes, using more than one operating system broadens a mind.

Switching from uBlock Origin to AdGuard and Stylus

15th January 2026

A while back, uBlock Origin broke this website when I visited it. A long AI conversation left me with the impression that the mix of macOS, Firefox and WordPress presented an edge case that could not be resolved. Thus, I went looking for alternatives, because I may not be able to convince anyone else to look into an issue that could be so niche.

One thing that uBlock Origin makes very easy is custom blocking of web page elements, so that was something I needed to replace. A partial solution comes in the form of the Stylus extension. Though the CSS rules may need to be defined manually after interrogating a web page's structure, the same effects can be achieved. In truth, it is not as slick as using a GUI element selector, but I have learned to get past that.
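As an illustration of the manual approach that Stylus requires, a user style to hide unwanted elements might look like this; the selectors are hypothetical and would come from inspecting the page in the browser's developer tools:

```css
/* Hypothetical selectors found by inspecting the page structure;
   Stylus applies these rules only to the sites you specify. */
#promo-banner,
div.sidebar-advert {
    display: none !important; /* remove the element from the layout */
}
```

This replicates what uBlock Origin's element picker generates behind the scenes, just with the selector written by hand.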

For automatic ad blocking, I have turned to AdGuard AdBlocker. Thus far, it is doing what I need it to do. One thing to note is that it does nothing to stop you from registering in website visitor analytics, not that this bothers me at all. That is something uBlock Origin does out of the box, while my new ad blocker sticks more narrowly to its chosen task, and that suffices for now.

In summary, I have altered my tooling for controlling what websites show me. It is all too easy for otherwise solid tools to register false positives and cause other obstructions. That is why I find myself swapping between them every so often; after all, website code can be far too variable.

Maybe it highlights how challenging it is to make ad blocking and similar software when your test cases cannot be as extensive as they need to be. Add in something of an arms race between advertisers and ad blockers, and the ante gets upped even more. It does not help that we want these things free of charge, too.

Finding a better way to uninstall Mac applications

14th January 2026

If you were to consult an AI about uninstalling software under macOS, you would be given a list of commands to run in the Terminal. That feels far less slick than either Linux or Windows. Thus, I set about looking for a cleaner solution. It came in the form of AppCleaner from FreeMacSoft.

This finds the files to remove once you have supplied the name of the app that you wish to uninstall. Once you have reviewed those, you can set it to move them to the Bin, from which they can be expunged. Handily, this automates the manual graft that otherwise would be needed.
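To illustrate the kind of clean-up AppCleaner automates, here is a sketch that uses a throwaway sandbox directory in place of the real /Applications and ~/Library locations; the application name and bundle identifier are made up:

```shell
# Work in a temporary sandbox rather than the real filesystem.
SANDBOX="$(mktemp -d)"
APP="ExampleApp"                  # hypothetical application name
BUNDLE="com.example.ExampleApp"   # hypothetical bundle identifier

# Simulate the places where a macOS app typically leaves files.
mkdir -p "$SANDBOX/Applications/$APP.app" \
         "$SANDBOX/Library/Application Support/$APP" \
         "$SANDBOX/Library/Caches/$BUNDLE" \
         "$SANDBOX/Library/Preferences"
touch "$SANDBOX/Library/Preferences/$BUNDLE.plist"

# Find everything matching the app's name or bundle identifier
# and remove it, children before parents.
find "$SANDBOX" -depth \( -name "$APP*" -o -name "$BUNDLE*" \) \
     -exec rm -rf {} +
```

Roughly speaking, AppCleaner performs the equivalent search across the real /Applications and ~/Library folders, then sends the matches to the Bin for review instead of deleting them outright.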

It amazes me that such an operation is not handled within macOS itself, instead of leaving it to the software providers themselves, or third-party tools like this one. Otherwise, a Mac could get very messy, though Homebrew offers ways of managing software installations for certain cases. Surprisingly, the situation is more free-form than on iOS, too.
