Microsoft | Technology Tales

A round-up of online portals for those seeking work

21^st June 2026

For me, much of 2025 was spent finding a new freelance work engagement. In time, that search successfully concluded, but not before I got flashbacks of how hard things were when seeking work after completing university education and deciding to hybridise my search to include permanent employment too. While fulfilling a new contract with a new client, I am compiling a listing of places on the web to a search for work, at least for future reference if nothing else.

Adzuna

Founded in 2011 by former executives from Gumtree, eBay and Zoopla, this UK-based job search engine aggregates listings from thousands of sites across 16+ countries with headquarters in London and approximately 100 employees worldwide. The platform offers over one million job advertisements in the UK alone and an estimated 350 million globally, attracting more than 10 million monthly visits. Jobseekers can use the service without cost, benefiting from search functionality, email alerts, salary insights and tools such as ValueMyCV and the AI-powered interview preparation tool Prepper. The company operates on a Cost-Per-Click or Cost-Per-Applicant model for employers seeking visibility, while also providing data and analytics APIs for programmatic advertising and labour market insights. Notably, the platform powers the UK government Number 10 Dashboard, with its dataset frequently utilised by the ONS for real-time vacancy tracking.

CV-Library

Founded in 2000 by Lee Biggins, this independent job board has grown to become one of the leading platforms in the UK job market. Based in Fleet, Hampshire, it maintains a substantial database of approximately 21.4 million CV's, with around 360,000 new or updated profiles added monthly. The platform attracts significant traffic with about 10.1 million monthly visits from 4.3 million unique users, facilitating roughly 3 million job applications each month across approximately 137,000 live vacancies. Jobseekers can access all services free of charge, including job searching, CV uploads, job alerts and application tracking, though the CV building tools are relatively basic compared to specialist alternatives. The platform boasts high customer satisfaction, with 96 percent of clients rating their service as good or excellent, and offers additional value through its network of over 800 partner job sites and ATS integration capabilities.

Eztrackr

This is a comprehensive job-hunt management tool that replaces traditional spreadsheets with an intuitive Kanban board interface, allowing users to organise their applications effectively. The platform features a Chrome extension that integrates with major job boards like LinkedIn and Indeed, enabling one-click saving of job listings. Users can track applications through various stages, store relevant documents and contact information, and access detailed statistics about their job search progress. The service offers artificial intelligence capabilities powered by GPT-4 to generate application responses, personalise cover letters and craft LinkedIn profiles. With over 25,000 active users who have tracked more than 280,000 job applications collectively, the tool provides both free and premium tiers. The basic free version includes unlimited tracking of applications, while the Pro subscription adds features such as custom columns, unlimited tags and expanded AI capabilities. This solution particularly benefits active jobseekers managing numerous applications across different platforms who desire structured organisation and data-driven insights into their job search.

Flexa

This organisation provides a specialised platform matching candidates with companies based on flexible working arrangements, including remote options, location independence and customisable hours. Their interface features a notable "Work From Anywhere" filter highlighting roles with genuine location flexibility, alongside transparency scores for companies that reflect their openness regarding working arrangements. The platform allows users to browse companies offering specific perks like part-time arrangements, sabbatical leave, or compressed hours, with rankings based on flexibility and workplace culture. While free to use with job-saving capabilities and quick matching processes, it appears relatively new with a modest-sized team, limited independent reviews and a smaller volume of job listings compared to more established competitors. The platform's distinctive approach prioritises work-life balance through values-driven matching and company-oriented filters, particularly useful for those seeking roles aligned with modern flexible working preferences.

FlexJobs

Founded in 2007 and based in Puerto Rico, FlexJobs operates as a subscription-based platform specialising in remote, hybrid, freelance and part-time employment opportunities. The service manually verifies all job listings to eliminate fraudulent postings, with staff dedicating over 200 hours daily to screening processes. Users gain access to positions across 105+ categories from entry-level to executive roles, alongside career development resources including webinars, resume reviews and skills assessments. Pricing options range from weekly trials to annual subscriptions with a 30-day money-back guarantee. While many users praise the platform for its legitimacy and comprehensive filtering tools, earning high ratings on review sites like Trustpilot, some individuals question whether the subscription fee provides sufficient value compared to free alternatives. Potential limitations include delayed posting of opportunities and varying representation across different industries.

Indeed

Founded in November 2004 and now operating in over 60 countries with 28 languages, this leading global job search platform serves approximately 390 million visitors monthly worldwide. In the UK alone, it attracts about 34 million monthly visits, with users spending nearly 7 minutes per session and viewing over 8.5 pages on average. The platform maintains more than 610 million jobseeker profiles globally while offering free services for candidates including job searching, application tools, CV uploads, company reviews and salary information. For employers, the business model includes pay-per-click and pay-per-applicant sponsored listings, alongside tools such as Hiring Insights providing salary data and application trends. Since October 2024, visibility for non-sponsored listings has decreased, requiring employers to invest in sponsorship for optimal visibility. Despite this competitive environment requiring strategic budget allocation, the platform remains highly popular due to its comprehensive features and extensive reach.

JobBoardSearch

A meta-directory founded in 2022 by Rodrigo Rocco, this platform aggregates and organises links to over 400 specialised and niche job sites across various industries and regions. Unlike traditional job boards, it does not host listings directly but serves as a discovery tool that redirects users to external platforms where actual applications take place. The service refreshes links approximately every 45 minutes and offers a weekly newsletter. While providing free access and efficient discovery of relevant boards by category or sector, potential users should note that the platform lacks direct job listings, built-in application tracking, or alert systems. It is particularly valuable for professionals exploring highly specialised fields, those wishing to expand beyond mainstream job boards and recruiters seeking to increase their visibility, though beginners might find navigating numerous destination boards somewhat overwhelming.

Jobrapido

Founded in Milan by Vito Lomele in 2006 (initially as Jobespresso), this global job aggregator operates in 58 countries and 21 languages. The platform collects between 28 and 35 million job listings monthly from various online sources, attracting approximately 55 million visits and serving over 100 million registered users. The service functions by gathering vacancies from career pages, agencies and job boards, then directing users to original postings when they search. For employers, it offers programmatic recruitment solutions using artificial intelligence and taxonomy to match roles with candidates dynamically, including pay-per-applicant models. While the platform benefits from its extensive global reach and substantial job inventory, its approach of redirecting to third-party sites means the quality and freshness of listings can vary considerably.

Jobserve

Founded in 1993 as Fax-Me Ltd and rebranded in 1995, this pioneering UK job board launched the world's first jobs-by-email service in May 1994. Originally dominating the IT recruitment sector with up to 80% market share in the early 2000s, the platform published approximately 200,000 jobs and processed over 1 million applications monthly by 2010. Currently headquartered in Colchester, Essex, the service maintains a global presence across Europe, North America and Australia, delivering over 1.2 million job-subscription emails daily. The platform employs a proprietary smart matching engine called Alchemy and features manual verification to ensure job quality. While free for jobseekers who can upload CVs and receive tailored job alerts, employers can post vacancies and run recruitment campaigns across various sectors. Although respected for its legacy and niche focus, particularly in technical recruitment, its scale and visibility are more modest compared to larger contemporary platforms.

Lifelancer

Founded in 2020 with headquarters in London, Lifelancer operates as an AI-powered talent hiring platform specialising in life sciences, pharmaceutical, biotech, healthcare IT and digital health sectors. The company connects organisations with freelance, remote and international professionals through services including candidate matching and global onboarding assistance. Despite being relatively small, Lifelancer provides distinct features for both hiring organisations and jobseekers. Employers can post positions tailored to specific healthcare and technology roles, utilising AI-based candidate sourcing, while professionals can create profiles to be matched with relevant opportunities. The platform handles compliance and payroll across multiple countries, making it particularly valuable for international teams, though as a young company, it may not yet offer the extensive talent pool of more established competitors in the industry.

The professional networking was core to my search for work and had its uses while doing so. Writing posts and articles did a lot to raise my profile along with reaching out to others, definitely an asset when assessing the state of a freelancing market. The usefulness of the green "Open to Work" banner is debatable because of my freelancing pitch in a slow market. Nevertheless, there was one headhunting approach that might have resulted in something if another offer had not gazumped it. Also, this is not a place to hang around over a weekend with job search moaning filling your feed, though making your interests known can change that. Now that I have paid work, the platform has become a way of keeping up to date in my line of business.

Monster

Established in 1994 as The Monster Board, Monster.com became one of the first online job portals, gaining prominence through memorable Super Bowl advertisements. As of June 2025, the platform attracts approximately 4.3 million monthly visits, primarily from the United States (76%), with smaller audiences in India (6%) and the UK (1.7%). The service offers free resources for jobseekers, including resume uploads and career guidance, while employers pay for job postings and additional premium features.

PharmiWeb

Established in 1999 and headquartered in Richmond, Surrey, PharmiWeb has evolved into Europe's leading pharmaceutical and life sciences platform. The company separated its dedicated job board as PharmiWeb.jobs in 2019, while maintaining industry news and insights on the original portal. With approximately 600,000 registered jobseekers globally and around 200,000 monthly site visits generating 40,000 applications, the platform hosts between 1,500 and 5,000 active vacancies at any time. Jobseekers can access the service completely free, uploading CVs and setting alerts tailored to specific fields, disciplines or locations. Additional recruiter services include CV database access, email marketing campaigns, employer branding and applicant management tools. The platform particularly excels for specialised pharmaceutical, biotech, clinical research and regulatory affairs roles, though its focused nature means it carries fewer listings than mainstream employment boards and commands higher posting costs.

Reed

If 2025 was a flashback to the travails of seeking work after completing university education, meeting this name again was another part of that. Founded in May 1960 by Sir Alec Reed, the firm began as a traditional recruitment agency in Hounslow, West London, before launching the first UK recruitment website in 1995. Today, the platform attracts approximately 3.7 million monthly visitors, primarily UK-based users aged 25-34, generating around 80,000 job applications daily. The service offers jobseekers free access to search and apply for roles, job alerts, CV storage, application tracking, career advice articles, a tax calculator, salary tools and online courses. For employers, the privately owned company provides job advertising, access to a database of 18-22 million candidate CVs and specialist recruitment across about 20 industry sectors.

Remote OK

Founded by digital nomad Pieter Levels in 2015, this prominent job board specialises exclusively in 100% remote positions across diverse sectors including tech, marketing, writing, design and customer support. The platform offers free browsing and application for jobseekers, while employers pay fees. Notable features include mandatory salary transparency, global job coverage with regional filtering options and a clean, minimalist interface that works well on mobile devices. Despite hosting over 100,000 remote jobs from reputable companies like Amazon and Microsoft, the platform has limitations including basic filtering capabilities and highly competitive application processes, particularly for tech roles. The simple user experience redirects applications directly to employer pages rather than using an internal system. For professionals seeking remote work worldwide, this board serves as a valuable resource but works best when used alongside other specialised platforms to maximise opportunities.

Remote.co

Founded in 2015 and based in Boulder, Colorado, this platform exclusively focuses on remote work opportunities across diverse industries such as marketing, finance, healthcare, customer support and design. Attracting over 1.5 million monthly visitors, it provides jobseekers with free access to various employment categories including full-time, part-time, freelance and hybrid positions. Beyond job listings, the platform offers a comprehensive resource centre featuring articles, expert insights and best practices from over 108 remote-first companies. Job alerts and weekly newsletters keep users informed about relevant opportunities. While the platform provides strong resources and maintains positive trust ratings of approximately 4.2/5 on Trustpilot, its filtering capabilities are relatively basic compared to competitors. Users might need to conduct additional research as company reviews are not included with job postings. Despite these limitations, the platform serves as a valuable resource for individuals seeking remote work guidance and opportunities.

Remotely

Formerly known as TryRemotely and Empllo, Remotely functions as a comprehensive job board specialising in remote technology and startup positions across various disciplines including engineering, product, sales, marketing, design and finance. The platform currently hosts over 30,000 active listings from approximately 24,000 hiring companies worldwide, with specific regional coverage including around 375 positions in the UK and 36 in Ireland. Among its notable features is the AI-powered Job Copilot tool, which can automatically apply to roles based on user preferences. While Empllo offers extensive listings and advanced filtering options by company, funding and skills, it does have limitations including inconsistent salary information and variable job quality. The service is free to browse, with account creation unlocking personalised features. It is particularly suitable for technology professionals seeking distributed work arrangements with startups, though users are advised to verify role details independently and potentially supplement their search with other platforms offering employer reviews for more thorough vetting.

Remotive

For jobseekers in the technology and digital sectors, Remotive serves as a specialised remote job board offering approximately 2,000 active positions on its free public platform. Founded around 2014-2015, this service operates with a remote-first approach and focuses on verifying job listings for legitimacy. The platform provides a premium tier called "Remotive Accelerator" which grants users access to over 50,000 additional curated jobs, advanced filtering options based on skills and salary requirements and membership to a private Slack community. While the interface receives praise for its clean design and intuitive navigation, user feedback regarding the paid tier remains mixed, with some individuals noting limitations such as inactive community features and an abundance of US-based or senior-level positions. The platform is particularly valuable for professionals in software development, product management, marketing and customer service who are seeking global remote opportunities.

Talent.com

Originally launched in Canada in 2011 as neuvoo, this global job search engine is now headquartered in Montreal, Quebec, providing access to over 30 million jobs across more than 75 countries. The platform attracts between 12 and 16 million monthly visits worldwide, with approximately 6 percent originating from the UK. Jobseekers can utilise the service without charge, accessing features like salary converters and tax calculators in certain regions to enhance transparency about potential earnings. Employers have the option to post jobs for free in some areas, with additional pay per click sponsored listings available to increase visibility. Despite its extensive coverage and useful tools, user feedback remains mixed, with numerous complaints on review sites regarding outdated listings, unwanted emails and difficulties managing or deleting accounts.

The Muse

Founded in 2011 and based in New York City, The Muse is an online platform that integrates job listings with career guidance, employer insights and coaching services to support individuals in making informed career decisions. It distinguishes itself by offering detailed employer profiles that include workplace culture, employee perspectives and company values, alongside editorial content on resume writing, interview techniques and career progression. While jobseekers can access core features for free, employers pay to advertise roles and create branded profiles, with additional revenue generated through premium coaching services. The platform appeals to graduates, early-career professionals and those seeking career transitions, prioritising alignment between personal values and workplace environments over simply aggregating job vacancies. Compared to larger job boards, it focuses on storytelling and career development resources, positioning itself as a tool for navigating modern employment trends such as flexible work and diversity initiatives.

Totaljobs

Founded in 1999, Totaljobs is a major UK job board currently owned by StepStone Group UK Ltd, a subsidiary of Axel Springer Digital Classifieds. The platform attracts approximately 20 million monthly visits and generates 4-5 million job applications each month, with over 300,000 daily visitors browsing through typically 280,000+ live job listings. As the flagship of a broader network including specialised boards such as Jobsite, CareerStructure and City Jobs, Totaljobs provides jobseekers with search functionality across various sectors, job alerts and career advice resources. For employers and recruiters, the platform offers pay-per-post job advertising, subscription options for CV database access and various employer tools.

We Work Remotely

Founded in 2011, this is one of the largest purely remote job boards globally, attracting approximately 6 million monthly visitors and featuring over 36,000 remote positions across various categories including programming, marketing, customer support and design. Based in Vancouver, the platform operates with a small remote-first team who vet listings to reduce spam and scams. Employers pay for each standard listing, while jobseekers access the service without charge. The interface is straightforward and categorised by functional area, earning trust from major companies like Google, Amazon and GitHub. However, the platform has limitations including basic filtering capabilities, a predominance of senior-level positions particularly in technology roles and occasional complaints about outdated or misleading posts. The service is most suitable for experienced professionals seeking genuine remote opportunities rather than those early in their careers. Some users report region-restricted application access and positions that offer lower compensation than expected for the required experience level.

Working Nomads

Founded in 2014, this job board provides remote work opportunities for digital nomads and professionals across various industries. The platform offers over 30,000 fully remote positions spanning sectors such as technology, marketing, writing, finance and education. Users can browse listings freely, but a Premium subscription grants access to additional jobs, enhanced filters and email alerts. The interface is user-friendly with fast-loading pages and straightforward filtering options. The service primarily features global employment opportunities suitable for location-independent workers. However, several limitations exist: many positions require senior-level experience, particularly in technical fields; the free tier displays only a subset of available listings; filtering capabilities are relatively basic; and job descriptions sometimes lack detail. The platform has received mixed reviews, earning approximately 3.4 out of 5 on Trustpilot, with users noting the prevalence of senior technical roles and questioning the value of the premium subscription. It is most beneficial for experienced professionals comfortable with remote work arrangements, while those seeking entry-level positions might find fewer suitable opportunities.

Docling, MarkItDown and Textract: Understanding the New Document-Processing Landscape

3^rd June 2026

The growing use of large language models has changed the way many organisations think about documents. Reports, manuals, protocols, spreadsheets, presentations and scanned PDFs are no longer just files to be opened by a person; they are also sources of knowledge that can feed search systems, retrieval-augmented generation workflows and internal knowledge bases.

This shift has brought renewed attention to a practical problem that long predates the current AI boom. Documents are often rich in structure, yet many extraction tools reduce them to plain text, and once headings, tables, figures, captions and reading order are discarded, the resulting output can become difficult for humans to review and even harder for an AI system to use reliably.

Docling, MarkItDown and Textract all sit in this space, but they approach the problem from different directions. Textract is rooted in general-purpose text extraction, MarkItDown focuses on producing Markdown for text-analysis workflows and Docling aims to build a richer understanding of document structure.

Why Document Conversion Matters for AI Systems

A PDF, Word document or spreadsheet may look orderly on screen, yet that order is not always easy to recover programmatically. A human reader can see that a line of text is a heading, that a table belongs to a section or that a caption describes a nearby figure, while a simple text extractor may see only a stream of characters.

That difference matters when documents are used with large language models. If a technical manual, financial report or policy document is flattened into undifferentiated text, a search system may retrieve the wrong passage or miss the relationship between a table and its surrounding explanation. Retrieval-augmented generation depends not only on having the right words in an index, but also on preserving enough context for those words to remain meaningful.

Markdown and structured JSON have therefore become important intermediate formats. Markdown is close to plain text, but it can still represent headings, tables, links and lists in a compact way. JSON can go further by encoding document hierarchy, page-level information and other metadata for downstream processing.

Docling and Document Understanding

Docling is an open-source Python toolkit designed to convert and understand documents for AI-oriented workflows. It was initially developed by IBM's AI for Knowledge team at IBM Research Zurich, open-sourced in July 2024 and is now hosted under the LF AI & Data Foundation (part of the Linux Foundation), following IBM's formal contribution of the project to the foundation on 29th April 2025. Its purpose is not merely to extract text, but to preserve structure and meaning in a form that can be used by search systems, knowledge extraction tools and retrieval-augmented generation pipelines.

A simple way to describe Docling is to place it between raw documents and a language model. Instead of treating a document as a block of text, it attempts to identify headings, document hierarchy, tables, figures, captions, formulas, code blocks, reading order and page layout information. This is particularly important for PDFs, where the visual appearance of a page can hide a complicated underlying structure.

Docling supports parsing for PDF, DOCX, PPTX, XLSX, HTML, WAV, MP3 and image formats, as well as several application-specific XML schemas including USPTO patents, JATS articles and XBRL financial reports, making it relevant well beyond ordinary office files.

The project's outputs are also designed for modern AI pipelines. Docling can export to Markdown, HTML, WebVTT, DocTags and lossless JSON, while its internal DoclingDocument representation provides a unified model of the parsed material. This gives developers a way to move from a PDF or other source file to a structured representation that is easier to chunk, index and query.

Its PDF capabilities are among the main reasons it has drawn attention. The documentation highlights advanced PDF understanding, including page layout, reading order, table structure, code, formulas and image classification. It also includes extensive OCR support for scanned PDFs and images, support for several visual language models under the GraniteDocling name and audio support using automatic speech recognition models.

Docling is also designed to fit into the wider generative AI ecosystem. Its integrations include LangChain, LlamaIndex, CrewAI and Haystack for agentic AI, and it supports local execution for sensitive data and air-gapped environments. The ability to run entirely on local hardware is important where sending files to an external service is not acceptable.

A typical installation begins with pip install docling. A simple conversion in Python uses DocumentConverter from docling.document_converter, then calls converter.convert("study_report.pdf") and exports the result with result.document.export_to_markdown(). The resulting Markdown can then be used in a search index, vector database or language-model workflow.

Textract and Traditional Text Extraction

Textract, in this context, refers to the Python package named textract and not Amazon Textract, which is a separate, cloud-hosted service. Its philosophy is much simpler than Docling's: give it a file and receive extracted text. The package provides both a command-line interface (for example, textract path/to/file.extension) and a Python interface using textract.process("path/to/file.extension").

The Textract documentation describes the problem it addresses as one of recovering useful information from "dark data" embedded in Word documents, PowerPoint presentations, PDFs and other files, providing a single interface across many formats, which has long been useful for natural language processing and textual analysis. Its design is method-agnostic, meaning it wraps different tools and libraries depending on the file type.

Textract supports many formats through a variety of underlying systems. Its documentation mentions CSV, TSV, DOC, DOCX, EML, EPUB, GIF, JPG, JSON, HTML, MP3, MSG, ODT, OGG, PDF, PNG, PPTX, PostScript, RTF, TIFF, TXT, WAV, XLSX and XLS. DOC files may be processed through antiword, DOCX through python-docx2txt, images through tesseract-ocr, PDFs through pdftotext by default (or pdfminer.six) and audio through tools such as sox, SpeechRecognition and pocketsphinx.

This broad format support is useful, but Textract was not designed specifically for the current generation of RAG and LLM systems. Its goal is text extraction rather than document understanding, meaning it may retrieve text effectively from many sources, but it does not build a rich document model and has limited awareness of tables, layout, semantic sections or chunking for retrieval.

That distinction is central when comparing Textract with newer tools. Textract answers the question of what text is present in a file. Docling tries to answer what the document means structurally, which is a more demanding task when tables, appendices, captions and section hierarchies carry much of the meaning.

MarkItDown and the Rise of Markdown-First Conversion

MarkItDown is an open-source Microsoft tool for converting files and office documents to Markdown. It is published under an MIT licence and its stated purpose is to provide a lightweight Python utility for converting various files to Markdown for indexing, text analysis and related purposes. It is therefore closer to Textract than to a full document-understanding framework, but it places much more emphasis on preserving useful structure.

The tool's basic command-line use is straightforward. A document can be converted with a command such as markitdown path-to-file.pdf > document.md, or an output file can be specified with -o. In Python, the usual pattern is to import MarkItDown, create an instance and call md.convert("test.xlsx"), then read the result through result.text_content.

MarkItDown supports PDF, PowerPoint, Word, Excel, images (with EXIF metadata and OCR), audio (with EXIF metadata and speech transcription), HTML and various text-based formats such as CSV, JSON and XML. The exact capabilities depend on optional dependencies, which can be installed all at once with pip install 'markitdown[all]' or more selectively, such as pip install 'markitdown[pdf, docx, pptx]'.

The reason for focusing on Markdown is practical. Markdown is close to plain text, while still allowing structure to be represented through headings, tables, links and other simple conventions. The MarkItDown documentation notes that mainstream large language models such as GPT-4o appear to understand Markdown well and often produce it without being asked to do that, while Markdown conventions are also token-efficient.

MarkItDown is not intended as a high-fidelity document conversion tool for human publishing, and the project documentation makes clear that it is mainly intended for consumption by text-analysis tools. This makes it well suited to smaller knowledge-management projects, indexing workflows and pipelines where clean Markdown is more valuable than visual reproduction.

The project also supports optional integrations and extensions. Plugins are supported but disabled by default, and the documentation points developers to third-party plugins using the #markitdown-plugin tag. The markitdown-ocr plugin adds OCR support to PDF, DOCX, PPTX and XLSX converters by using LLM Vision through the same llm_client and llm_model pattern already used for image descriptions, while falling back to the standard converter if no client is provided.

MarkItDown also integrates with Microsoft cloud services for more advanced cases. Azure Document Intelligence can be used for conversion by providing an endpoint, while Azure Content Understanding offers higher-quality cloud extraction, structured field extraction through YAML front matter and multimodal support for documents, images, audio and video. The documentation notes that these cloud routes involve billable Azure API calls, so users can restrict which file types are sent through Content Understanding.

Its security guidance is also worth noting. MarkItDown performs I/O with the privileges of the current process, similar to open() or requests.get(). The project advises sanitising untrusted inputs and using the narrowest conversion function suitable for the task, such as convert_local(), convert_stream() or convert_response(), rather than the more permissive convert() when tighter control is needed.

Comparing the Three Approaches

The simplest comparison is based on the main question each tool is designed to answer. Textract asks what text is in a file, MarkItDown asks how to turn a file into useful Markdown, and Docling asks how to represent the structure of the document itself. These differences lead to different strengths, even where the tools appear to support similar file types.

For a quick extraction task, Textract remains a practical option. If the requirement is simply to read a DOCX file or pull text from a straightforward PDF, its single API can be convenient, particularly where plain text is enough, and the downstream process does not require reliable headings, tables or layout.

MarkItDown occupies a middle ground. It is lightweight, actively maintained by Microsoft and designed with LLM workflows in mind. It can produce Markdown that preserves more structure than plain text, making it useful for search, summarisation and note-taking systems without requiring the heavier processing associated with a document-understanding framework.

Docling is the strongest fit when the structure of a document is central to its meaning. Complex PDFs, detailed tables, heavily formatted reports, figure captions, formulas and multi-level document hierarchies are undoubtedly the kinds of material that can lose meaning when converted to plain text, and Docling's richer document representation and JSON export make it especially relevant for more demanding AI pipelines.

There is a cost to that additional capability, and Docling is a heavier and more complex tool than Textract. MarkItDown may be easier to adopt for smaller projects where Markdown output is the main requirement, while Docling becomes more attractive when accuracy of structure matters more than simplicity.

Where Document Structure Carries the Meaning

Some document types illustrate more clearly than others why structure matters as much as content. Legal contracts, engineering specifications, academic papers, financial filings and policy documents are all examples of material where meaning is often distributed across headings, tables, footnotes, appendices and cross-references. A plain-text extraction from any of these can be difficult to use reliably because it may blur the boundary between sections and collapse the relationship between a table and the text that explains it.

A Markdown conversion may preserve enough structure for many search and summarisation tasks. A richer representation such as Docling's can be more suitable when table structure, reading order and document hierarchy need to be retained for reliable retrieval or knowledge-base construction.

This does not mean that one tool replaces all others. A lightweight converter may be preferable for routine ingestion of simple office files, while a structured parser may be selected for scanned PDFs, complex reports or multisection reference documents. The appropriate choice depends on the nature of the source material and the level of structure required downstream.

Choosing Between Docling, MarkItDown and Textract

The current landscape reflects a broader shift from text extraction towards document understanding. Textract represents an older but still useful model, where the priority is to get plain text from many file types through a consistent interface. MarkItDown reflects the needs of LLM-era workflows by turning varied content into Markdown that is compact, readable and easier for language models to process.

Docling goes further by treating documents as structured objects rather than text containers. Its support for layout analysis, OCR, tables, reading order, formulas, figures, audio and specialised schemas makes it a more ambitious option for complex pipelines, and its ability to run locally also matters where privacy, security or regulatory constraints limit the use of external services.

For general users, the choice ultimately follows the complexity of the source material and what is expected of the output. Textract handles straightforward extraction well enough, while MarkItDown adds lightweight structure without much overhead, and Docling is the right tool when the document's own organisation needs to survive the conversion intact.

Online R programming books that are worth bookmarking

23^rd March 2026

As part of making content more useful following its reorganisation, numerous articles on the R statistical computing language have appeared on here. All of those have taken a more narrative form. With this collation of online books on the R language, I take a different approach. What you find below is a collection of links with associated descriptions. While narrative accounts can be very useful, there is something handy about running one's eye down a compilation as well. Many entries have a corresponding print edition, some of which are not cheap to buy, which makes me wonder about the economics of posting the content online as well, though it can help with getting feedback during book preparation.

Big Book of R

We start with this comprehensive collection of over 400 free and affordable resources related to the R programming language, organised into categories such as data science, statistics, machine learning and specific fields like economics and life sciences. In many ways, it is a superset of what you find below and complements this collection with many other finds. The fact that it is a living collection makes it even more useful.

R Programming for Data Science

Here is an introduction to the R programming language, focusing on its application in data science. It covers foundational topics such as installation, data manipulation, function writing, debugging and code optimisation, alongside advanced concepts like parallel computation and data analysis case studies. The text includes practical guidance on handling data structures, using packages such as {dplyr} and {readr} as well as working with dates, times and regular expressions. Additional sections address control structures, scoping rules and profiling techniques, while the author also discusses resources for staying updated through a podcast and accessing e-book versions for ongoing revisions.

Hands-On Programming with R

Designed for individuals with no prior coding experience, the book provides an introduction to programming in R while using practical examples to teach fundamental concepts such as data manipulation, function creation and the use of R's environment system. It is structured around hands-on projects, including simulations of weighted dice, playing cards and a slot machine, alongside explanations of core programming principles like objects, notation, loops and performance optimisation. Additional sections cover installation, package management, data handling and debugging techniques. While the book is written using RMarkdown and published under a Creative Commons licence, a physical edition is available through O’Reilly.

Advanced R

What you have here is one of several books written by Hadley Wickham. This one is published in its second edition as part of Chapman and Hall's R Series and is aimed primarily at R users who want to deepen their programming skills and understanding of the language, though it is also useful for programmers migrating from other languages. The book covers a broad range of topics organised into sections on foundations, functional programming, object-oriented programming, metaprogramming and techniques, with the latter including debugging, performance measurement and rewriting R code in C++.

Cookbook for R

Unlike Paul Teetor's separately published R Cookbook, the Cookbook for R was created by Winston Chang. It offers solutions to common tasks and problems in data analysis, covering topics such as basic operations, numbers, strings, formulas, data input and output, data manipulation, statistical analysis, graphs, scripts and functions, and tools for experiments.

R for Data Science

The second edition of R for Data Science by Hadley Wickham, Mine Çetinkaya-Rundel and Garrett Grolemund offers a structured approach to learning data science with R, covering essential skills such as data visualisation, transformation, import, programming and communication. Organised into chapters that explore workflows, data manipulation techniques and tools like Quarto for reproducible research, the book emphasises practical applications and best practices for handling data effectively.

R Graphics Cookbook

The R Graphics Cookbook, 2nd edition, offers a comprehensive guide to creating visualisations in R, structured into chapters that cover foundational skills such as installing and using packages, loading data from various formats and exploring datasets through basic plots. It progresses to detailed techniques for constructing bar graphs, line graphs, scatter plots and histograms, alongside methods for customising axes, annotations, themes and legends.

The book also addresses advanced topics like colour application, faceting data into subplots, generating specialised graphs such as network diagrams and heat maps and preparing data for visualisation through reshaping and summarising. Additional sections focus on refining graphical outputs for presentation, including exporting to different file formats and adjusting visual elements for clarity and aesthetics, while an appendix provides an overview of the {ggplot2} system.

R Markdown: The Definitive Guide

Published by Chapman & Hall/CRC, R Markdown: The Definitive Guide by Yihui Xie, J.J. Allaire and Garrett Grolemund covers the R Markdown document format, which has been in use since 2012 and is built on the knitr and Pandoc tools. The format allows users to embed code within Markdown documents and compile the results into a range of output formats including PDF, HTML and Word. The guide covers a broad scope of practical applications, from creating presentations, dashboards, journal articles and books to building interactive applications and generating blogs, reflecting how the ecosystem has matured since the {rmarkdown} package was first released in 2014.

A key principle running throughout is that Markdown's deliberately limited feature set is a strength rather than a drawback, encouraging authors to focus on content rather than complex typesetting. Despite this simplicity, the format remains highly customisable through tools such as Pandoc templates, LaTeX and CSS. Documents produced in R Markdown are also notably portable, as their straightforward syntax makes conversion between output formats more reliable, and because results are generated dynamically from code rather than entered manually, they are far more reproducible than those produced through conventional copy-and-paste methods.

R Markdown Cookbook

The R Markdown Cookbook is a practical guide designed to help users enhance their ability to create dynamic documents by combining analysis and reporting. It covers essential topics such as installation, document structure, formatting options and output formats like LaTeX, HTML and Word, while also addressing advanced features such as customisations, chunk options and integration with other programming languages. The book provides step-by-step solutions to common tasks, drawing on examples from online resources and community discussions to offer clear, actionable advice for both new and experienced users seeking to improve their workflow and explore the full potential of R Markdown.

RMarkdown for Scientists

This book provides a practical guide to using R Markdown for scientists, developed from a three-hour workshop and designed to evolve as a living resource. It covers essential topics such as setting up R Markdown documents, integrating with RStudio for efficient workflows, exporting outputs to formats like PDF, HTML and Word, managing figures and tables with dynamic references and captions, incorporating mathematical equations, handling bibliographies with citations and style adjustments, troubleshooting common issues and exploring advanced R Markdown extensions.

bookdown: Authoring Books and Technical Documents with R Markdown

Here is a guide to using the {bookdown} package, which extends R Markdown to facilitate the creation of books and technical documents. It covers Markdown syntax, integration of R code, formatting options for HTML, LaTeX and e-book outputs and features such as cross-referencing, custom blocks and theming. The package supports both multipage and single-document outputs, and its applications extend beyond traditional books to include course materials, manuals and other structured content. The work includes practical examples, publishing workflows and details on customisation, alongside information about licensing and the availability of a printed version.

[blogdown]: Creating Websites with R Markdown

Though the authors note that some information may be outdated due to recent updates to Hugo and the {blogdown} package, and they direct readers to additional resources for the latest features and changes, this book still provides a guide to building static websites using R Markdown and the Hugo static site generator, emphasising the advantages of this approach for creating reproducible, portable content. It covers installation, configuration, deployment options such as Netlify and GitHub Pages, migration from platforms like WordPress and advanced topics including custom layouts and version control as well as practical examples, workflow recommendations and discussions on themes, content management and technical aspects of website development.

[pagedown]: Create Paged HTML Documents for Printing from R Markdown

The R package {pagedown} enables users to create paged HTML documents suitable for printing to PDF, using R Markdown combined with a JavaScript library called paged.js, that later of which implements W3C specifications for paged media. While tools like LaTeX and Microsoft Word have traditionally dominated PDF production, pagedown offers an alternative approach through HTML and CSS, supporting a range of document types including resumes, posters, business cards, letters, theses and journal articles.

Documents can be converted to PDF via Google Chrome, Microsoft Edge or Chromium, either manually or through the chrome_print() function, with additional support for server-based, CI/CD pipeline and Docker-based workflows. The package provides customisable CSS stylesheets, a CSS overriding mechanism for adjusting fonts and page properties, and various formatting features such as lists of tables and figures, abbreviations, footnotes, line numbering, page references, cover images, running headers, chapter prefixes and page breaks. Previewing paged documents requires a local or remote web server, and the layout is sensitive to browser zoom levels, with 100% zoom recommended for the most accurate output.

Dynamic Documents with R and knitr

Developed by Yihui Xie and inspired by the earlier {Sweave} package, {knitr} is an R package designed for dynamic report generation that consolidates the functionality of numerous other add-on packages into a single, cohesive tool. It supports multiple input languages, including R, Python and shell scripts, as well as multiple output markup languages such as LaTeX, HTML, Markdown, AsciiDoc and reStructuredText. The package operates on a principle of transparency, giving users full control over how input and output are handled, and runs R code in a manner consistent with how it would behave in a standard R terminal.

Among its notable features are built-in caching, automatic code formatting via the {formatR} package, support for more than 20 graphics devices and flexible options for managing plots within documents. It also allows advanced users to define custom hooks and regular expressions to extend and tailor its behaviour further. The package is affiliated with the Foundation for Open Access Statistics, a nonprofit organisation promoting free software, open access publishing and reproducible research in statistics.

Mastering Shiny

Mastering Shiny is a comprehensive guide to developing web applications using R, focusing on the Shiny framework designed for data scientists. It introduces core concepts such as user interface design, reactive programming and dynamic content generation, while also exploring advanced topics like performance optimisation, security and modular app development. The book covers practical applications across industries, from academic teaching tools to real-time analytics dashboards, and aims to equip readers with the skills to build scalable, maintainable applications. It includes detailed chapters on workflow, layout, visualisation and user interaction, alongside case studies and technical best practices.

Engineering Production-Grade Shiny Apps

This is aimed at developers and team managers who already possess a working knowledge of the Shiny framework for R and wish to advance beyond the basics toward building robust, production-ready applications. Rather than covering introductory Shiny concepts or post-deployment concerns, the book focuses on the intermediate ground between those two stages, addressing project management, workflow, code structure and optimisation.

It introduces the {golem} package as a central framework and guides readers through a five-step workflow covering design, prototyping, building, strengthening and deployment, with additional chapters on optimisation techniques including R code performance, JavaScript integration and CSS. The book is structured to serve both those with project management responsibilities and those focused on technical development, acknowledging that in many small teams these roles are carried out by the same individual.

Outstanding User Interfaces with Shiny

Written by David Granjon and published in 2022, Outstanding User Interfaces with Shiny is a book aimed at filling the gap between beginner and advanced Shiny developers, covering how to deeply customise and enhance Shiny applications to the point where they become indistinguishable from classic web applications. The book spans a wide range of topics, including working with HTML and CSS, integrating JavaScript, building Bootstrap dashboard templates, mobile development and the use of React, providing a comprehensive resource that consolidates knowledge and experience previously scattered across the Shiny developer community.

R Packages

Now in its second edition, R Packages by Hadley Wickham and Jennifer Bryan is a freely available online guide that teaches readers how to develop packages in R. A package is the core unit of shareable and reproducible R code, typically comprising reusable functions, documentation explaining how to use them and sample data. The book guides readers through the entire process of package development, covering areas such as package structure, metadata, dependencies, testing, documentation and distribution, including how to release a package to CRAN. The authors encourage a gradual approach, noting that an imperfect first version is perfectly acceptable provided each subsequent version improves on the last.

Mastering Spark with R

Written by Javier Luraschi, Kevin Kuo and Edgar Ruiz, Mastering Spark with R is a comprehensive guide designed to take readers from little or no familiarity with Apache Spark or R through to proficiency in large-scale data science. The book covers a broad range of topics, including data analysis, modelling, pipelines, cluster management, connections, data handling, performance tuning, extensions, distributed computing, streaming and contributing to the Spark ecosystem.

Happy Git and GitHub for the useR

Here is a practical guide written by Jenny Bryan and contributors, aimed primarily at R users involved in data analysis or package development. It covers the installation and configuration of Git alongside GitHub, the development of key workflows for common tasks and the integration of these tools into day-to-day work with R and R Markdown. The guide is structured to take readers from initial setup through to more advanced daily workflows, with particular attention paid to how Git and GitHub serve the needs of data science rather than pure software development.

JavaScript for R

Written by John Coene and intended for release as part of the CRC Press R series, JavaScript for R explore how the R programming language and JavaScript can be used together to enhance data science workflows. Rather than teaching JavaScript as a standalone language, the book demonstrates how a limited working knowledge of it can meaningfully extend what R developers can achieve, particularly through the integration of external JavaScript libraries.

The book covers a broad range of topics, progressing from foundational concepts through to data visualisation using the {htmlwidgets} package, bidirectional communication with Shiny, JavaScript-powered computations via the V8 engine and Node.js and the use of modern JavaScript tools such as Vue, React and webpack alongside R. Practical examples are woven throughout, including the building of interactive visualisations, custom Shiny inputs and outputs, image classification and machine learning operations, with all accompanying code made publicly available on GitHub.

HTTP Testing in R

This guide addresses challenges faced by developers of R packages that interact with web resources, offering strategies to create reliable unit tests despite dependencies on internet connectivity, authentication and external service availability. It explores tools such as {vcr}, {webmockr}, {httptest} and {webfakes}, which enable mocking and recording HTTP requests to ensure consistent testing environments, reduce reliance on live data and improve test reliability. The text also covers advanced topics like handling errors, securing tests and ensuring compatibility with CRAN and Bioconductor, while emphasising best practices for maintaining test robustness and contributor-friendly workflows. Funded by rOpenSci and the R Consortium, the resource aims to support developers in building more resilient and maintainable R packages through structured testing approaches.

The Shiny AWS Book

The Shiny AWS Book is an online resource designed to teach data scientists how to deploy, host and maintain Shiny web applications using cloud infrastructure. Addressing a common gap in data science education, it guides readers through a range of DevOps technologies including AWS, Docker, Git, NGINX and open-source Shiny Server, covering everything from server setup and cost management to networking, security and custom configuration.

{ggplot2}: Elegant Graphics for Data Analysis

The third edition of {ggplot2}: Elegant Graphics for Data Analysis provides an in-depth exploration of the Grammar of Graphics framework, focusing on the theoretical foundations and detailed implementation of the ggplot2 package rather than offering step-by-step instructions for specific visualisations. Written by Hadley Wickham, Danielle Navarro and Thomas Lin Pedersen, the book is presented as an online work-in-progress, with content structured across sections such as layers, scales, coordinate systems and advanced programming topics. It aims to equip readers with the knowledge to customise plots according to their needs, rather than serving as a direct guide for creating predefined graphics.

YaRrr! The Pirate’s Guide to R

Written by Nathaniel D. Phillips, this is a beginner-oriented guide to learning the R programming language from the ground up, covering everything from installation and basic navigation of the RStudio environment through to more advanced topics such as data manipulation, statistical analysis and custom function writing. The guide progresses logically through foundational concepts including scalars, vectors, matrices and dataframes before moving into practical areas such as hypothesis testing, regression, ANOVA and Bayesian statistics. Visualisation is given considerable attention across dedicated chapters on plotting, while later sections address loops, debugging and managing data from a variety of file formats. Each chapter includes practical exercises to reinforce learning, and the book concludes with a solutions section for reference.

Data Visualisation: A Practical Introduction

Data Visualisation: A Practical Introduction is a forthcoming second edition from Princeton University Press, written by Kieran Healy and due for release in March 2026, which teaches readers how to explore, understand and present data using the R programming language and the {ggplot2} library. The book aims to bridge the gap between works that discuss visualisation principles without teaching the underlying tools and those that provide code recipes without explaining the reasoning behind them, instead combining both practical instruction and conceptual grounding.

Revised and updated throughout to reflect developments in R and {ggplot2}, the second edition places greater emphasis on data wrangling, introduces updated and new datasets, and substantially rewrites several chapters, particularly those covering statistical models and map-drawing. Readers are guided through building plots progressively, from basic scatter plots to complex layered graphics, with the expectation that by the end they will be able to reproduce nearly every figure in the book and understand the principles that inform each choice.

The book also addresses the growing role of large language models in coding workflows, arguing that genuine understanding of what one is doing remains essential regardless of the tools available. It is suitable for complete beginners, those with some prior R experience, and instructors looking for a course companion, and requires the installation of R, RStudio and a number of supporting packages before work can begin.

Learning R for Data Analysis: Going from the basics to professional practice

22^nd March 2026

R has grown from a specialist statistical language into one of the most widely recognised tools for working with data. Across tutorials, community sites, training platforms and industry resources, it is presented as both a programming language and a software environment for statistical computing, graphics and reporting. It was created by Ross Ihaka and Robert Gentleman at the University of Auckland in New Zealand, and its name draws on the first letter of their first names while also alluding to the Bell Labs language S. It is freely available under the GNU General Public Licence and runs on Linux, Windows and macOS, which has helped it spread across research, education and industry alike.

What Makes R Distinctive

What makes R notable is its combination of programming features with a strong focus on data analysis. Introductory material, such as the tutorials at Tutorialspoint and Datamentor, repeatedly highlights its support for conditionals, loops, user-defined recursive functions and input and output, but these sit alongside effective data handling, a broad set of operators for arrays, lists, vectors and matrices and strong graphical capabilities. That mixture means R can be used for straightforward scripts and for complex analytical workflows. A beginner may start by printing "Hello, World!" with the print() function, while a more experienced user may move on to regression models, interactive dashboards or automated reporting.

The Learning Progression

Learning materials generally present R in a structured progression. A beginner is first introduced to reserved words, variables and constants, operators and the order in which expressions are evaluated. From there, the path usually moves into flow control through if…else, ifelse(), for, while, repeat and the use of break and next, before functions follow naturally, including return values, environments and scope, recursive functions, infix operators and switch(). Most sources agree that confidence with the syntax and fundamentals is the real starting point, and this early sequence matters because it helps learners become comfortable reading and writing R rather than only copying examples.

After the basics, attention tends to turn to the structures that make R so useful for data work. Vectors, matrices, lists, data frames and factors appear in nearly every introductory course because they are central to how information is stored and manipulated. Object-oriented concepts also emerge quite early in some routes through the language, with classes and objects extending into S3, S4 and reference classes. For someone coming from spreadsheets or point-and-click statistical software, this shift can feel significant, but it also opens the way to more reproducible and flexible analysis.

Visualisation

Visualisation is another recurring theme in R education. Basic chart types such as bar plots, histograms, pie charts, box plots and strip charts are common early examples because they show how quickly data can be turned into graphics. More advanced lessons widen the scope through plot functions, multiple plots, saving graphics, colour selection and the production of 3D plots.

Beyond base plotting, there is extensive evidence of the central role of {ggplot2} in contemporary R practice. Data Cornering demonstrates this well, with articles covering how to create funnel charts in R using {ggplot2} and how to diversify stacked column chart data label colours, showing how R is used not only to summarise data but also to tell visual stories more clearly. In the pharmaceutical and clinical research space, the PSI VIS-SIG blog is published by the PSI Visualisation Special Interest Group and summarises its monthly Wonderful Wednesday webinars, presenting real-world datasets and community-contributed chart improvements alongside news from the group.

Data Wrangling and the Tidyverse

Much of modern R work is built around data wrangling, and here the {tidyverse} has become especially prominent. Claudia A. Engel's openly published guide Data Wrangling with R (last updated 3rd November 2023) sets out a preparation phase that assumes some basic R knowledge, a recent installation of R and RStudio and the installation of the {tidyverse} package with install.packages("tidyverse") followed by library(tidyverse). It also recommends creating a dedicated RStudio project and downloading CSV files into a data subdirectory, reinforcing the importance of organised project structure.

That same guide then moves through data manipulation with {dplyr}, covering selecting columns and filtering rows, pipes, adding new columns, split-apply-combine, tallying and joining two tables, before moving on to {tidyr} topics such as long and wide table formats, pivot_wider, pivot_longer and exporting data. These topics reflect a broader pattern in the R ecosystem because data import and export, reshaping, combining tables and counting by group recur across teaching resources as they mirror common analytical tasks.

Applications and Professional Use

The range of applications attached to R is wide, though data science remains the clearest centre of gravity. Educational sources describe R as valuable for data wrangling, visualisation and analysis, often pointing to packages such as {dplyr}, {tidyr}, {ggplot2} and {Shiny}. Statistical modelling is another major strand, with R offering extensible techniques for descriptive and inferential statistics, regression analysis, time series methods and classical tests. Machine learning appears as a further area of growth, supported by a large and expanding package ecosystem. In more advanced contexts, R is also linked with dashboards, web applications, report generation and publishing systems such as Quarto and R Markdown.

R's place in professional settings is underscored by the breadth of organisations and sectors associated with it. Introductory resources mention companies such as Google, Microsoft, Facebook, ANZ Bank, Ford and The New York Times as examples of organisations using R for modelling, forecasting, analysis and visualisation. The NHS-R Community promotes the use of R and open analytics in health and care, building a community of practice for data analysis and data science using open-source software in the NHS and wider UK health and care system. Its resources include reports, blogs, webinars and workshops, books, videos and R packages, with webinar materials archived in a publicly accessible GitHub repository. The R Validation Hub, supported through the pharmaR initiative, is a collaboration to support the adoption of R within a biopharmaceutical regulatory setting and provides tools including the {riskmetric} package, the {riskassessment} app and the {riskscore} package for assessing package quality and risk.

The Wider Ecosystem

The wider ecosystem around R is unusually rich. The R Consortium promotes the growth and development of the R language and its ecosystem by supporting technical and social infrastructure, fostering community engagement and driving industry adoption. It notes that the R language supports over two million users and has been adopted in industries including biotech, finance, research and high technology. Community growth is visible not only through organisations and conferences but through user groups, scholarships, project working groups and local meetups, which matters because learning a language is easier when there is an active support network around it.

Another sign of maturity is the depth of R's package and publication landscape. rdrr.io provides a comprehensive index of over 29,000 CRAN packages alongside more than 2,100 Bioconductor packages, over 2,200 R-Forge packages and more than 76,000 GitHub packages, making it possible to search for packages, functions, documentation and source code in one place. Rdocumentation, powered by DataCamp, covers 32,130 packages across CRAN and Bioconductor and offers a searchable interface for function-level documentation. The Journal of Statistical Software adds a scholarly dimension, publishing open-access articles on statistical computing software together with source code, with full reproducibility mandatory for publication. R-bloggers aggregates R news and tutorials contributed by hundreds of R bloggers, while R Weekly curates a community digest and an accompanying podcast, both helping users keep pace with the steady flow of tutorials, package releases, blog posts and developments across the R world.

Where to Begin

For beginners, one recurring challenge is knowing where to start, and different learning routes reflect different backgrounds. Datamentor points learners towards step-by-step tutorials covering popular topics such as R operators, if...else statements, data frames, lists and histograms, progressing through to more advanced material. R for the Rest of Us offers a staged path through three core courses, Getting Started With R, Fundamentals of R and Going Deeper with R, and extends into nine topics courses covering Git and GitHub, making beautiful tables, mapping, graphics, data cleaning, inferential statistics, package development, reproducibility and interactive dashboards with {Shiny}. The site is explicitly designed for people who may never have coded before and also offers the structured R in 3 Months programme alongside training and consulting. RStudio Education (now part of Posit) outlines six distinct ways to begin learning R, covering installation, a free introductory webinar on tidy statistics, the book R for Data Science, browser-based primers, and further options suited to different learning styles, along with guidance on R Markdown and good project practices.

Despite the variety, the underlying advice is consistent: start by learning the basics well enough to read and write simple code, practise regularly beginning with straightforward exercises and gradually take on more complex tasks, then build projects that matter to you because projects create context and make concepts stick. There is no suggestion that mastery comes from passively reading documentation alone, as practical engagement is treated as essential throughout. The blog Stats and R exemplifies this philosophy well, with the stated aim of making statistics accessible to everyone by sharing, explaining and illustrating statistical concepts and, where appropriate, applying them in R.

That practical engagement can take many forms. Someone interested in data journalism may focus on visualisation and reproducible reporting, while a researcher may prioritise statistical modelling and publishing workflows, and a health analyst may use R for quality assurance, open health data and clinical reporting. Others may work with {Shiny}, package development, machine learning, Git and GitHub or interactive dashboards. The variety shows that R is not confined to a single use case, even if statistics and data science remain the common thread.

Free Learning Resources for R

It is also worth noting that R learning is supported by a great deal of freely available material. Statistics Globe, founded in 2017 by Joachim Schork and now an education and consulting platform, offers more than 3,000 free tutorials and over 1,000 video tutorials on YouTube, spanning R programming, Python and statistical methodology. STHDA (Statistical Tools for High-Throughput Data Analysis) covers basics, data import and export, reshaping, manipulation and visualisation, with material geared towards practical data analysis at every level. Community sites, webinar repositories and newsletters add further layers of accessibility, and even where paid courses exist, the surrounding free ecosystem is substantial.

Taken together, these sources present R as far more than a niche programming language. It is a mature open-source environment with a strong statistical heritage, a practical orientation towards data work and a well-developed community of learners, teachers, developers and organisations. Its core concepts are approachable enough for beginners, yet its package ecosystem and publishing culture support highly specialised and advanced work. For anyone looking to enter data analysis, statistics, visualisation or related areas, R offers a route that begins with simple code and can extend into large-scale analytical workflows.

Moving from 32-Bit to 64-Bit SAS on Windows

15^th March 2026

Moving from 32-bit SAS on Microsoft Windows to a 64-bit environment can look deceptively straightforward from the outside. The operating system is still Windows, programmes often run without alteration, and many data sets open just as expected. Beneath that continuity, however, sit several technical differences that matter considerably in practice, especially for organisations with long-lived code, established format libraries and regular exchanges with Microsoft Office files.

What makes this transition particularly awkward is that SAS treats some of these changes as more than a simple in-place upgrade. As Jacques Thibault notes in his PharmaSUG 2012 paper, a new operating system will often be accompanied by a new version of surrounding applications, and what matters most is ensuring sufficient time and resources to fully test existing programmes under the new environment before committing to the change. SAS file types are not uniformly portable across the 32-bit to 64-bit boundary, and support behaviour also differs by SAS release, with SAS 9.3 marking the point at which some earlier friction was meaningfully reduced. As of 2025, the current release of the SAS 9 line is SAS 9.4 Maintenance 9 (M9), and organisations running any SAS 9.4 release benefit from the data-set interoperability improvements first introduced in SAS 9.3, whilst the catalog and Office-integration issues described in this article remain relevant across all SAS 9.x environments.

Data Sets and Catalogs: A Fundamental Distinction

The broadest distinction is between SAS data sets and SAS catalogs. Data sets are generally more forgiving, while catalogs are not. SAS Usage Note 38339 explains that when upgrading from 32-bit to 64-bit Windows SAS in releases earlier than SAS 9.3, Cross-Environment Data Access (CEDA) is invoked to access 32-bit SAS data sets. CEDA allows the file to be read without immediate conversion, though it can impose restrictions and may reduce performance. The same note states directly that 64-bit SAS provides no access to 32-bit catalogs at all.

That distinction sits at the centre of most migration problems, and it is the reason a move that feels routine can catch teams off guard when they first encounter the ERROR: CATALOG was created for a different operating system message. As Chris Hemedinger explains in a post on The SAS Dummy, the move from 32-bit SAS for Windows to 64-bit SAS for Windows is, for all intents and purposes, a platform change from SAS's perspective, even though only the bit architecture has changed, and SAS catalogs are not portable across platforms.

How SAS Handles Data Sets Across the Boundary

For data sets, the picture is comparatively manageable. If a 32-bit SAS data set is opened in a 64-bit SAS session in releases before SAS 9.3, SAS writes a note to the log stating that the file is native to another host or that its encoding differs from the current session encoding, and that Cross-Environment Data Access will be used, which might require additional CPU resources and might reduce performance. This is SAS performing translation work in the background, and whilst useful for continued access, it is not always ideal for regular production use.

There is an important nuance that changes things significantly with SAS 9.3. In 32-bit SAS on Windows, the data representation is WINDOWS_32, whilst in 64-bit SAS on Windows it is WINDOWS_64. Hemedinger notes that in SAS 9.3 the developers taught SAS for Windows to bypass the CEDA layer when the only encoding difference is WINDOWS_32 versus WINDOWS_64. SAS Knowledge Base article 38379 confirms this, stating that from SAS 9.3 onwards, Windows 32-bit data sets can be read, written and updated in Windows 64-bit SAS, and vice versa, as a result of a change in how SAS determines file compatibility at open time. Users on SAS 9.3 and later, including all SAS 9.4 maintenance releases, may therefore see fewer warnings and less friction with ordinary data sets originating in 32-bit Windows SAS.

Converting Data Sets to Native 64-Bit Format

Even with those SAS 9.3 improvements, many organisations prefer to convert files into the native 64-bit format rather than rely indefinitely on cross-environment access. For entire libraries, PROC MIGRATE is the recommended mechanism. SAS Usage Note 38339 notes that for releases preceding SAS 9.3, PROC MIGRATE can migrate 32-bit SAS data sets to 64-bit, changing their format so that CEDA is no longer required.

The advantages of PROC MIGRATE over the older conversion procedures are set out in detail by Diane Olson and David Wiehle of SAS Institute in their paper hosted by the University of Delaware. Unlike PROC COPY, PROC MIGRATE retains deleted observations, migrates audit trails, preserves all integrity constraints and automatically retains created and last-modified date/times, compression, encryption, indexes and passwords from the source library. It is designed to produce members in the target library that differ from the source only in being in the new SAS format.

When the task concerns individual SAS data files rather than a whole library, SAS Usage Note 38339 points to PROC COPY with the NOCLONE option. Used in a 64-bit SAS session, this copies a 32-bit Windows data set into a new file that is native to the 64-bit environment. The NOCLONE option prevents SAS from cloning the original data representation during the copy, so that the resulting file is written in the target environment's native format and CEDA is no longer needed to process it. Thibault's PharmaSUG paper illustrates this with an example using PROC COPY with the NOCLONE option together with an OUTREP setting on the target LIBNAME statement to force creation in the desired representation.

Catalogs: The Hard Problem

Catalogs are a different matter entirely. If a user running 64-bit SAS attempts to open a catalog created in a 32-bit SAS session, the familiar error appears: ERROR: CATALOG was created for a different operating system. In the case of format catalogs, a related message often reads ERROR: File LIBRARY.FORMATS.CATALOG was created for a different operating system, and this is frequently followed by failures to use user-defined formats attached to variables. As the SSCC guidance from the University of Wisconsin-Madison notes, this can prevent 64-bit SAS from reading the data set at all, with the error about formats recorded only in the log whilst the visible symptom is simply that the table did not open.

This matters because catalogs are machine-dependent. User-defined formats created by PROC FORMAT are usually stored in catalogs, often in a member named FORMATS. If those formats were built in 32-bit SAS, 64-bit SAS cannot use the catalog directly, and this affects not only explicit formatting in code but also routine data viewing because a data set linked to permanent user-defined formats may fail to display properly unless the associated format catalog is converted.

Options for Migrating Format Catalogs

There are several ways to address catalog incompatibility. If the original PROC FORMAT source code still exists, the cleanest option is simply to rerun it under 64-bit SAS, producing a fresh native catalog. The SSCC guidance treats this as the easiest solution that preserves the formats themselves, and it also describes a short-term workaround: adding a bare format female; statement to the DATA or PROC step, which removes the custom format from that variable, so there is no need to read the problem catalog file at all.

When source code is not available, transport-based conversion is the answer. In a 32-bit SAS session, PROC CPORT creates a transport file from the catalog library, and in a 64-bit SAS session, PROC CIMPORT recreates the catalog in the new environment. SAS Knowledge Base article KB0041614 provides sample code that creates a transport file in 32-bit SAS using proc cport lib=my32 file=trans memtype=catalog; select formats; and then unloads it in 64-bit SAS using PROC CIMPORT, after which a new Formats.sas7bcat file should be present in the target library. The same article notes that if access to a 32-bit SAS session is simply not available, the system option NOFMTERR can be submitted as a last resort: this allows the underlying data values to be displayed whilst user-defined formats are ignored, avoiding the error without converting the catalog.

A more robust route for user-defined formats is to avoid moving the catalog as a catalog at all. PROC FORMAT can write format definitions to a standard SAS data set using CNTLOUT, and later rebuild them from that data set using CNTLIN. Because SAS data sets are generally portable across the 32-bit to 64-bit boundary, this method sidesteps the catalog incompatibility directly. KB0041614 describes CNTLOUT/CNTLIN as the most robust method available for migrating user-defined format libraries. Karin LaPann, writing in a poster presented at a meeting of the Philadelphia Area SAS Users Group, reaches the same conclusion and recommends always creating data sets from format catalogs and storing them alongside the data in the same library as a matter of good practice.

Caveats: Item Stores, Compiled Macros and the PIPE Engine

SAS Usage Note 38339 explicitly states that stored compiled macro catalogs are not supported by PROC CPORT and must be recompiled in the new operating environment, with SAS Note 46846 covering compatibility guidance for those files specifically. The note also warns that the 32-bit version of SAS should not be removed until it can be verified that all 32-bit catalogs have been successfully migrated.

Thibault's PharmaSUG paper identifies two further file types that require attention. SAS Item Store files (.sas7bitm), which organisations may use to store standard PROC TEMPLATE output templates, are not compatible across 32-bit and 64-bit environments, and the practical solution is to recreate them under the new environment using the same programme that created them originally, targeting a different output directory to avoid a mixed 32-bit and 64-bit directory. Thibault also notes that programmes using the PIPE engine may produce errors on Windows 64-bit environments, and recommends replacing such code with newer SAS functions such as filename, dopen and dread to avoid the issue altogether. These are not universal blockers, but they underline why testing is essential rather than assumed.

Microsoft Office Integration After the Move

Another area where 64-bit moves catch users out is access to Microsoft Excel and Access files. The issue is not SAS data compatibility but the bit-ness of the Microsoft data providers. In 64-bit SAS for Windows, attempts to use PROC IMPORT with DBMS=EXCEL, PROC EXPORT with Excel or Access options, or LIBNAME EXCEL can fail with errors such as ERROR: Connect: Class not registered or Connection Failed. As Hemedinger explains, the cause is that the 64-bit SAS process cannot use the built-in data providers for Microsoft Excel or Microsoft Access, which are usually 32-bit modules. Thibault's paper confirms that installation of the PC Files Server on the same machine will be required, since the required 32-bit ODBC drivers are incompatible with 64-bit SAS on Windows.

The workarounds depend on the file type and local setup. SAS/ACCESS to PC Files provides methods such as DBMS=EXCELCS, DBMS=ACCESSCS and LIBNAME PCFILES, all of which use the PC Files Server as an intermediary, with an autostart feature that minimises configuration changes to existing SAS programmes. For .xlsx files, DBMS=XLSX removes the Microsoft data providers from the equation entirely and requires no additional setup from SAS 9.3 Maintenance 1 onwards. Installing 64-bit Microsoft Office may appear to solve the bit-ness mismatch by supplying 64-bit providers, but as Hemedinger cautions, Microsoft recommends the 64-bit version of Office in only a few circumstances, and that route can introduce other incompatibilities with how Office applications are used.

Identifying 32-Bit Catalogs in a Mixed Environment

In mixed environments, a practical challenge is identifying which catalogs are still 32-bit and which are already 64-bit. This was precisely the problem Michael Raithel posed on LinkedIn in March 2015, after finding that no SAS facility, whether PROC CATALOG, PROC CONTENTS, PROC DATASETS or the Dictionary Tables, provided a direct way to distinguish them. His solution treats the .sas7bcat file as a flat file rather than a catalog, reading the first record and searching for the character strings W32_7PRO (identifying a 32-bit catalog) and X64_7PRO (identifying a 64-bit catalog). The macro he developed can be run against any number of catalogs and builds a SAS data set recording the bit-ness and full path of each file, making large-scale inventory automation entirely practical during a phased transition.

For broader validation work, the Olson and Wiehle paper pairs PROC MIGRATE with macros based on PROC CONTENTS, PROC DATASETS and PROC COMPARE, documenting what existed in the source library before migration and verifying what exists in the target library afterwards. For highly regulated or large-scale environments, that kind of structured checking is not optional.

Navigating the Transition Without Unnecessary Disruption

The main lesson from all of this is that moving from 32-bit to 64-bit SAS on Windows is not simply a matter of reinstalling software and carrying on unchanged. Much will work as before, particularly with ordinary data sets and particularly in SAS 9.3 and later. Catalogs, format libraries, item stores and Microsoft Office integration, however, require deliberate attention.

The transition is not so much problematic as predictable. Keeping 32-bit SAS available until catalog migration is confirmed, using PROC MIGRATE for full libraries, using PROC COPY with NOCLONE for individual data sets, converting format catalogs via CPORT/CIMPORT or CNTLOUT/CNTLIN, recreating item stores and compiled macros in the new environment and testing Office-related workflows and PIPE based code before deployment together form a sound path through the process. With that preparation in place, the advantages of a 64-bit environment can be gained without avoidable disruption.

Security is a behaviour, not a tick-box

11^th February 2026

Cybersecurity is often discussed in terms of controls and compliance, yet most security failures begin and end with human action. A growing body of practice now places behaviour at the centre, drawing on psychology, neuroscience, history and economics to help people replace old habits with new ones. George Finney's Well Aware Security have built its entire approach around this idea, reframing awareness training as a driver of measurable outcomes rather than a box-ticking exercise, with coaches helping colleagues identify and build upon their existing strengths. It is also personal by design, using insights about how minds work to guide change one habit at a time rather than expecting wholesale transformation overnight.

This emphasis on behaviour is not a dismissal of technical skill so much as a reminder that skill alone is insufficient. Security is not a competency you either possess or lack; it is a behaviour that can be learned, reinforced and normalised. As social beings, we have always gathered for mutual protection, meaning the desire to contribute to collective security is already present in most people. Turning that impulse into daily action requires structure and patience, and it thrives when a supportive culture takes root.

Culture matters because norms are powerful. In a team where speed and convenience consistently override prudence, individuals who try to do the right thing can feel isolated. Conversely, when an organisation embraces cybersecurity at every level, a small group can create sufficient leverage to shift practices across the whole business. Research has found that organisations with below-average culture ratings are significantly more likely to experience a data breach than their peers, and controls alone cannot close that gap when behaviours are pulling in the opposite direction.

This is why advocates of habit-based security speak of changing one step at a time, celebrating progress and maintaining momentum. The same thinking underpins practical measures at home and at work, where small changes in how devices and data are managed can reduce risk materially without making technology difficult to use.

Network-Wide Blocking with Pi-hole

One concrete example of this approach is network-wide blocking of advertising and tracking domains using a DNS sinkhole. Pi-hole has become popular because it protects all devices on a network without requiring any client-side software to be installed on each one. It runs lightly on Linux, blocks content outside the browser (such as within mobile apps and smart TVs) and can optionally act as a DHCP server so that new devices are protected automatically upon joining the network.

Pi-hole's web dashboard surfaces insights into DNS queries and blocked domains, while a command-line interface and an API offer further control for those who need it. It caches DNS responses to speed up everyday browsing, supports both IPv4 and IPv6, and scales from small households to environments handling very high query volumes. The project is free and open source, sustained by donations and volunteer effort.

Choosing What to Block

Selecting what to block is a point at which behaviour and technology intersect. It is tempting to load every available blocklist in the hope of maximum protection, but as Avoid the Hack notes in its detailed guide to Pi-hole blocklists, more is not always better. Many lists draw from common sources, so stacking them can add redundancy without improving coverage and may increase false positives (instances where legitimate sites are mistakenly blocked).

The most effective approach begins by considering what you want to block and why, then balancing that against the requirements of your devices and services. Blocking every Microsoft domain, for instance, could disrupt operating system updates or break websites that rely on Azure. Likewise, blacklisting all domains belonging to a streaming or gaming platform may render apps unusable. Aggressive configurations are possible, but they work best when paired with careful allow-listing of domains essential to your services. Allow lists require ongoing upkeep as services move or change, so they are not a one-off exercise.

Recommended Blocklists

A practical starting point is the well-maintained Steven Black unified hosts file, which consolidates several reputable sources and many users find sufficient straight away. From there, curated collections help tailor coverage further. EasyList provides a widely trusted foundation for blocking advertising and integrates with browser extensions such as uBlock Origin, while its companion list EasyPrivacy can add stronger tracking protection at the cost of occasional breakage on certain sites.

Hagezi maintains a comprehensive set of DNS blocklists, including "multi" variants of different sizes and aggression levels, built from numerous sources. Selecting one of the multi variants is usually preferable to layering many individual category lists, which can reintroduce the overlap you were trying to avoid. Firebog organises its lists by risk: green entries carry a lower risk of causing breakage, while blue entries are more aggressive, giving you the option to mix and match according to your comfort level.

Some projects bundle many sources into a single combination list. OISD is one such option, with its Basic variant focusing mainly on advertisements, Full extending to malware, scams, phishing, telemetry and tracking, and a separate NSFW set covering adult content. OISD updates roughly every 24 hours and is comprehensive enough that many users would not need additional lists. The trade-off is placing a significant degree of trust in a single maintainer and limiting the ability to assign different rule sets to different device groups within Pi-hole, so it is worth weighing convenience against flexibility before committing.

The Blocklist Project organises themed lists covering advertising, tracking, malware, phishing, fraud and social media domains, and these work with both Pi-hole and AdGuard Home. The project completed a full rebuild of its underlying infrastructure, replacing an inconsistent mix of scripts with a properly tested Python pipeline, automated validation on pull requests and a cleaner build process.

Existing list URLs are unchanged, so anyone already using the project's lists need not reconfigure anything. That said, the broader principle holds regardless of which project you use: blocklists can become outdated if not actively maintained, reducing their effectiveness over time.

Using Regular Expressions

For more granular control, Pi-hole supports regular expressions to match domain patterns. Regex entries are powerful and can be applied both to block and to allow traffic, but they reward specificity. Broad patterns risk false positives that break legitimate services, so community-maintained regex recommendations are a safer starting point than writing everything from scratch. Pi-hole's own documentation explains how expressions are evaluated in detail. Used judiciously, regex rules extend what list-based blocking can achieve without turning maintenance into an ongoing burden.

Installing Pi-hole

Installation is straightforward. Pi-hole can be deployed in a Linux container or directly on a supported operating system using an automated installer that asks a handful of questions and configures everything in under ten minutes. Once running, you point clients to use it as their DNS resolver, either by setting DHCP options on your router, so devices adopt it automatically, or by updating network settings on each device individually. Pairing Pi-hole with a VPN extends ad blocking to mobile devices when away from home, so limited data plans go further by not downloading unwanted content. Day-to-day management is handled via the web interface, where you can add domains to block or allow lists, review query logs, view long-term statistics and audit entries, with privacy modes that can be tuned to your environment.

Device-Level Adjustments

Network filtering is one layer in a defence-in-depth approach, and a few small device-level changes can reduce friction without sacrificing safety. Bitdefender's Safepay, for example, is designed to isolate banking and shopping sessions within a hardened browser environment. If its prompts become intrusive, you can turn off notifications by opening the Bitdefender interface, selecting Privacy, then Safepay settings, and toggling off both Safepay notifications and the option to use a VPN with Safepay. Bookmarked sites can still auto-launch Safepay unless you also disable the automatic-opening option. Even with notifications suppressed, you can start Safepay manually from the dashboard whenever you want the additional protection.

On Windows, unwanted prompts from Microsoft Edge about setting it as the default browser can be handled without resorting to arcane steps. The Windows Club covers the full range of methods available. Dismissing the banner by clicking "Not now" several times usually prevents it from reappearing, though a browser update or reset may bring the message back. Advanced users can disable the recommendations via edge://flags, or apply a registry policy under HKEY_CURRENT_USERSoftwarePoliciesMicrosoftEdge by setting DefaultBrowserSettingEnabled to 0. In older environments such as Windows 7, a Group Policy setting exists to stop Edge checking whether it is the default browser. These changes should be made with care, particularly in managed environments where administrators enforce default application associations across the estate.

Knowing What Your Devices Reveal

Awareness also begins with understanding what your devices reveal to the wider internet. Services like WhatIsMyIP.com display your public IP address, the approximate location derived from it and your internet service provider. For most home users, a public IP address is dynamic rather than fixed, meaning it can change when a router restarts or when an ISP reallocates addresses; on mobile networks it may change more frequently still as devices move between towers and routing systems.

Such tools also provide lookups for DNS and WHOIS information, and they explain the difference between public and private addressing. Complementary checks from WhatIsMyBrowser.com summarise your browser version, whether JavaScript and cookies are enabled, and whether known trackers or ad blockers are detected. Sharing that information with support teams can make troubleshooting considerably faster, since it quickly narrows down where problems are likely to sit.

Protecting Your Accounts

Checking for Breached Credentials

Account security is another area where habits do most of the heavy lifting. Checking whether your email address appears in known data breaches via Have I Been Pwned helps you decide when to change passwords or enable stronger protections. The service, created by security researcher Troy Hunt, tracks close to a thousand breached websites and over 17.5 billion compromised accounts, and offers notifications as well as a searchable dataset. Finding your address in a breach does not mean your account has been taken over, but it does mean you should avoid reusing passwords and should enable two-factor authentication wherever it is available.

Two-Factor Authentication

Authenticator apps provide time-based codes that attackers cannot guess, even when armed with a reused password. Aegis Authenticator is a free, open-source option for Android that stores your tokens in an encrypted vault with optional biometric unlock. It offers a clean interface with multiple themes, supports icons for quick identification and allows import and export from a wide range of other apps. Backups can be automatic, and you remain in full control, since the app works entirely offline without advertisements or tracking.

For users who prefer cloud backup and multi-device synchronisation, Authy from Twilio offers a popular alternative that pairs straightforward setup with secure backup and support for using tokens across more than one device. Both approaches strengthen accounts significantly, and the choice often comes down to whether you value local control above all else or prefer the convenience of synchronisation.

Password Management

Strong, unique passwords remain essential even alongside two-factor authentication. KeePassXC is a cross-platform password manager for Windows, macOS and Linux that keeps your credentials in an encrypted database stored wherever you choose, rather than on a vendor's servers. It is free and open source under the GPLv3 licence, and its development process is publicly visible on GitHub.

The project has undergone rigorous external scrutiny. On the 17th of November 2025, KeePassXC version 2.7.9 was awarded a Security Visa by the French National Cybersecurity Agency (ANSSI) under its First-level Security Certification (CSPN) programme, with report number ANSSI-CSPN-2025/16. The certification is valid for three years and is recognised in France and by the German Federal Office for Information Security. More recent releases such as version 2.7.11 focus on bug fixes and usability improvements, including import enhancements, better password-generation feedback and refinements to browser integration. Because data are stored locally, you can place the database in a private or shared cloud folder if you wish to sync between devices, while encryption remains entirely under your control.

Secure Email with Tuta

Email is a frequent target for attackers and a common source of data leakage, so the choice of client can make a meaningful difference. Tuta provides open-source desktop applications for Linux, Windows and macOS that bring its end-to-end encrypted mail and calendar to the desktop with features that go beyond the web interface. The clients are signed so that updates can be verified independently, and Tuta publishes its public key, so users can confirm signatures themselves.

There is a particular focus on Linux, with support for major distributions including Ubuntu, Debian, Fedora, Arch Linux, openSUSE and Linux Mint. Deep operating-system integration enables conveniences such as opening files as attachments directly from context menus on Windows via MAPI, setting Tuta as the default mail handler, using the system's secret storage and applying multi-language spell-checking. Hardware key support via U2F is available across all desktop clients, and offline mode means previously indexed emails, calendars and contacts remain accessible without an internet connection.

Tuta does not support IMAP because downloading and storing messages unencrypted on devices would undermine its end-to-end encryption model. Instead, features such as import and export are built directly into the clients; paid plans including Legend and Unlimited currently include email import that encrypts messages locally before uploading them. The applications are built on Electron to maintain feature parity across platforms, and Tuta offers the desktop clients free to all users to ensure that core security benefits are not gated behind a subscription.

Bringing Culture and Tooling Together

These individual strands reinforce one another when combined. A network-wide blocker reduces exposure to malvertising and tracking while nudging everyone in a household or office towards safer defaults. Small device-level settings cut noise without removing protection, which helps people maintain good habits because security becomes less intrusive. Visibility tools demystify what the internet can see and how browsers behave, which builds confidence. Password managers and authenticator apps make strong credentials and second factors the norm rather than the exception, and a secure email client protects communications by default.

None of these steps requires perfection, and each can be introduced one at a time. The key is to focus on outcomes, think like a coach and make security personal, so that habits take root and last.

There is no single fix that will stop every attack. One approach that does help is consistent behaviour supported by thoughtful choices of software and services. Start with one change that removes friction while adding protection, then build from there. Over time, those choices shape a culture in which people feel they have a genuine role in keeping themselves and their organisations safe, and the technology they rely upon reflects that commitment.

Running local Large Language Models on desktop computers and workstations with 8GB VRAM

7^th February 2026

Running large language models locally has shifted from being experimental to practical, but expectations need to match reality. A graphics card with 8 GB of VRAM can support local workflows for certain text tasks, though the results vary considerably depending on what you ask the models to do.

Understanding the Hardware Foundation

The Critical Role of VRAM

The central lesson is that VRAM is the engine of local performance on desktop systems. Whilst abundant system RAM helps avoid crashes and allows larger contexts, it cannot replace VRAM for throughput.

Models that fit in VRAM and keep most of their computation on the GPU respond promptly and maintain a steady pace. Those that overflow to system RAM or the CPU see noticeable slowdowns.

Hardware Limitations and Thresholds

On a desktop GPU with 8 GB of VRAM, this sets a practical ceiling. Models in the 7 billion to 14 billion parameter range fit comfortably enough to exploit GPU acceleration for typical contexts.

Much larger models tend to offload a significant portion of the work to the CPU. This shows up as pauses, lower token rates and lag when prompts become longer.

Monitoring GPU Utilisation

GPU utilisation is a reliable way to gauge whether a setup is efficient. When the GPU is consistently busy, generation is snappy and interactive use feels smooth.

A model like llama3.1:8b can run almost entirely on the GPU at a context length of 4,096 tokens. This translates into sustained responsiveness even with multi-paragraph prompts.

Model Selection and Performance

Choosing the Right Model Size

A frequent instinct is to reach for the largest model available, but size does not equal usefulness when running locally on a desktop or workstation. In practice, models between 7B and 14B parameters are what you can run on this class of hardware, though what they do well is more limited than benchmark scores might suggest.

What These Models Actually Do Well

Models in this range handle certain tasks competently. They can compress and reorganise information, expand brief notes into fuller text, and maintain a reasonably consistent voice across a draft. For straightforward summarisation of documents, reformatting content or generating variations on existing text, they perform adequately.

Where things become less reliable is with tasks that demand precision or structured output. Coding tasks illustrate this gap between benchmarks and practical use. Whilst llama3.1:8b scores 72.6% on the HumanEval benchmark (which tests basic algorithm problems), real-world coding tasks can expose deeper limitations.

Commit message generation, code documentation and anything requiring consistent formatting produce variable results. One attempt might give you exactly what you need, whilst the next produces verbose or poorly structured output.

The gap between solving algorithmic problems and producing well-formatted, professional code output is significant. This inconsistency is why larger local models like gpt-oss-20b (which requires around 16GB of memory) remain worth the wait despite being slower, even when the 8GB models respond more quickly.

Recommended Models for Different Tasks

Llama3.1:8b handles general drafting reasonably well and produces flowing output, though it can be verbose. Benchmark scores place it above average for its size, but real-world use reveals it is better suited to free-form writing than structured tasks.

Phi3:medium is positioned as stronger on reasoning and structured output. In practice, it can maintain logical order better than some alternatives, though the official documentation acknowledges quality of service limitations, particularly for anything beyond standard American English. User reports also indicate significant knowledge gaps and over-censorship that can affect practical use.

Gemma3 at 12B parameters produces polished prose and smooths rough drafts effectively when properly quantised. The Gemma 3 family offers models from 1B to 27B parameters with 128K context windows and multimodal capabilities in the larger sizes, though for 8GB VRAM systems you are limited to the 12B variant with quantisation. Google also offers Gemma 3n, which uses an efficient MatFormer architecture and Per-Layer Embedding to run on even more constrained hardware. These are primarily optimised for mobile and edge devices rather than desktop use.

Very large models remain less efficient on desktop hardware with 8 GB VRAM. Attempting to run them results in heavy CPU offloading, and the performance penalty can outweigh any quality improvement.

Memory Management and Configuration

Managing Context Length

Context length sits alongside model size as a decisive lever. Every extra token of context demands memory, so doubling the window is not a neutral choice.

At around 4,096 tokens, most of the well-matched models stay predominantly on the GPU and hold their speed. Push to 8,192 or beyond, and the memory footprint swells to the point where more of the computation ends up taking place on the CPU and in system RAM.

Ollama's Keep-Alive Feature

Ollama keeps already loaded models resident in VRAM for a short period after use so that the next call does not pay the penalty of a full reload. This is expected behaviour and is governed by a keep_alive parameter that can be adjusted to hold the model for longer if a burst of work is coming, or to release it sooner when conserving VRAM matters.

Practical Memory Strategies

Breaking long jobs into a series of smaller, well-scoped steps helps both speed and stability without constraining the quality of the end result. When writing an article, for instance, it can be more effective to work section by section rather than asking for the entire piece in one pass.

Optimising the Workflow

The Benefits of Streaming

Streaming changes the way output is experienced, rather than the content that is ultimately produced. Instead of waiting for a block of text to arrive all at once, words appear progressively in the terminal or application. This makes longer pieces easier to manage and revise on the fly.

Task-Specific Model Selection

Because each model has distinct strengths and weaknesses, matching the tool to the task matters. A fast, GPU-friendly model like llama3.1:8b works for general writing and quick drafting where perfect accuracy is not critical. Phi3:medium may handle structured content better, though it is worth testing against your specific use case rather than assuming it will deliver.

Understanding Limitations

It is important to be clear about what local models in this size range struggle with. They are weak at verifying facts, maintaining strict factual accuracy over extended passages, and providing up-to-date knowledge from external sources.

They also perform inconsistently on tasks requiring precise structure. Whilst they may pass coding benchmarks that test algorithmic problem-solving, practical coding tasks such as writing commit messages, generating consistent documentation or maintaining formatting standards can reveal deeper limitations. For these tasks, you may find yourself returning to larger local models despite preferring the speed of smaller ones.

Integration and Automation

Using Ollama's Python Interface

Ollama integrates into automated workflows on desktop systems. Its Python package allows calls from scripts to automate summarisation, article generation and polishing runs, with streaming enabled so that logs or interfaces can display progress as it happens. Parameters can be set to control context size, temperature and other behavioural settings, which helps maintain consistency across batches.

Building Production Pipelines

The same interface can be linked into website pipelines or content management tooling, making it straightforward to build a system that takes notes or outlines, expands them, revises the results and hands them off for publication, all locally on your workstation. The same keep_alive behaviour that aids interactive use also benefits automation, since frequently used models can remain in VRAM between steps to reduce start-up delays.

Recommended Configuration

Optimal Settings for 8 GB VRAM

For a desktop GPU with 8 GB of VRAM, an optimal configuration builds around models that remain GPU-efficient whilst delivering acceptable results for your specific tasks. Llama3.1:8b, phi3:medium and gemma3:12b are the models that fit this constraint when properly quantised, though you should test them against your actual workflows rather than relying on general recommendations.

Performance Monitoring

Keeping context windows around 4,096 tokens helps sustain GPU-heavy operation and consistent speeds, whilst streaming smooths the experience during longer outputs. Monitoring GPU utilisation provides an early warning if a job is drifting into a configuration that will trigger CPU fallbacks.

Planning for Resource Constraints

If a task does require more memory, it is worth planning for the associated slowdown rather than assuming that increasing system RAM or accepting a bigger model will compensate for the VRAM limit. Tuning keep_alive to the rhythm of work reduces the frequency of reloads during sessions and helps maintain responsiveness when running sequences of related prompts.

A Practical Content Creation Workflow

Multi-Stage Processing

This configuration supports a division of labour in content creation on desktop systems. You start with a compact model for rapid drafting, switch to a reasoning-focused option for structured expansions if needed, then finish with a model known for adding polish to refine tone and fluency. Insert verification steps between stages to confirm facts, dates and citations before moving on.

Because each stage is local, revisions maintain privacy, with minimal friction between idea and execution. When integrated with automation via Ollama's Python tools, the same pattern can run unattended for batches of articles or summaries, with human review focused on accuracy and editorial style.

In Summary

Desktop PCs and workstations with 8 GB of VRAM can support local LLM workflows for specific tasks, though you need realistic expectations about what these models can and cannot do reliably. They handle basic text generation and reformatting, though prone to hallucinations and misunderstandings. They struggle with precision tasks, structured output and anything requiring consistent formatting. Whilst they may score well on coding benchmarks that test algorithmic problem-solving, practical coding tasks can reveal deeper limitations.

The key is to select models that fit the VRAM envelope, keep context lengths within GPU-friendly bounds, and test them against your actual use cases. For tasks where local models prove inconsistent, such as generating commit messages or producing reliably structured output, larger local models like gpt-oss-20b (which requires around 16GB of memory) may still be worth the wait despite being slower. Local LLMs work best when you understand their limitations and use them for what they genuinely do well, rather than expecting them to replace more capable models across all tasks.

Additional Resources

Ollama Official Documentation
Ollama GitHub Repository
Hugging Face model hub
LM Studio for local model management
Jan AI for local AI deployment
Meta Llama 3.1 Model Card
Microsoft Phi-3 Documentation
Google Gemma 3 Overview
Google Gemma 3n Overview
Artificial Analysis for model benchmarks

Four technical portals that still deliver after decades online

3^rd February 2026

The early internet was built on a different kind of knowledge sharing, one driven by individual expertise, community generosity and the simple desire to document what worked. Four informative websites that started in that era, namely MDN Web Docs, AskApache, WindowsBBS and Office Watch, embody that spirit and remain valuable today. They emerged at a time when technical knowledge was shared through forums, documentation and personal blogs rather than social media or algorithm-driven platforms, and their legacy persists in offering clarity and depth in an increasingly fragmented digital landscape.

MDN Web Docs

MDN Web Docs stands as a cornerstone of modern web development, offering comprehensive coverage of HTML, CSS, JavaScript and Web APIs alongside authoritative references for browser compatibility. Mozilla started the project in 2005 under the name Mozilla Developer Centre, and it has since grown into a collaborative effort of considerable scale. In 2017, Mozilla announced a formal partnership with Google, Microsoft, Samsung and the W3C to consolidate web documentation on a single platform, with Microsoft alone redirecting over 7,700 of its MSDN pages to MDN in that year.

For developers, the site is not merely a reference tool but a canonical guide that ensures standards are adhered to and best practices followed. Its tutorials, guides and learning paths make it indispensable for beginners and seasoned professionals alike. The site's community-driven updates and ongoing contributions from browser vendors have cemented its reputation as the primary source for anyone building for the web.

AskApache

AskApache is a niche but invaluable resource for those managing Apache web servers, built by a developer whose background lies in network security and penetration testing on shared hosting environments. The site grew out of the founder's detailed study of .htaccess files, which, unlike the main Apache configuration file httpd.conf, are read on every request and offer fine-grained, per-directory control without requiring root access to the server. That practical origin gives the content its distinctive character: these are not generic tutorials, but hard-won techniques born from real-world constraints.

The site's guides on blocking malicious bots, configuring caching headers, managing redirects with mod_rewrite and preventing hot-linking are frequently cited by system administrators and WordPress users. Its specificity and longevity have made it a trusted companion for those maintaining complex server environments, covering territory that mainstream documentation rarely touches.

WindowsBBS

WindowsBBS offers a clear window into the era when online forums were the primary hub for technical support. Operating in the tradition of classic bulletin board systems, the site has long been a resource for users troubleshooting Windows installations, hardware compatibility issues and malware removal. It remains completely free, sustained by advertisers and community donations, which reflects the ethos of mutual aid that defined early internet culture.

During the Windows XP and Windows 7 eras, community forums of this kind were essential for solving problems that official documentation often overlooked, with volunteers providing detailed answers to questions that Microsoft's own support channels would not address. While the rise of social media and centralised support platforms has reduced the prominence of such forums, WindowsBBS remains a testament to the power of community-driven problem-solving. Its straightforward structure, with users posting questions and experienced volunteers providing answers, mirrors the collaborative spirit that made the early web such a productive environment.

Office Watch

Office Watch has served as an independent source of Microsoft Office news, tips and analysis since 1996, making it one of the longer-running specialist publications of its kind. Its focus on Microsoft Office takes in advanced features and hidden tools that are seldom documented elsewhere, from lesser-known functions in Excel to detailed comparisons between Office versions and frank assessments of Microsoft's product decisions. That independence gives it a voice that official resources cannot replicate.

The site serves power users seeking to make the most of the software they use every day, with guides and books that extend its reach beyond the website itself. In an era where software updates are frequent and often poorly explained, Office Watch provides the kind of context and plain-spoken clarity that official documentation rarely offers.

The Enduring Value of Depth and Community

These four sites share a common thread: they emerged when technical knowledge was shared openly by experts and enthusiasts rather than filtered through algorithms or paywalls, and they retain the value that comes from that approach. Their continued relevance speaks to what depth, specificity and community can achieve in the digital world. While platforms such as Stack Overflow and GitHub Discussions have taken over many of the roles these sites once played, the original resources remain useful for their historical context and the quality of their accumulated content.

As the internet continues to evolve, the lessons from these sites are worth remembering. The most useful knowledge is often found at the margins, where dedicated individuals take the time to document, explain and share what they have learned. Whether you are a developer, a server administrator or an everyday Office user, these resources are more than archives: they are living repositories of expertise, built by people who cared enough to write things down properly.

Installing PowerShell on Linux Mint for some cross-platform testing

25^th November 2025

Given how well shell scripting works on Linux and my familiarity with it, the need to install PowerShell on a Linux system may seem surprising. However, this was part of some testing that I wanted to do on a machine that I controlled before moving the code to a client's system. The first step was to ensure that any prerequisites were in place:

sudo apt update sudo apt install -y wget apt-transport-https software-properties-common

After that, the next moves were to download and install the required package for instating Microsoft repository details:

wget -q https://packages.microsoft.com/config/ubuntu/24.04/packages-microsoft-prod.deb sudo dpkg -i packages-microsoft-prod.deb

Then, I could install PowerShell itself:

sudo apt update sudo apt install -y powershell

When it was in place, issuing the following command started up the extra shell for what I needed to do:

pwsh

During my investigations, I found that my local version of PowerShell was not the same as on the client's system, meaning that any code was not as portable as I might have expected, Nevertheless, it is good to have this for future reference and proves how interoperable Microsoft has needed to become.

Latest developments in the AI landscape: Consolidation, implementation and governance

22^nd November 2025

Artificial intelligence is moving through another moment of consolidation and capability gain. New ways to connect models to everyday tools now sit alongside aggressive platform plays from the largest providers, a steady cadence of model upgrades, and a more defined conversation about risk and regulation. For companies trying to turn all this into practical value, the story is becoming less about chasing the latest benchmark and more about choosing a platform, building the right connective tissue, and governing data use with care. The coming year looks set to reward those who simplify the user experience, embed AI directly into work and adopt proportionate controls rather than blanket bans.

I. Market Structure and Competitive Dynamics

Platform Consolidation and Lock-In

Enterprise AI appears to be settling into a two-platform market. Analysts describe a landscape defined more by integration and distribution than raw model capability, evoking the cloud computing wars. On one side sit Microsoft and OpenAI, on the other Google and Gemini. Recent signals include the pricing of Gemini 3 Pro at around two dollars per million tokens, which undercuts much of the market, Alphabet's share price strength, and large enterprise deals for Gemini integrated with Google's wider software suite. Google is also promoting Antigravity, an agent-first development environment with browser control, asynchronous execution and multi-agent support, an attempt to replicate the pull of VS Code within an AI-native toolchain.

The implication for buyers is higher switching costs over time. Few expect true multi-cloud parity for AI, and regional splits will remain. Guidance from industry commentators is to prioritise integration across the existing estate rather than incremental model wins, since platform choices now look like decade-long commitments. Events lined up for next year are already pointing to that platform view.

Enterprise Infrastructure Alignment

A wider shift in software development is also taking shape. Forecasts for 2026 emphasise parallel, multi-agent systems where a planning agent orchestrates a set of execution agents, and harnesses tune themselves as they learn from context. There is growing adoption of a mix-of-models approach in which expensive frontier models handle planning, and cheaper models do the bulk of execution, bringing near-frontier quality for less money and with lower latency. Team structures are changing as a result, with more value placed on people who combine product sense with engineering craft and less on narrow specialisms.

ServiceNow and Microsoft have announced a partnership to coordinate AI agents across organisations with tighter oversight and governance, an attempt to avoid the sprawl that plagued earlier automation waves. Nvidia has previewed Apollo, a set of open AI physics models intended to bring real-time fidelity to simulations used in science and industry. Albania has appointed an AI minister, which has kicked off debate about how governments should manage and oversee their own AI use. CIOs are being urged to lead on agentic AI as systems become capable of automating end-to-end workflows rather than single steps.

New companies and partnerships signal where capital and talent are heading. Jeff Bezos has returned to co-lead Project Prometheus, a start-up with $6.2 billion raised and a team of about one hundred hires from major labs, focused on AI for engineering and manufacturing in the physical world, an aim that aligns with Blue Origin interests. Vik Bajaj is named as co-CEO.

Deals underline platform consolidation. Microsoft and Nvidia are investing up to $5 billion and $10 billion respectively (totalling $15 billion) in Anthropic, whilst Anthropic has committed $30 billion in Azure capacity purchases with plans to co-design chips with Nvidia.

Commercial Model Evolution

Events and product launches continue at pace. xAI has released Grok 4.1 with an emphasis on creativity and emotional intelligence while cutting hallucinations. On the tooling front, tutorials explain how ChatGPT's desktop app can record meetings for later summarisation. In a separate interview, DeepMind's Demis Hassabis set out how Gemini 3 edges out competitors in many reasoning and multimodal benchmarks, slightly trails Claude Sonnet 4.5 in coding, and is being positioned for foundations in healthcare and education though not as a medical-grade system. Google is encouraging developers towards Antigravity for agentic workflows.

Industry leaders are also sketching commercial models that assume more agentic behaviour, with Microsoft's Satya Nadella promising a "positive-sum" vision for AI while hinting at per-agent pricing and wider access to OpenAI IP under Microsoft's arrangements.

II. Technical Implementation and Capability

Practical Connectivity Over Capability

A growing number of organisations are starting with connectors that allow a model to read and write across systems such as Gmail, Notion, calendars, CRMs, and Slack. Delivered via the Model Context Protocol, these links pull the relevant context into a single chat, so users spend less time switching windows and more time deciding what to do. Typical gains are in hours saved each week, lower error rates, and quicker responses. With a few prompts, an assistant can draft executive email summaries, populate a Notion database with leads from scattered sources, or propose CRM follow-ups while showing its working.

The cleanest path is phased: enable one connector using OAuth, trial it in read-only mode, then add simple routines for briefs, meeting preparation or weekly reports before switching on write access with a "show changes before saving" step. Enterprise controls matter here. Connectors inherit user permissions via OAuth 2.0, process data in memory, and vendors point to SOC 2, GDPR and CCPA compliance alongside allow and block lists, policy management, and audit logs. Many governance teams prefer to begin read-only and require approvals for writes.

There are limits to note, including API rate caps, sync delays, context window constraints and timeouts for long workflows. They are poor fits for classified data, considerable bulk operations or transactions that cannot tolerate latency. Some industry observers regard Claude's current MCP implementation, particularly on desktop, as the most capable of the group. Playbooks for a 30-day rollout are beginning to circulate, as are practitioner workshops introducing go-to-market teams to these patterns.

Agentic Orchestration Entering Production

Practical comparisons suggest the surrounding tooling can matter more than the raw model for building production-ready software. One report set a 15-point specification across several environments and found that Claude Code produced all features end-to-end. The same spec built with Gemini 3 inside Antigravity delivered two thirds of the features, while Sonnet 4.5 in Antigravity delivered a little more than half, with omissions around batching, progress indicators and robust error handling.

Security remains a live issue. One newsletter reports that Anthropic said state-backed Chinese hackers misused Claude to autonomously support a large cyberattack, which has intensified calls for governance. The background hum continues, from a jump in voice AI adoption to a German ruling on lyric copyright involving OpenAI, new video guidance steps in Gemini, and an experimental "world model" called Marble. Tools such as Yorph are receiving attention for building agentic data pipelines as teams look to productionise these patterns.

Tooling Maturity Defining Outcomes

In engineering practice, Google's Code Wiki brings code-aware documentation that stays in sync with repositories using Gemini, supported by diagrams and interactive chat. GitLab's latest survey suggests AI increases code creation but also pushes up demand for skilled engineers alongside compliance and human oversight. In operations, Chronosphere has added AI remediation guidance to cut observability noise and speed root-cause analysis while performance testing is shifting towards predictive, continuous assurance rather than episodic tests.

Vertical Capability Gains

While the platform picture firms up, model and product updates continue at pace. Google has drawn attention with a striking upgrade to image generation, based on Gemini 3. The system produces 4K outputs with crisp text across multiple languages and fonts, can use up to 14 reference images, preserves identity, and taps Google Search to ground data for accurate infographics.

Separately, OpenAI has broadened ChatGPT Group Chats to as many as 20 people across all pricing tiers, with privacy protections that keep group content out of a user's personal memory. Consumer advocates have used the moment to call out the risks of AI toys, citing safety, privacy and developmental concerns, even as news continues to flow from research and product teams, from the release of OLMo 3 to mobile features from Perplexity and a partnership between Stability and Warner Music Group.

Anthropic has answered with Claude Opus 4.5, which it says is the first model to break the 80 percent mark on SWE-Bench Verified while improving tool use and reasoning. Opus 4.5 is designed to orchestrate its smaller Haiku models and arrives with a price cut of roughly two thirds compared to the 4.1 release. Product changes include unlimited chat length, a Claude Code desktop app, and integrations that reach across Chrome and Excel.

OpenAI's additions have a more consumer flavour, with a Shopping Research feature in ChatGPT that produces personalised product guidance using a GPT-5 mini variant and plans for an Instant Checkout flow. In government, a new US executive order has launched the "Genesis Mission" under the Department of Energy, aiming to fuse AI capabilities across 17 national labs for advances in fields such as biotechnology and energy.

Coding tools are evolving too. OpenAI has previewed GPT-5.1-Codex-Max, which supports long-running sessions by compacting conversational history to preserve context while reducing overhead. The company reports 30 percent fewer tokens and faster performance over sessions that can run for more than a day. The tool is already available in the Codex CLI and IDE, with an API promised.

Infrastructure news out of the Middle East points to large-scale investment, with Saudi HUMAIN announcing data centre plans including xAI's first international facility alongside chips from Nvidia and AWS, and a nationwide rollout of Grok. In computer vision, Meta has released SAM 3 and SAM 3D as open-source projects, extending segmentation and enabling single-photo 3D reconstruction, while other product rollouts continue from GPT-5.1 Pro availability to fresh funding for audio generation and a marketing tie-up between Adobe and Semrush.

On the image side, observers have noted syntax-aware code and text generation alongside moderation that appears looser than some rivals. A playful "refrigerator magnet" prompt reportedly revealed a portion of the system prompt, a reminder that prompt injection is not just a developer concern.

Video is another area where capabilities are translating into business impact. Sora 2 can generate cinematic, multi-shot videos with consistent characters from text or images, which lets teams accelerate marketing content, broaden A/B testing and cut the need for studios on many projects. Access paths now span web, mobile, desktop apps and an API, and the market has already produced third-party platforms that promise exports without watermarks.

Teams experimenting with Sora are being advised to measure success by outcomes such as conversion rates, lower support loads or improved lead quality rather than just aesthetic fidelity. Implementation advice favours clear intent, structured prompts and iterative variation, with more advanced workflows assembling multi-shot storyboards, using match cuts to maintain rhythm, controlling lighting for continuity and anchoring character consistency across scenes.

III. Governance, Risk and Regulation

Governance as a Product Requirement

Amid all this activity, data risk has become a central theme for AI leaders. One governance specialist has consolidated common problem patterns into the PROTECT framework, which offers a way to map and mitigate the most material risks.

The first concern is the use of public AI tools for work content, which raises the chance of leakage or unwanted training on proprietary data. The recommended answer combines user guidance, approved internal alternatives, and technical or legal controls such as data scanning and blocking.

A second pressure point is rogue internal projects that bypass review, create compliance blind spots and build up technical debt. Proportionate oversight is key, calibrated to data sensitivity and paired with streamlined governance, so teams are not incentivised to route around it.

Third-party vendors can be opportunistic with data, so due diligence and contractual clauses need to prevent cross-customer training and make expectations clear with templates and guidance.

Technical attacks are another strand, from prompt injection to data exfiltration or the misuse of agents. Layered defences help here, including input validation, prompt sanitisation, output filtering, monitoring, red-teaming, and strict limits on access and privilege.

Embedded assistants and meeting bots come with permission risks when they operate over shared drives and channels, and agentic systems can amplify exposure if left unchecked, so the advice is to enforce least-privilege access, start on low-risk data, and keep robust audit trails.

Compliance risks span privacy laws such as GDPR with their demands for a lawful basis, IP and copyright constraints, contractual obligations, and the AI Act's emphasis on data quality. Legal and compliance checks need to be embedded at data sourcing, model training and deployment, backed by targeted training.

Finally, cross-border restrictions matter. Transfers should be mapped across systems and sub-processors, with checks for Data Privacy Framework certification, standard contractual clauses where needed, and transfer impact assessments that take account of both GDPR and newer rules such as the US Bulk Data Transfer Rule.

Regulatory Pragmatism

Regulators are not standing still, either. In the European Commission has proposed amendments to the AI Act through a Digital Omnibus package as the trilogue process rolls on. Six changes are in focus:

High-risk timelines would be tied to the approval of standards, with a backstop of December 2027 for Annex III systems and August 2028 for Annex I products if delays continue, though the original August 2026 date still holds otherwise.
Transparency rules on AI-detectable outputs under Article 50(2) would be delayed to February 2027 for systems placed on the market before August 2026, with no delay for newer systems.
The plan removes the need to register Annex III systems in the public database where providers have documented under Article 6(3) that a system is not high risk.
AI literacy would shift from a mandatory organisation-wide requirement to encouragement, except where oversight of high-risk systems demands it.
There is also a move to centralise supervision by the AI Office for systems built on general-purpose models by the same provider, and for huge online platforms and search engines, which is intended to reduce fragmentation across member states.
Finally, proportionality measures would define Small Mid-Cap companies and extend simplified obligations and penalty caps that currently apply to SMEs.

If adopted, the package would grant more time and reduce administrative load in some areas, at the expense of certainty and public transparency.

IV. Strategic Implications

The picture that emerges is one of pragmatic integration. Connectors make it feasible to keep work inside a single chat while drawing on the systems people already use. Platform choices are converging, so it makes sense to optimise for the suite that fits the current stack and to plan for switching costs that accumulate over time.

Agentic orchestration is moving from slides to code, but teams will get further by focusing on reliable tooling, clear governance and value measures that match business goals. Regulation is edging towards more flexible timelines and centralised oversight in places, which may lower administrative load without removing the need for discipline.

The sensible posture is measured experimentation: start with read-only access to lower-risk data, design routines that remove drudgery, introduce write operations with approvals, and monitor what is actually changing. The tools are improving quickly, yet the organisations that benefit most will be those that match innovation with proportionate controls and make thoughtful choices now that will hold their shape for the decade ahead.

« Older Entries «