Technology Tales

Notes drawn from experiences in consumer and enterprise technology

17:02, 26th March 2026

Google's Gemini CLI introduces a plan mode that defaults to read-only operations, allowing the agent to explore codebases, search for patterns and propose strategies without modifying files until explicitly approved. This approach prioritises research and clarification before implementation, using higher-reasoning models for planning and faster models for execution. The mode integrates with tools like Conductor, which organises development workflows into structured plans and supports read-only access to external systems such as GitHub and Postgres during planning phases. By enforcing a deliberate, human-confirmed workflow, the feature addresses concerns around unauthorised changes and aligns with enterprise needs for governance and risk management, positioning it as a foundational element in AI-assisted development practices.

12:11, 26th March 2026

Understanding a large, unfamiliar codebase can be a slow and frustrating process, but a range of free AI-powered tools now exist to make it considerably more manageable. Google Code Wiki scans a repository after each commit and produces structured documentation complete with diagrams, and includes a chat interface powered by Gemini AI, with free access available for public repositories. DeepWiki similarly generates interactive documentation for any GitHub repository by taking a URL and producing architectural diagrams and module breakdowns alongside a conversational query interface. ExplainGitHub offers quick summaries, visual maps and an AI chat feature for exploring public repositories without requiring sign-up. GitDocs AI focuses on generating README files and other documentation automatically by analysing a repository and producing sections, examples and templates, with free and paid tiers available. Finally, GitSummarize converts a repository into a full documentation hub with automatic summaries, is open-source and free to try, though details on paid or enterprise pricing remain unclear.

12:04, 26th March 2026

Prompt engineering was once the fastest route to extracting useful behaviour from language models, but it tends to break down as real-world systems grow more complex, becoming brittle, difficult to validate and increasingly costly to maintain. Concept engineering represents the next level of abstraction, shifting the focus from carefully worded instruction strings to explicitly defined building blocks comprising inputs, outputs, constraints, tools and success criteria. In practice, this means establishing output contracts through structured schemas, breaking workflows into composable and testable modules, iterating on measurable metrics rather than instinct and keeping tool behaviour deterministic and well-defined.
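The "output contract" idea can be sketched in a few lines of Python. The label set and field names below are illustrative assumptions rather than anything prescribed here, and in practice a library such as Pydantic would enforce the same checks more concisely; the point is that constraints live in validated code, not in prompt wording.

```python
from dataclasses import dataclass
import json

@dataclass(frozen=True)
class ClassificationResult:
    """Output contract: every model response must parse into this shape."""
    label: str
    confidence: float

    def __post_init__(self):
        # Constraints belong to the contract, not buried in the prompt text.
        if self.label not in {"positive", "negative", "neutral"}:
            raise ValueError(f"unexpected label: {self.label}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")

def parse_response(raw: str) -> ClassificationResult:
    """Validate a raw model reply against the contract before it flows onward."""
    data = json.loads(raw)
    return ClassificationResult(label=data["label"],
                                confidence=float(data["confidence"]))

result = parse_response('{"label": "positive", "confidence": 0.92}')
```

A malformed reply then fails loudly at the boundary rather than silently corrupting a downstream step.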

Frameworks such as DSPy and structured output mechanisms from providers like OpenAI are already pushing the industry in this direction, while emerging research into concept-level control within model internals points to an even deeper layer of abstraction ahead. Practical adoption does not require overhauling existing systems all at once, and teams can begin by writing a simple concept specification before drafting any instructions, formalising output formats into validated schemas, introducing at least one measurable evaluation loop and separating distinct reasoning stages such as classification, decision-making and language generation into clearly bounded steps.
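The "measurable evaluation loop" mentioned above can start very small: a handful of labelled cases and one metric tracked across iterations. The `classify` function here is a hypothetical stand-in for whatever model call a real pipeline would make, and the examples are invented for illustration.

```python
# Hypothetical stand-in for a model call; a real system would query an LLM here.
def classify(text: str) -> str:
    return "positive" if "good" in text.lower() else "negative"

# A small labelled evaluation set: the measurable part of the loop.
EVAL_SET = [
    ("The service was good and fast", "positive"),
    ("Good value for money", "positive"),
    ("Terrible experience, never again", "negative"),
]

def accuracy(cases) -> float:
    """Score the pipeline against labelled cases; track this over iterations."""
    hits = sum(1 for text, expected in cases if classify(text) == expected)
    return hits / len(cases)

print(f"accuracy: {accuracy(EVAL_SET):.2f}")
```

Even a toy loop like this turns "the prompt feels better" into a number that can regress visibly when an instruction change breaks something.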

Common pitfalls include hiding ambiguity inside loosely defined free-form fields, skipping evaluation entirely and over-modularising workflows in ways that introduce unnecessary latency and compounding errors. The broader principle is that reliable, portable and maintainable language model systems are built on well-defined concepts first, with instructions serving as just one implementation detail within that larger structure.

10:29, 26th March 2026

The shift towards self-hosting data science tools in 2026 is driven by the desire for cost efficiency, customisation and greater control over workflows, with open-source alternatives offering viable replacements for cloud-based services. Tools such as JupyterLab provide a flexible, self-contained environment for interactive notebooks, while MLflow enables private experiment tracking and model management. Apache Airflow supports dynamic pipeline orchestration, DVC ensures version control for large datasets and models and platforms like Metabase or Apache Superset facilitate data visualisation and collaboration.
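What a tool such as MLflow does for experiment tracking can be illustrated with a stdlib-only sketch; this is not MLflow's API (which offers `mlflow.start_run`, metric logging, a UI and much more), just the underlying idea of recording each run's parameters and metrics to self-hosted storage. The directory and field names are assumptions for the example.

```python
import json
import time
import uuid
from pathlib import Path

def log_run(base_dir: str, params: dict, metrics: dict) -> Path:
    """Record one training run's parameters and metrics as a JSON file."""
    run_dir = Path(base_dir) / uuid.uuid4().hex[:8]
    run_dir.mkdir(parents=True, exist_ok=True)
    record = {"timestamp": time.time(), "params": params, "metrics": metrics}
    (run_dir / "run.json").write_text(json.dumps(record, indent=2))
    return run_dir

run = log_run("mlruns-demo", {"lr": 0.01, "epochs": 10}, {"val_loss": 0.42})
```

Because everything lands on disk you control, the data-sovereignty benefit described above comes for free; the trade-off is that search, comparison and access control all become your responsibility.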

These solutions require initial setup and ongoing maintenance, including infrastructure provisioning and configuration, but they eliminate recurring subscription costs and enhance data sovereignty. However, adopting them demands technical expertise in areas such as containerisation, database management and system scaling, making them particularly suitable for teams seeking long-term operational autonomy and tailored workflows.

10:25, 26th March 2026

Building reliable multi-agent AI systems requires careful selection of an orchestration framework, and several strong options are currently available to engineers working in this space. LangGraph, developed by the LangChain team, uses a graph-based approach with explicit state management and support for cyclic workflows and human-in-the-loop capabilities. CrewAI models agents as crew members with defined roles and goals, supporting both sequential and hierarchical task execution in a way that is accessible for developers new to agentic AI. Pydantic AI prioritises type safety and validation, offering model-agnostic support along with durable execution and a built-in evaluation system. Google's Agent Development Kit integrates deeply with Vertex AI and Google Cloud services, emphasising scalability and multimodal input handling suited to enterprise deployments. Microsoft Research's AutoGen focuses on conversational multi-agent systems where agents communicate back and forth, including collaborative code writing and execution. Microsoft's Semantic Kernel takes an enterprise-oriented approach with sophisticated planning, memory management and a plugin architecture designed for integration with existing services. Finally, LlamaIndex Agent Workflows uses an event-driven architecture particularly well suited to agents that need to retrieve and reason over large document collections. The most appropriate choice among these frameworks depends on the specific use case, team expertise and production requirements.
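The graph-with-explicit-state pattern that LangGraph formalises can be sketched in plain Python; this is not LangGraph's API, only an illustration of the shape, with made-up node names and a linear edge map standing in for the cyclic, conditional routing a real framework supports.

```python
from typing import Callable, Dict

# Shared state flows through the graph; each node reads and updates it.
State = Dict[str, object]

def research(state: State) -> State:
    state["notes"] = f"findings about {state['task']}"
    return state

def draft(state: State) -> State:
    state["draft"] = f"report based on {state['notes']}"
    return state

NODES: Dict[str, Callable[[State], State]] = {"research": research, "draft": draft}
EDGES = {"research": "draft", "draft": None}  # None marks the end of the graph

def run_graph(start: str, state: State) -> State:
    """Walk the graph from the start node until an edge leads nowhere."""
    node = start
    while node is not None:
        state = NODES[node](state)
        node = EDGES[node]
    return state

final = run_graph("research", {"task": "quarterly sales"})
```

Making the state and edges explicit like this is what gives such frameworks their checkpointing, cycle and human-in-the-loop capabilities: any node boundary is a natural place to pause, inspect or branch.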

10:15, 26th March 2026

Modern data architecture revolves around four distinct approaches, each designed to address different storage, processing and organisational challenges. A data warehouse is a centralised repository of structured, pre-processed data that follows a schema-on-write principle, making it well-suited to fast business intelligence queries and reliable reporting, with popular implementations including Snowflake, Amazon Redshift and Google BigQuery. A data lake takes the opposite approach, storing raw data of any type in its native format and applying structure only at the point of analysis, making it far more flexible and cost-effective for machine learning and big data workloads, though poor governance can render it an unmanageable data swamp. A lakehouse combines the low-cost, flexible storage of a data lake with the performance and reliability features of a data warehouse, such as ACID transactions and schema enforcement, eliminating the need to maintain two separate systems and serving both analysts and data scientists from a single unified layer. A data mesh is fundamentally different in nature, being an organisational rather than a purely technological framework that distributes data ownership to the business domains that generate and understand it best, treating datasets as managed products and relying on federated governance to maintain standards across a large enterprise. The appropriate choice depends on organisational size, data variety and team structure, and in large enterprises these approaches are often used in combination rather than in isolation.
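The schema-on-write versus schema-on-read distinction can be made concrete with a small sketch; the record fields are invented for illustration, and real systems would use a warehouse engine and an object store rather than Python lists.

```python
import json

# Schema-on-write (warehouse style): validate before storing, reject bad data.
def store_warehouse(table: list, record: dict) -> None:
    """Reject records that do not match the fixed schema at write time."""
    if set(record) != {"id", "amount"} or not isinstance(record["amount"], (int, float)):
        raise ValueError(f"schema violation: {record}")
    table.append(record)

# Schema-on-read (lake style): store raw blobs, impose structure at query time.
def read_lake(raw_blobs: list) -> list:
    """Parse raw blobs, keeping only those that fit the schema we need now."""
    out = []
    for blob in raw_blobs:
        try:
            rec = json.loads(blob)
            out.append({"id": rec["id"], "amount": float(rec["amount"])})
        except (ValueError, KeyError, TypeError):
            continue  # malformed data surfaces at read time, not write time
    return out

warehouse = []
store_warehouse(warehouse, {"id": 1, "amount": 9.99})
parsed = read_lake(['{"id": 2, "amount": "5.50"}', "not json at all"])
```

The sketch also shows where the "data swamp" risk comes from: the lake happily accepts the unparseable blob, and the cost of that laxity is only paid, repeatedly, by every reader.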

10:01, 26th March 2026

Google's NotebookLM can be used to transform raw, disorganised notes into a structured product requirements document (PRD) by uploading relevant files and using carefully crafted prompts to guide the tool's output. Once uploaded, the files form the basis of a grounded, retrieval-style system that generates summaries and responds to specific instructions, allowing product managers to prioritise user pain points over speculative ideas and structure outputs around sections such as problem statements, core features and success metrics. The resulting draft can be refined through further conversation, with the tool citing its own uploaded materials to support each claim.

From there, Google Antigravity, an AI-powered integrated development environment, can take that PRD and translate it into functioning software. By entering requirements into its Agent Manager, users can instruct AI agents to produce an implementation plan, generate database schemas, scaffold application components and progressively build out a working prototype, with each step requiring human review before the next begins. Together, the two tools represent a workflow that takes a product from loosely gathered research through to a deployable software prototype with relatively minimal manual coding effort.

14:53, 24th March 2026

Visualizing and Understanding Convolutional Networks

This study explores the application of large convolutional neural networks to image classification, focusing on visualising internal representations and understanding their properties. By examining feature activations, the research reveals that these networks develop structured, interpretable patterns that become increasingly invariant and discriminative across layers. Experiments demonstrate that deeper networks perform better, highlighting the importance of architectural depth over specific components.

The model's ability to generalise to other datasets, such as Caltech-101 and Caltech-256, is notable, achieving results that surpass existing benchmarks. However, performance on PASCAL data is less consistent, suggesting potential dataset biases. The findings challenge the effectiveness of small-scale benchmarks and suggest that improvements in loss functions could enhance the model's ability to handle multi-object scenes. The work also outlines methods for debugging and refining network performance through visualisation and ablation studies, contributing to a broader understanding of how these systems learn and generalise.

12:47, 24th March 2026

Hidden Technical Debt in Machine Learning Systems

Much of the discussion around technical debt in machine learning systems revolves around the challenges of maintaining scalable, reliable and efficient infrastructure. The paper highlights how the complexity of modern ML workflows often leads to accumulated debt, particularly in areas such as abstraction design, testing practices and organisational culture. For instance, the reliance on ad hoc solutions for data processing or model training can create long-term maintenance burdens, while insufficient testing frameworks may obscure the impact of changes to algorithms or data pipelines.

The authors argue that these issues are not merely technical, but deeply tied to how teams approach development and collaboration. A lack of emphasis on refactoring, for example, can result in systems that become increasingly difficult to modify or extend. Similarly, the pressure to deliver results quickly may encourage short-term fixes that compromise long-term stability.

The paper also underscores the importance of fostering a culture where reducing technical debt is valued as highly as improving model accuracy. Efforts to address these challenges require deliberate strategies, such as investing in better abstractions for distributed computing, adopting rigorous testing methodologies and ensuring that teams are incentivised to prioritise maintainability. The authors suggest that successful ML systems are those where technical debt is actively managed, rather than left to accumulate. This involves not only engineering practices, but also a shift in how organisations perceive and reward contributions to system health.

Ultimately, the paper serves as a reminder that technical debt in ML is not an inevitable byproduct of innovation, but a consequence of choices made during development. Paying it off demands sustained attention, cross-functional collaboration and a willingness to invest in solutions that may not yield immediate returns but ensure the longevity of the systems being built.

12:55, 20th March 2026

The AI Act is undergoing several key revisions aimed at refining its regulatory framework. One major change involves expanding the scope of when sensitive personal data can be processed for bias detection and correction, extending beyond high-risk AI systems to other AI models and deployers. This adjustment lowers the threshold for data usage from "strictly necessary" to "necessary," though the Council has proposed narrower conditions compared to the Commission’s plan. Another significant shift is the centralisation of oversight for AI systems built on general-purpose AI models. The European AI Office would gain exclusive authority over such systems when developed by the same provider, though exceptions remain for sectors like law enforcement, financial services and critical infrastructure, which would retain national regulatory control.

The Act also introduces proportionality measures for small mid-cap enterprises, aligning their penalty caps with those of SMEs. This includes simplified technical documentation requirements for high-risk AI system providers within this category. Other updates include broadening the use of sensitive data for bias correction, enhancing the European AI Office’s regulatory powers and adjusting the conditions under which such data can be processed. The Council has tempered some of the Commission’s proposals, particularly in areas involving national security and critical services. These changes reflect a balance between tightening oversight in high-risk domains and providing flexibility for smaller organisations, while also addressing concerns around data privacy and regulatory overlap.
