TOPIC: STACK OVERFLOW
A survey of commenting systems for static websites
This piece grew out of a practical problem. When building a Hugo website, I went looking for a way to add reader comments. The remotely hosted options I found were either subscription-based or visually intrusive in ways that clashed with the site design. Moving to the self-hosted alternatives brought a different set of difficulties: setup proved neither straightforward nor reliably successful, and after some time I concluded that going without comments was the more sensible outcome.
That experience is, it turns out, a common one. The commenting problem for static sites has no clean solution, and the landscape of available tools is wide enough to be disorienting. What follows is a survey of what is currently out there, covering federated, hosted and self-hosted approaches, so that others facing the same decision can at least make an informed choice about where to invest their time.
Federated Options
At one end of the spectrum sit the federated solutions, which take the most principled approach to data ownership. Federated systems such as Cactus Comments stand out by building on the Matrix open standard, a decentralised protocol for real-time communication governed by the Matrix.org Foundation. Because comments exist as rooms on the Matrix network, they are not siloed within any single server, and users can engage with discussions using an existing Matrix account on any compatible home server, or follow threads using any Matrix client of their choosing. Site owners, meanwhile, retain the flexibility to rely on the public Cactus Comments service or to run their own Matrix home server, avoiding third-party tracking and centralised control alike. The web client is LGPLv3 licensed and the backend service is AGPLv3 licensed, making the entire stack free and open source.
Solutions for Publishers and Media Outlets
For publishers and media organisations, Coral by Vox Media offers a well-established and feature-rich alternative. Originally founded in 2014 as a collaboration between the Mozilla Foundation, The New York Times and The Washington Post, with funding from the Knight Foundation, it moved to Vox Media in 2019 and was released as open-source software. It provides advanced moderation tools supported by AI technology, real-time comment alerts and in-depth customisation through its GraphQL API. Its capacity to integrate with existing user authentication systems makes it a compelling choice for organisations that wish to maintain editorial control without sacrificing community engagement. Coral is currently deployed across 30 countries and in 23 languages, a breadth of adoption that reflects its standing among publishers of all sizes. The team has recently expanded the product to include a live Q&A tool alongside the core commenting experience, and the open-source codebase means that organisations with the technical resources can self-host the entire platform.
A strong alternative for publishers who handle large discussion volumes is GraphComment, a hosted platform developed by the French company Semiologic. It takes a social-network-inspired approach, offering threaded discussions with real-time updates, relevance-based sorting, a reputation-based voting system that enables the community to assist with moderation, and a proprietary Bubble Flow interface that makes individual threads indexable by search engines. All data are stored on servers based in France, which will appeal to publishers with European data-residency requirements. Its client list includes Le Monde, France Info and Les Echos, giving it considerable credibility in the media sector.
Hosted Solutions: Ease of Setup and Performance
Hosted solutions cater to those who prioritise simplicity and page performance above all else. ReplyBox exemplifies this approach, describing itself as 15 times lighter than Disqus, with a design focused on clean aesthetics and fast page loads. It supports Markdown formatting, nested replies, comment upvotes, email notifications and social login via Google, and it comes with spam filtering through Akismet. A 14-day free trial is available with no payment required, and a WordPress plugin is offered for those already on that platform.
Remarkbox takes a similarly restrained approach. Founded in 2014 by Russell Ballestrini after he moved his own blog to a static site and found existing solutions too slow or ad-laden, it is open source, carries no advertising and performs no user tracking. Readers can leave comments without creating an account, using email verification to confirm their identity, and the platform operates on a pay-what-you-can basis that keeps it accessible to smaller sites. It supports Markdown with real-time comment previews and deeply nested replies, and its developer notes that comments that are served through the platform contribute to SEO by making user-generated content indexable by search engines.
The choice between hosted and self-hosted systems often hinges on the trade-off between convenience and control. Staticman was a notable option in this space, acting as a Node.js bridge that committed comment submissions as data files directly to a GitHub or GitLab repository. However, its website is no longer accessible, and the project has been effectively abandoned since around 2020, with its maintainers publicly confirming in early 2024 that neither they nor the original author have been active on it for some time and that no volunteer has stepped forward to take it over. Those with a need for similar functionality are directed by the project's own contributors towards Cloudflare Workers-based alternatives. Utterances remains a viable option in this category, using GitHub Issues as its backend so that all comment data stays within a repository the site owner already controls. It requires some technical setup, but rewards that effort with complete data ownership and no external dependencies.
Open-Source, Self-Hosted Options
For developers who value privacy and data sovereignty above the convenience of a hosted service, open-source and self-hosted options present a natural fit. Remark42 is an actively maintained project that supports threaded comments, social login, moderation tools and Telegram or email notifications. Written in Python and backed by a SQLite database, Isso has been available since 2013 and offers a straightforward deployment with a small resource footprint, together with anonymous commenting that requires no third-party authentication. Both projects reflect a broader preference among privacy-conscious developers for keeping comment data entirely under their own roof.
The Case of Disqus
Valued for its ease of integration and its social features, Disqus remains one of the most widely recognised hosted commenting platform. However, it comes with well-documented drawbacks. Disqus operates as both a commenting service and a marketing and data company, collecting browsing data via tracking scripts and sharing it with third-party advertising partners. In 2021, the Norwegian Data Protection Authority notified Disqus of its intention to issue an administrative fine of approximately 2.5 million euros for processing user data without valid consent under the General Data Protection Regulation. However, following Disqus's response, the authority's final decision in 2024 was to issue a formal reprimand rather than impose the financial penalty. The proceedings nonetheless drew renewed attention to the privacy implications of relying on the platform. Site owners who prefer the convenience of a hosted service without those trade-offs may find more suitable alternatives in Hyvor Talk or CommentBox, both of which are designed around privacy-first principles and minimal setup.
Bridging the Gap: Talkyard and Discourse
Functioning as both a commenting system and a full community forum, Talkyard occupies an interesting position in the landscape. It can be embedded on a blog in the same manner as a traditional commenting widget, yet it also supports standalone discussion boards, making it a viable option for content creators who anticipate their audience outgrowing a simple comment section.
It also happens that Discourse operates on a similar principle but at greater scale, providing a fully featured forum platform that can be embedded as a comment section on external pages. Co-founded by Jeff Atwood (also a co-founder of Stack Overflow), Robin Ward and Sam Saffron, it is an open-source project whose server side is built on Ruby on Rails with a PostgreSQL database and Redis cache, while the client side uses Ember.js. Both Talkyard and Discourse are available as hosted services or as self-hosted installations, and both carry open-source codebases for those who wish to inspect or extend them.
Self-Hosting Discourse With Cloudflare CDN
For those who wish to take the self-hosted route, Discourse distributes an official Docker image that considerably simplifies deployment. The process begins by cloning the official repository into /var/discourse and running the bundled setup tool, which prompts for a hostname, administrator email address and SMTP credentials. A Linux server with at least 2 GB of memory is required, and a SWAP partition should be enabled on machines with only 1 GB.
Pairing a self-hosted instance with Cloudflare as a global CDN is a practical choice, as Cloudflare provides CDN acceleration, DNS management and DDoS mitigation, with a free tier that suits most community deployments. When configuring SSL, the recommended approach is to select Full mode in the Cloudflare SSL/TLS dashboard and generate an origin certificate using the RSA key type for maximum compatibility. That certificate is then placed in /var/discourse/shared/standalone/ssl/, and the relevant Cloudflare and SSL templates are introduced into Discourse's app.yml configuration file.
One important point during initial DNS setup is to leave the Cloudflare proxy status set to DNS only until the Discourse configuration is complete and verified, switching it to Proxied only afterwards to avoid redirect errors during first deployment. Email setup is among the more demanding aspects of running Discourse, as the platform depends on it for user authentication and notifications. The notification_email setting and the disable_emails option both require attention after a fresh install or a migration restore. Once configuration is finalised, running ./launcher rebuild app from the /var/discourse directory completes the build, typically within ten minutes.
Plugins can be added at any time by specifying their Git repository URLs in the hooks section of app.yml and triggering a rebuild. Discourse creates weekly backups automatically, storing them locally under /var/discourse/shared/standalone/backups, and these can be synchronised offsite via rsync or uploaded automatically to Amazon S3 if credentials are configured in the admin panel.
At a Glance
| Solution | Type | Best For |
|---|---|---|
| Cactus Comments | Federated, open source | Privacy-centric sites |
| Coral | Open source, hosted or self-hosted | Publishers and newsrooms |
| GraphComment | Hosted | Enhanced engagement and SEO |
| ReplyBox | Hosted | Simple static sites |
| Remarkbox | Hosted, optional self-host | Speed and simplicity |
| Utterances | Repository-backed | Developer-owned data |
| Remark42 | Self-hosted, open source | Privacy and control |
| Isso | Self-hosted, open source | Minimal footprint |
| Hyvor Talk | Hosted | Privacy-focused ease of use |
| CommentBox | Hosted | Clean design, minimal setup |
| Talkyard | Hosted or self-hosted | Comments and forums combined |
| Discourse | Hosted or self-hosted | Rich discussion communities |
| Disqus | Hosted | Ease of integration (privacy caveats apply) |
Closing Thoughts
None of the options surveyed here is without compromise. The hosted services ask you to accept some degree of cost, design constraint or data trade-off. The self-hosted and repository-backed tools demand technical time that can outweigh the benefit for a small or personal site. The federated approach is principled but asks readers to have, or create, a Matrix account before they can participate. It is entirely reasonable to weigh all of that and, as I did, conclude that going without comments is the right call for now. The landscape does shift, and a solution that is cumbersome today may become more accessible as these projects mature. In the meantime, knowing what exists and where the friction lies is a reasonable place to start.
Four technical portals that still deliver after decades online
The early internet was built on a different kind of knowledge sharing, one driven by individual expertise, community generosity and the simple desire to document what worked. Four informative websites that started in that era, namely MDN Web Docs, AskApache, WindowsBBS and Office Watch, embody that spirit and remain valuable today. They emerged at a time when technical knowledge was shared through forums, documentation and personal blogs rather than social media or algorithm-driven platforms, and their legacy persists in offering clarity and depth in an increasingly fragmented digital landscape.
MDN Web Docs
MDN Web Docs stands as a cornerstone of modern web development, offering comprehensive coverage of HTML, CSS, JavaScript and Web APIs alongside authoritative references for browser compatibility. Mozilla started the project in 2005 under the name Mozilla Developer Centre, and it has since grown into a collaborative effort of considerable scale. In 2017, Mozilla announced a formal partnership with Google, Microsoft, Samsung and the W3C to consolidate web documentation on a single platform, with Microsoft alone redirecting over 7,700 of its MSDN pages to MDN in that year.
For developers, the site is not merely a reference tool but a canonical guide that ensures standards are adhered to and best practices followed. Its tutorials, guides and learning paths make it indispensable for beginners and seasoned professionals alike. The site's community-driven updates and ongoing contributions from browser vendors have cemented its reputation as the primary source for anyone building for the web.
AskApache
AskApache is a niche but invaluable resource for those managing Apache web servers, built by a developer whose background lies in network security and penetration testing on shared hosting environments. The site grew out of the founder's detailed study of .htaccess files, which, unlike the main Apache configuration file httpd.conf, are read on every request and offer fine-grained, per-directory control without requiring root access to the server. That practical origin gives the content its distinctive character: these are not generic tutorials, but hard-won techniques born from real-world constraints.
The site's guides on blocking malicious bots, configuring caching headers, managing redirects with mod_rewrite and preventing hot-linking are frequently cited by system administrators and WordPress users. Its specificity and longevity have made it a trusted companion for those maintaining complex server environments, covering territory that mainstream documentation rarely touches.
WindowsBBS
WindowsBBS offers a clear window into the era when online forums were the primary hub for technical support. Operating in the tradition of classic bulletin board systems, the site has long been a resource for users troubleshooting Windows installations, hardware compatibility issues and malware removal. It remains completely free, sustained by advertisers and community donations, which reflects the ethos of mutual aid that defined early internet culture.
During the Windows XP and Windows 7 eras, community forums of this kind were essential for solving problems that official documentation often overlooked, with volunteers providing detailed answers to questions that Microsoft's own support channels would not address. While the rise of social media and centralised support platforms has reduced the prominence of such forums, WindowsBBS remains a testament to the power of community-driven problem-solving. Its straightforward structure, with users posting questions and experienced volunteers providing answers, mirrors the collaborative spirit that made the early web such a productive environment.
Office Watch
Office Watch has served as an independent source of Microsoft Office news, tips and analysis since 1996, making it one of the longer-running specialist publications of its kind. Its focus on Microsoft Office takes in advanced features and hidden tools that are seldom documented elsewhere, from lesser-known functions in Excel to detailed comparisons between Office versions and frank assessments of Microsoft's product decisions. That independence gives it a voice that official resources cannot replicate.
The site serves power users seeking to make the most of the software they use every day, with guides and books that extend its reach beyond the website itself. In an era where software updates are frequent and often poorly explained, Office Watch provides the kind of context and plain-spoken clarity that official documentation rarely offers.
The Enduring Value of Depth and Community
These four sites share a common thread: they emerged when technical knowledge was shared openly by experts and enthusiasts rather than filtered through algorithms or paywalls, and they retain the value that comes from that approach. Their continued relevance speaks to what depth, specificity and community can achieve in the digital world. While platforms such as Stack Overflow and GitHub Discussions have taken over many of the roles these sites once played, the original resources remain useful for their historical context and the quality of their accumulated content.
As the internet continues to evolve, the lessons from these sites are worth remembering. The most useful knowledge is often found at the margins, where dedicated individuals take the time to document, explain and share what they have learned. Whether you are a developer, a server administrator or an everyday Office user, these resources are more than archives: they are living repositories of expertise, built by people who cared enough to write things down properly.
AI infrastructure under pressure: Outages, power demands and the race for resilience
The past few weeks brought a clear message from across the AI landscape: adoption is racing ahead, while the underlying infrastructure is working hard to keep up. A pair of major cloud outages in October offered a stark stress test, exposing just how deeply AI has become woven into daily services.
At the same time, there were significant shifts in hardware strategy, a wave of new tools for developers and creators and a changing playbook for how information is found online. There is progress on resilience and efficiency, yet the system is still bending under demand. Understanding where it held, where it creaked and where it is being reinforced sets the scene for what comes next.
Infrastructure Stress and Outages
The outages dominated early discussion. An AWS incident that lasted around 15 hours and disrupted more than a thousand services was followed nine days later by a global Azure failure. Each cascaded across systems that depend on them, illustrating how AI now amplifies the consequences of platform problems.
This was less about a single point of failure and more about the growing blast radius when connected services falter. The effect on productivity was visible too: a separate 10-hour ChatGPT downtime showed how fast outages of core AI tools now translate into lost work time.
Power Demand and Grid Strain
Behind the headlines sits a larger story about electricity, grids and planning. Data centres accounted for roughly 4% of US electricity use in 2024, about 183 TWh and the International Energy Agency projects around 945 TWh by 2030, with AI as a principal driver.
The averages conceal stark local effects. Wholesale prices near dense clusters have spiked by as much as 267% at times, household bills are rising by about $16–$18 per month in affected areas and capacity prices in the PJM market jumped from $28.92 per megawatt to $329.17. The US grid faces an upgrade bill of about $720 billion by 2030, yet permitting and build timelines are long, creating a bottleneck just as demand accelerates.
Technical Grid Issues
Technical realities on the grid add another layer of challenge. Fast load swings from AI clusters, harmonic distortions and degraded power quality are no longer theoretical concerns. A Virginia incident in which 60 data centres disconnected simultaneously did not trigger a collapse but did reveal the fragility introduced by concentrated high-performance compute.
Security and New Failure Modes
Security risks are evolving in parallel. Agentic systems that can plan, reason and call tools open new failure modes. AI-enabled spear phishing appears to be 350% more effective than traditional attempts and could be 50 times more profitable, a worrying backdrop when outages already have a clear link to lost productivity.
Security considerations now reach into the tools people use to access AI as well. New AI browsers attract attention, and with that comes scrutiny. OpenAI's Atlas and Perplexity's Comet launched with promising features, yet researchers flagged critical issues.
Comet is vulnerable to "CometJacking", a malicious URL hijack that enables data theft, while Atlas suffered a cross-site request forgery weakness that allowed persistent code injection into ChatGPT memory. Both products have been noted for assertive data collection.
Caution and good hygiene are prudent until the fixes and policies settle. It is a reminder that the convenience of integrating models directly into browsing comes with a new attack surface.
Efficiency and Mitigation Strategies
Industry responses are gathering pace. Efficiency remains the first lever. Hyperscalers now report power usage effectiveness around 1.08 to 1.09, compared with more typical figures of 1.5 to 1.6. Direct chip cooling can cut energy needs by up to 40%.
Grid-interactive operations and more work at the edge offer ways to smooth demand and reduce concentration risk, while new power partnerships hint at longer-term change. Microsoft's agreement with Constellation on nuclear power is one example of how compute providers are thinking beyond incremental efficiency gains.
An emerging pattern is becoming visible through these efforts. Proactive regional planning and rapid efficiency improvements could allow computational output to grow by an order of magnitude, while power use merely doubles. More distributed architectures are being explored to reduce the hazard of over-concentration.
A realistic outlook sets data centres at around 3% of global electricity use by 2030, which is notable but still smaller than anticipated growth from electric vehicles or air conditioning. If the $720 billion in grid investment materialises, it could add around 120 GW of capacity by 2030, as much as half of which would be absorbed by data centres. The resilience gap is real, but it appears to be narrowing, provided the sector moves quickly to apply lessons from each failure.
Regional and Policy Responses
Regional policies are starting to encourage resilience too. Oregon's POWER Act asks operators to contribute to grid robustness, Singapore's tight focus on efficiency has delivered around a 30% power reduction even as capacity expands and a moratorium in Dublin has pushed growth into more distributed build-outs. On the U.S. federal government side, the Department of Homeland Security updated frameworks after a 2024 watchdog warning, with AI risk programmes now in place for 15 of the 16 critical infrastructure sectors.
Hardware Competition and Strategy
Competition is sharpening. Anthropic deepened its partnership with Google Cloud to train on TPUs, a move that challenges Nvidia's dominance and signals a broader rebalancing in AI hardware. Nvidia's chief executive has acknowledged TPUs as robust competition.
Another fresh entry came from Extropic, which unveiled thermodynamic sampling units, a probabilistic chip design that claims up to 10,000-fold lower energy use than GPUs for AI workloads. Development kits are shipping and a Z-1 chip is planned for next year, yet as with any radical architecture, proof at scale will take time.
Nvidia, meanwhile, presented an ambitious outlook, targeting $500 billion in chip revenue by 2026 through its Blackwell and Rubin lines. The US Department of Energy plans seven supercomputers comprising more than 100,000 Blackwell GPUs and the company announced partnerships spanning pharmaceuticals, industrials and consumer platforms.
A $1 billion investment in Nokia hints at the importance of AI-centric networks. New open-source models and datasets accompanied the announcements, and the company's share price surged to a record.
Corporate Restructuring
Corporate strategy and hardware choices also entered a new phase. OpenAI completed its restructuring into a public benefit corporation, with a rebranded OpenAI Foundation holding around $130 billion in equity and allocating $25 billion to health and AI resilience. Microsoft's stake now sits at about 27% and is worth roughly $135 billion, with technology rights retained through 2032. Both parties have scope to work with other partners. OpenAI committed around $250 billion to Azure yet retains the ability to use other compute providers. An independent panel will verify claims of artificial general intelligence, an unusual governance step that will be watched closely.
Search and Discovery Evolution
Away from infrastructure, the way audiences find and trust information is shifting. Search is moving from the old aim of ranking for clicks to answer engine optimisation, where the goal is to be quoted by systems such as ChatGPT, Claude or Perplexity.
The numbers explain why. Google handled more than five trillion queries in 2024, while generative platforms now process around 37.5 million prompt-like searches per day. Google's AI Overviews, which surface summary answers above organic results, have reshaped click behaviour.
Independent analyses report top-ranking pages seeing click-through rates fall by roughly a third where Overviews appear, with some keywords faring worse, and a Pew study finds overall clicks on such results dropping from 15% to 8%. Zero-click searches rose from around 56% to 69% between May 2024 and May 2025.
Chegg's non-subscriber traffic fell by 49% in this period, part of an ongoing dispute with Google. Google counters that total engagement in covered queries has risen by about 10%. Whichever way that one reads the data, the direction is clear: visibility is less about rank position and more about being cited by a summarising engine.
In practice, that means structuring content, so a model can parse, trust and attribute it. Clear Q&A-style sections with direct answers, followed by context and cited evidence, help models extract usable statements. Schema markup for FAQs and how-to content improves machine readability.
Measuring success also changes. Traditional analytics rarely show when an LLM quotes a source, so teams are turning to tools that track citations in AI outputs and tying those to conversion quality, branded search volume and more in-depth engagement with pricing or documentation. It is not a replacement for SEO so much as a layer that reinforces it in an AI-first environment.
Developer Tools and Agentic Workflows
On the tools front, developers saw an acceleration in agent-centred workflows. Cursor launched its first in-house coding model, Composer, which aims for near-frontier quality while generating code around four times faster, often in under 30 seconds.
The broader Cursor 2.0 update added multi-agent capabilities, with as many as eight assistants able to work in parallel, alongside browsing, a test browser and voice controls. The direction of travel is away from single-shot completions and towards orchestration and review. Tutorials are following suit, demonstrating how to scaffold tasks such as a Next.js to-do application using planning files, parallel agent tasks and quick integration, with voice prompts in the loop.
Open-source and enterprise ecosystems continue to expand. GitHub introduced Agent HQ for coordinating coding agents, Google released Pomelli to generate marketing campaigns and IBM's Granite 4.0 Nano models brought larger on-device options in the 350 million to 1.5 billion parameter range.
FlowithOS reported strong scores on agentic web tasks, while Mozilla announced an open speech dataset initiative, and Kilo Code, Hailuo 2.3 and other projects broadened choice across coding and video. Grammarly rebranded as Superhuman, adding "Superhuman Go" agents to speed up writing tasks.
Creative Tools and Partnerships
Creative workflows are evolving quickly, too. Adobe used its MAX event to add AI assistants to Photoshop and Express, previewed an agent called Project Moonlight, and upgraded Firefly with conversational "Prompt to Edit" controls, custom image models and new video features including soundtracks and voiceovers. Partnerships mean Gemini, Veo and Imagen will sit inside Adobe tools, and Premiere's editing capabilities now extend to YouTube Shorts.
Figma acquired Weavy and rebranded it as Figma Weave for richer creative collaboration, and Canva unveiled its own foundation "Design Model" alongside a Creative Operating System meant to produce fully editable, AI-generated designs. New Canva features take in a revised video suite, forms, data connectors, email design, a 3D generator and an ad creation and performance tool called Grow, while Affinity is relaunching as a free, integrated professional app. Other entrants are trying to blend model strengths: one agent was trailed with Sora 2 clip stitching, Veo 3.1 visuals and multimodel blending for faster design output.
Music rights and AI found a new footing. Universal Music Group settled a lawsuit with Udio, the AI music generator, and the two will form a joint venture to launch a licensed platform in 2026. Artists who opt in will be paid both for training models on their catalogues and for remixes. Udio disabled song downloads following the deal, which annoyed some users, and UMG also announced a "responsible AI" alliance with Stability AI to build tools for artists. These arrangements suggest a path towards sanctioned use of style and catalogue, with compensation built in from the start.
Research and Introspection
Research and science updates added depth. Anthropic reported that its Claude system shows limited introspection, detecting planted concepts only about 20% of the time, separating injected "thoughts" from text and modulating its internal focus. That highlights both the promise and limits of transparency techniques, and the potential for models to conceal or fail to surface certain internal states.
UC Berkeley researchers demonstrated an AI-driven load balancing algorithm with around 30% efficiency improvements, a result that could ripple through cloud performance. IBM ran quantum algorithms on AMD FPGAs, pointing to progress in hybrid quantum-classical systems.
OpenAI launched an AI-integrated web browser positioned as a challenger to incumbents, Perplexity released a natural-language patents search and OpenAI's Aardvark, a GPT-5-based security agent, entered private beta.
Anthropic opened a Tokyo office and signed a cooperation pact with Japan's AI Safety Institute. Tether released QVAC Genesis I, a large open STEM dataset of more than one million data points and a local workbench app aimed at making development more private and less dependent on big platforms.
Age Restrictions and Policy
Meanwhile, policy considerations are reaching consumer platforms. Character AI will restrict users under 18 from open-ended chatbot conversations from late November, replacing them with creative tools and adding behaviour-based age detection, a response to pressure and proposals such as the GUARD Act.
Takeaways
Put together, the picture is one of rapid interdependence and swift correction. The infrastructure is not breaking, but it is being stretched, and recent failures have usefully mapped the weak points. If the sector continues to learn quickly from its own missteps, the resilience gap will continue to narrow, and the next round of outages will be less disruptive than the last.
Investment is flowing into grids and cooling, policy is nudging towards resilience, and compute providers are hedging hardware bets by searching for efficiency and supply assurance. On the application layer, agents are becoming a primary interface for work, creative tools are converging around editability and control, and discovery is shifting towards being quoted by machines rather than clicked by humans.
Security lapses at the interface are a reminder that novelty often arrives before maturity. The most likely path from here is uneven but forward: data centre power may rise, yet efficiency and distribution can blunt the impact; answer engines may compress clicks, yet they can send higher intent visitors to clear, well-structured sources; hardware competition may fragment the stack, yet it can also reduce concentration risk.
A look at the Julia programming language
Several open-source computing languages get mentioned when talking about working with data. Among these are R and Python, but there are others; Julia is another one of these. It took a while before I got to check out Julia because I felt the need to get acquainted with R and Python beforehand. There are others like Lua to investigate too, but that can wait for now.
With the way that R is making an incursion into clinical data reporting analysis following the passage of decades when SAS was predominant, my explorations of Julia are inspired by a certain contrariness on my part. Alongside some small personal projects, there has been some reading in (digital) book form and online. Concerning the latter of these, there are useful tutorials like Introduction to Data Science: Learn Julia Programming, Maths & Data Science from Scratch or Julia Programming: a Hands-on Tutorial. Like what happens with R, there are online versions of published books available free of charge, and they include Interactive Visualization and Plotting with Julia. Video learning can help too and Jane Herriman has recorded and shared useful beginner's guides on YouTube that start with the basics before heading onto more advanced subjects like multiple dispatch, broadcasting and metaprogramming.
This piece of learning has been made of simple self-inspired puzzles before moving on to anything more complex. That differs from my dalliance with R and Python, where I ventured into complexity first, not least because of testing them out with public COVID data. Eventually, I got around to doing that with Julia too, though my interest was beginning to wane by then, and Julia's abilities for creating multipage PDF files were such that the PDF Toolkit was needed to help with this. Along the way, I have made use of such packages as CSV.jl, DataFrames.jl, DataFramesMeta, Plots, Gadfly.jl, XLSX.jl and JSON3.jl, among others. After that, there is PrettyTables.jl to try out, and anyone can look at the Beautiful Makie website to see what Makie can do. There are plenty of other packages creating graphs, such as SpatialGraphs.jl, PGFPlotsX and GRUtils.jl. For formatting numbers, options include Format.jl and Humanize.jl.
So far, my primary usage has been with personal financial data together with automated processing and backup of photo files. The photo file processing has taken advantage of the ability to compile Julia scripts for added speed because just-in-time compilation always means there is a lag before the real work begins.
VS Code is my chosen editor for working with Julia scripts, since it has a plugin for the language. That adds the REPL, syntax highlighting, execution and data frame viewing capabilities that once were added to the now defunct Atom editor by its own plugin. While it would be nice to have a keyboard shortcut for script execution, the whole thing works well and is regularly updated.
Naturally, there have been a load of queries as I have gone along and the Julia Documentation has been consulted as well as Julia Discourse and Stack Overflow. The latter pair have become regular landing spots on many a Google search. One example followed a glitch that I encountered after a Julia upgrade when I asked a question about this and was directed to the XLSX.jl Migration Guides where I got the information that I needed to fix my code for it to run properly.
There is more learning to do as I continue to use Julia for various things. Once compiled, it does run fast like it has been promised. The syntax paradigm is akin to R and Python, but there are Julia-specific features too. If you have used the others, the learning curve is lessened but not eliminated completely. This is not an object-oriented language as such, but its functional nature makes it familiar enough for getting going with it. In short, the project has come a long way since it started more than ten years ago. There is much for the scientific programmer, but only time will tell if it usurped its older competitors. For now, I will remain interested in it.
Broadening data science horizons: Useful Python packages for working with data
My response to changes in the technology stack used in clinical research is to develop some familiarity with programming and scripting platforms that complement and compete with SAS, a system with which I have been programming since 2000. While one of these has been R, Python is another that has taken up my attention, and I now also have Julia in my sights as well. There may be others to assess in the fullness of time.
While I first started to explore the Data Science world in the autumn of 2017, it was in the autumn of 2019 that I began to complete LinkedIn training courses on the subject. Good though they were, I find that I need to actually use a tool to better understand it. At that time, I did get to hear about Python packages like Pandas, NumPy, SciPy, Scikit-learn, Matplotlib, Seaborn and Beautiful Soup though it took until of spring of this year for me to start gaining some hands-on experience with using any of these.
During the summer of 2020, I attended a BCS webinar on the CodeGrades initiative, a programming mentoring scheme inspired by the way classical musicianship is assessed. In fact, one of the main progenitors is a trained classical musician and teacher of classical music who turned to Python programming when starting a family to have a more stable income. The approach is that a student selects a project and works their way through it, with mentoring and periodic assessments carried out in a gentle and discursive manner. Of course, the project has to be engaging for the learning experience to stay the course, and that point came through in the webinar.
That is one lesson that resonates with me with subjects as diverse as web server performance and the ongoing pandemic supplying data, and there are other sources of public data to examine as well before looking through my own personal archive gathered over the decades. Though some subjects are uplifting while others are more foreboding, the key thing is that they sustain interest and offer opportunities for new learning. Without being able to dream up new things to try, my knowledge of R and Python would not be as extensive as it is, and I hope that it will help with learning Julia too.
In the main, my own learning has been a solo effort with consultation of documentation along with web searches that have brought me to the likes of Real Python, Stack Abuse, Data Viz with Python and R and others for longer tutorials as well as threads on Stack Overflow. Usually, the web searching begins when I need a steer on a particular or a way to resolve a particular error or warning message, but books are always worth reading even if that is the slower route. While those from the Dummies series or from O'Reilly have proved must useful so far, I do need to read them more completely than I already have; it is all too tempting to go with the try the "programming and search for solutions as you go" approach instead.
To get going, many choose the Anaconda distribution to get Jupyter notebook functionality, but I prefer a more traditional editor, so Spyder has been my tool of choice for Python programming and there are others like PyCharm as well. Because Spyder itself is written in Python, it can be installed using pip from PyPi like other Python packages. It has other dependencies like Pylint for code management activities, but these get installed behind the scenes.
The packages that I first met in 2019 may be the mainstays for doing data science, but I have discovered others since then. It also seems that there is porosity between the worlds of R and Python, so you get some Python packages aping R packages and R has the Reticulate package for executing Python code. There are Python counterparts to such Tidyverse stables as dplyr and ggplot2 in the form of Siuba and Plotnine, respectively. Though the syntax of these packages are not direct copies of what is executed in R, they are close enough for there to be enough familiarity for added user-friendliness compared to Pandas or Matplotlib. The interoperability does not stop there, for there is SQLAlchemy for connecting to MySQL and other databases (PyMySQL is needed as well) and there also is SASPy for interacting with SAS Viya.
While Python may not have the speed of Julia, there are plenty of packages for working with larger workloads. Of these, Dask, Modin and RAPIDS all have their uses for dealing with data volumes that make Pandas code crawl. As if to prove that there are plenty of libraries for various forms of data analytics, data science, artificial intelligence and machine learning, there also are the likes of Keras, TensorFlow and NetworkX. These are just a selection of what is available, and there is always the possibility of checking out others. It may be tempting to stick with the most popular packages all the time, especially when they do so much, but it never hurts to keep an open mind either.