Technology Tales

Adventures in consumer and enterprise technology

TOPIC: COMPUTER DATA

Synthetic Data: The key to unlocking AI's potential in healthcare

18th July 2025

The integration of artificial intelligence into healthcare is being hindered by data scarcity, privacy concerns and regulatory constraints. Healthcare organisations struggle to obtain sufficient volumes of high-quality, real-world data to train AI models that can accurately predict outcomes or assist in decision-making.

Synthetic data, meaning algorithmically generated data that mimics real-world data, is emerging as a solution to these challenges. Because it mirrors the statistical properties of real data without containing any sensitive or identifiable information, it allows organisations to sidestep privacy issues while adhering to regulatory requirements.

By generating datasets that preserve statistical relationships and distributions found in real data, synthetic data enables healthcare organisations to train AI models with rich datasets while ensuring sensitive information remains secure. The use of synthetic data can also help address bias and ensure fairness in AI systems by enabling the creation of balanced training sets and allowing for the evaluation of model outputs across different demographic groups.

Furthermore, synthetic data can be generated programmatically, reducing the time spent on data collection and processing and enabling organisations to scale their AI initiatives more efficiently. Ultimately, synthetic data is becoming a critical asset in the development of AI in healthcare, enabling faster development cycles, improving outcomes and driving innovation while maintaining trust and security.

Copying only new or updated files by command line in Linux or Windows

2nd August 2014

With a growing collection of photographic images, I often find myself making backups using copy commands, and the data volumes are such that I don't want to keep copying the same files over and over again; incremental file transfers are what I need. So, commands like the following often get issued from a Linux command line:

cp -pruv [source] [destination]

This is GNU cp as found on Linux, so the switches may not all apply to other implementations, such as those on BSD or macOS. In my case, -p preserves file properties such as timestamps, which cp does not do by default, so it needs adding. The -r switch is useful because it makes the copy recursive, so only a directory needs to be specified as the source; the destination should then be the parent of where the copied folder is to go, to avoid creating a nested duplicate. It is the -u switch that makes the copy incremental, skipping any file whose destination copy is at least as new as the source, and -v issues messages to the shell that show how the copying is going. Seeing file names scroll past tells you how much remains to be copied and that the files are going where they should.
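As a worked illustration of the incremental behaviour, the following sketch (the /tmp/cp_demo paths are made up purely for the demonstration) runs the same command three times:

```shell
#!/bin/sh
# Set up a throwaway source directory and a backup destination.
mkdir -p /tmp/cp_demo/source /tmp/cp_demo/backup
echo "first" > /tmp/cp_demo/source/one.txt

# First run copies everything under source/ into backup/source/.
cp -pruv /tmp/cp_demo/source /tmp/cp_demo/backup

# A second run with no changes copies nothing, because -u skips
# files whose destination copy is at least as new as the source.
cp -pruv /tmp/cp_demo/source /tmp/cp_demo/backup

# Changing a source file makes the next run copy just that file.
sleep 1  # ensure a newer timestamp even on coarse-grained filesystems
echo "second" > /tmp/cp_demo/source/one.txt
cp -pruv /tmp/cp_demo/source /tmp/cp_demo/backup
cat /tmp/cp_demo/backup/source/one.txt  # prints: second
```

Note that the destination given is the parent directory, so the copy lands at backup/source rather than nesting a second source folder inside an existing one.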

What inspired this post, though, is my need to do the same in a Windows session, and issuing xcopy commands will achieve the same end. Here are two that will do the needful:

xcopy [source] [destination] /d /s

xcopy [source] [destination] /d /e

In both cases, it is the /d switch that makes the copy incremental, and you can also append a date, separated from /d by a colon (in month-day-year form), to copy only files changed on or after that date. The /s switch copies subdirectories that contain files, while /e copies empty subdirectories as well. Using /d without either of those did not trigger any copying action when I tried, so I reckon that you cannot do without one of them. By default, both of these commands write their progress to the command line so you can keep an eye on what is happening, and this is especially useful when checking that files are going to the right destination, because the behaviour differs from that of cp on Linux.
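For instance, to copy only files changed on or after a given date (the date here is purely illustrative), the command would look like this:

```batch
rem Copy files modified on or after 1st August 2014 (month-day-year),
rem including subdirectories that contain files.
xcopy [source] [destination] /d:08-01-2014 /s
```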

Using Data Step to Create a Dataset Template from a Dataset in SAS

23rd November 2010

Recently, I wanted to make sure that some temporary datasets created during processing in a dataset creation program weren't truncating values or using variable lengths that differed from those in the original. It was then that a brainwave struck me: create an empty dataset shell using a data step, and use that to set all the variable lengths for me when the new datasets were concatenated to it. The code turned out to be very simple, and here is an example of how it looked:

data shell;
    stop;
    set example;
run;

The STOP statement prevents the data step from reading any observations from the template dataset, so only its header, the variable definitions, is written out to the new, empty dataset, which can then be used to set things up as you would want them to be. It certainly was a quick solution in my case.
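To show how the shell might then be used (the dataset names here are hypothetical), concatenating new datasets onto it makes their variables inherit its attributes, because in a SET statement a variable takes its attributes from the first listed dataset that contains it:

```sas
data combined;
    /* shell contributes no rows, only its variable definitions */
    set shell new_rows;
run;
```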
