16:58, 15th June 2021
Error Bar Plot in R - Adding Error Bars
Error bars are graphical tools used in data visualisation to represent the variability or uncertainty within a dataset, commonly expressing one standard deviation, one standard error or a 95% confidence interval. A short standard deviation bar indicates that data points cluster closely around the mean, whilst a long one signals a wider spread. Standard error and confidence interval bars instead show how precisely the mean itself has been estimated. Overlapping standard deviation bars may hint that differences between groups are not statistically significant, whereas non-overlapping bars suggest a potentially significant difference, though a formal statistical test is always required before drawing any firm conclusion.
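For reference, the quantities most often drawn as error bars can be written as follows. This assumes a sample x_1, …, x_n, the usual (n − 1) sample standard deviation and a t-based 95% interval. Conventions vary between tools and papers, so this is a sketch of the common definitions rather than the only ones.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
\qquad
\mathrm{SE} = \frac{s}{\sqrt{n}}
\qquad
\text{95\% CI} = \bar{x} \pm t_{0.975,\,n-1}\,\mathrm{SE}
```

As a worked example, take s = 2 and n = 16. The standard error is 2/4 = 0.5, and with t_{0.975,15} ≈ 2.131 the 95% confidence interval is roughly x̄ ± 1.07. A standard deviation bar for the same data would span x̄ ± 2. The choice of quantity therefore changes the bar length considerably, which is why a plot's caption needs to state which one is shown.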
In R, the ggplot2 package provides a straightforward means of creating error bar plots. The data are first summarised to calculate means and standard deviations, using functions such as ddply (from the plyr package) or base R's aggregate. From there, bar charts and line graphs can each be enhanced with error bars using the geom_errorbar function. The bars can be drawn either as full symmetrical ranges around the mean or as upper bars only, depending on the analytical need.
16:49, 11th June 2021
Problem Note 41684: RTF output appears truncated when a very long text string spans multiple pages
A known issue in Base SAS affects RTF output when the content within a single table cell is long enough to span multiple pages. Microsoft Word then displays only the first page, even though the full content is present in the underlying file. The problem was resolved in SAS 9.4M4 with the introduction of the NOTRKEEP option in the ODS RTF statement. For earlier versions, two workarounds are available: using the MSOFFICE2K tagset to generate the RTF output, or employing a data step to split the long content into multiple observations, as demonstrated in Sample 24672.
10:53, 11th June 2021
Adding a Column to a Pandas DataFrame Based on an If-Else Condition
A dataset of over 4,000 tweets was analysed using Python to determine whether posts containing images receive more likes and retweets than those without. Using NumPy's where() function, a new Boolean column was added to a Pandas DataFrame to flag whether each tweet contained an image, and the results indicated that image-based tweets averaged nearly three times as many likes and retweets as those without images.
To explore the data further, NumPy's select() function was applied to categorise tweets into four engagement tiers based on like counts, revealing that while images appeared to improve performance, they were not a guarantee of success, with over 83% of the highest-performing tweets still having no image attached. The broader technical takeaway is that both np.where() and np.select() offer straightforward and practical methods for adding new columns to a Pandas DataFrame based on conditional logic applied to existing data.
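A minimal sketch of both techniques follows. It assumes pandas and NumPy are installed. The DataFrame, its column names (likes, retweets, photo_url) and the tier thresholds are hypothetical stand-ins, not the data or cut-offs used in the original analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the tweet data (not the original dataset)
df = pd.DataFrame({
    "likes": [3, 45, 120, 0, 800],
    "retweets": [0, 5, 30, 0, 150],
    "photo_url": [None, "a.jpg", None, None, "b.jpg"],
})

# np.where(condition, value_if_true, value_if_false) builds a Boolean flag column
df["has_image"] = np.where(df["photo_url"].isnull(), False, True)

# np.select picks, row by row, the choice for the first condition that is True
conditions = [
    df["likes"] == 0,
    (df["likes"] > 0) & (df["likes"] <= 50),
    (df["likes"] > 50) & (df["likes"] <= 500),
    df["likes"] > 500,
]
tiers = ["tier_4", "tier_3", "tier_2", "tier_1"]  # tier_1 = highest engagement (hypothetical cut-offs)
df["likes_tier"] = np.select(conditions, tiers, default="unknown")

# Average engagement for tweets with and without an image
summary = df.groupby("has_image")[["likes", "retweets"]].mean()
print(df)
print(summary)
```

np.where suits a single yes/no split. np.select scales to any number of ordered conditions, with default covering rows that match none of them.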
11:37, 9th June 2021
5 Tasks To Automate With Python
Python offers a range of practical automation capabilities that can save time and reduce repetitive effort in daily workflows. A Mac-based script using the mac-say library can convert any file into an audiobook, while a simple requests-based script can retrieve weather reports for any city on demand. Currency conversion can be handled just as easily through the currencyconverter library, which allows quick conversions between currencies directly from the command line. For those who struggle with a disorganised folder of downloads, a watchdog-based script can monitor a specified directory and automatically sort incoming files into subfolders based on their type, covering images, PDFs, videos and audio files. Finally, a morning setup script using Python's built-in webbrowser module can open a predefined set of browser tabs automatically, removing the need to do so manually each day. Together, these examples illustrate how Python can be applied to everyday tasks with relatively little code and a handful of third-party libraries.
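The sorting step of the downloads-folder script can be sketched with the standard library alone. This is a minimal sketch that assumes a hypothetical extension-to-folder mapping. The watchdog part, which would trigger sorting whenever a new file appears, is deliberately left out, so this function simply sorts whatever is in the folder when it is called.

```python
import shutil
from pathlib import Path
from typing import List, Optional

# Hypothetical mapping of subfolder name -> file extensions it should receive
CATEGORIES = {
    "images": {".jpg", ".jpeg", ".png", ".gif"},
    "pdfs": {".pdf"},
    "videos": {".mp4", ".mov", ".mkv"},
    "audio": {".mp3", ".wav", ".flac"},
}


def category_for(path: Path) -> Optional[str]:
    """Return the destination subfolder for a file, or None if it is not handled."""
    suffix = path.suffix.lower()
    for folder, extensions in CATEGORIES.items():
        if suffix in extensions:
            return folder
    return None


def sort_directory(directory: Path) -> List[Path]:
    """Move recognised files in `directory` into typed subfolders; return their new paths."""
    moved = []
    # Snapshot the listing first so that moving files does not disturb the iteration
    for item in list(directory.iterdir()):
        if not item.is_file():
            continue  # skip the category subfolders and any other directories
        folder = category_for(item)
        if folder is None:
            continue  # leave unrecognised files where they are
        target_dir = directory / folder
        target_dir.mkdir(exist_ok=True)
        moved.append(Path(shutil.move(str(item), str(target_dir / item.name))))
    return moved
```

A call such as sort_directory(Path.home() / "Downloads") would tidy that folder once; the path is only an example. In a watchdog-based version, the same function would instead be called from an event handler when a file is created.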
13:57, 3rd June 2021
Usage Note 64615: The SAS log displays the error "Invalid JSON in input near line XXX column XXX: Some code points did not transcode"
When using a LIBNAME statement with the JSON engine in SAS, an error may appear in the log stating that invalid JSON was encountered and that some code points did not transcode. This occurs because UTF-8 characters in the data cannot be represented in the default SAS session encoding. The recommended fix is to start SAS in Unicode mode: on Windows, open the Start menu and select the Unicode Support option under SAS 9.4, so that the session runs with UTF-8 encoding.
09:10, 3rd June 2021
Working With JSON Data in Python
Handling data in modern applications often involves working with structured formats like JSON, which is widely used for transferring information between systems and for storing data in document-oriented databases. JSON is built from key-value pairs and arrays that can be nested, and Python's built-in json module converts between Python objects and JSON strings. Care is needed because the type systems differ. Tuples are written as JSON arrays and read back as lists, and Python-specific types such as sets have no JSON equivalent at all.
Writing and reading JSON files is straightforward with json.dump() and json.load(), which persist and retrieve both simple and deeply nested structures. Validating JSON syntax, for example by catching json.JSONDecodeError when parsing, helps keep data consistent and error-free, which is crucial when it comes from external sources. Pretty-printing JSON in the terminal improves readability, while minifying JSON strips optional whitespace to reduce its size for storage or transmission.
These practices are particularly useful when working with APIs or managing large datasets. They apply whether developers are building web applications, processing data or interacting with external services, from backend systems to data science workflows.
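The sketch below shows the round trip, type differences, validation, pretty-printing and minifying described above, using only the standard json module. The sample record, the file name and the is_valid_json helper are illustrative choices, not part of any library.

```python
import json
import tempfile
from pathlib import Path

# Illustrative record using JSON-compatible Python types
record = {"name": "Ada", "languages": ["Python", "R"], "active": True, "score": None}

# Write the record to a file and read it back
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "record.json"
    with path.open("w", encoding="utf-8") as f:
        json.dump(record, f)
    with path.open(encoding="utf-8") as f:
        loaded = json.load(f)

# Type differences: a tuple becomes a JSON array, a set cannot be serialised
as_array = json.dumps((1, 2))
try:
    json.dumps({1, 2})
except TypeError as exc:
    print("Not serialisable:", exc)


def is_valid_json(text: str) -> bool:
    """Hypothetical helper: True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


pretty = json.dumps(record, indent=4)                 # human-readable layout
minified = json.dumps(record, separators=(",", ":"))  # no optional whitespace
print(pretty)
print(minified)
```

For files on disk, running python -m json.tool record.json in a terminal pretty-prints the content in a similar way without writing any code.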
09:10, 3rd June 2021
Python's json.dumps() function serialises Python objects such as dictionaries, lists and strings into JSON-formatted strings, making it useful for tasks like sending data through APIs or storing structured data. Its optional parameters control the output:

- indent formats the output with readable spacing.
- sort_keys arranges dictionary keys alphabetically.
- skipkeys, when set to True, skips keys of unsupported types such as tuples instead of raising a TypeError.
- ensure_ascii, True by default, escapes non-ASCII characters.
- allow_nan, True by default, permits NaN and infinity values; when set to False, it raises a ValueError for them.
- separators customises the strings placed between items and between keys and values.

The function always returns a string object. A Python dictionary passed through it produces a string representation rather than a dictionary, and lists likewise become strings containing JSON arrays.
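The behaviour of each parameter can be checked with a few calls. The inputs below are arbitrary examples, and the commented results are what the standard library returns for them.

```python
import json

data = {"b": 1, "a": [1, 2]}

# indent and sort_keys: readable output with keys in alphabetical order
pretty = json.dumps(data, indent=2, sort_keys=True)

# skipkeys: a tuple key raises TypeError unless skipkeys=True drops it
mixed_keys = {(1, 2): "tuple key", "ok": 1}
try:
    json.dumps(mixed_keys)
except TypeError:
    print("tuple keys are rejected by default")
skipped = json.dumps(mixed_keys, skipkeys=True)  # '{"ok": 1}'

# ensure_ascii: non-ASCII characters are escaped unless it is set to False
escaped = json.dumps("café")                        # '"caf\\u00e9"'
unescaped = json.dumps("café", ensure_ascii=False)  # '"café"'

# allow_nan: NaN is written as the non-standard token NaN by default
nan_text = json.dumps(float("nan"))  # 'NaN'
try:
    json.dumps(float("nan"), allow_nan=False)
except ValueError:
    print("NaN rejected when allow_nan=False")

# separators: (item_separator, key_separator); these remove all spaces
compact = json.dumps(data, separators=(",", ":"))  # '{"b":1,"a":[1,2]}'

# The result is always a str, including for dicts and lists
print(pretty)
print(type(json.dumps(data)).__name__)  # str
print(json.dumps([1, "two", None]))     # [1, "two", null]
```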
09:09, 3rd June 2021
Writing to a File with Python's print() Function
Python's print() function can write output to files instead of the console by redirecting the standard output stream. This can be done from the command line with shell redirection or within a script using the sys module. Temporarily assigning a file object to sys.stdout sends printed text to that file, and the original stream should be restored afterwards to avoid unintended side effects. The standard error stream can be redirected in the same way for error messages. Alternatively, the file parameter of print() specifies an output destination for a single call without altering the global stdout setting, allowing flexible and targeted file writing within Python programs.
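Both in-script approaches can be sketched as follows. The temporary output directory and file names are placeholders. For the command-line route, standard shell redirection such as python script.py > output.txt achieves the same without changing the script; adding 2> errors.txt does the same for standard error.

```python
import sys
import tempfile
from pathlib import Path

# Placeholder location for the output files
out_dir = Path(tempfile.mkdtemp())

# 1. The file parameter targets one print() call without touching sys.stdout
with open(out_dir / "direct.txt", "w", encoding="utf-8") as f:
    print("Written via the file parameter", file=f)

# 2. Temporarily rebind sys.stdout, restoring it even if an error occurs
original_stdout = sys.stdout
with open(out_dir / "redirected.txt", "w", encoding="utf-8") as f:
    sys.stdout = f
    try:
        print("Written via sys.stdout")
        print("Every print() in this block goes to the file")
    finally:
        sys.stdout = original_stdout

# 3. The same save-and-restore pattern for the standard error stream
original_stderr = sys.stderr
with open(out_dir / "errors.txt", "w", encoding="utf-8") as f:
    sys.stderr = f
    try:
        print("Something went wrong", file=sys.stderr)
    finally:
        sys.stderr = original_stderr

print("Back on the console; files written to", out_dir)
```

The standard library's contextlib.redirect_stdout and contextlib.redirect_stderr context managers package the save-and-restore pattern from steps 2 and 3, if a more compact form is preferred.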
11:27, 28th May 2021
Programiz is a programming education platform used by around 4.9 million people each month, with a mission to make learning to code more accessible and straightforward. Operated by a small team of dedicated developers, it offers a broad range of free tutorials covering languages and technologies such as Python, Java, C, C++, JavaScript, SQL, HTML, CSS, TypeScript, Kotlin, Swift, Rust, Ruby and Go, alongside paid courses for those seeking more structured learning. The platform also provides online compilers and editors for a wide variety of programming languages, allowing users to write and run code directly in their browser. Recognising that written content alone is not sufficient for an effective learning experience, Programiz has extended its offering to include mobile applications for iOS and Android, covering languages such as Python, C, Java and C++.
11:25, 28th May 2021
fpdf2 is a Python library designed for simple and fast PDF document generation, forked from the earlier PyFPDF project. It supports Python 3.10 and above, offering a broad range of features including Unicode font embedding for a wide variety of languages, image embedding, SVG import, barcode and chart generation, table creation, HTML to PDF conversion and document encryption and signing. The library integrates with popular frameworks such as Django, Flask and FastAPI, and has been adopted by several open-source projects. It is available via PyPI and maintained by a community of contributors, with more than 1,300 unit tests and validation through multiple PDF checkers ensuring reliability across releases.