The Crucial Role of Release Control in R for Healthcare Organizations

Author

Guest Blog Post

Published

June 24, 2024

Guest blog contributed by Ning Leng, People and Product Leader, Roche-Genentech; Eric Nantz, Director, Eli Lilly and Company; Ben Straub, Principal Programmer, GSK; Sam Parmar, Statistical Data Scientist, Pfizer

Supporting the science of drug development requires computational tools with careful implementations of core statistical functions and data structures. The R programming language, a general-purpose language developed by statisticians that grows dynamically through the contributions of a worldwide community of developers, is a common choice for serious statistical work. However, many development groups depend on the core R language and on hundreds of specialized libraries (called packages in R). Managing new versions of all of these in a way that ensures the consistency, reproducibility, and reliability of results poses many practical challenges.

The FDA, for example, requires that the software and tools supporting a clinical trial submission be capable of producing reproducible results for an extended period of time. This means submitting code based on a version of R that is sufficiently tested and stable, yet new enough to support the critical R packages over the required FDA time horizon.

So, how are R environment releases managed across different healthcare organizations? We interviewed individuals from different pharma companies to learn about their internal approaches to keeping their R environments up to date and secure.

Here is what we found: four companies and four somewhat complex bespoke solutions. It seems likely that if we interviewed a hundred representatives from a hundred different companies, we would get at least a hundred different solutions. It is also not difficult to imagine that multiple protocols for managing R and package versions impose a fairly complex project management burden on the FDA as it simultaneously deals with submissions from multiple sponsors.

In the R Consortium’s R Submissions Work Group meeting we have been discussing whether a simple approach to the R versioning problem, at least, might serve as a de facto standard for the industry. One suggestion that has gained some traction is that sponsors use the latest patch release of the previous minor version of R for a submission. For example, if R version 4.4.0 is currently available, a sponsor would use the latest patch of the 4.3 series (4.3.z). If R version 4.5.0 becomes available, a sponsor would use the latest patch of the 4.4 series (4.4.z). This ensures that the minor version is stable and most likely available to all stakeholders. Of course, if a newer version eliminates a security problem, that version might be preferred. (Note that R versions are written R x.y.z, where x is the major version, y is the minor version, and z is the patch version.)
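The suggested rule is mechanical enough to sketch in a few lines. The following is a minimal illustration in Python, not a tool used by the working group or by any of the companies interviewed. The function names `parse_r_version` and `suggested_submission_version` are hypothetical, and the list of released versions is sample input that a real script would have to obtain from an authoritative source.

```python
import re


def parse_r_version(text):
    """Split an 'x.y.z' string into a (major, minor, patch) tuple of ints."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"not an x.y.z version: {text!r}")
    return tuple(int(part) for part in match.groups())


def suggested_submission_version(released):
    """Return the latest patch of the minor series just before the newest one.

    `released` is a list of 'x.y.z' strings (illustrative input).
    """
    versions = sorted({parse_r_version(v) for v in released})
    if not versions:
        raise ValueError("no released versions given")
    newest_series = versions[-1][:2]  # (major, minor) of the newest release
    # Keep only releases from series older than the newest one.
    older = [v for v in versions if v[:2] < newest_series]
    if not older:
        raise ValueError("no earlier minor series in the list")
    # The list is sorted, so the last entry is the highest patch
    # of the previous minor series.
    return ".".join(str(part) for part in older[-1])


# Sample input only: with 4.4.0 as the newest release, the rule picks 4.3.3.
sample = ["4.2.3", "4.3.0", "4.3.1", "4.3.2", "4.3.3", "4.4.0"]
print(suggested_submission_version(sample))  # prints 4.3.3
```

Comparing (major, minor) pairs rather than subtracting one from the minor number means the sketch also behaves sensibly across a major version boundary, where the previous series has a different major number. It deliberately ignores the security exception mentioned above, which calls for human judgment.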

We would love to hear what you think. Please go to issue number 117 on the GitHub repository of our working group and leave a comment.