Discussing the minimum effective dose of biology a biostatistician should comprehend to collaborate with life scientists (By Romain-Daniel Gosselin, 6-7 min read)
I am among those who have dedicated a large part of their activity to educating life scientists in biostatistics 101 and good statistical practices. I have been doing it since 2014. Yet, although improving statistical literacy among my fellow biologists is important, the question we will ask today is the exact converse: how much biology should a biostatistician know? In the present post, I gather insights from informal discussions I have had with colleagues over the years, from exchanges I stumbled upon on the Internet, and from my own opinions.
Is the question relevant in the first place?
At first glance, the mathematical science mastered by statisticians seems so universal that it may appear strange to require them to acquire any biology when they work as biostatisticians. After all, who cares how much biology a biostatistician knows? Data are data, and it is the experimenters in need of guidance who will have to adapt their design or analysis to the statistical truth, right?
Let’s be clear: biostatisticians do not strictly need any training in biology to perform daily biostatistics tasks. They can get by applying the probability, mathematical statistics, and programming they learned, adapting on a case-by-case basis. That is the position taken by the proponents of a hard "No Biology" line among them. However, the question is debated and the collective answer is far from unanimous. So, what could be the purpose of knowing some biology? I will only mention three potential motivations:
- Smooth communication with researchers. By understanding the science behind the research design, the collected data, and the analyses, you will communicate more effectively, work more comfortably, and build trust and credibility.
- Improve work quality. Although you may deliver biologically agnostic assistance, treating data as data, some background knowledge of the biological systems or techniques involved helps you work more accurately and deliver tailored, more relevant comments.
- Adopt a proactive approach. Some biology will equip you to anticipate the methodological issues likely to become relevant later in the project, based on your understanding of its current details.
OK, fair enough, but how much biology? Short answer: it depends…
…on the type of professional partnership you have.
Depending on whether you are an external contractor, a consultant, or an in-house biostatistician, the expected level of familiarity with the organisation’s biology may differ. Contractors and consultants, unless they specialize in particular fields (e.g., clinical trials, omics), may remain somewhat unfamiliar with the core biology behind the mandated work. In-house biostatisticians, on the other hand, are usually expected to be more familiar with the research domain.
…on the domain you get to collaborate with.
The depth of knowledge required depends on the field of the laboratories you collaborate with, because statistical familiarity differs from one life science community to another. Your biology expertise may be less critical when interacting with quantitative biologists or bioinformaticians, who have already worked out most of the upstream design and programming framework and ask very specific questions, than with less quantitative biologists asking for a complete design and analytic workflow. The flip side is that the apparently advanced literacy of some life scientists may mask questionable statistical habits perpetuated for decades in their field (e.g., over-reliance on specific tests, misconceptions about test assumptions, inappropriate use of specific regression models, or suboptimal correction for multiple testing). Proposing adequate and convincing alternatives to these entrenched traditions may require precisely a more advanced conceptual and technical grasp of biology.
…on who you work with exactly.
Regardless of the overall statistical literacy of the field you enter, the individuals you work with daily will have their own level of statistical knowledge. Even when working with a lab from a very quantitative field, you may interact directly with a biologist who has only entry-level statistics and asks you for guidance from A to Z. In that situation, your biology knowledge will boost your efficiency and save you a lot of time.
…on what you are hired for.
Being hired by a researcher to review a series of grant applications or protocols, in order to identify weaknesses in design or suggest improvements in planned analyses, requires less knowledge of the underlying biology and lab techniques than long-term involvement in a complex, multivariable, longitudinal, and computationally heavy project.
Get enough to understand the jargon.
There is a significant (to say the least) language gap between biologists and mathematicians, which stands out a mile whenever members of both professions are stuck in the same room. If you are a statistician, you have certainly felt it when terms such as model fitting, residuals, maximum likelihood, or stochastic superiority clearly failed to ring any bell with your interlocutor. An entry-level command of the fundamental jargon used in life science laboratories will help you process information more quickly, efficiently, and professionally.
The biology lingo.
Beyond any niche knowledge tied to specific laboratories, some familiarity with a short list of foundational concepts is key to working with labs across biological domains. Aim for an introductory grasp of the items listed in Table 1. For example, prepare a one-page summary of each concept, roughly equivalent to a concise A-level or first-year undergraduate overview, and consult it as needed.

Once again, the idea is not to learn as much as you can, but to acquire enough beginner literacy to work with biologists. Think about what you yourself expect from biomedical researchers during your collaborations. You do not expect a random biologist to master the difference between generalized estimating equations and generalized linear models, or to know the method of moments for estimation. Likewise, no serious life scientist will expect you to know which protein domains of their favorite transcription factors are phosphorylated and interact to bind a specific DNA sequence, or to know the molecular pathophysiology of hypovolemic shock. If they do, you had better run away.
The laboratory lingo.
In the laboratory, a specific jargon is used to describe the research done. To make things even more confusing, not only is this vocabulary only partially shared across life science disciplines, but it also often differs from the one statisticians use. For example, independent or dependent variables, levels, units, blocks, and contrasts are terms virtually never heard in many biology settings. Even worse, some of these terms have a double meaning in life science and statistics. For instance, the term "sample" does not necessarily refer to a group of randomized units, but may designate a single vial, swab, or collected observation, in other words a sampling or experimental unit. In that specific case, one can understand why life scientists easily mix up the distribution of observations and the sampling distribution, laying the ground for many misconceptions. Another example is the trio probability, risk, and likelihood, which are typically used interchangeably in plain English, including at the bench, but take on distinct meanings in statistics. You will have to become accustomed to the terms used in the lab, translate them into practically usable statistical concepts, and clarify them with your collaborators whenever needed.
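To make the distinction between the distribution of observations and the sampling distribution concrete, here is a minimal Python sketch with entirely hypothetical values: a measurement per vial assumed normally distributed with mean 10 and standard deviation 2, and a statistical sample assumed to consist of five vials. The constants and the helper `draw_sample` are illustrative only and come from no real experiment.

```python
import random
import statistics

# Hypothetical setting: one measurement per vial (e.g. a protein level),
# assumed normally distributed with mean 10 and SD 2. Illustrative values only.
POP_MEAN, POP_SD = 10.0, 2.0
N_PER_SAMPLE = 5      # vials in one statistical "sample" (assumed)
N_REPEATS = 10_000    # number of simulated repeated experiments

rng = random.Random(42)  # fixed seed so the simulation is reproducible


def draw_sample(n):
    """One statistical sample: n independent observations (n vials)."""
    return [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]


# Distribution of observations: the spread of individual vial values.
observations = [x for _ in range(N_REPEATS) for x in draw_sample(N_PER_SAMPLE)]
sd_observations = statistics.stdev(observations)

# Sampling distribution of the mean: the spread of sample means
# across repeated experiments, each using N_PER_SAMPLE vials.
sample_means = [statistics.fmean(draw_sample(N_PER_SAMPLE)) for _ in range(N_REPEATS)]
sd_of_means = statistics.stdev(sample_means)

print(f"SD of individual observations: {sd_observations:.2f}")  # close to 2
print(f"SD of sample means (SEM):      {sd_of_means:.2f}")      # close to 2/sqrt(5), about 0.89
```

The first figure describes variability between vials, whereas the second describes how precisely the mean of five vials estimates the population mean. Calling a single vial "a sample" blurs exactly this distinction.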
Advanced knowledge if you get specialized.
If you get more specialized, either because you identify a niche you wish to excel in or because you work in a particular institute, you will clearly benefit from acquiring a more advanced literacy in the associated concepts. Of course, if you are hired in such an environment, the learning process may occur passively through meetings, discussions, seminars, or exploring the data. However, the learning curve may be steep, and you would be well advised to actively acquire some specific notions related to this research ecosystem, comparable to the courses taught to life science students at the bachelor level. Table 2 gives a list (of course non-exhaustive) of some examples.

Concluding advice: understand what “research” entails in your life-science sub-field.
In data frames and spreadsheets, numbers look like numbers, whether they are in vitro protein quantifications or calcium recordings from mouse cerebral cortices. However, before you grumble about limited sample sizes or high variance and suggest an exhaustive redesign, it is useful to know that the former will likely take days or perhaps weeks to be re-collected, while the latter may require months of animal surgery or may even be impossible to redo. Along the same lines, wet lab or field research can be very resource-intensive, both financially and in terms of workload. Even a handful of new experiments sometimes costs tens of thousands of CHF/EUR/USD in reagents (salaries not included!), plus weeks spent in the lab, evenings and weekends included. Even the term “experiment” itself has no fixed definition: depending on the experimenter, technique, or field, it ranges from a single cell-labeling procedure to a multi-technique, multivariable factorial design spanning months. Finally, you should have a good grasp of the ethical stakes tied to your field of research, such as the 3Rs (Replacement, Reduction, Refinement) in animal research, or the ethics of research involving human participants.
Your ability to integrate concrete procedural and logistical knowledge of life science research into your biostatistics consulting will make your recommendations and analyses more relevant and more convincing. It may even enable you to anticipate issues that could arise later in the project.

Images created with ChatGPT, text 100% written by RDG

