Blessed be the Population

Today we will bend our minds around the concept and benefits of Population Calling. Given recent events this is an obvious choice, and not just because the concept has great potential in health care: having just released a product in this space, I am immersed in the subject, and that more or less defines my mindset. We use the term Population Calling; some call it Joint Calling, others Pooled Calling.

In my current setting I could easily vote for the latter, as I am spending some leisure time at a pool in Turkey. I will resist the temptation to elaborate on the vocation of a lifeguard (even though we could craft some strong analogies) and focus on the serious side: two models for NGS data processing, their use cases, and what is required to make Population Calling scalable, usable and widely deployed.

Quality improvement methods

Population Calling is an umbrella term for submitting multiple samples to a single variant calling run and using the aggregate information of all samples to improve sensitivity and precision for (a) each sample in the group or (b) the group treated as a single entity.

When focusing on single-sample results (a), aggregated group information, both positive and negative, is applied so that calling is more sensitive to variants that appear with high frequency in the sample group and less sensitive to variants that appear with low frequency. Different consensus-based call improvement strategies may be applied to different cases, including strategies to identify and suppress systematic errors stemming from library prep, sequencing, mapping or variant calling.
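To make the idea concrete, here is a minimal sketch of one way group frequency could shift a single-sample call. It is an illustration under stated assumptions, not a description of any particular product or caller: the function name, the use of the cohort frequency as a Bayesian prior, and the likelihood-ratio input are all hypothetical.

```python
def posterior_variant(likelihood_ratio, cohort_frequency):
    """Posterior probability that one sample carries a variant.

    likelihood_ratio: P(reads | variant) / P(reads | no variant) for this
        sample alone (hypothetical input from a single-sample model).
    cohort_frequency: fraction of samples in the group showing the variant,
        used here as the prior (an assumption made for illustration).
    """
    # Keep the prior away from 0 and 1 so the odds stay finite.
    prior = min(max(cohort_frequency, 1e-6), 1 - 1e-6)
    prior_odds = prior / (1 - prior)
    # Bayes' rule in odds form: posterior odds = likelihood ratio * prior odds.
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1 + posterior_odds)
```

With the same read evidence (say a likelihood ratio of 5), a variant seen in 40% of the group ends up with a posterior above one half, while a variant seen in 1% of the group stays well below it. That is the "more sensitive to frequent, less sensitive to rare" behaviour in its simplest form.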

When focusing on the group as a single entity (b), shallow-coverage samples are ‘stacked’ together as if they were a single sample. This highlights calls in the ‘group’ for which no individual sample has sufficient ‘body’ to produce a reliable call. We refer to this special method of Population Calling as Stacked Calling.
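The stacking idea can be sketched in a few lines. This is a toy illustration only: the per-site (reference, alternate) read counts, the function names and the depth and allele-fraction thresholds are assumptions chosen for the example, and a real caller would work with genotype likelihoods rather than raw count cutoffs.

```python
from collections import defaultdict


def stack_counts(samples):
    """Sum per-site (ref, alt) read counts over all samples in the group."""
    pooled = defaultdict(lambda: [0, 0])
    for sample in samples:
        for site, (ref, alt) in sample.items():
            pooled[site][0] += ref
            pooled[site][1] += alt
    return {site: tuple(counts) for site, counts in pooled.items()}


def call_sites(counts, min_depth=20, min_alt_fraction=0.2):
    """Return sites with enough depth and a high enough alternate fraction.

    Both thresholds are hypothetical values picked for this sketch.
    """
    called = []
    for site, (ref, alt) in counts.items():
        depth = ref + alt
        if depth >= min_depth and alt / depth >= min_alt_fraction:
            called.append(site)
    return sorted(called)
```

Ten samples that each cover a site with only two reference reads and one alternate read produce no call on their own, because a depth of 3 is far below the threshold. Stacked together they give 20 reference and 10 alternate reads, which is enough for the group-level call.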

Stacked Calling is beneficial when the goal is insight into the profile of the ‘group’ and the individual result is not of primary interest. For example, in demographic or homogeneous cohort studies, or when a large number of samples are taken from a harvest or breeding line in the AgBio industry, Stacked Calling will provide a high-quality answer at significantly reduced cost and time to result.

Genetic overview

In addition to improving quality, Population Calling provides an overview of the genetic profile of a ‘population’. This is of specific use for association studies that compare the genomic profiles of different groups using exome, full genome or target data.

In the popular sense, populations are thought of as very large numbers of samples. Nationwide studies quoting 100,000 samples are well covered in the press. In phase 3 clinical testing, the studied group ranges from 400 to 800 samples. A larger-scale study may include 5,000 to 10,000 patient samples, and when clinics start producing data from genetic tests, cohorts of over 100,000 patient samples can easily be generated over time. Any growing group in this context could at some point qualify for further categorization, leading to more focus groups.

Benefits of Population Calling

Be aware, though, that cohort studies in the range of 50 to 100 samples will also benefit from Population Calling, as we have seen with Illumina’s 17-sample Platinum Genomes pedigree set (without injecting pedigree knowledge).

Note that Population Calling applies all the functionality that is ‘common practice’ in single-sample variant calling: noise reduction, INDEL optimization for short reads, context examination and filtering. Combining the power of many samples adds a true big data challenge on top of today’s common practice, which already struggles with the time and resources required for single-sample variant calling.

Time to change

As a result, Population Calling has to date been the preserve of the happy few who have the data, knowledge, resources, budget, stamina and patience to wait for the results. Cases are known of large clusters being run and nurtured for months before results are produced. This must change, as there is too much value in the concept to leave it where it is.

We do not have enough space this time to go into the depth of the mechanisms required to make Population Calling work in a dynamic environment, and efficiently enough to allow both researchers and clinicians to get maximum benefit from the method. For now, I will direct you to our website and recent online event, which show the power of Population Calling at a price point that makes it a viable option for everybody. And again, don’t ask me how. We end with our motto of economical health in a healthy economy. Blessed be the Population!