The Quality of Speed

In our first blog post “Big Data: fear and challenge” we wrote about the habit of storing data with very detailed metadata. It is rooted in the hesitation to throw away anything that might be of use in the future, whether for in-depth analysis at a later stage, compliance, legal defense or simple inspection/reference.

These are all valid reasons, but they come at a cost, and that cost needs to be balanced against a benefit that goes beyond emotion. As we pointed out in our earlier post, it boils down to which choices are made and which responsibilities we, as individuals or as a community, are willing to carry.

Speed or quality?

Today’s topic carries a seemingly similar theme: a reluctance to move to high-speed methods for data processing, caused by a perceived implicit choice between speed and quality. This rests on the hypothesis that thorough data processing takes time, and that a dramatic speed increase can only come from a shallower analysis.

Let’s take a deeper look at the topic. Over time, we have been presented with new technologies that, even though they were met with resistance at first, changed the shape of our society completely.

Each and every one of these technological changes carries at least two elements: a speed increase and a size reduction. Consider the Personal Computer, video processing, telecommunications, medical equipment, sequencers or the Internet of Things (IoT). The common theme is: we create faster and smaller devices, we produce more data, and when we start using this data we gain a better understanding, because we learn more quickly and more comprehensively.

Looking back, nobody would argue that the quality of video processing ten years ago was higher than it is today. Here we hit the sweet spot of our topic. The increase in the quality of video processing is a direct consequence of an increase in speed and a reduction in the size of data and devices. In every field, bringing such new technology within reach of a growing community starts an avalanche of innovation and cost reduction. As a result, the medium is now accessible and usable by a large audience, producing and sharing more images and videos than ever before … and with unparalleled quality.

A different perspective

In the genomics data processing community we still see the overriding perception that any data processing speed increase of one or two orders of magnitude can only be achieved by cutting corners (resulting in a quality reduction), or by plain old-fashioned hardware acceleration. This perception is wrong. Let me explain why:

  1. As this industry is still in its infancy, many tools originate from content experts, who have different roots and a completely different approach from the computer experts working in, for example, the video processing or gaming industries. The latter are generally known as the geeks who turn every byte and bit around to extract maximum user experience from the available electronics. As the genomics industry grows, new companies are starting to look at the original problems from a different perspective and are finding ways to mobilize the full potential of modern processors, providing solutions that are as good or better and much, much faster. Bottom line: the power was already there; we just had not found the way to go full throttle. The gaming industry, as an example, has grown far beyond the ‘geek’ image, so it’s not a bad thing.
  2. With higher speed, processing can be done on larger and more diverse datasets, yielding more accurate results. For example, one project in the Netherlands required mapping of plant data for which no one-to-one reference existed. Although there was a series of references, it was not known which one was best to choose. Because of the available speed, a selection of the data set was run against ALL available references. Why choose? Based on coverage metrics, the reference closest to the sample set was selected, and all samples were mapped against this reference (a minimal sketch of this selection step follows this list). It is clear that a reference that is closer to the data set will give a higher-quality result with fewer false positives when it comes to variant selection and association studies.
  3. When I spoke with people at the VU University in Amsterdam, the topic of speed was explored along another quality dimension. They explained that getting a genetic ‘tissue test result’ in 30 minutes (note that this includes preparation, sequencing and data processing) would already position this diagnosis within the bounds of a surgery window. In other words, a surgeon can make an immediate decision concerning the scope of an operation based on the genetic profile of the tested tissue.
  4. When analysis capabilities, fuelled by increased speed and footprint reduction, become available to a much wider user group, the shared learning of the community’s many parallel brains will yield more insights and increase the overall quality of, in this case, the diagnostic result.
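
To make the reference-selection idea in point 2 more concrete, here is a minimal Python sketch: map a representative subset of reads against every candidate reference, score each run on a coverage metric, and keep the best-scoring reference for the full analysis. The helper functions, file names and coverage score below are illustrative placeholders (one could wrap an aligner such as bwa mem and a coverage tool such as samtools coverage); they are not the actual tooling used in the project described above.

```python
# Sketch only: the helpers below are placeholders for your own aligner and
# coverage tooling; paths and names are hypothetical.
from pathlib import Path

CANDIDATE_REFERENCES = sorted(Path("references").glob("*.fa"))  # all available references
SAMPLE_SUBSET = Path("samples/subset.fastq")                    # small, representative read set


def map_subset(reads: Path, reference: Path) -> Path:
    """Placeholder: align `reads` against `reference` and return the resulting BAM path."""
    raise NotImplementedError("wrap your aligner here, e.g. bwa mem piped into samtools sort")


def coverage_fraction(bam: Path) -> float:
    """Placeholder: fraction of the reference covered at a minimum depth."""
    raise NotImplementedError("wrap your coverage tooling here, e.g. samtools coverage")


def pick_reference() -> Path:
    """Map the sample subset against every candidate and keep the best-covered one."""
    scores = {ref: coverage_fraction(map_subset(SAMPLE_SUBSET, ref))
              for ref in CANDIDATE_REFERENCES}
    best = max(scores, key=scores.get)
    print(f"Best-matching reference: {best.name} (coverage {scores[best]:.1%})")
    return best


if __name__ == "__main__":
    best_reference = pick_reference()
    # The full sample set would then be mapped against `best_reference` only,
    # which is what keeps false positives down in the later variant analysis.
```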

The above argument links the speed component to an increase in quality, and voids the perception that a massive increase in speed can only be achieved by cutting corners and thus reducing output quality.

Too early for fixation

In essence, it has been proven that modern (Intel) processor capacity is no longer the limiting factor for genomics data processing; the main issues are bandwidth and data footprint. This brings us to the topic of hardware acceleration. When we speak about this with people in the industry, we hear the common opinion that it is still too early for hardware-accelerated processing of genomics data.

Again, in analogy to video processing, dedicated hardware devices will become a solution once this field has reached a higher level of maturity, as happened when MPEG-x became widely adopted and energy and heat requirements fuelled the creation of small devices.

Yet, for the time being, it is only common sense to keep all the goodies directly connected to the memory highway and integrated into one commodity processor, maintaining the agility of software until the field becomes more settled.

Finally

To conclude today’s line of thought, I want to emphasize again that major changes in society have been triggered and accelerated by dramatic speed increases and size reductions, in every field. Quality, as a function of these two components, has always gone up while cost is driven down. So pushing the speed envelope in genetics will contribute to economic health in a health economy.