“Techbio” has recently entered the lexicon of the life sciences industry. I was initially dismissive of the idea that the term conveyed any meaning beyond an aspiration for the high margins and feasibility of the software industry. My cynicism has since subsided and I’ve come to wholly embrace the new term as a high-entropy vector that distinguishes two related but distinct species of business.
tl;dr: Techbios manipulate information as much as molecules. They’re defined by building an in silico model of biology, an associated data corpus, and a predict-validate experimental loop that allows them to search otherwise intractably large hypothesis spaces. They use these tools to develop better products, more rapidly. As businesses, they have higher initial capital requirements, but more defensible moats and greater compounding returns to scale.
Etymology of an industry
The word “biotech” brings to mind clean lab coats, perhaps a white-walled laboratory a few floors above the street somewhere in South San Francisco, California or Cambridge, Mass. The actual origins are somewhat…muddier.
Living in 1910s Hungary, Károly Ereky developed a new method to raise and fatten hogs for food during a famine. Ereky proposed that any process like his that manipulated biology to solve human problems might best be described as “biotechnologie.” Nearly sixty years later, Herb Boyer and Bob Swanson broke ground on Genentech’s first labs just south of San Francisco’s gleaming hills and inherited the mantle of Ereky’s ambition. Whereas Ereky’s biotech made macroscopic manipulations to the organisms that share our world, the 20th century’s biotech repurposed life’s molecular and cellular constituents to achieve breakthroughs in both medicine and industry.
Our industry’s neologism has more recently been inverted to describe yet another new breed of company — a “techbio.” At first blush it can be hard to distinguish this third generation of life engineering firm from the second, but I’ve recently been convinced that there is indeed a unique approach employed by techbio firms that constitutes a speciation event from the parental biotech strain.
Classifying species of enterprise
What makes a techbio firm different from a biotech, beyond the vintage of the buzzword?
Where biotech firms engineered life for the first time at the molecular level, techbio companies primarily engineer life at the level of information. Biotechs innovate at the scale of atoms, and techbios at the scale of bits.
What classification rules might we employ to make the distinction? To my mind, a true techbio firm:
Builds an In silico Model of the biological process sufficient to predict the effect of changes to key engineering parameters
Collects & curates a Data Corpus describing a biological system more completely than ever before
Generates value from the model by Predicting and Validating useful modifications to a biological process to make it faster, cheaper, or more effective
Learning by observing
Life sciences firms generate value by engineering or measuring biological systems. Whether designing therapeutics or building new materials with synthetic biology, a life science firm must understand which manipulations or measurements will generate value before it can make a product.
Therein lies the challenge! Biological systems are complex, and there are often many more hypotheses about how to achieve a goal than a firm can readily test. There may be thousands of molecules and millions of interactions at play in a DNA sequence to be engineered, a diseased cell to be treated, or a blood sample to be analyzed.
Biotechs navigate this complexity by choosing problems with optimal median outcomes using prior knowledge.
Techbios choose to tackle hypothesis spaces that maximize expected value and employ quantitative models & large scale data to make them tractable.
Biotech: Optimizing median outcomes with prior knowledge
Traditional biotechs often focus on areas of biology that are relatively well-characterized as a means of efficiently searching through the intractably large number of hypotheses that face them. Therapeutics firms might choose targets based on the abundance of academic literature supporting the disease-modifying activity of a protein. A synthetic biology company might engineer strains to produce a metabolite closely related to one already made by an established route.
Another way to frame this is that traditional biotechs search for hypothesis spaces with the greatest median outcome. Each pairing of a biological target and technology to modify or measure it represents an engineering hypothesis. If the risk on the biological target itself is minimized given prior knowledge, the median performance across target:technology pairs is optimized.
How do we know this thesis is more than mere speculation? In therapeutics, there are typically many firms competing to make medicines against the same known targets. This phenomenon is widely acknowledged as “crowding,” or “herding,” and it appears to be increasing with time.
Techbio: Learning to maximize upside
Techbio firms take a different approach. Rather than restricting their search to areas of biology that have already been “derisked,” these firms explore large hypothesis spaces where the best case outcome has the highest impact.
The key to making this approach tractable is that techbio firms build in silico models of their biological system. In silico models can be built in diverse ways, but their defining characteristic is that they can predict the outcome of an experiment given only the recipe of its components.
We might construct a model that predicts the likelihood that a DNA sequence drives gene expression, that a chemical structure inhibits an enzyme, or that a genetic intervention treats a disease. Using these predictions, techbio firms explore most hypotheses in the world of bits, rather than the world of atoms.
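To make that defining characteristic concrete, here is a minimal sketch in Python of what such a model’s interface looks like: a function that maps an experimental recipe (here, a toy DNA sequence) to a predicted outcome without running the experiment. The featurization, model choice, sequences, and measurements below are all hypothetical placeholders, not a recommendation for any real system.

```python
# Illustrative sketch only: a toy "in silico model" that predicts an experimental
# outcome (e.g. expression driven by a short DNA sequence) from the recipe alone.
# The featurization, model choice, and data are hypothetical placeholders.
from itertools import product

import numpy as np
from sklearn.linear_model import Ridge

def kmer_features(sequence: str, k: int = 3) -> np.ndarray:
    """Count every length-k subsequence; a deliberately simple featurization."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = {kmer: 0 for kmer in kmers}
    for i in range(len(sequence) - k + 1):
        counts[sequence[i : i + k]] += 1
    return np.array([counts[kmer] for kmer in kmers], dtype=float)

# Hypothetical training data: sequences the firm has already assayed in the lab.
train_sequences = ["ACGTACGTAACC", "TTGACAGCTAGC", "CCGGAATTCCGG"]
train_expression = [0.8, 2.4, 0.1]  # measured outcomes (arbitrary units)

model = Ridge(alpha=1.0)
model.fit(np.stack([kmer_features(s) for s in train_sequences]), train_expression)

# The defining property: predict the outcome of an experiment we have not run.
candidate = "ACGTTTGACAGC"
predicted_expression = model.predict(kmer_features(candidate)[None, :])[0]
print(f"Predicted expression for {candidate}: {predicted_expression:.2f}")
```

Real in silico models are vastly more sophisticated (deep sequence models, structure predictors, and so on), but they share the same contract: recipe in, predicted outcome out.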
While this may sound fanciful, the molecular foundations of modern biology actually emerged from a similar approach. Pioneering scientists discovered the structures of DNA, proteins, and the patterns of heritability using quantitative models, but until recently these quantitative approaches were unable to make useful predictions for more complex biological systems. Recent advances in artificial intelligence broke through this complexity barrier, allowing scientists to learn the rules of a biological system from data1.
Techbio firms leverage these new methods to build in silico models of biological problems that were intractable just ten years ago. Even an imperfect model can be used to prioritize hypotheses, allowing a techbio firm to focus on executing the experiments most likely to yield outcomes in the long tail of a power-law distribution of results.
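A tiny simulation (with entirely made-up numbers) illustrates why even a noisy model is valuable when outcomes are power-law distributed: ranking a large pool of hypotheses by an imperfect score and validating only the top handful captures far more of the long tail than validating the same number chosen at random.

```python
# Illustrative simulation: hypothesis payoffs follow a heavy-tailed (Pareto)
# distribution, and an imperfect model scores each hypothesis with noise.
# Validating the model's top picks beats validating the same number at random.
import numpy as np

rng = np.random.default_rng(0)
n_hypotheses, n_tested = 100_000, 100

true_payoff = rng.pareto(a=3.0, size=n_hypotheses)              # long-tailed outcomes
noisy_score = true_payoff + rng.normal(0.0, 1.0, n_hypotheses)  # imperfect model score

random_picks = rng.choice(n_hypotheses, size=n_tested, replace=False)
model_picks = np.argsort(noisy_score)[-n_tested:]               # model-prioritized picks

print("best payoff, random picks:", true_payoff[random_picks].max())
print("best payoff, model picks :", true_payoff[model_picks].max())
```

On most random seeds the model-prioritized picks recover a best outcome several times larger than the random baseline, despite the substantial noise in the score.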
Constructing a Data Corpus
Before a techbio can build an in silico model, it first needs to construct a data corpus that captures the fractal complexity of its biological system.
In silico models that learn from experimental data are often limited less by their computational complexity than by the quality and scale of data available to train them. Machine learning scientists have found across various domains that model performance obeys a scaling law. As training dataset scale increases, so does model performance.
This phenomenon appears to govern the behavior of in silico models of biology as well! Increasing data size has led to increased model performance in regulatory DNA sequence prediction2, protein folding3, and cell geometry prediction4.
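For readers who want the functional form behind these scaling claims, the snippet below sketches the canonical power-law relationship between dataset size and model error, error ≈ a * N^(-b) + c, fit to invented numbers purely for illustration.

```python
# Illustrative only: the canonical power-law scaling form, error ≈ a * N^(-b) + c,
# fit to hypothetical (dataset size, validation error) pairs.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(n, a, b, c):
    return a * n ** (-b) + c

# Hypothetical measurements: validation error at increasing training set sizes.
n_examples = np.array([1e3, 1e4, 1e5, 1e6, 1e7])
val_error = np.array([0.42, 0.30, 0.22, 0.17, 0.14])

(a, b, c), _ = curve_fit(scaling_law, n_examples, val_error, p0=[1.0, 0.2, 0.1])
print(f"fitted exponent b ≈ {b:.2f}; predicted error at 1e8 examples ≈ "
      f"{scaling_law(1e8, a, b, c):.3f}")
```

The practical takeaway is the exponent: once b is estimated for a given problem, a firm can forecast roughly how much additional data it needs to reach a target error.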
Unfortunately, large datasets that capture the underlying biology of interest do not yet exist for most problem domains. The number of biological problems is so vast that for any given problem — a cell type you’re hoping to treat in a disease, a metabolic pathway you’re trying to engineer, a protein you’re optimizing for a new role — there may only be a few experiments to date that you can access for training.
This paucity of data represents both a challenge and an opportunity. Techbios can rarely focus purely on the world of bits. Instead, they need to span the chasm between bits and atoms and generate the experimental data necessary to train their in silico models. Given how little data is available externally, a focused techbio company can often generate orders of magnitude more data in-house than exists anywhere else in the world.
Considered as a species of artificial intelligence company, techbios are in the rare position to generate a differentiated data corpus at unprecedented scale5. This data serves as both a moat and a source of compounding returns. As the data corpus grows, the in silico model’s performance improves, and the rate at which the techbio can generate additional high-value data points increases as well. The construction of a data corpus is therefore one of the defining features of a techbio, and it underlies a virtuous flywheel that can spin up at the heart of these businesses.
Converting predictions into value
Foundational in silico models are fascinating, but they do not generate business value automatically6. Techbios need to close the loop on value generation by integrating their models into a product development cycle; the model can’t serve merely as branding. To generate value, the model must do at least one of the following:
Accelerate the product development process
Reduce the cost of development
Improve the quality of the final product
Many techbio firms achieve these goals by integrating their in silico models into an active learning process using techniques like Bayesian Optimization7. Active learning allows firms to spend a fixed “budget” of experiments more effectively by using models to choose the most promising hypotheses to test in the world of atoms.
Rather than guessing at the next experiment with human intuition alone, a firm can use in silico models to quantitatively integrate every prior experiment it has run into an informed prediction. In the best case, active learning both reduces the time necessary to discover a successful result and increases the magnitude of the success achieved8.
We can think about this process as a simple Predict-Validate loop9.
To see how this process works in practice, imagine a techbio firm beginning a discovery campaign to find a genetic intervention that treats a disease. At the start of the campaign, the firm has only a loose prior on which of the countless possible interventions might be effective, so it seeds initial training data by testing a range of interventions. These seed data initialize an in silico model that then predicts the outcome of future experiments (Predict). The most promising of these predictions are validated experimentally (Validate), the new data are fed back to the model, and the cycle repeats.
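Here is a bare-bones sketch of such a loop in Python. Everything in it is a stand-in: run_experiment() plays the role of a wet-lab validation, the candidate pool plays the role of a library of possible interventions, and an off-the-shelf Gaussian process serves as the in silico model, with an expected-improvement rule (one common Bayesian optimization acquisition) choosing what to validate next.

```python
# Minimal Predict-Validate loop sketch. All names and data are hypothetical:
# run_experiment() stands in for a real wet-lab validation, candidates for a
# library of possible interventions, and a Gaussian process for the in silico model.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

def run_experiment(x: np.ndarray) -> float:
    """Placeholder 'wet lab': a hidden response surface plus measurement noise."""
    return float(np.exp(-np.sum((x - 0.7) ** 2) * 8) + rng.normal(0, 0.05))

candidates = rng.uniform(0, 1, size=(500, 2))            # untested interventions
seed_idx = rng.choice(len(candidates), size=10, replace=False)
X = candidates[seed_idx]                                  # seed experiments
y = np.array([run_experiment(x) for x in X])

for round_ in range(5):
    model = GaussianProcessRegressor(normalize_y=True).fit(X, y)   # Predict
    mu, sigma = model.predict(candidates, return_std=True)
    best = y.max()
    z = (mu - best) / np.maximum(sigma, 1e-9)
    expected_improvement = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)
    pick = candidates[np.argmax(expected_improvement)][None, :]

    result = np.array([run_experiment(pick[0])])                    # Validate
    X, y = np.vstack([X, pick]), np.concatenate([y, result])
    print(f"round {round_}: best observed outcome = {y.max():.3f}")
```

The structure, not the particular surrogate or acquisition function, is the point: fit the model on everything observed so far, predict over the untested pool, validate the most promising candidate, and repeat.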
Accelerating discovery
The most obvious benefit of these Predict-Validate loops is that they can find effective interventions more quickly and more cheaply. While closely associated, those two benefits are not necessarily the same thing!
Using drug discovery as an example application, an in silico model may allow researchers to generate more reasonable hypotheses to test. If validation experiments can be parallelized, a techbio firm can then reduce the wall-clock time required to find an effective intervention by replacing human decisions in the Predict phase with model decisions. Human predictions may take days to months, while model decisions take seconds to minutes, so the discovery process can be accelerated even if the same number of validation experiments are performed.
Reducing cost
In silico models might similarly accelerate discovery and reduce cost by allowing a techbio firm to test hypotheses with a higher expected value (i.e. each hypothesis tested is more likely to yield a hit). A firm might then be able to perform fewer validation experiments to find an effective intervention, reducing the cost of the discovery process and accelerating the time to completion.
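As a rough worked example with made-up numbers: if each tested hypothesis independently has probability p of being a hit, the expected number of validation experiments needed to find the first hit is about 1/p. A model that lifts the hit rate from 1% to 5% therefore cuts the expected burden from roughly 100 validation experiments to roughly 20, and if those experiments run serially, the calendar time shrinks by the same factor.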
It’s important to note that this cost benefit is primarily realized at the early stages of product development: searching for drug discovery targets or active compounds, or searching for an optimal strain at benchtop scale in synthetic biology. These discovery phases are rate-limiting for the development of new drugs, but they represent only a minority of the expenses involved in the overall development process10.
Most of the expense of bringing a new medicine to market is incurred in the development phase of the process — scaling up manufacturing and running clinical trials. For an illustrative example, a survey of drug development firms found that only ~15% of total development costs were pre-clinical, with the remaining ~85% related to downstream development. I’m less familiar with the cost breakdown in synthetic biology and diagnostics, but I believe the overall skew is similar.
It’s harder to reduce the costs of these development stages directly using in silico models (though smart teams are trying). However, if in silico models help techbio firms select drug candidates, strains, or diagnostic approaches that have a higher chance of development success, they can likewise reduce costs in aggregate by expending fewer resources on failed programs.
Increasing the efficacy of final products
In silico models can not only reduce the cost of product development, but also improve the quality of the final product. Imagine we have a fixed budget of experiments we can run to find an ideal drug target or synthetic strain. An effective in silico model has the potential to help us land closer to the “global maximum” of the possible product landscape through the active learning process.
In drug discovery, this might equate to a safer, more effective therapeutic due to better target or molecule selection. As one would hope, the better a therapy is along these dimensions, the more value it tends to generate for the developer11. Developing the best product, not just any product, is likely to generate value in other life science domains as well.
Business implications
The features that distinguish a techbio from a biotech don’t just matter inside the company, shaping how employees work, how goals are set, and who is hired; they also have important implications for the structure of the business.
Techbio firms develop natural moats, whereas biotechs struggle to do so
Techbios have an abundance of riches at the discovery stage, warranting a more liberal partnership strategy than biotechs
Techbios may require more funding than biotechs to deliver the first product, but the cost of the Nth product is lower
Techbio firms naturally develop defensible moats
Executed properly, both the data corpus and the in silico model that define a techbio firm represent cornered resources. As the data corpus grows, the in silico model makes better predictions that help a firm expand its data corpus more effectively (e.g. by only running experiments that provide non-redundant information). An accumulated data corpus is difficult for new entrants to replicate, and the returns to scale compound over time. Techbio firms therefore have tangible resources that provide a competitive advantage in their area of expertise. Past success enables future success.
By contrast, biotech firms have historically struggled to develop moats that expand beyond a single asset (i.e. a single drug, engineered strain, or diagnostic test). Intellectual property provides meaningful protection for individual assets, but holding the patent for one asset rarely provides a competitive advantage for developing another, even if it’s highly related12. For a traditional biotech, past success in a therapeutic or application area does not increase the likelihood of future success by default, even in that same domain13.
Techbio firms therefore have a more defensible business model than traditional biotechs. Techbios might be analogized to internet businesses with network effects where success is self-propagating. Biotechs are perhaps more akin to entertainment businesses, where each “hit” (e.g. new asset) requires a unique set of inputs to produce. Taking the analogy a step further, techbios may therefore represent a less volatile species of life science business with a differentiated equity product.
Techbios suffer from an embarrassment of early stage riches
The techbio approach improves the productivity of early stage product development, but involves a resourcing trade-off. A techbio therapeutics firm may discover targets or initial drug discovery hits more efficiently than a biotech, leading to a proliferation of early stage opportunities. Building the Predict-Validate loop consumes resources that might otherwise be dedicated to developing a hit into an asset, so techbios often face an opportunity-resource imbalance: there are more early stage opportunities than there is capital to pursue them.
Techbios are therefore a special case of a “platform biotech,”14 and likely benefit from a more liberal partnership strategy. Asset-focused biotechs need to pursue development partnerships with larger peers (e.g. pharma for therapeutics) carefully, since the future value of a partnered asset may represent a non-trivial fraction of the total enterprise value. Techbios, by contrast, are likely to generate a long tail of early stage discoveries that they can’t pursue internally, so partnering early and often is a necessary mechanism to capture maximum value from their Predict-Validate loop.
Techbio companies become more efficient with time
Building a data corpus, in silico model, and Predict-Validate loop consumes resources. A traditional asset-focused biotech can skip these steps and jump straight into the development process for their first asset. In the early years of company development, it’s quite likely that a techbio will require more capital than a traditional biotech to generate that initial asset15.
The real value of the techbio platform is realized in the quality of that first asset and in the reduced cost of assets over time. As the data corpus grows and the model improves, techbios have the potential to develop cheaper, more effective assets. Biotechs don’t benefit from the same compounding returns by default.
Coda
Techbio firms as construed here are a young species. Over the coming years, I look forward to seeing these new entrants unlock previously intractable products that help patients, grow the economy, and reveal new biology that promotes human flourishing.
Please get in touch to talk through any contrasting opinions!
Shameless plug: I’ve argued previously that machine learning methods represent a return to a formal, quantitative modeling of biology, rather than a departure from prior tradition.
See the effect of scaling training data size for Enformer models and Basenji models.
See supplementary figure 2 of the RosettaFold paper showing that proteins with more available sequences in a multiple sequence alignment (MSA) achieve higher performance.
See figure 1 of a preprint from the Recursion Pharma team demonstrating that larger training sets improve cell morphology prediction.
More than just the right parameters are required to “make money damn near automatic.”
Peter Frazier’s tutorial on BayesOpt is incredibly lucid. I highly recommend it for anyone interested in iterative experimental design.
See a great summary on active learning in drug discovery from the inimitable Michael Eisenstein.
Another framing is that techbios are implementing a special case of the design-build-test-learn (DBTL) framework where the Learn and Design phases are performed by in silico models. In this frame, we can simplify a DBTL cycle to a Predict-Validate loop (Learn-Design → Predict; Build-Test → Validate).
This point is counterintuitive! How can something be rate limiting, but also not the most expensive part of the process? In drug development, we’re largely limited by knowing what sort of molecules we should target to treat a given disease. Once a molecular target is identified, the tools of drug development are mature enough that we can quite often solve the engineering problem of acting on the target. This isn’t categorically true (see e.g. mutant KRas, p53, or dystrophin as examples of how challenging it can be to “hit” a known target), but on the margin it’s fair to say that finding the right target for a given patient is the hardest part.
However, the process of discovering targets is relatively cheap in comparison to development. The number of programs a company can pursue is limited by the number of strong targets they’ve identified, but the number of medicines they can bring to market is limited by the cost of downstream development for each program.
For example, the United States Supreme Court recently sided with Sanofi in its dispute with Amgen, holding that Amgen’s patents claiming the entire genus of antibodies that bind a particular site on PCSK9 were invalid for lack of enablement, leaving Sanofi free to sell its own monoclonal antibody against the same site.
There are obviously exceptions to this rule, including platform biotechs developing a new therapeutic class (e.g. Alnylam, Beam, Moderna) and large biotechs with unique internal tools (e.g. Regeneron’s humanized animal models). Institutional knowledge and expertise in an area represent “soft” mechanisms that can increase the likelihood of future success, but even these soft mechanisms can be quite narrow based on how the domain is defined.
See an excellent distillation of the concept from Patrick Malone at KdT and Elliot Hershberg at Not Boring.
This isn’t a law of physics and I believe a techbio can be built in a resource-constrained setting, but an initial capital-intensive phase is my modal expectation.