Discussion about this post

Jeremy Zucker

Interesting and thought-provoking. I agree with the author that one of the key distinguishing questions of predictive biology is:

Can the outcome of an experiment Y be predicted from observable features X?

However, if this is the question that drives predictive biologists, then the next statement cannot be true:

"Predictive Biologists are more concerned with measuring the mutual information between two biological phenomena than they are with measuring direct causality."

Please let me explain why.

If I have two molecules A and B that have high mutual information, and I perform two experiments in which I separately perturb A and B, there are four potential outcomes:

1. A changes when B is perturbed, but B does not change when A is perturbed.

2. B changes when A is perturbed, but A does not change when B is perturbed.

3. A does not change when B is perturbed, and B does not change when A is perturbed.

4. A changes when B is perturbed and B changes when A is perturbed.

I think you would agree that predictions based on mutual information alone cannot distinguish among these 4 outcomes. But I would claim that predictions based on combining mutual information with causal information can.
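To make the point concrete, here is a minimal sketch of outcomes 1 and 2 using a toy two-variable model. Everything in it is an assumption made for illustration: A and B are binary, the cause is a fair coin, the effect copies the cause except for a hypothetical 10% noise rate, and the function names are invented here. Under those assumptions the two opposite wirings (A drives B, or B drives A) have the same observational joint distribution and therefore the same mutual information, yet they respond differently when A is perturbed.

```python
from itertools import product
from math import log2

NOISE = 0.1  # hypothetical probability that the effect fails to copy its cause


def joint(cause):
    """Exact observational joint P(A, B) for a two-node toy model.

    cause="A" means A drives B; cause="B" means B drives A.
    The cause is a fair coin; the effect copies it, flipped with prob NOISE.
    """
    table = {}
    for c, e in product((0, 1), repeat=2):
        p = 0.5 * ((1 - NOISE) if c == e else NOISE)
        a, b = (c, e) if cause == "A" else (e, c)
        table[(a, b)] = p
    return table


def mutual_information(table):
    """Mutual information I(A; B) in bits, computed from the joint table."""
    pa = {a: sum(p for (x, _), p in table.items() if x == a) for a in (0, 1)}
    pb = {b: sum(p for (_, y), p in table.items() if y == b) for b in (0, 1)}
    return sum(
        p * log2(p / (pa[a] * pb[b])) for (a, b), p in table.items() if p > 0
    )


def p_other_is_1_after_do(cause, perturbed, value):
    """P(the unperturbed variable = 1) after forcing `perturbed` to `value`."""
    if perturbed == cause:
        # Perturbing the cause: the effect still copies it, up to noise.
        return (1 - NOISE) if value == 1 else NOISE
    # Perturbing the effect cuts its incoming edge; the cause stays a fair coin.
    return 0.5


if __name__ == "__main__":
    for cause in ("A", "B"):
        mi = mutual_information(joint(cause))
        shift = p_other_is_1_after_do(cause, perturbed="A", value=1)
        print(f"cause={cause}: I(A;B)={mi:.3f} bits, P(B=1 | do(A=1))={shift:.2f}")
```

In this sketch the mutual information is the same for both wirings, while forcing A to 1 moves B only when A is the cause. The observational data alone cannot say which wiring you are in; the assumed direction of the edge is what lets you predict the perturbation result.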

What is causal information? It turns out that those systems biology wiring diagrams, assembled from arduously performed molecular biology experiments, provide precisely the causal assumptions needed to distinguish among the 4 potential outcomes.

In other words, without the causal assumptions encoded in those systems biology models, data-driven machine learning alone is insufficient to succeed in predicting the outcome of an unknown experiment.

Therefore, I would suggest that predicting the outcome of an unknown experiment is fundamentally a causal estimation problem, not a machine learning prediction problem.

Dr. Jennifer

Where is the ecology here? Ecology is an entire field dedicated to predicting the outcomes of biological interactions at all scales, from molecules and cells to communities distributed across the planet. We build in silico models of whole organisms, using many of the same mathematical tools, but you would be shocked by how many epistemic gaps exist for even supposedly well-studied species. Some days it seems that 99% of species on Earth have no suitable biological data for models of living organisms. We constantly have to invent new methods to deal with the near absence of suitable environmental and biological measurements, or to try to tease some kind of value out of historical observations, in order to propose answers to very basic questions.

I maintain that getting sequencers, etc. into the hands of ecologists will result in more and faster advances, because their training covers huge swaths of the biosphere and its possibilities, not just humans.

And, as a counterpoint, I also think that the biologists described here should be employing people trained and experienced in modern ecological theory.

Perhaps biologists need to become more like ecologists ...
