Modeling variation in rates of continuous trait evolution

Overview

Rates of trait evolution vary markedly across the tree of life, from adaptive radiations to lineages of “living fossils”, but our tools for characterizing and analyzing this rate heterogeneity are still lacking in some cases. My current major research focus is developing novel models, approaches, and algorithms to address some of these gaps in the case of modeling continuous trait evolution.

Evolving rates

Empirical evidence and theory both suggest that rates of trait evolution are influenced by a vast, interconnected web of life history and environmental factors. If we assume these factors all change gradually over evolutionary time and usually have subtle effects on rates, it becomes reasonable to think of the rate of trait evolution as a kind of trait in and of itself–one whose evolution we can model! For simplicity (and in line with analogous methods for time-calibrating phylogenies), let’s assume that rates evolve via a geometric Brownian Motion process characterized by a trend parameter (determining whether rates tend to decrease or increase over time) and a rate variance parameter (determining the magnitude of random changes in rates over time).
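To make the idea concrete, here is a minimal Python sketch of a rate evolving by geometric Brownian Motion along a single lineage, with a trait whose Brownian Motion rate is that evolving rate. It is an illustration only, not the evorates implementation. The function name, its arguments, the time discretization, and the choice to apply the trend on the log-rate scale are all assumptions made for this example.

```python
import math
import random

def simulate_evolving_rate(n_steps=1000, total_time=1.0, start_rate=1.0,
                           trend=0.0, rate_variance=1.0, seed=1):
    """Simulate a rate following geometric Brownian Motion along one lineage,
    plus a trait evolving by Brownian Motion at that (changing) rate.

    Hypothetical helper for illustration; assumes the trend acts on log(rate).
    """
    rng = random.Random(seed)
    dt = total_time / n_steps
    log_rate = math.log(start_rate)
    trait = 0.0
    rates, traits = [start_rate], [trait]
    for _ in range(n_steps):
        # The trait takes a Brownian step whose variance is (current rate) * dt.
        trait += rng.gauss(0.0, math.sqrt(math.exp(log_rate) * dt))
        # The log rate takes its own Brownian step: `trend` sets the drift,
        # `rate_variance` sets the magnitude of random changes.
        log_rate += trend * dt + rng.gauss(0.0, math.sqrt(rate_variance * dt))
        rates.append(math.exp(log_rate))
        traits.append(trait)
    return rates, traits

# A negative trend makes rates tend to decline over time (an "early burst").
rates, traits = simulate_evolving_rate(trend=-2.0, rate_variance=0.5)
```

Because the rate is exponentiated from a Brownian path, it always stays positive, which is the main reason to use a geometric rather than an ordinary Brownian Motion for rates.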

I developed a Bayesian implementation of this model–which I dubbed “evolving rates” or “evorates” for short–using the probabilistic programming language Stan. The implementation is available through my R package up on GitHub, and can be used to infer trend and rate variance parameters from empirical phylogenies and associated continuous trait data, as well as branchwise rates–average rates of trait evolution along each branch in the phylogeny. Notably, the current implementation supports multiple trait values per tip, within-tip variance in trait values (fixed beforehand and/or estimated during model fitting), missing trait data, non-ultrametric trees, and trait values assigned to internal nodes. The only catch is that it doesn’t yet support multivariate trait data!

Check out the associated publication for further information!

Continuous stochastic character mapping

A central aim of macroevolutionary biology is deciphering how environmental and life history factors affect the tempo and mode of evolutionary processes. For example, do lineages of larger organisms generally exhibit slower or faster rates of phenotypic evolution? Such hypotheses are often straightforward to test using phylogenetic comparative methods provided the evolutionary history of the explanatory factor (e.g., body size) is known, though this is rarely the case in practice. A common workaround is to instead repeatedly simulate the factor’s history and use these simulations, termed stochastic character maps or “simmaps” for short, in place of the true (but unknowable) history. Unfortunately, current implementations of simmapping only work for discrete factors, which is a shame because many factors commonly hypothesized to affect evolutionary processes are continuous (e.g., body size, temperature, generation time).

To expand our hypothesis-testing capabilities within the simmapping framework, I developed an algorithm for simmapping continuous variables (“contsimmaps”) under a very flexible class of Brownian Motion models (with support for Ornstein-Uhlenbeck models planned for the future). The implementation is already available through my R package up on GitHub, though it lacks most documentation and some convenience features. Similarly to my evorates package, the current implementation supports multiple values per tip, within-tip variance, missing data, non-ultrametric trees, and values assigned to internal nodes. Additionally, this one actually works with multivariate data! It also supports so-called “multiregime” Brownian Motion models, whereby parameters (trends, rates, correlations, etc.) of Brownian Motion models vary according to discrete regimes mapped onto the tree via normal, discrete simmaps as implemented in the phytools package.
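One building block of simulating a continuous history under Brownian Motion is filling in values along a branch once the values at its two ends are fixed (a “Brownian bridge”). The Python sketch below shows that single step for a univariate trait with a constant rate. It is not the contsimmap package’s algorithm or API: the function name and arguments are hypothetical, and it assumes the ancestor and descendant values have already been observed or sampled.

```python
import math
import random

def sample_branch_history(start, end, branch_length, rate, n_points=100, rng=None):
    """Sample a Brownian Motion path along one branch, conditioned on the
    values at both ends of the branch (a Brownian bridge).

    Hypothetical helper for illustration; univariate, constant rate.
    """
    rng = rng or random.Random()
    dt = branch_length / n_points
    values = [start]
    x = start
    for i in range(1, n_points):
        # Time remaining on the branch before taking this step.
        remaining = branch_length - (i - 1) * dt
        # Conditional on the current value and the fixed end value, the next
        # value is normal: the mean moves toward `end`, and the variance
        # shrinks as the end of the branch approaches.
        mean = x + (end - x) * dt / remaining
        var = rate * dt * (remaining - dt) / remaining
        x = rng.gauss(mean, math.sqrt(var))
        values.append(x)
    values.append(end)
    return values

# Repeated draws give different plausible histories between the same endpoints.
histories = [sample_branch_history(0.0, 1.5, branch_length=2.0, rate=0.3,
                                   rng=random.Random(seed)) for seed in range(5)]
```

Each draw plays the same role as a single discrete simmap: one plausible history that downstream analyses can treat as if it were known, with uncertainty captured by repeating the analysis over many draws.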

I am particularly interested in using contsimmaps to estimate relationships between continuous factors and rates of continuous trait evolution. I now have a preprint up that explores and tests this potential application of contsimmapping (it also describes the math underlying contsimmaps and includes a fun empirical example).

State-dependent character evolution

State-dependent speciation and extinction (SSE) models are a major boon to research on lineage diversification dynamics, allowing researchers to efficiently test whether speciation/extinction rates vary according to some discrete trait (consisting of multiple “states”). SSE models are powerful because they combine discrete trait evolution and lineage diversification models into a unified mathematical framework. This additionally allows SSE models to infer so-called “hidden states”–unobserved discrete “traits” that leave a detectable impact on lineage diversification dynamics and/or the evolution of observed discrete traits. Hidden states have become a popular tool for phylogenetic comparative data exploration (“phylogenetic natural history”) and creating better null models for comparative hypothesis testing.

Unfortunately, we really don’t have a similar mathematical framework for uniting discrete and continuous trait evolution models. Most current implementations of state-dependent character evolution (SCE) models (whereby, analogously to SSE, aspects of continuous trait evolution dynamics–like rates–vary according to some discrete trait) work by explicitly simulating evolutionary histories of discrete traits via stochastic character mapping (see above), then fitting continuous trait evolution models assuming these simulated histories are true. Depending on the implementation, this approach ranges from inefficient at best to a crude approximation at worst, while also making inference of hidden states extremely difficult. Notably, however, a recently introduced method called hOUwie cleverly circumvents many of these issues by using the continuous trait data to dynamically adjust simulated discrete trait histories.

For this project, I aimed to develop an efficient and practical way to calculate exactly how discrete and continuous traits simultaneously change under arbitrary SCE models, obviating the need to simulate discrete trait histories at all when fitting SCE models to empirical data. I managed to come up with a relatively fast and nearly exact solution by discretizing continuous trait space into a large number of “bins” (usually 1024 to 4096), keeping track of probabilities in each bin, and computing how these probabilities change over time with some useful math tricks (namely, fast Fourier transforms and matrix exponentials). Ultimately, this new framework allows researchers to quickly fit SCE models to infer how discrete traits (including hidden states) impact continuous trait evolution dynamics–including rates! Excitingly, the framework is generalizable to all Lévy process-based models of continuous trait evolution–this includes more “drift-y” processes like Brownian Motion and its jumpy/pulsed cousins, but unfortunately excludes more “adaptive-y” processes like Ornstein-Uhlenbeck.
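The binning-plus-Fourier idea can be illustrated with a much simpler case than the full framework: a single Brownian Motion regime with no discrete states, so no matrix exponential is needed. The Python sketch below tracks probabilities in equally spaced bins and evolves them through time by multiplying their Fourier transform by the characteristic function of a Brownian increment. Everything here (function names, the hand-written FFT, the periodic treatment of the grid edges) is an assumption for illustration and not taken from the actual package.

```python
import cmath
import math

def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def propagate_bm(probs, bin_width, rate, time):
    """Evolve binned trait probabilities under Brownian Motion for `time`.

    Hypothetical single-regime sketch; the grid is treated as periodic, so it
    should be wide enough that little probability reaches the edges.
    """
    n = len(probs)
    spectrum = fft(probs)
    for k in range(n):
        # Map the index to a signed frequency so the kernel is symmetric.
        j = k if k <= n // 2 else k - n
        omega = 2.0 * math.pi * j / (n * bin_width)
        # Characteristic function of a normal increment with variance rate * time.
        spectrum[k] *= math.exp(-0.5 * rate * time * omega ** 2)
    result = fft(spectrum, inverse=True)
    # Normalize the inverse transform and discard tiny imaginary round-off.
    return [v.real / n for v in result]

# Start with all probability in the middle bin and evolve it for one time unit.
n_bins, width = 256, 0.1
start = [0.0] * n_bins
start[n_bins // 2] = 1.0
evolved = propagate_bm(start, width, rate=1.0, time=1.0)
```

Working in Fourier space is what makes other Lévy processes easy to swap in: only the characteristic function in the loop changes. A real implementation would use an optimized FFT library rather than the recursive version shown here.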

I have implemented my novel SCE (which I’ve started pronouncing “ski” for funsies) framework in an R package on GitHub. Keep in mind that the package is not quite complete yet and basically undocumented. You can read more about how this all works, see how well it works in practice, and check out a cool empirical example in Chapter 3 of my dissertation, now up on ProQuest. Stay tuned for future updates and a preprint!

Bruce Stagg Martin
