(C) PLOS One This story was originally published by PLOS One and is unaltered.

Diatom abundance in the polar oceans is predicted by genome size [1]

Wade R. Roberts (Department of Biological Sciences, University of Arkansas, Fayetteville, Arkansas, United States of America), Adam M. Siepielski, Andrew J. Alverson

Date: 2024-08

A principal goal in ecology is to identify the determinants of species abundances in nature. Body size has emerged as a fundamental and repeatable predictor of abundance, with smaller organisms occurring in greater numbers than larger ones. A biogeographic component, known as Bergmann’s rule, describes the preponderance, across taxonomic groups, of larger-bodied organisms in colder areas. Although undeniably important, the extent to which body size is the key trait underlying these patterns is unclear. We explored these questions in diatoms, unicellular algae of global importance for their roles in carbon fixation and energy flow through marine food webs. Using a phylogenomic dataset from a single lineage with worldwide distribution, we found that body size (cell volume) was strongly correlated with genome size, which varied by 50-fold across species and was driven by differences in the amount of repetitive DNA. However, directional models identified temperature and genome size, not cell size, as having the greatest influence on maximum population growth rate. A global metabarcoding dataset further identified genome size as a strong predictor of species abundance in the ocean, but only in colder regions at high and low latitudes where diatoms with large genomes dominated, a pattern consistent with Bergmann’s rule. Although species abundances are shaped by myriad interacting abiotic and biotic factors, genome size alone was a remarkably strong predictor of abundance.
Taken together, these results highlight the cascading cellular and ecological consequences of macroevolutionary changes in an emergent trait, genome size, one of the most fundamental and irreducible properties of an organism. To test this hypothesis, we asked whether genome size can predict patterns of diatom abundance across the world’s oceans. Diatoms are single-celled primary producers that account for 20% of global primary production and are keystone species in marine food webs [ 31 ]. We traced the history of genome evolution in one of the most diverse and abundant lineages of marine planktonic diatoms, Thalassiosirales [ 32 , 33 ], to characterize the determinants of genome size on evolutionary timescales. Although a simple association between genome size and body size (cell volume) seems intuitive, a longstanding question is whether genome size drives cell volume, or whether cell volume—an ecologically important and putatively adaptive trait—drives changes in genome size [ 22 , 24 , 34 ]. We used phylogenetic path analysis to test competing directional hypotheses about the relationship between these 2 traits, which have the potential to shape key population demographic parameters that should, in turn, shape species abundances in accord with basic population ecology theory [ 4 – 6 ]. We then used a large metabarcoding database to determine whether genome size predicts geographic patterns of diatom abundance and temperature associations in the global ocean. Our results identified genome size as a strong predictor of global patterns of phytoplankton species abundance. Thus, in the absence of any additional information, this single, emergent property of an organism can help us understand species abundance in the wild. Among these efforts, the size of an organism has emerged as a fundamental and repeatable predictor of abundance—smaller organisms occur in greater numbers than larger ones [ 15 ]. 
This relationship occurs across unicellular and multicellular lineages, and in terrestrial and aquatic ecosystems [15–17]. A biogeographic component, known as Bergmann’s rule, describes an association between body size and temperature, wherein larger-bodied organisms are found in colder environments and smaller organisms in warmer ones [18,19]. Thus, body size and temperature are frequently woven together as key explanations for organismal abundance. The repeatability of these associations, which link a fundamental organismal trait to its abundance and thermal environment, is heralded as a widespread feature of life on Earth [20,21]. But key questions remain, such as what determines size and whether size alone is the most basic intrinsic, ecologically determinant feature of an organism. For multicellular species, size is a complex trait confounded by tissue differentiation, life history, and development [22–24]. For unicellular organisms, which constitute the bulk of life on Earth, their size may be fundamentally shaped by a single intrinsic feature, the size of their genome [23,25]. Across eukaryotes, genome size varies by many orders of magnitude and is correlated with numerous traits of ecological importance, including body size, metabolism, and life history [16,26,27]. As a result, genome size may have important cascading effects on organismal abundance and, ultimately, ecosystem function [28–30].

The abundance of species in nature is a central feature of all life. Because of this centrality, a principal goal of ecology is to understand what determines organismal abundance [1–3].
Theoretical studies have developed an extensive body of work to understand how demographic parameters (e.g., birth and death rates) affect species abundances [ 4 – 6 ], while observational and experimental studies have identified key abiotic (e.g., nutrient supply) and biotic factors (e.g., species interactions such as competition and predation) that shape the abundances of organisms from local to global scales [ 7 – 11 ]. Another equally large body of literature has sought to identify the key intrinsic features of organisms that shape their abundance [ 12 – 14 ]. (A, B) Maps showing the locations of the 210 sampling stations from the Tara Oceans expedition, with points colored to highlight locations within the (A) Arctic, Southern, and South Pacific Oceans or (B) the temperature of each location at the time of collection. The base layer for the maps is from https://cran.r-project.org/web/packages/maps/index.html . Panels C–E show Bayesian multilevel regression models predicting relative species abundance by the interaction of genome size with (C) latitude, (D) ocean region, or (E) temperature. Nonlinear effects of latitude were modeled in (C) using a generalized additive model. Significant estimates for the Arctic, Southern, and South Pacific Oceans are shown with solid lines in (D) . Nonsignificant estimates for the other ocean regions are shown with dotted lines in (D) . Although all predictors are treated as continuous in (E) , we used the model to predict the interactive effect of 4 temperatures (0, 10, 20, and 30°C) with genome size on species abundance. The data and code to generate this figure can be found in https://doi.org/10.5281/zenodo.12608914 . Taking advantage of the global metabarcoding database from the Tara Oceans expedition [ 40 ], we built Bayesian models to test whether genome size influences relative species abundance in the ocean ( Fig 3 ). 
Using 2 taxonomic assignment methods for operational taxonomic unit (OTU) sequences, we identified 28 species from our study that were also present in ≥10 samples of the Tara Oceans database. This allowed us to test whether latitude, ocean region, and/or ocean temperature interact with genome size to affect species abundance. Latitude had a significant nonlinear interaction with genome size on species abundance ( Fig 3C )—species with larger genomes were more abundant at high latitudes, and species with smaller genomes were more abundant at lower latitudes ( Fig 3C ). The 2 coldest ocean regions, the Arctic and Southern Oceans, were the only ones with a significant positive regression coefficient relating genome size to abundance, whereas all other regions had either no effect or a negative effect (South Pacific Ocean) of genome size on species abundance ( Fig 3D ). We tested the effects of temperature directly and found a significant interaction with genome size to predict abundance, in which species with larger genomes were more abundant in colder temperatures ( Fig 3E ). These results were replicated with a smaller dataset (22 species) of OTUs classified using an alternate method of taxonomic assignment ( S9 Fig ). (A) Average model from phylogenetic path analysis using coverage-based genome size estimates. Arrow color and width represent the direction and magnitude of regression coefficients, indicated by numeric labels (positive: blue; negative: red; nonsignificant: gray). Full lines show coefficients that differ significantly from 0, whereas dotted lines overlap with 0. (B) Standardized regression coefficients and their standard errors (SE) for paths in the model. The data and code needed to generate this figure can be found in https://doi.org/10.5281/zenodo.12608914 . 
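The interaction between genome size and temperature reported above can be made concrete with a small sketch. It assumes a plain linear predictor with an interaction term, which is a simplification of the Bayesian multilevel models used in the study, and every coefficient below is hypothetical and chosen only for illustration. The point is that an interaction term makes the slope of abundance on genome size depend on temperature, so the slope can be positive in cold water and negative in warm water.

```python
def genome_size_slope(b_genome: float, b_interaction: float, temperature_c: float) -> float:
    """Slope of abundance on genome size at a given temperature.

    For a linear predictor b0 + b_genome*G + b_temp*T + b_interaction*G*T,
    the derivative with respect to G is b_genome + b_interaction*T.
    """
    return b_genome + b_interaction * temperature_c


def sign_change_temperature(b_genome: float, b_interaction: float) -> float:
    """Temperature at which the genome size slope crosses zero."""
    return -b_genome / b_interaction


# Hypothetical coefficients (NOT estimates from the study).
B_GENOME = 0.8
B_INTERACTION = -0.05

if __name__ == "__main__":
    # The four temperatures match those used for prediction in Fig 3E.
    for temperature in (0, 10, 20, 30):
        slope = genome_size_slope(B_GENOME, B_INTERACTION, temperature)
        print(f"{temperature:>2} C: slope of abundance on genome size = {slope:+.2f}")
```

With a negative interaction coefficient, larger genomes are favored below the sign-change temperature and disfavored above it, which is the qualitative pattern described for the cold and warm ocean regions.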
Across the tree of life, genome size and body size are strongly correlated with growth rate, nutrient usage, and other life history traits, but causal relationships and trade-offs among these and other correlated traits are not always clear [ 22 , 34 , 38 , 39 ]. Although causality cannot be inferred directly from comparative analyses of observational data, we can test the relative support for alternative models. To that end, we used phylogenetic path analysis—a type of structural equation modeling that allows for the evaluation of causal hypotheses from empirical data—to test competing hypotheses about the effects of 4 variables on growth rate: genome size, body size (minimum cell volume), temperature, and genomic GC content. We generated 14 alternative hypotheses (i.e., sets of directional relationships) to test whether genome size has no effect (null models), direct effects (direct models), or indirect effects (indirect models) on growth rate (cell doubling time) ( S7 Fig ). Using coverage-based genome size estimates, 3 models (direct1, direct2, and direct4 in S3 Table ) were equally supported, with ΔCICc values <2 and P values >0.2, indicating good fit to the data ( S3 Table ). In the best-fit model, direct4, genome size directly affects cell volume and doubling time, and temperature directly affects genome size and doubling time ( S8 Fig ). The direct1 and direct2 models remove the effect of GC content on genome size ( S7 Fig ). The main difference between the top 2 models (direct4 and direct2) is whether genome size impacts cell volume (direct4) or vice versa (direct2). Averaging the top 3 models resulted in a larger path coefficient for genome size affecting cell volume (0.54 versus 0.17) ( Fig 2 ). Finally, testing the same 14 models with the k-mer instead of coverage-based genome size estimates gave 6 models (including direct4) with equally strong support ( S3 Table ). 
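The model comparison step described above can be sketched as follows. This assumes the commonly used formulation of phylogenetic path analysis in which Fisher's C statistic is computed from the P values of a model's conditional independence tests and then penalized as CICc = C + 2q·n/(n − q − 1), where q is the number of estimated parameters and n the number of species. The scores and parameter counts in the example are hypothetical and are not the values from S3 Table.

```python
import math


def fisher_c(p_values: list[float]) -> float:
    """Fisher's C statistic from independence-test P values."""
    return -2.0 * sum(math.log(p) for p in p_values)


def cicc(c_stat: float, n_params: int, n_species: int) -> float:
    """C statistic information criterion with small-sample correction."""
    return c_stat + 2.0 * n_params * n_species / (n_species - n_params - 1)


def delta_cicc(scores: dict[str, float]) -> dict[str, float]:
    """Difference between each model's CICc and the best (lowest) score."""
    best = min(scores.values())
    return {name: score - best for name, score in scores.items()}


def equally_supported(scores: dict[str, float], threshold: float = 2.0) -> list[str]:
    """Models whose delta CICc falls below the threshold."""
    return sorted(name for name, d in delta_cicc(scores).items() if d < threshold)


if __name__ == "__main__":
    # Hypothetical CICc scores for three of the candidate models.
    scores = {"direct4": 20.0, "direct2": 20.5, "null1": 30.0}
    print(equally_supported(scores))
```

Models within 2 units of the best score are treated as equally supported, which is the rule applied in the text when several direct models could not be distinguished.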
Importantly, all 6 models support genome size directly impacting cell volume, adding further support for the hypothesis that genome size influences cell volume, not the reverse (S7 and S8 Figs). The best-fit (indirect2) and average models using k-mer-based genome sizes both suggested that genome size affects doubling times but only indirectly via effects on cell volume, rather than the direct effect of genome size on doubling time supported by coverage-based genome size estimates (S8 Fig).

Genome size is strongly correlated with body size, measured as cell volume, in microbial eukaryotes [23]. Although the extent to which increases in genome size require commensurate increases in nuclear and cell volumes is unclear [22], genome size should exert its greatest influence on the minimum volume of a cell. To test whether genome size predicts cell volume in diatoms, we compiled minimum and maximum volumes for the 51 species in our dataset. Maximum cell volume varied by 5 orders of magnitude across species and minimum cell volume varied by 4 (Figs 1 and S5 and S2 Table). Increased genome size was associated with increases in both minimum (PGLS r² = 0.28, P < 0.001) and maximum cell volume (PGLS r² = 0.53, P < 0.001) (Figs 1 and S5). We measured maximum growth rates for the species in our study (S1 Table) to test whether genome size is a predictor of cell division rate and found that species with larger genomes did indeed have longer doubling times (i.e., slower growth rates) (PGLS r² = 0.42, P < 0.001) (S6 Fig). Temperature has profound effects on cellular metabolism and growth rate in both multicellular and unicellular organisms [37], and the addition of temperature to genome size as a predictor of growth rate led to substantial improvement in model fit (PGLS r² = 0.73, P < 0.01) (S6 Fig). Here, lower temperatures and larger genomes were both associated with decreased growth rate (S6 Fig).
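The PGLS regressions above rest on the generalized least squares estimator, which weights observations by the covariance expected among related species. The sketch below illustrates that estimator in plain Python under a Brownian-motion assumption, where the covariance between two species equals their shared branch length. The 4-species tree, the covariance matrix, and the trait values are all hypothetical, and this is not the code used in the study (which is archived at the Zenodo DOI cited in the figure captions).

```python
def dot(u: list[float], v: list[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve(matrix: list[list[float]], rhs: list[float]) -> list[float]:
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def gls_fit(x: list[float], y: list[float], cov: list[list[float]]) -> tuple[float, float]:
    """Return (intercept, slope) from the GLS normal equations.

    Solves (X' V^-1 X) b = X' V^-1 y, where X has a column of ones and x,
    and V is the (symmetric) phylogenetic covariance matrix.
    """
    ones = [1.0] * len(y)
    vinv_ones = solve(cov, ones)  # V^-1 applied to the intercept column
    vinv_x = solve(cov, x)        # V^-1 applied to the predictor column
    xtvx = [[dot(ones, vinv_ones), dot(ones, vinv_x)],
            [dot(x, vinv_ones), dot(x, vinv_x)]]
    xtvy = [dot(vinv_ones, y), dot(vinv_x, y)]
    intercept, slope = solve(xtvx, xtvy)
    return intercept, slope


# Hypothetical tree ((A,B),(C,D)) of depth 1.0; sister species share 0.6.
BM_COV = [
    [1.0, 0.6, 0.0, 0.0],
    [0.6, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.6, 1.0],
]

if __name__ == "__main__":
    log_genome_size = [1.5, 1.7, 2.6, 3.0]  # hypothetical log10(Mb)
    log_cell_volume = [2.0, 2.3, 3.4, 4.1]  # hypothetical log10(cubic micrometers)
    print(gls_fit(log_genome_size, log_cell_volume, BM_COV))
```

Replacing `BM_COV` with an identity matrix reduces the estimator to ordinary least squares, which is a convenient check that the phylogenetic weighting is the only difference between the two.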
Genome size was strongly correlated with repetitive DNA content (phylogenetic generalized least squares [PGLS] r² = 0.51, P < 0.001) (Figs 1 and S3). The percentage of the genome composed of repetitive DNA ranged from 6% in Thalassiosira profunda (genome size: 41 Mb) to 85% in Thalassiosira tumida (genome size: 1.5 Gb) (S1 Table). Among the different classes of repetitive DNA, unclassified repetitive elements constituted the largest fraction of most genomes (S4 Fig). These are repetitive sequences that could not be classified into known repeat classes, likely due to the paucity of large diatom genomes that have been sequenced to date. The different classes of repetitive elements increased more-or-less proportionally in larger genomes, such that no single class of repetitive DNA disproportionately drove increases in genome size (S4 Fig). There was no association between haploid genome size and the average length of genes, exons, or introns (S3 Fig), nor the presence of polyploidy (S1 Table). Previous studies have linked GC content to genome size variation in both multicellular and unicellular organisms [36], but genome size was weakly negatively correlated with average genome-wide GC content in these diatoms (PGLS r² = 0.08, P = 0.013) (S3 Fig).

(A) A time-calibrated phylogeny of the diatom order Thalassiosirales, modified from [35] and with branches colored by minimum cell volume. Light micrographs of live cells illustrate the broad variation in cell volume across the lineage. (B) Bar plots show the estimated genome size and proportions of non-repetitive and repetitive DNA in each genome. Panels C–E show PGLS models predicting genome size with (C) percentage of repetitive DNA, (D) minimum cell volume, and (E) measured cell doubling time. Black lines show the estimated regression coefficients. The data and code needed to generate this figure can be found in https://doi.org/10.5281/zenodo.12608914 .
We characterized the genomes of 67 newly (n = 46) and previously sequenced (n = 21) diatom strains, representing 51 species of Thalassiosirales (S1 Table). Haploid genome size varied by nearly 50-fold, from 33 Mb in Cyclotella nana to 1.5 Gb in Thalassiosira tumida (Fig 1) and showed strong phylogenetic signal (Pagel’s λ = 0.998, P < 0.001). Estimates of haploid genome size based on k-mer counting and sequencing coverage were similar and strongly correlated (Spearman’s ρ = 0.984, P < 0.001) (S1 Fig). Our estimates of genome size were similar for the 3 strains in our dataset with genome size estimates from flow cytometry (S1 Table). For example, our estimate for Cyclotella nana CCMP1335 was 33 Mb, while flow cytometry estimated it at 36 Mb (S1 Table). The k-mer-based method was unable to estimate genome sizes for 5 taxa, so our results are based on the coverage-based dataset unless stated otherwise. Thalassiosirales includes marine and freshwater species [35], but there was no significant difference in genome size between diatoms from the 2 environments (Wilcoxon rank sum test, P = 0.125) (S2 Fig).

Discussion

One of the most basic and defining ecological properties of a species is its abundance in the environment, which is shaped by numerous interacting abiotic and biotic factors [2,6,9]. As a result, considerable attention has been paid to identifying key ecological processes that define a simple, sufficient, and generalizable ecological model explaining species abundance [11]. Rather than focusing on ecological processes, we sought instead to determine whether a fundamental intrinsic property of an organism, the size of its genome, can explain abundance and an associated vital rate, population growth [15,41]. Trait-based models of phytoplankton ecology use the functional traits of individual species or entire communities to understand the biogeography, seasonal dynamics, and future responses of phytoplankton to environmental change [42,43].
Across taxonomic groups, major ocean regions, and marine and freshwaters, nearly all traits of ecological importance scale allometrically with cell size [38,44,45]. Despite its broad predictive power, however, theoretical and empirical studies have revealed complex interactions between cell size and environmental gradients such as temperature and nutrient supply, two of the principal abiotic factors structuring phytoplankton communities [16,46]. In general, large cells tend to dominate in the cold, nutrient-rich waters of high latitudes, and smaller cells are more abundant in lower latitudes, where temperatures are warmer and nutrient supplies are lower [16,47]. Other factors, such as grazing pressure, can interact with temperature and nutrients to modify size–abundance relationships [46]. Amidst a sea of trait correlations, the extent and complexity of these interactions make it difficult to infer causal relationships and develop a simple ecological model of abundance [39,41,42]. Across the tree of life, cell size is also correlated with the size of both the genome and the cell nucleus [24,34,48]. This relationship is commonly assumed to reflect simple packaging constraints, suggesting that over evolutionary timescales nucleus and cell sizes ebb and flow nonadaptively in response to changes in genome size [22,24,34]. Although intuitive, the mechanisms by which these 3 size components of the cell exert their influence on one another are unclear [22]. Alternatively, the strong associations between cell size and fitness-related traits, such as nutrient acquisition and growth rate, suggest cell size is an adaptive trait [23,24,34]. If larger cells require larger nuclei to balance space requirements for RNA synthesis in the nucleus and protein synthesis in the cytoplasm, then changes in the amount of bulk DNA are a means of modulating the size of the nucleus to maintain an optimal nuclear:cytoplasmic ratio (the “karyotopic ratio”) [23].
Our novel approach to these questions—combining phylogenomics, empirical growth rates, and a global DNA metabarcoding database—highlighted a central role for genome size in the cellular and ecological properties of marine diatoms. Although previous flow cytometry studies found correlations between genome size and cell size in diatoms [49,50], the genome sequences analyzed here identified repetitive DNA as the principal driver of genome size evolution. Nucleotypic effects describe the phenotypic changes that occur in response to changes in genome size [51]. In the diatoms studied here, repeat-driven changes in genome size over the past 100 million years had strong nucleotypic effects on 2 fitness-related traits—cell size and maximum growth rate (Figs 1 and 2). The same nucleotypic effects operate on microevolutionary timescales in diatoms as well. A comparison of 2 populations of the marine planktonic diatom, Ditylum brightwellii, with 2-fold difference in genome size and 4-fold difference in maximum cell volume, showed that the population with a smaller genome and cell size had a higher growth rate, and that genome size had a significantly greater (negative) impact on growth than cell size [52]. Larger genomes take longer to replicate, lengthening mitosis and cell doubling time [22,51]. Larger genomes also require additional investments of N and P to replicate and maintain, so in species with large genomes, these 2 essential nutrients cannot be allocated to RNA, ribosomes, and proteins, reducing growth rates and, over longer timescales, selecting for smaller genomes under conditions of nutrient limitation [41]. Although cell size is often considered a “master” phytoplankton trait, our results highlight the ecological importance of genome size as well. 
Genome size was not driven by increases in the amount of functional DNA, either through gene or genome duplications, but instead through changes in the amount of nonfunctional sequences, highlighting bulk DNA content as a phenotype with far-reaching consequences for phytoplankton physiology and ecology. Although genome size could be interpreted as an adaptive trait in this context [24], this must be weighed against the deleterious effects of excess DNA, including mobile elements that can disrupt functional genes [53] and the metabolic burden of noncoding DNA [41]. Although evidence for this hypothesis is mixed [54–56], the inclusion of population genetic parameters in our models might have shown whether diatoms with smaller effective population sizes are potentially more susceptible to nonadaptive genome expansions due to genetic drift [53]. Whatever the cause, our results provide support for a simple model in which many ecologically important traits, though perhaps more proximally related to cell size, may ultimately be attributable to the size of the genome. Ecologists have identified numerous biotic and abiotic factors that explain organismal abundance and geographic distributions [7–11]. Indeed, diatom abundance is shaped by abiotic factors such as temperature, along with both bottom-up effects such as nutrient supply, and top-down effects such as grazing pressure [57–59]. The data presented here showed that diatoms with larger genomes and, by extension, larger cell volumes are more abundant in regions and latitudes that experience colder temperatures, supporting Bergmann’s rule and reinforcing a broader biogeographic trend of larger phytoplankton in colder seas [42]. This pattern has been attributed to temperature and numerous covarying factors [18]. For example, larger genomes and cells require more nutrients, which are generally in greater supply at higher latitudes [16,47].
In addition, grazing marine copepods have larger body sizes in colder temperatures [60], which might select for increased genome and cell size in colder parts of the ocean. Although including these and other factors in our models undoubtedly would have explained more of the variation in abundances, genome size predicted abundance remarkably well. As in most studies, the strong effect of latitude on the association between genome size and abundance reflects the context-specific nature of this association, which is typical of many ecological patterns [61]. Finally, a field study of freshwater benthic diatoms from geothermally heated streams found no evidence for Bergmann’s rule [62], suggesting possible differences in diatom size–abundance relationships across ecosystem types or phylogenetic lineages. Documenting abundance associations at a global scale is not without challenges. For example, in many cases the Tara Oceans samples represented a single snapshot in time of abundance at a location, precluding estimates of sampling error and potentially missing rapid seasonal changes in species abundance. Although these types of temporal limitations are common in spatial datasets that are global in scale, such datasets have nevertheless proven to be extremely powerful in revealing broad ecological trends [40,63]. The diatom lineage studied here, Thalassiosirales, was well represented throughout the Tara Oceans samples and allowed us to uncover strong evidence linking genome size, temperature, maximum population growth rate, and species abundance [32,33]. Our results are consistent with size–abundance relationships found in large-scale phytoplankton studies [16,46], which might also be driven ultimately by genome size. Similar associations have been found in multicellular organisms as well, suggesting genome size may shape patterns of species abundance broadly across the tree of life.
Genome size has been linked to the distribution of flowering plants along a temperature gradient in the British Isles [64] and was positively correlated with regional abundance in 436 herbaceous plant species across Europe [28]. Although not related to a temperature gradient, salamanders are among the most abundant animal groups in many terrestrial ecosystems [65], and they have among the largest known genomes in the vertebrate lineage [66]. Like diatoms, much of the variation in genome size in these and other groups is attributable to noncoding sequences. Notably, despite the inherent difficulty in estimating abundance at a global scale, the amount of variation in abundance explained by genome size in our study was substantially greater than the typical range of variation accounted for in many ecological studies [67], highlighting the seemingly outsized role of genome size in the ecology of unicellular organisms. In addition to the ecological consequences, our results highlight the unique cellular tradeoffs imposed by changes in genome size in diatoms. Diatoms reproduce asexually throughout most of their life history and are unusual in that one of the 2 daughter cells following a mitotic event is smaller than the parent, leading to a reduction in the average diameter of a cell lineage over time, eventually triggering sexual reproduction and restoring the maximum cell size [68]. Although not measured here, nucleus size is positively correlated with genome size across the tree of life [24,69,70], and the same correlation likely exists for diatoms. With a fixed genome size that constrains the size of the nucleus, diatoms must optimize their surface area:volume ratio as cell size decreases across generations. Diatoms have vacuoles that function in buoyancy control, nutrient storage, and optimization of the surface area:volume ratio. Vacuoles occupy as much as 90% of the cell volume [57], and vacuole size can be modulated in response to environmental conditions [71]. 
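The size reduction cycle described above, in which one daughter cell is smaller than the parent after each mitotic division, can be sketched as a simple bookkeeping exercise. The starting diameter, the per-division decrement, and the threshold that triggers sexual reproduction are all hypothetical values chosen for illustration, and the fixed decrement is a simplifying assumption rather than a measured property of any species in the study.

```python
from collections import Counter


def divide(population: Counter, shrink_um: float) -> Counter:
    """One round of mitosis: each cell yields one same-size and one smaller daughter."""
    next_generation: Counter = Counter()
    for diameter, count in population.items():
        next_generation[diameter] += count                       # daughter matching the parent
        next_generation[round(diameter - shrink_um, 6)] += count  # smaller daughter
    return next_generation


def restore_small_cells(population: Counter, threshold_um: float, max_um: float) -> Counter:
    """Cells at or below the threshold undergo sex and return to maximum size."""
    restored: Counter = Counter()
    for diameter, count in population.items():
        restored[max_um if diameter <= threshold_um else diameter] += count
    return restored


def mean_diameter(population: Counter) -> float:
    """Mean diameter across all cells in the population."""
    total = sum(population.values())
    return sum(d * c for d, c in population.items()) / total


if __name__ == "__main__":
    cells = Counter({20.0: 1})  # hypothetical starting cell, 20 micrometers wide
    for generation in range(1, 9):
        cells = divide(cells, shrink_um=1.0)
        cells = restore_small_cells(cells, threshold_um=14.0, max_um=20.0)
        print(generation, sum(cells.values()), round(mean_diameter(cells), 3))
```

Without the restoration step, the mean diameter falls by half the decrement per generation, which is why a fixed genome and nucleus occupy a growing share of the cell as a lineage ages.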
The strong correlation between vacuole size and cell volume has led to the hypothesis that the vacuole has played a key adaptive role in diatom evolution by facilitating increases in cell size as a way to escape grazing pressure [57,58]. This hypothesis does not account for the parallel influence of genome size on cell size confirmed here. Just as the discovery here of a directional effect of genome size on cell size highlights its lack of consideration in previous models, it likewise highlights the absence of several traits from our study. Although suggested by our models, increases in genome size may not be the proximal cause of increased cell volume. Genome size might affect cell volume indirectly, via upward pressure on nuclear volume or another latent character. In addition, although the functions of the vacuole as they relate to cell size are clear [58], vacuole size is a more labile trait, and it is unclear whether the vacuole exerts a causal influence on cell size or vice versa. The genome, nucleus, and vacuole have different functional roles in relation to cell volume, and all incur costs to maintain [57,72], so with a fixed genome size that presumably constrains the minimum size of both the nucleus and the cell, diatoms probably rely primarily on adjustments to the size and contents of the vacuole as the ecological setting (e.g., rates of nutrient uptake, sinking rate, susceptibility to grazers) changes in response to decreases in the volume of a cell lineage over time. Overall, the results presented here advance our understanding of species abundance by showing that a single emergent trait fundamental to all life, the size of the genome, can predict population abundance at a global scale. Moreover, the geographic variation in this pattern is entirely consistent with longstanding ideas regarding size–abundance associations in relation to the thermal environment.
The addition of ecological information and other trait data to genome size estimates would likely generate a more informative model of species abundance, and this remains an important next step. Integrative approaches such as the one developed here, combining the seemingly disparate subdisciplines of phylogenomics and population ecology, may prove useful in forecasting widespread changes in the abundance of diatoms in response to ongoing climate change, especially in polar regions.

[1] Url: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3002733 Published and (C) by PLOS One. Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.