Measuring, comparing and interpreting phenotypic selection on floral scent

Abstract
Natural selection on floral scent composition is a key element of the hypothesis that pollinators and other floral visitors drive scent evolution. The measurement of such selection is complicated by the high‐dimensional nature of floral scent data and by uncertainty about the cognitive processes involved in scent‐mediated communication. We use dimension reduction through reduced‐rank regression to jointly estimate a scent composite trait under selection and the strength of selection acting on this trait. To assess and compare variation in selection on scent across species, time and space, we reanalyse 22 datasets on six species from four previous studies. The results agreed qualitatively with previous analyses in identifying the populations and scent compounds subject to stronger selection, but also allowed us to evaluate and compare the strength of selection on scent across studies. Doing so revealed that selection on floral scent was highly variable and, overall, about as common and as strong as selection on other phenotypic traits involved in pollinator attraction or pollen transfer. These results are consistent with an important role of floral scent in pollinator attraction. Our approach should be useful for further studies of plant–animal communication and for studies of selection on other high‐dimensional phenotypes. In particular, it will be useful for studies of pollinator‐mediated selection on complex scent blends comprising many volatiles, and when no prior information on the physiological responses of pollinators to scent compounds is available.


| INTRODUCTION
The astonishing diversity of animal-pollinated flowers is generally interpreted in light of adaptation to specific pollinators (Darwin, 1862; Fenster et al., 2004; Grant & Grant, 1965; Harder & Johnson, 2009; Stebbins, 1974). This hypothesis has spurred substantial interest in measuring pollinator-mediated phenotypic selection on plant phenotypes (reviewed in Harder & Johnson, 2009; Caruso et al., 2019; Sletvold, 2019; Opedal, 2021). The measurement of selection on a limited set of well-defined floral characters is statistically straightforward using the multiple-regression approach of Lande and Arnold (1983). However, some functionally important floral phenotypes are not easily quantified through a small set of measurements.
Second, although more than 1000 volatile compounds have been detected in floral fragrances, floral scent bouquets often comprise a core set of compounds of known biosynthetic origin (Knudsen et al., 2006). Third, species divergence in scent chemistry is at least partly driven by pollinators, because distantly related species that share the same type of pollinator often exhibit similar floral scent chemistry (Dobson, 2006; Fenster et al., 2004; Junker & Parachnowitsch, 2015; Schiestl & Johnson, 2013; Whitten et al., 1986), whereas closely related species that interact with different pollinators often differ markedly in scent chemistry (Byers et al., 2014; Dobson et al., 1997; Hetherington-Rauth & Ramírez, 2016; Weber et al., 2018).
Studies that have estimated selection on floral scent have often detected directional selection on the emission rate of one or more compounds (Chapurlat et al., 2019; Ehrlén et al., 2012; Gfrerer et al., 2021; Gross et al., 2016; Joffard et al., 2020; Parachnowitsch et al., 2012; Schiestl et al., 2010). However, studies of selection on floral scent are complicated both by our still limited understanding of the functional role of floral scent in plant-pollinator communication (Schiestl, 2015) and by the high-dimensional nature of floral fragrances, which creates challenges for measuring selection (Chapurlat et al., 2019; Gfrerer et al., 2021; Gross et al., 2016; Parachnowitsch et al., 2012; Schiestl et al., 2010).
Biologically, the interpretation of selection estimates on floral scent is complicated by uncertainty about whether pollinators actively search for certain compounds, or whether the scent of a flower as perceived by pollinators and other interactors (e.g. antagonists) is determined by the relative abundances of some or all of its compounds. There are examples of both strategies, but most come from highly specialized pollination systems, which may not be representative of the behaviour of many pollinators. For example, plants can mimic insect alarm pheromones (Brodmann et al., 2009) or sex pheromones (e.g. Borg-Karlson, 1990; Kullenberg & Bergström, 1976; Schiestl et al., 2003) that lure particular insect pollinators to the flowers. The compounds involved in these deceptive pollination systems are often unique and not commonly part of floral scent blends. Similarly, plants involved in obligate pollination mutualisms have sometimes evolved the release of particular compounds that function as 'private channels' to their particular mutualist species (Chen et al., 2009; Schäffler et al., 2015). In other specialized pollination mutualisms, plants emit diverse and generic floral scent compounds (Friberg et al., 2014; Ramírez et al., 2011), and their specialized pollinators have antennal receptors that detect several to many of these volatiles (Eltz & Lunau, 2005; Schiestl et al., 2021; Svensson et al., 2010). To further complicate the issue, many flowering plants are pollinated by generalist insects (Johnson & Steiner, 2000; Waser et al., 1996), which are able to learn different floral scents, singly or in blends (Lawson et al., 2018; Riffell et al., 2008; Wright et al., 2013; Wright & Schiestl, 2009).
In the latter cases, the trait 'scent' may represent a combination of a potentially large number of measurements (volatile concentrations), and it is unclear how pollinators use the multidimensionality of floral scent variation in their interaction with flowers (García et al., 2021; Wright & Schiestl, 2009). Hence, analyses of selection on scent need to consider both individual floral scent compounds and the entire scent bouquet (as a 'composite trait').
Studies of selection on scent are also complicated statistically by high dimensionality and associated issues related to multicollinearity (Graham, 2003). The most common solution to the problem of measuring selection on high-dimensional phenotypes is to employ dimension reduction through principal component regression (Gross et al., 2016; Parachnowitsch et al., 2012; Schiestl et al., 2010). In this two-step approach, dimension reduction is achieved by projecting an original set of covariates (volatile concentrations) onto a subset of principal components, which are subsequently included as predictors in a multiple-regression model. This approach solves the issue of fitting regression models to high-dimensional data but yields estimates of selection that are not directly linked to the original trait measurements (but see Chong et al., 2018).
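To make the two-step logic concrete, the following Python sketch applies principal component regression to a toy dataset, retaining only the first principal component. It standardizes the traits, finds PC1 of their correlation matrix by power iteration, regresses relative fitness on the PC1 scores and then back-projects the slope onto the standardized original traits, in the spirit of Chong et al. (2018). The function names, the single retained component and the data are illustrative assumptions, not the code used in the studies cited.

```python
from statistics import mean, pstdev


def standardize(col):
    """Centre a trait to mean 0 and scale it to unit (population) SD."""
    m, s = mean(col), pstdev(col)
    return [(v - m) / s for v in col]


def leading_pc(Z, iters=500):
    """Leading eigenvector of the correlation matrix of standardized traits Z,
    found by power iteration (assumes it is not orthogonal to the start vector)."""
    p, n = len(Z), len(Z[0])
    C = [[sum(Z[a][i] * Z[b][i] for i in range(n)) / n for b in range(p)]
         for a in range(p)]
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def pcr_selection(traits, fitness):
    """Two-step PCR on PC1: returns the PC1 loadings, the gradient on PC1 and
    the gradients back-projected onto the standardized original traits."""
    Z = [standardize(col) for col in traits]
    v = leading_pc(Z)
    n = len(fitness)
    # Step 1: project each individual onto PC1 (scores have mean 0)
    scores = [sum(v[k] * Z[k][i] for k in range(len(Z))) for i in range(n)]
    # Step 2: regress relative fitness (mean 1) on the PC1 scores
    w_bar = mean(fitness)
    rel = [f / w_bar for f in fitness]
    gamma = sum(s * (r - 1.0) for s, r in zip(scores, rel)) / sum(s * s for s in scores)
    # Back-projection: the gradient on trait k is gamma times its PC1 loading
    beta = [gamma * vk for vk in v]
    return v, gamma, beta


traits = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10]]  # two perfectly correlated traits
fitness = [1, 2, 3, 4, 5]
loadings, gamma, beta = pcr_selection(traits, fitness)
print(loadings, gamma, beta)
```

With two perfectly correlated traits, PC1 loads equally on both, so the back-projected gradient is split evenly between them even though the regression itself involved a single predictor.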
The aim of dimension reduction in principal component regression is to reduce the multivariate phenotype to a subset of phenotypic axes that jointly explain most of the variance in the original phenotypic space. In other words, dimension reduction for the phenotype is performed independently of the relationship between phenotype and fitness. This is potentially problematic because the most variable phenotypic axes may not be those that are ecologically most important or interesting (Morrissey, 2014; Schluter & Nychka, 1994). An alternative approach to dimension reduction is to explicitly seek the phenotypic axes (combinations of the original variables) that explain the most variance in the response variable (e.g. relative fitness). This can be achieved through techniques such as two-block partial least-squares (Gómez et al., 2006; Rohlf & Corti, 2000), projection-pursuit regression (Friedman & Stuetzle, 1981; Morrissey, 2014; Schluter & Nychka, 1994) or reduced-rank regression (Anderson, 1951). These approaches allow estimation of the leading axes of phenotypic variation that are under selection, a very useful property for analyses of multivariate selection (Morrissey, 2014). In turn, selection gradients on the original traits can be obtained via numerical methods (Morrissey & Sakrejda, 2013), or by projecting the estimated selection on the leading axes back to the original trait space, as suggested for principal component regression (Chong et al., 2018). This facilitates biological interpretation in cases where dimension reduction is applied to traits with a clear functional role in the process under study (e.g. floral dimensions in studies of pollinator-mediated selection; Opedal, 2021), and may also help to characterize and interpret the structure of the major axes of selection in cases where the biologically relevant phenotype represents a combination of the original measurements.
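The contrast between variance-driven and fitness-driven dimension reduction can be illustrated with a small simulation. In the sketch below (hypothetical data and function names), two strongly correlated traits share a common 'size' axis, but relative fitness depends on their contrast. PC1 captures the size axis, whereas the first two-block partial least-squares direction, which for a single response is proportional to the vector of covariances between each standardized trait and fitness, points along the axis that covaries most strongly with fitness. This is a simplified illustration of the principle, not an implementation of reduced-rank regression.

```python
import random
from statistics import mean, pstdev


def standardize(col):
    m, s = mean(col), pstdev(col)
    return [(v - m) / s for v in col]


def unit(vec):
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec]


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def project(Z, v):
    """Scores of each individual on the axis with unit loadings v."""
    return [sum(v[k] * Z[k][i] for k in range(len(Z))) for i in range(len(Z[0]))]


def leading_pc(Z, iters=500):
    """PC1: the unit axis with maximal variance (power iteration)."""
    p = len(Z)
    C = [[cov(Z[a], Z[b]) for b in range(p)] for a in range(p)]
    v = [1.0] * p
    for _ in range(iters):
        v = unit([sum(C[a][b] * v[b] for b in range(p)) for a in range(p)])
    return v


def pls_direction(Z, rel_fitness):
    """First PLS weight vector for one response: the unit axis maximizing
    |cov(score, fitness)|, proportional to the trait-fitness covariances."""
    return unit([cov(z, rel_fitness) for z in Z])


rng = random.Random(42)
n = 200
size = [rng.gauss(0, 1) for _ in range(n)]
x1 = [s + 0.3 * rng.gauss(0, 1) for s in size]
x2 = [s + 0.3 * rng.gauss(0, 1) for s in size]
# Hypothetical fitness that depends on the contrast x1 - x2, not on overall size
fitness = [5 + (a - b) + 0.5 * rng.gauss(0, 1) for a, b in zip(x1, x2)]
w_bar = mean(fitness)
rel = [f / w_bar for f in fitness]

Z = [standardize(x1), standardize(x2)]
pc1, pls1 = leading_pc(Z), pls_direction(Z, rel)
for name, v in [("PC1", pc1), ("PLS1", pls1)]:
    s = project(Z, v)
    print(name, [round(x, 2) for x in v],
          "var =", round(cov(s, s), 2), "cov with fitness =", round(cov(s, rel), 3))
```

By construction, PC1 has the larger variance and the PLS axis the larger covariance with relative fitness, which is the distinction drawn in the paragraph above.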
The aim of this study is to reassess general patterns of phenotypic selection on floral scent through a re-analysis of data from four previously published studies (Chapurlat et al., 2019; Gross et al., 2016; Joffard et al., 2020; Parachnowitsch et al., 2012). We

| Theory: phenotypic-selection analysis with reduced-rank regression
Reduced-rank regression (Anderson, 1951; Izenman, 1975) achieves dimension reduction in multivariate problems by projecting an original set of covariates onto a reduced set of composite variables that best explain variance in the response variable. In selection analysis, this translates into the reduced set of phenotypic axes that best explain relative fitness and, thus, are under selection. In the following analyses, we used the Bayesian reduced-rank regression implementation of the Hmsc 3.0 R package (Tikhonov et al., 2020).
In the Hmsc model, the linear predictor for the fixed effects is written as $L^F_{ij} = \sum_k x_{ik}\beta_{kj}$, where $x_{ik}$ is the value of covariate $k$ for observation $i$, and $\beta_{kj}$ is the regression slope of response variable $j$ on covariate $k$. In the following analyses, we include only one response variable, but we keep the multivariate notation here for generality. In the reduced-rank regression implementation, the $n_c$ covariates are decomposed into two sets so that $n_c = n_c^* + n_c^{\mathrm{RRR}}$. The covariates $k = 1, \dots, n_c^*$ are treated as standard regression covariates, while dimension reduction is applied for the covariates
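As a minimal sketch of how a composite covariate enters the linear predictor, assume a single response and a single composite formed as a weighted sum of the reduced-rank covariates with loadings $w_l$, so that the composite contributes $\beta_{\mathrm{axis}} \sum_l w_l x_{il}$. The Python code below uses hypothetical names and values and does not reproduce the internals of Hmsc; it only shows that such a composite implies a slope of $\beta_{\mathrm{axis}} w_l$ on each original covariate.

```python
def rrr_linear_predictor(x_std, beta_std, x_rrr, loadings, beta_axis):
    """Linear predictor for one individual: standard covariates enter with their
    own slopes, while the reduced-rank covariates first collapse into one
    composite score (a weighted sum with hypothetical loadings)."""
    composite = sum(w * x for w, x in zip(loadings, x_rrr))
    return sum(b * x for b, x in zip(beta_std, x_std)) + beta_axis * composite


def implied_slopes(loadings, beta_axis):
    """Slopes on the original reduced-rank covariates implied by the composite."""
    return [beta_axis * w for w in loadings]


# Hypothetical individual: intercept + 2 morphological traits, 4 volatiles
x_std = [1.0, 0.4, -1.2]
beta_std = [1.0, 0.10, 0.05]
x_rrr = [0.5, -0.3, 1.1, 0.0]
loadings = [0.6, 0.2, -0.7, 0.3]
beta_axis = 0.25

lp = rrr_linear_predictor(x_std, beta_std, x_rrr, loadings, beta_axis)
# The same value is obtained when each volatile is given its implied slope directly
full = sum(b * x for b, x in zip(beta_std + implied_slopes(loadings, beta_axis),
                                 x_std + x_rrr))
print(round(lp, 6), round(full, 6))
```

The two printed values agree, which is the property used when selection on the composite is projected back onto individual volatiles.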

| Study systems
Details about all study systems and study designs are given in Appendix S1.

| Selection analyses
We analysed each of the 22 datasets (population-year combinations) separately and refer to these as 'studies'. In all analyses, individual plants were treated as sampling units, and female reproductive success (fruit production) as a fitness proxy. All datasets included abundances of scent compounds (volatiles hereafter) as well as morphological traits, and some included a phenological trait (flowering time).
We fitted Hmsc models to each dataset with relative fitness as the response variable and a Gaussian error distribution. As fixed effects, we included the morphological and phenological traits as 'standard' covariates (specified by the XData argument in Hmsc), while the volatiles were reduced into a single 'scent selection axis' through reduced-rank regression, specified through the XRRRData argument in Hmsc. The models did not include any random effects. The R code implementing all analyses is available on GitHub (github.com/oysteiop/ScentSelection).
We obtained mean-scaled ($\beta_\mu$) and variance-scaled ($\beta_\sigma$) linear selection gradients for the standard traits and the scent selection axis by multiplying the regression slope on each covariate by its mean and standard deviation, respectively (Hereford et al., 2004). Because the scent selection axis is not on a ratio scale, mean-scaling is not meaningful (Hereford et al., 2004; Houle et al., 2011), and we report only variance-scaled selection gradients for the scent selection axis.
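The scaling step is simple arithmetic. The sketch below regresses relative fitness on a single hypothetical trait and multiplies the slope by the trait mean ($\beta_\mu$) and standard deviation ($\beta_\sigma$). We assume the sample standard deviation here; in the actual models the slope is a partial regression coefficient from a model with several covariates, whereas a single trait is used for brevity.

```python
from statistics import mean, stdev


def relative_fitness(fitness):
    """Fitness divided by its mean, so that mean relative fitness is 1."""
    w_bar = mean(fitness)
    return [f / w_bar for f in fitness]


def simple_slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


def scaled_gradients(trait, fitness):
    """Raw slope, mean-scaled gradient and variance-scaled gradient."""
    beta = simple_slope(trait, relative_fitness(fitness))
    return beta, beta * mean(trait), beta * stdev(trait)


# Hypothetical trait (e.g. a floral dimension in mm) and fruit counts
trait = [10, 12, 14, 16, 18]
fruits = [2, 3, 4, 5, 6]
beta, beta_mu, beta_sigma = scaled_gradients(trait, fruits)
print(beta, beta_mu, round(beta_sigma, 4))
```

For these toy data the slope is 0.125, so $\beta_\mu = 0.125 \times 14 = 1.75$ and $\beta_\sigma = 0.125 \times \sqrt{10} \approx 0.395$.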
After projecting the estimated selection gradient on the scent selection axis back onto the original volatiles to facilitate interpretation, we expressed inferred selection on each volatile as mean-scaled selection gradients.
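The back-projection can be written explicitly under one assumption about how the axis is built. If the scent selection axis is $a_i = \sum_l w_l (x_{il} - \bar{x}_l)/s_l$, where $\bar{x}_l$ and $s_l$ are the mean and standard deviation of volatile $l$, a slope $\beta_{\mathrm{axis}}$ on the axis implies a slope $\beta_{\mathrm{axis}} w_l / s_l$ on volatile $l$ and hence a mean-scaled gradient $\beta_{\mathrm{axis}} w_l \bar{x}_l / s_l$. The sketch below uses hypothetical values; whether the volatiles are standardized in exactly this way in a given Hmsc fit should be checked rather than assumed.

```python
def backproject_mean_scaled(beta_axis, loadings, means, sds):
    """Assumes axis = sum_l loadings[l] * (x_l - means[l]) / sds[l].
    Returns the implied slope on each raw volatile and its mean-scaled gradient."""
    raw = [beta_axis * w / s for w, s in zip(loadings, sds)]
    mean_scaled = [b * m for b, m in zip(raw, means)]
    return raw, mean_scaled


# Hypothetical: gradient 0.5 on the axis, two volatiles with loadings 0.6 and 0.8
raw, mean_scaled = backproject_mean_scaled(0.5, [0.6, 0.8], means=[10, 4], sds=[2, 1])
print([round(r, 3) for r in raw], [round(m, 3) for m in mean_scaled])
```

Here the first volatile has a raw slope of $0.5 \times 0.6 / 2 = 0.15$ and a mean-scaled gradient of $0.15 \times 10 = 1.5$.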
To evaluate the adequacy of the dimension reduction approach for characterizing selection on floral scent, we compared the explanatory and predictive power of the reduced-rank regression models to models treating each volatile concentration as a standard covariate (Lande & Arnold, 1983). To compare the predictive power of the two models (i.e. reduced-rank regression
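Predictive power can be compared through k-fold cross-validation. The sketch below, a stand-in that uses ordinary least squares on hypothetical data, predicts each fold from a model fitted to the other four and computes $r^2$ as the squared correlation between held-out predictions and observations. Both the fold assignment (index modulo k) and this definition of $r^2$ are assumptions; a reduced-rank model could be evaluated the same way by swapping `ols_fit` for another fitting function.

```python
from statistics import mean


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def ols_fit(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    Xd = [[1.0] + list(row) for row in X]
    p = len(Xd[0])
    XtX = [[sum(r[a] * r[b] for r in Xd) for b in range(p)] for a in range(p)]
    Xty = [sum(r[a] * yi for r, yi in zip(Xd, y)) for a in range(p)]
    return solve(XtX, Xty)


def predict(coef, X):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], row)) for row in X]


def squared_corr(a, b):
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab * sab / (saa * sbb)


def cv_r2(X, y, k=5):
    """k-fold cross-validated r^2: each fold is predicted from a model fitted
    to the remaining folds."""
    n = len(y)
    preds = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        coef = ols_fit([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(test, predict(coef, [X[i] for i in test])):
            preds[i] = p
    return squared_corr(preds, y)


# Hypothetical toy data: 20 plants, 2 traits, relative fitness with small noise
X = [[i / 10, ((i * i) % 7) / 7] for i in range(20)]
noise = [0.05 * ((-1) ** i) * (i % 3) for i in range(20)]
y = [1 + 0.3 * a - 0.2 * b + e for (a, b), e in zip(X, noise)]
print("in-sample r2:", round(squared_corr(predict(ols_fit(X, y), X), y), 3))
print("5-fold CV r2:", round(cv_r2(X, y), 3))
```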
Selection on scent was well supported statistically (posterior support >90%) in about 41% of the studies (9/22 studies). In the remaining 13 studies, support for selection was weak to moderate (posterior support 50.6%-78.0%).
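Posterior support can be computed directly from MCMC draws of a selection gradient. The sketch below assumes that support is the larger of the posterior probabilities that the gradient is positive or negative, which bounds it below at 50%, in line with the lowest value reported above (50.6%); the draws are simulated placeholders rather than Hmsc output.

```python
import random


def posterior_support(draws):
    """Share of posterior draws on the majority side of zero,
    i.e. max(P(beta > 0), P(beta < 0)) estimated from the draws."""
    n = len(draws)
    pos = sum(1 for d in draws if d > 0) / n
    neg = sum(1 for d in draws if d < 0) / n
    return max(pos, neg)


rng = random.Random(1)
# Hypothetical posterior draws for a scent gradient (not real model output)
draws = [rng.gauss(0.12, 0.08) for _ in range(4000)]
support = posterior_support(draws)
print(f"support = {support:.3f}", "well supported" if support > 0.9 else "weak to moderate")
```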
Explanatory power was always higher for the multiple-regression models than for the reduced-rank regression models (Table 1). When making predictions for held-out data (cross-validation), however, the reduced-rank regression models often performed as well as or better than the multiple-regression models (Table 1).
The compound-specific selection estimates inferred by projecting selection on the scent selection axis back onto the original variables were qualitatively similar to those obtained through standard multiple regression, as indicated by moderate-to-strong positive correlations between selection gradients inferred by the two methods (mean r = 0.67, range = 0.41-0.89).

| Spatio-temporal variation in selection on scent in G. odoratissima
Selection on scent and other pollination traits (flower number, plant height and inflorescence length) of G. odoratissima varied in time and space, and specifically tended to be stronger in the lowlands than in the mountains, especially in 2010 (Figure 1). Selection on scent was reasonably strong ($\beta_{\mathrm{scent}} > 0.1$) and statistically well supported in 6 of 13 studies (population-year combinations; Table 1).

Inferred selection on individual volatiles also varied in time and space, yet the magnitude of variation was limited after accounting for sampling uncertainty (Figure 2). Notably, average selection gradients on all volatiles were close to zero.

TABLE 1 … and the multiple-regression model ($r^2_{\mathrm{MR}}$), and predictive power based on fivefold cross-validation for the reduced-rank regression ($r^2_{\mathrm{CV}}$) and multiple-regression models ($r^2_{\mathrm{CV-MR}}$). The column $r_\beta$ gives the correlation between compound-specific selection gradients inferred from the reduced-rank regression and multiple-regression models. Bold values indicate at least 90% posterior support for selection on scent.

… & Johnson, 2009; Opedal, 2021). Third, the statistical support for selection on scent in about a third of the studies is also comparable to patterns observed for other kinds of pollination traits.
While pollinator-mediated selection on flower dimensions can often be interpreted trait by trait (Opedal, 2021), it is unclear whether selection on floral scent acts on individual volatiles or on the entire scent bouquet. Indeed, scent bouquets comprise sets of biochemically linked compounds (Junker et al., 2018), and scent chemistry should perhaps be seen as a reducible multivariate phenotype rather than as an irreducible multidimensional trait (Collyer et al., 2015).
We found that the dimension reduction approach captured the relationship between phenotype and fitness (i.e. selection) well, but this is not directly informative about how pollinators respond to variation in scent. To further understand the biological meaning of the 'scent selection axis' inferred by our approach, data are needed on how pollinators respond physiologically to compounds inferred to be under selection. There is ample evidence that pollinators respond physiologically to floral volatiles (e.g. Dötterl et al., 2006; Eltz & Lunau, 2005; Schiestl et al., 2021; Svensson et al., 2010) and that floral volatiles are attractive to pollinators in the field (Dodson et al., 1969; Majetic et al., 2009).

FIGURE 2 Spatio-temporal variation in compound-specific mean-scaled linear selection gradients in Gymnadenia odoratissima. In the upper panel, the + indicates the mean for each compound. The lower panel shows the standard deviation of the selection gradients on each compound, after correcting for the sampling variance in the individual estimates. The grey bars indicate compounds that loaded onto the leading principal component in Gross et al. (2016). Gross et al. (2016) detected positive selection on PC1 and stronger selection in the lowlands than in the mountains. Asterisks (*) indicate compounds that were shown to be electrophysiologically active in pollinators.

… among populations. Although floral scent is functionally involved in advertisement towards pollinators, these patterns of variation in selection are closer to those observed for pollinator-fit traits than for other advertisement traits such as plant height or flower display size (Opedal, 2021). We can speculate that spatio-temporal variation in selection on scent chemistry is driven by variation in pollinator assemblages, as often seems to be the case for fit traits (e.g. Chapurlat et al., 2015; Herrera et al., 2006; Opedal, 2021; Paudel et al., 2016; Soteras et al., 2020). While variation in selection on fit traits is expected to arise from variation in the fit of local pollinators to flowers, variation in selection on scent could well arise from variation in the scent preferences of local pollinators (Ramírez et al., 2011; Suinyuy et al., 2015).

For this dataset, the original analysis was practically identical to our multiple-regression analysis, and the compound-specific selection gradients so inferred were strongly correlated with those inferred by our reduced-rank regression approach (r = 0.89 …

… Gross et al. (2016) in that selection on scent tended to be stronger in lowland populations, especially in the first year of study. Furthermore, the analysis of compound-specific selection was consistent with the results of Gross et al. (2016) in terms of which compounds were under stronger selection (Figure 2; see also Appendix S1).
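One common way to correct the spread of selection gradients for sampling noise is a method-of-moments correction: subtract the mean sampling variance of the individual estimates from their observed variance, truncating at zero. We assume this correction in the sketch below; the exact procedure behind the lower panel of Figure 2 may differ, and the estimates and standard errors shown are hypothetical.

```python
from statistics import mean, variance


def corrected_sd(estimates, std_errors):
    """SD of true gradients among populations: observed variance among the
    estimates minus their mean sampling variance, truncated at zero."""
    excess = variance(estimates) - mean(se ** 2 for se in std_errors)
    return max(excess, 0.0) ** 0.5


# Hypothetical mean-scaled gradients on one compound in four population-years
estimates = [0.10, 0.30, 0.20, 0.40]
std_errors = [0.05, 0.05, 0.05, 0.05]
print(round(corrected_sd(estimates, std_errors), 4))
```

When the standard errors are large relative to the spread of the estimates, the corrected SD is zero, meaning that the apparent variation in selection is compatible with sampling noise alone.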
Our results are also qualitatively comparable to those of Joffard et al. (2020) in identifying the same two populations subject to stronger selection.

Reduced-rank regression and principal component regression are not the only statistical techniques for dealing with large sets of correlated predictor variables. One possibility is to use regularization approaches such as the elastic net (Zou & Hastie, 2005) and its special case, the least absolute shrinkage and selection operator ('lasso'). Like our reduced-rank regression approach, these approaches aim at maximizing predictive ability rather than model fit (Morrissey, 2014). Gfrerer et al. (2021) used an elastic-net approach in their recent study of Arum maculatum, a species with extraordinarily complex floral scent chemistry. These authors used the elastic net to identify which of the 289 compounds emitted by their study plants were more strongly associated with fitness and subsequently estimated selection on these compounds using standard multiple regression. Another suitable approach is projection-pursuit regression, as advocated by Schluter and Nychka (1994). This approach is similar to reduced-rank regression, although it allows non-linearity in the functions used to construct the predictors (Morrissey, 2014). Given the difficulties involved in collecting scent data, and the modest sample sizes typically achievable, it is not clear that adding such complexity would yield much further insight. Finally, while not yet applied to studies of floral scent, morphometric studies have estimated selection on shape (as a multidimensional trait) through the two-block partial least-squares method, which also yields axes of maximum covariance between sets of variables such as fitness and shape (Gómez et al., 2006; Kuchta & Svensson, 2014; Rohlf & Corti, 2000).
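For comparison with reduced-rank regression, the elastic net can be fitted by cyclic coordinate descent with soft-thresholding. The following self-contained sketch (hypothetical data; not the implementation used by Gfrerer et al., 2021) standardizes the predictors internally, and setting `alpha = 1` gives the lasso, which can shrink the coefficients of weakly associated compounds exactly to zero.

```python
import random
from statistics import mean, pstdev


def soft_threshold(z, g):
    """Lasso shrinkage operator: move z towards zero by g, stopping at zero."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def elastic_net(X_cols, y, lam, alpha=1.0, n_iter=200):
    """Coordinate descent for squared-error loss / (2n) plus the penalty
    lam * (alpha * |b|_1 + (1 - alpha) / 2 * |b|_2^2).
    Predictors are standardized internally; returns standardized-scale slopes."""
    n = len(y)
    Z = []
    for col in X_cols:
        m, s = mean(col), pstdev(col)
        Z.append([(v - m) / s for v in col])
    y_bar = mean(y)
    r = [v - y_bar for v in y]  # residuals while all coefficients are zero
    b = [0.0] * len(Z)
    for _ in range(n_iter):
        for j, zj in enumerate(Z):
            # Inner product of z_j with the partial residual that excludes b_j
            rho = sum(zj[i] * r[i] for i in range(n)) / n + b[j]
            new = soft_threshold(rho, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            if new != b[j]:
                delta = new - b[j]
                r = [r[i] - delta * zj[i] for i in range(n)]
                b[j] = new
    return b


# Hypothetical example: 6 volatiles, only the first two affect relative fitness
rng = random.Random(7)
n = 60
volatiles = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(6)]
rel_fit = [1 + 0.3 * volatiles[0][i] - 0.2 * volatiles[1][i] + 0.2 * rng.gauss(0, 1)
           for i in range(n)]
print([round(c, 3) for c in elastic_net(volatiles, rel_fit, lam=0.1)])
```

A screening step in the style described above would then carry forward only the compounds with non-zero coefficients into a standard multiple regression.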
All these approaches yield insights into patterns of selection on scent chemistry, although we argue that reduced-rank regression and similar approaches have several advantages. First, comparison with published principal component regression analyses (Gross et al., 2016; Joffard et al., 2020) suggests that the two approaches to dimension reduction yield qualitatively similar conclusions, yet quantitative interpretability remains higher for the reduced-rank regression approach owing to the direct inference of the axis of scent variation under selection. Second, compound-specific selection gradients inferred by multiple regression and by reduced-rank regression appear strongly correlated when the number of compounds is relatively low and the sample size relatively large (Table 1). The advantage of the reduced-rank regression approach is that it also yields an estimate of 'overall' selection on scent, and the strength of selection on the scent composite trait was not obviously related to sample size or to the number of volatiles included in the analysis.
Pre-selecting compounds based on knowledge about pollinator responses is clearly biologically meaningful, but the downside of this approach is that data on physiological responses may often be unavailable, and it is not clear whether the physiological response to a compound maps directly onto the relevance of that compound in foraging decisions. Furthermore, analysing a subset of compounds with reduced collinearity, or that are found to be under stronger net selection, could bias inferred patterns of 'overall' selection on scent. Taken together, these points suggest that the reduced-rank regression approach may be particularly useful for studies of selection on complex scent blends comprising many compounds, and when no prior information on physiological responses of pollinators is available.
Our reduced-rank regression approach can easily be extended to accommodate different data types. The flexible Hmsc model allows several response variables to be analysed jointly, which provides interesting possibilities for studies of selection. First, selection studies sometimes consider several fitness components, such as pollinator visitation, pollen deposition, seed set and seeds sired through pollen export (male fitness). By including several of these fitness components as separate response variables, it is possible to ask how variation in floral scent affects each, while accounting for potential covariance among fitness components. Similarly, the reproductive success of plants may depend not only on pollinator visitation but also, for example, on seed predation (Parachnowitsch & Caruso, 2008; Pérez-Barrales et al., 2013). When multiple response variables are included in the model, it also becomes natural to include multiple reduced-rank covariates to allow for distinct patterns of response to floral scent by, say, pollinators and seed predators. Finally, we note that our approach could be directly applied to other high-dimensional problems, such as measuring selection on chemical traits more generally (e.g. nectar or leaf defensive chemistry) or on shape quantified through morphometric methods (Gómez et al., 2006).

| CONCLUSIONS
Our reduced-rank regression approach allowed us to obtain a meas…

… writing - original draft (supporting); writing - review and editing (supporting).

CONFLICT OF INTEREST
The authors declare no conflict of interest.

PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1111/jeb.14103.