Origin and distribution of desert ants across the Gibraltar Straits

have a low dispersal capacity are


INTRODUCTION
Speciation is a key determinant of the diversification of life and regional species richness.
Theory predicts that the formation of new species mostly occurs in allopatry (Coyne and Orr, 2004;Mayr, 1963), although it has been suggested that sympatric speciation may occur at a more common rate than initially thought (Nosil, 2008).Allopatric speciation arises due to the absence of gene flow between populations separated by geographic barriers through vicariance or dispersal.The process of speciation may be accelerated by natural selection acting on diverging populations subject to different ecological conditions promoting the evolution of phenotypic differences (Nosil, 2012).However, niche conservatism-the tendency for new species to retain similar ecological requirements-may also limit phenotypic divergence in spite of reproductive isolation (Wiens et al., 2010).This might be particularly true in taxa in which reproductive and non-reproductive functions are completed by different individuals like social insects where worker morphological traits are often insufficient for species delineation (Schlick-Steiner et al., 2006;Seifert, 2009).The suppression of the geographic barriers may then allow secondary contacts between reproductively isolated populations that are phenotypically indistinguishable (Eyer et al., 2017;Jowers et al., 2014).Alternatively, populations of the same species distributed over large environmental gradients may show major phenotypic differences as a result of local adaptation and phenotypic plasticity, even though complete reproductive isolation is not achieved.In this context, phylogeographic analyses can provide useful information on the role of past and current geographic barriers on species formation and species delineation at a regional scale.
The Mediterranean basin is one of the world's main biodiversity hotspots (Myers et al., 2000).This extraordinary regional diversity partly results from the recurrent encounters of North-African and Eurasian biota as a consequence of geo-climatic events during the Cenozoic (Husemann et al., 2014).For example, the progressive closure of the Gibraltar Straits between North Africa (NA) and the Iberian Peninsula (IP), which culminated in the creation of a land-bridge during the Messinian (Achalhi et al. 2016), may have allowed the migration of terrestrial species in both directions (Fig 1).The sudden re-opening of the Gibraltar Straits ≈ 5.33 Ma, may then have initiated numerous processes of speciation through vicariance.This is supported by the fact that, despite being only 14 km wide, the current Gibraltar Straits is an effective barrier to gene flow, even for organisms with good dispersal capacities (Casimiro-Soriguer et al., 2010;Castella et al., 2000;Garcia-Mudarra et al., 2009;Pardo et al., 2008;Salzburger et al., 2002;Sanchez-Robles et al., 2014).
Moreover, measures of divergence between sister clades of snakes and lizards distributed on both sides of the Gibraltar Straits often coincide with a scenario of late Miocene vicariance (Albert et al., 2007;Paulo et al., 2008;Velo-Anton et al., 2012).However, other studies suggested that divergence occurred much earlier (Gantenbein and Largiadèr 2003) or much later than this period (Brändli et al., 2005;Carranza et al., 2006aCarranza et al., , 2006b;;Jowers et al., 2015).In some lizard species a first migration event from the IP to NA during the late Miocene was followed by a second migration event in the early Pleistocene (-2.3 to -2.7 Ma), perhaps through natural rafting (Kaliontzopoulou et al., 2011).
Climatic oscillations during the Quaternary period may also have promoted important vicariance events.Extensive studies based on fossil records and mtDNA analyses have revealed that the IP served as a refugium for temperate species that migrated towards lower latitudes during Ice Ages (Hewitt, 1999(Hewitt, , 2004;;Taberlet et al., 1992).The marked genetic structuring of extant populations has revealed that migrations towards these favorable areas led to the formation of a mosaic of fragmented populations that persisted in small refugia throughout the entire Iberian refugium (Abellán and Svenning, 2014;Centeno-Cuadros et al., 2009;Ferrero et al., 2011;Gomez and Lunt, 2007;Gonçalves et al., 2015;Miraldo et al., 2011).A similar pattern of small refugia occupying river valleys separated by high elevation habitats has been postulated in NA (Husemann et al., 2014).In some cases, climate stability during long periods may have initiated allopatric speciation between the isolated lineages through drift and/or selection.Post-glacial range expansion may then have led to secondary contact between sister lineages (Gonçalves et al., 2015;Igea et al., 2013;Jowers et al., 2014;Lefebvre et al., 2016;Miraldo et al., 2011).
Previous studies on the thermophilic ant genus Cataglyphis suggest that it constitutes an interesting group in which to trace the phylogeography of this complex region (Boulay et al. 2017;Jowers et al. 2013Jowers et al. , 2014)).These species occupy hot and xeric environments of Eurasia and NA, where workers benefit from unusual thermoresistance (they are able to withstand a ground temperature > 70ºC) thereby limiting predation and competition with other ants and with vertebrates (Cerdá and Retana, 2000;Wehner et al., 1992).The high species richness found in the Middle East suggests that they may have diversified in this region and dispersed east-and westward (Radchenko 1997;Ruano et al. 2011;Sanllorente et al. 2015).
In the western Mediterranean basin, Cataglyphis species have been regrouped into 6-7 phylogenetic groups (Aron et al. 2016;Knaden et al. 2012;Radchenko 1997).In the present study, we were specifically interested in the phylogeography of the albicans group around the Gibraltar Straits in order to determine the role of geographic isolation as a driver of new species formation.This group is particularly useful for this study because it contains several species distributed in both the IP (n=4) and Morocco (n= 9; Cagniant 2009).Nevertheless, many taxonomic and phylogenetic issues remain unresolved in this group, due to the low morphological differentiation among the workers.
To determine the North-African origin of the Iberian desert ants in the C. albicans group we conducted an extensive sampling of C. albicans in Morocco, Spain, and Portugal and determined their phylogenetic relationships with DNA sequences.We used the fossilbased divergence time estimation between Cataglyphis iberica collected in northern Spain and various other species of ants (Moreau and Bell, 2013) in order to calibrate our phylogenetic tree.We employed microsatellite markers to determine the level of isolation between identified clades.We also used male morphological measurements and the composition of worker cuticular hydrocarbons to supplement clade delineation.Male morphology, particularly the copulatory organs, is often considered a good marker of reproductive isolation, since a small difference in shape may prevent copulation between different species (Masly 2012;Kubota et al., 2013;Wojcieszek and Simmons, 2013;Jowers et al., 2014).Moreover, cuticular hydrocarbons have been shown to be species-specific and to constitute useful tools in the identification of cryptic ant species (Bagnères and Wicker-Thomas, 2010;Dahbi et al., 2008Dahbi et al., , 1996;;Jowers et al., 2014).Finally, we combined demographic analyses and niche modeling to test the "refugium within refugium" hypothesis (Gomez and Lunt 2007) in NA and the IP.

Sample collection
A total of 130 Cataglyphis colonies were sampled across the IP and northern Morocco between 2011 and 2015 (Table S1; average distance between sampled colonies: 484 km, range: 0.14 -1870 km).From all colonies, we collected several workers that were immediately conserved in 96% ethanol.When available, males were also collected and stored in 70% ethanol for morphological analyses.Finally, some workers were brought alive to the laboratory for cuticular hydrocarbon (CHC) extraction (see below).Species identification was reviewed by ant systematist authority Prof. Henri Cagniant (formerly at the Paul Sabatier University of Toulouse).Voucher specimens are conserved at the Doñana Biological Station collection.

DNA extraction, amplification, and sequencing
Genomic DNA was extracted from the thorax and legs of one worker per colony using Wizard® Genomic DNA Purification Kit (Promega).A 503 bp portion of the mitochondrial cytochrome oxidase subunit I (COI), 604 bp of the long wavelength rhodopsin (LWRh), and 779 bp of the ribosomic RNA (28S) were amplified from each specimen using the primers summarized in Table 1.Polymerase chain reaction (PCR) amplifications were performed with the HotstarTaq® Master Mix Kit (Qiagen).PCR reactions were performed in a volume of 30 μl using 1 μl of extracted DNA, 1.5 μl of each primer (20mM), 15 μl HotStarTaq Master Mix and 11μl of Nuclease-free water.PCR amplification was conducted with an initial denaturation step at 95 °C (15 min), followed by 35 cycles at 94 °C (45 sec), annealing temperature (1 min), and 72 °C (1 min 30 sec), and finally by an extension step at 72 °C (10 min).The PCR templates were purified with NucleoSpin® Gel and PCR Clean-up kit (Macherey-Nagel) and sequenced on an automated ABI 3730XL sequencer using Big Dye® Terminator cycle sequencing kit (Applied Biosystems, Foster City, California, USA) at the Genoscreen facility (http://www.genoscreen.fr/).

Sequence analysis and phylogenetic inference
All sequences were corrected, aligned, and analyzed in Geneious v. 8.0 (Kearse et al. 2012) using the MAFFT plugin (Katoh et al., 2002) with the auto option for other parameters set as default and then refined manually.For the protein coding genes, the correct translation to amino acids was checked, and introns were considered as a single separate partition for LWRH.DnaSP v. 5.1 (Librado and Rozas, 2009) was used to retrieve haplotypes through the algorithms provided by PHASE (Stephens and Donnelly 2003) as well as to determine nucleotide and haplotype diversity, and neutrality tests of Tajima (D) and Fu and Li (D*).All sequences were deposited in GenBank (KY100127-KY100256, KY314267-KY314396, KY314626-KY314755).
Bayesian phylogenetic analyses on the concatenated DNA matrix were implemented in BEAST2 v. 2.3.0 (Bouckaert et al., 2014) for the joint estimation of the topology of the tree and the ages of diversification.Analyses were conducted applying the best fitting substitution model to each partition as estimated by PartitionFinder v. 1.1.1(Lanfear et al., 2012), with the following parameters: linked branch length, models available in BEAST, BIC model selection, and greedy search algorithm.We applied a relaxed uncorrelated lognormal molecular clock and a Yule speciation process prior.Formica wheeleri (DQ353362, DQ353149, DQ353556), Proformica nasuta (DQ353311, DQ353195, DQ353581) DNA sequences retrieved from GenBank and Cataglyphis tartessica (KY314398, KY314397, KY314756) sequences obtained in this study were included as outgroups.To calibrate the tree, we used the divergence time estimated by Moreau and Bell (2013) between Formica wheeleri, Proformica nasuta, and C. iberica.Analyses were run for 30 million generations, sampling one tree every 1000 generations.
A maximum clade credibility tree was obtained with TreeAnnotator v. 1.8.1 (Drummond et al. 2012) after checking effective sample size values (ESS > 200) using Tracer v. 1.6 (Rambaut and Drummond, 2013).Ten percent of the initial trees was discarded as burning fraction.Maximum likelihood analyses were performed using the Geneious RAxML v. 7.2 plugin with the GTR CAT model.The dataset was partitioned according to the best-fit partitioning schemes.Node support was obtained after 1000 bootstrap replicates.
Haplotype median joining networks were constructed for the three genes using Network v. 4.6 (Bandelt et al., 1999) applying the MP correction (Polzin and Vahdati Daneshmand, 2003) and avoiding hypervariable sites.Finally, the relative similarity among the clades identified by Bayesian inference was also analyzed by Discriminant Analysis of Principal Components (DAPC) using Adegenet (Jombart et al., 2010) in R (R Development Core Team, 2015).
To determine whether expansion/contraction processes had occurred on both sides of the Mediterranean, we pooled all the samples from the IP and all the samples from NA into two groups.For each group, we estimated effective population size changes through time according to the Bayesian Skyline Plot (BSP) method implemented in BEAST v. 2.3.0 (Bouckaert et al., 2014), setting a strict molecular clock and partitions and substation models as defined above.Chains were run for 30 million iterations, of which the first 25% was discarded as burn-in.Genealogies and model parameters were sampled every 1000 iterations.The results were analyzed and summarized as BSPs in Tracer v. 1.6 (Rambaut and Drummond, 2013).

Microsatellite analyses
We obtained microsatellite data for 10 loci from 129 of the 130 specimens (one specimen had to be discarded owing to amplification problems).The microsatellite markers were developed for C. cursor (Chéron et al., 2011;Pearcy et al., 2004).Genotyping was performed using multiplex PCR reactions.Details of multiplex compositions, reaction conditions, and temperature profiles are given in the supplementary information S2.
Fragment analysis was performed using an ABI 3730 XL (Applied Biosystems).Allele size determination and binning was conducted using the Microsatellite plugin in Geneious.We tested for the occurrence of null alleles and scoring errors at each locus with MICROCHECKER 2.2.3 software (Van Oosterhout et al. 2004).Allele richness and frequency, observed and expected heterozygosity, and Wright's F-statistics were computed using GENEPOP ON THE WEB (Raymond and Rousset 1995).Linkage disequilibrium and departures from Hardy-Weinberg Equilibrium (HWE) were tested using GenAlEx v. 6.5.
The Bayesian clustering software TESS 2.3.1 (Durand et al., 2009) was used to infer the most likely number of populations (K) accounting for the spatial distribution of the samples.Models with and without admixture were evaluated using default parameters, setting K from 2 to 15.The optimal number of clusters was determined by plotting the Deviance Information Criterion (DIC) against K.The software package CLUMMP v. 1.1.2(Jakobsson and Rosenberg, 2007) was used to evaluate the consistency of results across 15 independent runs using the full-search algorithm.For sequence data, we also analyzed genotypic differences among the clades using DAPC (Jombart et al. 2010).

CHC analyses
To compare the CHC composition of the main clades identified using sequence data, we analyzed 5-10 workers from 5 -9 colonies per clade.Colonies collected from the same clade were at least 10 km apart.Before extraction, the workers were killed by placing them at -18°C for 20 min.The carcasses were immersed in 2 ml of hexane for 24 h.All the workers from one colony were pooled together.The CHC extracts were then stored at -18 °C until the analyses were conducted using a Gas Chromatograph coupled to a Mass Spectrometer (7890 GC System, 7000C GC/MS triple quad, Agilent technologies).Extracts were completely evaporated and re-suspended in 50 μl of GC quality Hexane (Sigma Aldrich).
One microliter of each sample was injected into a 30m-long HP-5MS fused silica capillary column (HP-5 MS UI, Agilent) with a splitless time of 2 min at 250°C and a Helium flow rate of 1 ml.min -1 .The temperature program was set from 100°C (2 min initial hold) to 320°C at 6°C.min -1 (5 min of final hold).The cuticular components were identified using an electron impact of 70eV (scanning mode 40-550 amu).Fragments were analyzed with the Masshunter software (qualitative analysis B.06.00, Agilent technologies) and compared to public data.Quantification was achieved by calculating peak area relative to the total peak area.
Only compounds with an average relative quantity higher than 1% of the total CHC composition in at least one specimen per clade were used in the statistical analyses.
Because the number of CHC identified in the samples was too large compared with the number of samples, we first regrouped the compounds according to their carbon chain length and number of methyl branching.Differences in profiles among the clades identified by BI were then analyzed statistically using Discriminant Analysis of Principal Components implemented in the Ade package in R (R Development Core Team 2015).

Male genitalia analyses
A total of 211 males were collected from 25 locations (range: 1 to 20 males per location).For each male, 15 measurements were taken with a millimetric ruler, a digital camera Canon 30D with MP-E 65 mm macro lens, and the software ImageJ (Abràmoff et al., 2004).The measurements comprised head length and width, inter-ocular distance, scape length, pronotum length and hind femur length, subgenital plate width and length, paramere length, penis valves width, length and distance between the apexes of their sclerotic appendices, volsella width, and length and lacinia length.Morphological differences among specimens were then analyzed through principal component analysis in R (R Development Core Team 2015).

Geographic distribution of the clades and niche contraction
Current clades' niche modeling was conducted using Maxent v. 3.3 (Philips et al. 2004(Philips et al. , 2006)).First, we retrieved 19 bioclimatic variables from WORLDCLIM online database (http://www.worldclim.org/) with a resolution of 2.5 arc-minute.These 19 variables (Table S3) present climate conditions for the period 1960-1990 in the study area and were used to train a model of current distribution of the identified clades.The same bioclimatic variables estimated for the Last Glacial Maximum through the Max Planck Institute for Meteorology new Earth system model (MPI-ESM), also recovered from WORLDCLIM, were used to predict the potential geographic distribution of each clade 0.025 Ma.
Haplotype number and diversity were much greater for COI (97 and 0.99, respectively) than for both nuclear genes (31 and 0.87 and 25 and 0.79 for LWRh and 28S, respectively).The best evolution models are given in Table 2.The Bayesian analysis of the three concatenated genes resulted in a well-supported phylogram that allows the identification of eight clades (Fig. 2).Descriptive statistics for each clade and the results of neutrality tests are given in Table 3.None of the tests showed a significant departure from neutrality.
The Bayesian method suggested that the separation between C. tartessica and C. albicans group occurred 9.66 Ma (95% interval: 15.15 -5.21Ma).Among the albicans group, The BSP method showed a constant population size followed by a recent contraction for the NA group (Fig 6).In contrast, the IP group revealed a constant increase in population size with a rapid recent expansion.These results, however, should be taken with caution as samples were pooled, and hence treated as a single species under uniform selection pressure throughout their distribution range.Nonetheless, they are informative in illustrating general trends of demographic processes occurring on both sides of the Mediterranean, particularly after the opening of the Gibraltar Straits.

Microsatellite analyses
The summarized statistics for the 129 typed specimens and 10 loci are given in Table 4. Details for each locus and the results of Hardy-Weinberg tests are given in Table S4.All loci were polymorphic with allele numbers ranging from 4.3  0.7 in clade F to 15.2  2.3 in clade C.There was evidence of linkage disequilibrium for only four out 45 pairs of loci (Table S6).
Observed and expected heterozygosity ranged between 0.47  0.08 and 0.59  0.10,

CHC analyses
A total of 46 CHC were identified on the ant cuticle (Table S7).All were alkanes that belonged to 22 categories based on their chain length (25 to 33) and number of methylations (0 to 3).The DAPC conducted on these 22 categories suggested that all the clades had well distinct profiles except E and F (Fig 5c).All the profiles were relatively common for ants and were dominated by C29 and C27 linear-and methyl-alkanes.Clades F and H differed from the others by having abundant long-chain alkanes (C31 for clade F and di-and trimethyl-C33 for clade H).In contrast, clade B was limited to trimethyl-C29 and lacked carbon chain longer than C30.

Male biometry
Only males from clades A to E could be collected in the field.The 15 measurements of male genitalia are given in Table S8.The result of the DAPC conducted on these measurements supported morphological differences among the three Iberian clades (Fig 5d).In addition, males of clades A, B, and C clearly differed from the two NA clades (D and E).The variables that most separated among the five clades were the measurements of genitalia organs (i.e. Paramere length, Subgenital plate width, Volsella width; Table S8).

Geographic distribution of the clades and niche contraction
Despite low sample size for some clades, the use of 19 bioclimatic variables enabled a relatively good prediction of the clades' current distribution (Fig. 8).The distributions of favorable environmental conditions for the three Iberian clades showed a clear extension since the Last Glacial Maximum.This was particularly noticeable for clade C, for which favorable conditions 0.025 Ma were limited to three small areas in the east, west and south of the IP.In contrast, the distribution of favorable conditions for the NA clades showed evidences of a drastic reduction since the Last Glacial Maximum.(Aron et al., 2016;Cagniant, 2009).Moreover, the absence of species belonging to the C. albicans group in other regions of Europe and its commonness in the Middle East and Maghreb support the African origin of the Iberian clades.The alternative according to which migration from the middle East occurred through the North of the Mediterranean is more likely for the C. cursor group of which isolated populations are known in Greece, Italy, France and Spain but not in NA.Future analyses using a phylogenomic approach may help resolving these uncertainties (Blaimer et al., 2015).In addition, it would be interesting to obtain samples from other countries of NA to get a more precise picture of species diversification in this region.

DISCUSSION
The Bayesian tree calibrated using the divergence between F. wheeleri, P. nasuta, and C. iberica (Moreau and Bell 2013) dates the branching of clade D and the Iberian clades to 6.17 .This is consistent with the ancestor of the Iberian clades having crossed the Gibraltar Straits through a stepping-stone dispersal process in the late Miocene.
The progressive closure of the straits in the early Tortonian (~8 Ma; Fig. 1) led to the formation of temporal islands separated by ephemeral corridors that were probably small enough (<1 km) to permit dispersal of Cataglyphis queens and males.The reduction of water flow between the Atlantic and the Mediterranean culminated in the formation of a continuous land bridge in the early Messinian (6.1 Ma; Achalhi et al., 2016), which probably enhanced the northbound dispersal of African species.However, the fast reopening of the Gibraltar Straits 5.3 Ma may have rapidly limited gene flow and induced vicariance.A similar scenario of late Miocene dispersal from NA towards the IP has been suggested for some terrestrial vertebrates (Albert et al., 2007;Kaliontzopoulou et al., 2011;Paulo et al., 2008).This may also apply to other desert ants, like the C. altisquamis and C. emmae groups that occur on both sides of the Gibraltar Straits.It is intriguing, however, that some other groups with apparently similar dispersal capacity are frequent in NA but are missing in IP.This is the case of the C. bicolor group and of the C. albicans clades E-H.One explanation could be that these clades were not yet present in NA in the late Miocene and thus clearly could not have crossed the Gibraltar Straits.Alternatively, they may have dispersed towards the IP but became extinct possibly during the Pleistocene glaciation, as suggested, for example, for Malpolon snakes (Carranza et al., 2006a).
The three Iberian clades of C. albicans have a parapatric distribution that contrasts with the sympatry of the North-African clades.A similar geographic pattern among phylogenetically-related species exists in amphibians (Gonçalves et al., 2015), reptiles (Albert et al., 2007), and also in desert ants of the altisquamis and emmae groups (Jowers et al., 2014;Tinaut, 1990).This may indicate a second, much more recent vicariance event induced by Pleistocene glaciation in the IP.During this period, large populations of thermophile species were fragmented into small isolated populations distributed in the warmest valleys and coastal regions.The further global warming that characterized the Holocene may then have allowed the recolonization of the entire IP, allowing secondary contacts between once isolated populations.This is consistent with demographic analyses suggesting the recent expansion of the Iberian populations of C. albicans.It also coincides with climate niche reconstruction, which shows that the potential distribution of the Iberian clades during the last glaciation maximum was limited to three or four refugia, similar to those reported for other animals and plants (Abellán and Svenning, 2014;Albert et al., 2007;Centeno-Cuadros et al., 2009;Ferrero et al., 2011;Gomez and Lunt, 2007;Gonçalves et al., 2015;Miraldo et al., 2011;Weiss and Ferrand, 2007).Although the output of the niche construction models should be interpreted cautiously, simulation methods have shown that Maxent results are robust to small sample size (Wisz et al., 2008).However, despite climatic variables being particularly determinant for the ecology of desert ants (Amor et al., 2011;Boulay et al., 2017;Cerdá et al., 1997Cerdá et al., , 1998)), the validation of the models requires data on temperature resistance and on individual and colony-level performance as a function of climate.Moreover, factors not accounted for by the models may affect ant distribution.For example, because Cataglyphis ants inhabit open habitats, variations in the proportion of forest ecosystems may thus have affected their distribution independently of temperature and precipitations.
Clade A is currently separated from clade C by rather high mountains that may prevent between-population gene flow.However, a potential contact zone exists along the Mediterranean coast where there is no geographic barrier apart from, perhaps, recent urbanization.Moreover, some of the collection points of clades B and C are separated by less than 10 km (C361 and C364), suggesting that a contact zone is likely to exist between these two clades.The processes that prevents the spatial co-occurrence of these recently diverged clades, despite the often common parapatric occurrence even among distant clades, is unclear.Even in the absence of evident habitat differences between sister clades, their prolonged isolation in distinct areas may promote the selection of cryptic adaptations to different micro-habitats.Once a secondary contact has been established, competition for resources may preclude the overlap of the previously isolated clades.For example, in Cataglyphis ants thermal preference and prey-size selection may induce high competition and could prevent the coexistence of sister clades.Competition may then be attenuated by the process of niche partitioning among more phylogenetically distant clades, allowing their sympatric distribution (Dillier and Wehner, 2004;Knaden and Wehner, 2006).
Important morphological and chemical differences exist among some of the identified clades.Morphological differences among males may strongly limit intra-clade mating and may be driven by a reinforcement-like process if potential hybrids at the contact zone have a lower fitness than their parents.Such a mechanism has already been proposed for C. tartessica and C. floricola, which occur in parapatry in southern Spain (Jowers et al., 2014).
The selection forces that affect workers' cuticular chemistry are more complex to disentangle.The primary function of CHC is that of resistance against desiccation.This function may be particularly important for thermophilous species that forage in extremely arid environments.Thus, we cannot exclude a possible convergence of the CHC profiles among clades E, F, and G due to their exploitation of similarly dry habitats.In addition, CHCs have acquired an important communication function and help in assessing whether encountered individuals are aliens or nestmates.To what extent this new function interferes with the clade-specific composition of CHC is unknown.
Our results also have important consequences for the taxonomy of this complex group of species.To date, it was considered that the IP albicans group comprises four species (C. iberica, C. rosenhaueri, C. gadeai, and C. douwesi;Cagniant 2009), but our phylogenetic trees recovered only three clades (A, B, and C).In addition, C. iberica, which was thought to occupy the great majority of the IP, appears to be polyphyletic: C. iberica sensu stricto is probably limited to the north-east of the IP, while black specimens distributed in the center and the south of the IP are more closely related to C. rosenhaueri, despite a marked difference in coloration.Our data also question whether C. douwesi and C. rosenhaueri are distinct species.Interestingly, although the analysis of microsatellite markers through DAPC suggests the existence of a gene flow between clades B and C, both clades have distinct chemotypes and male morphotypes, suggesting that they may constitute two different species.The taxonomic situation in NA is even more complex.Some species that have been described mostly based on differences of workers are probably para-or polyphyletic like the pairs C. otini and C. cubica and C. espadaleri and C. theryi.Therefore, the C. albicans group deserves a major taxonomic revision and a more intensive sampling in the remote regions of NA.In addition to extensive morphological measurements on all the castes, the use of chemical markers may help to characterize otherwise cryptic species (Bagnères and Wicker-Thomas, 2010;Dahbi et al., 2008Dahbi et al., , 1997)).
In conclusion, our data point at dispersal and vicariance as two major mechanisms explaining the genetic divergence among Cataglyphis desert ants.This reaffirms the important role played by geographic isolation as a driver of new species formation, either through drift or selection.By modulating gene flow among populations, past geo-climatic events have shaped the diversity of the western Mediterranean basin.Further analyses should be undertaken in order to determine the respective roles of genetic drift and natural selection in this speciation process.Such knowledge will help in predicting how future global warming might affect local and regional species richness in this major hotspot of biodiversity.
the most ancient node (7.26 Ma; 95% interval: 11.34 -4.04 Ma) separated a group of North-African specimens only (clades E to H) with another group composed of both Iberian (clades A, B and C) and North-African (clade D) specimens.The three Iberian clades were wellsupported (BI support = 1) and showed a parapatric distribution (Fig. 2 and 3).The separation among these three clades may have occurred less than 3 Ma.Clade D contained North African specimens identified as C. otini and C. cubica.The position of this clade as a sister lineage to the three Iberian clades was not strongly supported (posterior probability = 0.7).Divergence time between clade D and the common ancestor of the Iberian clades was estimated as 6.17Ma (95% interval: 9.69 -3.18Ma).In contrast to the IP, the NA clades had overlapping distributions (Fig 3).Maximum Likelihood (Fig 4) and the Median Joining Network (Fig. S4) were consistent with Bayesian Inference.However, the position of clade D as a sister clade of the three Iberian clades was not supported by the Maximum Likelihood analysis (22%).Separate analyses conducted on each gene revealed a much greater structuring for the mtDNA than for both nuclear genes, probably due to the greater rate of evolution of the former.The DAPC analysis conducted on the sequence data also suggested a major segregation between the three Iberian clades and the five North-African clades along the first axis, which explained 61% of the variance (Fig 5a).The second axis explained 26% of the variance and separated clade D from the other clades.
respectively in clade A to 0.62  0.08 and 0.76  0.09, respectively in clade C. The fixation indexes were relatively high in all the clades, suggesting a high level of genetic differentiation within them.The analysis of microsatellite markers in TESS supported the existence of seven and five clusters with and without admixture, respectively (as indicated by the breaks in the plot of the Deviance Information Criteria against the number of clusters; Fig 7).Independently of the exact number of clusters, the first level of clustering separated the Iberian and North-African clades, suggesting that the Gibraltar Straits represented a major barrier to gene flow between the IP and NA.Another level of structuring within NA was shown by the well-defined clade D, which denoted a lack of gene flow with other North-African and Iberian clades.Moreover, clade E showed a major partitioning with specimens grouping either with clade G or F. A high level of genetic diversity was found in clade H.In contrast, a very low-level structuring existed within the IP.The analysis conducted in DAPC supported the separation between the IP and NA (Fig 5b).In addition, DAPC suggested a clear distinction between clade A and clades B and C.Among the NA clades, D clearly segregated from the others while E and G overlapped.
Molecular analyses reveal the existence of at least eight clades among the C. albicans group that currently occupies NA and the IP.These clades also show phenotypic differences.The three Iberian clades are monophyletic.Although their position with respect to the North-African clades is not strongly supported, their separation may have occurred after the last reopening of the Gibraltar Straits during the Miocene.In contrast to the NA clades, the IP clades have a paratric distribution.Moreover, both their population size and available niche show signs of expansion.This suggests that the formation of three independent clades in the IP may have been triggered by repeated glaciations along the Pleistocene.Recent warming may then have allowed secondary contacts between these new species.The analysis of mitochondrial and nuclear sequences of specimens collected in the IP and NA resulted in a clear delineation of at least eight clades that are also characterized by chemical and/or morphological differences.Bayesian and Maximum Likelihood methods of phylogenetic inference suggest a monophyly of the three Iberian clades.Their position with respect to the NA clades is uncertain.Hence, the supports for the node between clade D and the Iberian clades are weak.Moreover, the DAPC conducted on the sequences data highlights a clear discrimination between D and all the other clades.Finally, microsatellites analysis in TESS tends to associate clade D with the NA rather than with the IP clades.Nevertheless, all methods except the CHC analysis point at the uniqueness of clade D and the topologies of the trees are in agreement with recent phylogenies of the genus Cataglyphis that suggest that C. iberica and C. rosenhaueri derive from C. cubica-which composes clade D-

Figure captions Figure 1 :
Figure captions

Figure 2 :
Figure 2: Phylogram obtained by Bayesian Inference for 130 samples of C. albicans group

Figure 3 :
Figure 3: Geographic distribution of the 8 clades identified by Bayesian Inference.

Figure 4 :
Figure 4: Phylogram obtained by Maximum Likelihood for 130 samples of C. albicans group

Figure 5 :
Figure 5: Linear Discriminant Analysis of principal components conducted on sequences (a),

Figure 6 :
Figure 6: Bayesian skyline plots showing effective population size fluctuation throug time for

Figure 7 :
Figure 7: Results of the population structure analysis in TESS using microsatellites.Models

Figure 8 :
Figure 8: Models of distribution of the most favorable climatic conditions for each of the 8

Table 1 :
Primer information

Table 2 :
Best model of evolution for each partition based on the Bayesian Information Criteria.

Table 3 :
Measures of diversity of the 3 loci for the 8 major clades of the albicans group in the IP (clades A to C) and Morocco (clades D to H).
N: Number of sequences used; L : sequence length; S: Number of variable sites; Nh: Number of haplotypes; Hd : Haplotype diversity ; Pi: Nucleotide diversity ; D: Tajima's D test statistic; D*: Fu and Li's D* test statistic.

Table 4 : Descriptive statistics obtained from the microsatellite analyses.
: number of genotyped specimens; Na: Mean±SE number of alleles; PA: Number of private alleles; He: Mean±SE number of effective alleles Ho and He: Observed and expected Heterozygosity (± SE); Fst: Fixation index (±SE). N