Review of clypeasteroids phylogeny and a case study of Sinaechinocyamus mai ( Taiwanasteridae )

College of Medicine, National Taiwan University, No. 1, Sec. 1, Ren-Ai Road, Taipei 100, Taiwan Department of Geosciences, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei 106, Taiwan Department of Life Science, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei 106, Taiwan Institute of Biotechnology, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei 106, Taiwan This manuscript is submitted to “Terrestrial, Atmospheric and Oceanic Sciences”


Sequencing Technology
The development of evolutionary theory and sequencing technology are coupled with each other. Since 2005, next-generation sequencing (NGS) is one of the most significant technological breakthroughs in the field of molecular biology (Goodwin et al., 2016). With the aid of NGS, the availability of non-model species DNA sequences significantly increases. Therefore, the new field "phylogenomic," using the genomic scale sequence data to reconstruct the phylogenetic relationship between species, has emerged and improved both the depth (more evolutionary distant clades, such as the divergence between kingdoms) and resolution (able to resolve the relationship between closely related groups) of the tree of life (Chan and Ragan, 2013).
In spite of the flourish of NGS technology, the traditional gene markers used in molecular evolution do not entirely step down from the stage. Due to various reasons (i.e., cost, difficulties in data analysis, sample quality), single gene markers remain useful in small-scale analyses. The massive quality of data produced by whole-genome sequencing provides detailed information but is cumbersome in analysis (Cannon and Kocot, 2016). One way to do so is by only sequencing the coding region of the genome. As shown in Table 1., transcriptome analysis represents a balance between the phylogeny resolution whole-genome sequencing (WGS) and the cost and convenient advantage of gene markers (Cannon and Kocot, 2016;Young and Gillung, 2019).
Modern biologists can facilitate paleontology studies in various ways. Through experimental models of modern species, developmental biologists can answer how certain structures possibly formed in fossil species and the potential gene regulatory networks responsible for the evolutionary origin and development of these characters (Thompson et al., 2017). The relationship among fossil species with extant representatives in their corresponding clades can be supported by reconstructing the phylogeny based on sequencing data of extant species. Furthermore, sequencing data is also able to provide evidence of divergence time and correlate with geological events (Coppard and Lessios, 2017). Analyzing the genetic material inside fossil specimens is a field under active development as well (Paabo et al., 2004). Under adequate conditions, ancient DNA (aDNA) can preserve up to more than one million years (van der Valk et al., 2021). Through direct sequencing of DNA of ancient species, researchers can be more certain about the population genetic dynamic and the relationship between fossil and extant species (Orlando and Cooper, 2014). Furthermore, the genetic basis provides an independent dataset for phylogeny inference other than traditional phenotype comparison alone (Orlando and Cooper, 2014). Therefore, the advent of sequencing technology has revolutionized paleontology by adding a new dimension to the study material. < A c c e p t e d M a n u s c r i p t >

Phylogeny of Irregular Echinoids
Echinoidea, commonly known as sea urchins, forms a conspicuous and essential element of many marine benthic communities. They exploit a wide array of marine habitats and contribute the greatest levels of biodiversity in shallow shelf areas. Arisen since the Middle Ordovician in 460 million years ago (Mya), there are more than 1000 living species in the clade of Echinoidea (Kroh, 2020;Kroh and Smith, 2010). The robust endoskeleton structure, known as stereom (Lin et al., In press), which is composed of multi-plated magnesium calcite, allows echinoids to leave a relatively rich fossil record: in total, there are about 10000 fossil species (Kroh, 2020). From paleontology to developmental biology, the extant species and the rich fossil record make the echinoids excellent candidates as valuable study models.
Among the clades in Echinoidea, Irregularia, including species are commonly known as sand dollars (Scutelloida) and sea biscuits (Clypeasteroida), distinguish themselves from irregular echinoids through the unique body plan (Saucède et al., 2007). One of the characteristic features of the Echinodermata phylum is the pentameric symmetry body plan. The larvae of echinoderms are bilaterally symmetrical, while upon metamorphosis, the larval structures typically are lost and yield a pentameric juvenile (Peterson et al., 2000). However, irregular echinoids share a secondary adult bilateral symmetry body plan (Fig. 1), and the establishment of body axis is still largely unknown and intriguing. Furthermore, the phylogeny study regarding the relationship among irregular echinoids suffers from the discrepancy between morphology and molecular evidence (Kroh and Smith, 2010;Mongiardino Koch et al., 2018).
Due to the fact that the taxonomy of Irregularia is still a topic under active discussion, in this article, the definition of Clypeasteroida is used in congruence with Kroh (2020) and Mongiardino Koch et al. (2020) to prevent ambiguity, which does not include the traditional suborder Scutellina. While the term "clypeasteroids" is used to denote the traditional clade consists of Clypeasterina and Scutellina.
Based on the recent total-evidence dated analysis, clypeasteroids possibly originated during the Cretaceous (145-66 Mya) (Mongiardino Koch and Thompson, 2020). By the Middle Eocene (49-37 million years ago), this group already presented in the fossil record worldwide (Pawson, 2007). The principal characters in echinoid fossil morphological analysis are the structure of test, spine, and Aristotle's lantern (Kroh and Smith, 2010;Ziegler et al., 2015). In particular, the lantern structure of clypeasteroids is highly modified compared to other echinoids, and this modification has been hypothesized to be crucial for the epifaunal inhabitant of clypeasteroids (Mooi, 1990a). On the contrary of abundant morphological studies, molecular efforts have lagged on echinoids phylogeny studies. Early attempts to use molecular data to resolve the deep-sea urchin phylogeny relied on just a few single-gene markers, usually ribosomal RNA-coding ones (Smith et al., 2006). The lack of broad sampling of loci across the genome limits the robustness of these phylogenetic analyses due to the genes used might be the outliers of incomplete lineage sorting. When an ancestral species undergoes < A c c e p t e d M a n u s c r i p t > rapid speciation in succession, ancestral polymorphisms may not be fully resolved into different monophyletic lineages, and the individual gene trees may be incongruent with the species tree (Galtier and Daubin, 2008). Other types of biases can also lead to inaccurate phylogenetic topology even in the absence of incomplete lineage sorting. For instance, species exhibiting significantly higher genetic distances to other clades tend to be erroneously clustered together. This phenomenon, known as long-branch attraction, is commonplace in molecular evolution inference, and it may arise from the heterogeneity of substitution rates, inadequate taxon sampling, and base compositional heterogeneity (Qu et al., 2017;Susko and Roger, 2021).
Therefore, although numerous studies have tried to investigate the phylogeny of Irregularia, the relationships among sea biscuits, sand dollars, and other irregular echinoids remain not well understood due to the contradicting results yielded by molecular and morphological data (Kroh and Smith, 2010;Mongiardino Koch et al., 2018). For example, most morphological phylogenies strongly supported the monophyly of clypeasteroids and their origin from a paraphyletic sister group collectively known as "cassiduloids", including extant clades of Echinolampadoida Kroh & Smith, 2010, Cassiduloida Claus, 1880, and other extinct lineages (Kroh and Smith, 2010). On the other hand, the transcriptomic analyses refute the monophyly of Clypeasteroida (Mongiardino Koch et al., 2018). This mismatch remains somewhat unresolved, and further investigation is needed to answer this perplexing inquest.

Body Plan
One of the most conspicuous characteristics of extant echinoids is their pentameral symmetry in the adult stage. Based on the observation of extant echinoderms and the fossil record, it seems the early echinoderms were bilateral in both the larval and adult life stages while the extant species developed secondary pentameral symmetry in adulthood (Peterson et al., 2000). Among many essential genes controlling development, one crucial important player is the homeobox genes, also known as Hox genes. They are a family of transcription factors regulating body plan and axis establishment in embryo development (Lacalli, 2014). A primary recognition criterion for these genes is a sequence of an 180 nucleotides, highly conserved "homeobox," which translates into a sequence of 60 amino acids (the homeodomain motif) that binds to specific DNA or RNA sequences (Mooi and David, 2008).
Hox genes have been found in animals, plants, fungi, and numerous other eukaryotes, and they are taxonomically widespread and highly conserved, suggesting a basal origin on the phylogenetic aspect. Almost all developmental gene networking involves the Hox gene family in metazoans. One of the most conspicuous phenotypic of the Hox cluster is the patterning along the posterior (A/P) axis of bilaterian animals (David and Mooi, 2014). Hox genes are generally arranged into three groups, or classes, on the chromosome: anterior (Hox1 and Hox2, with Hox3 possible categorized into an independent group); medial (Hox4 to Hox8); and posterior (Hox9 to Hox13) (Mooi and David, 2008). In echinoderms, the Hox 9/10 and Hox 11/13a, Hox 11/13b, and Hox 11/13c were designated due to the homologous analysis did not yield a clear one-to-one correlation between echinoderms genes and their chordates counterparts. The anterior class is closer to the 3' end and the posterior closer to the 5' end of the DNA strand. This remarkable linearity in the organization of the genes along chromosomes is the basis for a concept known as "collinearity" (Mooi and David, 2008).
There are two kinds of collinearity observed in Hox gene cluster: temporal and spatial. The genes of the anterior class (3' end) are activated first and expressed earlier in ontogeny than those of the medial class, and the medial ones are expressed earlier than those of the posterior class at the 5' end, and this is temporal collinearity (Mooi and David, 2008). In organisms that grow from anterior to posterior, such as arthropods, temporal collinearity is in turn transcribed into spatial collinearity, in which the first expressed genes at the 3' end of the cluster code for developmental events in the anterior part of the organism, and the last ones at the 5' end for events in the posterior part (Mooi and David, 2008). The schematic diagram for temporal and spatial collinearity of Hox gene cluster is shown in Figure 2.
The Hox gene cluster in Echinodermata lost the collinearity, as shown in Figure 3. Among all echinoderms, the echinoid cluster is by far the most thoroughly investigated, as it has been studied in several species belonging to quite disparate clades. In the Strongylocentrotus purpuratus, one of the most important model species of echinoids with its whole genome sequenced and annotated, the Hox cluster is almost complete besides Hox4, leaving a total of 11 genes (Li et al., 2020). The peculiarity of echinoid Hox clusters is not so much the absence of a member in the series but that the order of the genes themselves along the chromosome is rearranged. Hox1 to Hox3 are located at the 5' end of the cluster in reverse order. This topology is the result of a chromosomal translocation and inversion event.
This observation of unique rearrangement inspired the hypothesis of the body plan of echinoids being a result of a disorganized Hox cluster (Mooi and David, 2008). Since the S. purpuratus is the first sequenced echinoid and the first in the whole phylum Echinodermata, it served as a model species for this highly diverse clade initially (Sea Urchin Genome Sequencing Consortium et al., 2006). However, some recent studies question the significance of the loss of collinearity in Echinodermata species due to a better understanding of the genomic landscape of other Echinodermata species (Byrne et al., 2016). As shown in Figure   3., sequenced samples from the Crinoidea (sea lily), Asteroidea (starfish), and Holothuroidea (sea cucumber) clades show no signs of the same translocation/inversion of S. purpuratus (Li et al., 2020).

Sinaechinocyamus mai
Here, we introduce a recent study to exemplify the importance of incorporating molecular methods in modern taxonomy research. Among 15 extant clypeasteroids recorded in Taiwan shown in Table 2., Sinaechinocyamus mai (Wang, 1984) is one of the most studied sand dollars (Figure 4.). The species was initially described by Chai-Chin Wang in 1984, and the species name is in honor of the Ting-Ying Ma, one of the eminent pioneer geologists in Taiwan and China (Wang, 1984). The original species assignment was Taiwanaster mai.

However, Mooi examined the description of the feature of Taiwanaster and found that
Taiwanaster is the same genus as Sinaechinocyamus Liao, 1979. Therefore, Taiwanaster is an invalid junior synonym, and Taiwanaster mai was reassigned as S. mai (Mooi, 1990b).
Aside from the paleontology research on the fossil record, investigation of the living populations of S. mai began in the early 1990s. The ecology habitat of S. mai is in the intertidal zone along the west coast of Taiwan, from Hsinchu to the mouth of the Tsengwen river (Lee et al., 2019). Since S. mai possesses several characters that do not fit any other clypeasteroids family, Wang proposed a novel superfamily, Taiwanasteritida Wang, 1984, and a new family Taiwanasteridae Wang, 1984 to place S. mai (Wang, 1984). Mainly based on the observation of miniaturized body size, Wang proposed that Sinaechinocyamus was related to other microechinoids in the Asia-Pacific region, such as Fibulariella acuta (Yoshiwara, 1898). On the other hand, Mooi concluded that Sinaechinocyamus is a derived scutelline sand dollar (Mooi, 1990b). More recent morphology-based phylogeny also supports the hypothesis that S. mai is a derived scutelline and places the family Taiwanasteridae as incertae sedis within Scutelliformes (Kroh and Smith, 2010). Based on the morphological study, the extant sister group may be Scaphechinus mirabilis, a widely distributed sand dollar in the northwestern Pacific (Mooi, 1990b). Adult S. mirabilis generally has a body diameter of approximately 7 cm in size, while adult S. mai nearly never exceeds 1 cm (Chen and Chao, 1997).
Miniaturization does not only represent smaller body size but also contributes to essential physiology, behavior, development, and reproductive changes according to the size. There are two main models of paedomorphosis: "progenesis", which miniaturized species has a similar growth rate but growth period ends earlier, and "neoteny", which is characterized by a similar growth period with a slower growth rate (McNamara, 1986). In the case of S. mai, it seems that the predominant model might be the neoteny due to the homogenous decrease of growth rate to 19% of S. mirabilis with a comparable growth period (Chen and Chao, 1997). Further investigation of comparing the developmental process of S. mirabilis and S. mai might give valuable insight into the mechanism of miniaturization of S. mai. Fossil record and paleoclimatology may provide a possible evolutionary explanation of miniaturization.
In this study, we aimed to utilize the transcriptome-based approach to resolve the phylogenetic placing of S. mai, and investigate the monophyly of clypeasteroids through incorporating public available transcriptomic datasets of irregular echinoids.

Sample and library preparation
Live specimens of Sinaechinocyamus mai were collected in their native habitat in Miaoli County, Taiwan (120°39′E; 24°29′N) in May 2018. Total RNA of the sample was extracted using Tri Reagent (Ambion, USA). RNA concentration and quality were determined using a NanoDrop ND-1000 spectrophotometer (Thermo Scientific, USA) and an Agilent 2100 Bioanalyzer (Agilent Technologies, USA), which calculates an RNA integrity number (RIN).
Total RNA with A260/A280 = 1.8-2.0 and RIN >8.0. Normalize intact RNA samples to 200 ng/μl with DEPC-treated H2O. Total 1~2 ug RNA was purified by using poly-T oligo-attached magnetic beads. Following purification, the mRNA was fragmented into small pieces using divalent cations under elevated temperatures. The cleaved RNA fragments were reverse transcribed into first-strand complementary DNA (cDNA) using reverse transcriptase and random primers. This was followed by second-strand cDNA synthesis using DNA Polymerase I and RNase H. These cDNA fragments then went through an end repair process, adding of a single "A" base, and then ligating the adapters. The products were then purified and enriched with PCR to create the final cDNA library. Paired-end 75 nucleotide reads from each mRNA library were obtained using NextSeq500 (Illumina Inc., CIC bioGUNE, Spain).

RNA sequencing
We quantified the cDNA with Life Technologies Qubit and Agilent Technologies Tapestation 4200 and used 0.625 nanograms of size-selected cDNA for the sample preparation and the barcoded library prepared by the Chromium Gel bead and library kit (10X Genomics, USA).
Barcoded library sequencing was performed with Illumina Novaseq 6000 sequencer with 2 x 151 paired-end reads. All read-pairs contain a 16-base barcode.

Bioinformatic analysis
The list of the species incorporated in this study is shown in Table 3. The raw reads from the RNA sequencing of Sinaechinocyamus mai in this study and published data in the Sequence Read Archive (SRA) repository underwent quality control and adapter removal using Trimmomatic (Bolger et al., 2014) with default parameters. Reads passed quality control was assembled by Trinity (Haas et al., 2013). Transcriptome-based phylogenetic analysis was then conducted through the Agalma pipeline (Dunn et al., 2013), which sequentially annotated the transcriptome assembly with BLAST (Camacho et al., 2009), aligned gene clusters with MACSE (Ranwez et al., 2011), trimmed alignments with GBlocks (Talavera and Castresana, 2007), and provided preliminary maximum likelihood (ML) phylogeny based on the supermatrix with RAxML (Stamatakis, 2014).
Further ML phylogeny analysis was separately conducted with partition of homologues based on the Agalma pipeline output and without partition in RAxML v. 8.2.12 with 100 replicates, respectively. The substitution model was selected automatically by RAxML (-m PROTGAMMAAUTO).

Transcriptome-based phylogeneomics
The analysis included 22 echinoid species, and Eucidaris tribuloides was designated as the outgroup. In total, the Agalma pipeline yielded 9630 loci comprised 2,574,725 amino acid positions with an actual gene occupancy of 39.2%. The concatenated supermatrix dataset was then used in the transcriptome-based phylogenetic analysis. Both partitioned and unpartitioned dataset was used to infer the phylogeny, and the topology was identical. The phylogeny is shown in Figure 5. The result is congruent with  and Koch et al. (2020). The result revealed that the "cassuloids" Conolampas sigsbei is a sister group to the Scutelloida clade. Therefore, the monophyly of traditional clypeasteroids was not supported. As for the newly sequenced S. mai, the result suggests that S. mirabilis is its sister taxa in the sampled dataset, supporting previous studies drawn on morphological analysis.

Monophyly of clypeasteroids
The overall cladogram comparison between previous studies and the result from this work is shown in Figure 6. Mongiardino Koch et al. (2018) firstly used transcriptome-based phylogeny to challenge the morphology-based monophyly of clypeasteroids. In this study, the "cassiduloids" included in the analysis, Conolampas sigsbei, intertwined in the phylogeny relationship of Clypeasteroida and Scutelloida.
In our previous study (Lin et al., 2020), we sequenced the whole mitochondria genome of S. mai and inferred the phylogenetic relationship through the data. The availability of a mitogenome for S. mai sets the stage for exploring the systematic position of this taxon.
According to the mitogenome-based phylogeny, the clades of Echinolampadoida and Scutelloida both receive 100% bootstrap support (Lin et al., 2020). Consistent with the hypothesis raised by Mooi, this result suggests that the presence of Aristotle's lantern in adults was firstly lost in the basal Irregularia, then the re-appeared in the Scutelloida clade (Mooi, 1990a). However, due to the fact that mitogenome data from the Scutelloida clade was unavailable, and this work was unable to support nor refute the monophyly of clypeasteroids.
On the other hand, one analysis published recently tried reconstructing the phylogeny of Echinoidea using the total-evidence dated approach incorporating the fossil morphology evidence and transcriptomic data of extant species (Mongiardino Koch and Thompson, 2020).

< A c c e p t e d M a n u s c r i p t >
By combining the morphological fossil record and molecular data (transcriptome dataset was used for phylogenetic inference in this study), the monophyly of clypeasteroids was refuted again. Therefore, under this phylogenetic theory, the morphological characters shared between Clypeasteroida and Scutelloida (e.g., extremely flattened test and presence of Aristotle's lantern) result from convergent evolution rather than true synapomorphies (Mongiardino Koch and Thompson, 2020).
The transcriptome-based phylogeny analysis presented in this article was consistent with Mongiardino Koch and Thompson (2020), which did not support the monophyly of clypeasteroids. The reason for this inconsistency between mitogenome-and transcriptomebased phylogeny might be primarily due to the limited sampling. As the mitogenome-based analysis did not include Clypeaster spp. from Clypeasteroida, the possible polyphyly status of clypeasteroids cannot be revealed.

Future Perspective
Despite the fact the seemingly solid evidence provided in the transcriptome-based phylogeny, the only representative of "cassiduloids" clade was the extant species Conolampas sigsbei which was resolved as the sister group to Scutelloida. Therefore, one should be careful when interpreting this result, and further investigation sampling more species in the "cassiduloids" clade is still needed for a deeper understanding of the conundrum of the monophyly of clypeasteroids. Nevertheless, these recent works showcase the potential of using molecular data to aid phylogenetic research in the field.
On the other hand, although the role of Hox gene cluster rearrangement in establishing pentameric symmetry of Echinodermata has been questioned based on the due to several sequenced species from Crinoidea (sea lily), Asteroidea (starfish), and Holothuroidea (sea cucumber) clades do not present with rearranged clusters. Due to the lack of available data on irregular echinoids, it is still unclear if the structure of Hox gene cluster contributes to secondary bilateralism in the Irregularia. A further investigation of the high-quality genome assembly within this group will benefit the research of this topic.

CONCLUSION
The technology breakthroughs in the first ten years of the 21st century serve as an important foundation for further investigation in the field of genomics, and investigators of other specialties benefit significantly as well, including paleontologists. Both the traditional morphology-based paleontology and novel genetic methods have their own limitation. However, by combining these two paradigms, further investigation may shed light on the phylogenetic relationships, developmental processes, and the possible mechanism behind the secondary bilateral symmetry of clypeasteroids.  Fell (1966) and Mooi (1989). Figure 2. The schematic diagram for the temporal and spatial collinearity of the Hox gene cluster.

Figure legends
Data from Mooi and David (2008).      Sinaechinocyamus mai (Wang, 1984) Table 3. The list of species used in the transcriptome-based phylogenetic analysis in this study. The clade designation is based on Kroh (2020) and Horton et al. (2021).