www.sciencemag.org/cgi/content/full/341/6144/384/DC1

Supplementary Materials for

Identification of a Colonial Chordate Histocompatibility Gene

Ayelet Voskoboynik,* Aaron M. Newman,* Daniel M. Corey, Debashis Sahoo, Dmitry Pushkarev, Norma F. Neff, Benedetto Passarelli, Winston Koh, Katherine J. Ishizuka, Karla J. Palmeri, Ivan K. Dimov, Chen Keasar, H. Christina Fan, Gary L. Mantalas, Rahul Sinha, Lolita Penland, Stephen R. Quake,* Irving L. Weissman*

*Corresponding author. E-mail: [email protected] (A.V.), [email protected] (A.M.N.), [email protected] (S.R.Q.), [email protected] (I.L.W.)

Published 26 July 2013, Science 341, 384 (2013) DOI: 10.1126/science.1238036

This PDF file includes

Materials and Methods
Figs. S1 to S21
Tables S1 to S4 and S7 to S9
Legends for Tables S5 and S6
Full References

Other Supplementary Material for this manuscript includes the following: (available at www.sciencemag.org/cgi/content/full/341/6144/384/DC1)

Tables S5 and S6 as Excel files
Table S5. B. schlosseri gene sequences analyzed in this work and fosmid sequence.
Table S6. Details of analyzed B. schlosseri genes, including concordance with fusibility outcomes and genetically defined lines.
Movies S1 and S2

Materials and Methods

Genome and transcriptome sequencing

B. schlosseri can be mass-cultured, reared and crossed in the laboratory (10, 26). Tissue obtained from clone 356a, a wild-type first-generation colony that was hatched and raised in our mariculture facility, served as the source of gDNA for sequencing (13). Tissues from defined homozygous and heterozygous lines for the Fu/HC locus, and from wild-type colonies with known fusion/rejection relationships, served as the source of cDNA for transcriptome and Sanger sequencing (fig. S13; table S4).

To assemble this complicated genome, we developed a highly accurate method to sequence many large fragments (6-8 kb) in parallel (13). The Velvet assembler (27) was used to assemble short paired-end reads of gDNA that had been sheared to 6-8 kb fragments. The Celera WGA assembler (28) was then used to assemble the 6-8 kb amplicon fragments into larger contigs. Assembled contigs were further mapped onto 13 (of 16 haploid) chromosomes that were isolated by microfluidics and sequenced individually, as previously described (13, 29, 30). Chromosome reads were aligned to 356a genomic contigs greater than the N50 length, and 13 reference-guided individual chromosome scaffolds were produced (mean N50 of 38 kbp) (13).

RNA was isolated from individuals with both known and untested fusibility outcomes. Total RNA was extracted following the manufacturer’s instructions (Ambion; Purelink RNA mini kit) and purified using the Purelink DNase kit (Invitrogen). cDNA libraries for Illumina HiSeq and MiSeq were prepared using the Ovation RNA-Seq v1 system (Nugen), NEBNext DNA Master Mix for Illumina (New England Biolabs), and standard Illumina adapters and primers from IDT. RNA-Seq (2 x 100 bp, Illumina HiSeq, or 2 x 150 bp, Illumina MiSeq) was performed. Each genotype was sequenced separately.

In total, >100 Gb of raw transcriptome sequence data were generated for the 21 colonies (table S4) and aligned to reference mRNAs (below) using BWA 0.6.1 (31) with default parameters. Using Cufflinks (32) with default parameters, B. schlosseri cDNA reads were aligned to the draft assembly and a reference-guided transcript assembly was produced. To predict genes, we used the program Augustus v2.5.5 (33). The reference-guided transcript assembly was aligned to the draft genome assembly and a “Hints” gff file was generated to guide gene prediction. Augustus was run (using human HMM and parameters), and from 121,094 contigs, a total of 72,632 genes (including gene fragments) were predicted, including BHF.

To investigate expression of genes in particular tissues (fig. S17), we generated tissue-specific RNA-Seq libraries from polyA+ mRNA using Illumina GAII single-end 36 bp reads. B. schlosseri cDNA libraries were prepared and sequenced, yielding
13,363,720 reads from the vasculature (ampullae, blood vessels, and tunic; 1.3M mapped reads) and 13,315,481 reads from the endostyle (4M mapped reads). BWA 0.6.1 (31) was used to map all reads.

Mariculture and systemic breeding of B. schlosseri defined Fu/HC lines

Mariculture conditions for raising and crossing B. schlosseri in the laboratory, and techniques for assaying phenotypic histocompatibility and defined Fu/HC lines, have been described previously (10, 24). For analysis of fusion/rejection pairs, for every tested colony, one naive subclone was taken for total RNA purification and another subclone was used in a fusion/rejection assay. Our deep sequencing analysis included: (1) six F1 individuals, progeny of wild-type colonies collected in Monterey, CA or Santa Cruz, CA marinas: Sc32e, 2362c, 5326y, 5326x, 5326p (y, p and x are progeny from the same wild-type colonies), Sc109e, 5606b; (2) six colonies, progeny of defined crosses, bred in our mariculture facility: Hm9, 15, 4, 31, 944, 40, and Sc6a-b; (3) seven colonies, progeny of defined Fu/HC allele crosses, which include four AA colonies (AA 1, AA 2, AA 3, AA 4), an AB colony (AB), an AX colony (AX) and a BB colony (BB) (table S4; fig. S8).

Sanger sequencing of PCR products and primers used

RNA from naive subclones was isolated using an Ambion Purelink RNA minikit, and cDNA was prepared using the Protoscript AMV LongAmp Taq RT-PCR kit (NEB). cDNA was amplified with Illustra Hot Start Mix RTG (GE Healthcare) using previously described primers for sFuHC and mFuHC (12, 16). In an attempt to recover the entire predicted cFuHC gene, several combinations of primer pairs were used to amplify the template (table S1). PCR was performed using an MJ Research PTC-200 ThermoCycler as follows: initial denaturation at 95°C for 5 minutes, followed by 36 cycles of 95°C for 1 minute, 56°C for 1 minute, 72°C for 1 minute, followed by a final extension at 72°C for 30 minutes. Annealing temperature was adjusted according to the Tm of the primers used. Initial extension time was adjusted according to the primer pair and the length of the predicted product. Amplified products were sent to MCLAB (384 Oyster Point Rd, South San Francisco, CA) for cleanup and Sanger sequencing.

For BHF amplification and Sanger sequencing the following primers were used: BHF-Forward aggtcaccacgaagaggaaa, BHF-Reverse ttgccaagtagccttcatca (product length 639 bp) or BHFv2-Forward tcaccacgaagaggaaaccg, BHFv2-Reverse cctgttttgtacaagggccg (product length 718 bp). PCR was performed on the MJ Research PTC-200 thermal cycler as follows: initial denaturation at 95°C for 4 minutes, followed by 34 cycles of 95°C for 1 minute, 59°C for 1 minute, 72°C for 1 minute, followed by a final extension at 72°C for 20 minutes. Amplified products were run on an E-Gel EX 1% agarose gel (Invitrogen) to validate size, then sent to MCLAB for cleanup and sequencing.
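
The protocol above adjusts annealing temperature to primer Tm but does not state how Tm was estimated. As a rough illustration only, the sketch below applies the Wallace rule (2°C per A/T, 4°C per G/C), a coarse approximation for short oligonucleotides that is assumed here and is not taken from the source. The function name `wallace_tm` is hypothetical.

```python
def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = primer.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("primer must contain only A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Primer sequences as listed in the text above.
primers = {
    "BHF-Forward": "aggtcaccacgaagaggaaa",
    "BHF-Reverse": "ttgccaagtagccttcatca",
    "BHFv2-Forward": "tcaccacgaagaggaaaccg",
    "BHFv2-Reverse": "cctgttttgtacaagggccg",
}
for name, seq in primers.items():
    print(name, len(seq), wallace_tm(seq))
```

Nearest-neighbor thermodynamic models generally give better estimates than this rule, so the numbers printed here should be read as a starting point for choosing an annealing temperature, not as the values the authors used.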

Cell-direct qPCR

Tissues that participate in allorecognition (vasculature/tunic) were collected from colonies that were placed near each other (“challenged” just before they touched each other) and from naive unchallenged colonies (Fig. 3B). RT-qPCR reactions for the detection of BHF, sFuHC, mFuHC and β-Actin mRNA within B. schlosseri cells were performed by mixing 8.33 µL 2x Reaction Mix (CellsDirect, PN 46-7200, Invitrogen, USA), 0.33 µL SuperScript III RT Platinum Taq Mix (CellsDirect, PN 46-7200, Invitrogen, USA), 0.75 µL of 20x Evagreen (Biotium Inc., Hayward, CA, USA), 1.1 µL of DEPC-treated di-H2O (CellsDirect, PN 46-7200, Invitrogen, USA), 1.5 µL of 10 µM forward and reverse primers for each gene of interest and approximately 500 cells of the corresponding B. schlosseri tissue in 1.5 µL of PBS. The table below lists the primers used for each gene. For each condition, 5 µL of the reaction mix was loaded into a 384-well plate and thermocycled in a quantitative PCR machine (7900HT Fast Real time PCR System, Applied Biosystems, Warrington, UK). The thermocycling profile used was: 55°C for 40 min, 95°C for 6 min; 12 cycles of 95°C for 15 seconds, 65°C for 60 seconds; followed by 38 cycles of 95°C for 15 seconds, 60°C for 60 seconds.

Gene symbol    Primers used for each gene (F / R)
BHF            TCACCACGAAGAGGAAACCG / TTGCCAAGTAGCCTTCATCA
sFuHC          AACGATGAATGGGTTCGCGATTTTC / TACTTCAAGTCGACAGTTCCAATCAACGTA
mFuHC          GTATGGGACAACACAGGAAATTCTAC / GTGACGTTTTAGTCCATAGGATATCAG
β-Actin        CAAGAGATGCAAACCGCTGC / GATCTTCATCGTAGGCGGGG

BHF cloning

RNA from naive colonies was extracted using the Nucleospin RNA II kit (Macherey-Nagel), followed by cDNA preparation using the AMV First Strand Synthesis kit (NEB). RT-PCR was performed using primers designed specifically for the gene of interest (synthesized by Eurofins MWG Operon), BHF V2F, 5’-TCACCACGAAGAGGAAACCG-3’, and BHF V2R, 5’-CCTGTTTTGTACAAGGGCCG-3’, according to the protocol in the Protoscript AMV LongAmp Taq RT-PCR kit (NEB). PCR products were then cleaned using the Wizard SV Gel and PCR cleanup kit (Promega). Transformations were performed following the T-Easy Vector System (Promega) protocol, then plated on blue/white selection plates and incubated overnight at 37°C. Positive colonies from each BHF clone were selected for minipreps and grown overnight in LB + ampicillin at 37°C. DNA was prepared using the Nucleospin Plasmid Kit (Macherey-Nagel), and cut with NotI to confirm the presence of
vector and insert. Miniprep DNA was then sent to MCLAB (384 Oyster Point Rd, South San Francisco, CA) for Sanger sequencing using SP6, M13 and BHF V2 primers.

Whole-mount in situ hybridization

Whole-mount in situ hybridization was performed according to (34). B. schlosseri colonies (<1 month old, 1-2 zooids per colony) were fixed overnight in 4% paraformaldehyde at 4°C, gradually dehydrated in ethanol, and treated with Proteinase K (5 µg/ml) for 20 min at 37°C. DIG-labeled RNA sense and antisense probes were generated using the DIG RNA labeling kit (SP6/T7, Roche) from pooled zooid cDNA amplified with the following primer pair: 5’–AGGTCACCACGAAGAGGAAA–3’ and 5’–TTGCCAAGTAGCCTTCATCA–3’ (639 bp). Samples were observed and photographed using a Keyence BZ-9000 microscope.

Morpholino oligonucleotide (MO) experiments

To study BHF function by MO-mediated knockdown, we followed “best practice” recommendations (35) and used both translation-blocking and splice-inhibiting MOs in order to minimize off-target effects. The following two MO formulations were designed with the assistance of Gene Tools, LLC:

1) A 25 bp splice-inhibiting Vivo-Morpholino oligonucleotide (Gene Tools) 5’–AATCCAAATGCTCTACTTACCGTGT–3’ was designed to target the first exon-intron junction of BHF in order to cause intron 1 retention and nonsense-mediated decay (due to the presence of in-frame premature stop codons).

2) A 25 bp translation-blocking Vivo-Morpholino oligonucleotide (Gene Tools) 5’–CACCATCTTGCTTGTCAGTAACGGA–3’ was designed to be complementary to the 5’ UTR and translational start site of BHF.

Importantly, both MOs have low similarity (<65% identity) to non-BHF sequences in the B. schlosseri genome (best off-target match for translation-blocking MO = 15/25 bp, and for splice-inhibiting MO = 16/25 bp). For all experiments, we injected standard control Vivo-MO (Gene Tools) 5’–CCTCCTACCTCAGTTCCAATTTATA–3’ at the same concentration as BHF anti-sense MOs. Experimental endpoints were based on the timing of fusion/rejection events in control colonies.

Splice-inhibiting morpholinos: A 1 mM stock solution of BHF anti-sense MO in phosphate buffer was loaded into a pre-stretched glass needle micropipette (50-60 µm diameter sharp tip) and injected into the ampullae or endostyle sinuses of 3-4 zooids using an air-compressed microinjector (PLI-188 Narishige, Japan). To calibrate the MO dosing regimen, we injected 1.5 or 3 µl of stock solution into subclones of a lab-reared colony (progeny from a defined cross bred in our mariculture; genotype [5139jL11HMBYSc6ab35]-15; 5-8 zooids each; 2 MO controls and 4 BHF MO). Samples were taken 24, 48, and 72 hours following MO injections, and transcript levels
were assessed by semi-quantitative PCR. Under splice-inhibiting MO treatment, reduced expression of BHF mRNA was evident, and a maximum knockdown of ~66% was recorded at 48 hours (quantified using the integrated density metric in ImageJ) (fig. S20). Transcript levels returned to baseline by 72 hours (fig. S20). Based on this preliminary experiment, we injected ~3 µl of stock solution on days 0 and +2 (table S9). For each experiment, subclones from colony [5139jL11HMBYSc6ab35]-15, or from colony 5670a (progeny of a wild-type colony hatched and raised in our mariculture), were injected with BHF anti-sense MO or control. Following the first injection, a colony allorecognition assay (CAA) was arranged between injected colonies and 3-week-old untreated colonies (1 zooid each) hatched in our mariculture from two wild-type colonies, 5680 (siblings a-f) and 5681 (siblings a and b) (table S9; fig. S21). Following each injection (days 0 and +2), colonies were also immersed in 4 µl BHF MO/control solution for 30 minutes. Fusibility outcomes were monitored for 4 days following treatment (table S9).

Translation-blocking morpholinos: A 0.5 mM stock solution of BHF anti-sense morpholino in phosphate buffer was loaded into a pre-stretched glass capillary tube and injected intravascularly by means of a single ampulla into a firmly attached colony using an air-compressed microinjector (PLI-188 Narishige, Japan). To calibrate the dosing regimen, 1 or 2 µl of stock solution was injected per colony, and BHF transcript perturbation was confirmed using semi-quantitative PCR. We injected ~1 µl of 0.5 mM stock solution into a colony of approximately 8-10 zooids on day 0 (table S8; fig. S19). For each of three experiments, genetically identical systems taken from a lab-reared colony (progeny from a defined cross bred in our mariculture; genotype [5139jL11HMBYSc6ab35]-15) were injected with BHF anti-sense MO or control (table S8). Following the first injection, colonies were allowed to recover for 24-48 hours and then arranged in a colony allorecognition assay to allow colonies to come into physical contact (table S8). In addition, we administered a second MO injection (BHF-targeted or control) on days +6 and +2 for experimental sets II and III (table S8), respectively, depending on the extent of ampullae contact between colonies. For all experiments, fusibility outcomes were monitored for a minimum of 5 days.

Haplotype phasing

We developed a heuristic algorithm to haplotype paired-end RNA-Seq data mapped to coding sequences from the draft assembly. As input, the algorithm requires a list of single-base variants and a SAM/BAM file consisting of all single or paired-end mapped reads. We employed VarScan 2.2.7 (36) for variant detection, though any suitable approach could be used. Inferred variants satisfying the following criteria were used in our pipeline: (i) minimum depth of 4x, (ii) a p-value < 1.0 x 10^-6 or a frequency of 25-75% and
p-value < 0.05. A minimum depth of 3x was required for homozygous bases.

Our phasing method starts by prioritizing physically linked variant pairs for haplotype assembly. Let v_i be a variant v (i.e., a SNP base) in the haplotype sequence located at reference sequence position i, and let v'_i denote the other variant base at position i. For each gene, every pair of variants (v_i, v_j) with evidence for physical linkage (i.e., located on the same read or same paired-end read) is catalogued, and the number of reads f(v_i, v_j) supporting each unique pair is determined. A Chi-square test is then used to test the null hypothesis that v_i is not linked to v_j, or more specifically, that f(v_i, v_j) = f(v_i, v'_j). An alpha level of 0.05 was employed to reject the null hypothesis. Variant pairs are then ranked, traversed, and clustered in order of increasing p-value and decreasing frequency.

Because this greedy approach treats all variant pairs independently, it will not exclude individual variants with, for example, one or two significant (p < 0.05) linkage pairs and many insignificant (p ≥ 0.05) linkage pairs. Such variants, which we term “bad hubs”, may arise from sequencing errors, genomic repeats, etc., and should ideally be identified and removed prior to clustering. To address this, we record all pairs P of sequence indices (i, j) where both variants at position i are significantly linked to the same variant at position j (α = 0.05, Chi-square test). Bad hubs are then determined by finding the minimum set of distinct sequence indices M, such that at least one index per pair in P is present in M. All variants corresponding to indices in M, or exclusively present in variant pairs with insufficient statistical support (i.e., p ≥ 0.05), are excluded from phasing and output as individual alleles.
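
The linkage test and the bad-hub step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' Java implementation: the null hypothesis f(v_i, v_j) = f(v_i, v'_j) is treated as a one-degree-of-freedom goodness-of-fit test on two read counts, and the minimum index set M is found by brute force, which is only practical for small P. The function names are hypothetical.

```python
import math
from itertools import combinations


def linkage_p_value(n_linked, n_alt):
    """Chi-square (1 d.f.) p-value for H0: the two read counts are equal."""
    total = n_linked + n_alt
    if total == 0:
        return 1.0
    stat = (n_linked - n_alt) ** 2 / total
    # Survival function of a chi-square variable with one degree of freedom.
    return math.erfc(math.sqrt(stat / 2.0))


def minimum_cover(pairs):
    """Smallest set of indices containing at least one index of every pair."""
    indices = sorted({i for pair in pairs for i in pair})
    for size in range(len(indices) + 1):
        for subset in combinations(indices, size):
            chosen = set(subset)
            if all(i in chosen or j in chosen for i, j in pairs):
                return chosen
    return set()


# 20 reads support (v_i, v_j) and 2 support (v_i, v'_j): linkage is significant.
print(linkage_p_value(20, 2) < 0.05)
# Position 10 conflicts with three other positions, so it alone covers P.
print(minimum_cover([(10, 42), (10, 57), (10, 88)]))
```

Finding a minimum covering set is the minimum vertex cover problem, so a production implementation on large pair sets would likely need a heuristic or approximation in place of the exhaustive search used here.
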
Finally, because each allele may have different coverage patterns, variant clusters are sorted by decreasing size, and the minimum number of clusters C covering the maximum number of non-overlapping variants is determined. Phased alleles present in C, along with the homologous set of phased alleles, are written to output. Software and Java code are available upon request.

Benchmark simulation: To verify the correctness of our haplotype phasing method, we generated random 1Mbp sequences, and randomly embedded “ground-truth” heterozygous alleles at different frequencies while simultaneously introducing sequencing errors. RNA-Seq data were emulated by sampling fifty 200 bp reads from each of the two haplotypes, equivalent to ~20x physical coverage. To model the effect of systematic sequencing bias and noise on haplotyping accuracy, we uniformly sampled reads from the first haplotype and, for each read index i, employed a Gaussian distribution to extract a companion read j [read index i +/- Normal(0, 10 bp)] from the second haplotype (fig. S5A). We employed two parameters in our benchmark analysis: heterozygosity rate, defined as the probability of a heterozygous position in the gene sequence, and sequencing error rate, defined as the probability of random base errors introduced into the read pool. We generated 10,000 random genes, each with a different set of values for these two parameters drawn from the range of 0 to 0.1 in 0.001 increments, and evaluated the outcome with respect to three metrics: (i) phasing error =
% incorrectly phased alleles, (ii) mean length of a haplotype block, and (iii) % of variants retained for phasing. Within this range, our phasing algorithm exhibited robust performance, with zero phasing errors, despite sequencing error rates up to 10% (fig. S5B, left panel). Moreover, despite the presence of increasing sequencing errors, consistent performance was observed with regard to mean haplotype block length (fig. S5C, left panel) and % variants retained (fig. S5D, left panel), demonstrating robust error tolerance. We also performed this analysis in the range of 0 to 1, in 0.01 increments, to evaluate the phasing algorithm in the most extreme scenarios. As illustrated in the right panel of fig. S5B, there are virtually no phasing errors below a sequencing error rate of ~40%. As the intrinsic error rate in HiSeq RNA-Seq data is apparently far below this threshold (i.e., ~1%), these results indicate that our phasing algorithm will assemble accurate haplotype blocks across a broad spectrum of polymorphic sequences at moderate coverage levels (right panels of fig. S5C, D), with haplotype lengths expected to improve with higher depths and/or longer reads, such as paired-end reads, which we did not explicitly model here.

Inter-allele comparison and joining of allelic blocks

B. schlosseri colonies must share at least one pair of Fu/HC alleles to initiate a fusion reaction upon contact. To computationally identify fusibility candidates, it is therefore necessary to delineate the best possible allelic concordance between colony pairs, for each gene w. In the case of homozygous alleles, the number of mismatches m between colonies is trivially determined. Likewise, when one colony is heterozygous for gene w, it is straightforward to identify the two best matching (smallest number of mismatches) alleles.
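
The two simple cases just described (both colonies homozygous, or one colony heterozygous with fully resolved alleles) reduce to counting character mismatches between aligned allele sequences. The following sketch assumes alleles are given as equal-length strings; the function names and example sequences are hypothetical.

```python
def mismatches(a, b):
    """Count positions at which two equal-length allele sequences differ."""
    if len(a) != len(b):
        raise ValueError("alleles must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def best_allele_pair(colony1, colony2):
    """Return (mismatch count, allele 1, allele 2) for the closest allele pair."""
    candidates = (
        (mismatches(a, b), a, b) for a in colony1 for b in colony2
    )
    return min(candidates, key=lambda item: item[0])


# A homozygous colony (one allele) against a heterozygous colony (two alleles).
homozygous = ["ACGTAC"]
heterozygous = ["ACGTAA", "TCGAAA"]
print(best_allele_pair(homozygous, heterozygous))
```
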
However, due to the limited amount of RNA-Seq coverage and paired-end read lengths, most heterozygous genes can only be partially resolved, resulting in discontinuous haplotype blocks (fig. S4C). Furthermore, when colonies share common heterozygous sequence positions in a gene with discontinuous haplotype blocks, a combinatorial expansion of possible allelic combinations may result. Therefore, we devised an algorithm to efficiently determine a minimum mismatch path between genes with phased, discontinuous alleles, with the assumption that the allelic blocks with minimum distance between two fusing colonies are likely to represent the actual contiguous haplotypes for the B. schlosseri allorecognition determinant (fig. S4D).

We implemented an efficient dynamic programming routine, which breaks the problem of finding a minimum mismatch path into a series of sub-problems that, when solved in an iterative manner, will produce a globally optimal solution. The overall approach is schematically described in fig. S7 using an illustrative sequence (fig. S7A). To partition all possible tiling paths into trivial sub-problems, the algorithm traverses through all double heterozygous haplotype blocks, if any, in order of increasing sequence index, and compares all four possible haplotype combinations, recording the number of mismatches
between each pair (fig. S7B). Because haplotype blocks contain more than one variant by definition, they often occur in different lengths between colonies (fig. S7B). Thus, for a given haplotype pair, the longest possible sequence stretch of each haplotype will be scanned for mismatches, whether between each other, or between a haplotype and the corresponding homozygous sequence from the other colony. This will continue until either (i) the algorithm traverses past both haplotypes and encounters a double homozygous sequence index, or (ii) a conflict occurs, defined as the first variant position of a downstream double heterozygous haplotype block (fig. S7C). In either case, the process will repeat at the next double heterozygous haplotype block. This time, a mismatch score is calculated for each of the four haplotype pairs, first by enumerating all mismatches as before, and then by adding the number of mismatches from the haplotype pair from the previous iteration with minimum mismatches and no phasing conflict (the haplotype pair from the previous iteration with minimal score must also be consistent in phase with the current haplotype pair being interrogated; such a problem can arise, for example, when a haplotype block has more than one double heterozygote position). As a result, between the first and current sequence indices, the best four haplotype possibilities will have been identified (fig. S7C). Once all double heterozygote positions have been traversed in order, the best scoring contiguous haplotypes are identified and saved. Because any remaining variant positions or haplotype blocks will only occur in one of the two colonies, those with minimal mismatches are easily identified (fig. S7D), thereby completing the joining of discontinuous haplotype blocks into gene-wide haplotypes.

Re-phasing: As illustrated in Fig. S4E, for a given gene w, after performing the above procedure between a given colony AB and all other colonies, more than one “best” allelic phasing can result. Thus, the final step for inter-allele comparison involves identifying the most recurrent haplotype blocks for w for each colony. This is accomplished by first constructing a simple graph where nodes are haplotype blocks and edges are inter-haplotype block associations. Node pairs are then traversed in order of decreasing numbers of connecting edges. After the node pair with the most edges (i.e., most recurrent pair of haplotype blocks) is identified, all competing haplotype possibilities are pruned from the graph. The process is repeated until all haplotype blocks and isolated variants have been completely connected. After re-phasing of haplotypes is completed for gene w for each colony, final inter-colony allelic distances are computed for w and passed to the fusion/rejection classifier, described below.

Fusion/rejection classifier

For completeness, we evaluated putative Fu/HC genes using a permissive approach that utilizes known fusion/rejection outcomes to identify the best mismatch threshold along a continuum. Our method, illustrated in Fig. S4H, works as follows: All inter-colony distances for gene w are ranked in ascending order. Each inter-colony distance d is
subsequently assessed for its ability to stratify colony pairs into accurate fusion and rejection groups (<d and ≥d, respectively). For each d, we calculate the true positive rate (TPR), defined as the number of true positives (known fusion pairs in the fusion group) divided by the sum of true positives and false negatives (known fusion pairs in the rejection group), and the false positive rate (FPR), defined as the number of false positives (known rejections in the fusion group) divided by the sum of false positives and true negatives (known rejections in the rejection group). To maximize sensitivity (i.e., TPR) and specificity (i.e., 1 – FPR), the best threshold is identified as the closest two-dimensional point (TPR, FPR) in Euclidean space to a perfect classifier (1, 0). The corresponding Euclidean distance e represents the error associated with each gene’s performance. All computer code needed to run the fusibility gene identification pipeline is available upon request.

Author Contributions: Conception and design of experiments, data analysis, and writing of manuscript, A.V., A.M.N., S.R.Q. and I.L.W.; genome project, A.V., N.F.N., D.S., A.M.N., D.P., W.K., B.P., H.C.F., G.L.M., K.J.P., K.J.I., D.M.C., L.P., I.L.W. and S.R.Q.; development of haplotype phasing and fusibility prediction analytical tools, and statistical analyses, A.M.N.; animal husbandry, genetic crosses, genotyping of defined lines, RNA-Seq library preparation, BHF cloning, allorecognition assays, PCR and morpholino experiments, K.J.I., K.J.P. and A.V.; BHF in situ hybridization and morpholino experiments, D.M.C.; library preparation, R.S.; cell-direct qPCR, I.K.D.; analysis of BHF protein structure, C.K.; A.V. and A.M.N. contributed equally to this work.

Fig. S1. Fusion/rejection between B. schlosseri colonies is governed by the Fu/HC locus. Colonies that share an allele will fuse (left panel), whereas colonies that do not will reject (right panel). Major anatomical features are indicated (amp=ampullae).

[Figure panel labels: fusion; point of rejection; zooid; amp; bud; amp]

Fig. S2. sFuHC and mFuHC amino acid polymorphisms. The number of unique residues per sequence position is plotted for sFuHC and mFuHC using RNA-Seq data from all 17 colonies in the exploratory cohort (table S4).

Fig. S3. Identification of a B. schlosseri histocompatibility haplotype and allorecognition initiation factor. The basic computational workflow we used to interrogate known and predicted B. schlosseri coding sequences for candidate Fu/HC genes. We used BWA (31) for mapping paired-end reads (default parameters), and VarScan (36) for variant prediction. See Fig. S4 for an anecdotal overview of the remaining steps of the workflow. ORFs, open reading frames.

Fig. S4. Computational pipeline for identification of candidate Fu/HC loci, illustrated with a hypothetical gene. A, Reads from colony AB (heterozygous for Fu/HC) are mapped to gene w. Sequence variants are indicated by yellow squares. B, Aligned reads are “phased” into clusters based on co-occurrence of variants. C, Phased reads are collapsed into potential maternal and paternal alleles. D, Predicted alleles of w from colony AB are compared with colony AX to identify the minimum inter-colony distance for w, defined as the set of alleles exhibiting the fewest character mismatches between colonies (i.e., Allele 1’ and Allele 2’’). E, The previous step is repeated for all other colonies known to fuse with colony AB. F, Allelic blocks that co-occur most frequently are joined into contiguous haplotypes. These correspond to our best estimate, given the data, of the two complete alleles of w from colony AB, if w is a fusibility gene. G, Repeat steps (D) to (F) for all colony pairs with known fusibility outcomes. Inter-colony allelic distances are then determined and recorded using joined alleles. H, Inter-colony distances are normalized by the number of characters compared, yielding a mismatch frequency threshold. This threshold is optimized using known fusion/rejection outcomes to determine a final fusibility score for gene w, represented as a classification error.
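
The threshold optimization in panel H can be sketched as below, following the description in Materials and Methods: each observed distance d is tried as a cutoff (pairs with distance < d are predicted to fuse), and the cutoff whose (TPR, FPR) point lies closest to the perfect classifier (1, 0) is kept. This is an illustrative sketch only; the function name, input layout and example values are hypothetical and are not taken from the authors' pipeline.

```python
import math


def best_threshold(distances, fuses):
    """Return (threshold d, classification error e) for one gene.

    distances: normalized inter-colony distances, one per colony pair.
    fuses: True where the pair is known to fuse, False where it rejects.
    """
    n_fuse = sum(fuses)
    n_reject = len(fuses) - n_fuse
    best = None
    for d in sorted(set(distances)):
        # Pairs with distance < d form the predicted fusion group.
        tp = sum(1 for x, f in zip(distances, fuses) if f and x < d)
        fp = sum(1 for x, f in zip(distances, fuses) if not f and x < d)
        tpr = tp / n_fuse if n_fuse else 0.0
        fpr = fp / n_reject if n_reject else 0.0
        err = math.hypot(1.0 - tpr, fpr)  # distance to the point (1, 0)
        if best is None or err < best[1]:
            best = (d, err)
    return best


distances = [0.0, 0.01, 0.02, 0.10, 0.12]
fuses = [True, True, True, False, False]
print(best_threshold(distances, fuses))
```

A gene that separates fusing from rejecting pairs perfectly yields an error of zero at some cutoff, while a gene unrelated to fusibility yields an error closer to that of a random classifier.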


Fig. S5. Benchmarking of haplotype phasing method via sequence simulation. A, Representative coverage plots for a simulated gene with maternal and paternal haplotypes. The mean coverage for the gene is 20x. B-D, Results from applying our phasing algorithm to simulated haplotypes under different heterozygosity rates and sequencing error rates. B, Phasing error, defined as the error associated with calling correct haplotypes. C, Mean length of phased haplotypes per simulated gene. D, Fraction of known variants retained in each phased gene. White points denote genes that could not be phased due to insufficient variant abundance and/or high error rate.


Fig. S6. Haplotype phasing of sFuHC is consistent with Sanger-sequencing. Maximum parsimony tree showing that PCR-amplified haplotypes validated by Sanger sequencing are comparable to haplotypes predicted from RNA-Seq data using our phasing approach, with minimum depths of 3x and 4x for calling homozygous and heterozygous bases, respectively. These depths were used for all fusibility analyses. Lower depths yielded weaker concordance (data not shown). sFuHC sequences from colonies with AB or BB Fu/HC genotypes were used to build the dendrogram with MEGA5 (37).
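The depth cutoffs in this legend can be read as a simple per-site calling rule. The sketch below is one possible interpretation, not the procedure used in this work. It assumes that "depth" means the number of reads covering a site, and the minor-allele fraction `MIN_MINOR_FRACTION` and the function name `call_site` are hypothetical.

```python
from collections import Counter

MIN_DEPTH_HOM = 3          # minimum depth for a homozygous call (from the legend)
MIN_DEPTH_HET = 4          # minimum depth for a heterozygous call (from the legend)
MIN_MINOR_FRACTION = 0.25  # hypothetical cutoff for accepting a second allele


def call_site(bases):
    """Call one site from the string of bases observed in the covering reads.

    Returns a frozenset with one base (homozygous), two bases (heterozygous),
    or None when the depth is too low for the call that the data suggest.
    """
    counts = Counter(bases)
    depth = sum(counts.values())
    if depth < MIN_DEPTH_HOM:
        return None
    ranked = counts.most_common(2)
    has_minor = len(ranked) == 2 and ranked[1][1] / depth >= MIN_MINOR_FRACTION
    if has_minor:
        if depth < MIN_DEPTH_HET:
            return None  # looks heterozygous, but too shallow to call
        return frozenset({ranked[0][0], ranked[1][0]})
    return frozenset({ranked[0][0]})
```

Under these assumptions, three identical bases give a homozygous call, two bases of each kind give a heterozygous call, and a site with two reads gives no call.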


Fig. S7. Inter-allele comparison and joining of haplotype blocks. Schematic illustrating the algorithm used to identify the best allelic series between two colonies with haplotype fragments and shared variant positions. A, Read fragments from two colonies are aligned to gene w. Variant positions exclusive to one colony or shared by both colonies are indicated. B, Reads with heterozygous positions shared by both colonies are subdivided, and mismatches are enumerated incrementally for each of four haplotype pairs (Colony 1 allele 1 versus Colony 2 allele 1, Colony 1 allele 1 versus Colony 2 allele 2, Colony 1 allele 2 versus Colony 2 allele 1, and Colony 1 allele 2 versus Colony 2 allele 2). C, The lowest mismatch path is identified. D, Any remaining mismatches are counted (i.e., between homozygous regions and between regions with variant positions in only one colony).
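The four-way comparison in panel B can be sketched in a simplified form that compares whole alleles rather than incremental haplotype blocks. It assumes that alleles are equal-length strings and that positions without coverage are marked with a placeholder character. The names `MISSING`, `mismatches`, and `min_intercolony_distance` are illustrative, not taken from this work.

```python
from itertools import product

MISSING = "-"  # assumed placeholder for positions without sequence coverage


def mismatches(a, b):
    """Return (mismatch count, characters compared), skipping missing sites."""
    compared = diff = 0
    for x, y in zip(a, b):
        if x == MISSING or y == MISSING:
            continue
        compared += 1
        diff += x != y
    return diff, compared


def min_intercolony_distance(colony1, colony2):
    """Compare every allele pair and keep the one with the fewest mismatches.

    Each colony is a pair of allele strings. The result is
    (mismatches, characters compared, allele index in colony 1,
    allele index in colony 2), with 1-based allele indices.
    """
    best = None
    for (i, a), (j, b) in product(enumerate(colony1, 1), enumerate(colony2, 1)):
        diff, compared = mismatches(a, b)
        if best is None or diff < best[0]:
            best = (diff, compared, i, j)
    return best
```

For example, `min_intercolony_distance(("ACGTAC", "ACGTTC"), ("ACCTAC", "ACGTT-"))` selects allele 2 of each colony, with zero mismatches over five compared characters. Dividing the mismatch count by the characters compared gives a mismatch frequency of the kind described for Fig. S4H.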


Fig. S8. Pedigree of defined and wild-type Fu/HC lines, and colonies sequenced. All crosses made by our lab over the last 2.5 decades to precisely identify the Fu/HC locus (approximate years are indicated). The pedigree includes homozygous (AA or BB) and heterozygous (AB or AX) breeding lines that were developed in our mariculture facility, as well as wild-type colonies. Colonies sequenced in this work are depicted as triangles.

[Figure S8 image: pedigree on a timeline axis marked 1985, 1990, 1995, 2000-2010. Legend entries: AA, BB, AB, A_, ?, WT, Seq., Lab, 5x. Legend title: Colony Fu/HC genotype.]


Fig. S9. Fusibility classification errors of sFuHC and mFuHC, related to Fig. 2A. The optimal mismatch frequency threshold for sFuHC is >2.11 (TPR=1.0, FPR=0.14), as indicated by the dotted line. For mFuHC, the best mismatch threshold is attained at >1.77 (TPR=0.82, FPR=0.43). Colony pairs in pink represent defined lines homozygous for the A allele; such pairs should have no mismatches.


Fig. S10. BHF is significantly more highly expressed than either sFuHC or mFuHC. A, Expression is plotted for BHF, sFuHC, and mFuHC using transcriptome data from all 21 colonies sequenced by RNA-Seq (table S4). BHF is significantly more highly expressed than either sFuHC or mFuHC (Mann-Whitney test; two-tailed p-value < 0.0001). Values are presented as mean expression +/- SEM. RPKM, reads per kb of exon model per million mapped reads. B, Representative RNA-Seq coverage plots for each gene (colony 944).

[Figure S10 plots. Panel A: RPKM (log2 scale, axis ticks from 8 to 8192) by gene (BHF, sFuHC, mFuHC). Panel B: log10 coverage versus length (bp) for BHF, sFuHC, and mFuHC.]
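The RPKM unit used in this figure follows directly from its definition in the legend (reads per kb of exon model per million mapped reads). The function below is a minimal sketch of that arithmetic. The function name and the example numbers are illustrative and are not values from this work.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads."""
    kilobases = exon_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return read_count / kilobases / millions


# Illustrative values only: 500 reads on a 2,000 bp transcript
# in a library of 10 million mapped reads.
example = rpkm(500, 2_000, 10_000_000)
```

Here `example` is 25.0, because 500 reads over 2 kb give 250 reads per kb, which is then divided by 10 million-read units.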


Fig. S11. BHF protein sequence analysis. The lack of any structurally or functionally characterized homologs of BHF allows only a qualitative analysis of its sequence. The upper panel highlights major characteristics of the sequence, while the lower panel summarizes the outcomes of several independent servers for disorder prediction. Blue, red, and green bars indicate positive (lysine, arginine, and histidine), negative (aspartic and glutamic acids), and polar (asparagine, glutamine, and serine) residues, respectively. Black triangles indicate glycine or proline residues, and the orange stars indicate cysteine residues. Overall, the protein has a net charge of +21. The relative scarcity of core-forming hydrophobic residues, as well as the structurally destabilizing effect of proline, glycine, and high net charge, suggests that the protein is at least partially unstructured. To quantify this hypothesis, we submitted the BHF sequence to the DisMeta meta-server (38), which provides a consensus disorder prediction based on several independent methods. The lower panel summarizes these predictions, indicating that more than half of the protein (residues scoring above four) is disordered.
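The net charge quoted above is a residue count, which can be sketched as follows. The residue groupings follow the legend (lysine, arginine, and histidine as positive; aspartic and glutamic acids as negative). Counting every histidine as +1 is a simplification, and the function names and the example peptide are hypothetical. The BHF sequence itself is not reproduced here.

```python
POSITIVE = set("KRH")  # lysine, arginine, histidine (grouping from the legend)
NEGATIVE = set("DE")   # aspartic acid, glutamic acid


def net_charge(sequence):
    """Count positive minus negative residues in a one-letter protein sequence."""
    seq = sequence.upper()
    return sum(aa in POSITIVE for aa in seq) - sum(aa in NEGATIVE for aa in seq)


def residue_fraction(sequence, residues):
    """Fraction of the sequence made up of the given residues (e.g. 'GP')."""
    seq = sequence.upper()
    return sum(aa in set(residues) for aa in seq) / len(seq)
```

For the made-up peptide `"MKRHDEGP"`, `net_charge` returns 1 (three positive, two negative) and `residue_fraction(..., "GP")` returns 0.25.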


Fig. S12. BHF has homologs in other tunicate species, and has high sequence identity among colonial tunicates. A multiple sequence alignment of BHF and its homologs in solitary and colonial tunicates (left), and sequence similarity tree comparing BHF with its homologs (right). By BLAST analysis, BHF is distantly homologous to putative protein-encoding genes in three solitary tunicate species, two of which were predicted from NCBI ESTs: Molgula tectiformis [gi 67794433; tblastx e-value = 3e-10] and Halocynthia roretzi [gi 117783193 (tblastx e-value = 2e-07), which was combined due to sequence overlap with gi 117750037 (tblastx e-value = 0.001)]. The third putative gene is annotated as a ‘hypothetical protein’ in NCBI: Ciona intestinalis [gi 198417896; tblastx e-value = 2e-04]. No other homologous sequences were found in NCBI. BHF homologs in the alignment from two colonial species (i.e., Botrylloides spp. and Diplosoma spp.) were amplified using BHF-specific primers (15) and Sanger-sequenced (ambiguous/low quality subsequences were truncated from either side). These two sequences thus represent incomplete gene products. The multiple sequence alignment was created using MUSCLE (39) and rendered with JalView (40, 41). For each alignment column, the majority character, if any, is colored. All gaps were introduced to optimize the multiple sequence alignment. ‘X’ denotes missing sequence content. The similarity tree was constructed using the average % identity method in JalView (40, 41).

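[Figure S12 image: multiple sequence alignment (left) and BHF homology tree (right). Tree labels: Halocynthia roretzi, Molgula tectiformis, Ciona intestinalis, Diplosoma spp., Botryllus schlosseri, Botrylloides spp. Each species is annotated by lifestyle (colonial or solitary).]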


Fig. S13. Detailed pedigree of all colonies analyzed in this work. Same as Fig. S8, but with BHF genotypes derived for all 23 colonies sequenced, including exploratory and validation cohorts analyzed by RNA-Seq (table S4), and two Sanger-sequenced colonies with the AA genotype (AA 5 and AA 6). Abbreviated colony names are indicated beneath sequenced genotypes (table S4).

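[Figure S13 image: pedigree on a timeline axis marked 1985, 1990, 1995, 2000-2010, with the legend entries AA, BB, AB, A_, ?, WT, Seq., Lab (legend title: Colony Fu/HC genotype). BHF genotype labels in the image include AJ, AD, AB, BI, D_, DE, F_, I_, AG, AH, AI, BC, BD, CF, AF, AA, and BB. Abbreviated colony names in the image: Hm9, Sc6ab, 15, 31, 40, 4, 944, AX, Y, P, Sc109e, X, 2362c, 5606b, Sc32e, AA 1, AA 2, AA 3, AA 4, AA 5, AA 6.]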


Fig. S14. All BHF alleles identified among 23 B. schlosseri colonies. A, Phylogenetic tree depicting sequence relationships among all identified BHF alleles from 23 colonies (Fig. 3A, table S4 and fig. S13). Sequence evidence for each allele in the tree is shown as a heat map, and includes RNA-Seq (table S4), Sanger-sequencing of PCR amplicons, and cloned alleles. B, Representative sequences for all identified alleles (longest sequence shown). Gaps indicate missing data (i.e., insufficient RNA-Seq coverage) or predicted variants discarded by the phasing algorithm due to high error rate. Importantly, instances of the latter fall within sequence regions with very low RNA-Seq coverage. The tree and allelic sequence schematic were generated with JalView (40, 41).

[Figure S14 image. Panel A: phylogenetic tree of BHF alleles with a heat map of sequence evidence (columns: Sanger, Cloned, RNA-Seq). Panel B: allelic sequence schematic. Tree labels, in extracted order: 2362c (G allele), AX (F allele), X (F allele), P (D allele), Sc32e (A allele), 5606b (H allele), Hm9 (J allele), 40 (D allele), 2362c (A allele), Sc6ab (E allele), Sc6ab (D allele), Hm9 (A allele), Y (C allele), X (C allele), Sc32e (I allele), 4 (I allele), P (B allele), Sc109e (D allele), 40 (A allele), Y (B allele), 944 (D allele), 31 (D allele), 5606b (A allele), Sc109e (B allele), 15 (A allele), 15 (B allele), 944 (A allele), 4 (B allele), 31 (A allele), AB (B allele), BB (B allele), AX (A allele), AB (A allele), AA6 (A allele), AA5 (A allele), AA4 (A allele), AA3 (A allele), AA2 (A allele), AA1 (A allele).]


Fig. S15. Fusibility results for BHF among all 21 colonies sequenced by RNA-Seq. Data shown represent results from analyzing canonical transcripts (nucleotide level with all exons), and are identical in format to Fig. 2B. Fusion/rejection pairs analyzed during the validation phase are indicated in green (table S4). Of all fusion pairs tested, we found only 1 pair (15 vs. 4) with mismatches at the nucleotide level (both mismatches are silent at the protein level).

[Figure S15 table. Legend in the original figure: fusibility outcomes (fusion, rejection) and validation pairs; these color markings are not preserved here.]

Mismatch Freq. (%)  No. of Mismatches  Bases Compared  Colony 1  Colony 2

Pairs with no mismatches:
0                   0                  679             AB        AA 2
0                   0                  347             AB        AX
0                   0                  392             AB        BB
0                   0                  677             AB        AA 3
0                   0                  341             AB        AA 1
0                   0                  677             AB        AA 4
0                   0                  468             15        Sc109e
0                   0                  468             15        944
0                   0                  467             15        31
0                   0                  347             AA 2      AX
0                   0                  752             AA 2      AA 3
0                   0                  341             AA 2      AA 1
0                   0                  753             AA 2      AA 4
0                   0                  345             AX        AA 3
0                   0                  341             AX        AA 1
0                   0                  345             AX        AA 4
0                   0                  725             40        Sc109e
0                   0                  341             Sc6ab     944
0                   0                  453             Y         X
0                   0                  728             Sc109e    944
0                   0                  689             Sc109e    31
0                   0                  340             AA 3      AA 1
0                   0                  752             AA 3      AA 4
0                   0                  340             AA 1      AA 4
0                   0                  378             2362c     Hm9
0                   0                  382             Hm9       Sc32e
0                   0                  689             944       31

Pairs with mismatches:
0.45                2                  448             15        4
0.47                3                  639             Sc109e    5606b
0.88                3                  341             15        Sc6ab
1.12                5                  448             4         944
1.12                5                  448             4         31
1.77                8                  453             X         P
2.02                7                  346             AX        BB
2.65                9                  340             BB        AA 1
2.81                13                 463             AA 2      BB
2.82                13                 461             BB        AA 3
2.82                13                 461             BB        AA 4


Fig. S16. Image of a fusion event between colony 31 and colony 944, as predicted by inferred BHF genotypes.


Fig. S17. BHF transcripts analyzed by RNA-Seq are highly expressed in the B. schlosseri vasculature, and enriched in comparison to the endostyle. RPKM, reads per kb of exon model per million mapped reads.

[Figure S17 plot: expression (RPKM, axis from 0 to 150) by tissue library (vasculature, endostyle).]


Fig. S18. Tissue-specific expression of BHF assessed by PCR and Sanger-sequencing. Representative gel showing PCR-amplification of BHF, compared to Actin, in various B. schlosseri tissues. Importantly, unlike in Fig. S17, here the endostyle samples include both endostyle and blood from surrounding sinuses.

[Figure S18 gel image. Lane labels: ladder, BHF blood, Actin blood, BHF whole system, Actin whole system, BHF endostyle, Actin endostyle, BHF bud, Actin bud.]


Fig. S19. Representative images documenting BHF translation-blocking morpholino experiments. A, A colony allorecognition assay (CAA) was arranged between an isogenic pair 2 days following microinjection of control morpholino oligonucleotides (MOs). B-C, Four days following microinjection of control MOs, the colonies established a common vasculature, an expected outcome for an isogenic (i.e., histocompatible) pair. D, Another CAA was set between isogenic colonies 2 days after microinjection with BHF MOs. E-F, Despite physical contact between blood vessels (ampullae), no vasculature fusion was observed over a period of up to 4 days following BHF MO injection. G, In a third example, a CAA was set between an isogenic pair 3 days following injection with BHF MOs. G-I, Despite establishing physical blood vessel contact, no fusion occurred over a monitoring period of up to 5 days following BHF MO administration. Amp, ampullae; +xd, number of days following treatment; scale bars represent 1 mm.


Fig. S20. Knockdown of BHF mRNA by splice-inhibiting morpholinos. Representative semi-quantitative PCR results (left) and gel quantification using the density integration metric in ImageJ (right). MO, BHF morpholino; L, ladder; a, Actin; b, BHF.

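[Figure S20 images. Left: gel with lanes for control MO and for BHF-targeted MO at 24 hrs, 48 hrs, and 72 hrs post-injection. Right: relative BHF expression (axis from 0.00 to 1.00) for control MO and for BHF-targeted MO at 24 hrs, 48 hrs, and 72 hrs post-injection.]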


Fig. S21. Representative images documenting BHF splice-inhibiting morpholino experiments. A-B, A colony allorecognition assay (CAA) was arranged between two rejecting colonies. Control morpholino oligonucleotides (MOs) were injected the same day. Points of rejection (arrows) are shown +2 days (A) and +3 days (B) following CAA setup. C, A CAA was arranged between 2 histocompatible colonies. Three days following control MO microinjection, the colonies fused, establishing a common vasculature (arrow). D-F, A three-way CAA was arranged. Despite establishing physical contact between ampullae, no reaction was observed at +3 days (D-E) or +4 days (F) after initial microinjection with BHF splice-inhibiting MOs. G-I, Another three-way CAA example, in which physical contacts were established (arrows, G), but no reactions were observed. Images shown are from +2 days (G-H) and +3 days (I) following microinjections with BHF splice-inhibiting MOs. +xd, number of days following first treatment; colonies were boosted with BHF MO/control treatment +2 days following first injection (see table S9). Scale bars represent 1 mm.


Supplementary Tables

Table S1. PCR primers

Primer name              Sequence                        Starting base in cFuHC
WholeIg Forward          AACGATGAATGGGTTCGCGATTTTC       8
Fu/HC 14 to 17 Reverse   TACTTCAAGTCGACAGTTCCAATCAACGTA  1510
Fu/HC 14 to 17 Forward   TACGTTGATTGGAACTGTCGACTTGAAGTA  1481
WholeIg Reverse          AAGCTTCTTTCAGAGCTACTATCTTCA     2980
IgF 1 Reverse            GTACCTCAAGTACCACACGCCCCAAT      2433
IgF 1 Forward            AAGCTTCTTTCAGAGCTACTATCTTCA     2949
STS1 Forward             GTATGGGACAACACAGGAAATTCTAC      1761
STS1 Reverse             GTGACGTTTTAGTCCATAGGATATCAG     1953

*Multiple combinations of forward and reverse primers were used
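As a quick consistency check on the primer table, the two "Fu/HC 14 to 17" primers can be compared by reverse complement. The sketch below uses only the two sequences and starting bases listed in Table S1. The function name `reverse_complement` is illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Sequences and starting bases copied from Table S1.
fuhc_14_17_forward = "TACGTTGATTGGAACTGTCGACTTGAAGTA"  # starting base 1481
fuhc_14_17_reverse = "TACTTCAAGTCGACAGTTCCAATCAACGTA"  # starting base 1510

same_site = reverse_complement(fuhc_14_17_forward) == fuhc_14_17_reverse
```

`same_site` evaluates to `True`: the two primers are exact reverse complements, and their 30-base length matches the span between the listed starting bases (1481 to 1510 inclusive).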


Table S2. sFuHC gene structure. To determine exon boundaries and introns, transcripts were aligned to the fosmid sequence originally used to identify cFuHC (12). In some cases, consecutive exon boundaries slightly overlap due to alignment ambiguity, thus the sum of exon lengths is longer than the spliced transcript itself. For each exon n, the intron length shown corresponds to the intron between exons n and n + 1, as determined using the fosmid sequence. Splice donor and acceptor sites were predicted from the fosmid using the online tools SplicePort (SP; http://spliceport.cs.umd.edu) and NetGene2 (NG2; http://www.cbs.dtu.dk/services/NetGene2), with default parameters. For SplicePort, ‘Y’ indicates a positive prediction, and for NetGene2, the number of plusses denotes the number of species (of 3; H. sapiens, C. elegans, and A. thaliana) with a positive prediction. All positive predictions fall within a 2bp window of predicted exon boundaries.

The original table has four splice prediction columns (Splice Acceptor: SP, NG2; Splice Donor: SP, NG2). Their entries are listed below in source order; empty cells are not marked.

Exon No.  cDNA start  cDNA end  Fosmid start  Fosmid end  Intron length  Exon length  Splice predictions
1         1           70        94,574        94,643      1,671          70           Y +++
2         68          195       96,314        96,441      59             128          Y +++
3         194         256       96,500        96,562      1,703          63           + ++
4         257         356       98,265        98,364      1,053          100          Y +++ Y ++
5         357         524       99,417        99,584      2,275          168          Y ++ Y +++
6         520         638       101,859       101,977     53             119          +
7         638         726       102,030       102,118     1,256          89           Y ++ Y +
8         726         811       103,374       103,458     381            86           Y ++ +++
9         808         900       103,839       103,931     303            93           Y ++ Y +++
10        900         1,061     104,234       104,395     2,057          162          + Y +++
11        1,061       1,117     106,452       106,508     308            57           Y +++ +++
12        1,118       1,208     106,816       106,906     1,368          91           +++ Y +++
13        1,203       1,370     108,274       108,441     288            168          ++
14        1,368       1,495     108,729       108,856     641            128          Y +++ Y +
15        1,496       1,581     109,497       109,582     2,288          86           Y + Y +++
16        1,580       1,668     111,870       111,958     1,493          89           Y +++ Y +++
17        1,669       1,927     113,451       113,707     -              257          Y +++
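The exon and intron lengths in Table S2 can be recomputed from the fosmid coordinates. The sketch below uses the first four sFuHC exons from the table. It relies on the observation that the tabulated intron length equals the next exon's fosmid start minus the current exon's fosmid end; this convention is inferred from the numbers, not stated in the legend, and the function names are illustrative.

```python
# (fosmid start, fosmid end) for sFuHC exons 1 to 4, copied from Table S2.
exons = [(94_574, 94_643), (96_314, 96_441), (96_500, 96_562), (98_265, 98_364)]


def exon_lengths(exons):
    """Inclusive length of each exon on the fosmid."""
    return [end - start + 1 for start, end in exons]


def intron_lengths(exons):
    """Next exon start minus current exon end, matching the tabulated values."""
    return [next_start - end
            for (_, end), (next_start, _) in zip(exons, exons[1:])]
```

For these four exons the functions return `[70, 128, 63, 100]` and `[1671, 59, 1703]`, in agreement with the Exon length and Intron length columns.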


Table S3. mFuHC gene structure. Exon boundaries, intron lengths, and splice donor/acceptor sites were determined as described in table S2.

The original table has four splice prediction columns (Splice Acceptor: SP, NG2; Splice Donor: SP, NG2). Their entries are listed below in source order; empty cells are not marked.

Exon No.  cDNA start  cDNA end  Fosmid start  Fosmid end  Intron length  Exon length  Splice predictions
1         1           68        113,957       114,024     66             68           +++
2         67          191       114,090       114,214     696            125          Y ++ Y ++
3         189         259       114,910       114,980     1,785          71           Y +++ Y +++
4         255         431       116,765       116,942     275            177          Y +++
5         431         583       117,217       117,369     879            153          Y + ++
6         582         731       118,248       118,397     55             150          +++
7         728         794       118,452       118,519     1,632          67           +++
8         795         938       120,151       120,294     862            144          Y +++
9         932         1,082     121,156       121,306     48             151          Y +++ Y +++
10        1,072       1,122     121,354       121,403     1,362          51
11        1,115       1,281     122,765       122,931     2,788          167          Y +++
12        1,278       1,308     125,719       125,749     218            31           Y + Y +++
13        1,307       1,358     125,967       126,018     1,184          52           Y +++ Y +++
14        1,356       1,633     127,202       127,479     -              278          ++


Table S4. Colonies sequenced and RNA-Seq statistics. Seventeen colonies were used to identify novel Fu/HC candidates and constitute the ‘Exploratory cohort’. The ‘Validation cohort’ consists of four additional colonies, Sc109e, Sc6ab, 5606b, and 40. For lineage relationships among all 21 sequenced colonies, see fig. S13.
a The Illumina HiSeq 2000 platform (2 x 100bp paired-end reads) was used to generate RNA-Seq data for all ‘Exploratory’ colonies and three ‘Validation’ colonies. The remaining ‘Validation’ colony (Sc6ab) was sequenced using an Illumina MiSeq (2 x 150bp paired-end reads).
b All unfiltered reads.
c To increase confidence in variant calling from B. schlosseri predicted genes, many of which are novel, only properly paired reads were used for the analyses described herein (SAMtools (42) bitwise flag = 0x0002).

The last three columns give the number of proper pairs mapped to sFuHC, mFuHC, and BHF.

Colony Name  Colony Name (Detailed)           Experimental Cohort (a)  Total Reads (b)  Properly Paired Reads (c)  sFuHC  mFuHC  BHF

Progeny of wild type colonies or second generation in the mariculture
P            5326P                            Exploratory              109,901,134      14,653,296                 868    1,794  13,616
X            5326X                            Exploratory              104,038,064      9,976,882                  686    900    3,828
Y            5326Y                            Exploratory              93,378,336       10,102,816                 174    656    6,424
2362c        2362c                            Exploratory              6,133,456        599,438                    74     30     224
Sc32e        Sc32e                            Exploratory              4,540,324        424,732                    22     66     408
Sc109e       Sc109.e                          Validation               137,938,102      11,902,270                 1,004  238    1,858
Sc6ab        Sc6a-b                           Validation               2,845,170        445,302                    32     34     136
5606b        5606b                            Validation               67,847,740       2,991,552                  170    0      834

Progeny of defined crosses, bred in our mariculture
944          [944axBYd196]-4                  Exploratory              152,104,960      16,588,484                 1,366  3,020  10,446
31           [L11HMBY15xSc6a-b]-31            Exploratory              71,524,430       7,387,480                  436    630    1,774
4            [Sc6ab3143j46xL11HMBY4-15-10]-4  Exploratory              53,835,208       2,918,010                  180    192    1,060
15           [L11jHM9aBYd196-4]-15            Exploratory              25,570,902       2,581,304                  206    180    1,522
Hm9          [HM9aYw1225c]-23.20              Exploratory              9,162,170        1,057,650                  32     306    4,062
40           [L11HMBY15xSc6a-b]-40.1          Validation               115,037,646      21,453,048                 418    368    682

Progeny of defined Fu/HC allele crosses, homozygous (AA or BB) or heterozygous (AB or AX)
AA 1         Yw764                            Exploratory              9,439,982        933,116                    32     194    258
AA 2         Yw1258                           Exploratory              80,237,630       10,289,660                 283    1,212  3,807
AA 3         Yw(BY)3                          Exploratory              66,533,126       8,249,962                  146    636    1,558
AA 4         Yw1023                           Exploratory              61,605,974       9,380,758                  979    1,154  3,269
AB           BYd129                           Exploratory              25,610,052       3,184,306                  144    402    916
AX           745u-n                           Exploratory              4,118,402        480,490                    20     48     162
BB           BYBYw557                         Exploratory              10,510,234       1,077,364                  98     120    1,590
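The properly paired filter named in the Table S4 footnotes (bitwise flag 0x0002) can be sketched as a test on the FLAG field of each SAM record. This is a minimal stdlib-only illustration, not the command used in this work. It assumes tab-separated SAM text lines with FLAG as the second field, and the function names are hypothetical.

```python
PROPER_PAIR = 0x0002  # SAM flag bit: each segment properly aligned


def is_properly_paired(flag):
    """True when the proper-pair bit is set in a SAM FLAG value."""
    return bool(flag & PROPER_PAIR)


def properly_paired_records(sam_lines):
    """Yield alignment lines whose FLAG has the proper-pair bit set."""
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM field
        if is_properly_paired(flag):
            yield line
```

For example, a record with FLAG 99 (paired, proper pair, mate reverse, first in pair) is kept, while a record with FLAG 65 (paired, first in pair) is dropped.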


Table S5. B. schlosseri gene sequences analyzed in this work, and fosmid sequence. In all, 7,523 genes were analyzed, all of which have at least 1 nucleotide mismatch (i.e. polymorphism) between colonies in our exploratory cohort (table S4), and RNA sequence coverage for at least 6 fusion pairs and 6 rejection pairs. Table S5 is available on Science Online as an Excel spreadsheet.

Table S6. Details of analyzed B. schlosseri genes, including concordance with fusibility outcomes and genetically defined lines. For each row, the gene identifier is given, along with its best blastp match, if any, to H. sapiens and M. musculus (e-value < 1e-10). In addition, the predicted canonical transcript length (Gene Length), chromosome assignment (Chr.), and B. schlosseri contig identifier are provided. Additional gene details are provided elsewhere (13). Output from the fusibility classifier (figs. S3-S7, 15) is interpreted as follows:
Classification Error: difference between a gene’s performance and perfect concordance between alleles and fusibility outcomes (see 15);
True Positive Rate: the number of fusion pairs correctly predicted divided by all fusion pairs considered;
False Positive Rate: the number of rejection pairs incorrectly assigned divided by all rejection pairs considered;
Accuracy: the number of correct assignments divided by all assignments;
Best Mismatch Threshold: the frequency of mismatches (number of mismatches scaled by protein length) that best stratifies known fusion/rejection outcomes (used to calculate all other classifier metrics);
Total Pairs Analyzed: all colony pairs with known fusibility outcomes analyzed (colony pairs with RNA-Seq sequence coverage of at least 20 sites in common);
No. Fusions: number of fusion pairs analyzed;
No. Rejections: number of rejection pairs analyzed.
Additional columns:
Defined Lines Error: the fraction of fusibility relationships among defined lines that are correct, provided that histocompatible colony pairs have zero mismatches and histoincompatible colony pairs have at least 1 mismatch;
Homozygous Error: the fraction of homozygous genetically defined lines without any heterozygous sites (total of 5 colonies: 4 AA and 1 BB);
Botryllus Rejection Library: the presence (1) or absence (0) of a given gene in an EST subtraction library of genes up-regulated during the B. schlosseri rejection response (blastn e-value <1e-20; alignment length of at least 200bp) (17).
Importantly, all classifier results shown are related to colonies from the exploratory cohort (table S4). All gene sequences are provided in table S5. Table S6 is available on Science Online as an Excel spreadsheet.
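The True Positive Rate, False Positive Rate, and Accuracy definitions above translate into a short threshold sweep. The sketch below is a hedged illustration, not the classifier used in this work. It assumes that a pair is predicted to fuse when its mismatch frequency is at or below the threshold, and it picks the threshold by accuracy, which is a stand-in for the Classification Error criterion (whose exact formula is not given here). The function names and the example pairs are illustrative.

```python
def evaluate(pairs, threshold):
    """Return (TPR, FPR, accuracy) for one mismatch-frequency threshold.

    pairs is a list of (mismatch_frequency, fused) tuples and must contain
    at least one fusion pair and one rejection pair.
    """
    fusions = [freq for freq, fused in pairs if fused]
    rejections = [freq for freq, fused in pairs if not fused]
    true_pos = sum(freq <= threshold for freq in fusions)
    false_pos = sum(freq <= threshold for freq in rejections)
    tpr = true_pos / len(fusions)
    fpr = false_pos / len(rejections)
    accuracy = (true_pos + len(rejections) - false_pos) / len(pairs)
    return tpr, fpr, accuracy


def best_threshold(pairs):
    """Lowest observed frequency that maximizes accuracy (an assumed criterion)."""
    candidates = sorted({freq for freq, _ in pairs})
    return max(candidates, key=lambda t: evaluate(pairs, t)[2])


# Illustrative data only: (mismatch frequency in %, fused?)
example_pairs = [(0.0, True), (0.0, True), (0.45, True),
                 (1.12, False), (2.02, False), (2.82, False)]
```

With `example_pairs`, `best_threshold` returns 0.45, and `evaluate(example_pairs, 0.45)` gives a true positive rate of 1.0, a false positive rate of 0.0, and an accuracy of 1.0.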


Table S7. BHF gene structure. Exon boundaries, intron lengths, and splice donor/acceptor sites were determined as described in table S2. This gene model is consistent with reference-guided transcript assemblies generated by Cufflinks (32) and the gene model predicted by Augustus (33). All splice junctions were validated by Sanger-sequencing.

The original table has four splice prediction columns (Splice Acceptor: SP, NG2; Splice Donor: SP, NG2). Their entries are listed below in source order; empty cells are not marked.

Exon No.  cDNA start  cDNA end  Fosmid start  Fosmid end  Intron length  Exon length  Splice predictions
1         1           166       28,079        28,244      608            166          Y +++
2         166         639       28,852        29,325      2,798          474          Y +++ Y +++
3         638         831       32,123        32,316      -              194          Y ++


Table S8. Detailed timeline for three BHF translation-blocking morpholino experiments. I, date of morpholino injection; I*, date of 2nd morpholino injection; CAA, Colony Allorecognition Assay setup; NC, no contact; NR, no reaction/no change; A to A, ampullae to ampullae contact; T to A, tunic to ampullae contact; T to T, tunic to tunic contact; Fuse, vasculature fusion between colonies; E, end of observational period.

The original table has one column per day (days 0 to 12 following initial morpholino injection). The per-day column alignment is not preserved here; entries are listed in recorded order for each experimental group.

Experimental group  Entries in recorded order
Set I: Control      I, CAA, A to A, Fuse, E
Set I: BHF          I, CAA, T to A, NR, E
Set I: BHF          I, CAA, A to A, NR, E
Set II: Control     I, CAA, NC, NC, A to A, Fuse, Fuse, Fuse, Fuse, Fuse, Fuse, E
Set II: BHF         I, CAA, NC, A to A, NR, NR, I*, A to T, NR, A to A, NR, NR, E
Set II: BHF         I, CAA, NC, NC, NC, NC, I*, T to T, NR, T to A, NR, NR, E
Set III: Control    I, CAA, NC, I*, NC, A to A, NR, NR, E
Set III: Control    I, CAA, NC, I*, A to A, Fuse, Fuse, Fuse, E
Set III: BHF        I, CAA, T to T, I*, NR, NR, NR, NR, E
Set III: BHF        I, CAA, NC, I*, NC, NC, A to A, NR, E


Table S9. Detailed timeline for BHF splice-inhibiting morpholino experiments. I, date of morpholino injection; I*, date of 2nd morpholino injection; ND, not determined (no observation); CAA, Colony Allorecognition Assay setup; A to A, ampullae to ampullae contact; A to T, ampullae to tunic contact; NR, no reaction/no change; Fuse, vasculature fusion between colonies; Reject, points of rejection; E, end of observational period.

Columns Day 0 to Day 5 give days following initial morpholino injection.

Experimental Group  Allorecognition Assay  Day 0    Day 1  Day 2       Day 3   Day 4   Day 5
Control             15.36 vs 5680a         I + CAA  ND     Reject, I*  Reject  Reject  E
Control             15.36 vs 5680b         I + CAA  ND     A to A, I*  Fuse    Fuse    E
Control             15.36 vs 5680c         I + CAA  ND     A to A, I*  Fuse    Fuse    E
BHF                 15.36 vs 5680d         I + CAA  ND     A to A, I*  NR      NR      E
BHF                 15.36 vs 5680e         I + CAA  ND     A to A, I*  NR      NR      E
BHF                 15.36 vs 5680f         I + CAA  ND     A to T, I*  NR      NR      E
BHF                 5670a vs 5681a         I + CAA  ND     A to A, I*  NR      NR      E
BHF                 5670a vs 5681b         I + CAA  ND     A to A, I*  NR      NR      E


Movie S1. Fusion and rejection under morpholino control treatment. Colony 1 was injected with ~3 µl standard control Vivo-Morpholino oligonucleotides and a colony allorecognition assay (CAA) was arranged between the treated colony and 2 young colonies (3 weeks old; colonies 2 and 3). Two days following CAA setup and initial treatment, 3 rejection points (brown/black spots) appeared on the ampullae border between colonies 1 and 2. At +3 days, a vasculature fusion was established between colonies 1 and 3. This movie, taken at +3 days, contrasts the common blood flow shared by colonies with a fused vasculature (colonies 1 and 3) with the isolated blood systems and points of rejection that characterize histoincompatible colonies (colonies 1 and 2).

Movie S2. Unreactive colonies treated with BHF splice-inhibiting morpholinos. Colony 1 was injected with 3 µl BHF splice-inhibiting Vivo-Morpholino oligonucleotides. That same day, a colony allorecognition assay was arranged between colony 1 and two young colonies, both progeny of the same wild-type parent (3 weeks old). Despite establishing physical ampullae contact, all colonies remained unreactive, even five days following initial treatment. This movie was taken at +3 days. Notice two orange ampullae protruding from colony 1 that have interfaced with colony 2. Despite close physical contact, no fusion/rejection reaction occurred.


References and Notes

1. A. Nakashima, T. Shima, K. Inada, M. Ito, S. Saito, The balance of the immune system between T cells and NK cells in miscarriage. Am. J. Reprod. Immunol. 67, 304 (2012). doi:10.1111/j.1600-0897.2012.01115.x

2. G. Girardi, Z. Prohászka, R. Bulla, F. Tedesco, S. Scherjon, Complement activation in animal and human pregnancies as a model for immunological recognition. Mol. Immunol. 48, 1621 (2011). doi:10.1016/j.molimm.2011.04.011

3. M. Colonna, S. Jonjic, C. Watzl, Natural killer cells: Fighting viruses and much more. Nat. Immunol. 12, 107 (2011). doi:10.1038/ni0211-107

4. D. F. LaRosa, A. H. Rahman, L. A. Turka, The innate immune system in allograft rejection and tolerance. J. Immunol. 178, 7503 (2007).

5. F. Delsuc, H. Brinkmann, D. Chourrout, H. Philippe, Tunicates and not cephalochordates are the closest living relatives of vertebrates. Nature 439, 965 (2006). doi:10.1038/nature04336

6. H. Oka, H. Watanabe, Colony specificity in compound ascidians as tested by fusion experiments (a preliminary report). Proc. Jpn. Acad. 33, 657 (1957).

7. H. Oka, H. Watanabe, Problems of colony specificity in compound ascidians. Bull. Mar. Biol. Stat. Asamushi. 10, 153 (1960).

8. A. Sabbadin, Le basi geneticha della capacita di fusion fra colonies in B. schlosseri (Ascidiacea). Rend. Accad. Naz. Lincei. Ser. 32, 1031 (1962).

9. V. L. Scofield, J. M. Schlumpberger, L. A. West, I. L. Weissman, Protochordate allorecognition is controlled by a MHC-like gene system. Nature 295, 499 (1982). doi:10.1038/295499a0

10. A. W. De Tomaso, Y. Saito, K. J. Ishizuka, K. J. Palmeri, I. L. Weissman, Mapping the genome of a model protochordate. I. A low resolution genetic map encompassing the fusion/histocompatibility (Fu/HC) locus of Botryllus schlosseri. Genetics 149, 277 (1998).

11. A. W. De Tomaso, I. L. Weissman, Initial characterization of a protochordate histocompatibility locus. Immunogenetics 55, 480 (2003). doi:10.1007/s00251-003-0612-7

12. A. W. De Tomaso et al., Isolation and characterization of a protochordate histocompatibility locus. Nature 438, 454 (2005). doi:10.1038/nature04150

13. A. Voskoboynik et al., The Botryllus schlosseri genome: a genetic toolkit for the investigation of regeneration and immune system evolution. eLife 2, e00569 (2013). doi:10.7554/eLife.00569

14. I. Letunic, T. Doerks, P. Bork, SMART 7: Recent updates to the protein domain annotation resource. Nucleic Acids Res. 40 (D1), D302 (2012). doi:10.1093/nar/gkr931

15. Materials and methods are available as supplementary material on Science Online.

16. B. Rinkevich, J. Douek, C. Rabinowitz, G. Paz, The candidate Fu/HC gene in Botryllus schlosseri (Urochordata) and ascidians’ historecognition—An oxymoron? Dev. Comp. Immunol. 36, 718 (2012). doi:10.1016/j.dci.2011.10.015

17. M. Oren, J. Douek, Z. Fishelson, B. Rinkevich, Identification of immune-relevant genes in histoincompatible rejecting colonies of the tunicate Botryllus schlosseri. Dev. Comp. Immunol. 31, 889 (2007). doi:10.1016/j.dci.2006.12.009

18. The MHC sequencing consortium, Complete sequence and gene map of a human major histocompatibility complex. Nature 401, 921 (1999). doi:10.1038/44853

19. M. Hirano, S. Das, P. Guo, M. D. Cooper, The evolution of adaptive immunity in vertebrates. Adv. Immunol. 109, 125 (2011). doi:10.1016/B978-0-12-387664-5.00004-2

20. L. J. Dishaw, G. W. Litman, Invertebrate allorecognition: The origins of histocompatibility. Curr. Biol. 19, R286 (2009). doi:10.1016/j.cub.2009.02.035

21. D. S. Stoner, I. L. Weissman, Somatic and germ cell parasitism in a colonial ascidian: Possible role for a highly polymorphic allorecognition system. Proc. Natl. Acad. Sci. U.S.A. 93, 15254 (1996). doi:10.1073/pnas.93.26.15254

22. D. S. Stoner, B. Rinkevich, I. L. Weissman, Heritable germ and somatic cell lineage competitions in chimeric colonial protochordates. Proc. Natl. Acad. Sci. U.S.A. 96, 9148 (1999). doi:10.1073/pnas.96.16.9148

23. D. J. Laird, A. W. De Tomaso, I. L. Weissman, Stem cells are units of natural selection in a colonial ascidian. Cell 123, 1351 (2005). doi:10.1016/j.cell.2005.10.026

24. A. Voskoboynik et al., Identification of the endostyle as a stem cell niche in a colonial chordate. Cell Stem Cell 3, 456 (2008). doi:10.1016/j.stem.2008.07.023

25. D. H. Sachs, M. Sykes, T. Kawai, A. B. Cosimi, Immuno-intervention for the induction of transplantation tolerance through mixed chimerism. Semin. Immunol. 23, 165 (2011). doi:10.1016/j.smim.2011.07.001

26. H. C. Boyd, S. K. Brown, J. A. Harp, I. L. Weissman, Growth and sexual maturation of laboratory-cultured Monterey B. schlosseri. Biol. Bull. 170, 91 (1986). doi:10.2307/1541383

27. D. R. Zerbino, Using the Velvet de novo assembler for short-read sequencing technologies. Curr. Protoc. Bioinformatics Chapter 11, 11.5.1, 12 (2010).

28. E. W. Myers et al., A whole-genome assembly of Drosophila. Science 287, 2196 (2000). doi:10.1126/science.287.5461.2196

29. H. C. Fan, J. Wang, A. Potanina, S. R. Quake, Whole-genome molecular haplotyping of single cells. Nat. Biotechnol. 29, 51 (2011). doi:10.1038/nbt.1739

30. X. Xu et al., The genomic sequence of the Chinese hamster ovary (CHO)-K1 cell line. Nat. Biotechnol. 29, 735 (2011). doi:10.1038/nbt.1932

31. H. Li, R. Durbin, Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754 (2009). doi:10.1093/bioinformatics/btp324

32. C. Trapnell et al., Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511 (2010). doi:10.1038/nbt.1621

33. M. Stanke, M. Diekhans, R. Baertsch, D. Haussler, Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24, 637 (2008). doi:10.1093/bioinformatics/btn013

34. C. J. Lowe, K. Tagawa, T. Humphreys, M. Kirschner, J. Gerhart, Hemichordate embryos: Procurement, culture, and basic methods. Methods Cell Biol. 74, 171 (2004). doi:10.1016/S0091-679X(04)74008-X

35. J. S. Eisen, J. C. Smith, Controlling morpholino experiments: Don’t stop making antisense. Development 135, 1735 (2008). doi:10.1242/dev.001115

36. D. C. Koboldt et al., VarScan 2: Somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22, 568 (2012). doi:10.1101/gr.129684.111

37. K. Tamura et al., MEGA5: Molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol. Biol. Evol. 28, 2731 (2011). doi:10.1093/molbev/msr121

38. P. Rossi et al., A microscale protein NMR sample screening pipeline. J. Biomol. NMR 46, 11 (2010). doi:10.1007/s10858-009-9386-z

39. R. C. Edgar, MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792 (2004). doi:10.1093/nar/gkh340

40. A. M. Waterhouse, J. B. Procter, D. M. Martin, M. Clamp, G. J. Barton, Jalview Version 2—a multiple sequence alignment editor and analysis workbench. Bioinformatics 25, 1189 (2009). doi:10.1093/bioinformatics/btp033

41. M. Clamp, J. Cuff, S. M. Searle, G. J. Barton, The Jalview Java alignment editor. Bioinformatics 20, 426 (2004). doi:10.1093/bioinformatics/btg430

42. H. Li et al.; 1000 Genome Project Data Processing Subgroup, The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078 (2009). doi:10.1093/bioinformatics/btp352