Derivation of HLA types from shotgun sequence datasets
© Warren et al.; licensee BioMed Central Ltd. 2012
Received: 8 June 2012
Accepted: 10 December 2012
Published: 10 December 2012
The human leukocyte antigen (HLA) is key to many aspects of human physiology and medicine. All current sequence-based HLA typing methodologies are targeted approaches requiring the amplification of specific HLA gene segments. Whole genome, exome and transcriptome shotgun sequencing can generate prodigious data but due to the complexity of HLA loci these data have not been immediately informative regarding HLA genotype. We describe HLAminer, a computational method for identifying HLA alleles directly from shotgun sequence datasets (http://www.bcgsc.ca/platform/bioinfo/software/hlaminer). This approach circumvents the additional time and cost of generating HLA-specific data and capitalizes on the increasing accessibility and affordability of massively parallel sequencing.
Due to its central role in adaptive immunity, human leukocyte antigen (HLA) is implicated in wide ranging areas of medicine, from infectious disease and vaccinology to cancer, autoimmunity, aging and regenerative and transplantation medicine [1–7]. The HLA locus is the most polymorphic region of the genome with over 5,000 variant HLA-class I allelic sequences catalogued to date. This genetic heterogeneity is the principal challenge to HLA typing methodologies, and it is the reason why this region has remained largely opaque to analysis by next-generation sequencing (NGS) platforms. Conventional sequence-based HLA typing approaches, the most recent of which exploits the sequence throughput of the Illumina MiSeq  and relatively long sequence reads of the 454 NGS platform , are targeted assays that rely on amplification of hypervariable sub-regions of these loci and variant detection within these amplicons. As such, HLA calls are based on sequence information that is not as comprehensive as for shotgun datasets, and must be generated de novo for each subject. The widespread uptake of large-scale genome, exome and transcriptome shotgun sequencing approaches for biomedical research, and now for clinical use, prompted us to explore the utility of these types of NGS data sets for HLA typing. The need has been for a solution to the problem of managing the many millions of short sequence reads NGS technologies produce, managing the many thousands of reference allele sequences, and integrating all of these data in a manner that maximally informs HLA content. Here we present a method for HLA allele prediction from next-generation shotgun sequence datasets. We focus on data generated from the Illumina platform, from which most sequence data are currently derived worldwide. Importantly, HLA allele assignments from shotgun datasets can not be derived from standard alignment-based interpretive methods for the simple reason that the extant genome reference sequences [10, 11] on which these methods rely do not provide any useful representation of HLA allelic diversity. Therefore, we have developed a computational pipeline that derives HLA allele predictions by targeted assembly of shotgun sequence data and comparison to a database of reference allele sequences. Our solution allows, for the first time, application of the power of NGS to the interrogation of one of the most important and complex sets of human genes. Our method is scalable, such that it will provide utility in extracting HLA information even from very large sequence data sets, such as those currently being compiled by various international consortia [12–15].
Materials and methods
Library construction and sequencing
Written informed consent was obtained from all donors and samples were collected following assessment of tissue specimens by a pathologist according to standardized operating procedures, immediately following surgical resection. Library construction and Illumina sequencing were performed as previously described for RNA-Seq  and whole genome shotgun (WGS) . For the colorectal cancer (CRC) RNA-Seq study, four lanes of 100-nucleotide paired-end sequences were obtained for each of the two pools, providing an average of 5 million paired reads per sample. For WGS, approximately 430 million paired 100-nucleotide WGS reads (approximately 30× depth coverage human genome) from normal and tumor samples from four diffuse large B cell lymphoma patients were processed . The sequencing data from the CRC study have been submitted to the NCBI Sequence Read Archive  under accession number SRP010181. A file describing the sample libraries is available at .
Exome capture libraries were prepared using the SureSelect system (Agilent) according to the manufacturer's instructions. Approximately 30 million (normal samples) and 120 million (normal plus tumor samples) 100-nucleotide exon capture paired-end sequence reads were generated from three ovarian cancer patients whose HLA alleles were verified by PCR-based methods. Verification of HLA allele predictions was accomplished by PCR amplification of exons 2 and 3 from HLA-I A, B and C, followed by capillary sequencing as previously described .
HLA coding DNA sequence (CDS) and genomic sequence databases from release 3.3.0 and 3.4.0 were obtained, respectively, from . HLA-I exon 2 and 3 concatenated sequence FASTA files were prepared using exon coordinates available from the flat file database (EMBL format) released by IMGT . For HLA allele predictions from RNA-Seq data, we used concatenated exons 2 and 3 as sequence targets for assembly using the TASR assembly tool . For predictions from genome and exome NGS data, we used HLA-I genomic sequences from major genes A, B and C.
Computational HLA allele predictions by targeted read assembly
since individual contig probabilities and reciprocal best hits are independent events. A short list of HLA allele groups (for example, A*02) and protein coding alleles (for example, A*02:01), sorted by decreasing score, are catalogued for each major HLA gene. When separate contigs characterize the same types, only the types that overlap are reported, unless the non-overlapping ones are characterized by additional, distinct contig(s). In addition, we summarize ambiguous HLA alleles using the P designation, when applicable.
Simulated data sets
In separate experiments, we removed HLA CDS, exonic regions and genes from 15K randomly selected Ensembl  transcripts, approximately 220K exon regions  (SureSelect Target Enrichment, Agilent Technologies, Inc.  and the HuRef genome ). For each data type, we randomly generated 20 sets of six (2 × A, 2 × B, 2 × C) HLA-I alleles (total of 120 alleles). In triplicate experiments, we merged each set of six sequences with HLA-less CDS, exon regions or HuRef, respectively, and simulated at various depth of coverage 50-, 75-, 100- or 150-nucleotide paired-end reads with 0.5, 1, 2 or 3 error using SAMtools  wgsim, and ran TASR and BLASTN as described above. For the simulation from direct read pair alignments, we used the simulated reads described above and ran BWA  with defaults and generated HLAminer predictions from SAM files.
We define the sensitivity as a proportion, that is, the number of HLA allele groups or protein coding alleles detected over the sum of distinct groups or protein coding alleles randomly chosen for the simulation or predicted by PCR, when applicable. The ambiguity rate is the proportion of all ambiguous predictions per total allele groups or protein coding alleles predicted. Ambiguous predictions arise when HLA allele groups or protein coding gene names differ despite having an identical score and probability. The specificity is defined as a proportion of number of groups or alleles predicted accurately divided by the total number of groups or alleles detected, respectively.
HLA class I alleles were predicted directly from the RNA-Seq data as described . Briefly, genomic DNA was extracted from patient granulocytes, and exons two and three from HLA class I genes (A, B, and Cw) were amplified by PCR . PCR amplicons were cloned and sequenced using an ABI 3730XL instrument, according to standard procedures. Clone sequences were assembled using Phred/Phrap/Consed . The resulting sequence data were aligned against all available exon 2 and 3 nucleotide sequences from the 3.1.0 release of the IMGT/HLA database  using ClustalW . Protein coding allele assignments  were based on high-quality exact or synonymous matches at informative nucleotide positions.
Results and discussion
Output from HLAminer HLA class I predictions from a CRC patient 100-nucleotide RNA-Seq sample
Expect value (Eval)
Confidence (-10 × log10(Eval))
Predictiond 1 - A*02
Prediction 2 - A*11
Prediction 1 - B*27
Prediction 2 - B*07
Prediction 1 - C*07
Prediction 2 - C*02
For initial evaluation of HLAminer we relied on simulated data sets, which allowed us to determine the influence on performance of sequencing parameters such as depth of coverage, sequence read length, and sequence error. We produced simulated data sets for each of the three formats, RNA-Seq, WGS and exome, by taking reference sequences (Ensembl transcripts, the HuRef genome, and hg19 exon capture regions, respectively) and substituting HLA-I A, B and C sequences with two randomly chosen alleles of HLA-I A, B and C. From these modified references we generated faux sequence reads. For each sequence format, 20 such data sets were generated and these were queried in triplicate, yielding a total of 360 allele predictions per condition tested. The sensitivity, specificity and ambiguity of HLA class I allele prediction was evaluated by comparing the highest-scoring HLAminer predictions to the randomly selected alleles. By ambiguity we mean the prediction of multiple, equally probably alleles.
Overall, from simulation, RNA-Seq datasets provided the greatest utility for HLA prediction. This may be due, in part, to lower representation in RNA-Seq data of off-target regions, such as the minor HLA class I genes, pseudogenes, and HLA class II genes, compared to genome or exome data, where these regions would be expected to have approximately equal representation as the class I alleles of interest, A, B and C. The stark contrast in HLAminer predictions derived from RNA-Seq compared to WGS or exome capture highlights intrinsic properties of these datasets and their value for computational HLA predictions. Functional HLA-I alleles are expressed on all nucleated cells, and despite possible amplification biases in the RNA-Seq library construction protocol, the high abundance of HLA-I transcripts is such that relatively low depth of sequencing is needed for robust predictions (approximately 5 million). On the other hand, non-functional (null) HLA alleles that are present in the genome (but transcriptionally silent) can confound HLA prediction from WGS or exome capture data, since the functional alleles and null alleles are equally represented in these data types. HLAminer has the functionality to report predictions from null alleles, if desired.
Effect of read length and base error on HLAminer predictions from targeted assembly of simulated RNA-Seq dataa
HLA allele resolution
Base error (%)
Read length (nucleotides)
Sensitivity (mean ± SD%)
Specificity (mean ± SD%)
Ambiguity(mean ± SD%)
13.62 ± 2.80
92.86 ± 10.10
19.06 ± 16.53
62.32 ± 3.62
90.27 ± 3.28
8.93 ± 2.67
95.72 ± 0.53
96.31 ± 0.02
4.46 ± 3.84
97.97 ± 2.80
95.73 ± 3.63
0.00 ± 0.00
93.04 ± 4.60
96.39 ± 1.44
2.91 ± 1.69
95.72 ± 0.53
96.31 ± 0.02
4.46 ± 3.84
64.64 ± 4.79
96.13 ± 1.43
13.40 ± 3.02
6.67 ± 2.51
100.00 ± 0.00
8.59 ± 8.34
7.78 ± 1.92
60.51 ± 20.02
27.78 ± 4.81
51.94 ± 2.93
77.36 ± 5.44
37.38 ± 11.16
86.84 ± 1.75
88.13 ± 1.41
36.32 ± 4.76
93.33 ± 3.63
93.07 ± 2.91
22.87 ± 2.40
84.72 ± 5.42
89.65 ± 2.25
24.03 ± 3.00
86.84 ± 1.75
88.13 ± 1.41
36.32 ± 4.76
56.94 ± 1.73
87.49 ± 4.77
39.14 ± 5.73
4.44 ± 2.10
68.69 ± 17.03
37.22 ± 25.62
Next, we evaluated the performance of HLAminer with real shotgun datasets, including RNA-Seq data from 16 CRC libraries (RL Warren, DJ Freeman, P Watson, RA Moore, EA Allen-Vercoe, RA Holt, manuscript submitted), WGS and RNA-Seq from four lymphoma libraries  and exon capture data from three ovarian cancer libraries (Figure 2; Additional file 1). HLA predictions were compared to results from these same subjects obtained from standard PCR and capillary sequence-based typing . Results mirrored those obtained from simulated data. For all data types, prediction of allele groups was more reliable than prediction of individual protein coding alleles. For prediction of allele groups, the CRC RNA-Seq data yielded predictions with highest sensitivity and specificity (>96.5%) and low ambiguity (<2.4%), even at low sequence depth (approximately 5 million pairs per sample). From a total of 81 HLA allele groups predicted by HLAminer on the CRC cohort shotgun data only a single allele group prediction conflicted with PCR-based typing results (Additional file 2). For WGS and exome data, high sensitivity and specificity could also be achieved, but only at much higher depth of coverage. For all data types, the ambiguity associated with prediction of individual protein coding alleles were higher than for prediction of allele groups, with predictions from exome data sets more significantly impacted than predictions from WGS or RNA-Seq data sets. HLAminer predictions were also benchmarked on low-coverage 100-nucleotide WGS data from 20 individuals of the 1000 Genomes cohort . HLA class I allele predictions obtained from these same HapMap samples by the targeted PCR method of Erlich and colleagues have been previously published . Applying HLAminer to this data set, allele group sensitivity and specificity of 86.7 ± 15.9% and 86.3 ± 16.1% were achieved (Additional file 3), despite the relatively low number of genome shotgun reads processed (mean ± standard deviation of 361.2 ± 80.9 million. Further, our results from these 1000 Genomes samples are consistent with those we obtained from the diffuse large B cell lymphoma control normal tissue (sensitivity and specificity of 68.8 ± 31.5% and 71.3 ± 21.8%) and tumor tissue (sensitivity and specificity of 75.0 ± 20.4% and 87.5 ± 14.4%) WGS datasets, for which substantially higher sequence coverage was available (approximately 1.1 billion reads per sample). As discussed, the data type availability (WGS) and the lower depth of coverage (10- to 20-fold) are both limiting factors for HLAminer predictions.
HLAminer can evaluate reads by direct alignment (Figure 1, right). However, with Illumina read lengths currently ranging from 100 to 150 nucleotides, this approach has limited utility at the present time. At best we observed 80.0 ± 3.5% sensitivity and 78.2 ± 2.8% specificity (mean ± standard deviation; Figure 2, top panel; Additional file 1).
Regardless of input data, HLAminer predictions for HLA allele groups (two-digit resolution) are more robust than for HLA protein-coding alleles (four-digit resolution) (Figure 2; Additional file 1). Both the sensitivity and specificity of four-digit allele predictions are reduced relative to their two-digit counterparts, but changes to the ambiguity of predictions are more pronounced. For example, with 5 million × 100-nucleotide simulated RNA-Seq read pairs, four-digit predictions show a 8.9% decrease in sensitivity, 8.2% decrease in specificity, and a 31.9% increase in ambiguity, compared to two-digit predictions. This is because HLA coding alleles often differ by only a single base. In contrast to conventional HLA genotyping methods where sequence analysis is restricted to HLA amplicons, a target of reduced complexity compared to shotgun sequence data, HLAminer interrogates the full diversity of sequence information in whole transcriptome, whole genome or whole exome datasets. Here, single base differences can be more easily missed due to factors such as low or unequally distributed sequence coverage and base errors. Thus, the performance of HLAminer for robust four-digit HLA allele calls is a limitation of the current data sets and performance is expected to improve as sequencing technologies evolve to offer greater accuracy and read length at reduced cost.
HLAminer is the implementation of a strategy for automated HLA typing directly from NGS data sets. It is a fundamentally different approach compared to conventional methods that all rely on first amplifying HLA genes. The identification of allelic variants from an individual NGS data set by simple sequence search or alignment is precluded by the complexity of the locus, the massive allelic diversity in the population and limitations of short sequence reads to adequately capture these variations. The option of typing, retrospectively, existing cohorts for which NGS data have already been generated is enabling, particularly for large community resource projects [12–15]. In this context, the HLA info is value-added, as no additional cost is necessary to generate further HLA specific data from an existing data set. The method can also be applied prospectively. In fact, it may turn out to be the case that it is most efficient to do all HLA typing by shotgun sequencing, since these types of data sets are maximally informative and are becoming routine to generate.
It is recognized that certain HLA allele combinations are common in certain populations, presumably due to linkage disequilibrium . For example the combination of HLA-I A*01; C*07; B*08 is common in some western European populations. These conserved extended haplotypes are not yet well represented in HLA databases, but in future we will explore the possibility of using this type of information to further improve computational HLA typing. We also expect to extend our approach to prediction of HLA class II alleles. HLAminer is available for public use .
coding DNA sequence
human leukocyte antigen
polymerase chain reaction
whole genome shotgun.
This work was supported by the Canadian Institutes of Health Research and Genome British Columbia.
- Carrington M, O'Brien SJ: The influence of HLA genotype on AIDS. Annu Rev Med. 2003, 54: 535-551. 10.1146/annurev.med.54.101601.152346.PubMedView ArticleGoogle Scholar
- Dawson DV, Ozgur M, Sari K, Ghanayem M, Kostyu DD: Ramifications of HLA class I polymorphism and population genetics for vaccine development. Genet Epidemiol. 2001, 20: 87-106. 10.1002/1098-2272(200101)20:1<87::AID-GEPI8>3.0.CO;2-R.PubMedView ArticleGoogle Scholar
- Fernando MM, Stevens CR, Walsh EC, De Jager PL, Goyette P, Plenge RM, Vyse TJ, Rioux JD: Defining the role of the MHC in autoimmunity: a review and pooled analysis. PLoS Genet. 2008, 4: e1000024-10.1371/journal.pgen.1000024.PubMedPubMed CentralView ArticleGoogle Scholar
- Mizuki N, Meguro A, Ota M, Ohno S, Shiota T, Kawagoe T, Ito N, Kera J, Okada E, Yatsu K, Song YW, Lee EB, Kitaichi N, Namba K, Horie Y, Takeno M, Sugita S, Mochizuki M, Bahram S, Ishigatsubo Y, Inoko H: Genome-wide association studies identify IL23R-IL12RB2 and IL10 as Behçet's disease susceptibility loci. Nat Genet. 2010, 42: 703-706. 10.1038/ng.624.PubMedView ArticleGoogle Scholar
- Rioux JD, Goyette P, Vyse TJ, Hammarström L, Fernando MM, Green T, De Jager PL, Foisy S, Wang J, de Bakker PI, Leslie S, McVean G, Padyukov L, Alfredsson L, Annese V, Hafler DA, Pan-Hammarström Q, Matell R, Sawcer SJ, Compston AD, Cree BA, Mirel DB, Daly MJ, Behrens TW, Klareskog L, Gregersen PK, Oksenberg JR, Hauser SL: Mapping of multiple susceptibility variants within the MHC region for 7 immune-mediated diseases. Proc Natl Acad Sci USA. 2009, 106: 18680-18685.PubMedPubMed CentralView ArticleGoogle Scholar
- Ryder LP, Svejgaard A, Dausset J: Genetics of HLA disease association. Annu Rev Genet. 1981, 15: 169-187. 10.1146/annurev.ge.15.120181.001125.PubMedView ArticleGoogle Scholar
- Shugart YY, Wang Y, Jia WH, Zeng YX: GWAS signals across the HLA regions: revealing a clue for common etiology underlying infectious tumors and other immunity diseases. Chin J Cancer. 2011, 30: 226-230. 10.5732/cjc.011.10075.PubMedPubMed CentralView ArticleGoogle Scholar
- Wang C, Krishnakumar S, Wilhelmy J, Babrzadeh F, Stepanyan L, Su LF, Levinson D, Fernandez-Viña MA, Davis RW, Davis MM, Mindrinos MN: High-throughput, high-fidelity HLA genotyping with deep sequencing. Proc Natl Acad Sci USA. 2012, 109: 8676-8681. 10.1073/pnas.1206614109.PubMedPubMed CentralView ArticleGoogle Scholar
- Erlich RL, Jia X, Anderson S, Banks E, Gao X, Carrington M, Gupta N, DePristo MA, Henn MR, Lennon NJ, de Bakker PI: Next-generation sequencing for HLA typing of class I loci. BMC Genomics. 2011, 12: 42-10.1186/1471-2164-12-42.PubMedPubMed CentralView ArticleGoogle Scholar
- International Human Genome Sequencing Consortium: Finishing the euchromatic sequence of the human genome. Nature. 2004, 431: 931-945. 10.1038/nature03001.View ArticleGoogle Scholar
- Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, Walenz BP, Axelrod N, Huang J, Kirkness EF, Denisov G, Lin Y, MacDonald JR, Pang AW, Shago M, Stockwell TB, Tsiamouri A, Bafna V, Bansal V, Kravitz SA, Busam DA, Beeson KY, McIntosh TC, Remington KA, Abril JF, Gill J, Borman J, Rogers YH, Frazier ME, Scherer SW, Strausberg RL, Venter JC: The Diploid Genome Sequence of an Individual Human. PLoS Biol. 2007, 5: e254-10.1371/journal.pbio.0050254.PubMedPubMed CentralView ArticleGoogle Scholar
- Cancer Genome Atlas Research Network, McLendon R, Friedman A, Bigner D, Van Meir EG, Brat DJ, Mastrogianakis GM, Olson JJ, Mikkelsen T, Lehman N, Aldape K, Yung WK, Bogler O, Weinstein JN, VandenBerg S, Berger M, Prados M, Muzny D, Morgan M, Scherer S, Sabo A, Nazareth L, Lewis L, Hall O, Zhu Y, Ren Y, Alvi O, Yao J, Hawes A, Jhangiani S, et al: Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008, 455: 1061-1068. 10.1038/nature07385.View ArticleGoogle Scholar
- Human Microbiome Jumpstart Reference Strains Consortium, Nelson KE, Weinstock GM, Highlander SK, Worley KC, Creasy HH, Wortman JR, Rusch DB, Mitreva M, Sodergren E, Chinwalla AT, Feldgarden M, Gevers D, Haas BJ, Madupu R, Ward DV, Birren BW, Gibbs RA, Methe B, Petrosino JF, Strausberg RL, Sutton GG, White OR, Wilson RK, Durkin S, Giglio MG, Gujja S, Howarth C, Kodira CD, Kyrpides N, et al: A catalog of reference genomes from the human microbiome. Science. 2010, 328: 994-999.View ArticleGoogle Scholar
- International Cancer Genome Consortium, Hudson TJ, Anderson W, Artez A, Barker AD, Bell C, Bernabé RR, Bhan MK, Calvo F, Eerola I, Gerhard DS, Guttmacher A, Guyer M, Hemsley FM, Jennings JL, Kerr D, Klatt P, Kolar P, Kusada J, Lane DP, Laplace F, Youyong L, Nettekoven G, Ozenberger B, Peterson J, Rao TS, Remacle J, Schafer AJ, Shibata T, Stratton MR, et al: International network of cancer genome projects. Nature. 2010, 464: 993-998. 10.1038/nature08987.View ArticleGoogle Scholar
- 1000 Genomes Project Consortium, Altshuler D, Durbin RM, Abecasis GR, Bentley DR, Chakravarti A, Clark AG, Collins FS, De La Vega FM, Donnelly P, Egholm M, Flicek P, Gabriel SB, Gibbs RA, Knoppers BM, Lander ES, Lehrach H, Mardis ER, McVean GA, Nickerson DA, Peltonen L, Schafer AJ, Sherry ST, Wang J, Wilson R, Gibbs RA, Deiros D, Metzker M, Muzny D, Reid J, et al: A map of human genome variation from population-scale sequencing. Nature. 2010, 467: 1061-1073. 10.1038/nature09534.View ArticleGoogle Scholar
- Castellarin M, Warren RL, Freeman JD, Dreolini L, Krzywinski M, Strauss J, Barnes R, Watson P, Allen-Vercoe E, Moore RA, Holt RA: Fusobacterium nucleatum infection is prevalent in human colorectal carcinoma. Genome Res. 2012, 22: 299-306. 10.1101/gr.126516.111.PubMedPubMed CentralView ArticleGoogle Scholar
- Morin RD, Mendez-Lago M, Mungall AJ, Goya R, Mungall KL, Corbett RD, Johnson NA, Severson TM, Chiu R, Field M, Jackman S, Krzywinski M, Scott DW, Trinh DL, Tamura-Wells J, Li S, Firme MR, Rogic S, Griffith M, Chan S, Yakovenko O, Meyer IM, Zhao EY, Smailus D, Moksa M, Chittaranjan S, Rimsza L, Brooks-Wilson A, Spinelli JJ, Ben-Neriah S, et al: Frequent mutation of histone-modifying genes in non-Hodgkin lymphoma. Nature. 2011, 476: 298-303. 10.1038/nature10351.PubMedPubMed CentralView ArticleGoogle Scholar
- NCBI Sequence Read Archive. [http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi]
- Sample libraries. [ftp://ftp.bcgsc.ca/supplementary/CRC2012/]
- Warren RL, Freeman JD, Zeng T, Choe G, Munro S, Moore R, Webb JR, Holt RA: Exhaustive T-cell repertoire sequencing of human peripheral blood samples reveals signatures of antigen selection and a directly measured repertoire size of at least 1 million clonotypes. Genome Res. 2011, 21: 790-797. 10.1101/gr.115428.110.PubMedPubMed CentralView ArticleGoogle Scholar
- HLA CDS and genomic sequences. [ftp://ftp.ebi.ac.uk/pub/databases/imgt/mhc/hla/]
- Robinson J, Waller MJ, Fail SC, McWilliam H, Lopez R, Parham P, Marsh SG: The IMGT/HLA database. Nucleic Acids Res. 2009, 37: D1013-D1017. 10.1093/nar/gkn662.PubMedPubMed CentralView ArticleGoogle Scholar
- Warren RL, Holt RA: Targeted assembly of short sequence reads. PLoS ONE. 2011, 6: e19816-10.1371/journal.pone.0019816.PubMedPubMed CentralView ArticleGoogle Scholar
- Warren RL, Sutton GG, Jones SJ, Holt RA: Assembling millions of short DNA sequences using SSAKE. Bioinformatics. 2007, 23: 500-501. 10.1093/bioinformatics/btl629.PubMedView ArticleGoogle Scholar
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410.PubMedView ArticleGoogle Scholar
- Flicek P, Amode MR, Barrell D, Beal K, Brent S, Chen Y, Clapham P, Coates G, Fairley S, Fitzgerald S, Gordon L, Hendrix M, Hourlier T, Johnson N, Kähäri A, Keefe D, Keenan S, Kinsella R, Kokocinski F, Kulesha E, Larsson P, Longden I, McLaren W, Overduin B, Pritchard B, Riat HS, Rios D, Ritchie GR, Ruffier M, Schuster M, et al: Ensembl 2011. Nucleic Acids Res. 2011, 39: D800-D806. 10.1093/nar/gkq1064.PubMedPubMed CentralView ArticleGoogle Scholar
- Gnirke A, Melnikov A, Maguire J, Rogov P, LeProust EM, Brockman W, Fennell T, Giannoukos G, Fisher S, Russ C, Gabriel S, Jaffe DB, Lander ES, Nusbaum C: Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nat Biotechnol. 2009, 27: 182-189. 10.1038/nbt.1523.PubMedPubMed CentralView ArticleGoogle Scholar
- SureSelect Target Enrichment. [https://earray.chem.agilent.com/earray/]
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R: The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009, 25: 2078-2079. 10.1093/bioinformatics/btp352.PubMedPubMed CentralView ArticleGoogle Scholar
- Li H, Durbin R: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009, 25: 1754-1760. 10.1093/bioinformatics/btp324.PubMedPubMed CentralView ArticleGoogle Scholar
- Cereb N, Maye P, Lee S, Kong Y, Yang SY: Locus-specific amplification of HLA class I genes from genomic DNA: Locus-specific sequences in the first and third introns of HLA-A, -B, and -C alleles. Tissue Antigens. 1995, 45: 1-11. 10.1111/j.1399-0039.1995.tb02408.x.PubMedView ArticleGoogle Scholar
- Phred/Phrap/Consed. [http://www.phrap.org]
- Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ, Higgins DG: Clustal W and Clustal × version 2.0. Bioinformatics. 2007, 23: 2947-2948. 10.1093/bioinformatics/btm404.PubMedView ArticleGoogle Scholar
- Marsh SG, Albert ED, Bodmer WF, Bontrop RE, Dupont B, Erlich HA, Fernández-Viña M, Geraghty DE, Holdsworth R, Hurley CK, Lau M, Lee KW, Mach B, Maiers M, Mayr WR, Müller CR, Parham P, Petersdorf EW, Sasazuki T, Strominger JL, Svejgaard A, Terasaki PI, Tiercy JM, Trowsdale J: Nomenclature for factors of the HLA system, 2010. Tissue Antigens. 2010, 75: 291-455. 10.1111/j.1399-0039.2010.01466.x.PubMedPubMed CentralView ArticleGoogle Scholar
- de Bakker PI, McVean G, Sabeti PC, Miretti MM, Green T, Marchini J, Ke X, Monsuur AJ, Whittaker P, Delgado M, Morrison J, Richardson A, Walsh EC, Gao X, Galver L, Hart J, Hafler DA, Pericak-Vance M, Todd JA, Daly MJ, Trowsdale J, Wijmenga C, Vyse TJ, Beck S, Murray SS, Carrington M, Gregory S, Deloukas P, Rioux JD: A high-resolution HLA and SNP haplotype map for disease association studies in the extended human MHC. Nat Genet. 2006, 38: 1166-1172. 10.1038/ng1885.PubMedPubMed CentralView ArticleGoogle Scholar
- HLAminer. [http://www.bcgsc.ca/platform/bioinfo/software/hlaminer/]
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.