Cancer of the ampulla of Vater: analysis of the whole genome sequence exposes a potential therapeutic vulnerability

Background Recent advances in the treatment of cancer have focused on targeting genomic aberrations with selective therapeutic agents. In rare tumors, where large-scale clinical trials are daunting, this targeted genomic approach offers a new perspective and hope for improved treatments. Cancers of the ampulla of Vater are rare tumors that comprise only about 0.2% of gastrointestinal cancers. Consequently, they are often treated as either distal common bile duct or pancreatic cancers. Methods We performed whole genome sequencing on DNA from a resected cancer of the ampulla of Vater and on whole blood DNA from a 63-year-old man who underwent a pancreaticoduodenectomy, achieving 37× and 40× coverage, respectively. We determined somatic mutations and structural alterations. Results We identified relevant aberrations, including deleterious mutations of KRAS and SMAD4 as well as a homozygous focal deletion of the PTEN tumor suppressor gene. These findings suggest that these tumors have a distinct oncogenesis from either common bile duct cancer or pancreatic cancer. Furthermore, this combination of genomic aberrations suggests a therapeutic context for dual mTOR/PI3K inhibition. Conclusions Whole genome sequencing can elucidate an oncogenic context and expose potential therapeutic vulnerabilities in rare cancers.


Background
Advances in treatments for cancer have generally come incrementally because novel treatments are subjected to large prospective randomized clinical trials. In these studies, several hundred patients are randomized to one treatment arm or another and the treatment associated with the best outcome is advanced. This method has worked well for relatively common cancers, including breast and colon cancers. This approach, however, falls short when one is faced with rare cancers such that prospective trials involving large numbers of patients are difficult or impossible to conduct. In these cases, oncologists may choose chemotherapy regimens because the rare tumor is thought to be similar to a more common cancer for which an accepted standard treatment exists. Such is the case with cancers of the ampulla of Vater. These cancers account for only 0.2% of gastrointestinal cancers and approximately 7% of periampullary tumors. Periampullary tumors arise from either pancreatic ductal epithelium, the distal common bile duct, the duodenal mucosa, or the ampulla of Vater. When resectable, ampullary cancers are treated like pancreatic cancers with a pancreaticoduodenectomy. When they present at an advanced metastatic stage, there is little information guiding choices for chemotherapy regimens. Although they represent a minority in such trials, patients with ampullary cancers are often included in clinical trials of patients with biliary tract cancers, so these patients are often treated with gemcitabine and cisplatin [1].
Genomic technologies have resulted in some limited but remarkable advances in cancer treatment. Prior to the discovery of the Philadelphia chromosome and the identification of the BCR/ABL fusion protein leading to the development of imatinib, chronic myelogenous leukemia, a relatively rare form of the disease, was nearly uniformly fatal. Treatment was a bone marrow transplant with its attendant high risks of both morbidity and death. Treatment with imatinib, a tyrosine kinase inhibitor, can induce remission in approximately 87% of patients with greatly reduced risks of complications [2]. Imatinib was subsequently also found to be remarkably effective against gastrointestinal stromal tumors [3]. Other targeted drugs that have recently been shown to have efficacy in the setting of an identified genomic aberration include vismodegib in advanced basal cell skin cancers harboring mutations in PTCH1, and vemurafenib in patients with advanced melanoma exhibiting a V600E mutation in the BRAF (v-raf murine sarcoma viral oncogene homolog B1) gene product [4,5].
The rapid advancement of genomic technologies offers the possibility to tailor chemotherapy based on an in-depth analysis of a limited number of tumor samples. The advent of next generation sequencing technologies has now paved the way for near complete interrogation of tumor genomes, providing the first opportunity for efficient global genomic tumor profiling at the point mutation, copy number, and breakpoint dimensions of the cancer genome. At a time in which there is an increasing array of chemotherapy drugs targeting aberrant molecular pathways, individualized genomic analysis to aid treatment decisions is quickly becoming feasible. Such an approach seems particularly well suited to the treatment of rare cancers for which there is a paucity of other clinical data to guide therapy. To demonstrate the potential clinical utility of individualized genomic analysis in patients with rare cancers, we applied whole genome sequencing to the tumor of a 63-year-old man with a resected cancer of the ampulla of Vater and identified therapeutic targets distinct from what would have been targeted based on existing literature.

Samples
Written informed consent was obtained and the patient samples were collected for research purposes at Banner Good Samaritan Medical Center, Phoenix, Arizona. The study was approved by the Western Institutional Review Board (WIRB) and was conducted in accordance with the 1996 Declaration of Helsinki. This was a study entitled, 'Pancreas Cancer Biospecimens Repository' (WIRB® Protocol #20040832). Informed consent was obtained from the patient with cancer of the ampulla of Vater, including written consent for collection of the tissue and whole blood samples as well as clinical information and for genetic analysis of the specimens. The samples were then anonymized and assigned a unique identifier. Samples included fresh frozen tumor tissue collected within 20 minutes after surgical resection. Whole blood was obtained before the start of the operation at the time of induction of anesthesia. The frozen specimen was assessed histopathologically for quality and determined to contain approximately 60% tumor cellularity. DNA and RNA were extracted from frozen tissue and whole blood using the Qiagen All Prep kit (Germantown, MD, USA) according to the manufacturer's recommendations.

Next generation sequencing
To facilitate whole genome next generation sequencing, we utilized the Life Technologies SOLiD™ (version 3) technology with mate-pair chemistry using the manufacturer's recommendations (Carlsbad, CA, USA). Briefly, 20 µg of genomic DNA is mechanically sheared to an average fragment size of 1.5 kb using the HydroShear. These size-selected fragments are then end repaired and circularized around a long mate-pair adaptor by nicked ligation. Nick translation is then used to displace the nick roughly 70 bp from either side of the internal adaptor. A nuclease reaction linearizes these fragments. SOLiD™ sequencing-specific adaptors are then ligated to the ends of these fragments. We prepared two independent 1.5 kb mate-pair libraries from the patient's constitutional (germline) DNA, and two independent mate-pair libraries from the patient's tumor DNA. Following PCR amplification, these mate-pair libraries are then used as templates in emulsion PCR reactions using SOLiD™ proprietary sequencing beads to generate clonal single molecule templated beads. Subsequently, an average of 500,000 templated beads are enriched and deposited onto SOLiD™ flowcells for massive ligation-based sequencing to generate 50 bp × 50 bp mate-pair sequences per bead. For this germline/tumor pair, we sequenced an average of one billion beads per library, thus generating two billion mate-pair reads for germline and two billion mate-pair reads for tumor.

Next generation sequencing data processing
Raw next generation sequencing data in the form of csfasta and qual files are used to align 50 bp × 50 bp paired end reads from either the patient germline genome sequence or tumor genome sequence to the reference human genome (NCBI build 36, hg18). For alignment, we utilized the Life Technologies BioScope version 1.3 software suite, which is based upon a seed-and-extend algorithm [6]. Compressed binary sequence alignment/map (BAM) formatted output files for germline and tumor genome alignments are generated and PCR duplicates are subsequently removed using the Picard Tools.

Next generation sequencing data analysis

Somatic single nucleotide variants
We employed two different algorithms to detect somatic single nucleotide variants. The first algorithm (SolSNP) [7] detects a SNP variant by comparing two discrete distributions. It compares the distance of the discrete sampled distribution of the base-pair pileup on each strand to the expected distributions (according to ploidy), and determines the genotype call. This is done using a Kolmogorov-Smirnov-like distance measure based on both the base (that is, reference or alternative base) as well as the confidence in the base called (that is, the quality score of each base in the pileup). If the genome is haploid, two expected pileups are created at each position: one consisting of only the reference base (a 'homozygous-reference' pileup) and another consisting of only the alternative base (a 'homozygous non-reference' pileup). The confidence of each pileup position is kept the same. The expected pileup that has the minimal Kolmogorov-Smirnov distance to the sampled pileup is considered to be the genotype of the locus on the strand. In diploid genomes, SolSNP also considers a pileup half of which is made up of the reference bases and the other half made of alternative bases (a heterozygous pileup). A locus on the chromosome is called a SNP if a variant genotype (either 'homozygous non-reference' or heterozygous) is detected on both strands. SolSNP can restrict its calls to loci where the genotype calls on both strands are identical. This is achieved by passing the 'Genotype Consensus' value to the parameter 'STRAND_MODE'. In this mode, the tool is able to produce genotype calls as well as variants. The second algorithm (Mutation Walker) calculates a test of proportions for the tumor/normal set to construct a test-statistic for reads in the forward direction and the reverse direction separately. The minimum of these two comparisons is used as the reported test-statistic, ensuring evidence is found in both the forward and reverse directions.
Sites with evidence in the normal are filtered from the final report so as to reduce false positives arising from under-sampled polymorphic germline events. Calls common to both the algorithms were considered for further examination. To reduce the false negative rate, two sets of common calls were made. One was made with a strict and the other with a lenient set of parameters for both the algorithms. Both the sets were visually examined for false positives, which were then filtered to get a final list of true single nucleotide variants.
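The strand-wise test of proportions described for the second algorithm can be illustrated with a minimal Python sketch. This is not the Mutation Walker implementation: it assumes a pooled two-proportion z-statistic, and the function names and the tuple layout of the read counts are hypothetical.

```python
import math

def two_proportion_z(alt_tumor, depth_tumor, alt_normal, depth_normal):
    """Pooled two-proportion z-statistic for tumor vs normal alternate-allele fraction.

    Assumption: a pooled z-test; the source only says 'a test of proportions'.
    """
    if depth_tumor == 0 or depth_normal == 0:
        return 0.0
    p_tumor = alt_tumor / depth_tumor
    p_normal = alt_normal / depth_normal
    pooled = (alt_tumor + alt_normal) / (depth_tumor + depth_normal)
    se = math.sqrt(pooled * (1 - pooled) * (1 / depth_tumor + 1 / depth_normal))
    if se == 0:
        return 0.0  # no variation on this strand, so no evidence
    return (p_tumor - p_normal) / se

def strand_min_statistic(forward, reverse):
    """Report the weaker of the two strand-specific statistics.

    forward, reverse: (alt_tumor, depth_tumor, alt_normal, depth_normal).
    Taking the minimum means a call needs support on both strands.
    """
    return min(two_proportion_z(*forward), two_proportion_z(*reverse))
```

With hypothetical counts, a variant seen on only one strand scores no higher than its unsupported strand, which is the behavior the minimum is meant to enforce.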

Indel detection
For detecting somatic indels we employed a two-step strategy. In the first step, we removed reads from the tumor sample BAM whose insert size lay outside the interval (500,5000) for SOLiD™. Genome Analysis Toolkit [8] was then used to generate a list of potential small indels from this BAM. A customized perl script, which used the Bio-SamTools library from BioPerl [9], then took these indel positions and for each of the indels looked at the region in the germline sample consisting of five bases upstream of the start and five bases downstream of the end of the indel. An indel was determined to be somatic only if there was no indel detected in the region under consideration.
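The two-step indel strategy above can be sketched in Python as follows. This is an illustrative sketch only, not the authors' perl script: the tuple representation of indels, the function names, and the treatment of (500,5000) as an open interval are assumptions.

```python
def keep_read(insert_size, low=500, high=5000):
    """Step 1: keep a tumor read only if its insert size lies inside (low, high).

    Assumption: the interval is open at both ends.
    """
    return low < abs(insert_size) < high

def is_somatic(indel, germline_indels, flank=5):
    """Step 2: call a tumor indel somatic if no germline indel overlaps the
    region from `flank` bases upstream of its start to `flank` bases
    downstream of its end.

    indel and each germline entry: (chromosome, start, end), hypothetical layout.
    """
    chrom, start, end = indel
    region_start, region_end = start - flank, end + flank
    for g_chrom, g_start, g_end in germline_indels:
        if g_chrom == chrom and g_start <= region_end and g_end >= region_start:
            return False
    return True
```

For example, a tumor indel at positions 100 to 102 would be rejected if the germline sample carried an indel at position 105 on the same chromosome, because that position falls within the five-base flank.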

Structural variants
Structural variants were analyzed by comparing two sources of information: relative normal/tumor read-level coverage and anomalously mapping read pairs. Assessing structural variants by read-level coverage is termed copy-number analysis since it is parallel in concept to microarrays. In copy number analysis, gains and losses were determined by calculating the log2 difference in normalized coverage between tumor and germline. Briefly, we investigated regions in 100 bp windows where the coverage in the germline was between 0.1 and 10 of the mode coverage in order to remove regions with high degrees of repeat sequence (for example, centromeres or difficult-to-sequence regions). Normalized coverage was determined by the log2 coverage within a 100 bp bin over the overall modal coverage. We then reported the difference between the germline and tumor normalized coverage by a sliding window of size 2 kb. Deleted and amplified regions were flagged by a departure of greater than 0.75 from baseline. Moderate deletions were identified by a similar method utilizing sequence coverage rather than clonal coverage for consensus coding sequence exons only. In anomalous read-pair analysis, we used perl scripts to detect enrichment of anomalously mapping read-pairs. These would be read-pairs that deviate from the expected mate-pair orientation of both reads occurring in the same direction or read-pairs that are outside the expected 1.5 kb insert size. A series of customized perl scripts were employed in the detection of translocations. These scripts used SAMtools [10] internally to access the BAM files. The analysis itself was made up of two steps. The first was the detection of a potential translocation in both tumor and germline samples. The second was comparison of a potential translocation in tumor to those detected in the germline sample to weed out potential false positives for statistical identification of outliers.
The genome was analyzed by a walker with step size equivalent to the insert size where the number of anomalous reads was counted, that is, those reads whose mates align on a different chromosome. For each window we chose the highest hit to be the chromosome to which mates of most of the discordant reads mapped. We compared the ratios of discordant reads to the total aligned reads across all the windows to detect potential outliers. Outlier detection was done under the assumption that the proportion of hit discordant reads in 2 kb windows aggregated across the chromosome follows a normal distribution. We then computed the mean and standard deviation of this distribution and chose a cutoff of 3 standard deviations above the mean. A window with a proportion of hit discordant reads higher than this cutoff contained the region of potential translocation. The actual region of translocation is then determined by the span of the hit discordant reads in the window. For somatic translocations, the germline and the tumor sample are called separately and regions of overlap are eliminated. The output is a general feature format (gff) file of paired lines where the source tag indicates which two genomic regions show potential translocations. These regions were further inspected to reduce false positives and arrive at the more confident list. Additional details related to the methods for detection of somatic translocations and intrachromosomal rearrangements are included in Additional file 1.
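The copy-number calculation described in this section can be sketched in Python as below. It is a simplified illustration, not the pipeline used in the study. The bin size, coverage filter, window size, and threshold come from the text; the pseudocount for bins with zero tumor coverage, the sign convention (tumor minus germline, so losses are negative), and all names are assumptions.

```python
import math

WINDOW_BINS = 20    # 2 kb sliding window = 20 bins of 100 bp
THRESHOLD = 0.75    # departure from baseline that flags a gain or loss
PSEUDOCOUNT = 0.5   # assumption: avoids log2(0) in fully deleted tumor bins

def log2_differences(tumor, germline, tumor_mode, germline_mode):
    """Per-bin log2 normalized coverage, tumor minus germline.

    Bins whose germline coverage is outside 0.1x to 10x of the modal
    coverage are excluded (None), as in the text.
    """
    diffs = []
    for t, g in zip(tumor, germline):
        if not (0.1 * germline_mode <= g <= 10 * germline_mode):
            diffs.append(None)
            continue
        t_norm = math.log2((t + PSEUDOCOUNT) / tumor_mode)
        g_norm = math.log2(g / germline_mode)
        diffs.append(t_norm - g_norm)
    return diffs

def call_windows(diffs, width=WINDOW_BINS, threshold=THRESHOLD):
    """Average the per-bin differences in a sliding window and label each window."""
    calls = []
    for i in range(len(diffs) - width + 1):
        usable = [d for d in diffs[i:i + width] if d is not None]
        if not usable:
            calls.append("excluded")
            continue
        avg = sum(usable) / len(usable)
        if avg < -threshold:
            calls.append("loss")
        elif avg > threshold:
            calls.append("gain")
        else:
            calls.append("neutral")
    return calls
```

As a usage example with made-up coverage, 40 germline bins at the modal depth paired with tumor bins that drop to zero in the second half yield a "neutral" call for the first window and a "loss" call for the last.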
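The outlier step for candidate translocation windows can be illustrated with the following Python sketch. It assumes, as the text does, that per-window proportions are approximately normal; the use of the population standard deviation and the function name are assumptions, and the counts in the usage note are invented.

```python
from statistics import mean, pstdev

def outlier_windows(hit_discordant, total_aligned, n_sd=3.0):
    """Return indices of windows whose discordant-read proportion exceeds
    the mean by more than n_sd standard deviations.

    hit_discordant[i]: discordant reads in window i whose mates map to the
    window's most frequent partner chromosome.
    total_aligned[i]: all aligned reads in window i.
    """
    proportions = [
        hits / total if total else 0.0
        for hits, total in zip(hit_discordant, total_aligned)
    ]
    cutoff = mean(proportions) + n_sd * pstdev(proportions)
    return [i for i, p in enumerate(proportions) if p > cutoff]
```

For instance, among 100 windows of 100 reads each, 99 windows with a single discordant read and one window with 50 would return only the index of that one window. Running the same function on germline and tumor separately and discarding overlapping regions would mirror the somatic filtering described above.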

Validation of next generation sequencing findings
Briefly, ten single nucleotide variants and one local deletion were selected at random for chain termination sequencing (Sanger method). Validation was conducted using tumor DNA. Specific genomic primer pairs (Additional file 2) were designed to anneal in flanking single nucleotide variant regions and approximately 150 to 500 bp fragments to be amplified in 25-cycle PCR. Some primers carried M13 sequences on the 5' end as a back up for sequencing runs. Reaction products were column purified using a QIAquick PCR Purification kit (Qiagen) and submitted to the Arizona State University sequencing facility. Electropherograms were then manually examined for the presence of mutations/deletions in both orientations (Additional file 3).
Genomic quantitative PCR was performed to validate homozygous PTEN (phosphatase and tensin homolog) deletion (Additional file 4). In addition to the PTEN locus, genes located in adjacent regions of hemizygous deletion (RGR (retinal G protein coupled receptor) and HHEX (hematopoietically expressed homeobox)) were also measured. BICC1 (bicaudal C homolog 1 (Drosophila)) and TRUB1 (TruB pseudouridine (psi) synthase homolog 1), located in unaffected regions of chromosome 10, were used as internal controls. Quantitative PCR reactions were set up in a 384-well plate in triplicate with 3 ng of genomic DNA input per reaction. Amplifications were performed using a LightCycler480 instrument and SYBR-Green I Master Mix (Roche). Melting curves were examined for the presence of a single peak and CT values were used in calculating fold-change according to the ΔΔCT method [11]. All tumor and normal CT values were first normalized to glyceraldehyde 3-phosphate dehydrogenase (GAPDH). The quantity of genomic material present for each gene in the tumor sample was then normalized to its normal counterpart.
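The two normalization steps described above (first to GAPDH, then to the normal sample) correspond to a comparative CT calculation, which can be sketched in Python as follows. This sketch assumes the standard 2^(-ΔΔCT) form with perfect amplification efficiency and averages triplicates by their mean; the function name and all CT values in the usage note are hypothetical.

```python
from statistics import mean

def fold_change(ct_gene_tumor, ct_ref_tumor, ct_gene_normal, ct_ref_normal):
    """Relative quantity of a gene in tumor versus normal DNA.

    Each argument is a list of replicate CT values. The reference gene
    (GAPDH in the text) corrects for input amount within each sample.
    Assumption: amplification efficiency of 2 per cycle.
    """
    delta_tumor = mean(ct_gene_tumor) - mean(ct_ref_tumor)     # delta CT, tumor
    delta_normal = mean(ct_gene_normal) - mean(ct_ref_normal)  # delta CT, normal
    delta_delta = delta_tumor - delta_normal                   # delta-delta CT
    return 2.0 ** (-delta_delta)
```

With invented triplicates in which the target amplifies two cycles later in tumor than in normal after reference correction, the function returns 0.25, that is, one quarter of the normal quantity. A value near 0.5 would be consistent with hemizygous loss and a value near zero with homozygous deletion, before allowing for the normal cells present in the tumor sample.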

Results
The patient is a 63-year-old Caucasian man diagnosed with adenocarcinoma of the ampulla of Vater. The patient had a Whipple procedure to resect the head of the pancreas, distal stomach, duodenum, distal common bile duct, and gallbladder. The maximum dimension of the tumor, which was present at the junction of the ampullary and duodenal mucosa, was 1.5 cm. The tumor invaded into the duodenal muscle wall but no lymphatic or vascular invasion was noted. There was no evidence of neoplasm at the lines of resection and there was no evidence of metastatic carcinoma in the 16 peripancreatic lymph nodes examined microscopically (pathologic TNM (Tumor, Node, Metastasis) stage T2, N0, M0). The patient's past history is significant for having smoked one to two packs per day for 15 years, stopping approximately 16 years before the diagnosis of his adenocarcinoma of the ampulla of Vater.
Massively parallel whole-genome sequencing was performed on genomic DNA from germline and tumor samples using the Life Technologies SOLiD™ version 4.0 mate-pair chemistry. Basic sequence run statistics based on our analysis pipeline are provided in Table 1. A total of 2.38 and 2.21 billion uniquely mappable reads were generated from germline and tumor DNA, which equates to 108 Gb and 100 Gb of uniquely mappable sequence for germline and tumor, respectively. Therefore, we achieved 37× and 40× genome coverage for tumor and germline, respectively. We detected a total of 2,771,201 SNPs from the germline genome, 91% of which are present in dbSNP (release 129). The transition to transversion ratio was 2.12, which is in line with what would be expected in a diploid human genome [12]. The full genome has been deposited in the database of Genotypes and Phenotypes (dbGaP) of the National Center for Biotechnology Information (submission ID SRA 053213).
To discover somatic mutations within ampullary cancer, we used a custom paired analysis pipeline. The overview of somatic alterations within this tumor is provided in the form of a Circos plot (Figure 1). Our paired analysis revealed 19,143 genome-wide somatic point mutations, of which 30 map within known annotated coding sequences. A list of all somatic missense (n = 28) and nonsense mutations (n = 2) is provided in Table 2. The most notable mutation is an activating KRAS (Kirsten rat sarcoma viral oncogene homolog) mutation at codon 12 (G12V), which is one of the most commonly reported mutations in ampullary carcinomas [13,14]. Furthermore, we discovered three somatic small insertions and deletions within coding regions, which result in frameshift mutations (Table 2). All missense mutations were assessed for likely functional consequences using the SIFT prediction algorithm [15,16], which characterized mutations as tolerated or damaging. Of the 28 missense mutations that were assessed, 19 (68%) were predicted to be damaging. Previously, we calculated the rate of SIFT damaging calls from a random set of approximately 10,000 missense variants from the 1000 Genomes data, which showed a rate of damaging mutations of 15%. Validation by Sanger sequence analysis is presented in Additional file 3.
To identify regions of somatic copy number loss, we utilized a basic algorithm that determined log2 ratios in  To identify potential cis chromosomal rearrangements and translocation events, we searched for significant evidence of discordant mate pairs. The long insert mate pairs provide improved power for detecting structural alterations through improved clonal coverage. Clonal coverage can be defined as the genomic coverage (for example, 30×) multiplied by the length of the insert (1,500 bp), divided by the amount of sequence derived from each mate pair (100 bp). For example, at 37× genomic coverage for our tumor specimen and with 1,500 bp average mate-pair insert size, and with 2 × 50 bp mate-pairs (or 100 bp total), we achieve a clonal coverage of 432×. With such high clonal coverage we have significant power to detect evidence of discordant mate-pair reads, where the length of the insert deviates substantially from the mean insert length and/or the reads map to different chromosomes or chromosomal regions. Utilizing an algorithm that identified discordant mate-pairs specific to the tumor, we discovered two independent translocation events occurring in the tumor. Both events involve genes on each side of the translocation event. One event is evidenced by significant

Discussion
Adenocarcinomas of the ampulla of Vater are relatively rare, accounting for only 0.2% of gastrointestinal cancers [17]. Perhaps due to their location and propensity to present with jaundice at an early resectable stage, these tumors are more likely to be resectable at the time of diagnosis than are pancreatic cancers [18]. Furthermore, in comparison to pancreatic cancer, resected ampullary cancers are associated with better 5-year survival rates of 34 to 61% [19-21]. Surgical series have demonstrated that the factors affecting survival include completeness of surgical resection and nodal status. Surgical treatments for ampullary cancer and for cancers in the head of the pancreas are similar in that surgeons perform a pancreaticoduodenectomy. Thereafter, the treatments may diverge. There is no clear consensus on the role of or the optimal regimen for adjuvant chemotherapy in ampullary cancers. Similarly, in part due to its relative rarity, there is no clear standard chemotherapeutic regimen for recurrent or metastatic ampullary cancer. A better understanding of molecular oncogenesis and the emergence of targeted agents will likely lead to improved treatment outcomes in this and other cancers. Our study used whole genome sequencing to analyze the genome of a resected ampullary carcinoma. We found expected as well as novel aberrations. We found an activating mutation in KRAS codon 12. KRAS mutations are common in ampullary cancer although the 25 to 37% incidence appears to be lower than the approximately 95% rate of KRAS mutation seen in pancreatic adenocarcinomas [13,14,22,23]. Furthermore, similar to what is seen in colonic adenomas, KRAS mutations occur in benign ampullary adenomas, suggesting activating mutations of KRAS are relatively early events in the progression toward cancer and the mutation does not appear to affect prognosis [14].
This tumor also demonstrated a somatic nonsynonymous mutation in SMAD4 (mothers against decapentaplegic homolog 4), which has been observed previously in 50% of ampullary cancers but infrequently in bile duct cancers [24].
The most notable gene deletion we found was a focal deletion of a region in chromosome 10 including the PTEN tumor suppressor gene (phosphatase and tensin homologue deleted on chromosome 10). Cowden's syndrome is characterized by a germline mutation in the PTEN gene resulting in loss of function. This syndrome is characterized by noncancerous hamartomas of the skin and mucous membranes and affected patients have an increased risk of tumors of the breast, thyroid, uterus and gastrointestinal tract. Benign tumors of the ampulla of Vater have been reported in patients with Cowden's syndrome but are not a common feature within cancers of the ampulla. Loss of PTEN expression by immunohistochemistry has been associated with liver metastases and poor prognosis in colon cancer [25]. In a large-scale survey of the genomic aberrations of pancreatic cancers, PTEN deletions were not seen, although small deleterious coding mutations were detected [26]. These findings suggest that, despite their anatomic location in proximity to the pancreas, ampullary cancers are distinct entities from adenocarcinoma of the pancreas and bile duct cancers and thus should be treated as a different entity.
To that end, the loss of PTEN expression is important not only in the pathogenesis but also because it exposes a potential therapeutic target (Figure 3). The PTEN protein product is an inhibitor of phosphoinositide 3-kinase (PI3K) and downstream signaling through AKT. Phosphorylation of AKT results in phosphorylation of several target proteins involved in regulation of key cellular functions, including cell proliferation, glucose metabolism, protein translation, and cell survival [27]. Additionally, activation of the PI3K pathway has been linked to activation of mammalian target of rapamycin (mTOR), although the mechanism is not yet fully elucidated [28]. The presence of a deletion in PTEN in this ampullary cancer would be predicted to release the PI3K/mTOR pathway from inhibition. Consequently, one can infer that an agent that is a dual PI3K/mTOR inhibitor, such as NVP-BEZ235, would be an attractive therapeutic option for our patient should his disease recur [29]. NVP-BEZ235 and other agents like it have been shown in vitro to inhibit growth of cancer cells with activating mutations of PI3K and are all under clinical development [30]. In the case presented here, however, the tumor carries both a KRAS activating mutation and complete inactivation of PTEN, supporting dual activation of both the MEK/ERK and the PI3K/AKT axes (Figure 3). The inhibition of only one axis may not be sufficient for effective treatment as there is likely to be compensatory activity from the other activated axis.
Our group reported the beneficial results seen in a clinical trial on patients with refractory solid tumors whose chemotherapy was chosen based on analysis of tumor biopsies using immunohistochemistry and expression arrays [31]. New technologies such as those applied herein have made high-throughput whole-genome sequencing a more rapid and cost-effective process in a manner not possible with older technologies such as Sanger sequencing. The prospect is raised, therefore, that one may soon be able to apply whole-genome sequencing to the analysis of an individual patient's tumor to guide an informed choice of a therapeutic regimen. This type of personalized or precision medicine has only begun to be studied. Several limitations remain before this whole-genome sequencing methodology can be widely applied, including cost, the need for improved and standardized bioinformatic analysis, and the need for reliable and rapid methods for validation of genomic findings. Furthermore, if a target is found, one must have access to an agent and, in many cases, such agents may not be approved for clinical use. Thus, we must begin to understand the links between genomic profile and drug context in early drug development. This need is amplified even more where there is evidence to support combination therapies.

Conclusions
We have analyzed the whole genome sequence of a cancer of the ampulla of Vater to uncover the compendium of somatic events occurring in this tumor. Among the mutations discovered were those that might be considered potential therapeutic vulnerabilities. As whole-genome sequencing becomes more rapid and less expensive, the potential for targeted and truly personalized treatments increases. Consequently, as we continue to refine our abilities to uncover the full landscape of somatic alterations, we must in parallel continue innovative drug development methods, including preclinical and early phase I combination trials. This will allow us to understand toxicities and appropriate dosing regimens, to obtain the safest and most appropriate combinations matched to specific genomic and molecular contexts.