Tumor samples and DNA extraction
Archived FFPE pathology blocks of Merkel cell carcinoma (MCC) samples (n = 2) were obtained as previously described [6]. MCC cells from these previously analyzed samples were newly micro-dissected using the Roche Automated Tissue Dissection System (Roche) from two or three 5-μm hematoxylin and eosin (H&E) stained sections, followed by shearing by sonication on the Covaris LE220 system. DNA was extracted using a MagAttract® HMW DNA Mini Kit (Qiagen).
FFPE breast tumor samples (n = 4) were obtained for this study from the LifePool cohort (www.lifepool.org). LifePool prospectively recruits Australian female participants through the population-based mammographic screening program. Participants consent to the use of their diagnostic tissue blocks for research. Ten-micron sections were H&E stained, and DNA was extracted from manually needle micro-dissected cells of both the FFPE breast tumor samples and two FFPE pre-cancerous breast lesions (papilloma) using the Qiagen DNeasy FFPE Kit (Qiagen) as previously described [11]. DNA quality was assessed by a multiplex PCR assay [12] modified to include additional primer sets that produce fragments of up to 700 bp from non-overlapping target sites in the GAPDH gene.
This study was approved by the Human Research Ethics Committee at the Peter MacCallum Cancer Centre. This study was carried out in accordance with all relevant regulations and guidelines.
Whole genome amplification
Extracted DNA from FFPE MCC samples was amplified using the GenomePlex® Complete Whole Genome Amplification (WGA) kit (Sigma-Aldrich), following the manufacturer’s instructions with several minor modifications. In brief, 50 ng of DNA was prepared in a total volume of 10 μL for fragmentation, followed by library preparation and 14 cycles of amplification as described in the protocol. The final product was purified using the QIAquick® PCR purification kit (Qiagen) and then quantified to determine the final concentration; the yield was 2–4 μg. The average fragment size of the WGA products was 200–300 bp. The standard human genomic DNA provided with the GenomePlex WGA kit (Sigma) was used as a positive control, and a no-template reaction was used as a negative control.
NEBNext FFPE repair
The NEBNext FFPE Repair kit (NEB M6630, New England Biolabs® Inc.) was used to repair 150 ng of total DNA according to the manufacturer’s protocol, with the minor change of eluting the DNA in 30 μL instead of 40 μL. A total of 10 μL of eluted DNA (50 ng of repaired DNA) was used for WGA with the Sigma WGA kit as described above. The remaining 100 ng of repaired DNA was used directly for library preparation with the KAPA Hyper Prep Kit.
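The split of the repaired DNA between WGA and direct library preparation follows from simple arithmetic, sketched below. The function name `split_eluate` is hypothetical, and the sketch assumes complete recovery of the 150 ng input in the 30 μL eluate, which the stated 50 ng and 100 ng amounts imply but the text does not say outright.

```python
def split_eluate(total_ng, elution_ul, aliquot_ul):
    """Return (concentration in ng/uL, ng in the aliquot, ng remaining).

    Assumes all input DNA is recovered in the eluate (illustrative only).
    """
    concentration = total_ng / elution_ul
    aliquot_ng = concentration * aliquot_ul
    remaining_ng = total_ng - aliquot_ng
    return concentration, aliquot_ng, remaining_ng


# 150 ng repaired DNA eluted in 30 uL; 10 uL taken for WGA
concentration, wga_ng, library_ng = split_eluate(150, 30, 10)
```

Under this assumption the 10 μL aliquot carries the 50 ng used for WGA and the remaining 20 μL carries the 100 ng used for the KAPA Hyper library.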
KAPA Hyper library preparation
Library preparation was performed as described in the KAPA Hyper Prep Kit protocol for Illumina® platforms (KR0961-v1.14, KAPA Biosystems), with slight modifications. Briefly, 100 ng of unamplified (non-WGA) DNA (both NEBNext-repaired and untreated DNA) was sheared by sonication (Covaris S2 system) for 3 × 60 s with the following parameters: duty cycle of 10, intensity of 5, and 200 cycles/burst.
Subsequently, libraries of both the fragmented unamplified DNA (200–400 bp) and the WGA products were created by end repair and A-tailing, then adaptor ligation using a 15 μM adaptor stock, followed by six PCR cycles of library amplification. Libraries were eluted in 30 μL after post-amplification clean-up. The library size distribution was analyzed on a TapeStation 2200 (Agilent Technologies) and libraries were quantified by Qubit (Life Technologies).
NEBNext® Ultra™ II DNA Library Prep
Libraries were prepared from MCC samples (n = 2) with 5 ng and 20 ng DNA input, and from breast tumor samples (n = 4) and pre-cancerous breast lesions (papilloma) (n = 2) with 5 ng DNA input, as described in the NEBNext® Ultra™ II DNA Library Prep Kit protocol (NEB E7645S/L, New England Biolabs® Inc.) with several minor modifications. In brief, DNA fragmented on the Covaris S2 in 50 μL was used for NEBNext End Prep, followed immediately by adaptor ligation with adaptor diluted to 1.5 μM. Adaptor-ligated DNA was cleaned up without size selection and then PCR amplified for eight cycles (20 ng input) or ten cycles (5 ng input). After resuspended AMPure XP Beads were added to the PCR products, the mixture was incubated at room temperature for at least 20 min instead of 5 min. After the ethanol washes, 33 μL of elution buffer (0.1 × TE) was added to the beads and incubated for 10 min instead of 2 min. A total of 2 μL of the final 30 μL library was analyzed on the TapeStation to determine the size distribution.
Low coverage whole genome sequencing
The libraries prepared with both the KAPA Hyper and NEBNext kits were used for low-coverage whole genome sequencing (LC WGS). The pooled, normalized, indexed libraries were run on an Illumina NextSeq 500 platform (paired-end 75 bp, mid-output flow cell) according to the standard protocol. Libraries were pooled at a final concentration of 2 nM and diluted to 1.8 pM as per the standard Illumina protocol. Sequencing yielded a genome coverage of 1.6–1.8× per sample.
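The relationship between read yield and the reported coverage can be sketched as below. This is a rough estimate only: the function names, the genome size of 3.1 Gb, and the example read-pair count are assumptions for illustration, and the calculation ignores unmapped and duplicate reads.

```python
def estimated_coverage(read_pairs, read_length=75, genome_size=3.1e9):
    """Mean depth = total sequenced bases / genome size.

    Each pair contributes two reads of `read_length` bases.
    genome_size is an assumed approximate human genome length.
    """
    return read_pairs * 2 * read_length / genome_size


def dilution_factor(stock_nM=2.0, target_pM=1.8):
    """Fold dilution needed to go from the pooled stock to the loading concentration."""
    return stock_nM * 1000.0 / target_pM


# Hypothetical yield of 35 million read pairs for one sample
depth = estimated_coverage(35_000_000)
fold = dilution_factor()
```

With these assumed inputs, roughly 33 to 37 million read pairs per sample would correspond to the 1.6–1.8× range reported above.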
Molecular inversion probe SNP arrays
The Affymetrix Molecular Inversion Probe (MIP) 330K OncoScan array was used to analyze four breast cancer samples (version 3) and two papilloma samples (version 2). Arrays were run according to the manufacturer’s instructions by the Ramaciotti Centre for Genomics (version 3; NSW, Australia) or Affymetrix Inc. (version 2; Santa Clara, CA, USA). DNA input was 40–100 ng for this assay.
Data analysis
Reads were aligned with bwa mem (v0.7.12-r1039) [13] to hg19 (GRCh37) after removal of sequencing primers by cutadapt (v1.7.1) [14]. Control-FREEC (version 6.7) [15] was used to estimate copy number from the low-coverage WGS data in 50 kb windows across hg19, with default parameters, no matched normal sample, and baseline ploidy set to 2. Down-sampling of BAM files was performed with samtools [16].
MIP data were pre-processed by the Ramaciotti Centre for Genomics or Affymetrix Inc., with tumor samples batch normalized against Affymetrix controls [11].
All sample data were imported into Nexus (BioDiscovery Inc., Hawthorne, CA, USA) and segmented using SNP-FASST, a circular binary segmentation algorithm. Copy number gains were called if the log2 ratio of the segment was >0.15 and losses called if < –0.15. To reduce spurious calls, the genome was masked using a list of published problematic regions, including highly repetitive centromeric regions, where DNA copy number cannot be accurately measured [8].
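The calling rule applied to each segment can be illustrated with a minimal sketch. It covers only the thresholding step, not the SNP-FASST segmentation or the masking performed in Nexus; the function and constant names are hypothetical.

```python
GAIN_THRESHOLD = 0.15   # log2 ratio above which a segment is called a gain
LOSS_THRESHOLD = -0.15  # log2 ratio below which a segment is called a loss


def call_segment(log2_ratio):
    """Classify one segment by its log2 ratio using strict inequalities."""
    if log2_ratio > GAIN_THRESHOLD:
        return "gain"
    if log2_ratio < LOSS_THRESHOLD:
        return "loss"
    return "neutral"


# Illustrative segment values, not data from this study
calls = [call_segment(value) for value in (0.42, -0.30, 0.05, 0.15)]
```

Because the inequalities are strict, a segment with a log2 ratio of exactly 0.15 or –0.15 is treated as copy-number neutral in this sketch.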
Total CN profile overlap analysis was performed using Partek Genome Suite (Partek Inc., St. Louis, MO, USA). CNA segments for each matched pair were imported and the “finding regions in multiple samples” tool run, matching for event type (amplification/deletion). This tool reports each CNA region shared at base-pair resolution as well as each CNA region unique to a sample. Shared CN neutral regions were calculated by subtracting the length of all shared CNAs as well as sample only CNA events from the total base pairs covered.
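The base-pair accounting described above can be sketched as follows. This is not the Partek implementation; it is a simplified illustration for a single region, assuming half-open `(start, end, event)` segments that do not overlap within a sample. Positions where the two samples carry different event types are counted as sample-only, which is an assumption of this sketch.

```python
def state_at(segments, pos):
    """Return the event type covering `pos`, or None if copy-number neutral."""
    for start, end, event in segments:
        if start <= pos < end:
            return event
    return None


def overlap_summary(sample_a, sample_b, total_bp):
    """Count shared CNA, sample-only CNA and shared neutral base pairs."""
    # Breakpoints from both samples split the region into elementary intervals.
    points = sorted({0, total_bp} |
                    {p for s, e, _ in sample_a + sample_b for p in (s, e)})
    shared = sample_only = 0
    for left, right in zip(points, points[1:]):
        a, b = state_at(sample_a, left), state_at(sample_b, left)
        if a is not None and a == b:
            shared += right - left          # same event type in both samples
        elif a is not None or b is not None:
            sample_only += right - left     # event in one sample, or discordant
    return {
        "shared_cna": shared,
        "sample_only": sample_only,
        "shared_neutral": total_bp - shared - sample_only,
    }


# Toy segments on a 1,000 bp region (illustrative values only)
a = [(100, 300, "gain"), (500, 600, "loss")]
b = [(200, 400, "gain"), (500, 550, "gain")]
summary = overlap_summary(a, b, 1000)
```

The shared neutral length is obtained by subtraction, mirroring the calculation described in the text.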
The Median Absolute Pair-wise Difference (MAPD) score was calculated as follows: if $x_i$ is the log2 ratio for marker $i$, then $\mathrm{MAPD} = \mathrm{median}(|x_{i+1} - x_i|)$, with markers $i$ ordered by genomic position. This metric provides a measure of the noise of the sample that is less dependent on true biological copy number variation than, for example, standard deviation.
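A minimal sketch of the MAPD calculation, assuming the log2 ratios are already sorted by genomic position and that at least two markers are present; the function name `mapd` is illustrative.

```python
from statistics import median


def mapd(log2_ratios):
    """Median of absolute differences between adjacent markers.

    `log2_ratios` must be ordered by genomic position.
    """
    diffs = [abs(nxt - cur) for cur, nxt in zip(log2_ratios, log2_ratios[1:])]
    return median(diffs)


# Toy profile with a single copy number breakpoint between markers 4 and 5
score = mapd([0.0, 0.1, 0.0, 0.1, 1.0, 1.1])
```

In the toy profile, the single large jump at the breakpoint does not move the median, which illustrates why MAPD is less sensitive to true copy number changes than the standard deviation.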
FREEC normalized read counts in 50 kb bins were extracted from regions called as a gain or loss by FREEC in at least one of the 5 ng, 20 ng, and 100 ng DNA inputs or the WGA libraries for MCT-4 and MCT-6 LC WGS data. Gains or losses in regions lacking MIP array probes and regions in the blacklist of Scheinin et al. [8] were filtered out. The Pearson correlation of bin counts in these CNA regions was calculated and used to cluster by Euclidean distance using the hclust() function of R 3.2.1. Correlation between samples was visualized using the pheatmap package.
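The correlation step and the distance matrix that feeds the clustering can be sketched in plain Python as below. The original analysis used R (hclust() and pheatmap); this sketch is not that code, it stops before the hierarchical clustering itself, and the sample labels and bin counts are made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(samples):
    """Pairwise Pearson correlations between per-sample bin count vectors."""
    names = list(samples)
    matrix = [[pearson(samples[a], samples[b]) for b in names] for a in names]
    return names, matrix


def euclidean_distances(matrix):
    """Euclidean distance between rows (correlation profiles) of a matrix."""
    return [[sqrt(sum((p - q) ** 2 for p, q in zip(r1, r2))) for r2 in matrix]
            for r1 in matrix]


# Hypothetical normalized bin counts for three libraries
bins = {"lib_a": [1, 2, 3, 4], "lib_b": [2, 4, 6, 8], "lib_c": [4, 3, 2, 1]}
names, corr = correlation_matrix(bins)
dist = euclidean_distances(corr)
```

The resulting distance matrix is the kind of input a hierarchical clustering routine would then take; libraries with similar correlation profiles end up close together.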