Patient eligibility criteria
This study was conducted among men with prostate cancer (clinical data in Additional file 1, Table S1) who met the following criteria: histologically proven, biopsy-based diagnosis of metastatic prostate cancer. We distinguished between CRPC and CSPC based on the prostate cancer guidelines of the European Association of Urology, which define CRPC by: (1) castrate serum levels of testosterone (testosterone <50 ng/dL or <1.7 nmol/L); (2) three consecutive rises of PSA, 1 week apart, resulting in two 50% increases over the nadir, with a PSA >2 ng/mL; (3) anti-androgen withdrawal for at least 4 weeks for flutamide and for at least 6 weeks for bicalutamide; and (4) PSA progression despite consecutive hormonal manipulations. Furthermore, we focused on patients who had ≥5 CTCs per 7.5 mL and/or a biphasic plasma DNA size distribution, as we described previously.
The study was approved by the ethics committee of the Medical University of Graz (approval numbers 21-228 ex 09/10, prostate cancer, and 23-250 ex 10/11, prenatal plasma DNA analyses) and conducted according to the Declaration of Helsinki. Written informed consent was obtained from all patients and healthy blood donors. Blood from prostate cancer patients and from male controls without malignant disease was obtained from the Department of Urology or the Division of Clinical Oncology, Department of Internal Medicine, at the Medical University of Graz. In addition, a buccal swab was obtained from the prostate cancer patients. Blood samples from pregnant females and from female controls without malignant disease were collected at the Department of Obstetrics and Gynecology, Medical University of Graz. The blood samples from the pregnant females were taken prior to an invasive prenatal diagnostic procedure.
Plasma DNA preparation
Plasma DNA was prepared using the QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany) as previously described. Samples selected for sequencing library construction were analyzed on the Bioanalyzer instrument (Agilent Technologies, Santa Clara, CA, USA) to assess the plasma DNA size distribution. In this study we included samples with a biphasic plasma DNA size distribution, as previously described.
Enumeration of CTCs
We performed CTC enumeration using the automated, FDA-approved CellSearch assay. Blood samples (7.5 mL each) were collected into CellSave tubes (Veridex, Raritan, NJ, USA). The Epithelial Cell Kit (Veridex) was used for CTC enrichment and enumeration with the CellSearch system as described previously [42, 43].
Array-CGH was carried out using a genome-wide oligonucleotide microarray platform (Human genome CGH 60K microarray kit, Agilent Technologies, Santa Clara, CA, USA), following the manufacturer's instructions (protocol version 6.0) as described. Data were evaluated with our previously published algorithm [33, 44, 45].
HT29 dilution series
The sensitivity of our plasma-Seq approach was determined using serial dilutions of DNA from the HT29 cell line (50%, 20%, 15%, 10%, 5%, 1%, and 0%) in a background of normal DNA (Human Genomic DNA: Female; Promega, Fitchburg, WI, USA). Because absorbance- or fluorescence-based quantification is often unreliable, we used quantitative PCR with the Type-it CNV SYBR Green PCR Kit (Qiagen, Hilden, Germany) to determine the amount of amplifiable DNA and normalized the samples to a standard concentration. The dilution samples were then fragmented to a peak fragment size of 150-250 bp using the Covaris S220 System (Covaris, Woburn, MA, USA), and 10 ng of each dilution was used for library preparation to simulate plasma DNA conditions.
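The mixing arithmetic behind such a dilution series can be sketched as follows. This is a minimal illustration, assuming both DNA stocks have already been quantified by qPCR as amplifiable-DNA concentrations; the function name, the 10 ng total and the stock concentrations are hypothetical.

```python
def mix_volumes(tumor_fraction, total_ng, tumor_conc, normal_conc):
    """Volumes (uL) of tumor and normal DNA stocks for one dilution point.

    tumor_fraction: desired share of HT29 DNA (0.0-1.0)
    total_ng:       total amount of DNA in the mixture (ng)
    *_conc:         amplifiable-DNA concentration of each stock (ng/uL), e.g. from qPCR
    """
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must lie between 0 and 1")
    tumor_ng = tumor_fraction * total_ng
    normal_ng = total_ng - tumor_ng
    return tumor_ng / tumor_conc, normal_ng / normal_conc


if __name__ == "__main__":
    # Hypothetical stock concentrations; 10 ng total per dilution point.
    for pct in (50, 20, 15, 10, 5, 1, 0):
        tumor_ul, normal_ul = mix_volumes(pct / 100, 10.0, tumor_conc=2.0, normal_conc=2.5)
        print(f"{pct:>2}% HT29: {tumor_ul:.2f} uL HT29 + {normal_ul:.2f} uL normal DNA")
```

Normalizing by amplifiable rather than total DNA is what makes the nominal percentages meaningful: a stock with many unamplifiable fragments would otherwise be under-represented in the library.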
Shotgun libraries were prepared using the TruSeq DNA LT Sample Preparation Kit (Illumina, San Diego, CA, USA) following the manufacturer's instructions, with three exceptions. First, because of the limited amounts of plasma DNA, we used 5-10 ng of input DNA. Second, we omitted the fragmentation step, because the size distribution of the plasma DNA samples, analyzed on a Bioanalyzer High Sensitivity Chip (Agilent Technologies, Santa Clara, CA, USA), showed an enrichment of fragments in the range of 160 to 340 bp in all samples. Third, we used 20-25 PCR cycles for selective amplification of library fragments carrying adapter molecules on both ends. Four libraries were pooled in equimolar amounts and sequenced on an Illumina MiSeq (Illumina, San Diego, CA, USA).
The MiSeq instrument was prepared following routine procedures. The run was initiated for 1x150 bases plus 1x25 bases of SBS sequencing, including on-board clustering and paired-end preparation, sequencing of the respective barcode indices, and analysis. Upon completion of the run, data were base-called and demultiplexed on the instrument and provided as FASTQ files in Illumina 1.8 format (Phred+33 quality encoding), which were used for downstream analysis.
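In the Phred+33 encoding, each character of a FASTQ quality line stands for a Phred score equal to its ASCII code minus 33. A minimal decoding sketch (helper names are hypothetical):

```python
def phred33_scores(quality_line):
    """Decode a Phred+33 FASTQ quality string (Illumina 1.8+) into Phred scores."""
    return [ord(ch) - 33 for ch in quality_line.rstrip("\n")]


def error_probabilities(scores):
    """Convert Phred scores Q into base-call error probabilities P = 10^(-Q/10)."""
    return [10 ** (-q / 10) for q in scores]


print(phred33_scores("II5#"))  # [40, 40, 20, 2]
```

A score of 40 therefore corresponds to an estimated error rate of 1 in 10,000 base calls.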
Calculation of segments with identical log2 ratio values
We employed a previously published algorithm to create a reference sequence. The pseudo-autosomal region (PAR) on the Y chromosome was masked, and the mappability of each genomic position was examined by creating virtual 150-bp reads for each position in the PAR-masked genome. These virtual reads were mapped back to the PAR-masked genome, and mappable reads were extracted. Fifty thousand genomic windows (mean size, 56,344 bp) were then created, each containing the same number of mappable positions.
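Windows with an equal number of mappable positions can be built by walking along the sequence and closing a window whenever it holds its share of mappable positions. This is a minimal sketch, assuming mappability is already available as one boolean per position; it handles one contiguous sequence, whereas a genome-wide version would also respect chromosome boundaries. The function name is hypothetical.

```python
def equal_mappability_windows(mappable, n_windows):
    """Split one sequence into n_windows contiguous windows that each contain
    (nearly) the same number of mappable positions.

    mappable: iterable of booleans, one per position.
    Returns half-open (start, end) coordinates.
    """
    mappable = list(mappable)
    total = sum(mappable)
    if n_windows < 1 or total < n_windows:
        raise ValueError("need at least one mappable position per window")
    windows, start, seen, k = [], 0, 0, 1
    for pos, is_mappable in enumerate(mappable):
        if not is_mappable:
            continue
        seen += 1
        # k-th boundary: after floor(k * total / n_windows) mappable positions
        if seen == k * total // n_windows:
            windows.append((start, pos + 1))
            start, k = pos + 1, k + 1
    # unmappable positions after the last boundary join the final window
    windows[-1] = (windows[-1][0], len(mappable))
    return windows


print(equal_mappability_windows([True, False, True, True, False, True, False], 2))
```

Because windows are defined by mappable content rather than physical length, they become physically longer in poorly mappable regions, which keeps the expected read count per window roughly constant.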
Low-coverage whole-genome sequencing reads were mapped to the PAR-masked genome, and the reads in each window were counted and normalized by the total number of reads. We further normalized the read counts for GC content using LOWESS regression. To correct for position effects, we normalized each sample's GC-normalized read counts by those of plasma DNA from our healthy controls and calculated log2 ratios.
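The normalization chain (read fraction, GC correction, log2 ratio against controls) can be sketched as below. This is a hedged illustration, not the published pipeline: it uses a hand-written single-pass LOWESS without robustness iterations, an arbitrary span of `frac=0.3`, and corrects by dividing out the fitted GC trend; in practice a library implementation such as R's `lowess()` would normally be used. All function names are hypothetical.

```python
import math


def lowess_fit(x, y, frac=0.3):
    """Single-pass LOWESS: a tricube-weighted local linear fit at every x.

    No robustness iterations; O(n^2), so only suitable for small examples.
    """
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))  # neighbours per local fit
    fitted = []
    for xi in x:
        dist = [abs(xj - xi) for xj in x]
        h = sorted(dist)[k - 1] or 1e-12  # bandwidth: distance to k-th nearest point
        w = [(1 - min(d / h, 1.0) ** 3) ** 3 for d in dist]
        sw = sum(w)
        mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
        my = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (xi - mx))
    return fitted


def gc_correct(counts, gc, frac=0.3):
    """Scale window counts to read fractions and remove the GC-dependent trend."""
    total = sum(counts)
    frac_reads = [c / total for c in counts]
    trend = lowess_fit(gc, frac_reads, frac)
    if min(trend) <= 0:
        raise ValueError("non-positive LOWESS trend; widen frac")
    mean_frac = sum(frac_reads) / len(frac_reads)
    # divide out the GC trend, then rescale to the original mean level
    return [f / t * mean_frac for f, t in zip(frac_reads, trend)]


def log2_ratios(sample, controls):
    """Window-wise log2(sample / mean of the controls' GC-corrected values)."""
    reference = [sum(vals) / len(vals) for vals in zip(*controls)]
    return [math.log2(s / r) for s, r in zip(sample, reference)]
```

If read counts depend on GC content alone, `gc_correct` flattens them to a constant, so any remaining deviation in the log2 ratios reflects copy number (or residual position effects that the healthy-control reference absorbs).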
The resulting normalized ratios were segmented using circular binary segmentation (CBS) [47] and GLAD [48], as implemented in the CGHweb [49] framework in R [50]. These segments were used to calculate segmental z-scores: for each segment, we summed the GC-corrected read-count ratios (read count in a window divided by the mean read count) of all windows in the segment. Z-scores were then calculated by subtracting the mean of these sums in same-sex individuals without cancer (10 men and 9 women) and dividing by their standard deviation.
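The segmental z-score described above reduces to a sum over a segment followed by standardization against same-sex controls. A minimal sketch, assuming segment boundaries (from CBS or GLAD) are given as window indices; function names are hypothetical.

```python
import statistics


def to_ratios(counts):
    """GC-corrected read counts -> ratios (count in window / mean count)."""
    mean = sum(counts) / len(counts)
    return [c / mean for c in counts]


def segment_sum(ratios, start, end):
    """Sum of window ratios over one segment, windows [start, end)."""
    return sum(ratios[start:end])


def segmental_z(sample_ratios, control_ratios, start, end):
    """z-score of one segment against same-sex controls without cancer."""
    observed = segment_sum(sample_ratios, start, end)
    reference = [segment_sum(r, start, end) for r in control_ratios]
    return (observed - statistics.mean(reference)) / statistics.stdev(reference)


controls = [[1.0, 1.0, 1.0, 1.0], [1.1, 0.9, 1.0, 1.0],
            [0.9, 1.1, 1.0, 1.0], [1.0, 1.2, 1.0, 1.0]]
print(segmental_z([1.5, 1.5, 1.0, 1.0], controls, 0, 2))  # about 9.5
```

Summing over all windows of a segment pools the signal of many windows, so a modest gain spanning a long segment can reach a high z-score even when no single window stands out.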
Calculation of z-scores for specific regions
To assess the copy-number status of genes previously implicated in prostate cancer initiation or progression, we applied z-score statistics to specific target regions (mainly genes) of variable length within the genome. First, we counted high-quality alignments against the PAR-masked hg19 genome within each target region for each sample and normalized them by the expected read counts.
Here expected reads are calculated as
We then subtracted the mean ratio of a group of controls and divided the result by the standard deviation of that group.
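Per-region z-scores follow the same pattern: an observed/expected ratio standardized against the controls' ratios for that region. In this sketch the expected read count is passed in as a precomputed value rather than derived here; the function name and example numbers are hypothetical.

```python
import statistics


def region_z(observed_reads, expected_reads, control_ratios):
    """z-score of one target region (e.g. a gene).

    observed_reads: high-quality alignments inside the region
    expected_reads: expected read count for the region (precomputed)
    control_ratios: observed/expected ratios of the same region in the controls
    """
    ratio = observed_reads / expected_reads
    return (ratio - statistics.mean(control_ratios)) / statistics.stdev(control_ratios)


# Hypothetical gene with 50% more reads than expected.
print(region_z(150, 100, [1.0, 0.9, 1.1]))  # about 5.0
```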
Calculation of genome-wide z-scores
To establish a genome-wide z-score for detecting aberrant genomic content in plasma, we divided the genome into equally sized regions of 1 Mbp and calculated z-scores for each region.
If all ratios are drawn from the same normal distribution, the z-scores are proportional to a Student's t-distributed variable with n-1 degrees of freedom, where n is the number of controls. For controls, z-scores were calculated using cross-validation: the z-score of each control is based on the mean and standard deviation of the remaining controls. This prevents controls from serving as their own reference.
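The leave-one-out scheme for controls can be sketched as follows: patients are scored against all controls, whereas each control is scored only against the others. Data are lists of per-window ratios; function names are hypothetical.

```python
import statistics


def zscore(value, reference):
    """Standardize value by the mean and sample SD of the reference values."""
    return (value - statistics.mean(reference)) / statistics.stdev(reference)


def window_z(sample, reference_set):
    """Per-window z-scores of one sample against a set of reference samples.

    sample: list of window ratios; reference_set: list of such lists.
    """
    return [zscore(v, [ref[w] for ref in reference_set]) for w, v in enumerate(sample)]


def loo_control_z(controls):
    """Leave-one-out z-scores: each control is scored against the others only."""
    return [window_z(ctrl, controls[:i] + controls[i + 1:])
            for i, ctrl in enumerate(controls)]


controls = [[1.0], [1.2], [0.8], [1.0]]
print(loo_control_z(controls))       # per-control, per-window z-scores
print(window_z([1.5], controls))     # a patient scored against all controls
```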
The variance of the cross-validated z-scores of controls is slightly higher than that of the z-scores of tumor patients, because each control is scored against one reference sample fewer (n-1 instead of n). ROC performance is therefore slightly underestimated, which was confirmed in the simulation experiment described below.
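A short worked calculation shows the size of this effect. It assumes, as above, that a window's ratios are independent normal draws, with m denoting the number of reference controls a sample is scored against; n = 10 (the number of male controls) is used purely as an example.

```latex
% x: ratio of one window in a sample not in the reference set
% \bar{x}_m, s_m: mean and sample SD of that window over m reference controls
z = \frac{x - \bar{x}_m}{s_m}, \qquad
z\,\sqrt{\frac{m}{m+1}} \sim t_{m-1}
\quad\Longrightarrow\quad
\operatorname{Var}(z) = \frac{m+1}{m}\cdot\frac{m-1}{m-3}, \qquad m > 3 .
% tumor patients, m = n = 10:          Var(z) = 99/70 \approx 1.414
% controls (leave-one-out), m = 9:     Var(z) = 40/27 \approx 1.481
```

The slightly wider null distribution of the control z-scores yields more high-scoring controls, which lowers the estimated specificity and therefore the AUC.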
To summarize the information carried by the high or low z-scores observed in many tumor patients, we summed the squared z-scores.
Genome-wide z-scores were calculated from these sums of squared z-scores (S-scores). Other methods of aggregating z-score information, such as sums of absolute values or PA scores, performed worse and were therefore not considered. Per-window z-scores were clustered hierarchically with the hclust function of R, using the Manhattan distance, i.e., the sum of the absolute per-window differences.
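The aggregation step can be sketched as below. How S-scores were turned into genome-wide z-scores is not spelled out here, so this sketch assumes that a sample's S-score is standardized against the S-scores of the controls (whose per-window z-scores come from leave-one-out). The Manhattan distance used for clustering is included for completeness; in SciPy the same metric is called `"cityblock"`. Function names are hypothetical.

```python
import statistics


def s_score(window_z):
    """Sum of squared per-window z-scores (gains and losses both add up)."""
    return sum(z * z for z in window_z)


def genome_wide_z(sample_z, control_z_sets):
    """ASSUMPTION: standardize a sample's S-score against the controls' S-scores.

    control_z_sets should hold leave-one-out z-scores of the controls.
    """
    reference = [s_score(z) for z in control_z_sets]
    return (s_score(sample_z) - statistics.mean(reference)) / statistics.stdev(reference)


def manhattan(a, b):
    """Manhattan distance between two per-window z-score profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))


print(genome_wide_z([3, 0], [[1, 0], [0, 1], [1, 1]]))  # about 13.28
```

Squaring makes the score sensitive to both gains and losses, and to a few strongly aberrant windows, which a plain sum of signed z-scores would cancel out.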
To validate the diagnostic performance of the genome-wide z-score in silico, artificial cases and controls were simulated from normal distributions parameterized by the means and standard deviations of the ratios of 10 healthy controls. Simulated tumor cases were obtained by multiplying the mean by the empirical copy-number ratio of 204 prostate cancer cases. Segmented DNA copy-number data were obtained via the cBio Cancer Genomics Portal.
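The simulation can be sketched as follows. This assumes that simulated tumors keep the controls' per-window standard deviations and that the published copy-number data have been converted from log2 values to linear ratios; the per-window statistics and function names are hypothetical.

```python
import random


def simulate_controls(means, sds, n, rng):
    """n artificial controls: each window drawn from N(mean, sd) of the real controls."""
    return [[rng.gauss(m, s) for m, s in zip(means, sds)] for _ in range(n)]


def simulate_tumor(means, sds, cn_ratios, rng):
    """One artificial tumor: control mean scaled by a tumor copy-number ratio per window."""
    return [rng.gauss(m * r, s) for m, s, r in zip(means, sds, cn_ratios)]


if __name__ == "__main__":
    rng = random.Random(42)
    means, sds = [1.0, 1.0, 1.0], [0.02, 0.03, 0.02]  # hypothetical per-window statistics
    controls = simulate_controls(means, sds, n=500, rng=rng)
    tumor = simulate_tumor(means, sds, cn_ratios=[1.0, 1.5, 0.5], rng=rng)
    print(len(controls), [round(v, 2) for v in tumor])
```

Passing an explicit `random.Random` instance keeps the simulation reproducible from a single seed.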
To test the specificity of our approach at varying tumor DNA content, we performed in silico dilutions of simulated tumor data. To this end, we decreased the tumor signal using the formula below, where λ is the ratio of tumor DNA to normal DNA:
We performed ROC analyses of 500 simulated controls and 102 published prostate tumor profiles, and of their respective dilutions, using the pROC R package. The prostate tumor data were derived from a previously published dataset, and the 102 cases were selected based on their copy-number profiles.
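The dilution and ROC steps can be sketched together. The exact dilution formula is not reproduced in this text, so the sketch assumes a standard linear-mixing model in which normal DNA contributes a ratio of 1; if λ is read literally as tumor DNA : normal DNA, the corresponding tumor fraction is λ/(1+λ). The AUC is computed with the Mann-Whitney statistic, which equals the empirical (trapezoidal) AUC that pROC reports by default. Function names are hypothetical.

```python
def dilute(tumor_ratios, tumor_fraction):
    """ASSUMED linear mixing: tumor ratios diluted with normal DNA (ratio 1.0)."""
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must lie between 0 and 1")
    return [tumor_fraction * r + (1.0 - tumor_fraction) for r in tumor_ratios]


def fraction_from_lambda(lam):
    """Tumor fraction if lam is read literally as tumor DNA : normal DNA."""
    return lam / (1.0 + lam)


def roc_auc(case_scores, control_scores):
    """Empirical ROC AUC via the Mann-Whitney statistic; ties count one half."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


print(dilute([2.0, 0.5, 1.0], 0.5))  # [1.5, 0.75, 1.0]
print(roc_auc([3, 4], [1, 2]))       # 1.0
```

In the setting described above, `case_scores` would be the genome-wide z-scores of the diluted tumor profiles and `control_scores` those of the 500 simulated controls, evaluated once per dilution level.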
Gene-Breakpoint Panel: target enrichment of cancer genes, alignment and SNP-calling, SNP-calling results
Using the SureSelect Custom DNA Kit (Agilent, Santa Clara, CA, USA) according to the manufacturer's recommendations, we enriched a 1.3-Mbp target region from seven plasma DNA samples (four CRPC cases, CRPC1-3 and CRPC5; three CSPC cases, CSPC1-2 and CSPC4). The target region comprised the exonic sequences of 55 cancer genes and 38 introns of 18 genes in which fusion breakpoints have been described. Because of the very low amounts of input DNA, we increased the number of cycles in the enrichment PCR to 20. Six libraries were pooled in equimolar amounts and sequenced on an Illumina MiSeq (Illumina, San Diego, CA, USA).
We generated a mean of 7.78 million 150-bp paired-end reads (range, 3.62-14.96 million) on an Illumina MiSeq (Illumina, San Diego, CA, USA). Sequences were aligned using BWA, and duplicates were marked using Picard. We subsequently performed local realignment around known indels and called SNPs with the UnifiedGenotyper provided by the GATK.
We further annotated the resulting SNPs using ANNOVAR and reduced the SNP call set by removing synonymous variants, variants in segmental duplications, and variants listed in the 1000 Genomes Project and the Exome Variant Server (NHLBI Exome Sequencing Project (ESP), Seattle, WA) with an allele frequency >0.01.
To reduce false positives, we applied very stringent criteria based on previously published values: a mutation had to be absent from the constitutional DNA sequence, and the sequencing depth at the particular nucleotide position had to be >20-fold. Furthermore, all putative mutations and breakpoint-spanning regions were verified by Sanger sequencing.
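The filter cascade from annotation to stringency criteria can be sketched as a single predicate. This assumes each candidate has already been annotated (ANNOVAR-style exonic-function label, segmental-duplication flag, population allele frequency taken as the highest value across 1000 Genomes and ESP) and compared with the constitutional DNA; the `Variant` record and its field names are hypothetical. Survivors would still require Sanger confirmation.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    chrom: str
    pos: int
    exonic_function: str     # ANNOVAR-style label, e.g. "synonymous SNV"
    in_segdup: bool          # lies in an annotated segmental duplication
    population_af: float     # highest AF in 1000 Genomes / ESP; 0.0 if not listed
    plasma_depth: int        # read depth at this position in the plasma DNA
    in_constitutional: bool  # also observed in constitutional DNA


def passes_filters(v, max_population_af=0.01, min_depth=20):
    """True if a candidate survives the annotation and stringency filters."""
    return (
        v.exonic_function != "synonymous SNV"
        and not v.in_segdup
        and v.population_af <= max_population_af
        and not v.in_constitutional
        and v.plasma_depth > min_depth  # strictly more than 20-fold
    )


candidate = Variant("chr1", 1000, "nonsynonymous SNV", False, 0.0, 45, False)
print(passes_filters(candidate))  # True
```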
Because plasma DNA is fragmented, the read-pair method is not suitable for identifying structural rearrangements; we therefore performed split-read analysis of the 150-bp reads. We took the first and the last 60 bp of each read (leaving a gap of 30 bp) and mapped them independently. We then analyzed discordantly mapped split reads, focusing on targeted regions and filtering out split reads mapping within repetitive regions as well as alignments with low mapping quality (<25). Reads whose split parts mapped discordantly were aligned to the human genome using BLAT to further specify putative breakpoints.
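The core of the split-read test can be sketched as follows: for a contiguous 150-bp read, the two 60-bp parts should map to the same chromosome and strand, with their leftmost coordinates 90 bp apart (order reversed on the minus strand). The `Hit` record, the 10-bp tolerance and the function names are hypothetical; target-region and repeat filtering and the BLAT step are not shown.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    chrom: str
    pos: int      # leftmost reference coordinate of the aligned part
    strand: str   # "+" or "-"
    mapq: int


def split_read(seq, part=60):
    """First and last `part` bases of a read; the middle (30 bp for 150-bp reads) is skipped."""
    if len(seq) < 2 * part:
        raise ValueError("read too short to split")
    return seq[:part], seq[-part:]


def is_discordant(first, last, read_len=150, part=60, tol=10, min_mapq=25):
    """Do the two independently mapped parts contradict a contiguous alignment?

    Pairs with a low-quality alignment are never reported. For a contiguous
    read, the parts start read_len - part bases apart (order flips on '-').
    """
    if first.mapq < min_mapq or last.mapq < min_mapq:
        return False
    if first.chrom != last.chrom or first.strand != last.strand:
        return True
    offset = last.pos - first.pos if first.strand == "+" else first.pos - last.pos
    return abs(offset - (read_len - part)) > tol


print(is_discordant(Hit("chr21", 1000, "+", 60), Hit("chr21", 3000000, "+", 60)))  # True
```

Different chromosomes point to translocations, opposite strands to inversions, and a large offset on the same strand to deletions or tandem duplications, which is why a single discordance test can flag all candidate rearrangements for closer inspection.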
All raw sequencing data were deposited in the European Genome-phenome Archive (EGA), which is hosted by the EBI, under accession numbers EGAS00001000451 (Plasma-Seq) and EGAS00001000453 (Gene-Breakpoint Panel).