Guinea pig smoking model
Sixteen male Hartley guinea pigs were divided into four groups: one group was exposed to CS for 3 months (n = 4); a second group was kept in normoxia for 10 weeks and subsequently placed in an hypoxic environment (12% O2) for 2 weeks (n = 4); a third group (n = 4) was CS-exposed for 3 months and to chronic hypoxia the last 2 smoking weeks; finally we included a fourth group (n = 4) of sham controls remaining in normoxia for the whole study period. To avoid problems related to ageing, young adults were used (8 weeks of age), and to avoid scaling effects body mass was similar (approximately 300 g/animal). All procedures involving animals and their care were approved by the Ethics Committee of the University of Barcelona and by the University of Valladolid Institutional Committee for Animal Care and Use, and were conducted following institutional guidelines that comply with national (Generalitat de Catalunya decree 214/1997, DOGC 2450) and international (Guide for the Care and Use of Laboratory Animals, National Institutes of Health, 85-23, 1985) laws and policies.
Whole lung as well as soleus and lateral gastrocnemius hindlimb muscles were isolated from each animal at the end of the study period. The soleus muscle and the lateral gastro were selected to represent oxidative and glycolytic muscles, respectively. It should be noted that although the gastrocnemius as a whole is a mixed muscle, the lateral portion is predominantly glycolytic.
Animals receiving CS were daily exposed to the smoke of four cigarettes (2R4F; Kentucky University Research; Lexington, KY, USA, 11 mg tar, 0.8 mg nicotine per cigarette), 5 days/week using a nose-only inhalation system (Protowerx Design Inc; Langley, BC, Canada). Sham-exposure to CS was done daily by placing control animals in the nose-only exposure chamber for the same duration (1 h) without cigarettes being lighted. In this experimental model, neither nutritional status (determined via measurements of plasma cholesterol, protein and lipids) nor whole-body weight gain at the end of the study period is significantly different between CS-exposed and sham animals . Moreover, no changes in the proportion of Type I and Type II fibres can be detected between CS-exposed and shams . Detailed information on exposure protocols, pulmonary function data and histological assessments from this study, which demonstrate that observations in lung function and pulmonary structural changes of COPD patients are indeed replicated in the CS-exposed guinea pigs, have been reported in two separate publications [15,19].
RNA isolation from guinea pig samples
Total RNA was extracted using the RNeasy Mini extraction kits (Qiagen, USA) according to the manufacturer’s recommendations. RNA purity and quality was evaluated using a NanoDrop (Thermo Scientific) and a BioAnalyzer 2100 instrument (Agilent Technologies), respectively. All samples had a RIN score >7.
Definition of the guinea pig transcriptome by mRNA sequencing and microarray design
In 2008 the guinea pig genome was sequenced to a depth of approximately 7X full coverage, and last updated in 2010. However, because of the lack of cDNA and protein resources the guinea pig genome is at present poorly annotated. Thus, in order to address this issue we performed an in-depth mRNA sequencing of the lung and skeletal muscles transcriptomes and used this to annotate the available guinea pig genome for transcribed sequences. We then used this information to design and validate the custom Agilent microarray platform used in this study.
Transcriptome sequencing was performed using Illumina sequencing. Briefly, NCBI and Ensembl transcripts of guinea pig were combined with transcripts constructed from Illumina paired end reads using the TopHat and Cufflinks algorithms [20,21]. Microarray probe sequences were then chosen based on the combined transcriptome assembly. The raw RNAseq data have been deposited at the Gene Expression Omnibus (GEO) under the reference number GSE56099. A full description of the procedure is provided in Additional file 1.
Guinea pig microarray gene expression profiling
One hundred nanograms of total RNA from each sample was amplified and converted into labelled cRNA using Agilent’s Low Input Quick Amp Labelling Kit according to the manufacturer’s recommendations. Cy3-labelled cRNA (600 ng/sample) was hybridised to our custom Cavia porcellus oligonucleotide microarray (manufactured by Agilent) in randomised sample order, which generated 61,657 measures per sample (18,073 annotated genes). Hybridisation, washing and scanning of arrays were performed according to manufacturer’s protocol. Three samples (one from each tissue) were lost during the process of generating raw data. All scanned microarrays passed all 11 of the Agilent’s quality metrics. Capture probes that were flagged (that is, did not pass Agilent’s ‘well above background’ condition) on at least 80% of the chips were removed prior to data analysis, such that only those capture probes with a raw signal greater than 99% of the background population signal, for at least 20% of the samples, were retained (29,333 probes were discarded).
Raw microarray data were then normalised against sham controls for each of the three tissues using loess in the ‘marray’  and ‘limma’  Bioconductor packages. Arrays were examined using hierarchical clustering and principal component analysis (PCA) to identify outliers prior to statistical analysis.
The statistical significance of differential expression of each gene was determined using the significance analysis of microarray (SAM) algorithm  with a false discovery rate (FDR) cutoff of 1%. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment in differentially expressed genes was examined using the web-based tool DAVID . Disease KEGG pathways were excluded from the analysis to maximise biological interpretability. Therefore the analysis was restricted to KEGG group 1-4 (Metabolism, Genetic Information Processing, Environmental Information Processing and Cellular Processes). The microarray data have been deposited in GEO under accession number GSE56099.
RT-qPCR validation of custom guinea pig array
Reverse transcription of 1 ug of isolated total RNA from whole-lung tissue (same RNA as were used for the microarray part) was performed using the Tetro cDNA synthesis kit (Bioline) with random hexamer primers following the manufacturer’s instructions. The resulting cDNA was diluted 10-fold and 2.5 uL of this was used to perform qPCR in triplicate (25 uL reaction mixture volume) using the Maxima SYBR green (Thermo Scientific) and 300 nM of primers according the manufacturer’s instructions. To adjust for variations in the cDNA synthesis, each gene was normalised to that of 18S ribosomal RNA and beta-actin mRNA, respectively. All reactions were run in singleplex on a StepOnePlus Real Time System (Applied Biosystems) at 95°C for 10 min, followed by 40 cycles at 95°C for 15 s and 60°C for 1 min. Two-fold dilution series were performed for all primer pairs to verify the linearity of the assay. In addition, dissociation curve analysis was performed after each PCR to check for unspecific signals. Quantification was performed using the comparative cycle threshold (2-∆∆Ct) method.
The following primers were used: CXCL9 fwr: 5′-AGGCACCCCAGTAATGAG-3′; CXCL9 rev: 5′-TGATTTCTGTTTTCTCACACG-3′; CXCL10 fwr: 5′-TCTGAGTGGGACTCAAGGAATACC-3′; CXCL10 rev: 5′-TCCAGACATCTCTTCTCCCCATTC-3′; beta-actin fwr: 5′-GAGGCACCAGGGAGTCATG-3′; beta-actin rev: 5′-AAGGTGTGGTGCCAGATCTTCTC-3′; 18S rRNA fwr: 5′-GTACAGTGAAACTGCGAATGGCTC-3′; 18S rRNA rev: 5′-CCGTCGGCATGTATTAGCTCTAG-3′.
Human COPD clinical studies
In order to assess the clinical relevance of the findings in respect to the guinea pig dataset, we took advantage of a human microarray dataset we have previously published . This defined the baseline/resting transcriptional state of the vastus lateralis muscle in severe COPD patients with either a normal (n = 9) or low (n = 6) body mass index (BMI) and healthy controls matched for age and smoking history (n = 12). In addition, the low BMI COPD group also had a significantly lower fat free mass index (FFMI) (on average 16.7 kg/m2; Additional file 2), a clear surrogate for muscle wasting. All participants signed a written, informed consent approved by the Ethics Committee on Investigations Involving Human Subjects at the Hospital Clinic, Universitat de Barcelona, and the study was conducted in accordance with principles of the Declaration of Helsinki. Briefly, raw Affymetrix CEL files were RMA normalized following removal of probes that were termed ‘absent’ in more than 80% of the samples by the MAS5 algorithm inside the affy package (26,197 probes were discarded). Following probe summarization, a two-class unpaired SAM analysis was performed using the R package ‘samr’ comparing gene expression levels between COPD patients with a muscle wasting phenotype and matched controls. Enrichment of KEGG terms (group 1 to 4) in the resulting gene lists was assessed using DAVID. Enriched terms used to define the ‘true response’ in the cross-species overlap analysis were defined as having an EASE score P value <0.2. Further, detailed demographic data on the human subjects, including pulmonary measurements are provided in Additional file 2. The raw microarray CEL files are deposited under the reference number GSE27536.
In addition, we also analysed a public microarray dataset published by the Ronald Crystal lab  examining transcriptional changes in small airway epithelium from healthy non-smokers (n = 47), healthy smokers (n = 58) and smokers with COPD (n = 22), respectively (GSE19407). Due to a clear scan date batch issue (Additional file 3, Figure A), we focused our analysis on the data generated in years 2006 and 2007 (hence excluding the two samples scanned in 2005 as well as the 36 samples processed in 2008). As both human studies were conducted on the Affymetrix U133 + 2 platform, the data analysis strategy of the raw CEL files representing the pulmonary data was identical to that of the human dataset in skeletal muscle presented in this paper (see above) (see Additional file 4 for the list of regulated transcripts with fold-changes as well as functional enrichment analysis).
Summarising the molecular state of skeletal muscle using indices of pathway transcriptional activity
In order to reduce the complexity of the genome-wide transcriptional state of guinea pig skeletal muscles, thereby increasing statistical power, we computed indices of the overall pathway transcriptional activity [28,29]. For each of the two guinea pig hindlimb muscles, we first mapped the thousands of individual gene expression measures onto KEGG pathways using DAVID . We then summarised the transcriptional activity for the enriched pathways (FDR <10%) by computing the first three principal components (PCs), a procedure that allowed us to retain between 50% and 78% of the total variance (63% on average). Computation of the PCs was performed using the ‘prcomp’ function within the statistical programming environment R (script available on request).
Inference of biological networks linking lung and skeletal muscles in guinea pigs
An exhaustive list of genes annotated to the cytokine superfamily (n = 72) was compiled from the SABiosciences PCR Array Web Portal  (see Additional file 5, worksheet 1 for the complete list). Such an approach has been used previously for compiling gene-lists . Among these candidates we identified 33 genes coding for cytokines, which were differentially expressed in guinea pig lung tissue (Additional file 5, worksheet 2). These were selected for further analysis. Correlation between the expression of these cytokines and skeletal muscle pathway indexes were computed using the Spearman correlation coefficient, which allows the identification of linear and non-linear monotone relationships . Resampling of samples (10,000 permutations) was conducted to obtain P values for each correlation coefficient. Pair-wise associations within the regulatory network were defined as statistically significant when P <0.01.
The resulting sparse network was visualised using a force-directed layout as implemented in the network visualization tool Cytoscape v2.8 .
Creating and visualising a KEGG pathway map
To visually represent the relationship between enriched KEGG pathways in the clinical dataset, we computed a pathway similarity matrix based on the Jaccards Index of overlap. This matrix was used as input to a hierarchical clustering procedure (average linkage).
Measurement of inflammatory mediators in human serum from COPD patients and healthy controls and validation of the guinea pig lung-muscle cross-talk network
Previously published multiplex protein profiling data from COPD serum samples (n = 26) and healthy controls (n = 23) were included . Briefly, data were log2 transformed followed by imputation of missing values using K nearest neighbours in the R package ‘impute’ . Finally, the full data matrix was Z-scored.
A Mack-Skillings test with two factors (disease and training) was used to identify overall main effects across groups in serum protein levels (P <0.05). A Gene Set Enrichment Analysis (GSEA) was used to establish statistical functional enrichment by ranking all Pearson correlation coefficients between serum protein levels and global muscle mRNA expression .