Page 54 - scheppingen
P. 54
three
52
cessed by size exclusion gel electrophoresis subsequent to sequencing adaptor ligation. After gel excision and digestion, sequences were amplified by PCR. Clustering and DNA sequencing was performed using the Illumina cBot and HiSeq 2500. Each library was subjected to paired-end sequencing, producing reads 125 nucleotides in length.
Bioinformatics analysis of RNA-seq data
Read quality was assessed using FastQC v0.11.2 (Babraham Institute, Babraham, Cambridgeshire, UK), and Trimmomatic v0.33 was used to trim and filter reads of low quality25. Low quality leading and trailing bases were removed from each read, a sliding window trimming using a window of 4 and a phred33 score threshold of 20 was used to assess the quality of the body of the read. Reads that dropped below 36 and 18 nucleo- tides, respectively in our mRNA and small RNA datasets, as well as sequence reads lacking both forward and reverse orientations were excluded from further analysis.
Next, paired-end reads were aligned to the human reference genome (GRCh38) with TopHat2 v2.0.13 using the default settings26. No mismatches between the small RNA trimmed reads and reference genome were allowed. The aligned mRNA reads were assembled into individual transcripts and the abundance of each transcript was esti- mated using Cufflinks v2.2.127. For mRNA sequence data, expression level was calculated as fragments per kilobase of exon per million fragments mapped (FPKM). The Cufflinks transcript assembly was guided using the reference annotation file Gencode v2128. The transcript assemblies from each sample were then assembled into a single unified tran- script catalog using Cuffmerge27. Finally, the merged transcript file along with the original alignment files produced from Tophat were analyzed using Cuffdiff. Libraries were quan- tile-normalized and differential expression analysis was performed considering genes with FPKM>1 in at least one of the sample groups. Small RNA count data were normalized (per 1x106 reads) and differential expression analysis between TSC and control patients was done by means of the limma method (version 3.14.4)29. Throughout Benjamini- Hochberg (BH)30 multiple comparison adjusted probabilities (P < 0.05) defined signifi- cance. Ingenuity Pathway Analysis (Ingenuity Systems IPA, www.ingenuity.com) was used to identify the associating canonical signaling pathways stratifying genes by over- and under-expressed patterns. The Ingenuity gene knowledgebase was selected as refer- ence and human species specified. All other parameters were default. Fisher’s exact test BH-adjusted probabilities (P<0.05) defined significance.
Integrative bioinformatics
The mRNA sequence transcriptome was analyzed by means of a weighted gene co-ex- pression network approach31, 32. A pair-wise Spearman’s correlation matrix of the top 10000 most variable unique genes (ranked by median absolute deviation) was trans- formed into an adjacency matrix by using a soft power function to ensure scale-free topology33. The adjacency matrix was further transformed into a topological overlap matrix to enable the identification of modules (clusters) of highly correlating genes by implementing a dynamic tree cut algorithm34, 35. Thus, each module representing a cluster of co-regulated genes with a distinct expression pattern from other identified modules. In order to define module “driver” genes we made use of the module eigengene con- cept, defined as the first principal component of the module expression matrix, and,