Bioinformatics and Deliverables
RNA-Seq analysis pipeline
The CTG RNA-Seq analysis pipeline is divided into five main tasks:
- Demultiplexing: organizing the FASTQ files based on the sample index information, and generating the statistics and reporting files. This task will be performed using the bcl2fastq2 software with default settings.
- Quality Control (QC): FastQC runs quality control checks on the raw sequence data produced by the Demultiplexing step above. For each check it provides a graphical plot and a warning if the data show potential problems.
- Read Mapping: alignment of reads to a specified reference genome. This task will be performed using the HISAT2 software, with the reference genome sequence taken from the database requested by the researcher or, by default, from the Ensembl database.
- Picard QC: quality control checks on the aligned data from the previous step, such as a) base distribution by cycle, b) insert size, c) quality by cycle, and d) quality distribution. This task will be performed using the Picard tools.
- Expression counts: assembly of the alignments into full transcripts and quantification of the expression levels of each gene and transcript. This task will be performed using the StringTie software.
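The five tasks above can be sketched as a sequence of command lines. The following is only an illustrative sketch, not CTG's actual configuration: all paths and the function name are hypothetical, paired-end reads are assumed, the `samtools sort` step is an assumption (StringTie expects sorted alignments, and the text above does not name a sorting tool), and the `picard` wrapper command is assumed to be on the PATH. The function only builds the argument lists and does not run anything.

```python
from pathlib import Path


def build_commands(run_dir, sample_sheet, out_dir, sample, r1, r2,
                   hisat2_index, ref_fasta, ref_gtf):
    """Return the pipeline steps as argument lists (nothing is executed).

    All arguments are hypothetical paths or names chosen for illustration.
    """
    out = Path(out_dir)
    sam = out / f"{sample}.sam"
    bam = out / f"{sample}.sorted.bam"
    return {
        # Demultiplexing with default settings; only input/output paths are set.
        "demultiplex": ["bcl2fastq",
                        "--runfolder-dir", str(run_dir),
                        "--sample-sheet", str(sample_sheet),
                        "--output-dir", str(out / "fastq")],
        # Quality control on the raw FASTQ files.
        "fastqc": ["fastqc", "--outdir", str(out / "fastqc"), str(r1), str(r2)],
        # Read mapping; --dta reports alignments tailored for transcript assemblers.
        "hisat2": ["hisat2", "--dta", "-x", str(hisat2_index),
                   "-1", str(r1), "-2", str(r2), "-S", str(sam)],
        # Assumption: coordinate-sort the alignments before Picard and StringTie.
        "sort": ["samtools", "sort", "-o", str(bam), str(sam)],
        # Picard QC; O= is an output file prefix, not a single file.
        "picard": ["picard", "CollectMultipleMetrics",
                   f"I={bam}", f"O={out / sample}", f"R={ref_fasta}"],
        # Expression counts; -B writes the .ctab tables next to the output GTF.
        "stringtie": ["stringtie", str(bam), "-G", str(ref_gtf), "-e", "-B",
                      "-A", str(out / f"{sample}.tsv"),
                      "-o", str(out / f"{sample}.gtf")],
    }
```

Each list could be passed to `subprocess.run(cmd, check=True)` in order; keeping the commands as data makes it easy to print and review them before anything is executed.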
Project delivery report
A summary of the experimental setup, methods, and corresponding references and links (20XX-XX_Project_Delivery_Report.pdf).
Demultiplexing results
“DE_MUL_PLEX_project_number”. This folder contains one subfolder per sample, named by the Sample ID provided in the sample sheet. Each subfolder contains the .fastq files, named by the Sample Name provided in the sample sheet.
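To check that a delivered demultiplexing folder matches the sample sheet, the layout described above can be listed with a few lines of Python. This is a minimal sketch: the function name is hypothetical, and accepting both `.fastq` and `.fastq.gz` is an assumption, since delivered files may or may not be compressed.

```python
from pathlib import Path


def fastq_by_sample(demux_dir):
    """Map each Sample ID subfolder to the sorted names of its FASTQ files."""
    result = {}
    for sub in sorted(Path(demux_dir).iterdir()):
        if not sub.is_dir():
            continue  # ignore report files stored next to the sample folders
        result[sub.name] = sorted(
            p.name for p in sub.iterdir()
            if p.is_file() and p.name.endswith((".fastq", ".fastq.gz"))
        )
    return result
```

A Sample ID that maps to an empty list, or a Sample ID from the sample sheet that is missing from the result, is worth reporting back before starting any downstream analysis.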
Quality Control results
“FASTQC_project_number”. This folder contains a FastQC summary report (.html) for each Read, and a “.zip” file containing the results of the FastQC analysis.
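The “.zip” file from FastQC includes a tab-separated `summary.txt` with one line per QC module (status, module name, file name). The sketch below reads that file directly from the archive and lists the modules that did not pass. The function names are hypothetical, and the code assumes the archive contains exactly one `summary.txt`.

```python
import zipfile


def read_fastqc_summary(zip_source):
    """Return {module name: status} from the summary.txt inside a FastQC zip.

    zip_source may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        # Assumption: a single summary.txt sits inside the top-level folder.
        name = next(n for n in archive.namelist() if n.endswith("summary.txt"))
        text = archive.read(name).decode("utf-8")
    summary = {}
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) >= 2:
            status, module = parts[0], parts[1]
            summary[module] = status
    return summary


def flagged_modules(summary):
    """Return the modules whose status is anything other than PASS."""
    return [module for module, status in summary.items() if status != "PASS"]
```

A WARN or FAIL status is a prompt to open the matching plot in the .html report, not a verdict on the sample; some modules are routinely flagged for RNA-Seq data.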
Read Mapping results
“HISAT2_project_number”. This folder contains one subfolder per sample, named by the Sample ID provided in the sample sheet. Each subfolder contains the reference genome alignment result files.
Picard QC results
“PICARD_QC_project_number”. This folder contains the QC results for the reference genome alignments.
QC metrics in text format:
- alignment summary metrics
- base distribution by cycle metrics
- insert size metrics
- quality by cycle metrics
- quality distribution metrics
QC figures in pdf format:
- base distribution by cycle
- insert size histogram
- quality by cycle
- quality distribution
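The text-format metrics files listed above share a common Picard layout: comment lines starting with `#`, a `## METRICS CLASS` line, a tab-separated header row, and then one row per record up to the next blank line. The sketch below parses that table into dictionaries. It is a minimal reader written under that assumption about the layout; the function name is hypothetical, and any histogram section in the file is ignored.

```python
def parse_picard_metrics(text):
    """Return the metrics table of a Picard metrics file as a list of dicts.

    Values are kept as strings; convert the columns you need.
    """
    lines = text.splitlines()
    rows = []
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            if i + 1 >= len(lines):
                break  # truncated file: no header row after the class line
            header = lines[i + 1].split("\t")
            for row in lines[i + 2:]:
                # The table ends at a blank line or the next comment section.
                if not row.strip() or row.startswith("#"):
                    break
                rows.append(dict(zip(header, row.split("\t"))))
            break
    return rows
```

For an alignment summary metrics file, for example, the `PAIR` row (or `UNPAIRED` for single-end data) can then be picked out by its `CATEGORY` value.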
Expression counts results
“StringTie_project_number”. This folder contains one subfolder per sample, named by the Sample ID provided in the sample sheet. Each subfolder contains read counts for exons, introns, transcripts, and genes.
- SampleID.tsv file contains gene abundance information in a tab-delimited format.
- t_data.ctab file contains transcript abundance information in a tab-delimited format.
- e_data.ctab file contains exon read count information in a tab-delimited format.
- i_data.ctab file contains intron read count information in a tab-delimited format.
- e2t.ctab file contains mapping information for exon index to transcript index.
- i2t.ctab file contains mapping information for intron index to transcript index.
- SampleID.gtf file contains the fully covered transcripts that match the reference annotation transcripts.
See the StringTie website for more information about the files and file types.
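Because the abundance tables above are tab-delimited with a header row, they can be read with the Python standard library alone. The following is a small sketch with hypothetical function names; the `FPKM` column used in the usage note is the name we expect in t_data.ctab, so please check the header row of your own files before relying on it.

```python
import csv


def read_tab_delimited(handle):
    """Read a tab-delimited table with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def top_by_column(rows, column, n=5):
    """Return the n rows with the largest numeric value in the given column."""
    return sorted(rows, key=lambda row: float(row[column]), reverse=True)[:n]
```

For example, `top_by_column(read_tab_delimited(open("t_data.ctab", newline="")), "FPKM")` would list the five most abundant transcripts of one sample, assuming that column is present.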
Data management and analysis
CTG uses LUNARC (Center for Scientific and Technical Computing at Lund University) for data management and analysis.
Your project data will be delivered through one of the following media:
- On a hard disk, encrypted and protected with a password. Please visit https://www.veracrypt.fr/en/Home.html for more information.
- Through your UPPMAX account. If you do not have one, please visit the UPPMAX site https://www.uppmax.uu.se/support/getting-started/ for more details.
The project data will be stored for 6 months.
For additional bioinformatics services, please contact NBIS, the National Bioinformatics Infrastructure Sweden (https://nbis.se/).