Pipeline Overview
Raw RNA-seq reads were downloaded from the Sequence Read Archive (SRA). Quality control and removal of overrepresented sequences and sequencing adapters were performed with FastQC and Trim Galore!, and Rcorrector was then used to remove and correct sequencing errors in the reads before de novo transcriptome assembly with Trinity. Open reading frames were identified and translated into protein sequences with TransDecoder. Overrepresented protein sequences were removed with the in-house Perl script filter_redundancy.pl before BUSCO was used to assess proteome completeness. Functional protein domains were predicted with InterProScan. The in-house Perl script parse_interproscan.pl can be used to retrieve proteins with a domain of interest by species.
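As a quick reference, the order of steps described above can be sketched as a small Python data structure. This is only an illustrative summary, not part of the pipeline itself: the names Stage, PIPELINE, stages_before, and describe are hypothetical, and the split of duties between FastQC (quality reports) and Trim Galore! (trimming) is an assumption based on how those tools are typically used. No command-line options are implied.

```python
from typing import List, NamedTuple


class Stage(NamedTuple):
    """One step of the pipeline: the tool name and what it is used for."""
    tool: str
    purpose: str


# Order follows the overview paragraph. Purposes paraphrase the text;
# assigning QC to FastQC and trimming to Trim Galore! is an assumption.
PIPELINE: List[Stage] = [
    Stage("SRA", "download raw RNA-seq reads"),
    Stage("FastQC", "quality control of reads"),
    Stage("Trim Galore!", "remove adapters and overrepresented sequences"),
    Stage("Rcorrector", "remove and correct sequencing errors"),
    Stage("Trinity", "de novo transcriptome assembly"),
    Stage("TransDecoder", "identify open reading frames and translate proteins"),
    Stage("filter_redundancy.pl", "remove overrepresented protein sequences"),
    Stage("BUSCO", "assess proteome completeness"),
    Stage("InterProScan", "predict functional protein domains"),
    Stage("parse_interproscan.pl", "retrieve proteins with a domain of interest by species"),
]


def stages_before(tool: str) -> List[str]:
    """Return the tools that run before `tool`, in pipeline order."""
    names = [stage.tool for stage in PIPELINE]
    return names[: names.index(tool)]  # raises ValueError for an unknown tool


def describe() -> str:
    """Render the pipeline as a numbered, human-readable list."""
    return "\n".join(
        f"{i}. {stage.tool}: {stage.purpose}"
        for i, stage in enumerate(PIPELINE, start=1)
    )


if __name__ == "__main__":
    print(describe())
```

For example, stages_before("Trinity") lists the steps that should be finished before assembly begins, which can help when checking where a partially completed run stopped.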
For more detail on any step, hover over and click a program box below to view software descriptions, script usage examples, and additional user recommendations.