Hi everybody!
I am trying to assign taxonomy to 18S rRNA metatranscriptomic reads that were originally extracted using SortMeRNA. I have created the following pipeline, and am just generally looking for input on whether or not this is the best approach. A concern I have, is that when I run through this it shows I have nearly the same amount of fungal reads in each sample (which it not expected at all), so this is originally what made me believe something is off. Please let me know, any input helps!
Step 1: Import the paired-end sequencing data in CasavaOneEightSingleLanePerSampleDirFmt format (example of format listed below)
sample1_S1_L001_R1_001.fastq.gz # Forward reads
sample1_S1_L001_R2_001.fastq.gz # Reverse reads
qiime tools import
--type 'SampleData[PairedEndSequencesWithQuality]'
--input-path PE_seqs
--input-format CasavaOneEightSingleLanePerSampleDirFmt
--output-path PE.qza
Step 2: Generate a summary visualization of the demultiplexed sequences to assess data quality.
qiime demux summarize
--i-data PE.qza
--o-visualization PE.qzv
Step 3: Join paired-end reads to obtain complete sequences.
qiime vsearch merge-pairs
--i-demultiplexed-seqs PE.qza
--o-merged-sequences PE-joined.qza
Step 4: Dereplicate the joined sequences to obtain a table of unique sequences and their representative sequences.
qiime vsearch dereplicate-sequences
--i-sequences PE-joined.qza
--o-dereplicated-table PE-table.qza
--o-dereplicated-sequences PE-rep-seqs.qza
Step 5: Export the feature table in BIOM format and convert it to a tab-separated format.
qiime tools export
--input-path PE-table.qza
--output-path ASVTABLE_export
biom convert
-i ASVTABLE_export/feature-table.biom
-o ASVTABLE_export/otu_table.tsv
--to-tsv
Step 6: Remove the header from the exported table and replace a specific string (#OTU ID).
cd ASVTABLE_export
sed -i '1d' otu_table.tsv
sed -i 's/#OTU ID//' otu_table.tsv
cd ..
Step 7: Import the reference taxonomy data and reference sequences that will be used for downstream processing
Here, we are using the PR2 databased (most recent version as of 7/20/2023, https://pr2-database.org/). THIS DATABASE ALSO INCLUDES SOME BACTERIA AND PLASTIDS -> they will be removed during later processing points but it is important to keep this in mind as if a different database is used, step 16 will not apply.
qiime tools import --type 'FeatureData[Taxonomy]'
--input-format HeaderlessTSVTaxonomyFormat
--input-path pr2_version_5.0.0_SSU_mothur.tax
--output-path PR2-ref-tax.qza
qiime tools import --type 'FeatureData[Sequence]'
--input-path pr2_version_5.0.0_SSU_mothur.fasta
--output-path PR2-ref-seqs.qza
Step 8: Perform closed-reference clustering of the features against the reference database with a sequence identity threshold of 100%.
qiime vsearch cluster-features-closed-reference
--i-table PE-table.qza
--i-sequences PE-rep-seqs.qza
--i-reference-sequences PR2-ref-seqs.qza
--p-perc-identity 1
--p-threads 4
--o-clustered-table closed-ref-table.qza
--o-clustered-sequences closed-ref-seqs.qza
--o-unmatched-sequences closed-ref-unmatched-seqs.qza
Step 9: Generate a summary visualization of the closed-reference clustered feature table.
qiime feature-table summarize
--i-table closed-ref-table.qza
--o-visualization table.qzv
--m-sample-metadata-file sample_metadata.tsv
Step 10: Generate a visualization of the closed-reference clustered representative sequences.
qiime feature-table tabulate-seqs
--i-data closed-ref-seqs.qza
--o-visualization rep-seqs.qzv
Step 11: Filter samples from the feature table based on a minimum frequency of 1200.
qiime feature-table filter-samples
--i-table closed-ref-table.qza
--p-min-frequency 15000
--o-filtered-table filtered-table.qza
Step 12: Filter features (sequences) from the feature table based on a minimum frequency of 10.
qiime feature-table filter-features
--i-table filtered-table.qza
--p-min-frequency 10
--o-filtered-table feature-frequency-filtered-table.qza
Step 13: Classify the closed-reference clustered sequences using VSEARCH and the PR2 reference database with a sequence identity threshold of 99%.
qiime feature-classifier classify-consensus-vsearch
--i-query closed-ref-seqs.qza
--i-reference-reads PR2-ref-seqs.qza
--i-reference-taxonomy PR2-ref-tax.qza
--p-perc-identity
--o-classification taxonomy1.qza
--o-search-results search.results.qza
--p-threads 4