low proportion of identified ITS - OTU features

Hello guys again,

After struggling with DADA2 ITS extraction for quite some time (ITS extraction error - Missing sequence for record beginning on line 17 - #3 by Pardal_Oblackovy), I "moved back" and tried to focus on an OTU-based workflow using the UNITE classifier. Unfortunately, the proportion of OTU features identified to species level is really low: at most a few percent in my best samples, and the majority of OTUs were not identified at all. I have played with the stringency of my taxonomy identification command, but without any significant improvement, so I hypothesize that I made a mistake in the DADA2 read-merging step. Please see my workflow and some supporting files below. Thank you very much for any advice.

My data: dual-indexed Illumina paired-end reads from ITS2 community metabarcoding, in FASTQ format.

Input data MultiQC report: multiqc_report.html - Google Drive

Taxonomy output as a table: taxonomy-99%ident_classifier.tsv - Google Drive

Taxonomy output as .qzv: UP-01H_taxa-bar-plots-99class.qzv - Google Drive

OTU table: feature-table.tsv - Google Drive

1. Create output directory for FastQC results

mkdir -p ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_fruity_test_Redemux_250312/QC/fastqc_results

2. Run FastQC on the raw data (forward and reverse)

fastqc /home/svecka/FRESH/FRESH_ITS/UP-01H/UP-01H_raw_data/forward.fastq.gz \
  /home/svecka/FRESH/FRESH_ITS/UP-01H/UP-01H_raw_data/reverse.fastq.gz \
  -o ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_fruity_test_Redemux_250312/QC/fastqc_results

3. Generate MultiQC report

multiqc ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_fruity_test_Redemux_250312/QC/fastqc_results \
  -o ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_fruity_test_Redemux_250312/QC/multiqc_report

4. Import the raw data into Qiime2

note: output path manually corrected relative to the one generated by the chatbot

double-check the paths in the preceding code

qiime tools import \
  --type 'MultiplexedPairedEndBarcodeInSequence' \
  --input-path /home/svecka/FRESH/FRESH_ITS/UP-01H/UP-01H_raw_data/ \
  --output-path ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_fruity_multiplexed.qza

5. Demultiplex the sequences with cutadapt

qiime cutadapt demux-paired \
  --i-seqs ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_fruity_multiplexed.qza \
  --m-forward-barcodes-file ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_mapping_file.txt \
  --m-forward-barcodes-column ForwardBarcodeSequence \
  --m-reverse-barcodes-file ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_mapping_file.txt \
  --m-reverse-barcodes-column ReverseBarcodeSequence \
  --p-error-rate 0.4 \
  --p-cores 64 \
  --o-per-sample-sequences ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_demultiplexed.qza \
  --o-untrimmed-sequences ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_untrimmed-seqs.qza \
  --verbose

6. Trim the technical sequences with cutadapt (barcodes, linkers, indexes...)

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_demultiplexed.qza \
  --p-cores 64 \
  --p-front-f AGTACAAG AGTGGTCA AAGACGGA GATGAAGAACGYAGYRAA \
  --p-front-r AGTACAAG AGTGGTCA AAGACGGA GATGAAGAACGYAGYRAA \
  --p-adapter-f AGTACAAG AGTGGTCA AAGACGGA GATGAAGAACGYAGYRAA \
  --p-adapter-r AGTACAAG AGTGGTCA AAGACGGA GATGAAGAACGYAGYRAA \
  --p-anywhere-f AGTACAAG AGTGGTCA AAGACGGA GATGAAGAACGYAGYRAA \
  --p-anywhere-r AGTACAAG AGTGGTCA AAGACGGA GATGAAGAACGYAGYRAA \
  --o-trimmed-sequences ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_trimmed_tech_seq.qza

7. Denoise sequences using DADA2

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_trimmed_tech_seq.qza \
  --p-trim-left-f 10 \
  --p-trim-left-r 10 \
  --p-trunc-len-f 300 \
  --p-trunc-len-r 300 \
  --p-n-threads 16 \
  --o-table ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_dada2_table.qza \
  --o-representative-sequences ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_dada2_rep-seqs.qza \
  --o-denoising-stats ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_dada2_stats.qza
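
Since `denoise-paired` discards any read shorter than the `--p-trunc-len-f` / `--p-trunc-len-r` value, it may be worth checking how many reads still reach that length after the cutadapt trimming in step 6. The following is only a minimal sketch, assuming standard four-line FASTQ records (no wrapped sequence lines); the function name `count_reads_at_least` and the example file name are hypothetical and not part of the workflow above.

```shell
# Count how many reads in a FASTQ stream are at least MIN_LEN bases long.
# Usage:  count_reads_at_least MIN_LEN < reads.fastq
# Prints: "<reads at least MIN_LEN long> <total reads>"
# Assumption: every record is exactly four lines.
count_reads_at_least() {
  awk -v min="$1" '
    NR % 4 == 2 {                 # line 2 of each four-line record is the sequence
      total++
      if (length($0) >= min) kept++
    }
    END { printf "%d %d\n", kept, total }
  '
}

# Example on a gzipped per-sample file (hypothetical file name):
# gzip -dc sample_R1.fastq.gz | count_reads_at_least 300
```

If the first number is close to zero, most reads would be dropped at the truncation step before merging is even attempted.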

8. Cluster the features at 97% identity using VSEARCH

qiime vsearch cluster-features-de-novo \
  --i-table ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_dada2_table.qza \
  --i-sequences ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_dada2_rep-seqs.qza \
  --p-perc-identity 0.97 \
  --p-threads 64 \
  --o-clustered-table ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_clustered-table.qza \
  --o-clustered-sequences ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_clustered-seqs.qza

9. Filter sequences to remove those corresponding to singletons (those with frequency < 2)

qiime feature-table filter-seqs \
  --i-data ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_clustered-seqs.qza \
  --i-table ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_clustered-table.qza \
  --o-filtered-data ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_filtered-seqs.qza

10. Filter the feature table to remove singletons (those with frequency < 2)

qiime feature-table filter-features \
  --i-table ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_clustered-table.qza \
  --p-min-frequency 2 \
  --o-filtered-table ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_filtered-table.qza

11. Classify taxonomy using a pre-trained classifier

qiime feature-classifier classify-sklearn \
  --i-classifier ~/Qiime2_classifier/ITS_classifier/classifier-99.qza \
  --i-reads ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_filtered-seqs.qza \
  --o-classification ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_taxonomy.qza \
  --p-n-jobs 16

12. Export the feature table

qiime tools export \
  --input-path ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_filtered-table.qza \
  --output-path ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_exported-feature-table

13. Export the taxonomy file

qiime tools export \
  --input-path ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_taxonomy.qza \
  --output-path ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_exported-taxonomy
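
To put a number on the "few percent identified to species" observation, the exported taxonomy table can be summarised directly. This is a rough sketch, assuming the exported file is a tab-separated table with a header row and the taxon string in the second column, and that species ranks use the UNITE-style `s__` prefix; the function name `species_level_counts` is hypothetical.

```shell
# Print "<features with a species label> <all features>" for a taxonomy TSV.
# Assumptions: one header row; column 2 holds a ';'-separated taxon string
# with UNITE-style rank prefixes (k__, p__, ..., s__).
species_level_counts() {
  awk -F'\t' '
    NR == 1 { next }                        # skip the header row
    {
      total++
      # a species rank counts only if it is non-empty and not "unidentified"
      if ($2 ~ /(^|;) *s__[^;]/ && $2 !~ /(^|;) *s__unidentified/) species++
    }
    END { printf "%d %d\n", species, total }
  ' "$1"
}

# Example (hypothetical path to the exported file):
# species_level_counts UP-01H_exported-taxonomy/taxonomy.tsv
```

Dividing the first number by the second gives the per-feature proportion; it does not weight features by read count.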

14. Convert the feature table from .biom format to .tsv format

biom convert \
  -i ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_exported-feature-table/feature-table.biom \
  -o ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_exported-feature-table/feature-table.tsv \
  --to-tsv

15. Round counts to whole numbers (remove decimal points) in the exported feature table

awk 'BEGIN {OFS="\t"} {for(i=2; i<=NF; i++) $i=sprintf("%.0f", $i); print}' \
  ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_exported-feature-table/feature-table.tsv \
  > ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_exported-feature-table/feature-table-rounded.tsv
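
One caveat with the rounding in step 15: because the awk command splits on any whitespace and processes every line, it appears to also rewrite the `#`-prefixed header lines that `biom convert --to-tsv` writes (the sample names would become `0`). A possible header-safe variant is sketched below, assuming the table is tab-separated and that header or comment lines start with `#`; the function name `round_counts` is hypothetical.

```shell
# Round all count columns of a biom-style TSV to whole numbers,
# leaving '#'-prefixed header/comment lines untouched.
# Usage: round_counts feature-table.tsv > feature-table-rounded.tsv
round_counts() {
  awk 'BEGIN { FS = OFS = "\t" }          # split and join on tabs only
    /^#/ { print; next }                  # pass header lines through as-is
    {
      for (i = 2; i <= NF; i++) $i = sprintf("%.0f", $i)
      print
    }
  ' "$1"
}
```

Splitting on tabs only also keeps the feature ID column intact if an ID ever contains a space.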

This is not my greatest area of expertise, but you could try copying and pasting the sequences (from the seqs file output by DADA2) into BLAST as a sanity check, to see whether your pipeline is genuinely not identifying anything or whether something is wrong and you do in fact have easily identifiable fungal sequences.

You should be able to copy the sequences by making a .qzv file from your seqs.qza file from DADA2.
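
As a rough sketch of that sanity check: if the representative sequences are exported to FASTA instead (for example with `qiime tools export`, which writes a `dna-sequences.fasta` file for sequence artifacts), the first few records can be pulled out for pasting into BLAST. The helper name `fasta_head` and the paths in the comments are hypothetical.

```shell
# Print the first N records of a FASTA file (records may span several lines).
# Usage: fasta_head N sequences.fasta
fasta_head() {
  awk -v n="$1" '
    /^>/ { seen++ }              # a ">" line starts a new record
    seen > n { exit }            # stop once we are past the n-th record
    { print }
  ' "$2"
}

# Example (hypothetical paths):
# qiime tools export --input-path rep-seqs.qza --output-path exported-seqs
# fasta_head 10 exported-seqs/dna-sequences.fasta
```

A handful of sequences is usually enough to tell whether they hit fungal ITS at all.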


Dear @Sam_Degregori ,

thank you for your help <3 Please find the sequences enclosed (raw FASTQs R1 and R2, multiplexed and demultiplexed). I would be very grateful for your help...

https://filesender.cesnet.cz/?s=download&token=d25c76dc-5d47-418a-833f-2a87f96752f0

Do you think that my workflow itself is fine?

Thank you very much!

Karel