Hello again, everyone,
After struggling for quite some time with ITS extraction for DADA2 (see "ITS extraction error - Missing sequence for record beginning on line 17 - #3 by Pardal_Oblackovy"), I stepped back and tried an OTU-based workflow with the UNITE classifier instead. Unfortunately, the proportion of OTU features identified to species level is really low: a few percent at most in my best samples, and the majority of OTUs were not identified at all. I have played with the stringency of my taxonomy classification command, but without any significant improvement, so I suspect I made a mistake in the DADA2 read-merging step. Please see my workflow and some supporting files below. Thank you very much for any advice.
My data: dual-indexed Illumina paired-end reads from ITS2 community metabarcoding, in FASTQ format.
multiqc_report.html (Google Drive): MultiQC report for the input data
taxonomy-99%ident_classifier.tsv (Google Drive): taxonomy output as a table
UP-01H_taxa-bar-plots-99class.qzv (Google Drive): the same taxonomy output as a .qzv
feature-table.tsv (Google Drive): OTU table
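For reference, the species-level proportion mentioned above can be counted from an exported taxonomy table roughly as follows. This is only a sketch: `species_fraction` is a hypothetical helper name, and it assumes a tab-separated file with one header line (Feature ID, Taxon, Confidence) and UNITE-style rank prefixes (`k__` ... `s__`), with `s__unidentified` treated as not classified to species.

```shell
# Hypothetical helper: report how many features reach species level.
# Assumptions: tab-separated input, one header line, Taxon in column 2,
# UNITE-style rank prefixes such as "k__Fungi;...;s__Fusarium_oxysporum".
species_fraction() {
    awk -F'\t' '
        NR == 1 { next }                                  # skip the header line
        { total++ }
        $2 ~ /(^|; ?)s__[^;]+/ && $2 !~ /s__unidentified/ { species++ }
        END {
            if (total == 0) { print "no features"; exit 1 }
            printf "%d of %d features (%.1f%%) classified to species\n",
                   species, total, 100 * species / total
        }
    ' "$1"
}
```

Usage would be `species_fraction taxonomy.tsv`, where the file name stands for whichever exported taxonomy table is being checked.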
1. Create output directory for FastQC results
mkdir -p ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_fruity_test_Redemux_250312/QC/fastqc_results
2. Run FastQC on the raw data (forward and reverse)
fastqc /home/svecka/FRESH/FRESH_ITS/UP-01H/UP-01H_raw_data/forward.fastq.gz \
  /home/svecka/FRESH/FRESH_ITS/UP-01H/UP-01H_raw_data/reverse.fastq.gz \
  -o ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_fruity_test_Redemux_250312/QC/fastqc_results
3. Generate MultiQC report
multiqc ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_fruity_test_Redemux_250312/QC/fastqc_results \
  -o ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_fruity_test_Redemux_250312/QC/multiqc_report
4. Import the raw data into Qiime2
Note: the output path was corrected manually compared with the one generated by the chatbot.
Double-check the paths in the preceding code.
qiime tools import \
  --type 'MultiplexedPairedEndBarcodeInSequence' \
  --input-path /home/svecka/FRESH/FRESH_ITS/UP-01H/UP-01H_raw_data/ \
  --output-path ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_fruity_multiplexed.qza
5. Demultiplex the sequences with cutadapt
qiime cutadapt demux-paired \
  --i-seqs ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_fruity_multiplexed.qza \
  --m-forward-barcodes-file ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_mapping_file.txt \
  --m-forward-barcodes-column ForwardBarcodeSequence \
  --m-reverse-barcodes-file ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_mapping_file.txt \
  --m-reverse-barcodes-column ReverseBarcodeSequence \
  --p-error-rate 0.4 \
  --p-cores 64 \
  --o-per-sample-sequences ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_demultiplexed.qza \
  --o-untrimmed-sequences ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_untrimmed-seqs.qza \
  --verbose
6. Trim the technical sequences with cutadapt (barcodes, linkers, indexes...)
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_demultiplexed.qza \
  --p-cores 64 \
  --p-front-f AGTACAAG AGTGGTCA AAGACGGA GATGAAGAACGYAGYRAA \
  --p-front-r AGTACAAG AGTGGTCA AAGACGGA GATGAAGAACGYAGYRAA \
  --p-adapter-f AGTACAAG AGTGGTCA AAGACGGA GATGAAGAACGYAGYRAA \
  --p-adapter-r AGTACAAG AGTGGTCA AAGACGGA GATGAAGAACGYAGYRAA \
  --p-anywhere-f AGTACAAG AGTGGTCA AAGACGGA GATGAAGAACGYAGYRAA \
  --p-anywhere-r AGTACAAG AGTGGTCA AAGACGGA GATGAAGAACGYAGYRAA \
  --o-trimmed-sequences ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_trimmed_tech_seq.qza
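Step 6 passes the same four sequences to the front, adapter, and anywhere options for both reads. As a side check, and assuming the last sequence (GATGAAGAACGYAGYRAA) is an amplification primer, a read-through copy of it at the 3' end of the opposite read would appear as its reverse complement. The sketch below derives that sequence; `revcomp` is a hypothetical helper (not part of QIIME 2 or cutadapt) and relies on `rev` and `tr` being installed.

```shell
# Hypothetical helper: reverse-complement a DNA sequence,
# including IUPAC ambiguity codes (S, W and N are their own complements).
revcomp() {
    printf '%s\n' "$1" |
        rev |
        tr 'ACGTRYKMBDHVacgtrykmbdhv' 'TGCAYRMKVHDBtgcayrmkvhdb'
}

# Example: revcomp GATGAAGAACGYAGYRAA   prints TTYRCTRCGTTCTTCATC
```

Whether the reverse-complemented form is what the 3' trimming option should receive depends on the library design, so this is only a starting point for checking the trimming step.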
7. Denoise sequences using DADA2
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_trimmed_tech_seq.qza \
  --p-trim-left-f 10 \
  --p-trim-left-r 10 \
  --p-trunc-len-f 300 \
  --p-trunc-len-r 300 \
  --p-n-threads 16 \
  --o-table ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_dada2_table.qza \
  --o-representative-sequences ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_dada2_rep-seqs.qza \
  --o-denoising-stats ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_dada2_stats.qza
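Because q2-dada2 discards reads shorter than the `--p-trunc-len-f` / `--p-trunc-len-r` values, it may help to look at the read lengths that actually reach step 7 after trimming. The following is a minimal sketch; `read_length_counts` is a hypothetical helper, and it assumes standard 4-line FASTQ records in a gzipped file.

```shell
# Hypothetical helper: print "length<TAB>count" for every read length
# found in a gzipped FASTQ file, sorted by length.
# Assumes 4-line FASTQ records (sequence lines are not wrapped).
read_length_counts() {
    gzip -dc "$1" |
        awk 'NR % 4 == 2 { n[length($0)]++ }
             END { for (len in n) printf "%d\t%d\n", len, n[len] }' |
        sort -n
}
```

The per-sample fastq.gz files can be obtained by running `qiime tools export` on the trimmed artifact and then calling the function on one of the exported files.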
8. Cluster the features at 97% identity using VSEARCH
qiime vsearch cluster-features-de-novo \
  --i-table ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_dada2_table.qza \
  --i-sequences ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_dada2_rep-seqs.qza \
  --p-perc-identity 0.97 \
  --p-threads 64 \
  --o-clustered-table ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_clustered-table.qza \
  --o-clustered-sequences ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_clustered-seqs.qza
9. Filter sequences to remove those corresponding to singletons (those with frequency < 2)
qiime feature-table filter-seqs \
  --i-data ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_clustered-seqs.qza \
  --i-table ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_clustered-table.qza \
  --o-filtered-data ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_filtered-seqs.qza
10. Filter the feature table to remove singletons (those with frequency < 2)
qiime feature-table filter-features \
  --i-table ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_clustered-table.qza \
  --p-min-frequency 2 \
  --o-filtered-table ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_filtered-table.qza
11. Classify taxonomy using a pre-trained classifier
qiime feature-classifier classify-sklearn \
  --i-classifier ~/Qiime2_classifier/ITS_classifier/classifier-99.qza \
  --i-reads ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_filtered-seqs.qza \
  --o-classification ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_taxonomy.qza \
  --p-n-jobs 16
12. Export the feature table
qiime tools export \
  --input-path ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_filtered-table.qza \
  --output-path ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_exported-feature-table
13. Export the taxonomy file
qiime tools export \
  --input-path ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_taxonomy.qza \
  --output-path ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_exported-taxonomy
14. Convert the feature table from .biom format to .tsv format
biom convert \
  -i ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_exported-feature-table/feature-table.biom \
  -o ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_exported-feature-table/feature-table.tsv \
  --to-tsv
15. Round counts to whole numbers (remove decimal points) in the exported feature table
awk 'BEGIN {OFS="\t"} {for(i=2; i<=NF; i++) $i=sprintf("%.0f", $i); print}' \
  ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_exported-feature-table/feature-table.tsv \
  > ~/FRESH/ITS_pipeline_test_250123/UP-01H_fruity_test_Redemux_250312/UP-01H_exported-feature-table/feature-table-rounded.tsv
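One caveat with the rounding in step 15: tables written by `biom convert --to-tsv` usually start with `#`-prefixed lines (a comment line and the `#OTU ID` header), and with awk's default whitespace splitting those lines would be rewritten as well. A header-safe variant could look like the sketch below; `round_counts` is a hypothetical helper name, and tab-separated input is assumed.

```shell
# Hypothetical helper: round every count column to a whole number while
# leaving "#"-prefixed comment/header lines untouched.
# Assumes tab-separated input with the feature ID in column 1.
round_counts() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { print; next }     # keep comment and header lines as they are
         { for (i = 2; i <= NF; i++) $i = sprintf("%.0f", $i); print }' "$1"
}
```

It would be used as `round_counts feature-table.tsv > feature-table-rounded.tsv`, with the output redirected to a new file.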