Hi!
I am trying to analyze fungal amplicons, but I am having trouble finding an optimal way to do it.
Primers used: ITS86F/ITS4
MiSeq 2x300
(qiime2-2018.11, conda installation)
In my pipeline I import the sequences, then trim the primers so that, for the shorter amplicons, the non-biological read-through ends are removed. After that I have tried both running ITSxpress and skipping it, and then denoising and merging with dada2 denoise-paired using trunc-len 0, with or without --p-trunc-q 20. Each variation keeps a different number of reads at the different stages (denoising/merging).
The data may contain sequences that do not merge because the amplicon is too long for the reads to overlap. I have not yet found an example of this in my data, but in theory it can happen.
First question: Is there any way to have dada2 denoise-paired also output the reads that did not merge (so that for the longer amplicons the forward read alone could be used)?
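To make the overlap concern concrete, here is a rough sketch of the arithmetic. It assumes both reads keep their full 300 nt and that merging needs at least 12 nt of overlap (the default `minOverlap` of DADA2's `mergePairs` in R; I am assuming the QIIME 2 plugin uses the same value). The function names and the example lengths are made up for illustration.

```python
def expected_overlap(amplicon_len, fwd_len=300, rev_len=300):
    """Expected overlap (nt) between the forward and reverse read.

    A read cannot cover more than the amplicon itself, so each read
    length is capped at amplicon_len. A negative result means there is
    a gap between the two reads.
    """
    covered_fwd = min(fwd_len, amplicon_len)
    covered_rev = min(rev_len, amplicon_len)
    return covered_fwd + covered_rev - amplicon_len


def can_merge(amplicon_len, fwd_len=300, rev_len=300, min_overlap=12):
    """True if the expected overlap reaches the assumed minimum (12 nt)."""
    return expected_overlap(amplicon_len, fwd_len, rev_len) >= min_overlap


if __name__ == "__main__":
    # Hypothetical amplicon lengths, not taken from my data.
    for length in (250, 350, 588, 589, 620):
        print(length, expected_overlap(length), can_merge(length))
```

Under these assumptions, anything longer than 588 nt would fail to merge at full read length, and quality trimming with --p-trunc-q shortens the reads and lowers that limit further.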
When I compare the percentage of reads kept after denoising (see attached file), it looks like some samples lose many sequences during denoising, while others lose them only during merging.
Test_ITSX_cut_denoise.tsv (2.5 KB)
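For reference, this is roughly how such percentages can be computed from an exported denoising stats table. It is only a sketch: the column names (`sample-id`, `input`, `filtered`, `denoised`, `merged`, `non-chimeric`) and the `#q2:types` row are what I would expect from a denoise-paired stats export, and the numbers in `EXAMPLE` are made-up placeholders, not my data.

```python
import csv
import io

# Stage columns assumed to be present in the denoising stats table.
STAGES = ["filtered", "denoised", "merged", "non-chimeric"]

# Placeholder table (hypothetical sample and counts).
EXAMPLE = (
    "sample-id\tinput\tfiltered\tdenoised\tmerged\tnon-chimeric\n"
    "#q2:types\tnumeric\tnumeric\tnumeric\tnumeric\tnumeric\n"
    "S1\t1000\t800\t780\t400\t390\n"
)


def percent_retained(tsv_text):
    """Return {sample: {stage: percent of input reads kept}}."""
    result = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        sample = row["sample-id"]
        if sample.startswith("#"):
            # Skip the '#q2:types' directive row.
            continue
        n_input = int(row["input"])
        result[sample] = {
            stage: round(100 * int(row[stage]) / n_input, 1) if n_input else 0.0
            for stage in STAGES
        }
    return result


if __name__ == "__main__":
    for sample, stages in percent_retained(EXAMPLE).items():
        print(sample, stages)
```

A large drop between `denoised` and `merged` points at the merging step, while a drop between `input` and `filtered` points at quality filtering.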
Sample LB145 merges better when --p-trunc-q 20 is not included, which makes me think that it may contain many longer amplicons that lose the necessary overlap after quality trimming. However, when I look at the taxonomy from the forward-only run (where most reads are kept), the dominant taxon is Wallemia muriae, which has a rather short ITS2 and should overlap well.
https://www.dropbox.com/s/1pg4sfhudszercv/LB145.qza?dl=0
I was hoping that someone with more experience could have a look and help me choose the best strategy, or suggest more parameters or techniques to try.
Thank you in advance!!!
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences seq_test.qza \
  --p-adapter-f CATATCAATAAGCGGAGGA \
  --p-adapter-r TCAAAGATTCGATGATTCAC \
  --p-cores 40 \
  --verbose \
  --o-trimmed-sequences cut_seq_test.qza
cut_seq_test.qzv (297.3 KB)
qiime dada2 denoise-single \
  --i-demultiplexed-seqs cut_seq_test.qza \
  --p-trunc-len 0 \
  --p-n-threads 40 \
  --verbose \
  --output-dir dada2out_cut_Forw
qiime metadata tabulate \
  --m-input-file dada2out_cut_Forw/denoising_stats.qza \
  --o-visualization dada2out_cut_Forw/denoising_stats.qzv
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs cut_seq_test.qza \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --p-n-threads 40 \
  --verbose \
  --output-dir dada2out_cut
qiime metadata tabulate \
  --m-input-file dada2out_cut/denoising_stats.qza \
  --o-visualization dada2out_cut/denoising_stats.qzv
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs cut_seq_test.qza \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --p-trunc-q 20 \
  --p-n-threads 40 \
  --verbose \
  --output-dir dada2out_cut_Q20
qiime metadata tabulate \
  --m-input-file dada2out_cut_Q20/denoising_stats.qza \
  --o-visualization dada2out_cut_Q20/denoising_stats.qzv
qiime itsxpress trim-pair-output-unmerged \
  --i-per-sample-sequences cut_seq_test.qza \
  --p-region ITS2 \
  --p-taxa F \
  --p-threads 50 \
  --verbose \
  --o-trimmed ITSX_seq_test.qza
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs ITSX_seq_test.qza \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --p-n-threads 40 \
  --verbose \
  --output-dir dada2out_ITSX
qiime metadata tabulate \
  --m-input-file dada2out_ITSX/denoising_stats.qza \
  --o-visualization dada2out_ITSX/denoising_stats.qzv
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs ITSX_seq_test.qza \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --p-trunc-q 20 \
  --p-n-threads 40 \
  --verbose \
  --output-dir dada2out_ITSX_Q20
qiime metadata tabulate \
  --m-input-file dada2out_ITSX_Q20/denoising_stats.qza \
  --o-visualization dada2out_ITSX_Q20/denoising_stats.qzv