Trimming fungal demultiplexed paired-end sequences before dada2

einamart · October 31, 2018, 2:27pm

Thank you for your suggestions!

I still found the exact bases of the forward primer and the complement of the reverse primer 2 times (both in the same 2 sequences), and the reverse primer and the complement of the forward primer 1 time (together in one sequence) in my rep-seqs. However, after using ITSxpress, no primers or complements were found!

I ran ITSxpress and the downstream analyses on 5 samples, and compared the results to cutadapt with --p-anywhere. ITSxpress followed by dada2 resulted in 13 features and a total frequency of 81,583, compared to 688 features and a total frequency of 143,645 from the cutadapt method. The taxonomy from ITSxpress was in most cases classified at least to genus level. Cutadapt yielded a lot of features classified only as Fungi at kingdom level (almost none from ITSxpress), and additionally gave some fungal classes that did not appear with the ITSxpress method. I am somewhat confused: why is there such a difference in the number of features and the total frequency? And should I be able to reveal more from the ITSxpress method?

We are actually using degenerate primers, but that won't change anything, I guess? Increasing the error rate to 0.5 resulted in fewer of these “almost-primers”, but some are still there. ITSxpress did not leave any “almost-primers”.
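For checking rep-seqs for leftover primers when the primers are degenerate, a minimal Python sketch could look like the one below. It expands IUPAC codes into a regular expression and counts how many sequences contain the primer or its reverse complement (assuming "complement" above means the reverse complement). The function names are made up for this illustration, the primer in the usage example is a placeholder rather than one of the actual primers, and only exact matches are counted, so "almost-primers" with mismatches would not be found this way.

```python
import re

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement of every IUPAC code (e.g. R = A/G pairs with Y = T/C).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement, keeping degenerate codes degenerate."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Compile a regex that matches every expansion of a degenerate primer."""
    parts = []
    for base in primer.upper():
        choices = IUPAC[base]
        parts.append(choices if len(choices) == 1 else "[" + choices + "]")
    return re.compile("".join(parts))


def count_primer_hits(sequences, primer):
    """Count sequences containing the primer or its reverse complement."""
    forward = primer_regex(primer)
    reverse = primer_regex(reverse_complement(primer))
    hits = {"primer": 0, "reverse_complement": 0}
    for seq in sequences:
        seq = seq.upper()
        if forward.search(seq):
            hits["primer"] += 1
        if reverse.search(seq):
            hits["reverse_complement"] += 1
    return hits
```

Usage would be something like `count_primer_hits(rep_seqs, "GARTC")`, where `rep_seqs` is a list of sequence strings read from the exported rep-seqs FASTA and `"GARTC"` stands in for the real primer.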

Regarding ITSxpress – using the following on 6 samples (the 5 from above + 1 more):

qiime itsxpress trim-pair-output-unmerged \
  --i-per-sample-sequences import.qza \
  --p-region ITS1 \
  --p-taxa F \
  --p-cluster-id 1.0 \
  --p-threads 2 \
  --o-trimmed trimmed_data_itsxpress.qza

I get the following error:

Filename_R2.fastq.gz is not a(n) FastqGzFormat file
Missing sequence for record beginning on line 17

I have read this topic:

It suggests using cutadapt to remove 0-length reads, but I don't understand how to find the read in question. Any suggestions?
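One way to locate such reads is a short Python sketch like the following. It assumes the standard 4-line FASTQ layout (header, sequence, "+", quality) and reports the line number on which each record with an empty sequence begins, which should correspond to the "line 17" in the error above. The function names are hypothetical, and the fastq.gz file would first have to be available outside the .qza artifact.

```python
import gzip
from itertools import zip_longest


def empty_sequence_records(lines):
    """Yield (first_line_number, header) for records whose sequence is empty.

    Assumes each record occupies exactly four lines.
    """
    it = iter(lines)
    # Passing the same iterator four times groups the lines into records.
    records = zip_longest(it, it, it, it, fillvalue="")
    for index, (header, sequence, _plus, _quality) in enumerate(records):
        if not sequence.strip():
            yield 4 * index + 1, header.strip()


def scan_fastq_gz(path):
    """Return all empty-sequence records found in a gzipped FASTQ file."""
    with gzip.open(path, "rt") as handle:
        return list(empty_sequence_records(handle))
```

Calling `scan_fastq_gz("Filename_R2.fastq.gz")`, with the file name taken from the error message, would list the headers of the 0-length reads so they can be looked up in the matching R1 file as well.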