Thank you for your suggestions!
I still found exact matches to the forward primer and to the complement of the reverse primer twice (both in the same 2 sequences), and to the reverse primer and the complement of the forward primer once (in the same sequence) in my rep-seqs. With ITSxpress, however, no primers or complements were found!
I ran ITSxpress and the downstream analyses on 5 samples, and compared the results to cutadapt with --p-anywhere. ITSxpress followed by DADA2 resulted in 13 features and a total frequency of 81,583, compared to 688 features and a total frequency of 143,645 from the cutadapt method. The taxonomy from ITSxpress was in most cases classified at least to genus level. Cutadapt yielded a lot of features classified only as Fungi at kingdom level (almost none from ITSxpress), and additionally gave some fungal classes that do not appear with the ITSxpress method. I am somewhat confused: why is there such a difference in the number of features and in total frequency? And should I be able to recover more from the ITSxpress method?
We are actually using degenerate primers, but I guess that won't change anything? Increasing the error rate to 0.5 resulted in fewer of these “almost-primers”, but some are still there. ITSxpress did not give any “almost-primers”.
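To make clear what I mean by searching for the primers, here is a minimal sketch of the kind of check I am doing on the rep-seqs. It expands IUPAC degenerate bases into a regular expression and looks for the primer and its reverse complement. The function names and the tiny example primer are made up for illustration (they are not our real primers), and unlike cutadapt it only finds exact matches to the degenerate pattern, with no mismatches allowed.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement table that also handles degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a (possibly degenerate) sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Compile a regex matching every sequence the degenerate primer encodes."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def find_primer_hits(primer, sequences):
    """List (sequence id, orientation, start position) for each exact hit.

    `sequences` is a dict of id -> sequence string (hypothetical input format).
    """
    patterns = {
        "forward": primer_regex(primer),
        "revcomp": primer_regex(reverse_complement(primer)),
    }
    hits = []
    for seq_id, seq in sequences.items():
        for label, pattern in patterns.items():
            for match in pattern.finditer(seq.upper()):
                hits.append((seq_id, label, match.start()))
    return hits


if __name__ == "__main__":
    # Made-up primer and sequences, for illustration only.
    example = {"s1": "TTACGATT", "s2": "GGTCGTGG"}
    for hit in find_primer_hits("ACGR", example):
        print(hit)
```

Running the same function once per primer should show which rep-seqs still carry a primer in either orientation.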
Regarding ITSxpress: when I run the following on 6 samples (the 5 from above + 1 more):
qiime itsxpress trim-pair-output-unmerged \
  --i-per-sample-sequences import.qza \
  --p-region ITS1 \
  --p-taxa F \
  --p-cluster-id 1.0 \
  --p-threads 2 \
  --o-trimmed trimmed_data_itsxpress.qza
I get the following error:
Filename_R2.fastq.gz is not a(n) FastqGzFormat file
Missing sequence for record beginning on line 17
I have read this topic:
which suggests using cutadapt to remove 0-length reads, but I don't understand how to find the read in question. Any suggestions?
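My best guess at how to locate it would be something like the sketch below, which scans a gzipped FASTQ file and reports every record whose sequence line is empty, together with the line number where that record starts. It assumes plain four-line FASTQ records (no wrapped sequences), the function name is my own, and I am not sure this is the recommended way.

```python
import gzip


def find_empty_records(path):
    """Return (starting line number, header) for each record with an empty sequence.

    Assumes strict four-line FASTQ records: header, sequence, '+', quality.
    """
    empty = []
    with gzip.open(path, "rt") as handle:
        lines_read = 0
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            sequence = handle.readline()
            handle.readline()  # '+' separator line
            handle.readline()  # quality line
            if not sequence.strip():
                # Line numbers are 1-based, like in the error message.
                empty.append((lines_read + 1, header.strip()))
            lines_read += 4
    return empty


if __name__ == "__main__":
    # Filename taken from the error message above.
    for line_number, header in find_empty_records("Filename_R2.fastq.gz"):
        print(line_number, header)
```

If the file follows the four-line layout, I would expect this to report the record starting on line 17 mentioned in the error.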