Hello, I am a new user of QIIME2 as well as the forums, and I apologize if I am posting this in the wrong section.
I am analyzing fungal ITS1 data that were amplified with the BITS/B58S3 primers. I included a mock community from Bakker et al. 2018 (staggered B mixture). Sequences were acquired on an Illumina 2x250 run.
The fastq files were already demultiplexed and the primers/adapters already trimmed off. Before importing into QIIME2, I ran the sequences through the DADA2 pipeline, mainly following the DADA2 ITS processing tutorial, which uses cutadapt to remove primers and read-throughs.
All sanity checks passed. However, when I assigned taxonomy, my mock community looked nothing like expected: the most abundant taxon is not identified at all. I trained my own classifier using the most recent UNITE database release (dynamic developer files).
There are 16720 reads in the mock community and my negatives have around 50-100 reads, so I do not think it failed to amplify. Also, the species in my mock do not appear in my negatives, so it does not look like contamination (?).
The only thing that seemed odd in the analysis was that when I searched for primer hits before running cutadapt, there were hits only for: a) the reverse complement of the forward primer in the reverse reads, and b) the reverse complement of the reverse primer in the forward reads. I thought this was due to read-through.
However, while troubleshooting I ran a subset of my data through the same analysis and despite having only a fraction of the files, I got MORE primer hits before cutadapt.
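For reference, here is a minimal Python sketch of the kind of primer-hit check I mean. It is not the DADA2 tutorial code (which is in R), just an illustration of the idea: count how many reads contain a primer in each of its four orientations. The primers and reads below are made-up toy sequences (not BITS/B58S3 and not my data), and the function names are my own.

```python
import re

# IUPAC degenerate bases expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also handles degenerate codes (e.g. R <-> Y).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def orientations(primer):
    """Return the primer in its four possible orientations."""
    comp = primer.translate(COMPLEMENT)
    return {
        "Forward": primer,
        "Complement": comp,
        "Reverse": primer[::-1],
        "RevComp": comp[::-1],
    }


def count_hits(primer, reads):
    """Count reads containing the primer, separately for each orientation."""
    hits = {}
    for name, seq in orientations(primer).items():
        pattern = re.compile("".join(IUPAC[base] for base in seq))
        hits[name] = sum(1 for read in reads if pattern.search(read))
    return hits


# Hypothetical toy primers and reads, for illustration only.
FWD = "ACGTRGGA"
REV = "TTGYCAAG"
# The first read in each list reads through into the opposite primer.
forward_reads = ["GGGTTTAAACCCCTTGGCAA", "GGGTTTAAACCCGGGAAA"]
reverse_reads = ["CCCGGGTTTAAATCCTACGT", "CCCGGGTTTAAACCC"]

for label, primer, reads in [
    ("FWD in forward reads", FWD, forward_reads),
    ("FWD in reverse reads", FWD, reverse_reads),
    ("REV in forward reads", REV, forward_reads),
    ("REV in reverse reads", REV, reverse_reads),
]:
    print(label, count_hits(primer, reads))
```

In this toy case the only hits are the RevComp of the forward primer in the reverse reads and the RevComp of the reverse primer in the forward reads, which is the same pattern I saw. Note that the counts are raw numbers of reads, so they depend on which reads (and how many) are passed in.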
Can anyone help me understand:
a) why my mock community results might be so far from expected, and
b) why I would see more hits for my primers in a smaller subset of data, and whether this could have anything to do with my mock community taxonomy issues?
I would be extremely grateful for any guidance.
taxa-bar-plots.qzv (484.4 KB) rep-seqs.qzv (284.4 KB)