Dear QIIME developers! First, thank you for the amazing job you are doing here at the forum.
Background: I am a recently graduated medical doctor who started my PhD project in mid-September. Bioinformatics and QIIME are completely new to me, but hopefully my questions aren't too basic. Our data consist of human samples, and we aim to examine fungal DNA. We amplified the ITS region using the primers ITS1-30F (GTCCCTGCCCTTTGTACACA) and ITS1-217R (TTTCGCTGCGTTCTTCATCG). Sequencing was 250-bp paired-end on an Illumina MiSeq, and the reads were already demultiplexed when I received them. The current analyses are based on a subset of 6 samples.
First issue: We want to use dada2, and it seems essential to trim our primers before running dada2. We know that each of our primers is 20 bp long, so why can't we just use --p-trim-left-f 20 --p-trim-left-r 20 in qiime dada2 denoise-paired?
Second issue: We are also concerned about the “read-through” issue: with short ITS amplicons, the forward read can run into the reverse primer, so the reverse complement of the reverse primer shows up in the forward read (and likewise for the reverse read). We therefore wanted to use qiime cutadapt to remove both primers and both reverse complements. Our suggestion:
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences import.qza \
  --p-adapter-f CGATGAAGAACGCAGCGAAA \
  --p-front-f GTCCCTGCCCTTTGTACACA \
  --p-adapter-r TGTGTACAAAGGGCAGGGAC \
  --p-front-r TTTCGCTGCGTTCTTCATCG \
  --o-trimmed-sequences trimmed_paired_end_cutadapt.qza
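As a sanity check on the sequences in this command, here is a minimal Python sketch that derives the reverse complement of each primer and shows, on a toy read, why trimming a fixed 20 bases would not handle read-through. The helper name `reverse_complement` and the 30-bp `toy_insert` are made up for illustration; only the two primer sequences come from our data.

```python
# Primer sequences as given above (5'->3').
FWD_PRIMER = "GTCCCTGCCCTTTGTACACA"  # ITS1-30F
REV_PRIMER = "TTTCGCTGCGTTCTTCATCG"  # ITS1-217R

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous (A/C/G/T) DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Sequences expected at the 3' end of a read when it runs through the amplicon.
ADAPTER_F = reverse_complement(REV_PRIMER)  # seen in forward reads
ADAPTER_R = reverse_complement(FWD_PRIMER)  # seen in reverse reads

# Toy forward read from a short amplicon (hypothetical insert, not real data):
# forward primer, then the insert, then the read-through into the reverse primer.
toy_insert = "ACGTTGCAAGCTTAGGCTAACCGGTTAAGC"
forward_read = FWD_PRIMER + toy_insert + ADAPTER_F

# Removing a fixed 20 bases from the 5' end strips the forward primer only.
fixed_trim = forward_read[len(FWD_PRIMER):]

print("adapter-f:", ADAPTER_F)
print("adapter-r:", ADAPTER_R)
print("3' primer still present after fixed trim:", fixed_trim.endswith(ADAPTER_F))
```

The two derived values agree with the --p-adapter-f and --p-adapter-r arguments in our command, so at least the sequences themselves appear to be correct.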
Searching the FASTQ files for the primer sequences and their reverse complements with BBEdit gives several hits, and most of them are removed by cutadapt. Still, 1 or 2 hits appear in the representative sequences, even though I tried every combination of the ^ and $ anchors. Do you have any explanation for this, or any advice? We are also aware of the Trimmomatic tool. Do you have any experience with it, and would it be better suited to this task?
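To make the search reproducible outside a text editor, a small Python sketch along the following lines could report where each exact hit sits in a sequence. This is only an illustration: `find_primer_hits` and `toy_seq` are hypothetical, the search is for exact matches only, and real input would be the exported representative sequences rather than a hand-built string.

```python
FWD_PRIMER = "GTCCCTGCCCTTTGTACACA"  # ITS1-30F
REV_PRIMER = "TTTCGCTGCGTTCTTCATCG"  # ITS1-217R

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# The four sequences searched for: both primers and both reverse complements.
TARGETS = {
    "fwd": FWD_PRIMER,
    "fwd_rc": reverse_complement(FWD_PRIMER),
    "rev": REV_PRIMER,
    "rev_rc": reverse_complement(REV_PRIMER),
}


def find_primer_hits(seq: str) -> list[tuple[int, str]]:
    """Return (0-based position, target name) for every exact hit, sorted by position."""
    hits = []
    for name, target in TARGETS.items():
        start = seq.find(target)
        while start != -1:
            hits.append((start, name))
            start = seq.find(target, start + 1)
    return sorted(hits)


# Toy sequence (made up): 4 extra bases in front of the forward primer,
# and the reverse complement of the reverse primer at the very end.
toy_seq = "AAAA" + FWD_PRIMER + "CCCC" + reverse_complement(REV_PRIMER)

for position, name in find_primer_hits(toy_seq):
    if position == 0:
        where = "at the start"
    elif position + len(TARGETS[name]) == len(toy_seq):
        where = "at the end"
    else:
        where = "internal"
    print(f"{name}: position {position} ({where})")
```

If the remaining hits turn out to be internal rather than at the very start or end, that might explain why the anchored (^ or $) variants do not remove them, but I am not sure this is what happens in our data.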
Third issue: Our representative sequences often start with bases very similar to our primers, e.g. 15 of 20 bases are exactly the same, followed by some minor differences. The same applies to the reverse complements. Is this some kind of sequencing error? Is it possible to fix this using the error rate in cutadapt, or is there any other method to remove these bases (if they should be removed)? Does cutadapt accept any form of variation in the primers, and if so, how do I implement it?
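To put numbers on the "15 of 20" case, here is a rough Python sketch that counts mismatches between a primer and the start of a sequence and compares that with the number of errors a given error rate would tolerate. It rests on two assumptions that I would be glad to have confirmed: that cutadapt's default maximum error rate is 0.1, and that the allowed number of errors is the primer length times the rate, rounded down. It also counts substitutions only, whereas cutadapt may additionally consider insertions and deletions. The sequence `toy_rep_seq` is invented to mimic what we see.

```python
import math

FWD_PRIMER = "GTCCCTGCCCTTTGTACACA"  # ITS1-30F


def count_mismatches(primer: str, seq: str) -> int:
    """Count substitutions between the primer and the first len(primer) bases of seq."""
    prefix = seq[:len(primer)]
    if len(prefix) < len(primer):
        raise ValueError("sequence is shorter than the primer")
    return sum(a != b for a, b in zip(primer, prefix))


def allowed_errors(primer_length: int, error_rate: float) -> int:
    """Errors tolerated for a full-length match (assumed rule: floor(length * rate))."""
    return math.floor(primer_length * error_rate)


# Invented example: first 15 bases identical to the primer, last 5 differ.
toy_rep_seq = "GTCCCTGCCCTTTGTCGTGT" + "GGAAGTAAAAGT"

mismatches = count_mismatches(FWD_PRIMER, toy_rep_seq)
needed_rate = mismatches / len(FWD_PRIMER)

print("mismatches:", mismatches)
print("allowed at assumed default rate 0.1:", allowed_errors(len(FWD_PRIMER), 0.1))
print("minimum error rate needed:", needed_rate)
```

If these assumptions hold, a primer-like start with 5 mismatches would be well outside what a rate of 0.1 tolerates, and the rate would have to be raised to about 0.25 to trim it, which seems high enough to risk removing real sequence.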
Thanks!