I used these 2 commands:
cutadapt -g CCTACGGGNGGCWGCAG -G GACTACHVGGGTATCTAATCC -a CTGCWGCCNCCCGTAGG -A GGATTAGATACCCBDGTAGTC -Q 0 --discard-untrimmed --pair-filter=any -m 50 -j 30 -o cleaned_4_20/test1_r1 -p cleaned_4_20/test1_r2 r1.fastq.gz r2.fastq.gz
cutadapt -a ^CCTACGGGNGGCWGCAG...GGATTAGATACCCBDGTAGTC -A ^GACTACHVGGGTATCTAATCC...CTGCWGCCNCCCGTAGG -Q 0 --discard-untrimmed --pair-filter=any -m 50 -j 30 -o cleaned_4_20/r1_trimmed_prova2.fastq.gz -p cleaned_4_20/r2_trimmed_prova2.fastq.gz r1.fastq.gz r2.fastq.gz
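To see how the 3' sequences in these two commands relate to the primers, here is a minimal Python sketch that reverse-complements both primers using the standard IUPAC complement table. The names `COMPLEMENT`, `reverse_complement`, `FWD` and `REV` are just illustrative; nothing here is part of cutadapt.

```python
# Standard IUPAC nucleotide complements (ambiguity codes included).
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "S": "S", "W": "W", "K": "M", "M": "K",
    "B": "V", "V": "B", "D": "H", "H": "D", "N": "N",
}


def reverse_complement(seq):
    """Return the reverse complement of an IUPAC nucleotide sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


# Primers as given in the commands above.
FWD = "CCTACGGGNGGCWGCAG"
REV = "GACTACHVGGGTATCTAATCC"

print("RC of forward primer:", reverse_complement(FWD))
print("RC of reverse primer:", reverse_complement(REV))
```

By this check, the linked adapters in the second command pair each primer with the reverse complement of the opposite primer (forward primer plus RC of the reverse primer on R1, and the other way round on R2). In the first command, `-a` is the reverse complement of the forward primer and `-A` is the reverse complement of the reverse primer, which looks swapped relative to the second command, so that may be worth double-checking.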
My considerations are:
Assuming that read-through always happens, I think the 2nd command is more correct. But read-through does not always happen, and for this reason the 1st command should also be fine. Am I missing something?
What would you use and why?
Also, is primer removal mandatory? I've read that with DADA2 it can cause problems if ambiguous characters are present, but in my case there shouldn't be any ambiguous characters in my reads.
Thank you for your time and suggestions.
Yes. It is not only about the presence of ambiguous IUPAC bases, but also the fact that those sequences are still primer sequences that have been incorporated into the PCR / sequencing product. Thus, they are not actually sequences from the target organism. Remember that PCR primers are "leaky": they can still bind to and amplify a template despite a few mismatches, and those mismatches are then masked by the incorporated primer sequence.
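As a rough illustration of the point about primer bases sitting at the start of each read, here is a small Python sketch that counts how many leading read bases disagree with a degenerate primer. The `IUPAC` table is the standard one; the function name `primer_mismatches` and the example read are made up for illustration and are not taken from any real data or tool.

```python
# Standard IUPAC codes mapped to the concrete bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_mismatches(primer, read):
    """Count positions at the 5' end of `read` not allowed by `primer`."""
    prefix = read[:len(primer)].upper()
    if len(prefix) < len(primer):
        raise ValueError("read is shorter than the primer")
    return sum(base not in IUPAC[code]
               for code, base in zip(primer.upper(), prefix))


FWD = "CCTACGGGNGGCWGCAG"

# Hypothetical read: one concrete realisation of the primer, then some insert.
read = "CCTACGGGAGGCAGCAG" + "TGGGGAATATTG"

print(primer_mismatches(FWD, read))
```

The read itself contains only A/C/G/T, yet its first 17 bases come from the primer rather than from the template, which is why they are usually trimmed even when no ambiguous characters appear in the reads.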
Also, you normally do not need to add the adapter sequence to the cutadapt command unless you are getting substantial read-through. You should be able to get away with simply:
Looks okay to me. You can probably omit -Q. All the steps you've outlined can be done within QIIME 2. Check out the tutorials. Nothing should change within QIIME 2, as many of the steps use generally accepted approaches and wrap the tools in question.