For my sequencing I used the same set of primers and ran into the same situation, with sequences starting with CCTACGGG and NACTAC.
But when I contacted the sequencing company to confirm the details of the data (whether they had removed barcodes, etc.), they told me that the trimming in DADA2 that I did following the post is not necessary. They said those bases are part of the sequences, so they shouldn't be excluded.
So, should I trim or not??
Thanks in advance!
Hew
Details:
I am using QIIME 2 2020.2 in Virtualbox.
For trimming:
Can you share the quality plot with us from the sequences?
Personally, instead of trim-left/right, I prefer to trim primers with a tool like cutadapt, as discussed here.
We won't know whether you need to truncate to a specific length without seeing the quality plots. But if the data was generated on the NovaSeq platform, you may not need to trim, thanks to the improved quality of the reads. If that is the case, I advise reading up on the limitations of using DADA2 with NovaSeq data, as discussed here and in the hyperlinks in that post. Having said that, the conclusion of the GitHub thread is that it should be fine but to be wary; hopefully this will be confirmed or fixed officially soon.
Well, this is always going to be a bit of a judgement call based on the number of sequences we'd like to retain and the length of the amplicon, but the quality of the reads does look pretty good.
I'd definitely start by removing primers with cutadapt, using the --p-front-f and --p-front-r options. I also like to use the --p-discard-untrimmed option (but if your primers have already been removed by the sequencing facility, you will lose all of your reads doing this). Then remake the quality graph and inspect it.
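Before turning on --p-discard-untrimmed, it can help to check how many reads actually still begin with the primer. Here is a rough sketch of that check in plain Python. The function names are made up for illustration, the primer strings are only the prefixes quoted in the question (substitute your full primer sequences), and treating an N in the read as a match is my assumption about how to handle no-calls.

```python
# IUPAC nucleotide codes mapped to the bases they can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def starts_with_primer(read, primer):
    """Return True if the read begins with the primer.

    Degenerate IUPAC codes in the primer are honoured, and an N in the
    read (a no-call) is treated as matching any primer position.
    """
    if len(read) < len(primer):
        return False
    return all(
        base == "N" or base in IUPAC[code]
        for base, code in zip(read.upper(), primer.upper())
    )


def fraction_with_primer(reads, primer):
    """Fraction of reads (0.0 to 1.0) that start with the primer."""
    reads = list(reads)
    if not reads:
        return 0.0
    return sum(starts_with_primer(r, primer) for r in reads) / len(reads)


# Hypothetical forward reads, for illustration only.
example_reads = ["CCTACGGGAGGCAGCAG", "CCTACGGGTGGCTGCAG", "TTGACCAAGGTTCCAAG"]
print(fraction_with_primer(example_reads, "CCTACGGG"))
```

If the fraction is close to 1, the primers are still there and discarding untrimmed reads should be safe. If it is close to 0, the facility has probably removed them already and that option would throw away nearly everything.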
Using the tutorial data from Moving Pictures, you can see all sorts of statistics below, which can be used to guesstimate how many sequences will be retained. Your data does look really good, so removing 0-10 bp from the forward reads and 5-20 bp from the reverse reads looks good, but use your judgement based on the new quality graphs.
If you don't want to use cutadapt, then remove your primer length from the front of each read, work out how many bp you are able to trim (you need 12 overlapping bases for DADA2 by default), and don't remove more than that combined. I'd probably remove a bit from the reverse reads, as the quality of the last few bases is poor, and leave the rest: the quality doesn't improve between ~245 and ~180, and truncating to ~180 would be far too short.
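The overlap arithmetic can be sketched like this. The helper names are made up, and the 460 bp amplicon and 250 bp read lengths are placeholder numbers, not values from your run. All lengths have to be measured the same way, either all with primers or all without.

```python
def overlap_after_truncation(amplicon_len, trunc_len_f, trunc_len_r):
    """Bases shared by the forward and reverse reads after truncation.

    The forward read covers the first trunc_len_f bases of the amplicon and
    the reverse read covers the last trunc_len_r bases, so whatever exceeds
    the amplicon length is the overlap. A negative result means a gap.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


def max_total_truncation(amplicon_len, read_len_f, read_len_r, min_overlap=12):
    """Most bases that can be cut from the 3' ends of both reads combined
    while still keeping min_overlap bases of overlap (12 as stated above)."""
    return read_len_f + read_len_r - amplicon_len - min_overlap


# Placeholder numbers: 2 x 250 bp reads over a 460 bp amplicon.
print(max_total_truncation(460, 250, 250))      # 28
print(overlap_after_truncation(460, 250, 230))  # 20
```

With these placeholder numbers only 28 bases in total can come off the two reads, which is why truncating the reverse reads down to ~180 would leave no overlap at all.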