Yes. Normally (when read orientation is consistent), truncation can be used with extract-reads so that the reference sequences cover exactly the same site and length as the query. For example:
imaginary query:
ACGTACGTACGTACGTACGTACGTACGTcccccccccccccccc
imaginary reference:
gggggggggggggggACGTACGTACGTACGTACGTACGTACGTccccccccccccccccgggggggggggggg
(lowercase "g"s == primer sites)
(lowercase "c"s == part of amplicon sequence that corresponds to region that was truncated from query by dada2)
(uppercase "ACGT"s == part of amplicon that appears in truncated forward reads/reference sequences)
Using extract-reads on that reference would yield an exact match (in real life, it would at least cover the same region under the same conditions, enabling exact or close matches between query and reference):
truncated query: ACGTACGTACGTACGTACGTACGTACGT
truncated refseq: ACGTACGTACGTACGTACGTACGTACGT
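A minimal Python sketch of this consistent-orientation case, using the imaginary sequences above (uppercased, with the G runs standing in for primer sites). The helper `extract_amplicon` and the truncation length are hypothetical and only mimic the idea behind extract-reads, not its implementation: cut out the region between the primer sites, then truncate query and reference to the same length.

```python
# Imaginary sequences from the example above, uppercased.
# The G runs stand in for the primer sites; AMPLICON is the region between them.
FWD_PRIMER_SITE = "G" * 15
REV_PRIMER_SITE = "G" * 14
AMPLICON = "ACGT" * 7 + "C" * 16

reference = FWD_PRIMER_SITE + AMPLICON + REV_PRIMER_SITE
query = AMPLICON  # a read spanning the whole amplicon, primers already removed

TRUNC_LEN = 28  # hypothetical truncation length applied to the query


def extract_amplicon(ref: str, fwd_site: str, rev_site: str) -> str:
    """Hypothetical helper: return the region between the two primer sites."""
    start = ref.index(fwd_site) + len(fwd_site)
    end = ref.rindex(rev_site)
    return ref[start:end]


# Truncate the query and the extracted reference to the same length.
truncated_query = query[:TRUNC_LEN]
truncated_ref = extract_amplicon(reference, FWD_PRIMER_SITE, REV_PRIMER_SITE)[:TRUNC_LEN]

print(truncated_query)
print(truncated_ref)
print(truncated_query == truncated_ref)
```

This prints the same 28-base `ACGT...` sequence twice, followed by `True`.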
However, when orientations are mixed, you don't know which end of the amplicon your query sequences start from. You will have a mixture of:
ACGTACGTACGTACGTACGTACGTACGTcccccccccccccccc
and its reverse complement:
gggggggggggggggACGTACGTACGTACGTACGTACGTACGT
After truncation (at the 3' end of each read), these yield:
ACGTACGTACGTACGTACGTACGTACGT
and
gggggggggggggggACGTACGTACGTAC
So you are covering different parts of the complete amplicon. Hence, the reference sequences should be left untruncated, so that they span the full amplicon and a read from either end can still hit.
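The same argument as a minimal Python sketch (uppercase sequences; the helper names and truncation length are hypothetical). It reverse-complements the amplicon to mimic a read in the opposite orientation, truncates both reads at their 3' ends, and checks whether each truncated read occurs in a truncated versus a full-length reference. Allowing a match in either orientation is a simplification for this example, not a description of how the classifier works internally.

```python
# Complement table for uppercase DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


AMPLICON = "ACGT" * 7 + "C" * 16  # extracted amplicon (primers removed)
TRUNC_LEN = 28                     # hypothetical truncation length

forward_read = AMPLICON
reverse_read = reverse_complement(AMPLICON)  # same amplicon, opposite orientation

# Truncation keeps the first TRUNC_LEN bases, i.e. it drops the 3' end.
fwd_trunc = forward_read[:TRUNC_LEN]
rev_trunc = reverse_read[:TRUNC_LEN]

truncated_ref = AMPLICON[:TRUNC_LEN]
full_ref = AMPLICON


def covered_by(read: str, ref: str) -> bool:
    """True if the read, in either orientation, occurs within the reference."""
    return read in ref or reverse_complement(read) in ref


for name, read in [("forward", fwd_trunc), ("reverse", rev_trunc)]:
    # Columns: matches truncated reference?, matches full-length reference?
    print(name, covered_by(read, truncated_ref), covered_by(read, full_ref))
```

This prints `forward True True` and `reverse False True`: the reverse-oriented read comes from the other end of the amplicon, so only the untruncated reference covers it.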
So I should not be using trunc-len at all for the extract-reads command?
Correct
--p-trunc-len 250 \ #Remove this?
Yes, remove
--p-min-length 100 \ #Keep this?
--p-max-length 400 \ #Also keep this?
Keep both, but adjust them to the expected ranges (incorrectly set ranges could also cause unassignments). Check the literature for the expected amplicon lengths. Setting broad limits probably does not hurt: these are really just safeguards, since some primer set and reference database combinations occasionally yield spurious hits that cause issues during classification, and unusually short or long amplicons are a good indicator of such hits.
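To illustrate the safeguard idea only, here is a minimal Python sketch of a length filter that keeps sequences inside a range and flags the rest as possible spurious hits. It is not how extract-reads implements `--p-min-length`/`--p-max-length`; the function name, the inclusive bounds, and the example lengths are assumptions, and the 100/400 values simply mirror the command fragment above.

```python
from __future__ import annotations


def split_by_length(
    seqs: dict[str, str], min_len: int, max_len: int
) -> tuple[dict[str, str], dict[str, str]]:
    """Split sequences into those within [min_len, max_len] and those outside.

    Inclusive bounds are an assumption made for this sketch.
    """
    kept: dict[str, str] = {}
    flagged: dict[str, str] = {}
    for seq_id, seq in seqs.items():
        if min_len <= len(seq) <= max_len:
            kept[seq_id] = seq
        else:
            flagged[seq_id] = seq  # unusually short or long: possible spurious hit
    return kept, flagged


# Hypothetical extracted amplicons of 50, 250 and 600 bases.
extracted = {"ref1": "A" * 50, "ref2": "A" * 250, "ref3": "A" * 600}
kept, flagged = split_by_length(extracted, min_len=100, max_len=400)
print(sorted(kept))     # ['ref2']
print(sorted(flagged))  # ['ref1', 'ref3']
```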
I got pretty good classification percentages in my other region, but maybe this was just luck?
Which region was good and which was bad? Sometimes it's not luck: results can depend on region, primer, reference database, etc. For example, I've seen issues with V1 primers on some databases because some of the reference sequences might not include the correct forward primer site. The default settings were designed based on benchmarks of different 16S domains as well as ITS, as general "catch-all" settings, but for unusual amplicons (maybe?) or for other marker genes (not your case, since you have 16S, but just saying), these settings may need to be tweaked. Having a mock community to re-optimize settings for novel primer sets is ideal!
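One rough way to check for the V1 issue mentioned above is to list reference sequences that lack the forward primer site, since those may be dropped when extracting reads. The minimal Python sketch below does this with an exact regex match that expands IUPAC degenerate bases; it is stricter than extract-reads' own primer matching, which tolerates some mismatches. The helper names and reference sequences are hypothetical, and 27F is used purely as an example of a V1 forward primer.

```python
from __future__ import annotations

import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_pattern(primer: str) -> re.Pattern:
    """Compile a (possibly degenerate) primer into an exact-match regex."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def missing_primer(refs: dict[str, str], primer: str) -> list[str]:
    """Return IDs of reference sequences with no exact match to the primer."""
    pattern = primer_pattern(primer)
    return sorted(
        seq_id for seq_id, seq in refs.items() if not pattern.search(seq.upper())
    )


# 27F, a commonly used V1 forward primer, used here only as an example.
PRIMER_27F = "AGAGTTTGATCMTGGCTCAG"

# Hypothetical reference sequences.
refs = {
    "ref1": "AGAGTTTGATCATGGCTCAG" + "ACGT" * 5,  # primer present (M = A)
    "ref2": "AGAGTTTGATCCTGGCTCAG" + "ACGT" * 5,  # primer present (M = C)
    "ref3": "ACGT" * 10,                          # primer site missing
}
print(missing_primer(refs, PRIMER_27F))  # ['ref3']
```

On a real database, a large fraction of IDs returned here would be a hint that the reference sequences were trimmed past the primer site.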