Trim sequences before using Greengenes2 database?

I'm following the Greengenes2 tutorial in the announcement (I have V4 data). During the DADA2 step I truncated the reads with --p-trunc-len-f 250 and --p-trunc-len-r 230. The announcement says that the majority of ASVs are 90 nt, 100 nt or 150 nt.

Should I go back and "right" trim my fragments to 150nt?
What are the major disadvantages of only using the forward reads?
Thanks

Hi @newberrf,

Use of the reverse read is not inherently better, and I'm unaware of work which demonstrates conclusively that it is. Notably, stitching and the use of the reverse read both increase the potential for error. The taxonomic and phylogenetic benefit of longer ASVs within 16S can be negligible. The value depends on the specific questions being pursued, but it is not unusual for the reverse read from V4 data to have no effect on the biological conclusions of a study.

In Greengenes2, we placed V4 fragments predominantly derived from the EMP 16S protocol. A design consideration of those primers was meaningful taxonomic and phylogenetic signal proximal to the 5' end, such that the signal from the amplicons reasonably approximated that of the full-length gene. An important challenge was that Illumina instruments previously generated shorter reads, with much higher error on the reverse read, so initially the focus was entirely on the forward read. At present, quite a few large-scale projects continue to use only the forward read.

If the intention is to leverage the existing phylogenetic placements in Greengenes2, then it is important to trim consistently with the EMP primers. At least for human studies, you would observe most (not all) ASVs starting with TAC.
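
As a quick sanity check on the point above, here is a minimal Python sketch that reports what fraction of ASVs start with TAC and what lengths they have after right-trimming. It assumes the representative sequences have already been exported to plain FASTA. The function names, the toy sequences and the short target length in the example are hypothetical and only for illustration; for real V4 data the target would be 150 (or 90 or 100). Note that trimming ASVs after denoising can make previously distinct ASVs identical, so this is a diagnostic only and not a substitute for re-running DADA2 on forward reads truncated to the target length.

```python
from collections import Counter
from io import StringIO


def read_fasta(handle):
    """Yield (id, sequence) pairs from a FASTA-formatted text handle."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # Keep only the first whitespace-delimited token as the ID.
            seq_id, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line.upper())
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def summarize_asvs(records, prefix="TAC", target_len=150):
    """Return (fraction starting with prefix, length counts, trimmed seqs).

    Sequences are right-trimmed to at most target_len nucleotides.
    """
    records = list(records)
    n_prefix = sum(1 for _, seq in records if seq.startswith(prefix))
    trimmed = {seq_id: seq[:target_len] for seq_id, seq in records}
    lengths = Counter(len(seq) for seq in trimmed.values())
    frac = n_prefix / len(records) if records else 0.0
    return frac, lengths, trimmed


# Toy input (made-up sequences); replace with open("your_asvs.fasta").
example = StringIO(
    ">asv1\nTACGGAGGATCC\n>asv2\nTACGTAGGGTGC\n>asv3\nGTGCCAGCAGCC\n"
)
frac, lengths, trimmed = summarize_asvs(read_fasta(example), target_len=10)
print(f"{frac:.2f} of ASVs start with {'TAC'}; lengths: {dict(lengths)}")
```

If most ASVs do not start with TAC, that suggests the reads were not trimmed consistently with the EMP primers (for example, primer sequence is still attached), and the ASVs would not match the existing placements directly.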

Alternatively, ASVs (whether stitched or not) could be placed with SEPP into Greengenes2.

All the best,
Daniel