Recently received some low-biomass samples sequenced with a 5R 16S protocol. They've already been de-multiplexed.
I've run them through the qiime2 dada2 workflow all the way through feature table creation and taxonomic assignment with SILVA. Is this something that is appropriate for our type of files? I understand the sequences originally come from variable regions but I'm unsure if it's appropriate to run qiime2 without additional steps.
You might look into q2-Sidle. It adapts the work by Fuks et al., making it faster, more functional, and integrated with QIIME 2. You may need to use an older (2024) version of QIIME 2; a refactor is on my list, but there are half a dozen other persistent tasks constantly in my queue.
A couple of caveats/considerations with the use of Sidle:
If you're planning on using the tree insertion, you need to use either the Silva 128 or the Greengenes 13_8 reference database, because the tips of the tree have to match the sequences in the database. You could also, theoretically, build your own SEPP insertion tree for an alternative database, but people have been trying to figure this out since Silva 128... and here we are. Your other option is simply non-phylogenetic analysis. There are no database restrictions here; anything you can get into QIIME 2 as a reference can be used. A non-phylogenetic analysis can be great in some environments and may or may not be right for you.
Sidle is reference-based, so if the thing you're looking for isn't in your reference, you won't find it. This can be used to your advantage (you skip assembling multiple regions from mitochondria or chloroplasts) or to your disadvantage (you lose novel diversity).
The current protocol requires knowing your primer pair. You could maybe spoof it, but if you do that, you can't assemble a phylogenetic tree.
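To make the tree-insertion caveat concrete, here is a minimal sketch of the kind of sanity check you could run before committing to a phylogenetic analysis: every reference sequence ID should also appear as a tip in the insertion tree. This is plain Python, not part of Sidle or QIIME 2; the function name and the IDs are hypothetical, and it assumes you have already pulled the two ID lists out of your own files.

```python
def missing_from_tree(reference_ids, tree_tip_ids):
    """Return the reference IDs that have no matching tip in the tree."""
    return sorted(set(reference_ids) - set(tree_tip_ids))


# Hypothetical IDs, for illustration only.
reference_ids = ["ref_A", "ref_B", "ref_C"]
tree_tip_ids = ["ref_A", "ref_B", "ref_D"]

missing = missing_from_tree(reference_ids, tree_tip_ids)
if missing:
    print("Reference IDs without a tree tip:", missing)
else:
    print("All reference IDs are present in the tree.")
```

If the check reports missing IDs, the reference and the tree probably come from different database releases, and the non-phylogenetic route may be the safer choice.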