Hello! I am trying to analyze SRA data alongside my own. I have a total of seven 16S rRNA sequencing runs, some with V4 amplicons and the rest with V4-V5, so I am trying to cut off the V5 region at the 806r V4 primer site. When I used cutadapt on the individual reads to remove the V5 region, DADA2 filtered out a far larger proportion of my reads than it did on the uncut reads. Since the reverse reads tapered off in quality quite quickly, I decided to run DADA2 as normal, export the whole amplicon, trim it with cutadapt on the command line, and then import the result back into QIIME2. This seemed to work much better. However, I now need to dereplicate my data, and the representative sequences are still named using the old sequence names. How can I dereplicate the representative sequences after DADA2?
The second, related issue is that some of the SRA runs appear to be completely reverse complemented compared to my data. How can I reverse complement all of the representative sequences? Once I do that, they will still be named after their previous sequences, so I have the same dereplication issue as above. Is there a way to combine the representative sequences and the table in order to export them as SampleData[Sequences] so that I can dereplicate them? Would that even matter, since the dereplication is done by name?
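To make concrete what I am after, here is a rough Python sketch of the two steps (reverse complementing, then collapsing identical sequences and summing their counts). It is only an illustration: the function names and the plain dictionaries are made up, and in practice the sequences and counts would come from the exported FASTA and feature table. Renaming each sequence by its MD5 hash is my assumption about how the IDs should look, based on my understanding that DADA2 in QIIME2 hashes the sequences to make feature IDs by default.

```python
import hashlib
from collections import defaultdict

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def dereplicate(rep_seqs, table):
    """Collapse identical sequences and sum their per-sample counts.

    rep_seqs: {old_feature_id: sequence}            (hypothetical layout)
    table:    {old_feature_id: {sample_id: count}}  (hypothetical layout)

    Returns (new_seqs, new_table), both keyed by the MD5 hash of the
    sequence, so identical sequences end up under a single ID.
    """
    new_seqs = {}
    new_table = defaultdict(lambda: defaultdict(int))
    for old_id, seq in rep_seqs.items():
        new_id = hashlib.md5(seq.encode("ascii")).hexdigest()
        new_seqs[new_id] = seq
        for sample, count in table.get(old_id, {}).items():
            new_table[new_id][sample] += count
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return new_seqs, {fid: dict(counts) for fid, counts in new_table.items()}


if __name__ == "__main__":
    # Toy data: "old1" and "old2" became identical after trimming.
    rep_seqs = {"old1": "ACGTT", "old2": "ACGTT", "old3": "GGCA"}
    table = {"old1": {"S1": 5}, "old2": {"S1": 3, "S2": 2}, "old3": {"S2": 7}}
    seqs, counts = dereplicate(rep_seqs, table)
    for fid, seq in seqs.items():
        print(fid, seq, counts[fid])
```

For the reverse-complemented runs, I would apply `reverse_complement` to every sequence first and then call `dereplicate`, so that the new IDs reflect the corrected orientation.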
Lastly, do you know of any literature or examples where people analyzed different hypervariable regions? I have been having trouble finding anything aside from huge meta-analyses of human microbiomes, which is not what I am going for.
Thank you so much. I really appreciate the support on this forum!