Using vxtractor for paired-end 16S Illumina data

Dear QIIMErs,

I am pretty new to NGS analysis and would like some input from you. I have paired-end 16S Illumina data that were already demultiplexed by the sequencing provider. I want to use vxtractor to analyze the V3-V4 and V6-V7 regions separately, but I am not sure at which step I should do it. Should I first merge the forward and reverse reads (R1.fastq and R2.fastq), convert the merged reads to FASTA, and then pass that FASTA file to vxtractor? (vxtractor only accepts FASTA files.)

Is there any other way to select the V3-V4 region without using vxtractor?

Thank you in advance for any feedback!

Best regards,

Hi @katerina_nik,

If you know the primers for the region you want to use, you can use extract-reads (part of QIIME 2's feature-classifier plugin) to slice out specific regions. This is a little different from what vxtractor does, but might suit your purposes.

Yes, merge first.
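
Once the reads are merged, converting the merged FASTQ to FASTA for vxtractor is a small step. Here is a minimal sketch in Python, assuming standard four-line FASTQ records (no wrapped sequence lines). The function name is hypothetical, and this is only one way to do the conversion:

```python
def fastq_to_fasta(fastq_path, fasta_path):
    """Copy the ID and sequence of each FASTQ record into a FASTA file.

    Assumes four lines per record: @header, sequence, '+', qualities.
    Returns the number of records written.
    """
    count = 0
    with open(fastq_path) as fq, open(fasta_path, "w") as fa:
        while True:
            header = fq.readline().rstrip("\n")
            if not header:
                break  # end of file (or a trailing blank line)
            seq = fq.readline().rstrip("\n")
            fq.readline()  # '+' separator line, not needed in FASTA
            fq.readline()  # quality line, dropped in FASTA
            if not header.startswith("@"):
                raise ValueError("Unexpected FASTQ header: " + header)
            fa.write(">" + header[1:] + "\n" + seq + "\n")
            count += 1
    return count
```

For example, fastq_to_fasta("merged.fastq", "merged.fasta") would write one FASTA record per merged read; both file names are placeholders.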

I hope that helps!

Hi @Nicholas_Bokulich! Thanks for the helpful reply!
I know the primers, but I was thinking that vxtractor might be easier. I merged the sequences and converted them to FASTA. The truth is, the analysis is taking hours!

What do you mean when you say that vxtractor works a bit differently from using primers to extract the sequences? In what way?
Do you think more sequences will be dropped if I use vxtractor?

Thank you once again!

Best regards,

I have never used vxtractor before, so maybe I just don’t understand the inputs. It sounds like it uses HMMs to identify and extract variable regions from 16S, and you request a specific region, not a specific primer. Is that correct?

QIIME 2's extract-reads trims sequences at the primer sites you supply. So depending on where the primers sit, some conserved sequence may still be present in the extracted read. That is probably a minor difference, depending on the goals of your analysis.
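
To make the idea of primer-bounded slicing concrete, here is a toy sketch in Python. It is not how extract-reads is implemented; it only illustrates the general approach of locating a forward primer and the reverse complement of a reverse primer, then keeping what lies between them. The function names, the exact-match rule (IUPAC degeneracy only, no mismatches allowed), and the primers and read in the usage lines are all made up for illustration:

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement table that also covers degenerate bases.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse-complement a (possibly degenerate) DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a degenerate primer into a regex pattern string."""
    return "".join(IUPAC[base] for base in primer.upper())


def extract_region(seq, fwd_primer, rev_primer):
    """Return the sequence between the two primers, or None if either is missing.

    The primers themselves are excluded from the returned slice.
    """
    seq = seq.upper()
    fwd = re.search(primer_regex(fwd_primer), seq)
    if fwd is None:
        return None
    downstream = seq[fwd.end():]
    # The reverse primer anneals to the opposite strand, so search for
    # its reverse complement downstream of the forward primer.
    rev = re.search(primer_regex(reverse_complement(rev_primer)), downstream)
    if rev is None:
        return None
    return downstream[:rev.start()]


# Toy read and toy primers (not real 16S primers).
read = "TTACGTAGGGCCCTTTGTCAAAA"
print(extract_region(read, "ACGWA", "TTGAC"))  # -> GGGCCCTTT
```

Whatever sits between the two primer sites is kept, conserved or not, which is why the result can differ slightly from an approach that targets the variable region boundaries directly.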

As for whether more sequences will be dropped, I am really not sure: I do not know how vxtractor handles low-similarity sequences. That's a question for the vxtractor developers.

If you are doing all of your upstream and downstream analyses in QIIME 2, I would recommend sticking with extract-reads so that you preserve your provenance and streamline your analysis.

I hope that helps!