Only V4 analysis from V3V4 reading

ysh1962 · November 2, 2018, 3:05pm

I have a set of paired-end, demultiplexed FASTQ files containing sequencing reads that cover V3V4 (from the start of V3 to the end of V4, ~465 bp).
How can I analyze only the V4 region (~291 bp)?

ysh1962 · November 2, 2018, 3:05pm

I have a set of demultiplexed Illumina V3V4 FASTQ reads.
Even though the reads cover V3V4 (~465 bp), I want to focus on and analyze only the V4 region, so that I can compare with previous public data.
How should I proceed?
Thanks

Mehrbod_Estaki · November 2, 2018, 6:32pm

Hi @ysh1962,
Comparing different projects that used different primers has been covered on this forum before; see this thread, for example, for the various options discussed.
In short, instead of trimming, you could use the fragment-insertion tool, which lets you keep your reads as they are. Or, if you really do want to trim your V3-V4 reads down to just V4, you can use cutadapt to look for the V4 primers in your V3-V4 reads and trim to those positions.
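To make the trimming idea concrete, here is a minimal Python sketch of what "look for the V4 primer and trim to there" means for a single forward read. It is not cutadapt itself: it only accepts exact matches to the degenerate primer, whereas cutadapt also tolerates a configurable fraction of mismatches. The primer sequence (a commonly used 515F variant) and the function names are assumptions for illustration, so substitute the V4 primers used by the dataset you want to compare against.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Assumption: a 515F-style V4 forward primer; replace with your own.
V4_FORWARD = "GTGYCAGCMGCCGCGGTAA"


def primer_regex(primer):
    """Compile a degenerate primer into a regex (exact matches only)."""
    return re.compile("".join(IUPAC[base] for base in primer))


def trim_to_v4(seq, qual, primer=V4_FORWARD):
    """Remove the primer and everything upstream of it.

    Returns (trimmed_seq, trimmed_qual), or None when the primer is
    not found, so the caller can discard that read.
    """
    match = primer_regex(primer).search(seq)
    if match is None:
        return None
    return seq[match.end():], qual[match.end():]


# Toy read: 20 bp of "V3" sequence, the primer, then 10 bp of "V4".
read = "ACGT" * 5 + "GTGCCAGCAGCCGCGGTAA" + "TACGGAGGGT"
trimmed = trim_to_v4(read, "I" * len(read))
```

Reads in which the primer is not found return `None`; dropping them keeps every retained read starting at the same position.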
Another option is to assign taxonomy to each set of reads separately, collapse them down to genus/species, and then merge them. Personally, I would rank these options: fragment-insertion > trimming with cutadapt > merging at genus level.
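For the last option, the collapsing step can be sketched in a few lines of Python. This is only an illustration of the idea, assuming semicolon-delimited taxonomy strings and plain dictionaries of counts; the function name, feature IDs, and the `__` padding for unassigned ranks are all hypothetical choices, not the behavior of any particular tool.

```python
from collections import defaultdict


def collapse_to_level(counts, taxonomy, level=6):
    """Sum per-feature counts by the first `level` taxonomy ranks.

    counts:   {feature_id: count}
    taxonomy: {feature_id: "k__...; p__...; ..."}
    level=6 corresponds to genus in a 7-rank (kingdom..species) scheme.
    """
    collapsed = defaultdict(int)
    for feature_id, count in counts.items():
        ranks = [r.strip() for r in taxonomy[feature_id].split(";")]
        # Pad features that were not classified down to `level`.
        ranks += ["__"] * (level - len(ranks))
        collapsed[";".join(ranks[:level])] += count
    return dict(collapsed)


counts = {"v4_a": 10, "v3v4_b": 5, "v4_c": 2}
taxonomy = {
    "v4_a": "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
            "f__Lactobacillaceae; g__Lactobacillus; s__reuteri",
    "v3v4_b": "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
              "f__Lactobacillaceae; g__Lactobacillus",
    "v4_c": "k__Bacteria; p__Firmicutes",
}
genus_table = collapse_to_level(counts, taxonomy)
```

Once both datasets are expressed as genus-level labels rather than region-specific sequences, their tables share the same keys and can be merged.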
Hope that helps!

ysh1962 · November 4, 2018, 1:09pm

Dear Mehrbod_Estaki,
I am new to fragment-insertion (I read up on it after your reply).
But I am not looking to do a meta-analysis of datasets covering different amplicons.
What I have are paired-end Illumina reads from V3V4 sequencing. That is all.
Mothur suggests using pcr.seqs on the reference sequences to trim them to V4. How is your cutadapt suggestion different from the Mothur approach?
Thanks

Mehrbod_Estaki · November 4, 2018, 11:13pm

Hi @ysh1962,
I was under the impression that you were trying to compare different projects that used different primers, based on what you said earlier:

Given what you say there, those are two different regions, though they share some overlap. So if you are still interested in comparing data from different targeted regions, see the suggestions above. If you are not comparing your data to others, contrary to what you mentioned earlier, then you don't need to do any trimming at all. You can simply keep your reads as they are; trimming down to only the V4 region would just throw away valuable data.
As for mothur's suggestion to trim V3-V4 sequences to V4 only, I can't say. Do you have a link to this suggestion? Personally, I don't see a reason to take good V3-V4 reads and trim them down to V4 only, unless you are comparing them to other datasets.
