I have a question about a possible workflow for my data.
My paired fastq files are shotgun sequencing data of a gene amplicon, and I want to extract the fasta sequences of all variants present, together with their frequencies. The amplicons were generated by hemi-nested PCR with 2 degenerate forward primers and 1 degenerate reverse primer; the target length was ~422 bp. I am thinking of a pipeline like this:
Import fastq > merge reads (vsearch join-pairs) > remove reads that are not ~422 bp long or that lack primers (extract-reads??) > dereplicate/cluster (dada2?) > visualize (tabulate-seqs?)
Is this a good idea, or is QIIME perhaps not the right tool for this? I would be very grateful for any advice and suggestions.
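To make the goal concrete, here is a minimal Python sketch of the length-filter and dereplication steps applied to already merged reads. It is only an illustration under my own assumptions: the function names (variant_frequencies, to_fasta), the length tolerance and the toy reads are made up for the example. Note that exact dereplication is not the same as denoising with dada2 or deblur, so sequencing errors would show up as separate low-frequency variants.

```python
from collections import Counter

def variant_frequencies(seqs, target_len=422, tolerance=10):
    """Keep reads near the target length, then count identical sequences."""
    kept = [s.upper() for s in seqs if abs(len(s) - target_len) <= tolerance]
    counts = Counter(kept)
    total = sum(counts.values())
    # (sequence, count, relative frequency), most abundant first
    return [(seq, n, n / total) for seq, n in counts.most_common()]

def to_fasta(variants):
    """Format the variants as FASTA, with count and frequency in the header."""
    lines = []
    for i, (seq, n, freq) in enumerate(variants, start=1):
        lines.append(f">variant{i};size={n};freq={freq:.4f}")
        lines.append(seq)
    return "\n".join(lines)

# Toy reads with a short target length so the example stays readable
reads = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "ACG", "acgtacgt"]
variants = variant_frequencies(reads, target_len=8, tolerance=0)
print(to_fasta(variants))
```

For real data the defaults (422 bp with a tolerance of 10 bp) are only a guess on my side and would need to be adjusted.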
Welcome to the forum!
I'm afraid I have never processed shotgun fragments obtained from amplicons, so I am probably raising more questions than answers (and to be honest I am not familiar with the kit you mentioned either).
How long are your sequences? What I cannot figure out is what happens when you stitch together the sequences from the shotgun data. I am not sure you would get back the full amplicon.
My gut feeling is that trying the metaphlan2 plugin (https://library.qiime2.org/plugins/q2-metaphlan2/12/) or the shogun plugin (https://library.qiime2.org/plugins/q2-shogun/15/) within QIIME 2 may be more useful for getting an overview of the data!
I’m curious to see what the general opinion on this is.
I misunderstood the data owner and made a mistake in my previous post. The data are from amplicon sequencing, not shotgun. The PCR products were ligated with barcodes and sequenced 2x300 bp. When I merged the reads, most of them were very close to 422 bp.
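In case it is useful, this kind of length check on merged reads can be done with a few lines of Python. This is only a sketch: the function name length_histogram and the in-memory toy FASTQ are assumptions for illustration, and it assumes the usual four lines per record (no wrapped sequences). For a real file you would pass an open handle instead, for example from gzip.open(path, "rt").

```python
import io
from collections import Counter

def length_histogram(handle):
    """Tally read lengths from a FASTQ handle with 4 lines per record."""
    lengths = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the sequence is the second line of each record
            lengths[len(line.strip())] += 1
    return lengths

# Toy FASTQ held in memory instead of a real merged-reads file
toy_fastq = io.StringIO(
    "@r1\nACGTAC\n+\nIIIIII\n"
    "@r2\nACGTAC\n+\nIIIIII\n"
    "@r3\nACGT\n+\nIIII\n"
)
hist = length_histogram(toy_fastq)
for length, n in sorted(hist.items()):
    print(length, n)
```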
Good for you! It is much simpler this way!
Only one correction on your pipeline then:
If you merge the reads with vsearch join-pairs, do not use dada2 to identify the ASVs; it is designed to work with unmerged data!
If you want to pre-merge the sequences, just replace dada2 with deblur!
deblur denoise-other requires reference sequences, which I don't have. Of course I could download some sequences from GenBank and use them, but I was thinking of some sort of primer-based filtering. Or maybe I should use dada2 without the pre-merge stage.
Sorry, I did not realize your amplicons are not from 16S!
My suggestion would be to avoid the pre-merging then.
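If you still want to try the primer-based filtering you mentioned, here is a minimal Python sketch of the idea for merged reads: expand the degenerate (IUPAC) bases into a regular expression and keep only reads that start with one of the forward primers and end with the reverse complement of the reverse primer. The primer sequences, the reads and the function names are all hypothetical placeholders, and the match is exact (no mismatches or indels allowed), so take it as an illustration rather than a finished tool.

```python
import re

# IUPAC degenerate base codes mapped to regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def revcomp(seq):
    """Reverse complement that also handles degenerate bases."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def primer_regex(primer):
    """Turn a degenerate primer into a regex fragment."""
    return "".join(IUPAC[base] for base in primer.upper())

def build_filter(forward_primers, reverse_primer):
    """Regex for a merged read: forward primer ... revcomp(reverse primer)."""
    fwd = "|".join(primer_regex(p) for p in forward_primers)
    rev = primer_regex(revcomp(reverse_primer))
    return re.compile(f"^(?:{fwd})[ACGTN]*{rev}$")

# Hypothetical primers and toy reads, for illustration only
pattern = build_filter(["ACGTY", "ACGTR"], "GGRTT")
reads = [
    "ACGTCTTTTAACCC",  # first forward primer and reverse primer present
    "ACGTATTTTAATCC",  # second forward primer and reverse primer present
    "TTGTCTTTTAACCC",  # forward primer missing
    "ACGTCTTTTAAGCC",  # reverse primer does not match
]
kept = [r for r in reads if pattern.match(r)]
print(kept)
```

Inside QIIME 2, I believe the q2-cutadapt plugin (trim-paired with --p-discard-untrimmed) covers the same need on the unmerged reads and also tolerates mismatches, so it is probably the better choice before running dada2.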