Recently I have been working with raw sequences generated on a MiSeq in PE300 mode.
The sequence quality is not very good. After denoising, I found some representative sequences that are shorter than the length I expected. I BLASTed them and I am sure they are not my target sequences.
So I want to filter my rep-seqs by length, but I could not find a plugin that provides this function directly (something like --p-min-length and --p-max-length).
I could convert them to an Excel sheet and filter in Microsoft Office, but I would still prefer an easier way that stays within QIIME 2!
Could you implement that function in the next version?
I'm not aware of any QIIME 2 commands for filtering sequences based on length. Depending on the primers used, the amplicon sizes may actually vary quite a bit. If the goal is to remove the non-targeted sequences, you can try the q2-quality-control plugin, which allows you to filter sequences by alignment, say to a reference database to exclude non-bacterial sequences.
Thank you @yanxianl
My sequences actually target a functional gene, so there is no reference database.
Some sequence lengths are far from my target amplicon size, so I am sure they are not my target sequences. I also checked them by BLAST against the NT reference database.
For now I am just trying to filter them using vsearch.
You can make one; it sounds like you already found one! This one:
It would just take a few representative sequences to do the trick, but the more the merrier.
That would do the same exact thing as what @yanxianl is recommending. With vsearch you will still presumably need some reference sequences to align against.
My target amplicon is bacterial amoA (ammonia monooxygenase subunit A). There is actually no well-performing reference dataset for it.
Here is my rep-seqs file, from which only the singletons have been filtered: rep-seqs-nosinglton.qzv (282.3 KB)
My target amplicon size is 452 bp, but as you can see there are still some rep-seqs shorter than 400 bp (some are only 180 bp).
I have assembled my own reference database, but I am not sure it would perform well for assignment. I have to test and adjust it later. I don't want to take the risk of assigning my features and filtering by that, because some real targets may be discarded due to incorrect annotation.
Filtering by rep-seqs length first seems lower risk to me.
Now I use vsearch to filter them with the command: vsearch --fastx_filter rep-seqs-nonsingleton.fasta --fastq_minlen 400 --fastaout file
Sure, that works. I had assumed you were using vsearch to filter by alignment.
I was not recommending that you classify taxonomically, nor is @yanxianl. The q2-quality-control plugin would BLAST your sequences against a set of reference sequences and discard based on alignment quality.
Sounds like you have sorted things out with vsearch, though, and that's fine
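For anyone who does not have vsearch at hand, the same kind of length filter can be sketched in plain Python. This is only a minimal illustration, not a QIIME 2 feature: the function names (read_fasta, filter_by_length, filter_fasta_file) and the output file name in the usage note are made up for this example, and it assumes the representative sequences have already been exported to an ordinary FASTA file.

```python
from typing import Iterable, Iterator, Optional, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Sequences wrapped over several lines are joined into one string.
    """
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records: Iterable[Record],
                     min_len: int = 0,
                     max_len: Optional[int] = None) -> Iterator[Record]:
    """Keep only records with min_len <= len(sequence) <= max_len."""
    for header, seq in records:
        if len(seq) < min_len:
            continue
        if max_len is not None and len(seq) > max_len:
            continue
        yield header, seq


def filter_fasta_file(in_path: str, out_path: str,
                      min_len: int = 0,
                      max_len: Optional[int] = None) -> int:
    """Write the length-filtered records to out_path; return how many were kept."""
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for header, seq in filter_by_length(read_fasta(src), min_len, max_len):
            dst.write(f">{header}\n{seq}\n")
            kept += 1
    return kept
```

With the 400 bp cutoff mentioned above, a call could look like filter_fasta_file("rep-seqs-nonsingleton.fasta", "rep-seqs-min400.fasta", min_len=400), where the output name is just a placeholder. Note that this only touches the sequences; the feature table would still contain the removed features until it is filtered separately.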
Now I understand what you mean about using q2-quality-control to filter my sequences. I will try that later! Great solution!
I would still like length filtering to be implemented in a QIIME 2 plugin.
Also, I was banned a few months ago because of a cross-link. May I be unbanned now?