Filter representative sequences by sequence length

Recently I have been working with raw sequences generated on a MiSeq in PE300 mode.
The sequence quality is not very good. After denoising, I found some representative sequences that are shorter than my expected length. I BLASTed them and I am sure they are not my target sequences.
So I want to filter my rep-seqs by length, but I could not find a plugin that provides this directly (something like --p-min-length and --p-max-length).
I could export them to an Excel sheet and filter there, but I would still like an easier way to do it within QIIME 2!

I’m not aware of any QIIME 2 command for filtering sequences based on length. Depending on the primers used, amplicon sizes may actually vary quite a bit. If the goal is to remove non-target sequences, you can try the q2-quality-control plugin, which allows you to filter sequences by alignment, say to a reference database to exclude non-bacterial sequences.

Thank you @yanxianl
My sequences actually target a functional gene, so there is no reference database for them.
Some sequence lengths are far from my target amplicon size, so I am sure they are not my target sequences :laughing:. I also checked them by BLAST against the NT database.
For now I am trying to filter them with vsearch.

You can make one — it sounds like you already found one! This one:

It would just take a few representative sequences to do the trick, but the more the merrier.

That would do the same exact thing as what @yanxianl is recommending. With vsearch you will still presumably need some reference sequences to align against.

Thanks for the advice, @Nicholas_Bokulich

My target amplicon is bacterial amoA (ammonia monooxygenase subunit A). There is no well-performing reference dataset for it.

Here is my rep-seqs file, with only singletons filtered out: rep-seqs-nosinglton.qzv (282.3 KB)
My target amplicon size is 452 bp, but as you can see there are still some rep-seqs shorter than 400 bp (some are only 180 bp :laughing:).
I have assembled my own reference database, but I am not sure it will perform well. I will have to test and adjust it later. I don't want to risk classifying my features and filtering on that, because some real targets might be discarded due to incorrect annotation.
Filtering by rep-seqs length first seems lower risk to me.

Now I use vsearch to filter them with this command:
vsearch --fastx_filter rep-seqs-nonsingleton.fasta --fastq_minlen 400 --fastaout file

Is there a better way to deal with this issue?

Sure that works — I had assumed you were using vsearch to filter by alignment

I was not recommending that you classify taxonomically, nor is @yanxianl. The q2-quality-control plugin would BLAST your sequences against a set of reference sequences and discard based on alignment quality.

Sounds like you have sorted things out with vsearch, though, and that’s fine :smile:
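
For anyone who lands here without vsearch installed, the same kind of length filter can be sketched in plain Python. This is only a rough sketch, not part of QIIME 2 or vsearch: the function names (read_fasta, filter_by_length, write_fasta) are made up for illustration, it assumes the rep-seqs have already been exported to a FASTA file, and the 400 bp cutoff is just the value from the command above.

```python
def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts; emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_len=0, max_len=None):
    """Keep records whose sequence length is within [min_len, max_len]."""
    for header, seq in records:
        if len(seq) < min_len:
            continue
        if max_len is not None and len(seq) > max_len:
            continue
        yield header, seq


def write_fasta(records, handle):
    """Write (header, sequence) pairs as unwrapped FASTA."""
    for header, seq in records:
        handle.write(f">{header}\n{seq}\n")
```

To use it, open rep-seqs-nonsingleton.fasta for reading and an output file (any name you like) for writing, then call write_fasta(filter_by_length(read_fasta(infile), min_len=400), outfile). Keep in mind that the feature table still contains the removed features, so it would need to be filtered to match before downstream analysis.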

Thank you, Nick

Now I understand what you mean about using q2-quality-control to filter my sequences. I will try that later! :grinning: Great solution!
I would still like to see length filtering implemented in a QIIME 2 plugin.
Also, I was banned a few months ago because of a cross-link. Could I be unbanned now? :grimacing:

I agree that would be useful — any interest in contributing to the q2-vsearch plugin?


Honestly, I don't have the ability to do this work. I barely know how to use Python. :disappointed_relieved:

