Rep-seqs length


My forward and reverse reads have minimum and maximum lengths of 80 and 150, respectively.
After importing them into QIIME 2, I used the DADA2 plugin for filtering, merging, and feature table construction.
When I checked the rep-seqs.qzv file, it showed a minimum sequence length of 80. This is quite unexpected, as the minimum length of the reads themselves was 80. Are they really merged sequences?
Even if I treat them as technical replicates, there should be very few of them, so filtering these sequences by frequency should eliminate them. But I can still see them in my filtered-rep-seqs.qzv.
Is there any way to remove short sequences? Or any other suggestions?

Thanks in advance

Hi @Nisha,

It's hard for us to know the answer to this without seeing more details about the actual commands you've run. The DADA2 stats output, which you can visualize and share with us, would be especially helpful. That said, what you are describing is technically possible if your forward and reverse reads overlap completely. What is the expected overlap, and what is the expected length of the final merged products?
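To make the full-overlap case concrete, here is a minimal sketch of the arithmetic. It assumes the simplest model, where the merged length is the sum of the two read lengths minus the overlap; merged_length is a hypothetical helper written for illustration, not part of QIIME 2 or DADA2.

```shell
# merged_length FWD_LEN REV_LEN OVERLAP
# Prints the length of a merged read pair under a simple model:
# the two read lengths minus the bases shared in the overlap.
merged_length() {
    echo $(( $1 + $2 - $3 ))
}

merged_length 80 80 80      # two 80 nt reads, complete overlap
merged_length 150 150 20    # two 150 nt reads, 20 nt overlap
```

Under this model, a pair of 80 nt reads that overlap completely still merges to 80 nt, so a minimum merged length of 80 does not by itself mean the sequences were left unmerged.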

I guess this would again depend on whether they are real merged products with full overlap or somehow erroneous products. What were your filtering parameters?

There are a couple of ways of doing this:

  1. Using the RESCRIPt plugin with its plethora of useful actions.
  2. This neat little trick using feature-table filter-seqs.
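As a rough sketch of the second option, the representative sequences can be passed to filter-seqs both as the data and as the metadata, with a where clause on sequence length. The file names and the 100 nt threshold below are placeholders, and the clause assumes the sequence column is addressable as "sequence" when the artifact is viewed as metadata, so please check this against your QIIME 2 version before relying on it.

```shell
# Placeholder names and threshold: adjust to your own files and amplicon.
REP_SEQS="rep-seqs.qza"
FILTERED_SEQS="rep-seqs-min-length.qza"
MIN_LEN=100

# SQLite-style clause evaluated against the sequences viewed as metadata
# (assumption: the column is addressable as "sequence").
WHERE="length(sequence) >= ${MIN_LEN}"

# Only run when QIIME 2 is on the PATH and the input artifact exists.
if command -v qiime >/dev/null 2>&1 && [ -f "$REP_SEQS" ]; then
    qiime feature-table filter-seqs \
        --i-data "$REP_SEQS" \
        --m-metadata-file "$REP_SEQS" \
        --p-where "$WHERE" \
        --o-filtered-data "$FILTERED_SEQS"
else
    echo "Skipping: qiime or $REP_SEQS not found" >&2
fi
```

This only filters the sequences; the feature table is a separate artifact and would still need to be filtered to match.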

Hope this helps!

stats.qzv (1.2 MB)
This is my DADA2 output file.
I was expecting the minimum length after merging to be ~150.

qiime feature-table filter-features \
    --i-table sample-frequency-filtered-table.qza \
    --p-min-frequency 271 \
    --o-filtered-table sample-contingency-filtered-table.qza
I think this filters the table, not the rep-seqs. Am I right?

Thank you for this!
So I will just be filtering my sequences. How will that change the feature table? Many sequences will be removed, so the table needs to be updated, right?

Correct. You can follow the same approach used to remove chimeric sequences as outlined here.

For example, if you want to keep the features in your feature table based on your sequence length criteria, you can just feed in your representative sequence file as a metadata file like so:

qiime feature-table filter-features \
    --i-table feature-table.qza \
    --m-metadata-file sequences-to-retain.qza \
    --o-filtered-table feature-table-retained.qza

Hi @Nisha,
Just wanted to check in here to see whether you found a resolution to this or are still waiting for an answer.
Your DADA2 stats look OK to me; I don't see any real issues there. As for how there are 80 nt sequences, I'm guessing you actually had small target fragments of 80 nt that fully overlapped, so after DADA2 merged them they were still only 80 nt. If DADA2 fails to merge a pair, it drops both reads, so in this case they were successfully merged. What is your target region, and which primers did you use? Is it unreasonable to think you may have had 80 nt targets given your primers?