Is there a difference between using 'feature-table filter-samples' and using filtered input data from the beginning?

I found that the feature table I get by filtering samples with 'feature-table filter-samples' is different from the one I get when I import and analyze only the needed samples from the start. Is it okay to use either method? Also, can I mix in other NGS data and use it as input data?

Hello @hyeonsu-seon,

Welcome to the forums! :qiime2:

Can you tell us more about the pipeline you used to process this data? That makes a big difference to the reproducibility of the results, and to whether we expect the results to be identical when other samples are added to or removed from the pipeline.

Thank you for your answer.
The pipeline I used to process the data is as follows:
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path manifest.txt \
  --output-path demux-paired.qza \
  --input-format PairedEndFastqManifestPhred33V2
qiime cutadapt trim-paired
qiime dada2 denoise-paired
qiime feature-classifier classify-sklearn
qiime taxa filter-table

The difference is at the "qiime tools import" step. I analyzed the data with two different manifest.txt files. In one, I extracted and imported only the samples needed for the analysis. In the other, I imported all of the data to create A and then selected the samples with "qiime feature-table filter-samples". When I checked denoising-stats.qzv, I found that the two runs already differ at the denoising step, but I don't know why.
Thank you again for answering my question.

Thanks for telling me more!

So dada2 denoise-paired should produce stable ASVs. However, depending on the settings used to run it, the results are not guaranteed to be identical. Take a look at this discussion of multiple dada2 runs:

The main issues are --p-pooling-method and --p-chimera-method, both of which can let one sample affect another. That results in the change you describe when samples are omitted before this step. See the docs.
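To make "one sample can affect another" a bit more concrete, here is a deliberately simplified toy model of a consensus-style chimera vote. It is only a sketch and not DADA2's actual implementation: the function name, the per-sample flags, and the 0.9 threshold are all hypothetical values chosen for illustration.

```python
# Toy model only: this is NOT how DADA2 is implemented internally.
# It just shows how a per-ASV decision that is voted on across samples
# can flip when the set of samples changes.

def consensus_chimera_call(flags_by_sample, min_fraction=0.9):
    """Return True if the ASV is flagged as chimeric in at least
    `min_fraction` of the samples it was checked in (hypothetical rule)."""
    flagged = sum(flags_by_sample.values())
    return flagged / len(flags_by_sample) >= min_fraction

# Hypothetical per-sample chimera flags for a single ASV.
flags_all = {"S1": True, "S2": True, "S3": True, "S4": False}

# Same ASV, but sample S4 was left out before denoising.
flags_subset = {s: f for s, f in flags_all.items() if s != "S4"}

removed_with_all = consensus_chimera_call(flags_all)        # 3/4 = 0.75, kept
removed_with_subset = consensus_chimera_call(flags_subset)  # 3/3 = 1.0, removed

print(removed_with_all, removed_with_subset)
```

In this toy case the ASV survives when all four samples are denoised together but is dropped when S4 is omitted, which is the kind of small difference that shows up in the feature table.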

It's worth noting that while the exact counts will change, the major biological results should be the same for dada2. If adding or removing a sample changes the ASVs a lot, then that's a problem and something else is going on!
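One rough way to check whether two runs "changed a lot" is to compare them sample by sample. The sketch below assumes each feature table has already been loaded into a plain dict of the form {sample: {asv_id: count}}; the function name, the two metrics, and all counts are made up for illustration. It also assumes the ASV IDs are comparable between runs (for example, hashes of the sequences).

```python
def compare_tables(table_a, table_b):
    """Compare two feature tables on the samples they have in common.

    For each shared sample, report the Jaccard overlap of ASV IDs and the
    fraction of reads in table_a that belong to ASVs also found in table_b.
    """
    report = {}
    for sample in sorted(set(table_a) & set(table_b)):
        asvs_a, asvs_b = table_a[sample], table_b[sample]
        shared = set(asvs_a) & set(asvs_b)
        union = set(asvs_a) | set(asvs_b)
        total_a = sum(asvs_a.values())
        shared_reads = sum(asvs_a[asv] for asv in shared)
        report[sample] = {
            "jaccard": len(shared) / len(union) if union else 1.0,
            "shared_read_fraction": shared_reads / total_a if total_a else 1.0,
        }
    return report

# Hypothetical counts: one run with all samples, one with only S1 imported.
full_run = {
    "S1": {"ASV1": 100, "ASV2": 50, "ASV3": 2},
    "S2": {"ASV1": 80, "ASV4": 20},
}
subset_run = {"S1": {"ASV1": 101, "ASV2": 49}}

report = compare_tables(full_run, subset_run)
for sample, metrics in report.items():
    print(sample, metrics)
```

A low Jaccard value with a high shared read fraction, as in this toy data, would suggest that only rare ASVs differ between the runs. A low shared read fraction would be the "something else is going on" case.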


Thanks to your answer, my question has been resolved!

I can see that the difference in the number of samples affects the ASVs.

So if I add or remove samples, does the accuracy or reliability of the ASVs stay the same?

In theory, ASVs should reflect the observed amplicon variants regardless of the input, because the denoising pipeline should create :sparkles: stable features :sparkles:

In practice... the slightly different set of ASVs removed by the chimera filter could make a difference. (Or something could go very wrong and change many of the ASVs produced :scream_cat:) This is why it's good to include positive controls on your runs, so you can check that everything is staying consistent and working as expected.
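As a minimal sketch of what such a check could look like, assuming you know the expected sequences in your positive control (for example a mock community) and have the ASV sequences observed in that control sample from each run. The function name and the short sequences and counts are invented placeholders, not real data.

```python
def check_positive_control(expected_seqs, observed_counts):
    """Report expected sequences that are missing from the control sample
    and observed ASVs that were not expected."""
    observed = {seq for seq, count in observed_counts.items() if count > 0}
    return {
        "missing": sorted(expected_seqs - observed),
        "unexpected": sorted(observed - expected_seqs),
    }

# Hypothetical expected sequences for the control (shortened placeholders).
expected = {"ACGTAC", "TTGACC", "GGCATA"}

# Hypothetical ASV counts for the same control sample in two separate runs.
control_run_1 = {"ACGTAC": 500, "TTGACC": 300, "GGCATA": 200}
control_run_2 = {"ACGTAC": 510, "TTGACC": 290, "CCCTGA": 5}

result_1 = check_positive_control(expected, control_run_1)
result_2 = check_positive_control(expected, control_run_2)
print(result_1)
print(result_2)
```

Here the first run recovers everything that was expected, while the second run is missing one expected sequence and contains one unexpected ASV, which would be worth a closer look.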

Your answers so far have helped me a lot. Thank you.
