Weird counts introduced into feature table

gollison · July 26, 2021, 1:47am

Hello. I created a feature table using QIIME 2/DADA2 and it looks weird. The ASVs with the highest counts are at the top of the table, and the table quickly becomes sparser as I scroll down; it is a relatively short gradient.
Additionally, I noticed that several ASVs with zeros in all but 1 of 38 samples contain a strange pseudo-count that counts down. The number counts down from 20-something, is never in the same sample for all ASVs, and is staggered in such a way as to make a diagonal line.

Below is the series of commands I ran:

1. Activated the QIIME 2 environment: source activate qiime2-2020.6

2. Created the manifest file: SMP_ASVmanifest.csv

3. Imported the manifest file

nohup qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path SMP_ASVmanifest.csv --output-path demux-2018_2019 --input-format PairedEndFastqManifestPhred33 > nohup.out 2> nohup.err &

4. The next step was to look at the quality plot, make trimming decisions, and do two DADA2 trial runs.

5. Formed the ASVs using DADA2

nohup qiime dada2 denoise-paired --i-demultiplexed-seqs demux-2018_2019.qza --p-n-threads 15 --p-trim-left-f 20 --p-trim-left-r 20 --p-trunc-len-f 250 --p-trunc-len-r 227 --o-table table-2018_2019-nc.qza --o-representative-sequences repseqs-2018_2019-nc.qza --o-denoising-stats stats-2018_2019-nc.qza

6. Assigned taxonomy to the representative sequences (not trimmed with cutadapt)

nohup qiime feature-classifier classify-sklearn --i-classifier /usr/local/bioinf/tax_db/Pr2_V12_Qiime2-2020.6/Pr2_Classifier_V12_Q2-2020.qza --p-n-jobs 15 --i-reads repseqs-2018_2019-nc.qza --o-classification taxonomy-2018_2019

7. Exported the table

qiime tools export --input-path table-2018_2019-nc.qza --output-path exported-table-2018_2019&
biom convert -i exported-table-2018_2019/feature-table.biom -o feature_table-2018_2019.txt --to-tsv --header-key=taxonomy

8. Exported the representative sequences

qiime tools export --input-path repseqs-2018_2019-nc.qza --output-path exported-repseqs-2018_2019

Toy example of the pattern below (more samples and ASVs are involved in the actual table):

        Sample1  Sample2  Sample3  Sample4  Sample5
ASV_1   24       0        0        0        0
ASV_2   0        24       0        0        0
ASV_3   0        0        24       0        0
ASV_4   0        0        0        0        0
ASV_5   23       0        0        0        0
ASV_6   0        23       0        0        0
ASV_7   0        0        23       0        0

I've attached the feature table exported from QIIME 2 and highlighted the patterns in Excel. The pattern starts (mildly) at line 3015, counting down from the number 15. More extreme cases are in the single digits around line 4373 (the number "8").
I can share any other data, tables, or screenshots as needed.
Please help shine some light on what's going on.

Thanks,
Gerid

Attachment: feature_table-2018_2019.txt (7.5 MB)

mverce · July 26, 2021, 7:07am

Hi!

I think it's just a necessary consequence of sorting the table by rows and by columns. The ASVs are sorted by the sum of their frequencies: the first ASV in your txt example has a total frequency of 1972328 over all samples, the next ASV 997444, and so on. If several ASVs have the same sum of frequencies, then they are apparently ordered by sample, so you would expect ASVs that are unique to different samples to create the pattern that you noticed.
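
To illustrate the idea, here is a minimal sketch on made-up data. It assumes the sort order described above (total frequency first, then sample order for ties); the toy table, the ASV names, and the tie-breaking rule are assumptions for illustration, not something confirmed from the QIIME 2 or DADA2 source.

```shell
# Toy table (hypothetical data): six ASVs, each found in exactly one of
# three samples. Columns: ASV id, then counts for S1, S2, S3.
# The rows are deliberately shuffled.
table=$(printf '%s\n' \
  'ASV_a 0 23 0' \
  'ASV_b 0 0 24' \
  'ASV_c 24 0 0' \
  'ASV_d 0 0 23' \
  'ASV_e 0 24 0' \
  'ASV_f 23 0 0')

# Prepend two sort keys to each row: the row total and the index of the
# first sample with a nonzero count. Sort by total (descending), then by
# sample index (ascending), and finally drop the two helper keys.
sorted=$(printf '%s\n' "$table" |
  awk '{ total = 0; first = 0
         for (i = 2; i <= NF; i++) {
           total += $i
           if (!first && $i > 0) first = i - 1
         }
         print total, first, $0 }' |
  sort -k1,1nr -k2,2n |
  cut -d' ' -f3-)

printf '%s\n' "$sorted"
```

With this toy input, the three ASVs with a total of 24 come out first, one per sample in sample order, followed by the three ASVs with a total of 23 in the same order. That is the staggered diagonal, produced by sorting alone.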

I think that these counts are as real as any and you have nothing to worry about in this case.

Regards,
Marko

gollison · July 26, 2021, 4:34pm

Thanks, Marko. What do you make of the extremely orderly countdown of the ASVs that otherwise have zeros? Note the yellow highlighting around line 4373, where the table shows repeats of the number 8.

mverce · July 27, 2021, 7:52am

Because the ASV table is sorted, an extremely orderly countdown is expected if you have several ASVs unique to each sample and present at equal frequencies (around line 4373 a count of 8, but further down 7, 6, 5, ...). And given the huge number of ASVs that you have in your data set, it is not unexpected that you would have quite a few ASVs at the same frequency that are each unique to a single sample.
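
One way to check this on a table is to tally, for each count value, how many ASVs occur in exactly one sample. The sketch below does that on a small made-up table; the data and ASV names are hypothetical, and the input is assumed to be whitespace-separated with the ASV id in the first column.

```shell
# Hypothetical toy table: ASV id followed by counts for three samples.
table=$(printf '%s\n' \
  'ASV_1 8 0 0' \
  'ASV_2 0 8 0' \
  'ASV_3 0 0 8' \
  'ASV_4 7 0 0' \
  'ASV_5 0 7 0' \
  'ASV_6 5 3 0' \
  'ASV_7 0 0 7')

# For every ASV seen in exactly one sample, tally how many ASVs share the
# same count. Output format: "<count> <number of single-sample ASVs>".
tally=$(printf '%s\n' "$table" |
  awk '{ nonzero = 0; value = 0
         for (i = 2; i <= NF; i++) if ($i > 0) { nonzero++; value = $i }
         if (nonzero == 1) seen[value]++ }
       END { for (v in seen) print v, seen[v] }' |
  sort -k1,1nr)

printf '%s\n' "$tally"
```

Large tallies at each low count value are exactly the runs that show up as blocks of 8s, then 7s, then 6s in a sorted table. To try this on an exported TSV, the header lines starting with "#" would need to be skipped and the field separator set to a tab, and any trailing taxonomy column would need to be excluded; those details depend on how the file was exported.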

Regards,
Marko

gollison · July 29, 2021, 4:37pm

Thanks for having a peek at the table I attached. I'm glad that you, as an expert, don't believe they are artifacts. I think this will come down to some filtering on my end, based on the questions I am asking of the dataset.
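
As a rough sketch of what such filtering could look like, the snippet below keeps only ASVs observed in at least two samples. The threshold, the toy data, and the variable names are assumptions for illustration and are not taken from this thread; within QIIME 2 itself, `qiime feature-table filter-features` with `--p-min-samples` would be the more usual route.

```shell
# Hypothetical toy table: ASV id followed by counts for three samples.
table=$(printf '%s\n' \
  'ASV_1 120 95 40' \
  'ASV_2 0 8 0' \
  'ASV_3 12 0 9' \
  'ASV_4 7 0 0')

min_samples=2   # assumed threshold, chosen only for this example

# Keep only ASVs with a nonzero count in at least min_samples samples.
filtered=$(printf '%s\n' "$table" |
  awk -v min="$min_samples" '{ n = 0
         for (i = 2; i <= NF; i++) if ($i > 0) n++
         if (n >= min) print }')

printf '%s\n' "$filtered"
```

Here the two single-sample ASVs (ASV_2 and ASV_4) are dropped and the other two rows are kept unchanged. Whether such a filter is appropriate depends on the question being asked of the data.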
Thanks for the support.
Gerid
