Each feature only seen in 1 sample

Hello, I’m new to QIIME, so I thank everyone for their help in advance. I’m working with QIIME 2 (2019.4) on a remote supercomputer. I am working with Ion Torrent single-end reads and am currently trying to filter out features that appear fewer than 20 times or are seen in fewer than 5 of my 36 samples. However, when filtering I found that I lose all my reads. After further investigation (I looked at the feature table after trimming for quality), I found that every feature is seen in only 1 sample (I’m not sure if it’s the same sample). This exact code was used before to run samples from the same machine, so I’m not sure where I’m going wrong. Here is the code I’m running:

qiime dada2 denoise-pyro \
--i-demultiplexed-seqs single-end-demux.qza \
--p-trim-left 0 \
--p-trunc-len 260 \
--p-trunc-q 15 \
--p-chimera-method consensus \
--o-representative-sequences rep-seqs-denoise.qza \
--o-table rep_seq_feature_table.qza \
--o-denoising-stats denoising-stats.qza \
--verbose


#summary stats of denoise and quality filtering
##I looked at this .qzv and determined that all features were only seen within one sample
qiime feature-table summarize \
--i-table rep_seq_feature_table.qza \
--o-visualization rep_seq_feature_table-view.qzv \
--m-sample-metadata-file /scratch/aubhah/ricardo/full/Individual_Samples_Files_full/metadata.txt 

qiime feature-table tabulate-seqs \
--i-data rep-seqs-denoise.qza \
--o-visualization rep-seqs-view.qzv

#Filter features from feature table
#features must have a minimum sum of 20 across all samples and must be present in at least 5 samples
#https://docs.qiime2.org/2019.7/tutorials/filtering/
qiime feature-table filter-features \
--i-table rep_seq_feature_table.qza \
--p-min-frequency 20 \
--p-min-samples 5 \
--o-filtered-table rep_seq_feature_table2.qza


#Now filter sequences to match table 
##This table is empty
#https://docs.qiime2.org/2018.8/plugins/available/feature-table/filter-seqs/
qiime feature-table filter-seqs \
--i-data rep-seqs-denoise.qza \
--i-table rep_seq_feature_table2.qza \
--o-filtered-data rep-seqs-filtered.qza
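
One way to double-check the observation that every feature occurs in a single sample, outside the .qzv viewer, is to export the table to TSV and tally how many samples each feature appears in. This is only a minimal sketch: it assumes the table has been exported with `qiime tools export` and converted with `biom convert --to-tsv` (shown as comments), and the function name `samples_per_feature` and the `exported-table` directory are made up for illustration.

```shell
# Assumed preparation (not run here; "exported-table" is a hypothetical directory):
#   qiime tools export --input-path rep_seq_feature_table.qza --output-path exported-table
#   biom convert -i exported-table/feature-table.biom -o exported-table/feature-table.tsv --to-tsv

# Print a histogram: <number of samples a feature occurs in> TAB <number of such features>
# Usage: samples_per_feature exported-table/feature-table.tsv
samples_per_feature() {
  awk -F '\t' '
    /^#/ { next }                      # skip the comment line and the "#OTU ID" header
    NF > 1 {
      n = 0
      for (i = 2; i <= NF; i++)        # columns 2..NF hold per-sample counts
        if ($i + 0 > 0) n++
      hist[n]++
    }
    END { for (k in hist) print k "\t" hist[k] }
  ' "$1" | sort -n
}
```

If the only row in the output starts with 1, then no feature can pass --p-min-samples 5 (or even 2), which would explain the empty filtered table.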

Hey there @haleyhallowell!

That's a strange problem, I'm not too sure what might be at play here (is it just the signal of the data? or a technical mistake? aliens?)

Can you share the DADA2 denoising results? Also, the demux summary would be really helpful for troubleshooting, too.


Hello!

Sorry for the late reply, the remote server I use was down for maintenance and I couldn't access the files.

We also thought it was extremely strange, seeing as this exact code has been used for other datasets.
So, here is the denoising-stats: denoising-stats.qzv (1.2 MB)

And here is the demux summary: single-end-demux.qzv (290.3 KB)

Please let me know if you have any trouble opening the files. Thanks for your help!

Is the table still empty if you lower --p-min-samples to 2?

Yes, when I tried that the table was still empty. I also have 2 other datasets within this experiment, and this problem was seen within one of them, but not within the other. I've included the feature table for 1 dataset that ran fine (rep_seq_feature_table-view_ileum.qzv (482.7 KB) ) and one that is experiencing this issue (rep_seq_feature_table-view.qzv (653.0 KB) ).

As you mentioned above, the last tab in the feature table summary shows that every single feature is found in only 1 sample. To me, that sounds like you still have sample barcodes on your reads when DADA2 denoising happens. Looking at your provenance, you didn’t trim anything from the left (--p-trim-left 0), but perhaps you need to? When barcodes are still on the reads, that is usually the easiest way to remove them (trimming 12 or 15 nt from the left, for example). As to why things might be different this time, that might be a question for your sequencing center. Do they normally remove barcodes (and other non-biological adapter sequences)?
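
A quick way to test the barcode hypothesis before re-running DADA2 is to look at the first few bases of the reads in each per-sample file. The sketch below is a rough check, not a QIIME 2 command: it assumes uncompressed FASTQ files with exactly four lines per record, and the function name `top_prefixes` and the file name in the usage comment are hypothetical.

```shell
# Print the most frequent leading prefixes of the reads in a FASTQ file.
# Usage: top_prefixes sample01.fastq [prefix_length] [how_many]
#   (sample01.fastq is a hypothetical file; decompress .fastq.gz files first)
top_prefixes() {
  fastq=$1
  len=${2:-12}     # number of leading bases to inspect
  top=${3:-3}      # number of prefixes to report
  # In four-line FASTQ records, the sequence is line 2 of each record.
  awk -v n="$len" 'NR % 4 == 2 { print substr($0, 1, n) }' "$fastq" \
    | sort | uniq -c | sort -k1,1nr -k2,2 | head -n "$top"
}
```

If nearly all reads in a sample share one prefix, and that prefix differs from sample to sample, the reads probably still start with a barcode and a nonzero --p-trim-left would be worth trying. A shared prefix that is identical across samples is more likely a primer.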

I didn’t specify trim-left because the sequences are single end, so there shouldn’t be any sort of barcode or adapter on that side. However, I can try trimming off a few from the left, just to be sure. We have these same datasets with the barcodes already removed, and the same issue is seen.

I'm not sure the two (single-end sequencing and the absence of barcodes on the reads) are necessarily related, right?

If you don't know, I strongly encourage you to check with your sequencing center.

What about other adapters?

In this case, the adapter is the barcode. With single end sequencing, there should only be an adapter (barcode in this example) on one end, so in theory there should only be an adapter (barcode) on the front end, which I believe we are trimming off. The sequencing facility has been a struggle to work with in the past, but I can try to reach out to them.

I asked around at Q2HQ this last week, and the general consensus is that somehow there are still barcode sequences in the reads, and that their presence is causing each feature to only be found in one sample.
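
To illustrate why leftover barcodes would produce exactly this pattern, here is a toy sketch that counts how many distinct read sequences two samples have in common, before and after dropping the first few bases. It is only an approximation of the idea: exact string matching stands in for DADA2's denoising, the input is assumed to be uncompressed four-line FASTQ, and the function name `shared_seqs` and the file names in the usage comment are invented for the example.

```shell
# Count distinct read sequences shared by two FASTQ files after removing
# the first `trim` bases from every read.
# Usage: shared_seqs sampleA.fastq sampleB.fastq [trim]   (hypothetical file names)
shared_seqs() {
  a=$1
  b=$2
  trim=${3:-0}
  tmp_a=$(mktemp)
  tmp_b=$(mktemp)
  # Keep the sequence line of each record, minus its first `trim` bases;
  # reads no longer than `trim` are skipped.
  awk -v t="$trim" 'NR % 4 == 2 && length($0) > t + 0 { print substr($0, t + 1) }' "$a" | sort -u > "$tmp_a"
  awk -v t="$trim" 'NR % 4 == 2 && length($0) > t + 0 { print substr($0, t + 1) }' "$b" | sort -u > "$tmp_b"
  # comm -12 prints only the lines present in both sorted inputs.
  comm -12 "$tmp_a" "$tmp_b" | wc -l | tr -d ' '
  rm -f "$tmp_a" "$tmp_b"
}
```

With sample-specific barcodes still attached, two samples containing the same organism share no identical sequences at trim 0, so every sequence looks unique to its sample. Once the barcode length is trimmed, the shared count becomes nonzero.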

Let us know what you find out from the sequencing center!

