0 frequency values in some samples after paired-end dada2

Alfred.Burian · October 24, 2017, 4:29pm

I have paired-end microbiome data, and in 2 of my 25 samples the frequency of every feature drops to 0 after the dada2 quality filtering. The other samples look pretty good (per-sample frequencies are reduced by ~30-40% after dada2, which seems fair to me).

I looked at the quality plots before running dada2 and trimmed the sequences accordingly. I changed the trimming options and reran the analysis, with no fundamental change (there were still samples with no features). When I import only the two samples with my manifest file and then run the dada2 analysis, the picture changes and I do get good results (2000 features and a frequency of >10000 per sample).

However, when I merge the 23 samples with the 2 samples that I processed separately, none of the features from the 2 samples match features in the other 23 samples. This is weird, because my 23 samples include 8 biological replicates of the 2 problem samples, and the data set with 23 samples has 7000 features in total. Hence there must be something wrong.
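
One way to see why nothing might match, as a minimal sketch: assuming hashed feature IDs (the q2-dada2 default, where each feature is named by the MD5 hash of its sequence), two features from separate runs share an ID only if their sequences are identical base for base. The sequences and helper names below are hypothetical and only illustrate the comparison.

```python
import hashlib


def feature_id(sequence: str) -> str:
    """Return the MD5 hex digest of a sequence (stand-in for a hashed feature ID)."""
    return hashlib.md5(sequence.encode("utf-8")).hexdigest()


def shared_features(seqs_a, seqs_b):
    """Return the set of feature IDs present in both collections of sequences."""
    ids_a = {feature_id(s) for s in seqs_a}
    ids_b = {feature_id(s) for s in seqs_b}
    return ids_a & ids_b


# Hypothetical representative sequences from the two separate runs.
table_23 = ["ACGTACGTAC", "TTGACCGTAA", "GGCATTCAGA"]
table_2 = ["ACGTACGTAC", "TTGACCGTA"]  # second sequence is one base shorter

shared = shared_features(table_23, table_2)
print(f"{len(shared)} shared feature(s)")
```

Under this assumption, a sequence that differs by even one base, or by one base in length, hashes to a different ID, so zero shared features between replicates suggests that the two runs produced systematically different sequences rather than different biology.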

I thought about running a single-end run of dada2 on all samples. Any other tips?
Here are my qzv files:
https://www.dropbox.com/sh/nbfcrsl4mc2qq83/AAD5n0h7KI8newCWveHijcE-a?dl=0

Nicholas_Bokulich · October 24, 2017, 5:05pm

Hi @Alfred.Burian,
Thank you for posting!

Did you look at the quality plots for those 2 problem samples specifically? Could you share plots for just those 2 samples here?

It sounds like what could be happening is (as you deduce) that those samples have lower-quality sequences, possibly on just one read direction. I am not sure why replicate samples on the same sequencing run would have lower quality than others, but it's just a theory. Try running on just the forward reads with appropriate trimming parameters and see what happens. That could help troubleshoot what is going on, in any case.

@benjjneb, do you have any insight on what may be happening here?

benjjneb · October 24, 2017, 5:29pm

So the 2 samples run fine on their own, but end up with zero reads when run with the others? That is mysterious...

My first guess is that there is some technical difference between those two samples and the others, like they are using different primer sets? Is that possible?

If not, could you share the fastq file for one of the disappearing samples, and its non-disappearing replicate?

Alfred.Burian · October 24, 2017, 6:24pm

Thanks for the very quick response!

No, the quality plots for the two samples do not look particularly bad. Also, all samples were amplified with the same primers, the Illumina primers were trimmed before further processing of the files, and all samples were sent in together and analyzed in the same sequencing run.

I have updated the Dropbox folder; it now also contains the qza files (for both the 2 and the 25 samples). The samples that end up with 0 frequency are numbers 18 and 21. Some details on the trimming parameters: I chose 15/20 for the start of the sequences and 190/150 for the length of the sequences, for forward and reverse reads, respectively.
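
For reference, a minimal sketch of how these values might map onto `qiime dada2 denoise-paired` options. It assumes that 15/20 are the trim-left values and 190/150 the truncation lengths; the file and directory names are hypothetical. The command is only built and printed, not executed.

```python
import shlex


def denoise_paired_command(demux, out_dir, trim_left_f, trim_left_r,
                           trunc_len_f, trunc_len_r):
    """Build (but do not run) a `qiime dada2 denoise-paired` argument list."""
    return [
        "qiime", "dada2", "denoise-paired",
        "--i-demultiplexed-seqs", demux,
        "--p-trim-left-f", str(trim_left_f),
        "--p-trim-left-r", str(trim_left_r),
        "--p-trunc-len-f", str(trunc_len_f),
        "--p-trunc-len-r", str(trunc_len_r),
        "--output-dir", out_dir,
    ]


# Hypothetical input artifact and output directory names.
cmd = denoise_paired_command("demux-paired.qza", "dada2-out", 15, 20, 190, 150)
print(shlex.join(cmd))

# Bases kept per read after truncating at trunc-len and trimming trim-left.
kept = (190 - 15, 150 - 20)
print(f"forward keeps {kept[0]} nt, reverse keeps {kept[1]} nt")
```

With these settings each forward read keeps 175 nt and each reverse read 130 nt. Whether that leaves enough overlap for merging depends on the amplicon length, which is not stated here.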

I started to run dada2 on only the forward sequences of all samples. I will post the outcome as soon as I have it.

Alfred.Burian · October 25, 2017, 3:25pm

Update: I have now run all samples with the forward sequences only, and it worked fine. I have added the respective files to the Dropbox folder.

The reduction of all features to 0 frequency in these two samples was probably related to the quality of the reverse sequences. However, two things are weird: (i) the quality plots did not reveal this, even when I looked at only the two samples in isolation; and (ii) the results differ depending on whether I run dada2 on all 25 samples together or on the two samples separately. Shouldn't both lead to the same outcome?

Besides these relatively minor problems, Qiime2 is a huge advancement! Thanks for putting in all the work!

benjjneb · October 26, 2017, 2:36am

That is the mystery here; I still don't understand why that would be. Unfortunately, I was unable to unzip the 25-sample qza files you provided, so I cannot look further into the raw data.

The two-sample qza seems normal (although it contains only forward reads). Is it possible that something simple, like the reverse read files not being packaged correctly into the 25-sample object, could be the cause?
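
A quick way to rule that out on the import side, as a rough sketch: check that the manifest lists both a forward and a reverse file for every sample. This assumes the comma-separated fastq manifest layout with `sample-id`, `absolute-filepath`, and `direction` columns; the sample names and paths below are made up for illustration.

```python
import csv
import io
from collections import defaultdict

EXPECTED = {"forward", "reverse"}


def missing_directions(manifest_text):
    """Map each sample ID to the read directions absent from the manifest."""
    seen = defaultdict(set)
    for row in csv.DictReader(io.StringIO(manifest_text)):
        seen[row["sample-id"]].add(row["direction"])
    return {
        sample: sorted(EXPECTED - directions)
        for sample, directions in seen.items()
        if EXPECTED - directions
    }


# Hypothetical manifest in which one sample lacks its reverse read file.
manifest = """sample-id,absolute-filepath,direction
sample-18,/data/s18_R1.fastq.gz,forward
sample-18,/data/s18_R2.fastq.gz,reverse
sample-21,/data/s21_R1.fastq.gz,forward
"""

missing = missing_directions(manifest)
print(missing)
```

An empty result would mean every sample has both directions listed; it would not show whether the files themselves are intact or correctly paired.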