Problem in the table file when working with publicly available microbiome data

Dear All,

I am currently trying to analyze publicly available microbiome data using 4 different datasets. I already have their FASTQ files and metadata. All 4 datasets target the 16S V3-V4 region. I imported the data to obtain a demux.qza file separately for each dataset. Then, I did the quality check separately for each set to identify the trimming and truncation parameters.

Data set   Trim left forward   Truncation forward   Trim left reverse   Truncation reverse
1          0                   285                  10                  203
2          0                   250                  10                  242
3          0                   251                  0                   200
4          0                   251                  0                   203

Selected parameters for DADA2 denoising for all data sets

  • Trim left forward = 0
  • Truncation forward = 250
  • Trim left reverse = 10
  • Truncation reverse = 200

Then, I selected common parameter values and ran DADA2 denoising separately for each dataset. Next, I merged the outputs (4 tables and 4 rep-seqs files) into one table and one rep-seqs file, and performed the filtering steps before the taxonomic assignment.
However, I am now stuck on a problem I cannot explain: my table file contains only 498 samples, although I have 744 samples in total. Moreover, the feature counts in my table file are very low; only 4 samples have feature counts higher than 1000. This makes it difficult for me to find a proper sampling depth, and it may also affect the diversity analysis.
I also generated relative abundance tables from these results and found many zero values across taxa and samples, i.e. very sparse data. I do not think it would be possible to proceed further with this type of result, since it is probably associated with some bias in my initial analysis steps. I am not sure what the exact reason for this problem is: is it related to the truncation parameters I selected, an issue with merging the table files, or something else?

Therefore, I would really appreciate your comments on this observation and any suggestions for overcoming this kind of situation. It would be very helpful for me to proceed further, since I am stuck on this step. Thank you!



It would be very useful to check the denoising stats for each of the separate runs after DADA2.
If you lost most of the reads at the filtering step, try lowering the truncation parameters (all reads shorter than the truncation value are discarded).
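As a quick way to read those stats, here is a small sketch that flags which step cost each sample the most reads. It assumes you have exported the denoising-stats artifact to a TSV (`qiime tools export`) with the usual `input`, `filtered`, `merged`, and `non-chimeric` columns; the sample IDs and counts below are made-up illustrations, not real data.

```python
# Sketch: locate the step where a sample loses the most reads in
# DADA2 denoising stats (illustrative numbers, not real data).

def loss_report(sample_id, n_input, n_filtered, n_merged, n_nonchim):
    """Return (sample, worst step, percent of reads lost at that step)."""
    steps = {
        "filtering": (n_input - n_filtered) / n_input,
        "merging": (n_filtered - n_merged) / n_filtered if n_filtered else 0.0,
        "chimera removal": (n_merged - n_nonchim) / n_merged if n_merged else 0.0,
    }
    worst = max(steps, key=steps.get)
    return sample_id, worst, round(steps[worst] * 100, 1)

# Two hypothetical samples showing the two failure modes:
print(loss_report("sampleA", 50000, 12000, 11500, 11300))  # lost at filtering
print(loss_report("sampleB", 50000, 47000, 3000, 2900))    # lost at merging
```

If most samples point at filtering, lower the truncation lengths; if they point at merging, the reads likely do not overlap enough.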
Another concern is the target region. V3-V4 is a long one, and there is a possibility that the forward and reverse reads do not overlap enough to be merged. In that case most reads will be lost at the merging step.
You need to play with the truncation and min. overlap values to find a combination that produces the best output.
If nothing works for you, another option is to use only the forward reads, at the cost of poorer taxonomic annotations in the end.



Hi @timanix ,

Thank you so much for your quick response, and for the very useful information and suggestions! I actually had not checked the denoising stats files. I will look at them and also play with the truncation and min. overlap values to find the combination that produces the best output for my data. I really appreciate your help, thank you!


This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.