Only 1/73 samples left after dada2 denoising?

Hi

I am a totally new user of QIIME 2 and have no experience with QIIME 1.

Recently I started using QIIME 2 (2019.10) for 16S analysis, and I ran into a problem that I guess is related to read filtering.

I used conda to install it, as described in the Linux instructions here: https://docs.qiime2.org/2019.10/install/native/#install-qiime-2-within-a-conda-environment

Import command (importing the paired-end clean-data reads):

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path se-64-manifest \
  --output-path /sbidata/lzhang/201911_hydractinia/RawData/16S_data/output/demux.qza \
  --input-format PairedEndFastqManifestPhred64

se-64-manifest format

sample-id,absolute-filepath,direction
10,/sbidata/lzhang/201911_hydractinia/RawData/F19FTSEUHT1464_METzccM/BGI_results/Clean_Data/10/10_1.fq.gz,forward
10,/sbidata/lzhang/201911_hydractinia/RawData/F19FTSEUHT1464_METzccM/BGI_results/Clean_Data/10/10_2.fq.gz,reverse
11,/sbidata/lzhang/201911_hydractinia/RawData/F19FTSEUHT1464_METzccM/BGI_results/Clean_Data/11/11_1.fq.gz,forward
11,/sbidata/lzhang/201911_hydractinia/RawData/F19FTSEUHT1464_METzccM/BGI_results/Clean_Data/11/11_2.fq.gz,reverse
...,...,...

Description of my samples after import into QIIME 2:


I have 73 samples in total before denoising.
Quality plots are as below:

Commands as follows (denoise and visualization steps):

Based on the quality plots, I set the truncation length to 300, which I thought meant the reads would not be trimmed, since they seem to have good quality. Right?

qiime dada2 denoise-paired \
   --i-demultiplexed-seqs demux.qza \
   --p-trunc-len-f 300 \
   --p-trunc-len-r 300 \
   --o-table table.qza \
   --o-representative-sequences rep-seqs.qza \
   --o-denoising-stats denoising-stats.qza


qiime feature-table summarize \
  --i-table table.qza \
  --o-visualization table.qzv \
  --m-sample-metadata-file 201912_hydractinia_metadatafile_Qiime2.tsv

Then I used qiime tools view to check the results

I got only one sample left …

stats of this procedure

It seems the other samples didn't pass the filter parameters?
So I checked the overlap parameter; the default overlap is 20.

To check whether this was the problem, I reran the denoising procedure with the truncation lengths set to 0.

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs /sbidata/lzhang/201911_hydractinia/RawData/16S_data/output/demux.qza \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza

And this time, the table contained results for 77 samples in total.

I have a few questions :slight_smile:
I have read some related questions, but none of the results seem to match mine… only 1 sample left.

Q1. If I set the parameter to 0, will the quality of the results be lower? Can I set it to 0, and will it affect the results? If so, do you have any suggestions for me?
Q2. I also have merged reads provided by the sequencing company. Should I analyse those merged reads directly instead?
Q3. Should I use deblur instead and try again?

Truly, thanks for your help!

Hello @Lu_Zhang,

Welcome to the forums! :qiime2:

Thank you for your detailed question and all the dada2 output logs. I think I can answer all your questions.

If you set both --p-trunc-len-f and --p-trunc-len-r to zero, no trimming will be performed. This is why this setting worked well for your data set. If you set these to 300, then reads will be trimmed at 300 and reads under 300 bp will be removed. I think this is why so many samples were lost when you set these to 300.
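In other words, --p-trunc-len acts as both a cutter and a length filter. Here is a minimal Python sketch of that rule (just an illustration of the behaviour described above, not dada2's actual implementation):

```python
def truncate_reads(reads, trunc_len):
    """Mimic dada2's trunc-len rule: discard reads shorter than
    trunc_len, and cut longer reads down to exactly trunc_len.
    A trunc_len of 0 disables both the cut and the length filter."""
    if trunc_len == 0:
        return list(reads)
    return [r[:trunc_len] for r in reads if len(r) >= trunc_len]

reads = ["A" * 299, "A" * 300, "A" * 310]
print(len(truncate_reads(reads, 300)))  # 2 -- the 299 bp read is discarded
print(len(truncate_reads(reads, 0)))    # 3 -- nothing is discarded or cut
```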

Nope! It’s best to start with raw data with as little pre-processing as possible. In fact, dada2 needs the reads before joining to work.

You could. Dada2 and deblur are different, complementary methods. You could read about why they were made and their approaches to denoising, and choose which method you like best. Or you could try them both!

Colin

2 Likes

Hi @colinbrislawn,

So excited to see your reply, and I really appreciate it! :smiley:

What confuses me is that the total length of the reads is 300 bp, so I guessed no reads would be removed if I set 300 bp?


And if I set it to 0, does that affect the next denoising step? Because if I set it to 0, the overlap-20 parameter is also disabled.

Thanks very much!

1 Like

Let’s figure out what trimming at 300 bp does. We need all the data we have. Can you post the Demultiplexed sequence length summary from that demux.qzv file?

Maybe… but I think something else might be happening. If any of your reads are less than 300 bp, then these will be removed. And the sequence length summary will tell us how long your reads are and how many of them would be removed at 300 bp.

I’m not sure… The zero setting will cause reads not to be truncated at all, and this could indirectly affect pairing and denoising. This is a good question for the dada2 developer, @benjjneb
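One way to reason about the merging side: after truncation, the forward and reverse reads must still overlap by at least the minimum overlap to be joined. A quick sketch of that arithmetic, using an assumed ~465 bp V3-V4 amplicon purely for illustration:

```python
def expected_overlap(len_f, len_r, amplicon_len):
    """Bases shared by a forward read of length len_f and a reverse
    read of length len_r sequenced from an amplicon of amplicon_len."""
    return len_f + len_r - amplicon_len

MIN_OVERLAP = 20   # the merge requirement discussed above
AMPLICON = 465     # assumed amplicon length, for illustration only

print(expected_overlap(300, 300, AMPLICON))                 # 135 -- plenty of overlap
print(expected_overlap(230, 220, AMPLICON) >= MIN_OVERLAP)  # False -- truncated too far
```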

Colin

I don’t see why the reads are getting removed with the trunc-len of 300 either, but maybe the reads are for some technical reason just less than 300 nts? (e.g., 299)? Then they would get removed. This sometimes happens if sequencing centers include a couple of technical bases at the start of the reads and then strip them out before returning the data.

Based on your quality plots though, the data looks high quality, and a high fraction of reads are passing through your workflow when using the non-truncated data, so going with trunc-len 0 appears to be just fine here.

My one concern is that you are losing such a high fraction of reads as chimeras, which almost always means that you have unremoved primers at the start of your reads. You should check whether primers are present on these reads, and remove them either with cutadapt, or easily by using the trim-left parameter in the dada2 denoise workflow.
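For what it's worth, the trim-left option just removes a fixed number of bases from the 5' end of every read, which is why it works for primer removal when the primer length is known. A toy Python sketch (the 20 bp sequence here stands in for a forward primer):

```python
def trim_left(read, n):
    """Drop the first n bases of a read, mirroring what dada2's
    --p-trim-left-f / --p-trim-left-r options do to each read."""
    return read[n:]

read = "ACTCCTACGGGAGGCAGCAG" + "TTTT"  # 20 bp primer + biological sequence
print(trim_left(read, 20))  # TTTT
```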

1 Like

Hi :slight_smile:

Yes, you're totally right: some of my reads are less than 300 bp. Attached is my sequence length summary.

So the principle is that if I set the parameter to 300 bp, it removes all reads shorter than 300 bp and then merges the rest? I am not very clear on the reason.

In that case, should I set a number smaller than the length of the shortest reads?
I tried the following command:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs /sbidata/lzhang/201911_hydractinia/RawData/16S_data/output/demux.qza \
  --p-trunc-len-f 240 \
  --p-trunc-len-r 220 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza

Results are shown below.

Truly, thanks.

As @benjjneb mentioned, the high fraction of reads flagged as chimeras is worrying. I will check that next. Thanks :)

1 Like

That’s right. Because 98% of your reads are <300 bp, trimming at 300 will remove 98% of your reads. (If you take a look at the dada2 paired docs, you can read the full description of this setting and see that “Reads that are shorter than this value will be discarded.”)

Maybe… but I would take Ben’s advice here:

The goal is to pick lengths that remove errors while letting the most reads merge. And I think 0 might be perfect!

Great work! I think you are making good progress.

Colin

Hi :smiley:

Actually, @benjjneb is totally right. I rechecked the data and contacted the sequencing company. They told me that only the merged fasta files are primer-free, while the clean paired-end data files still contain the primers. I simply trimmed them using trim-left in the dada2 denoise workflow. It is very convenient!

So after using the command below:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs /sbidata/lzhang/201911_hydractinia/RawData/16S_data/output/demux.qza \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --p-trim-left-f 20 \
  --p-trim-left-r 20 \
  --p-n-threads 24 \
  --o-table table20.qza \
  --o-representative-sequences rep-seqs20.qza \
  --o-denoising-stats denoising-stats20.qza

Statistics as follows:


Really, thanks for your great help! @colinbrislawn @benjjneb

However, I still feel confused about the chimera percentage. The overall non-chimeric rate definitely improved after removing the primers,
but for some samples (10+/73) the percentage is still not very high, around 50%.

I have listed the samples with a low non-chimeric percentage.

Do you think this is okay, or should I switch to the R package dada2?
I have also tried these data in the R package, setting the minimum overlap to 12.

image

However, it does not seem to be improved…

Thanks

2 Likes

Hello @Lu_Zhang,

Glad you got trimming and filtering working well on your data!

Looks like both the R package and the QIIME 2 plugin are getting pretty good results, but around 50% of reads are flagged as chimeric. This is not impossible, especially with higher PCR cycle counts. Did you run 15, 20, or 30 PCR cycles?

If you want to lower the chimeric numbers, you could change the --p-min-fold-parent-over-abundance or --p-chimera-method described here.
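Roughly speaking, dada2's consensus method only flags a sequence as chimeric if it can be reconstructed from two more-abundant "parent" sequences. A simplified Python sketch of just the abundance gate that --p-min-fold-parent-over-abundance raises (the real check also requires the sequence to be an exact left/right hybrid of the parents; the parameter semantics here are my reading of the docs, so please verify against your version's help text):

```python
def parents_abundant_enough(child, parent_a, parent_b, min_fold=1.0):
    """A sequence can only be called a bimera of two parents if each
    parent is at least min_fold times as abundant as the sequence
    itself (dada2's minFoldParentOverAbundance)."""
    return parent_a >= min_fold * child and parent_b >= min_fold * child

# With min_fold of 1, a 100-count sequence with two 300-count parents
# passes the gate and can be flagged as chimeric.
print(parents_abundant_enough(100, 300, 300, min_fold=1))  # True
# Raising min_fold to 8 makes the gate stricter, so fewer reads are flagged.
print(parents_abundant_enough(100, 300, 300, min_fold=8))  # False
```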

Colin

3 Likes

Dear @colinbrislawn,

As you recommended, I tried setting --p-min-fold-parent-over-abundance.
Command as follows:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs /sbidata/lzhang/201911_hydractinia/RawData/16S_data/output/demux.qza \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --p-trim-left-f 20 \
  --p-trim-left-r 20 \
  --p-n-threads 24 \
  --p-min-fold-parent-over-abundance 8 \
  --o-table table8.qza \
  --o-representative-sequences rep-seqs8.qza \
  --o-denoising-stats denoising-stats8.qza

And the table is as follows. The non-chimeric level is improved, but it is limited by the low merge percentage?


I also tried it in the R package dada2, setting maxMismatch to 3.

I got the results as follows:
image
I can see the merge percentage is improved.

I am not sure whether I can just use this result (with some samples around 50% non-chimeric?).

To test the results, I also ran a taxonomy analysis following the Moving Pictures tutorial:

Using the default classifier:

command as follows:

qiime feature-classifier classify-sklearn \
  --i-classifier gg-13-8-99-515-806-nb-classifier.qza \
  --i-reads rep-seqs20.qza \
  --o-classification taxonomy20.qza

qiime metadata tabulate \
   --m-input-file taxonomy20.qza \
   --o-visualization taxonomy20.qzv

qiime taxa barplot \
  --i-table table20.qza \
  --i-taxonomy taxonomy20.qza \
  --m-metadata-file 201912_hydractinia_metadatafile_Qiime2.tsv \
  --o-visualization taxa-bar-plots20.qzv

Taxonomy barplots as follows:
taxa-bar-plots20.qzv (2.2 MB)

I found lots of unassigned taxa…

Then, as the tutorial suggests, I also trained my own classifier.

command as follows:

qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path gg_13_8_otus/rep_set/99_otus.fasta \
  --output-path 99_otus.qza

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path gg_13_8_otus/taxonomy/99_otu_taxonomy.txt \
  --output-path ref-taxonomy.qza

qiime feature-classifier extract-reads \
  --i-sequences 99_otus.qza \
  --p-f-primer ACTCCTACGGGAGGCAGCAG \
  --p-r-primer GGACTACHVGGGTWTCTAAT \
  --o-reads ref-seqs.qza

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads ref-seqs.qza \
  --i-reference-taxonomy ref-taxonomy.qza \
  --o-classifier classifier.qza

qiime feature-classifier classify-sklearn \
  --i-classifier training-feature-classifiers/classifier.qza \
  --i-reads rep-seqs20.qza \
  --o-classification taxonomy20_new.qza

qiime metadata tabulate \
  --m-input-file taxonomy20_new.qza \
  --o-visualization taxonomy20_new.qzv

qiime taxa barplot \
  --i-table table20.qza \
  --i-taxonomy taxonomy20_new.qza \
  --m-metadata-file 201912_hydractinia_metadatafile_Qiime2.tsv \
  --o-visualization taxa-bar-plots20_new.qzv

Results as follows:
taxa-bar-plots20_new.qzv (2.4 MB)

The two classifiers seem to give very different results…

This confuses me: I couldn't solve the problem of the low merge percentage and the chimeras, so I don't know whether the downstream results are trustworthy. Also, is the big difference between the two taxonomy classifiers normal?
And sample 6 has a very high non-chimeric level, but at the taxonomy level most of this sample is assigned to only one genus. I thought high-quality reads would yield more detailed assignments?

Sorry to bother you again…

Best,
Lu

2 Likes

Hello @Lu_Zhang,

You are making great progress! I appreciate your detailed posts. Let’s dive in! :swimming_man:

So we have lowered the percent chimeric (great!). In your first post you got merged percentages around 80% and 90%. Maybe you could run those settings again, now with the --p-min-fold-parent-over-abundance 8 setting.

It looks like that helped. I think you could safely use any one of these inputs for your downstream analysis. :+1:

I believe you can use those tables for downstream analysis just fine. Let's see if other forum users agree!


Yes. Training your own classifier to match your primers should give the best results, and I think that's what we see here. Proteobacteria remains the second most common phylum, and the unclassified k__Bacteria;p__ now becomes p__Tenericutes. This is great news!

It's possible that this sample was contaminated, or maybe there was simply a lot of p__Tenericutes in it.


I think you are doing great and getting very good results. It sounds like you are still worried about whether you are on the right path, and I think this care is good. I'm glad you are thinking about this.

If you want to make sure your methods work, the best method is to use positive control samples with known compositions. Did you include any positive control samples in your study?

Colin

2 Likes

Dear Colin,

I appreciate your helpful reply, and I am sorry for replying so slowly. I had some other urgent work recently, so I couldn't reply on time.

Those samples always have a low merge rate; 80%-90% is the merge percentage of the other samples. I think that was a problem with my screenshot… :sweat: The samples with a high chimera percentage also always have a relatively low merge percentage. But the maxMismatch parameter is not exposed in the QIIME 2 dada2 plugin, so I am wondering whether there is any way I could improve the merge rate?

As for positive controls: unfortunately, I didn't include any positive control samples in my study.

The interesting thing is that the samples with a high chimera percentage are all from one group. Does that mean the whole group is contaminated?

Do you think I should start from the joined reads with an OTU clustering method and compare the results, to make sure the previous result is okay? Or is the previous result reliable enough to proceed to downstream analysis?

I also have joined reads from the company, and I have noticed that q2-vsearch provides an OTU clustering method: https://docs.qiime2.org/2019.10/tutorials/otu-clustering/
I used the command below:

qiime tools import \
  --input-path /sbidata/lzhang/201911_hydractinia/RawData/16S_data/joined_data/seqs.fna \
  --output-path seqs.qza \
  --type 'SampleData[Sequences]'


qiime tools import \
  --input-path /sbidata/lzhang/201911_hydractinia/RawData/16S_data/joined_data/q2-vsearch/gg_13_5_otus/rep_set/97_otus.fasta \
  --output-path 97_otus.qza \
  --type 'FeatureData[Sequence]'

qiime vsearch dereplicate-sequences \
  --i-sequences seqs.qza \
  --o-dereplicated-table table.qza \
  --o-dereplicated-sequences rep-seqs.qza

qiime vsearch cluster-features-open-reference \
  --i-table table.qza \
  --i-sequences rep-seqs.qza \
  --i-reference-sequences 97_otus.qza \
  --p-perc-identity 0.97 \
  --o-clustered-table table-or-97.qza \
  --o-clustered-sequences rep-seqs-or-97.qza \
  --o-new-reference-sequences new-ref-seqs-or-97.qza

Thanks so much!

Lu

Hi :slight_smile:

I also tried those data in the dada2 R package, setting maxMismatch to 3 as well as minFoldParentOverAbundance to 8:

mergers <- mergePairs(dadaFs, filtFs, dadaRs, filtRs, verbose=TRUE, minOverlap = 20, maxMismatch = 3)
seqtab <- makeSequenceTable(mergers)
seqtab.nochim <- removeBimeraDenovo(seqtab, method="consensus", multithread=TRUE, verbose=TRUE, minFoldParentOverAbundance = 8)

The chimera percentage is slightly improved (with or without minFoldParentOverAbundance). But after changing the maxMismatch merge parameter, I would say the losses come from the chimera percentage rather than only from a low merge percentage.

> seqtab.nochim <- removeBimeraDenovo(seqtab, method="consensus", multithread=TRUE, verbose=TRUE, minFoldParentOverAbundance = 8)
Identified 22422 bimeras out of 54131 input sequences.
> seqtab.nochim.nominfold <- removeBimeraDenovo(seqtab, method="consensus", multithread=TRUE, verbose=TRUE)
Identified 28085 bimeras out of 54131 input sequences.

With minFoldParentOverAbundance = 8:

input filtered denoisedF denoisedR merged nonchim nonchim/input
19 105598    98662     88274     93451  72128   58755 0.5564026
44 106774    98323     83282     90280  65161   59961 0.5615693
72 105997    97510     81120     90368  64064   59540 0.5617140
42 107262    99508     83357     91463  65685   60482 0.5638716
20 106783   100049     89866     95658  74487   60379 0.5654364
7  104346    97044     87834     93226  75232   59107 0.5664520
39 106265    98023     86479     92379  71256   61340 0.5772362
38 106030    99043     87026     93621  71488   61342 0.5785344
40 105864    97956     86758     92367  72478   61313 0.5791676
41 106148    98917     87940     93681  71980   62065 0.5847025
43 107016    99352     84097     91537  67670   62937 0.5881083
21 107233    99662     87403     93737  71748   63621 0.5932968
71 106080    97430     87574     92227  75382   68449 0.6452583
8  104845    97094     89591     93811  80011   68331 0.6517335
32 105428    97861     91529     95035  85269   75419 0.7153602
9  105867    97983     90700     94767  83144   79598 0.7518679
27 104928    97937     90955     95009  84265   81311 0.7749219
34 106373    99104     93457     96301  88125   83556 0.7855001
33 105056    97594     93067     95499  88819   84568 0.8049802
76 105979    97275     94725     95409  91997   88425 0.8343634
54 106568    99445     95515     97329  91243   89018 0.8353164
28 106249    99479     95353     97429  91050   89070 0.8383138
66 106513    98504     96065     97165  93361   89620 0.8413996
53 104719    97837     94337     95517  90619   89054 0.8504092
51 105485    98520     96132     96556  93178   92416 0.8761056
15 105608    97983     96944     96988  95528   92907 0.8797345
37 104518    97503     96102     96426  94750   92218 0.8823169
52 107974   100132     98438     98726  96082   95416 0.8836942
45 104550    97066     95335     95695  93076   92462 0.8843807
64 106611    98318     97741     97776  96968   94629 0.8876101
56 105058    97828     95970     96298  93760   93335 0.8884140
36 105748    97861     96575     96879  95186   94023 0.8891232
3  106734    99111     97267     97859  95382   94997 0.8900350
25 106509    99439     96065     95947  94984   94927 0.8912580
14 106609    99030     98200     98119  96947   95229 0.8932548
1  106332    98763     97536     97564  95883   94997 0.8933999
65 107555    99661     98740     98885  97742   96213 0.8945470
63 105932    97634     97025     97084  96326   94764 0.8945739
55 107280    99804     98230     98332  96327   95975 0.8946216
61 105216    98265     97825     97768  97092   94165 0.8949684
35 105595    98656     97212     97501  95584   94558 0.8954780
75 105622    97690     97142     97126  96370   94611 0.8957509
29 107303    99454     98060     98030  96716   96251 0.8970019
48 105468    97706     96481     96652  94965   94699 0.8978932
50 106839    99217     97879     97958  96210   95971 0.8982768
5  107025    99410     98829     98857  98289   96155 0.8984349
68  78311    72598     71035     71013  70423   70398 0.8989542
13 106475    98482     97691     97717  96636   95829 0.9000141
10 105310    97569     95602     95390  94923   94855 0.9007217
60 107624    99880     99496     99507  99065   97306 0.9041292
11 103237    95859     93912     93969  93362   93353 0.9042591
78 102952    94862     94241     94327  93529   93158 0.9048683
24 105307    97849     96562     96742  95873   95322 0.9051820
73 104463    96116     95636     95700  95045   94667 0.9062252
47 104861    97183     96333     96261  95241   95033 0.9062759
26  94211    88023     86155     86192  85516   85507 0.9076116
49 106405    98948     98179     98062  96895   96654 0.9083596
23  88313    82148     80657     80697  80274   80255 0.9087564
74 104168    96375     96236     96286  96137   94676 0.9088780
62 106548    99389     99188     99150  98848   97199 0.9122555
4  105079    97566     97041     97098  96677   95878 0.9124373
6  107847    99599     98875     98804  98616   98550 0.9137945
46 104395    96654     96218     96273  95648   95526 0.9150438
2  105438    97596     97270     97167  96749   96539 0.9155997
12 106735    99655     98526     98514  97993   97875 0.9169907
79 105764    97904     97584     97706  97222   96992 0.9170606
30 103914    97188     96514     96541  95704   95366 0.9177397
67  83848    77853     77424     77354  77160   77018 0.9185431
77 101799    93742     93679     93707  93647   93647 0.9199206
69  94890    88041     87666     87691  87397   87307 0.9200864
80 106598    98554     98418     98431  98257   98083 0.9201205
31 105516    98324     97599     97566  97327   97307 0.9222014
16  97238    90410     89887     89919  89757   89754 0.9230342

Without minFoldParentOverAbundance = 8:

input filtered denoisedF denoisedR merged nonchim nonchim/input
40 105864    97956     86758     92367  72478   53537 0.5057149
41 106148    98917     87940     93681  71980   54245 0.5110318
19 105598    98662     88274     93451  72128   54034 0.5116953
39 106265    98023     86479     92379  71256   54556 0.5133958
38 106030    99043     87026     93621  71488   54607 0.5150146
20 106783   100049     89866     95658  74487   55039 0.5154285
44 106774    98323     83282     90280  65161   55222 0.5171858
42 107262    99508     83357     91463  65685   55924 0.5213776
72 105997    97510     81120     90368  64064   56327 0.5314018
7  104346    97044     87834     93226  75232   56148 0.5380944
43 107016    99352     84097     91537  67670   58673 0.5482638
21 107233    99662     87403     93737  71748   58908 0.5493458
71 106080    97430     87574     92227  75382   64901 0.6118118
8  104845    97094     89591     93811  80011   66362 0.6329534
32 105428    97861     91529     95035  85269   74027 0.7021569
9  105867    97983     90700     94767  83144   78710 0.7434800
27 104928    97937     90955     95009  84265   80555 0.7677169
34 106373    99104     93457     96301  88125   82740 0.7778290
33 105056    97594     93067     95499  88819   83921 0.7988216
54 106568    99445     95515     97329  91243   87596 0.8219728
76 105979    97275     94725     95409  91997   88034 0.8306740
66 106513    98504     96065     97165  93361   88601 0.8318327
28 106249    99479     95353     97429  91050   88487 0.8328267
53 104719    97837     94337     95517  90619   88119 0.8414805
51 105485    98520     96132     96556  93178   91220 0.8647675
15 105608    97983     96944     96988  95528   92524 0.8761079
37 104518    97503     96102     96426  94750   91663 0.8770068
45 104550    97066     95335     95695  93076   91733 0.8774079
56 105058    97828     95970     96298  93760   92188 0.8774962
52 107974   100132     98438     98726  96082   94767 0.8776835
64 106611    98318     97741     97776  96968   94247 0.8840270
36 105748    97861     96575     96879  95186   93515 0.8843193
3  106734    99111     97267     97859  95382   94395 0.8843949
55 107280    99804     98230     98332  96327   95154 0.8869687
1  106332    98763     97536     97564  95883   94586 0.8895347
25 106509    99439     96065     95947  94984   94763 0.8897182
35 105595    98656     97212     97501  95584   94064 0.8907998
65 107555    99661     98740     98885  97742   95875 0.8914044
61 105216    98265     97825     97768  97092   93793 0.8914329
14 106609    99030     98200     98119  96947   95059 0.8916602
75 105622    97690     97142     97126  96370   94299 0.8927970
48 105468    97706     96481     96652  94965   94166 0.8928395
63 105932    97634     97025     97084  96326   94586 0.8928936
50 106839    99217     97879     97958  96210   95529 0.8941398
29 107303    99454     98060     98030  96716   95966 0.8943459
73 104463    96116     95636     95700  95045   93591 0.8959249
5  107025    99410     98829     98857  98289   95963 0.8966410
13 106475    98482     97691     97717  96636   95523 0.8971402
68  78311    72598     71035     71013  70423   70292 0.8976006
10 105310    97569     95602     95390  94923   94752 0.8997436
78 102952    94862     94241     94327  93529   92790 0.9012938
47 104861    97183     96333     96261  95241   94514 0.9013265
60 107624    99880     99496     99507  99065   97126 0.9024567
11 103237    95859     93912     93969  93362   93274 0.9034939
49 106405    98948     98179     98062  96895   96137 0.9035008
24 105307    97849     96562     96742  95873   95158 0.9036246
74 104168    96375     96236     96286  96137   94330 0.9055564
26  94211    88023     86155     86192  85516   85432 0.9068156
23  88313    82148     80657     80697  80274   80155 0.9076240
62 106548    99389     99188     99150  98848   96898 0.9094305
4  105079    97566     97041     97098  96677   95696 0.9107053
46 104395    96654     96218     96273  95648   95163 0.9115666
6  107847    99599     98875     98804  98616   98365 0.9120791
79 105764    97904     97584     97706  97222   96614 0.9134866
2  105438    97596     97270     97167  96749   96421 0.9144805
30 103914    97188     96514     96541  95704   95152 0.9156803
12 106735    99655     98526     98514  97993   97796 0.9162505
69  94890    88041     87666     87691  87397   87011 0.9169670
16  97238    90410     89887     89919  89757   89271 0.9180670
67  83848    77853     77424     77354  77160   76987 0.9181734
80 106598    98554     98418     98431  98257   97922 0.9186101
77 101799    93742     93679     93707  93647   93560 0.9190660
31 105516    98324     97599     97566  97327   97281 0.9219550
1 Like

Good morning Lu,

This is exactly what I was going to suggest. You will get all the settings in R!

It’s not the end of the world! :man_shrugging: And now you know for next time!

Without controls, it’s very difficult to answer these questions. So… :woman_shrugging:

Here’s how I would approach this: 1) use modern methods, 2) use standard settings.

  1. Modern methods like dada2 are tolerant of many data types. Unless a sample failed to sequence, dada2 should be able to process it well.
  2. The qiime devs care about the default settings, and take time to choose good ones. These settings probably work great, and only need to be changed if you have evidence showing your samples need to be processed differently. “Tried-and-true” methods are also easy to defend during publication.

I think the biological findings are more interesting than the methods, anyway.
:earth_africa::deciduous_tree::microbe:

Colin

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.