Filtered OTU table using spike-in mock community to get actual read count

Isis_Guibert · April 12, 2023, 7:46am

Hi,

I have soil samples into which I spiked a known mock community (ZymoBIOMICS) prior to DNA extraction. I also have negative control and positive control (mock community only) samples. After 16S sequencing, I used QIIME2 and obtained rep-seqs.qza and taxonomy.qza/qzv files.
Using those files, I would like to filter my table to get "actual read counts" based on the mock community I used. Is there a way to do that with QIIME2?
I saw that QIIME2 has some filtering options but couldn't find exactly what I want. What I want to do is recover the relative abundance of the known mixed species and use it to apply a correction factor to each sample’s relative abundances.
Using my negative samples, I would also like to filter out potential contamination. Should I do that before or after the normalization step?
Thank you!

cherman2 · April 12, 2023, 8:03pm

Hi @Isis_Guibert,

Can you elaborate on this? Do you want to filter your table down to features that were in your mock community?

As for using your negative samples, I would personally run the analysis with and without filtering for potential contamination. I do this because a negative control can give you an idea of what contamination could have occurred, but there is also a chance that it's a real signal in your data that you are filtering out. I would do this filtering step as early as possible in your pipeline.
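To make the "with and without" comparison concrete, here is a minimal Python sketch of one simple way to flag features that show up in negative controls. It is not a QIIME 2 command: the toy table, the sample names, and the `min_reads` threshold are all hypothetical, and dropping every feature seen in a negative is only one of several possible decontamination strategies.

```python
# Hypothetical toy feature table: feature ID -> {sample ID -> read count}.
table = {
    "otu_a": {"soil_1": 500, "soil_2": 420, "neg_1": 0},
    "otu_b": {"soil_1": 30, "soil_2": 0, "neg_1": 12},
    "otu_c": {"soil_1": 0, "soil_2": 75, "neg_1": 1},
}
negatives = ["neg_1"]  # assumed names of the negative control samples


def features_in_negatives(table, negative_samples, min_reads=1):
    """Return the feature IDs with at least min_reads in any negative control."""
    return {
        fid
        for fid, counts in table.items()
        if any(counts.get(s, 0) >= min_reads for s in negative_samples)
    }


def drop_features(table, to_drop):
    """Return a copy of the table without the flagged features."""
    return {fid: counts for fid, counts in table.items() if fid not in to_drop}


# Lenient flagging (any read in a negative) versus a stricter threshold.
flagged_any = features_in_negatives(table, negatives)
flagged_strict = features_in_negatives(table, negatives, min_reads=10)

# Keep both versions so downstream results can be compared side by side.
unfiltered = table
filtered = drop_features(table, flagged_strict)
print(sorted(flagged_any), sorted(flagged_strict), sorted(filtered))
```

Running the downstream steps on both `unfiltered` and `filtered` shows how much the result depends on the features that were removed.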

Hope that helps!

Isis_Guibert · April 13, 2023, 1:38am

Hi @cherman2,

Thank you for your reply.
Yes, this is what I want to do, but I don't want to just remove the mock community; I want to normalise my data according to it. I spiked 100 ul of mock community into each of my samples. Lots of recent reviews say that we should use a mock community to get actual read counts.
Some recommend dividing each sample's OTU size (read count) by the spike-in reads in that sample; others recommend taking the mean of the mock community reads and dividing by that. I am not sure which is best or how to do it.
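The two options can be sketched in plain Python as follows. This is only an illustration of the arithmetic, under the assumption that the spike-in features have already been identified. The table, the feature IDs, and the function names are hypothetical, and this is not what AMPtk or QIIME 2 does internally.

```python
from statistics import mean

# Hypothetical toy feature table: feature ID -> {sample ID -> read count}.
# "spike_1" and "spike_2" stand in for features assigned to the mock community.
table = {
    "otu_a": {"soil_1": 300, "soil_2": 100},
    "otu_b": {"soil_1": 100, "soil_2": 300},
    "spike_1": {"soil_1": 150, "soil_2": 50},
    "spike_2": {"soil_1": 50, "soil_2": 50},
}
spike_ids = {"spike_1", "spike_2"}


def spike_totals(table, spike_ids):
    """Sum the spike-in reads recovered in each sample."""
    totals = {}
    for fid in spike_ids:
        for sample, count in table[fid].items():
            totals[sample] = totals.get(sample, 0) + count
    return totals


def normalize(table, spike_ids, per_sample=True):
    """Divide non-spike counts by spike-in reads.

    per_sample=True  -> divide by the spike-in total of the same sample.
    per_sample=False -> divide by the mean spike-in total across samples.
    """
    totals = spike_totals(table, spike_ids)
    pooled = mean(totals.values())
    normalized = {}
    for fid, counts in table.items():
        if fid in spike_ids:
            continue  # the spike-in itself is removed from the output
        row = {}
        for sample, count in counts.items():
            divisor = totals.get(sample, 0) if per_sample else pooled
            if divisor == 0:
                raise ValueError(f"no spike-in reads available for {sample}")
            row[sample] = count / divisor
        normalized[fid] = row
    return normalized


by_sample = normalize(table, spike_ids, per_sample=True)
by_mean = normalize(table, spike_ids, per_sample=False)
print(by_sample)
print(by_mean)
```

Dividing by the per-sample total corrects for sample-to-sample differences in spike-in recovery, whereas dividing by the mean only rescales every sample by the same constant, so the choice matters whenever spike-in recovery varies between samples.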
I saw that there is a software package called AMPtk that provides an OTU filtering step that can give actual read counts, but it doesn't seem to be compatible with QIIME2 files.
https://amptk.readthedocs.io/en/latest/filtering.html

I would like to see if it's possible to do the same with QIIME2.

So far my negative samples look more like a small contamination than something from my samples, but I will follow your advice and try with and without.

cherman2 · April 13, 2023, 3:41pm

Hi @Isis_Guibert,
Thank you for the clarification! From what I know there is not a way to do this in the QIIME 2 ecosystem.

If you want to use AMPtk, you could extract your qiime2 feature table and use the biom table for AMPtk and then re-import the output of AMPtk into qiime2 as a rarefied table.

Hope that helps!

Isis_Guibert · April 20, 2023, 6:16am

Hi @cherman2,

Thank you. I am a bit lost about what to use and wonder if you could help.

According to the AMPtk pipeline I should use:
amptk filter -i OTU_table.txt -f OTU_fasta.fa -b spike -m mock2

With:
-b, --mock_barcode Name of barcode of mock community (Recommended)
-m, --mc Mock community FASTA file. Required if -b passed [synmock,mock1,mock2,mock3,other]

I am not sure which qiime2 file I should provide for each argument:
OTU_table.txt = dada2_table? (feature, frequency, number of samples observed in)
OTU_fasta.fa = the fasta file downloaded from rep.seqs.qzv (feature ID + fasta sequence)?
spike = the name of my positive samples that contain only the spike-in? I am really unsure about that one.
mock2 = a fasta file created from the feature IDs in my positive samples plus their sequences?

Then I should get a new, normalized dada2_table that I can use to create a new taxa barplot with my taxonomy.qza file, and for further filtering (such as taxonomy-based filtering to remove mitochondria and chloroplasts) and data analysis?
Is your understanding the same as mine, that AMPtk will remove the spike-in from each of my samples and normalise them?

Thank you so much

cherman2 · April 20, 2023, 4:59pm

Hi @Isis_Guibert,
We are definitely getting out of my wheelhouse here but I will try to help as much as I can.

OTU_table.txt: Yep! Export the dada2-table.qza (or whatever you named the table that came out of DADA2), then use biom convert to convert it from a biom table to a .txt file.

OTU_fasta.fa: Almost. You will want to export this from rep.seqs.qza.

-b spike: this seems to be the name of the barcode column in their metadata? But I don't see where you pass in the metadata, so I am not exactly sure.

-m mock2: this seems to be a fasta of your mock communities. Required if -b is passed. [synmock,mock1,mock2,mock3,other]

Then you should be able to re-import your table, and it will be normalized to your mock community.

That is my understanding but I am not very familiar with AMPtk.

I hope that helps a little! Sorry I couldn't be more help!
