Processing reads generated with different primers

Lu_Yang · January 29, 2018, 9:39pm

Hi, community,

Is it possible to process sequences that were generated with different primers whose amplicons overlap? For example, sequences amplified with 515F-806R and sequences amplified with 341F-806R share the overlapping region 515F-806R. Sure, I can process them separately in QIIME 2 following the protocol, but then they cannot be compared properly: their final OTU numbers are different.

Is it fine to trim them to the same region, 515F-806R, then process them with Deblur/DADA2 and classify the taxonomy? Or should I analyze them separately in Deblur/DADA2 to get the feature tables? In that case, is there another way to get the taxonomy?

Thanks in advance.

Mehrbod_Estaki · January 29, 2018, 11:02pm

Hi @Lu_Yang, while you wait for an expert answer to your question, have you looked at the q2-fragment-insertion plugin? I haven't used it personally, but it sounds like it was designed with exactly this issue in mind: using different regions for analysis. Check out their tutorial here.

Lu_Yang · January 30, 2018, 1:04am

Hi, @Mehrbod_Estaki,

Thanks for your suggestion. I have read through it, but I do not fully understand it yet. The installation also ran into some problems.

Nicholas_Bokulich · January 30, 2018, 3:26pm

Hi @Lu_Yang,
You have a few different options here:

1. @Mehrbod_Estaki's suggestion is excellent (thank you for the suggestion!): comparing datasets amplified with different primers is indeed what q2-fragment-insertion was designed for (to my knowledge). I would think that plugin would be most advantageous, however, when comparing datasets with non-overlapping amplicons. So you have other options.
2. The process that you describe, trimming the longer reads to 515f-806r, is absolutely okay. Of course you lose the additional sequence information, but it sounds like that is not important here. You would trim the reads, process separately by dada2/deblur, then merge together into a single feature table and a single representative sequence file. Then classify taxonomy on the merged sequences.
3. You could process separately (without trimming), then classify taxonomy separately. Then collapse both feature tables on level 7 taxonomy, and merge those tables. Then all downstream analyses (e.g., diversity analyses) would be based on taxonomic information, and the precise primer site does not matter too much. However, this is definitely the weakest option of the three, because (1) the longer reads may yield deeper taxonomic assignments, so collapsing on species level may still yield very different profiles between the different datasets, and (2) collapsing on taxonomy reduces the amount of information you have, i.e., diversity analyses with ASVs are much more sensitive for differentiating sub-groups within your data. This approach would really only be most advantageous when trying to merge datasets from very different amplicons (e.g., 16S rRNA genes and a protein-coding gene).
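
To illustrate why trimming to the same region matters before merging, here is a minimal Python sketch. It is not QIIME 2 code: the table layout (a dict keyed by the sequence itself), the function name `merge_feature_tables`, and the toy sequences are all assumptions made up for illustration. The idea is only that identical trimmed sequences from two runs end up as one shared feature, while anything that differs stays separate.

```python
from collections import defaultdict


def merge_feature_tables(*tables):
    """Merge feature tables by summing counts of identical features.

    Each table maps feature -> {sample_id: count}. In this sketch the
    feature key is the sequence itself (a simplification).
    """
    merged = defaultdict(lambda: defaultdict(int))
    for table in tables:
        for feature, samples in table.items():
            for sample_id, count in samples.items():
                merged[feature][sample_id] += count
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {feature: dict(samples) for feature, samples in merged.items()}


# Hypothetical toy data: both runs already trimmed to the same region.
run_515 = {"ACGTACGT": {"s1": 10}, "TTGGCCAA": {"s1": 3}}
run_341_trimmed = {"ACGTACGT": {"s2": 7}, "GGGGCCCC": {"s2": 2}}

merged = merge_feature_tables(run_515, run_341_trimmed)
shared = set(run_515) & set(run_341_trimmed)

print(merged)
print("shared features:", shared)
```

If the 341F reads were left untrimmed, their sequences would carry extra bases at the 5' end, no key would match, and `shared` would be empty even for the same organism.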

I hope that helps!

Lu_Yang · January 30, 2018, 5:21pm

Hi, @Nicholas_Bokulich,

Thanks for the detailed answers. Also again thanks to @Mehrbod_Estaki. May I know more details about the following? Thanks in advance.

(1) The first suggestion, q2-fragment-insertion, is the one I understand best, and I am still trying it.
a. I have tried this command:
qiime fragment-insertion sepp \
  --i-representative-sequences rep-seqs-341.qza \
  --o-tree insertion-tree-341.qza \
  --o-placements insertion-placements-341.qza

This command has been running for more than one hour and still has not produced any results, even though I only have 3 samples in this test. Is there a setting that uses multiple threads, as the DADA2 step does? Thanks.

b. The protocol on GitHub lists this command:
qiime fragment-insertion sepp \
  --i-representative-sequences rep-seqs.qza \
  --o-tree insertion-tree.qza \
  --o-placements insertion-placements.qza

(Does rep-seqs.qza in this step mean the merged result of the rep-seqs.qza from primer set 1 and the rep-seqs.qza from primer set 2? I am not sure I understand this correctly.)

qiime fragment-insertion classify-otus-experimental \
  --i-representative-sequences rep-seqs.qza \
  --i-tree insertion-tree.qza \
  --i-reference-taxonomy taxonomy_gg99.qza \
  --o-classification taxonomy.qza

I will then get a final taxonomy.qza, similar to the one produced in the Moving Pictures tutorial.

(2) About trimming the longer reads to the overlapping region, e.g. 515f-806r: is there any command or protocol for this in QIIME 2 or elsewhere? Thanks.

(3) I used the following command to collapse the table: "qiime taxa collapse --i-table table.qza --i-taxonomy taxonomy.qza --p-level 7 --o-collapsed-table table-I7.qza". I have tried it, and the resulting table is indeed only at the species level, not at the OTU level. I do not think I will continue with this approach, but it was very informative for me. Thanks.
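
To make concrete what collapsing at level 7 does to a table, here is a small Python sketch of the idea. It is not the q2-taxa implementation: the function name `collapse_table`, the dict-based table layout, the "__" padding for missing ranks, and the toy taxonomy strings are assumptions for illustration only.

```python
from collections import defaultdict


def collapse_table(table, taxonomy, level):
    """Sum counts of all features that share the same taxonomy up to `level`.

    table:    feature_id -> {sample_id: count}
    taxonomy: feature_id -> semicolon-separated lineage string
    level:    number of ranks to keep (7 = species in a 7-rank lineage)
    """
    collapsed = defaultdict(lambda: defaultdict(int))
    for feature_id, samples in table.items():
        ranks = [rank.strip() for rank in taxonomy[feature_id].split(";")]
        # Pad short lineages so every key has exactly `level` ranks.
        # The "__" placeholder is a choice made for this sketch.
        ranks = (ranks + ["__"] * level)[:level]
        key = ";".join(ranks)
        for sample_id, count in samples.items():
            collapsed[key][sample_id] += count
    return {key: dict(samples) for key, samples in collapsed.items()}


# Hypothetical toy data: two features with the same species-level lineage.
lineage = "k__Bacteria;p__A;c__B;o__C;f__D;g__E;s__F"
table = {"asv1": {"s1": 5}, "asv2": {"s1": 2, "s2": 4}, "asv3": {"s2": 1}}
taxonomy = {"asv1": lineage, "asv2": lineage, "asv3": "k__Bacteria;p__A"}

collapsed = collapse_table(table, taxonomy, level=7)
print(collapsed)
```

The two features with the same lineage become one row, which is why the result no longer carries OTU-level resolution.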

Nicholas_Bokulich · January 31, 2018, 3:05pm

@Lu_Yang,

Yes, you can use q2-cutadapt to trim your demultiplexed reads prior to denoising. See this tutorial.

Yes, that is the point: to collapse at species level (since OTUs from different primer regions will be unique even if they overlap). But I think we can all agree this is not the best solution for your case.

Regarding your questions about q2-fragment-insertion, you can check out this post about this community plugin, or perhaps @Stefan can answer your questions.
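
As a rough picture of what trimming at the inner primer does to a longer 341F read, here is a Python sketch that removes everything up to and including the first 515F match. It is only an illustration, not how q2-cutadapt works internally: it uses exact IUPAC matching with no error tolerance, the function names are made up, the read is synthetic, and the primer string is the commonly published 515F sequence, which you should check against your own protocol.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_to_regex(primer):
    """Compile a degenerate primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def trim_through_primer(read, primer):
    """Return the read after the first primer match, or None if absent."""
    match = primer_to_regex(primer).search(read.upper())
    if match is None:
        return None
    return read[match.end():]


# Assumption: commonly published 515F primer; verify for your data.
PRIMER_515F = "GTGCCAGCMGCCGCGGTAA"

# Synthetic read: upstream bases + primer site (M resolved to A) + insert.
read = "ACGTACGTAC" + "GTGCCAGCAGCCGCGGTAA" + "TACGGAGGGT"
trimmed = trim_through_primer(read, PRIMER_515F)
print(trimmed)
```

Reads in which the primer is not found return `None` here, so they can be counted or discarded explicitly before denoising.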

I hope that helps!

Lu_Yang · January 31, 2018, 4:55pm

Hi, @Nicholas_Bokulich,

Thanks for your detailed answer.

I have successfully finished the q2-fragment-insertion procedure, and I think I understand it now. I am still looking forward to @Stefan's answer about multithreading in q2-fragment-insertion, and to confirmation that my understanding is right.

Thanks.

Stefan · January 31, 2018, 5:18pm

Hi @Lu_Yang,

You can invoke the help message for the plugin by executing qiime fragment-insertion sepp --help. This gives you the information you are looking for: --p-threads INTEGER, "The number of threads to use [default: 1]".

The runtime does not depend on the number of samples, only on the number of representative sequences. If you have ~4,000 sequences, typical runtime is about 2.5 h, but it scales very well when running in parallel: with 4 threads the same task can be completed within ~50 min.
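
For a quick sense of what these two numbers imply, the arithmetic can be written out as below. This is just a back-of-the-envelope sketch using the approximate figures quoted above (2.5 h on one thread, ~50 min on four); the function names are made up, and real runtimes will vary with hardware and input.

```python
def parallel_speedup(serial_minutes, parallel_minutes):
    """How many times faster the parallel run is than the serial run."""
    return serial_minutes / parallel_minutes


def parallel_efficiency(serial_minutes, parallel_minutes, threads):
    """Speedup divided by thread count (1.0 would be perfect scaling)."""
    return parallel_speedup(serial_minutes, parallel_minutes) / threads


serial_minutes = 2.5 * 60   # about 2.5 h with 1 thread
parallel_minutes = 50       # about 50 min with 4 threads

speedup = parallel_speedup(serial_minutes, parallel_minutes)
efficiency = parallel_efficiency(serial_minutes, parallel_minutes, threads=4)

print(f"speedup: {speedup:.1f}x, efficiency: {efficiency:.0%}")
```

So the quoted figures correspond to roughly a threefold speedup on four threads.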

Hope that helps,
Stefan

Lu_Yang · January 31, 2018, 9:04pm

Hi, @Stefan,

I see. Thanks so much!

Best.

Chloe