Merging two projects with different raw file types to perform analysis

sfstellavania · April 20, 2020, 3:05pm

Hello,

I have two completed sequencing projects. The first covers group A (6 samples) and was sequenced on an Ion Torrent S5, so the raw data is a single fq.gz file per sample (single-end). The second covers groups B and C (6 samples each, 12 in total) and was sequenced on an Illumina NovaSeq, so the raw data is two fq.gz files per sample (paired-end). For both projects I already have the clean tags (.fna files) for every sample, and even the OTU .biom files.

Now I want to compare between the 3 groups.

But I noticed that I cannot simply merge the two OTU .biom files, because they use the same OTU IDs for different taxonomy (for example, both have OTU_1, but with different taxonomy details).

So I tried to start by concatenating the .fna files from every sample (18 files in total) into one file and then clustering the sequences following the q2-vsearch tutorial, using open-reference clustering against the SILVA 16S database.
Then I found the topic "From QIIME fna to QIIME2 diversity and taxonomy data" and saw that I can't just concatenate all the .fna files together, because the result will be a mess (?)
In that topic, the user is advised to start from the raw files. But now I'm confused, since my raw files are of different types: one group is single-end while the other two are paired-end.

So, any ideas on how to start analysing these 3 groups? Thank you so much for your help. Sorry, I tried to find a similar topic but couldn't find one.

Nicholas_Bokulich · April 20, 2020, 4:14pm

Hi @sfstellavania,
I think you have two different options here — thanks for reading into this first!

1. Re-start analysis of both from the raw data in QIIME 2. Even though the raw data are different types (single vs. paired-end), that's fine: just analyze each batch separately and then merge after denoising (or, if read lengths etc. differ very much, you could use an OTU clustering approach after denoising to cluster the denoised reads together).
2. Import your biom tables and taxonomies into QIIME 2, collapse each on taxonomy, and then merge after collapsing.
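
To make the second option (collapse each table on taxonomy, then merge) concrete, here is a minimal sketch in plain Python. It is not QIIME 2 code: the nested-dict table layout, the function names `collapse_on_taxonomy` and `merge_collapsed`, and the toy counts and taxonomy strings are all made up for illustration. In QIIME 2 itself the corresponding steps would be `qiime taxa collapse` followed by `qiime feature-table merge`.

```python
from collections import defaultdict


def collapse_on_taxonomy(table, taxonomy, level):
    """Sum the counts of all OTUs that share the same taxonomy down to `level` ranks.

    table:    {otu_id: {sample_id: count}}   (hypothetical layout)
    taxonomy: {otu_id: "rank1; rank2; ..."}  (hypothetical layout)
    """
    collapsed = defaultdict(lambda: defaultdict(int))
    for otu_id, sample_counts in table.items():
        ranks = [rank.strip() for rank in taxonomy[otu_id].split(";")]
        taxon = ";".join(ranks[:level])
        for sample, count in sample_counts.items():
            collapsed[taxon][sample] += count
    return {taxon: dict(counts) for taxon, counts in collapsed.items()}


def merge_collapsed(*tables):
    """Merge collapsed tables on the taxon label; samples missing a taxon get 0."""
    samples = sorted({s for t in tables for counts in t.values() for s in counts})
    merged = defaultdict(lambda: dict.fromkeys(samples, 0))
    for t in tables:
        for taxon, counts in t.items():
            for sample, count in counts.items():
                merged[taxon][sample] += count
    return dict(merged)


# Toy data: the same OTU IDs point to different taxa in the two batches.
batch1_table = {"OTU_1": {"A1": 10}, "OTU_2": {"A1": 5}}
batch1_tax = {
    "OTU_1": "k__Bacteria; p__Firmicutes; c__Bacilli",
    "OTU_2": "k__Bacteria; p__Bacteroidetes; c__Bacteroidia",
}
batch2_table = {"OTU_1": {"B1": 7}, "OTU_2": {"B1": 3}}
batch2_tax = {
    "OTU_1": "k__Bacteria; p__Bacteroidetes; c__Bacteroidia",
    "OTU_2": "k__Bacteria; p__Firmicutes; c__Clostridia",
}

# Collapse each batch to phylum level (2 ranks), then merge on the taxon label.
merged_tax = merge_collapsed(
    collapse_on_taxonomy(batch1_table, batch1_tax, level=2),
    collapse_on_taxonomy(batch2_table, batch2_tax, level=2),
)
```

In this toy data OTU_1 is a Firmicutes in one batch and a Bacteroidetes in the other, so merging on OTU ID would add unrelated counts together, while merging on the collapsed taxonomy label does not. The labels only line up if both batches use the same taxonomy naming scheme.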

I would personally go with option 1. You may still need to use methods like OTU clustering to massage the data so that the batches are compatible, but you would be using better methods (denoising methods vs. OTU clustering) for filtering sequence errors, and there would remain the possibility of comparing samples at the sequence level (instead of collapsing on taxonomy, which could still be prone to some biases, e.g., if the different batches have different levels of taxonomic resolution).
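
As a rough illustration of what "merging after denoising" and "comparing at the sequence level" can mean, the sketch below labels each denoised sequence with a hash of the sequence itself, so identical sequences from separately processed batches fall into the same row when the tables are merged. This is plain Python with invented sequences and counts, and the helper names (`feature_id`, `rekey_by_hash`, `merge_by_feature`) are hypothetical. Using an MD5 digest as the feature ID is an assumption meant to mirror hashed feature IDs, not a copy of any plugin's code.

```python
import hashlib


def feature_id(sequence):
    """Return an MD5 hex digest of the (upper-cased) sequence as its feature ID."""
    return hashlib.md5(sequence.upper().encode("ascii")).hexdigest()


def rekey_by_hash(table):
    """Relabel a {sequence: {sample_id: count}} table by sequence hash."""
    return {feature_id(seq): dict(counts) for seq, counts in table.items()}


def merge_by_feature(*tables):
    """Merge tables on feature ID, summing counts for shared features."""
    merged = {}
    for t in tables:
        for fid, counts in t.items():
            row = merged.setdefault(fid, {})
            for sample, count in counts.items():
                row[sample] = row.get(sample, 0) + count
    return merged


# Toy denoised tables, one per batch (sequences and counts are invented).
single_end = {"ACGTACGTAC": {"A1": 12}}
paired_end = {"ACGTACGTAC": {"B1": 8}, "ACGTACGTACGG": {"B1": 4}}

merged_seq = merge_by_feature(rekey_by_hash(single_end), rekey_by_hash(paired_end))
```

The identical sequence ends up as one feature with counts from both batches, while the longer read stays a separate feature even though it starts with the same bases. That second case is why batches with very different read lengths may still need a clustering step after denoising.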

Good luck!
