I have analysed a time series of samples, including with 'q2 deicode rpca'. For some reason, the samples from the latest time points did not distribute along axes 1 and 2, but only along axis 3 (shown in the biplot for the latest 2 timepoints). If I analyse the latest timepoint alone, the samples are distributed across the RPCA plot as expected. I have attached both biplot.qzv packages: biplot.qzv (1.2 MB), biplot.qzv (1.2 MB)
The only obvious difference between the timepoints that I can see at the moment is that the sequencing depth of the latest samples is considerably lower than that of the other samples. However, the samples were filtered beforehand to remove low-abundance features. The taxa barplot was fine, and the UniFrac plots were also as expected.
I am using qiime2-2021.2 with the latest deicode/qurro plugins. (I have tested it with qiime2-2021.4, same results).
I would appreciate your comments.
Hi @arwqiime ,
From a quick glance, it seems as though there is a very large separation based on your Batch metadata variable, and, as you pointed out, the batches differ greatly in sequencing depth. How large was the difference in sequencing depth between these 2 batches? If you are able to share your feature-table summary artifact, that would be very helpful.
Also, are these samples from similar communities or by chance completely different sample types?
If these 2 batches were processed or sequenced separately then you'll also want to denoise them with DADA2 separately before merging (which is something I didn't see in your provenance).
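As a rough sketch of the per-batch workflow described above: denoise each run separately, merge the outputs, then summarize the merged table to compare per-sample depth between batches. All file names (`batch1-demux.qza`, `metadata.tsv`, and so on) are hypothetical, the 240 bp truncation is taken from the read length mentioned later in this thread, and `--p-trim-left 0` assumes primers were already removed. The script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 to execute them.
PLAN=""
run() {
  PLAN="$PLAN$*
"
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@" || exit 1; fi
}

# Denoise each sequencing run on its own (hypothetical file names).
for batch in batch1 batch2; do
  run qiime dada2 denoise-single \
    --i-demultiplexed-seqs "${batch}-demux.qza" \
    --p-trim-left 0 \
    --p-trunc-len 240 \
    --o-table "${batch}-table.qza" \
    --o-representative-sequences "${batch}-rep-seqs.qza" \
    --o-denoising-stats "${batch}-stats.qza"
done

# Merge the per-batch feature tables and representative sequences.
run qiime feature-table merge \
  --i-tables batch1-table.qza \
  --i-tables batch2-table.qza \
  --o-merged-table merged-table.qza

run qiime feature-table merge-seqs \
  --i-data batch1-rep-seqs.qza \
  --i-data batch2-rep-seqs.qza \
  --o-merged-data merged-rep-seqs.qza

# Summarize the merged table to inspect per-sample depth by batch.
run qiime feature-table summarize \
  --i-table merged-table.qza \
  --m-sample-metadata-file metadata.tsv \
  --o-visualization merged-table.qzv
```

The resulting `merged-table.qzv` is the feature-table summary asked for above; its interactive sample detail tab shows the depth of every sample alongside the metadata.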
Overall though, the likely source of your issue is that the difference in sequencing depth between your 2 batches is so large that few shared ASVs are left to explain the variation in your dataset. Filtering rare features may not actually help reduce the batch effect in this case.
The reason you don't see this with the UniFrac distances is probably that you rarefied all your samples to the same depth, thus eliminating that batch effect. RPCA is a powerful distance metric, but because it doesn't operate on rarefied data it does require this extra sanity check, which you have done.
Hi @Mehrbod_Estaki ,
Based on your comments, I looked deeper into the two batches and found the reason for the separation between them. The first batch of sequences came from MiSeq 300 bp paired-end sequencing, the second from iSeq-100 single-end sequencing (293 bp mode).
I used only the R1 reads up to 240 bp, cleaned for forward primers. But I realized now that the library of the first batch was created by an external company by ligating Illumina adapters, while I used a second index PCR to attach adapters. So, batch 1 contained a mixture of forward and reverse reads in R1 and R2 datasets.
A similar issue has been described in this forum earlier (merging R1 and R2 reads with mixed-orientation files). Thanks to that post, and thank you for your comments.
I will now search for a way to filter 'forward' (U341F primer) reads in the R1 and/or R2 fastq files. I am not sure whether this will be possible in QIIME 2. I will try q2 cutadapt trim-single with the --p-discard-untrimmed option.
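For reference, a minimal sketch of that cutadapt call. The artifact names are hypothetical and `FWD_PRIMER` is a placeholder to be replaced with the actual U341F sequence. With `--p-discard-untrimmed`, reads in which the forward primer is not found at the 5' end are dropped, so only forward-oriented reads remain. The script only prints the command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# DRY_RUN=1 (default) only prints the command; set DRY_RUN=0 to execute it.
PLAN=""
run() {
  PLAN="$PLAN$*
"
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@" || exit 1; fi
}

# Hypothetical artifact names; replace with your own.
DEMUX=batch1-demux-single.qza
TRIMMED=batch1-forward-only.qza
# Placeholder: substitute the real U341F primer sequence (IUPAC codes allowed).
FWD_PRIMER=YOUR_U341F_SEQUENCE

# Trim the forward primer from the 5' end and discard reads without it.
run qiime cutadapt trim-single \
  --i-demultiplexed-sequences "$DEMUX" \
  --p-front "$FWD_PRIMER" \
  --p-discard-untrimmed \
  --o-trimmed-sequences "$TRIMMED"
```

This only works if the primer is still present in the reads; if it was removed upstream, every read would be discarded.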
Hi @arwqiime ,
There are a few options for dealing with the mixed-orientation issue:
- Your plan to use cutadapt trim-single with --p-discard-untrimmed: this would be best, if the primers are indeed in the reads.
- If your goal is to filter rather than re-orient, you could filter pre-denoising by using qiime quality-control filter-reads, or post-denoising with qiime quality-control exclude-seqs. You would need to create some reference sequences to use for filtering, e.g., a segment of the read only present in the reads that are in the correct orientation (5' end of the amplicon but not 3' end), or the primer sequence if it was never trimmed.
- Alternatively, you can re-orient your reads using the RESCRIPt plugin. This sounds like a suboptimal solution, though, since you are only using single-end R1 reads (so you probably want to filter rather than reorient, since the reads in different orientations will cover different segments of the full amplicon).
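A hedged sketch of the two filtering routes with q2-quality-control. It assumes a hypothetical `orientation-ref.qza` (a FeatureData[Sequence] artifact holding a segment found only in correctly oriented reads); the other file names and the identity/alignment thresholds are illustrative and need tuning to the length of your reference. Parameter names are given as I recall them from the plugin, so check each command's `--help` first. The script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 to execute them.
PLAN=""
run() {
  PLAN="$PLAN$*
"
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@" || exit 1; fi
}

# --- Option A: filter reads before denoising ---
# Build a Bowtie2 index from the orientation-specific reference (hypothetical file).
run qiime quality-control bowtie2-build \
  --i-sequences orientation-ref.qza \
  --o-database orientation-ref-db.qza

# Keep only reads that align to the reference (--p-no-exclude-seqs).
run qiime quality-control filter-reads \
  --i-demultiplexed-sequences demux.qza \
  --i-database orientation-ref-db.qza \
  --p-no-exclude-seqs \
  --o-filtered-sequences demux-forward.qza

# --- Option B: filter ASVs after denoising ---
# Split representative sequences into hits and misses against the reference.
# Threshold values are illustrative only.
run qiime quality-control exclude-seqs \
  --i-query-sequences rep-seqs.qza \
  --i-reference-sequences orientation-ref.qza \
  --p-method blast \
  --p-perc-identity 0.97 \
  --p-perc-query-aligned 0.5 \
  --o-sequence-hits hits.qza \
  --o-sequence-misses misses.qza

# Keep only the features whose sequences matched the reference.
run qiime feature-table filter-features \
  --i-table table.qza \
  --m-metadata-file hits.qza \
  --o-filtered-table table-forward.qza
```

The same pattern can be inverted to remove contaminants: use a contaminant reference, leave `filter-reads` at its default of excluding aligned reads, or keep `misses.qza` instead of `hits.qza`.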
Hi @Nicholas_Bokulich ,
Thank you for listing the different options.
The cutadapt approach with --p-discard-untrimmed seems to work quite well: roughly 50% of the reads were kept, which is consistent with the random orientation of the ligation products.
Filtering with quality-control against reference sequences is a great idea, which I will apply in a different project where the gut microbiome DNA is contaminated by host genomic DNA. This was an experiment where severe stress resulted in damage to gut epithelial cells, which released fragmented host genomic DNA into the gut content (and it somehow made its way into the 515F-806R library). Great idea to filter it out this way.