deicode rpca plot - axis 1/2 issue

arwqiime · July 21, 2021, 9:03am

Hi,
I have analysed a time series of samples, including with 'q2 deicode rpca'. For some reason, the samples from the latest time points did not distribute along axes 1 and 2, but only along axis 3 (shown in the biplot for the latest 2 timepoints). If I analyse the latest timepoint alone, the samples are distributed across the RPCA plot as expected. I have attached both biplot.qzv packages: biplot.qzv (1.2 MB), biplot.qzv (1.2 MB)

The only obvious difference that I see at the moment between the data from different timepoints is that the sequencing depth of the latest samples is considerably lower than that of the other samples. However, the samples had already been filtered to remove low-abundance features. The taxa barplot was fine, and the UniFrac plots were also as expected.
I am using qiime2-2021.2 with the latest deicode/qurro plugins (I have also tested it with qiime2-2021.4, with the same results).
I would appreciate your comments.
Best regards,

Mehrbod_Estaki · July 21, 2021, 10:11pm

Hi @arwqiime ,

From a quick glance, there seems to be a very large separation based on your Batch metadata variable, and as you pointed out, the batches differ greatly in sequencing depth. How much of a difference was there in sequencing depth between these 2 batches? If you are able to share your feature-table summary artifact, that would be very helpful.
Also, are these samples from similar communities or by chance completely different sample types?

If these 2 batches were processed or sequenced separately then you'll also want to denoise them with DADA2 separately before merging (which is something I didn't see in your provenance).
Overall though, the likely source of your issue is that the difference in sequencing depth between your 2 batches is so large that few ASVs are shared between them, and this is what drives the variation in your dataset. Filtering rare features may not actually help reduce the batch effect in this case.

The reason you don't see this with the UniFrac distances is probably that you rarefied all your samples to the same depth, thus eliminating that batch effect. RPCA is a powerful distance metric, but because it doesn't operate on rarefied data it does require this extra sanity check, which you have done.
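As a rough illustration of the depth check discussed above, here is a minimal Python sketch that sums reads per sample and reports the median depth per batch. The sample names, counts, and batch labels are made up for illustration, and the helper `depth_by_batch` is hypothetical; in practice the numbers would come from the feature-table summary.

```python
from statistics import median


def depth_by_batch(counts, batch_of):
    """Return {batch: median total read count per sample}.

    counts:   {sample_id: {feature_id: read_count}}
    batch_of: {sample_id: batch_label}
    """
    depths = {}
    for sample, feature_counts in counts.items():
        total = sum(feature_counts.values())
        depths.setdefault(batch_of[sample], []).append(total)
    return {batch: median(values) for batch, values in depths.items()}


# Toy data (hypothetical): batch2 is sequenced far more shallowly than batch1.
counts = {
    "s1": {"asv1": 12000, "asv2": 8000},
    "s2": {"asv1": 15000, "asv3": 9000},
    "s3": {"asv1": 900, "asv2": 600},
    "s4": {"asv2": 700, "asv3": 500},
}
batch_of = {"s1": "batch1", "s2": "batch1", "s3": "batch2", "s4": "batch2"}

medians = depth_by_batch(counts, batch_of)
for batch, depth in sorted(medians.items()):
    print(batch, depth)
```

A large gap between the per-batch medians is a hint that depth, rather than biology, may be driving the separation.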

arwqiime · July 22, 2021, 8:42am

Hi @Mehrbod_Estaki ,
Based on your comments, I looked deeper into the two batches and found the reason for the differentiation between them. The first batch of sequences came from MiSeq 300 bp paired-end sequencing, the second from iSeq-100 single-end sequencing (293 bp mode).
I used only the R1 reads up to 240 bp, cleaned of forward primers. But I realize now that the library of the first batch was created by an external company by ligating Illumina adapters, while I used a second index PCR to attach adapters. So batch 1 contained a mixture of forward and reverse reads in both the R1 and R2 datasets.
A similar issue has been described in this forum earlier (merging R1 and R2 reads with mixed-orientation files). Thanks to that post, and thank you for your comments.

I will now search for a way to keep only the 'forward' (U341F primer) reads in the R1 and/or R2 fastq files. I am not sure whether this will be possible in Qiime2. I will try q2 cutadapt trim-single with the --p-discard-untrimmed option.
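To make the idea concrete, here is a minimal Python sketch of primer-based filtering: keep only reads that start with the forward primer and trim the primer off. This is not how cutadapt works internally (cutadapt tolerates mismatches and is far more flexible); it is only an exact, anchored match with IUPAC codes expanded. The primer string below is a short placeholder, not the actual U341F sequence, and the read sequences are invented.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer):
    """Compile a regex matching the (possibly degenerate) primer at the read start."""
    return re.compile("^" + "".join(IUPAC[base] for base in primer.upper()))


def keep_forward_reads(reads, primer):
    """Keep reads starting with the primer; return them with the primer removed.

    reads: iterable of (name, sequence) tuples. Reads without the primer are dropped.
    """
    pattern = primer_regex(primer)
    kept = []
    for name, seq in reads:
        match = pattern.match(seq.upper())
        if match:
            kept.append((name, seq[match.end():]))
    return kept


# Placeholder primer (assumption): substitute the real forward primer sequence.
PRIMER = "CCTAYGGG"
reads = [
    ("r1", "CCTACGGGTTGACA"),  # starts with the primer (Y = C)
    ("r2", "GGACTACAAGGGTA"),  # other orientation, no forward primer
    ("r3", "CCTATGGGAACCGT"),  # starts with the primer (Y = T)
]
kept = keep_forward_reads(reads, PRIMER)
print(kept)
```

With randomly oriented ligation products, one would expect roughly half of the reads to survive such a filter.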

Best regards.

Nicholas_Bokulich · July 22, 2021, 9:35am

Hi @arwqiime ,
There are a few options for dealing with the mixed-orientation issue:
1. Your plan to use q2 cutadapt trim-single with the --p-discard-untrimmed option: this would be best, if the primers are indeed in the reads.

2. If your goal is to filter rather than re-orient, you could filter pre-denoising by using qiime quality-control filter-reads, or post-denoising with qiime quality-control exclude-seqs. You would need to create some reference sequences to use for filtering, e.g., a segment of the read only present in the reads that are in the correct orientation (5' end of the amplicon but not 3' end), or the primer sequence if it was never trimmed.

3. Alternatively, you can re-orient your reads using the RESCRIPt plugin. This sounds like a suboptimal solution, though, since you are only using single-end R1 reads (so you probably want to filter rather than re-orient, since the reads in different orientations will cover different segments of the full amplicon).
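The point about orientation and coverage can be seen in a toy Python sketch. The amplicon sequence and read length below are invented for illustration; a read from a reverse-oriented ligation product, once reverse-complemented, maps to the 3' end of the amplicon rather than the 5' end covered by a forward-oriented read.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Toy 20 bp "amplicon" and an 8 bp single-end read length (both hypothetical).
amplicon = "ACGTTGCAAGGCTTACCGAT"
read_len = 8

# A forward-oriented product is read from the 5' end of the amplicon.
forward_read = amplicon[:read_len]

# A reverse-oriented product is read from the 5' end of the opposite strand.
reverse_read = reverse_complement(amplicon)[:read_len]

# Re-orienting the reverse read puts it on the same strand as the amplicon...
reoriented = reverse_complement(reverse_read)

# ...but it covers a different segment than the forward read does.
forward_start = amplicon.find(forward_read)
reoriented_start = amplicon.find(reoriented)
print(forward_start, reoriented_start)
```

So with single-end reads shorter than the amplicon, re-oriented reads would be compared over a different region than the forward reads, which is why filtering is probably the safer choice here.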

Good luck!

arwqiime · July 22, 2021, 11:06am

Hi @Nicholas_Bokulich ,
Thank you for listing the different options.

The cutadapt trim-single approach with --p-discard-untrimmed seems to work quite well: roughly 50% of the reads were kept, which indicates the random orientation of the ligation products.

Filtering against reference sequences is a great idea, which I will apply in a different project where the gut microbiome DNA is contaminated by host genomic DNA. This was an experiment where severe stress resulted in damage to gut epithelial cells, which released fragmented host genomic DNA into the gut content (and it somehow made its way into the 515F-806R library). Great idea to filter it out this way.

Best regards