Issue with Bray-Curtis PCoA

Hello,
I have been having some trouble with my 16S analysis when I visualize the Bray-Curtis PCoA (from the core-metrics-phylogenetic output):

When I look at the distance matrix that was also created, all of the values are 1, indicating that there are no shared features between any of my samples. This suggests that something went wrong at some point in my analysis. I've looked at the thread "Bray-Curtis PCoA results visualisation problem" (post #5 by Sapsanas) to try to figure out a solution, but the issue there was the presence of NA values in their metadata, and to my knowledge QIIME 2 now accepts NAs in the metadata file.
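For anyone reading along, here is a minimal sketch of why a matrix full of 1s means "no shared features". It uses the standard Bray-Curtis formula (sum of absolute count differences divided by the total counts of both samples) in plain Python, not QIIME 2's own implementation, and the sample counts are made up for illustration.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("count vectors must have the same length")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Hypothetical counts over four features for three samples.
sample_a = [10, 5, 0, 0]
sample_b = [0, 0, 7, 3]   # shares no features with sample_a
sample_c = [10, 5, 0, 0]  # identical to sample_a

print(bray_curtis(sample_a, sample_b))  # 1.0
print(bray_curtis(sample_a, sample_c))  # 0.0
```

A value of exactly 1 for every pair therefore points to feature tables with no overlapping feature IDs at all, which is rarely biologically plausible for samples from the same study.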

Thank you for your help!

Good morning @ncep112,

Thanks for posting that screenshot. Yes, something is wrong!

That's correct. Let's see if we can figure out what removed shared microbes between samples. Did you perform any filtering or merging on your feature table?

I've also seen this issue when demultiplexing failed, leaving barcodes inside the reads. These unique barcodes made common features seem distinct, leading to totally different samples (which looked similar once the barcode issue was fixed).
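As a toy illustration of that failure mode (a sketch only, with made-up barcodes and a made-up amplicon fragment): if each exact sequence is treated as its own feature, the same amplicon prefixed with two different barcodes yields two features that never match across samples.

```python
# Hypothetical 12-nt barcodes and a hypothetical amplicon fragment shared by both samples.
BARCODES = {"sample1": "AGCTGACTAGTC", "sample2": "TCGATGCATGCA"}
AMPLICON = "TACGTAGGGTGCGAGCGTTAATCGGAATAACTGGGCG"


def features(reads):
    """Treat each exact read sequence as one feature."""
    return set(reads)


def strip_barcode(read, barcode):
    """Remove a leading barcode if present; otherwise return the read unchanged."""
    return read[len(barcode):] if read.startswith(barcode) else read


# Reads with the barcode still attached, then the same reads with it removed.
with_barcodes = {s: [bc + AMPLICON] for s, bc in BARCODES.items()}
without_barcodes = {
    s: [strip_barcode(r, BARCODES[s]) for r in reads]
    for s, reads in with_barcodes.items()
}

shared_before = features(with_barcodes["sample1"]) & features(with_barcodes["sample2"])
shared_after = features(without_barcodes["sample1"]) & features(without_barcodes["sample2"])
print(len(shared_before), len(shared_after))  # 0 1
```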

Colin

Thank you for the fast reply!

Yes, I originally had to merge two datasets (same amplicon, they were just sequenced at two different times). I thought that was causing the issue so I went back and re-ran everything with only one of my datasets and still had the same issue.

These are the steps that I followed (I used the Moving Pictures tutorial as a guide):

  1. I extracted barcodes using QIIME 1’s extract_barcodes.py script (another colleague did this for his data with no issues)
  2. I used the artifact creation step for paired end data
  3. I demultiplexed and used the demux visualization file to determine where to trim the reads
  4. I used DADA2 for quality control
  5. I originally used the feature merge command to merge my datasets, but I also tried running everything on my datasets separately and that did not fix the issue
  6. I followed the commands to create a tree which I used for the diversity analysis
  7. Finally, I ran the core-metrics-phylogenetic command and visualized the Bray-Curtis PCoA.

Is there a way that I can check to see if the barcodes were left within the reads?

Thank you for the help!

Hi there @ncep112!

Unfortunately not at the present moment, sorry!

Have you had a chance to check out the cutadapt community tutorial? You could start with your reads before your step 1 listed above, import following the cutadapt guide suggested above, then demultiplex and/or trim any additional known adapters.

Let us know if you have any questions! :t_rex:

One quick-and-dirty option is to check for a barcode with grep:

$ grep -c '^AGCTGACTAGTC' your_input_file.fna_or_fastq

Note that this assumes that the barcode is at the beginning of the sequence. By the way, you could do the same for primers as long as you "expand" the ambiguous nucleotides; for example for GTGCCAGCMGCCGCGGTAA, you could search for:

$ grep -c '^GTGCCAGCAGCCGCGGTAA' your_input_file.fna_or_fastq
$ grep -c '^GTGCCAGCCGCCGCGGTAA' your_input_file.fna_or_fastq

In either case, the counts should be pretty low if the barcodes and primers were removed.
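If the primer has several ambiguous positions, expanding it by hand gets tedious. Below is a rough Python sketch of the same check: it expands the IUPAC ambiguity codes and counts reads that start with any of the expansions. The function names and the toy reads are made up for illustration, and `fastq_sequences` assumes an uncompressed FASTQ with four lines per record.

```python
import itertools

# IUPAC nucleotide ambiguity codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer):
    """Return every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in itertools.product(*choices)]


def count_reads_starting_with(sequences, prefixes):
    """Count sequences that begin with any of the given prefixes."""
    prefixes = tuple(prefixes)
    return sum(1 for seq in sequences if seq.startswith(prefixes))


def fastq_sequences(path):
    """Yield only the sequence lines of an uncompressed 4-line-per-record FASTQ."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:
                yield line.strip()


expansions = expand_primer("GTGCCAGCMGCCGCGGTAA")
print(expansions)  # ['GTGCCAGCAGCCGCGGTAA', 'GTGCCAGCCGCCGCGGTAA']

# Hypothetical reads: the first two still carry the primer, the third does not.
toy_reads = [
    "GTGCCAGCAGCCGCGGTAATACGGAGG",
    "GTGCCAGCCGCCGCGGTAATACGTAGG",
    "TACGGAGGGTGCAAGCGTTAATCGG",
]
print(count_reads_starting_with(toy_reads, expansions))  # 2
```

On real data, you would pass `fastq_sequences("your_input_file.fastq")` in place of `toy_reads`. Unlike grep on a whole FASTQ file, this looks only at the sequence lines.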

Hope this helps.

Thank you for the reply! I have been trying to follow the cutadapt community tutorial for paired-end reads but it keeps timing out. Do you know approximately how long this should take? (Right now, it fails after 4 days, which makes me think there is an issue.)

Thank you!

Hi @ncep112 -

Nope, this is going to depend on so many factors - computational resources, number of reads, length of reads, etc...

Can you provide us with a little more detail so that we can help?

  • What version of QIIME 2 are you running this on?
  • What is the exact command or commands you are running? Copy and paste please
  • What is the complete error you are seeing? Run with --verbose or copy and paste the log file saved on termination.

Thanks!

Thank you for your help!

We are using version 2017.12.

Here is the command I ran:

qiime cutadapt demux-paired --i-seqs multiplexed-seqs.qza --m-forward-barcodes-file NE_metadata_16S_batch1.tsv --m-forward-barcodes-category BarcodeSequence --o-per-sample-sequences demultiplexed-seqs.qza --o-untrimmed-sequences untrimmed.qza --verbose

It hasn’t given us an error, but it has been running for 12 days, and based on the computational resources currently being used on the node it’s “running” on, I don’t think it’s actually doing anything. It did start properly though, and according to the outfile it is running:

This is cutadapt 1.15 with Python 3.5.4
Command line parameters: --front file:/tmp/tmphvl3za80 --error-rate 0.1 -o /tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-2e15vgt8/{name}.1.fastq.gz --untrimmed-output /tmp/q2-MultiplexedPairedEndBarcodeInSequenceDirFmt-p7jg0mux/forward.fastq.gz -p /tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-2e15vgt8/{name}.2.fastq.gz --untrimmed-paired-output /tmp/q2-MultiplexedPairedEndBarcodeInSequenceDirFmt-p7jg0mux/reverse.fastq.gz /tmp/qiime2-archive-qliunjuj/575bb642-0496-44e9-b34c-b9cb010c5e3a/data/forward.fastq.gz /tmp/qiime2-archive-qliunjuj/575bb642-0496-44e9-b34c-b9cb010c5e3a/data/reverse.fastq.gz
Running on 1 core
Trimming 48 adapters with at most 10.0% errors in paired-end legacy mode …

Hey there @ncep112!

I would probably terminate the job - sounds like something went wrong...

Next, I would install 2018.6.

Then, rerun!

Keep us posted! :qiime2:

We installed version 2018.6 and tried to rerun the command, but we had the same issue: it hasn’t given us an error, but it also hasn’t finished running (it’s been running for more than 5 days). Based on the computational resources being used, we don’t think it’s actually doing anything.

Do you have any other suggestions?

Thank you!

How many reads are in this file you are trying to demultiplex?

This plugin uses cutadapt directly - are you able to demultiplex using that tool independently, then import into QIIME 2?
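If you try cutadapt on its own, you will need the barcodes in a FASTA file, since the log above shows the plugin handing them to cutadapt as `--front file:...`. Here is a hedged sketch for building such a file from the metadata TSV. The function name is made up, and it assumes the first column holds the sample IDs and that the barcode column is called `BarcodeSequence`, as in the command posted above.

```python
import csv


def barcodes_to_fasta(metadata_path, fasta_path, column="BarcodeSequence"):
    """Write one FASTA record per sample: sample ID as the name, barcode as the sequence.

    Returns the number of records written.
    """
    written = 0
    with open(metadata_path, newline="") as src, open(fasta_path, "w") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        fieldnames = reader.fieldnames or []
        if column not in fieldnames:
            raise KeyError("column not found in metadata: " + column)
        id_column = fieldnames[0]  # assumption: first column holds the sample IDs
        for row in reader:
            sample_id = (row[id_column] or "").strip()
            barcode = (row.get(column) or "").strip()
            # Skip blank rows, comment/directive rows, and samples without a barcode.
            if not sample_id or sample_id.startswith("#") or not barcode:
                continue
            dst.write(">{}\n{}\n".format(sample_id, barcode.upper()))
            written += 1
    return written
```

For example, `barcodes_to_fasta("NE_metadata_16S_batch1.tsv", "barcodes.fasta")` would write one record per sample, where `barcodes.fasta` is just a placeholder output name.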

It's possible this is an issue with the computational environment. It sounds like this might be running on a cluster; can you provide some details about the configuration there? Also, it may be worth trying to run the command on another machine.

Hello,

We have approximately 7,861,000 reads in the 48 samples in our file (about 17,000 reads per sample).

We have tried installing the newest version of cutadapt (1.16) but we have not been able to successfully run it yet.

We are running this on a cluster; it runs Red Hat Enterprise Linux Server 6 (kernel 2.6.32-696.30.1.el6.x86_64).

Thank you for your help!

Since I posted the last comment, I was able to figure out the issue with cutadapt: it says demultiplexing paired-end files is not available, so this could be where we are running into all of the issues. Do you have any suggestions about this?

Thank you for the help!