Issue with Bray-Curtis PCOA

One quick-and-dirty option is to check for a barcode via grep:

$ grep -c '^AGCTGACTAGTC' your_input_file.fna_or_fastq

Note that this assumes that the barcode is at the beginning of the sequence. By the way, you could do the same for primers as long as you "expand" the ambiguous nucleotides; for example, for GTGCCAGCMGCCGCGGTAA (where M stands for A or C), you could search for:

$ grep -c '^GTGCCAGCAGCCGCGGTAA' your_input_file.fna_or_fastq
$ grep -c '^GTGCCAGCCGCCGCGGTAA' your_input_file.fna_or_fastq

In any scenario the counts should be pretty low.
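
The manual expansion above can also be done with a bracket expression, so that one grep call covers every variant of a degenerate primer. The following is only a sketch: the function names `iupac_to_regex` and `count_reads_starting_with` are made up for illustration, and it keeps the same assumption as above, namely that the barcode or primer sits at the very start of the sequence line.

```shell
# Translate IUPAC ambiguity codes into grep bracket expressions,
# e.g. M (A or C) becomes [AC]. Unambiguous bases pass through unchanged.
# (Hypothetical helper, not part of QIIME 2 or cutadapt.)
iupac_to_regex() {
    printf '%s\n' "$1" | sed \
        -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/S/[CG]/g' -e 's/W/[AT]/g' \
        -e 's/K/[GT]/g' -e 's/M/[AC]/g' -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' \
        -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# Count lines that start with the given barcode or primer.
# $1 = barcode/primer (may contain IUPAC codes), $2 = FASTA or FASTQ file
count_reads_starting_with() {
    grep -c "^$(iupac_to_regex "$1")" "$2"
}

# Example call (file name taken from the commands above):
# count_reads_starting_with GTGCCAGCMGCCGCGGTAA your_input_file.fna_or_fastq
```

For the primer above, `iupac_to_regex` produces `GTGCCAGC[AC]GCCGCGGTAA`, which matches both of the expanded sequences in a single pass.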

Hope this helps.


Thank you for the reply! I have been trying to follow the cutadapt community tutorial for paired-end reads, but it keeps timing out. Do you know approximately how long this should take? (Right now, it fails after 4 days, which makes me think there is an issue.)

Thank you!

Hi @ncep112 -

Nope, this is going to depend on so many factors - computational resources, number of reads, length of reads, etc...

Can you provide us with a little more detail so that we can help?

  • What version of QIIME 2 are you running this on?
  • What is the exact command or commands you are running? Copy and paste please
  • What is the complete error you are seeing? Run with --verbose or copy and paste the log file saved on termination.

Thanks!

Thank you for your help!

We are using version 2017.12.

Here is the command I ran:

qiime cutadapt demux-paired --i-seqs multiplexed-seqs.qza --m-forward-barcodes-file NE_metadata_16S_batch1.tsv --m-forward-barcodes-category BarcodeSequence --o-per-sample-sequences demultiplexed-seqs.qza --o-untrimmed-sequences untrimmed.qza --verbose

It hasn’t given us an error, but it has been running for 12 days, and based on the computational resources currently being used on the node it’s “running” on, I don’t think it’s actually doing anything. It did start properly though, and according to the outfile it is running:

This is cutadapt 1.15 with Python 3.5.4
Command line parameters: --front file:/tmp/tmphvl3za80 --error-rate 0.1 -o /tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-2e15vgt8/{name}.1.fastq.gz --untrimmed-output /tmp/q2-MultiplexedPairedEndBarcodeInSequenceDirFmt-p7jg0mux/forward.fastq.gz -p /tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-2e15vgt8/{name}.2.fastq.gz --untrimmed-paired-output /tmp/q2-MultiplexedPairedEndBarcodeInSequenceDirFmt-p7jg0mux/reverse.fastq.gz /tmp/qiime2-archive-qliunjuj/575bb642-0496-44e9-b34c-b9cb010c5e3a/data/forward.fastq.gz /tmp/qiime2-archive-qliunjuj/575bb642-0496-44e9-b34c-b9cb010c5e3a/data/reverse.fastq.gz
Running on 1 core
Trimming 48 adapters with at most 10.0% errors in paired-end legacy mode …

Hey there @ncep112!

I would probably terminate the job - sounds like something went wrong...

Next, I would install 2018.6.

Then, rerun!

Keep us posted!

We installed version 2018.6 and tried to rerun the command, but we had the same issue: it hasn’t given us an error, but it also hasn’t finished running (it’s been running for more than 5 days). Based on the computational resources being used, we don’t think it’s actually doing anything.

Do you have any other suggestions?

Thank you!

How many reads are in this file you are trying to demultiplex?

This plugin uses cutadapt directly - are you able to demultiplex using that tool independently, then import into QIIME 2?
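
As a rough sketch of what running cutadapt on its own could look like: cutadapt reads its barcodes from a FASTA file, so the barcode column of the metadata file has to be converted first. The helper name `barcodes_to_fasta` and the output file names are made up for illustration, and the sketch assumes a tab-separated metadata file with a header line and the sample ID in the first column. The cutadapt flags in the comment mirror those in the log output posted earlier in this thread; the resulting per-sample files would then still need to be imported with `qiime tools import`.

```shell
# Convert a barcode column of a tab-separated metadata file into FASTA,
# using the first column (sample ID) as the record name.
# (Hypothetical helper; assumes a header line and tab-separated fields.)
# $1 = metadata file, $2 = name of the barcode column
barcodes_to_fasta() {
    awk -F '\t' -v col="$2" '
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == col) c = i
            if (!c) { print "column not found: " col > "/dev/stderr"; exit 1 }
            next
        }
        /^#/ || NF == 0 { next }   # skip comment and blank lines
        { print ">" $1; print $c }
    ' "$1"
}

# Possible usage (output file names are placeholders):
# barcodes_to_fasta NE_metadata_16S_batch1.tsv BarcodeSequence > barcodes.fasta
# cutadapt --front file:barcodes.fasta --error-rate 0.1 \
#     -o demux/{name}.1.fastq.gz -p demux/{name}.2.fastq.gz \
#     --untrimmed-output untrimmed.1.fastq.gz \
#     --untrimmed-paired-output untrimmed.2.fastq.gz \
#     forward.fastq.gz reverse.fastq.gz
```

Running cutadapt directly like this also makes it easier to see whether the hang comes from cutadapt itself or from the surrounding environment.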

It's possible this is an issue with the computation environment. It sounds like this might be running on a cluster; can you provide some details about the configuration there? Also, it may be worth trying to run the command on another machine.

Hello,

We have approximately 7,861,000 reads in the 48 samples in our file (about 17,000 reads per sample).

We have tried installing the newest version of cutadapt (1.16) but we have not been able to successfully run it yet.

We are running this on a cluster. It runs on Red Hat Enterprise Linux Server 6 (kernel 2.6.32-696.30.1.el6.x86_64).

Thank you for your help!

Since I posted the last comment, I was able to figure out the issue with cutadapt: it says demultiplexing paired-end files is not available, so this could be where we are running into all of the issues. Do you have any suggestions about this?

Thank you for the help!

This is not correct - cutadapt supports demultiplexing of paired-end reads. Can you provide a link to this note?

This isn't a particularly huge dataset, and should be able to be demultiplexed with cutadapt pretty easily (I would think). As I mentioned above, I think this might be a problem with your computation environment. Please try demuxing with another tool, or on another computer.

Thank you again for the help!

Cutadapt was running an older version (which caused the issue I mentioned previously), but when we updated it again, it appeared to run. However, when we looked at the file sizes for the forward and reverse reads, we noticed they have very different file sizes (297K vs 4.1M). Additionally, here are the first few lines of one of the fastq files:

I'm not sure if this question would be more appropriate for a cutadapt help forum, but I wanted to at least give you an update on where we are now.
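
One quick sanity check for a size mismatch like this is to compare the number of records in the forward and reverse files, since properly paired files should contain the same number of reads. This is only a sketch: the function names are made up, and it assumes gzipped FASTQ with exactly four lines per record. Matching counts do not prove the files are fine, and differing file sizes alone do not prove they are broken (for example, read lengths can differ between the two files).

```shell
# Count records in a gzipped FASTQ file.
# FASTQ stores each read as 4 lines, so records = lines / 4.
# (Hypothetical helper; assumes unwrapped, gzip-compressed FASTQ.)
count_fastq_records() {
    lines=$(gzip -dc "$1" | wc -l | tr -d ' ')
    echo $((lines / 4))
}

# Compare record counts of a forward/reverse pair.
# $1 = forward file, $2 = reverse file; returns non-zero on mismatch.
check_pair() {
    fwd=$(count_fastq_records "$1")
    rev=$(count_fastq_records "$2")
    if [ "$fwd" -eq "$rev" ]; then
        echo "OK: $fwd records in each file"
    else
        echo "MISMATCH: forward=$fwd reverse=$rev"
        return 1
    fi
}

# Example call (file names are placeholders):
# check_pair sample.1.fastq.gz sample.2.fastq.gz
```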

Thank you!


Hi,

I just wanted to let you know that we’ve finally solved the issue! We updated qiime2 to version 2018.6, re-uploaded the raw data, and started everything from scratch, and the Bray-Curtis plot now looks normal.

Thank you for all of your help!
