Exporting demux artifact error


(Chris Burgess) #1


I am trying to process my ITS sequencing reads; however, I’ve hit a couple of roadblocks. It seems some of my samples did not get sequenced well, so I have a number of samples with very few reads, which is breaking the q2-itsxpress plugin. Since there is no way to filter a demux artifact by number of sequences, I have to export the demux artifact and then reimport it into QIIME using a manifest. So here is my problem: when I export the demux artifact, a few samples export as two different sets of paired-end files. I guess I am just trying to get to the bottom of this.

Here is what I am doing:

qiime tools import \
  --type EMPPairedEndSequences \
  --input-path "${DIRPATH}READS/RAW1" \
  --output-path "${output}r1_sequences.qza"

qiime demux emp-paired \
  --i-seqs "${output}r1_sequences.qza" \
  --m-barcodes-file "${DIRPATH}READS/its_mappign_file_01.txt" \
  --m-barcodes-column BarcodeSequence \
  --p-rev-comp-mapping-barcodes \
  --o-per-sample-sequences "${output}r1_sequences_demux.qza"

qiime tools export \
  --input-path "${output}r1_sequences_demux.qza" \
  --output-path "${DIRPATH}dada2/qiime/temp"

However, here is the problem:

(qiime2-2018.11) [[email protected]]$ qiime tools export \
>   --input-path "${output}r1_sequences_demux.qza" \
>   --output-path "${DIRPATH}dada2/qiime/temp"
Exported /home/roots/burgesch/Myrold_lab/Chris/Thesis/ITS/dada2/qiime/qiime2_its_r1_sequences_demux.qza as SingleLanePerSamplePairedEndFastqDirFmt to directory /home/roots/burgesch/Myrold_lab/Chris/Thesis/ITS/dada2/qiime/temp
(qiime2-2018.11) [[email protected]]$ ls -ltr "${DIRPATH}dada2/qiime/temp" | grep "1096"
-rw-r--r-- 1 burgesch roots_dept 3065170 Jan 31 13:06 NRCS1096_58_L001_R2_001.fastq.gz
-rw-r--r-- 1 burgesch roots_dept 2349277 Jan 31 13:06 NRCS1096_58_L001_R1_001.fastq.gz
-rw-r--r-- 1 burgesch roots_dept 2297699 Jan 31 13:54 NRCS1096_195_L001_R2_001.fastq.gz
-rw-r--r-- 1 burgesch roots_dept 2060455 Jan 31 13:54 NRCS1096_195_L001_R1_001.fastq.gz
-rw-r--r-- 1 burgesch roots_dept 2109629 Jan 31 13:55 NRCS0389_432_L001_R1_001.fastq.gz
(qiime2-2018.11) [[email protected]]$

We have one sample (NRCS1096) that exported as two different sets of paired-end reads. Any idea why this is happening? I think this is more of a bug than an error.
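For the workaround described at the top of this post (export, drop low-read samples, re-import with a manifest), here is a minimal sketch in shell. It is not part of QIIME 2: the function names `count_reads` and `write_filtered_manifest` are hypothetical, the file-name pattern is taken from the `ls` listing above, and the header `sample-id,absolute-filepath,direction` assumes the paired-end fastq manifest format. The export directory must be given as an absolute path.

```shell
# count_reads FILE.fastq.gz
# Prints the number of reads, assuming 4 lines per FASTQ record.
count_reads() {
  local lines
  lines=$(gzip -dc -- "$1" | wc -l)
  echo $(( lines / 4 ))
}

# write_filtered_manifest EXPORT_DIR MIN_READS
# Prints a manifest (CSV) for every sample whose forward-read file holds at
# least MIN_READS reads. EXPORT_DIR should be an absolute path, because the
# paths are written to the manifest exactly as they are built here.
write_filtered_manifest() {
  local dir="$1" min_reads="$2" r1 r2 sample n
  echo "sample-id,absolute-filepath,direction"
  for r1 in "$dir"/*_L001_R1_001.fastq.gz; do
    [ -e "$r1" ] || continue                 # no matching files at all
    r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"
    [ -e "$r2" ] || continue                 # skip samples missing a reverse file
    n=$(count_reads "$r1")
    [ "$n" -ge "$min_reads" ] || continue    # drop low-read samples
    # Strip the trailing _<barcode-id>_L001_R1_001.fastq.gz to get the sample ID.
    sample=$(basename "$r1" | sed -E 's/_[0-9]+_L[0-9]+_R1_001\.fastq\.gz$//')
    echo "$sample,$r1,forward"
    echo "$sample,$r2,reverse"
  done
}

# Example (the threshold of 1000 reads is an arbitrary illustration):
#   write_filtered_manifest "${DIRPATH}dada2/qiime/temp" 1000 > manifest.csv
```

The read threshold is a judgment call for your data. Counting only the forward file assumes both files of a pair hold the same number of reads, which should be the case for a demultiplexed paired-end export.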


(Matthew Ryan Dillon) assigned thermokarst #2

(Matthew Ryan Dillon) #3

Hey there @Chris_Burgess!

This is on our radar!

Hmm, weird!

Would you be willing to share download links (a DM to me is fine) so I can look at:

  • ${output}r1_sequences.qza
  • ${output}r1_sequences_demux.qza
  • ${DIRPATH}READS/its_mappign_file_01.txt

Thanks! :qiime2:

(Matthew Ryan Dillon) unassigned thermokarst #4

(Chris Burgess) #5

Hi Matt,

Here are the files you asked for (they are kinda big). I also uploaded the demux summary file.


(Matthew Ryan Dillon) assigned thermokarst #6

(Matthew Ryan Dillon) #7

Hi @Chris_Burgess, unfortunately that download link doesn’t work for me: it is prompting me to sign in to your institution’s SSO.

(Matthew Ryan Dillon) unassigned thermokarst #8

(Chris Burgess) #9

Hi @thermokarst, the link should work now. Let me know if the files are too big and I’ll figure out a way of pruning down the sequencing files.

(Matthew Ryan Dillon) assigned thermokarst #10

(Matthew Ryan Dillon) #11

Pulled the files down, will take a peek shortly. Stay tuned!

(Matthew Ryan Dillon) #12

I have an idea: check out the timestamps above. The NRCS1096_58 files were written at 13:06, while the NRCS1096_195 files were written almost an hour later at 13:54. I suspect you didn’t clean out your temp/ dir between export runs.
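One way to rule this out in future runs is to clear the output directory before every export. A minimal sketch, assuming the same `qiime tools export` invocation used earlier in the thread; the wrapper name `fresh_export` is hypothetical and not part of QIIME 2.

```shell
# fresh_export ARTIFACT OUTDIR
# Removes OUTDIR (if present) and then exports ARTIFACT into it, so files
# from an earlier export cannot linger next to the new ones.
fresh_export() {
  local artifact="$1" outdir="$2"
  # Guard against wiping something unintended if a variable is empty.
  case "$outdir" in
    ""|"/")
      echo "refusing to remove '$outdir'" >&2
      return 1
      ;;
  esac
  rm -rf -- "$outdir"
  qiime tools export \
    --input-path "$artifact" \
    --output-path "$outdir"
}

# Example, using the variables from the original post:
#   fresh_export "${output}r1_sequences_demux.qza" "${DIRPATH}dada2/qiime/temp"
```

Because this deletes the whole directory, it is only safe when the export target is used for nothing else.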

I pulled down your data and poked through it. There is no evidence of NRCS1096_58 anywhere in these data, so my best guess is that these files are left over from a previous debugging run. What do you think?
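A quick way to spot leftovers like this is to compare the fastq files on disk against the `MANIFEST` file that the export writes into the same directory. The sketch below assumes that `MANIFEST` is a CSV whose second column is the file name (rows of the form `sample-id,filename,direction`); that layout is an assumption here, so check it against your own export first. The function name `list_unlisted_fastq` is hypothetical.

```shell
# list_unlisted_fastq EXPORT_DIR
# Prints the name of every *.fastq.gz file in EXPORT_DIR that is not listed
# in the second column of EXPORT_DIR/MANIFEST. Any file it prints was
# probably not written by the most recent export.
list_unlisted_fastq() {
  local dir="$1" f name
  for f in "$dir"/*.fastq.gz; do
    [ -e "$f" ] || continue    # directory holds no fastq files
    name=$(basename "$f")
    # awk exits 0 only if some row has this file name in column 2.
    if ! awk -F',' -v n="$name" '$2 == n { found = 1 } END { exit !found }' "$dir/MANIFEST"; then
      echo "$name"
    fi
  done
}

# Example, using the directory from the original post:
#   list_unlisted_fastq "${DIRPATH}dada2/qiime/temp"
```

An empty result means every fastq file in the directory is accounted for by the current manifest.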

(Matthew Ryan Dillon) unassigned thermokarst #13

(Chris Burgess) #14

Huh, I think you’re right, @thermokarst! I’m also not getting duplicates when I re-export it. Thanks for the help!