Hi,
I’ve encountered a weird issue that I can’t figure out - hopefully one of you can help!
First off, here is the version info etc:
QIIME2 = qiime2-2018.2
Install method = conda
cutadapt version = 1.15 (as part of same conda env)
Here’s what I did:
- Imported 37 paired-end fastq data into qiime 2
>
> qiime tools import \
> --type 'SampleData[PairedEndSequencesWithQuality]' \
> --input-path manifest-BACT.csv \
> --output-path POTATOE-BACT.qza \
> --source-format PairedEndFastqManifestPhred33
- Visualized the quality of forward and reverse reads
> qiime demux summarize \
> --i-data POTATOE-BACT.qza \
> --output-dir POTATOE-BACT_quality \
> --verbose
- Ran cutadapt to trim primers (redirecting cutadapt’s std output to a file):
> qiime cutadapt trim-paired \
> --i-demultiplexed-sequences POTATOE-BACT.qza \
> --p-front-f CCTACGGGNGGCWGCAG \
> --p-front-r GACTACHVGGGTATCTAATCC \
> --o-trimmed-sequences POTATOE-BACT_primers_trimmed.qza \
> --verbose
Issue encountered: Looking at the stdout from cutadapt, 6 files out of 37 said no primers were detected. I checked manually by opening up the qza artifact file, and sure enough the primers were there in the fastq files. A simple grep search showed that most of the reads had the fwd/rev primer in them.
Troubleshooting steps:
-
Tried importing two of the samples separately (1 that worked and 1 that didn’t). This time, the issue didn’t occur… the file that failed before had its primers detected successfully.
-
Tested same samples (from beginning) with cutadapt standalone. No issue here.
-
Extracted one of the sequence files that failed with qiime2 and one that succeeded from the qza artefact. Reran with cutadapt. No issues.
Upon closer inspection of the logs, it appears that somehow the CLI arguments are getting messed up such that R1 and R2 are switched in these samples in qiime2 (see below) which would explain why it’s not finding primers. I don’t understand how this can be the case as the manifest file I used for the initial batch test looks fine to me (pasted at the very bottom) and it works for the majority of samples. So to me, it seems like qiime2 is importing it correctly according to the manifest but somehow the instructions to cutadapt are getting garbled. But I would love this to be a simple mistake on my part but I just can’t see where it is… I can provide the sequence files privately but prefer not to share them openly as they are not yet published.
Thanks in advance for your help,
Jesse
Log for failed file (batch import of 37 samples to qiime2 then cutadapt run on all together with qiime2):
> This is cutadapt 1.15 with Python 3.5.5
> Command line parameters: --cores 1 --error-rate 0.1 --times 1 --overlap 3 -o /tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-r6zhwz2c/stationP5_41_L001_R2_001.fastq.gz -p /tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-r6zhwz2c/stationP5_4_L001_R1_001.fastq.gz --front CCTACGGGNGGCWGCAG -G GACTACHVGGGTATCTAATCC /tmp/qiime2-archive-gwcyl56w/08e4e960-f54e-4038-86e5-486610afe00b/data/stationP5_41_L001_R2_001.fastq.gz /tmp/qiime2-archive-gwcyl56w/08e4e960-f54e-4038-86e5-486610afe00b/data/stationP5_4_L001_R1_001.fastq.gz
> Running on 1 core
> Trimming 2 adapters with at most 10.0% errors in paired-end mode ...
> Finished in 18.84 s (111 us/read; 0.54 M reads/minute).
>
> === Summary ===
>
> Total read pairs processed: 170,296
> Read 1 with adapter: 3 (0.0%)
> Read 2 with adapter: 6 (0.0%)
> Pairs written (passing filters): 170,296 (100.0%)
>
> Total basepairs processed: 85,488,592 bp
> Read 1: 42,744,296 bp
> Read 2: 42,744,296 bp
> Total written (filtered): 85,488,496 bp (100.0%)
> Read 1: 42,744,254 bp
> Read 2: 42,744,242 bp
>
> === First read: Adapter 1 ===
>
> Sequence: CCTACGGGNGGCWGCAG; Type: regular 5'; Length: 17; Trimmed: 3 times.
>
> No. of allowed errors:
> 0-9 bp: 0; 10-17 bp: 1
>
> Overview of removed sequences
> length count expect max.err error counts
> 8 1 2.6 0 1
> 17 2 0.0 1 2
>
> === Second read: Adapter 2 ===
>
> Sequence: GACTACHVGGGTATCTAATCC; Type: regular 5'; Length: 21; Trimmed: 6 times.
>
> No. of allowed errors:
> 0-9 bp: 0; 10-19 bp: 1; 20-21 bp: 2
>
> Overview of removed sequences
> length count expect max.err error counts
> 3 4 2660.9 0 4
> 21 2 0.0 2 0 2
Log for successful run where I imported the failed file in a smaller batch (only 2 samples) and it worked inexplicably:
> This is cutadapt 1.15 with Python 3.5.5
> Command line parameters: -a CCTACGGGNGGCWGCAG -A GACTACHVGGGTATCTAATCC -o stationP5outR1.fastq.gz -p stationP5outR2.fastq.gz stationP5_4_L001_R1_001.fastq.gz stationP5_41_L001_R2_001.fastq.gz
> Running on 1 core
> Trimming 2 adapters with at most 10.0% errors in paired-end mode ...
> Finished in 14.72 s (86 us/read; 0.69 M reads/minute).
>
> === Summary ===
>
> Total read pairs processed: 170,296
> Read 1 with adapter: 169,781 (99.7%)
> Read 2 with adapter: 169,584 (99.6%)
> Pairs written (passing filters): 170,296 (100.0%)
>
> Total basepairs processed: 85,488,592 bp
> Read 1: 42,744,296 bp
> Read 2: 42,744,296 bp
> Total written (filtered): 312,231 bp (0.4%)
> Read 1: 131,759 bp
> Read 2: 180,472 bp
>
> === First read: Adapter 1 ===
>
> Sequence: CCTACGGGNGGCWGCAG; Type: regular 3'; Length: 17; Trimmed: 169781 times.
>
> No. of allowed errors:
> 0-9 bp: 0; 10-17 bp: 1
>
> Bases preceding removed adapters:
> A: 0.0%
> C: 0.0%
> G: 0.0%
> T: 0.0%
> none/other: 100.0%
>
> Overview of removed sequences
> length count expect max.err error counts
> 3 6 2660.9 0 6
> 4 4 665.2 0 4
> 250 18 0.0 1 15 3
> 251 169753 0.0 1 162055 7698
>
> === Second read: Adapter 2 ===
>
> Sequence: GACTACHVGGGTATCTAATCC; Type: regular 3'; Length: 21; Trimmed: 169584 times.
>
> No. of allowed errors:
> 0-9 bp: 0; 10-19 bp: 1; 20-21 bp: 2
>
> Bases preceding removed adapters:
> A: 0.0%
> C: 0.0%
> G: 0.0%
> T: 0.0%
> none/other: 100.0%
>
> Overview of removed sequences
> length count expect max.err error counts
> 3 4 2660.9 0 4
> 5 1 166.3 0 1
> 9 2 0.6 0 0 2
> 244 1 0.0 2 1
> 248 4 0.0 2 0 0 4
> 250 19 0.0 2 10 9
> 251 169553 0.0 2 164116 5030 407
Log for successful run of same file with standalone cutadapt:
> This is cutadapt 1.15 with Python 3.5.5
> Command line parameters: --discard-untrimmed -g CCTACGGGNGGCWGCAG -G GACTACHVGGGTATCTAATCC -o 3237-WHJ-0005_S5_L001_R1_primers-trimmed.fastq.gz -p 3237-WHJ-0005_S5_L001_R2_primers-trimmed.fastq.gz 3237-WHJ-0005_S5_L001_R1_001.fastq.gz 3237-WHJ-0005_S5_L001_R2_001.fastq.gz
> Running on 1 core
> Trimming 2 adapters with at most 10.0% errors in paired-end mode ...
> Finished in 14.24 s (84 us/read; 0.72 M reads/minute).
>
> === Summary ===
>
> Total read pairs processed: 170,296
> Read 1 with adapter: 169,895 (99.8%)
> Read 2 with adapter: 169,605 (99.6%)
> Pairs written (passing filters): 169,282 (99.4%)
>
> Total basepairs processed: 85,488,592 bp
> Read 1: 42,744,296 bp
> Read 2: 42,744,296 bp
> Total written (filtered): 78,550,377 bp (91.9%)
> Read 1: 39,613,870 bp
> Read 2: 38,936,507 bp
>
> === First read: Adapter 1 ===
>
> Sequence: CCTACGGGNGGCWGCAG; Type: regular 5'; Length: 17; Trimmed: 169895 times.
>
> No. of allowed errors:
> 0-9 bp: 0; 10-17 bp: 1
>
> Overview of removed sequences
> length count expect max.err error counts
> 3 85 2660.9 0 85
> 11 1 0.0 1 1
> 14 4 0.0 1 2 2
> 15 24 0.0 1 9 15
> 16 943 0.0 1 203 740
> 17 168558 0.0 1 162055 6503
> 18 280 0.0 1 15 265
>
> === Second read: Adapter 2 ===
>
> Sequence: GACTACHVGGGTATCTAATCC; Type: regular 5'; Length: 21; Trimmed: 169605 times.
>
> No. of allowed errors:
> 0-9 bp: 0; 10-19 bp: 1; 20-21 bp: 2
>
> Overview of removed sequences
> length count expect max.err error counts
> 15 2 0.0 1 0 2
> 16 3 0.0 1 0 3
> 17 13 0.0 1 10 3
> 18 20 0.0 1 3 17
> 19 103 0.0 1 4 5 94
> 20 1748 0.0 2 86 1604 58
> 21 167289 0.0 2 164116 2972 201
> 22 420 0.0 2 10 379 31
> 23 2 0.0 2 0 0 2
> 25 4 0.0 2 0 0 4
> 28 1 0.0 2 1
My manifest file:
> sample-id,absolute-filepath,direction
> stationP1,$PWD/BACT-341F-805R/3237-WHJ-0001_S1_L001_R1_001.fastq.gz,forward
> stationP2,$PWD/BACT-341F-805R/3237-WHJ-0002_S2_L001_R1_001.fastq.gz,forward
> stationP3,$PWD/BACT-341F-805R/3237-WHJ-0003_S3_L001_R1_001.fastq.gz,forward
> stationP4,$PWD/BACT-341F-805R/3237-WHJ-0004_S4_L001_R1_001.fastq.gz,forward
> stationP5,$PWD/BACT-341F-805R/3237-WHJ-0005_S5_L001_R1_001.fastq.gz,forward
> stationP6,$PWD/BACT-341F-805R/3237-WHJ-0006_S6_L001_R1_001.fastq.gz,forward
> stationP7,$PWD/BACT-341F-805R/3237-WHJ-0007_S7_L001_R1_001.fastq.gz,forward
> stationP8,$PWD/BACT-341F-805R/3237-WHJ-0008_S8_L001_R1_001.fastq.gz,forward
> stationP9,$PWD/BACT-341F-805R/3237-WHJ-0009_S9_L001_R1_001.fastq.gz,forward
> stationP10,$PWD/BACT-341F-805R/3237-WHJ-0010_S10_L001_R1_001.fastq.gz,forward
> stationP11,$PWD/BACT-341F-805R/3237-WHJ-0011_S11_L001_R1_001.fastq.gz,forward
> stationP12,$PWD/BACT-341F-805R/3237-WHJ-0012_S12_L001_R1_001.fastq.gz,forward
> stationP13,$PWD/BACT-341F-805R/3237-WHJ-0013_S13_L001_R1_001.fastq.gz,forward
> stationP14,$PWD/BACT-341F-805R/3237-WHJ-0014_S14_L001_R1_001.fastq.gz,forward
> stationP15,$PWD/BACT-341F-805R/3237-WHJ-0015_S15_L001_R1_001.fastq.gz,forward
> stationP16,$PWD/BACT-341F-805R/3237-WHJ-0016_S16_L001_R1_001.fastq.gz,forward
> stationP17,$PWD/BACT-341F-805R/3237-WHJ-0017_S17_L001_R1_001.fastq.gz,forward
> stationP18,$PWD/BACT-341F-805R/3237-WHJ-0018_S18_L001_R1_001.fastq.gz,forward
> stationP20,$PWD/BACT-341F-805R/3237-WHJ-0019_S19_L001_R1_001.fastq.gz,forward
> stationP21,$PWD/BACT-341F-805R/3237-WHJ-0020_S20_L001_R1_001.fastq.gz,forward
> stationP22,$PWD/BACT-341F-805R/3237-WHJ-0021_S21_L001_R1_001.fastq.gz,forward
> stationP23,$PWD/BACT-341F-805R/3237-WHJ-0022_S22_L001_R1_001.fastq.gz,forward
> stationP24,$PWD/BACT-341F-805R/3237-WHJ-0023_S23_L001_R1_001.fastq.gz,forward
> stationP25,$PWD/BACT-341F-805R/3237-WHJ-0024_S24_L001_R1_001.fastq.gz,forward
> stationP26,$PWD/BACT-341F-805R/3237-WHJ-0025_S25_L001_R1_001.fastq.gz,forward
> stationP27,$PWD/BACT-341F-805R/3237-WHJ-0026_S26_L001_R1_001.fastq.gz,forward
> stationP28,$PWD/BACT-341F-805R/3237-WHJ-0027_S27_L001_R1_001.fastq.gz,forward
> stationP29,$PWD/BACT-341F-805R/3237-WHJ-0028_S28_L001_R1_001.fastq.gz,forward
> stationP30,$PWD/BACT-341F-805R/3237-WHJ-0029_S29_L001_R1_001.fastq.gz,forward
> stationP31,$PWD/BACT-341F-805R/3237-WHJ-0030_S30_L001_R1_001.fastq.gz,forward
> stationM1,$PWD/BACT-341F-805R/3237-WHJ-0031_S31_L001_R1_001.fastq.gz,forward
> stationM2,$PWD/BACT-341F-805R/3237-WHJ-0032_S32_L001_R1_001.fastq.gz,forward
> stationM3,$PWD/BACT-341F-805R/3237-WHJ-0033_S33_L001_R1_001.fastq.gz,forward
> stationM4,$PWD/BACT-341F-805R/3237-WHJ-0034_S34_L001_R1_001.fastq.gz,forward
> stationM5,$PWD/BACT-341F-805R/3237-WHJ-0035_S35_L001_R1_001.fastq.gz,forward
> stationM6,$PWD/BACT-341F-805R/3237-WHJ-0036_S36_L001_R1_001.fastq.gz,forward
> stationM7,$PWD/BACT-341F-805R/3237-WHJ-0037_S37_L001_R1_001.fastq.gz,forward
> stationP1,$PWD/BACT-341F-805R/3237-WHJ-0001_S1_L001_R2_001.fastq.gz,reverse
> stationP2,$PWD/BACT-341F-805R/3237-WHJ-0002_S2_L001_R2_001.fastq.gz,reverse
> stationP3,$PWD/BACT-341F-805R/3237-WHJ-0003_S3_L001_R2_001.fastq.gz,reverse
> stationP4,$PWD/BACT-341F-805R/3237-WHJ-0004_S4_L001_R2_001.fastq.gz,reverse
> stationP5,$PWD/BACT-341F-805R/3237-WHJ-0005_S5_L001_R2_001.fastq.gz,reverse
> stationP6,$PWD/BACT-341F-805R/3237-WHJ-0006_S6_L001_R2_001.fastq.gz,reverse
> stationP7,$PWD/BACT-341F-805R/3237-WHJ-0007_S7_L001_R2_001.fastq.gz,reverse
> stationP8,$PWD/BACT-341F-805R/3237-WHJ-0008_S8_L001_R2_001.fastq.gz,reverse
> stationP9,$PWD/BACT-341F-805R/3237-WHJ-0009_S9_L001_R2_001.fastq.gz,reverse
> stationP10,$PWD/BACT-341F-805R/3237-WHJ-0010_S10_L001_R2_001.fastq.gz,reverse
> stationP11,$PWD/BACT-341F-805R/3237-WHJ-0011_S11_L001_R2_001.fastq.gz,reverse
> stationP12,$PWD/BACT-341F-805R/3237-WHJ-0012_S12_L001_R2_001.fastq.gz,reverse
> stationP13,$PWD/BACT-341F-805R/3237-WHJ-0013_S13_L001_R2_001.fastq.gz,reverse
> stationP14,$PWD/BACT-341F-805R/3237-WHJ-0014_S14_L001_R2_001.fastq.gz,reverse
> stationP15,$PWD/BACT-341F-805R/3237-WHJ-0015_S15_L001_R2_001.fastq.gz,reverse
> stationP16,$PWD/BACT-341F-805R/3237-WHJ-0016_S16_L001_R2_001.fastq.gz,reverse
> stationP17,$PWD/BACT-341F-805R/3237-WHJ-0017_S17_L001_R2_001.fastq.gz,reverse
> stationP18,$PWD/BACT-341F-805R/3237-WHJ-0018_S18_L001_R2_001.fastq.gz,reverse
> stationP20,$PWD/BACT-341F-805R/3237-WHJ-0019_S19_L001_R2_001.fastq.gz,reverse
> stationP21,$PWD/BACT-341F-805R/3237-WHJ-0020_S20_L001_R2_001.fastq.gz,reverse
> stationP22,$PWD/BACT-341F-805R/3237-WHJ-0021_S21_L001_R2_001.fastq.gz,reverse
> stationP23,$PWD/BACT-341F-805R/3237-WHJ-0022_S22_L001_R2_001.fastq.gz,reverse
> stationP24,$PWD/BACT-341F-805R/3237-WHJ-0023_S23_L001_R2_001.fastq.gz,reverse
> stationP25,$PWD/BACT-341F-805R/3237-WHJ-0024_S24_L001_R2_001.fastq.gz,reverse
> stationP26,$PWD/BACT-341F-805R/3237-WHJ-0025_S25_L001_R2_001.fastq.gz,reverse
> stationP27,$PWD/BACT-341F-805R/3237-WHJ-0026_S26_L001_R2_001.fastq.gz,reverse
> stationP28,$PWD/BACT-341F-805R/3237-WHJ-0027_S27_L001_R2_001.fastq.gz,reverse
> stationP29,$PWD/BACT-341F-805R/3237-WHJ-0028_S28_L001_R2_001.fastq.gz,reverse
> stationP30,$PWD/BACT-341F-805R/3237-WHJ-0029_S29_L001_R2_001.fastq.gz,reverse
> stationP31,$PWD/BACT-341F-805R/3237-WHJ-0030_S30_L001_R2_001.fastq.gz,reverse
> stationM1,$PWD/BACT-341F-805R/3237-WHJ-0031_S31_L001_R2_001.fastq.gz,reverse
> stationM2,$PWD/BACT-341F-805R/3237-WHJ-0032_S32_L001_R2_001.fastq.gz,reverse
> stationM3,$PWD/BACT-341F-805R/3237-WHJ-0033_S33_L001_R2_001.fastq.gz,reverse
> stationM4,$PWD/BACT-341F-805R/3237-WHJ-0034_S34_L001_R2_001.fastq.gz,reverse
> stationM5,$PWD/BACT-341F-805R/3237-WHJ-0035_S35_L001_R2_001.fastq.gz,reverse
> stationM6,$PWD/BACT-341F-805R/3237-WHJ-0036_S36_L001_R2_001.fastq.gz,reverse
> stationM7,$PWD/BACT-341F-805R/3237-WHJ-0037_S37_L001_R2_001.fastq.gz,reverse