I’m having a problem with the q2-itsxpress QIIME 2 plugin: it produces no sequences. This may be related to the questions here and here.
Both the q2-itsxpress plugin and standalone itsxpress produce no sequences in their output files. However, replicating the analysis with vsearch and ITSx finds viable ITS2 sequences for 100% of the vsearch-merged and dereplicated sequences.
I’ve also previously processed this data using a separate pipeline (USEARCH/UPARSE and ITSx) and got reasonable results in terms of expected community membership (our target fungi were there).
Here is the workflow I used to run itsxpress within QIIME 2, followed by the checks I ran with standalone itsxpress and with standalone vsearch and ITSx.
Import data. These are two files from one sample that have already been trimmed of primers and filtered for expected errors.
qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path itsxpress_test \
--input-format CasavaOneEightSingleLanePerSampleDirFmt \
--output-path test.qza
qiime tools peek test.qza
qiime demux summarize --i-data test.qza --o-visualization test.qzv
Run itsxpress plugin
qiime itsxpress trim-pair-output-unmerged --i-per-sample-sequences test.qza --p-region ALL --p-taxa ALL --o-trimmed itsx_trimmed.qza
qiime demux summarize --i-data itsx_trimmed.qza --o-visualization itsx_trimmed.qzv --verbose
demux summarize gives an error.
Plugin error from demux:
Cannot describe a DataFrame without columns
See above for debug info.
There are no reads in the files.
tar -xf itsx_trimmed.qza
gunzip -c 05871b17-6917-425a-8b08-22e751e3bb55/data/test_1_L001_R1_001.fastq.gz | wc -l
0
gunzip -c 05871b17-6917-425a-8b08-22e751e3bb55/data/test_1_L001_R2_001.fastq.gz | wc -l
0
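For anyone repeating this check, here is a minimal sketch that reports reads rather than raw lines. It assumes standard four-line FASTQ records (no wrapped sequence lines), and the helper name count_fastq_reads is made up for illustration; it is not part of QIIME 2 or itsxpress.

```shell
# Hypothetical helper: count FASTQ records in a gzipped file.
# Assumes each record is exactly 4 lines (no wrapped sequence lines).
count_fastq_reads() {
    gunzip -c "$1" | awk 'END { print int(NR / 4) }'
}
```

Calling it on each of the extracted R1 and R2 files should print 0 if the artifact really is empty, matching the wc -l result above.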
However, I have previously processed this dataset using a combination of USEARCH and standalone ITSx, and I know from that analysis that it contains valid fungal ITS2 sequences. To confirm this, I ran the same sample through standalone itsxpress, and then through standalone vsearch + ITSx to replicate the itsxpress process.
standalone itsxpress
itsxpress --fastq itsxpress_test/test_1_L001_R1_001.fastq.gz --fastq2 itsxpress_test/test_1_L001_R2_001.fastq.gz --outfile test_ITSx_R1.fastq --outfile2 test_ITSx_R2.fastq --log test_ITSx.log --region ALL --taxa 'Fungi'
output
Pairs: 8912
Joined: 8567 96.129%
Ambiguous: 342 3.838%
No Solution: 3 0.034%
Too Short: 0 0.000%
Avg Insert: 333.2
Standard Deviation: 9.2
Mode: 328
Insert range: 280 - 426
90th percentile: 346
75th percentile: 338
50th percentile: 330
25th percentile: 328
10th percentile: 325
…
Dereplicating 100%
Sorting 100%
1569 unique sequences, avg cluster 5.5, median 1, max 1281
Writing output file 100%
Writing uc file, first part 100%
Writing uc file, second part 100%
2019-07-23 11:29:39,354: INFO Searching for ITS start and stop sites using HMMSearch. This step takes a while.
2019-07-23 11:29:41,269: INFO Parsing HMM results.
2019-07-23 11:29:41,484: INFO Writing out sequences
2019-07-23 11:29:42,847: INFO ITSxpress ran in 00:00:05
There are no sequences in the output files…
wc -l test_ITSx_R1.fastq
0 test_ITSx_R1.fastq
wc -l test_ITSx_R2.fastq
0 test_ITSx_R2.fastq
The log file does not contain much that is informative (to me), aside from repeated lines of the form: DEBUG No ITS stop or start sites were identified for sequence ...
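To see whether that message was logged for every dereplicated sequence, a rough sketch like the following could be used. It assumes the message text appears in the log exactly as quoted above, and count_no_its_sites is a hypothetical helper name.

```shell
# Hypothetical helper: count log lines reporting that no ITS start/stop
# sites were found. Assumes the message text matches the log verbatim.
count_no_its_sites() {
    # grep -c exits non-zero when the count is 0, so mask the status.
    grep -c 'No ITS stop or start sites were identified' "$1" || true
}
```

Running it on test_ITSx.log and comparing the result with the 1569 unique sequences reported above would show whether any sequence got an ITS hit at all.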
Replicating the itsxpress pipeline as I understand it with standalone vsearch and ITSx
vsearch --fastq_mergepairs itsxpress_test/test_1_L001_R1_001.fastq.gz --reverse itsxpress_test/test_1_L001_R2_001.fastq.gz --fastaout test_ITSx_standalone.merged.fasta
output
Merging reads 100%
8912 Pairs
8853 Merged (99.3%)
59 Not merged (0.7%)
Pairs that failed merging due to various reasons:
41 too few kmers found on same diagonal
16 alignment score too low, or score drop to high
2 overlap too short
Statistics of all reads:
226.70 Mean read length
Statistics of merged reads:
333.19 Mean fragment length
9.11 Standard deviation of fragment length
0.17 Mean expected error in forward sequences
0.22 Mean expected error in reverse sequences
0.12 Mean expected error in merged sequences
0.09 Mean observed errors in merged region of forward sequences
0.11 Mean observed errors in merged region of reverse sequences
0.20 Mean observed errors in merged region
Derep with vsearch
vsearch --derep_fulllength test_ITSx_standalone.merged.fasta --output test_ITSx_standalone.merged.derep.fasta
output
2949765 nt in 8853 seqs, min 280, max 426, avg 333
Dereplicating 100%
Sorting 100%
1678 unique sequences, avg cluster 5.3, median 1, max 1315
Writing output file 100%
Run standalone ITSx
ITSx -i test_ITSx_standalone.merged.derep.fasta -o test_ITSx_standalone_derep --preserve T
ITSx runs to completion and finds ITS2 regions for 1678 of the dereplicated reads (i.e. 100%)
grep ">" test_ITSx_standalone_derep.ITS2.fasta | wc -l
1678
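The same comparison can be made in one step. This is only a sketch: count_fasta_records and its2_recovery are hypothetical helper names, and it assumes every FASTA record header starts with ">" at the beginning of a line.

```shell
# Hypothetical helpers for comparing ITSx input and ITS2 output counts.
# Assumes each FASTA record has exactly one header line starting with ">".
count_fasta_records() {
    # grep -c exits non-zero when the count is 0, so mask the status.
    grep -c '^>' "$1" || true
}

# Print "found/total (percent%)" for an input FASTA and an ITS2 FASTA.
its2_recovery() {
    total=$(count_fasta_records "$1")
    found=$(count_fasta_records "$2")
    if [ "$total" -gt 0 ]; then
        pct=$(( 100 * found / total ))   # integer percentage, rounded down
    else
        pct=0
    fi
    echo "$found/$total ($pct%)"
}
```

Passing test_ITSx_standalone.merged.derep.fasta as the first argument and test_ITSx_standalone_derep.ITS2.fasta as the second would report how many dereplicated sequences yielded an ITS2 region.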
Thoughts?