I’m having a problem with the q2-itsxpress QIIME 2 plugin: it produces no sequences. This may be related to the questions here and here.
Both the q2-itsxpress plugin and standalone itsxpress produce empty output files. However, replicating the analysis with vsearch and ITSx finds viable ITS2 sequences for 100% of the vsearch-merged and dereplicated sequences.
I’ve also previously processed this data using a separate pipeline (USEARCH/UPARSE and ITSx) and got reasonable results in terms of expected community membership (our target fungi were there).
Here is the workflow I used to run itsxpress within qiime2, followed by the checks I ran with standalone itsxpress and with standalone vsearch and ITSx.
Import data. These are two files from one sample that have already been trimmed of primers and filtered for expected errors.
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path itsxpress_test \
  --input-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path test.qza
qiime tools peek test.qza
qiime demux summarize --i-data test.qza --o-visualization test.qzv
Run itsxpress plugin
qiime itsxpress trim-pair-output-unmerged \
  --i-per-sample-sequences test.qza \
  --p-region ALL \
  --p-taxa ALL \
  --o-trimmed itsx_trimmed.qza
qiime demux summarize --i-data itsx_trimmed.qza --o-visualization itsx_trimmed.qzv --verbose
demux summarize gives an error.
Plugin error from demux:

  Cannot describe a DataFrame without columns

See above for debug info.
There are no reads in the files.
tar -xf itsx_trimmed.qza
gunzip -c 05871b17-6917-425a-8b08-22e751e3bb55/data/test_1_L001_R1_001.fastq.gz | wc -l
0
gunzip -c 05871b17-6917-425a-8b08-22e751e3bb55/data/test_1_L001_R2_001.fastq.gz | wc -l
0
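For anyone repeating this check, here is a minimal sketch that reports FASTQ records (four lines each) rather than raw line counts. The function name `count_fastq_records` is hypothetical and not part of QIIME 2 or itsxpress. A `.qza` is a zip archive, so `unzip` should also work for the extraction step where `tar` does not handle zip files.

```shell
# Hypothetical helper (not part of QIIME 2 or itsxpress):
# print the number of FASTQ records in a gzipped file.
# A record is 4 lines, so a non-integer result hints at a truncated file.
count_fastq_records() {
    gunzip -c "$1" | awk 'END { print NR / 4 }'
}

# Example usage on the extracted artifact (paths as in the post):
# for f in 05871b17-6917-425a-8b08-22e751e3bb55/data/*.fastq.gz; do
#     printf '%s\t%s\n' "$f" "$(count_fastq_records "$f")"
# done
```

Running the same helper on the input files in `itsxpress_test` gives the before and after counts side by side.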
However, I have previously processed this dataset using a combination of USEARCH and standalone ITSx, and I know from that analysis that it contains valid fungal ITS2 sequences. To confirm this, I ran the data through standalone itsxpress, and then through standalone vsearch + ITSx to replicate the itsxpress process.
itsxpress \
  --fastq itsxpress_test/test_1_L001_R1_001.fastq.gz \
  --fastq2 itsxpress_test/test_1_L001_R2_001.fastq.gz \
  --outfile test_ITSx_R1.fastq \
  --outfile2 test_ITSx_R2.fastq \
  --log test_ITSx.log \
  --region ALL \
  --taxa 'Fungi'
Pairs: 8912
Joined: 8567 96.129%
Ambiguous: 342 3.838%
No Solution: 3 0.034%
Too Short: 0 0.000%
Avg Insert: 333.2
Standard Deviation: 9.2
Mode: 328
Insert range: 280 - 426
90th percentile: 346
75th percentile: 338
50th percentile: 330
25th percentile: 328
10th percentile: 325
Dereplicating 100%
Sorting 100%
1569 unique sequences, avg cluster 5.5, median 1, max 1281
Writing output file 100%
Writing uc file, first part 100%
Writing uc file, second part 100%
2019-07-23 11:29:39,354: INFO Searching for ITS start and stop sites using HMMSearch. This step takes a while.
2019-07-23 11:29:41,269: INFO Parsing HMM results.
2019-07-23 11:29:41,484: INFO Writing out sequences
2019-07-23 11:29:42,847: INFO ITSxpress ran in 00:00:05
There are no sequences in the output files…
wc -l test_ITSx_R1.fastq
0 test_ITSx_R1.fastq
wc -l test_ITSx_R2.fastq
0 test_ITSx_R2.fastq
The log file contains little that is informative (to me) aside from:
DEBUG No ITS stop or start sites were identified for sequence ...
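To get more out of that message, one option is to count how often it occurs and compare the count with the 1569 unique sequences reported by the dereplication step. The sketch below assumes the log was written to `test_ITSx.log` as in the command above, and that each affected sequence produces exactly one such DEBUG line, which is an assumption about the log format. The function name is hypothetical.

```shell
# Hypothetical helper: count log lines reporting that no ITS start/stop
# site was found. ASSUMPTION: one such DEBUG line per affected sequence.
count_no_its_hits() {
    # grep -c prints 0 (and exits non-zero) when nothing matches;
    # "|| true" keeps the helper usable under "set -e".
    grep -c 'No ITS stop or start sites were identified' "$1" || true
}

# Example usage:
# count_no_its_hits test_ITSx.log
```

If the count matches the number of unique sequences, that would suggest the HMM search found no start or stop site for any sequence, rather than the reads being lost in a later step.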
Replicating the itsxpress pipeline, as I understand it, with standalone vsearch and ITSx:
vsearch --fastq_mergepairs itsxpress_test/test_1_L001_R1_001.fastq.gz --reverse itsxpress_test/test_1_L001_R2_001.fastq.gz --fastaout test_ITSx_standalone.merged.fasta
Merging reads 100%
8912 Pairs
8853 Merged (99.3%)
59 Not merged (0.7%)
Pairs that failed merging due to various reasons:
41 too few kmers found on same diagonal
16 alignment score too low, or score drop to high
2 overlap too short
Statistics of all reads:
226.70 Mean read length
Statistics of merged reads:
333.19 Mean fragment length
9.11 Standard deviation of fragment length
0.17 Mean expected error in forward sequences
0.22 Mean expected error in reverse sequences
0.12 Mean expected error in merged sequences
0.09 Mean observed errors in merged region of forward sequences
0.11 Mean observed errors in merged region of reverse sequences
0.20 Mean observed errors in merged region
Dereplicate with vsearch
vsearch --derep_fulllength test_ITSx_standalone.merged.fasta --output test_ITSx_standalone.merged.derep.fasta
2949765 nt in 8853 seqs, min 280, max 426, avg 333
Dereplicating 100%
Sorting 100%
1678 unique sequences, avg cluster 5.3, median 1, max 1315
Writing output file 100%
Run standalone ITSx
ITSx -i test_ITSx_standalone.merged.derep.fasta -o test_ITSx_standalone_derep --preserve T
ITSx runs to completion and finds ITS2 regions for all 1678 dereplicated reads (i.e. 100%).
grep ">" test_ITSx_standalone_derep.ITS2.fasta | wc -l
1678
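The 100% figure can be checked directly by comparing header counts in the dereplicated input and the ITS2 output. This is a minimal sketch with hypothetical helper names. It counts FASTA headers only and does not verify that the same sequence IDs appear in both files.

```shell
# Hypothetical helpers: count FASTA records and report the percentage of
# input sequences for which an ITS2 region was written.
count_fasta_records() {
    # "|| true" so a count of 0 does not abort a script run with "set -e".
    grep -c '^>' "$1" || true
}

its2_percent() {
    total=$(count_fasta_records "$1")   # dereplicated input FASTA
    found=$(count_fasta_records "$2")   # ITS2 output FASTA
    awk -v t="$total" -v f="$found" \
        'BEGIN { if (t == 0) print "NA"; else printf "%.1f\n", 100 * f / t }'
}

# Example usage with the files from the post:
# its2_percent test_ITSx_standalone.merged.derep.fasta \
#              test_ITSx_standalone_derep.ITS2.fasta
```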