exclude-seqs: expected FeatureData[Sequence], but I want to run it on imported fastq files

Glubb · June 9, 2020, 2:06pm

I have some very large fastq files from a very deep RNA-seq run. The samples are from a malaria-infected, non-human host. I want to remove most of the host reads so that downstream mapping to the malaria genome is much faster. Only 5% of the reads belong to malaria. I thought I could use the exclude-seqs action of the quality-control plugin to do this.

I imported my fastq files, made a FeatureData[Sequence] artifact of the host genome, and then ran into a roadblock. I got the error:

(1/1) Invalid value for "--i-query-sequences": Expected an artifact of at
least type FeatureData[Sequence]. An artifact of type
SampleData[PairedEndSequencesWithQuality] was provided.

I guess the action is expecting a FeatureData[Sequence] artifact. I was not planning on using QIIME 2 denoising or any other processing, just removing most of the host reads. Can someone help, or suggest another tool?

Oddant1 · June 9, 2020, 8:21pm

Hello @Glubb, can you please show me exactly what commands you ran leading up to and causing this error? Thank you.

Glubb · June 10, 2020, 2:11pm

I believe the error I got is the expected behaviour. I suspect that QIIME 2 cannot filter fastq reads based on alignment. At least, I can't find a plugin that can do it if the input needs to be a FeatureData[Sequence] artifact and not a SampleData artifact.

Glubb · June 10, 2020, 2:11pm

This is the command used to import my fastq files:

qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path manifest.tsv \
  --output-path paired-end.qza \
  --input-format PairedEndFastqManifestPhred33V2
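
For reference, a PairedEndFastqManifestPhred33V2 manifest is a tab-separated file with one row per sample and absolute paths to the forward and reverse reads. Below is a minimal sketch for generating one. It assumes, hypothetically, that all files sit in one directory and are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz; the function name write_manifest is made up for illustration.

```shell
# write_manifest FASTQ_DIR
# Prints a paired-end V2 manifest to stdout.
# Assumption: files are named <sample>_R1.fastq.gz / <sample>_R2.fastq.gz.
write_manifest() {
  local dir fwd rev sample
  dir=$(cd "$1" && pwd) || return 1        # manifest paths must be absolute
  printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n'
  for fwd in "$dir"/*_R1.fastq.gz; do
    [ -e "$fwd" ] || continue              # glob matched nothing
    rev=${fwd%_R1.fastq.gz}_R2.fastq.gz
    sample=$(basename "$fwd" _R1.fastq.gz)
    if [ ! -e "$rev" ]; then
      echo "missing reverse reads for $sample" >&2
      return 1
    fi
    printf '%s\t%s\t%s\n' "$sample" "$fwd" "$rev"
  done
}
```

Usage would be along the lines of `write_manifest /path/to/fastqs > manifest.tsv`, where the directory is a placeholder.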

This is the command used to import my host genome as a reference artifact for exclude-seqs:

qiime tools import --input-path \
  ~/genomes/s.canaria/uppercase_GCF_007115625.1_cibio_Scana_2019_genomic.fna \
  --output-path canary_genome_fasta.qza --type 'FeatureData[Sequence]'

This is the exclude-seqs command. It took a few minutes before giving the error.

qiime quality-control exclude-seqs --i-query-sequences paired-end.qza \
--i-reference-sequences canary_genome_fasta.qza \
--o-sequence-hits aligned-canary.qza \
--o-sequence-misses non_canary_aligned.qza

Oddant1 · June 10, 2020, 5:13pm

You are correct, that is the expected behavior. After the release of 2020.6 later this month, QIIME 2 should support that filtering. As of right now, you'll have to filter the reads outside of QIIME 2 and then import them.
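
One possible way to do that filtering outside QIIME 2 is to align the read pairs to the host genome with bowtie2 and keep only the pairs that do not align concordantly. The following is a rough sketch under assumptions, not a tested pipeline for this dataset: the function names, the `host_index` prefix, and the output naming are all hypothetical, and `DRY_RUN=1` only prints the commands so they can be checked first.

```shell
# run CMD...: execute the command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# filter_host_reads HOST_FASTA R1 R2 OUT_PREFIX [THREADS]
filter_host_reads() {
  local host_fasta=$1 r1=$2 r2=$3 out=$4 threads=${5:-4}
  # Build the host index (only needed once per genome; prefix is hypothetical).
  run bowtie2-build --threads "$threads" "$host_fasta" host_index
  # --un-conc-gz writes the pairs that did NOT align concordantly to the host.
  # The '%' in the path is replaced by 1 and 2 for the two mates.
  # The alignments themselves are discarded via -S /dev/null.
  run bowtie2 -p "$threads" -x host_index -1 "$r1" -2 "$r2" \
    --un-conc-gz "${out}_R%.fastq.gz" -S /dev/null
}
```

For example, `filter_host_reads host.fna sample_R1.fastq.gz sample_R2.fastq.gz sample_nohost 8` (placeholder file names) would leave sample_nohost_R1.fastq.gz and sample_nohost_R2.fastq.gz, which can then be listed in a new manifest and imported as before. Note that pairs where only one mate hits the host are kept by this approach.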

Glubb · June 11, 2020, 2:24pm

That's exciting! Will it use the same blast-alignment method as exclude-seqs?

Nicholas_Bokulich · June 11, 2020, 7:08pm

No, it will use bowtie2 and samtools for alignment against the reference sequence(s).
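
For anyone who wants to try these two tools by hand in the meantime, here is a minimal sketch of the general idea. It is not a description of how the upcoming QIIME 2 method is implemented. The function name keep_unmapped_pairs and the output naming are hypothetical, and the index prefix is assumed to have been built beforehand with bowtie2-build.

```shell
# keep_unmapped_pairs INDEX_PREFIX R1 R2 OUT_PREFIX [THREADS]
# Aligns pairs to the reference and writes only pairs in which
# neither mate aligned.
keep_unmapped_pairs() {
  local index=$1 r1=$2 r2=$3 out=$4 threads=${5:-4}
  # bowtie2 writes SAM to stdout with the two mates of a pair adjacent.
  # -f 12  : keep records where the read (4) and its mate (8) are unmapped
  # -F 256 : drop secondary alignments
  # samtools fastq splits the surviving records back into R1/R2 files;
  # -n keeps read names as-is, -0 and -s discard unpaired leftovers.
  bowtie2 -p "$threads" -x "$index" -1 "$r1" -2 "$r2" \
    | samtools view -b -f 12 -F 256 - \
    | samtools fastq -n \
        -1 "${out}_R1.fastq.gz" -2 "${out}_R2.fastq.gz" \
        -0 /dev/null -s /dev/null -
}
```

Filtering on flag 12 is stricter than keeping all non-concordant pairs: a pair is retained only when both mates fail to align to the reference.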

Glubb · June 12, 2020, 5:30pm

Interesting. I'm curious about how it will perform on my data. The host data accounts for about 95% of the reads. Each compressed fq file has about 72 million reads.
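
As a back-of-the-envelope check on those numbers, read counts before and after filtering can be compared with a couple of small helpers. This is only a sketch: the function names are hypothetical, and the count assumes plain four-line FASTQ records (no wrapped sequence lines).

```shell
# count_reads FILE.fastq.gz
# A FASTQ record is 4 lines, so reads = lines / 4.
count_reads() {
  local lines
  lines=$(gzip -dc "$1" | wc -l | tr -d ' ')
  echo $(( lines / 4 ))
}

# expected_kept TOTAL_READS PERCENT_NON_HOST
# Integer estimate of how many reads should survive host removal.
expected_kept() {
  echo $(( $1 * $2 / 100 ))
}

# 72 million reads with about 5% non-host reads:
expected_kept 72000000 5   # prints 3600000
```

So roughly 3.6 million reads per file would be expected to remain if the 5% estimate holds.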

Nicholas_Bokulich · June 12, 2020, 5:45pm

We will be very curious to hear your feedback on that method as well! With such a high contamination rate, yours will be a good dataset for "battle testing" this new method. Please let us know what you find.
