I have some very large fq files from a very deep RNA seq run. The samples are of a malaria-infected host (non-human). I want to remove most of the host reads so mapping to the malaria genome downstream is much faster. Only 5% of the reads belong to malaria. I thought I could use the exclude-seqs plugin to do this.
I imported my fastq files, made a FeatureData[Sequence] artifact of the host genome, and then ran into a roadblock. I got the error:
(1/1) Invalid value for "--i-query-sequences": Expected an artifact of at
least type FeatureData[Sequence]. An artifact of type
SampleData[PairedEndSequencesWithQuality] was provided.
I guess the plugin is expecting a FeatureData. I was not planning on using qiime2 denoising or any other processes, just removing most of the host reads. Can someone help? Or suggest another tool?
Hello @Glubb, can you please show me exactly what commands you ran leading up to and causing this error? Thank you.
I believe the error I got is the expected behaviour. I suspect that QIIME 2 does not have the capability to filter fastq reads based on alignment. At least I can’t find a plugin that can do it, given that the input needs to be a FeatureData artifact and not a SampleData artifact.
This is the command used to import my fastq files:
qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path manifest.tsv \
--output-path paired-end.qza \
This is the command used to import my host genome to be used as an artifact in the exclude-seqs plugin:
qiime tools import --input-path
--output-path canary_genome_fasta.qza --type 'FeatureData[Sequence]'
This is the exclude-seqs command. It took a few minutes before giving the error.
qiime quality-control exclude-seqs --i-query-sequences paired-end.qza \
--i-reference-sequences canary_genome_fasta.qza \
--o-sequence-hits aligned-canary.qza \
You are correct, that is the expected behavior. After the release of 2020.6 later this month, QIIME 2 should support that filtering. As of right now you’ll have to filter the reads outside of QIIME 2 and then import them.
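For anyone looking for a starting point on filtering outside of QIIME 2, here is a minimal sketch that uses bowtie2 directly. It is an assumption about one reasonable workflow, not something prescribed in this thread: the function names, index prefix, and file names are all hypothetical, and bowtie2 is assumed to be installed and on the PATH. The `--un-conc-gz` option writes out read pairs that did not align concordantly to the host, so a pair is kept even if only one mate hit the host.

```shell
# Sketch only: assumes bowtie2 and bowtie2-build are installed and on PATH.
# Function names, index prefix, and file names are hypothetical.

# One-time step: build a bowtie2 index from the host genome FASTA.
build_host_index() {
    host_fasta="$1"      # path to the host genome FASTA
    index_prefix="$2"    # prefix for the index files bowtie2-build writes
    bowtie2-build "$host_fasta" "$index_prefix"
}

# Align read pairs to the host index and keep only the pairs that did
# NOT align concordantly (the presumed non-host reads).
remove_host_pairs() {
    index_prefix="$1"    # prefix given to bowtie2-build
    r1="$2"              # forward reads (fastq or fastq.gz)
    r2="$3"              # reverse reads (fastq or fastq.gz)
    out_prefix="$4"      # prefix for the filtered output files
    threads="${5:-4}"    # alignment threads, default 4

    # bowtie2 replaces '%' in the --un-conc-gz path with 1 and 2,
    # giving <out_prefix>_R1.fastq.gz and <out_prefix>_R2.fastq.gz.
    # The alignments themselves are not needed, so SAM goes to /dev/null.
    bowtie2 -p "$threads" -x "$index_prefix" -1 "$r1" -2 "$r2" \
        --un-conc-gz "${out_prefix}_R%.fastq.gz" \
        -S /dev/null
}
```

For example, `build_host_index host_genome.fasta host_idx` followed by `remove_host_pairs host_idx sample_R1.fastq.gz sample_R2.fastq.gz sample_nohost` would write sample_nohost_R1.fastq.gz and sample_nohost_R2.fastq.gz. Those files can then be listed in a new manifest and imported with qiime tools import as SampleData[PairedEndSequencesWithQuality].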
That’s exciting! Will it use the same blast-alignment method as exclude-seqs?
No, it will use bowtie2 and samtools for alignment against reference sequence(s).
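The internals of the upcoming QIIME 2 method are not shown in this thread. As a rough illustration of how bowtie2 and samtools are commonly combined for this kind of filtering on the command line, the sketch below keeps only pairs in which both mates failed to align to the reference. The function name and file names are hypothetical, both tools are assumed to be installed, and the index is assumed to have been built beforehand with bowtie2-build.

```shell
# Sketch only: assumes bowtie2 and samtools are installed and on PATH,
# and that a bowtie2 index for the reference already exists.
# The function name and all file names are hypothetical.

keep_unmapped_pairs() {
    index_prefix="$1"    # prefix of an existing bowtie2 index
    r1="$2"              # forward reads
    r2="$3"              # reverse reads
    out_prefix="$4"      # prefix for the filtered fastq files
    threads="${5:-4}"    # alignment threads, default 4

    # 1. bowtie2 writes SAM to stdout when -S is not given.
    # 2. samtools view keeps records with flag 12 set (read unmapped = 4,
    #    mate unmapped = 8) and drops secondary alignments (256).
    # 3. samtools fastq splits the surviving pairs back into R1/R2 files;
    #    -n leaves read names unchanged, and unpaired or singleton reads
    #    are discarded via /dev/null.
    bowtie2 -p "$threads" -x "$index_prefix" -1 "$r1" -2 "$r2" \
        | samtools view -b -f 12 -F 256 - \
        | samtools fastq \
            -1 "${out_prefix}_R1.fastq.gz" \
            -2 "${out_prefix}_R2.fastq.gz" \
            -0 /dev/null -s /dev/null -n -
}
```

Requiring both mates to be unmapped is stricter than keeping every pair that failed to align concordantly, so it removes more host-derived pairs at the cost of also dropping pairs where only one mate resembles the host.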
Interesting. I’m curious about how it will perform on my data. The host data accounts for about 95% of the reads. Each compressed fq file has about 72 million reads.
We will be very curious to hear your feedback on that method as well! With such a high contamination rate, yours will be a good dataset for “battle testing” this new method. Please let us know what you find.