I have some very large FASTQ files from a very deep RNA-seq run. The samples are from a malaria-infected, non-human host. I want to remove most of the host reads so that downstream mapping to the malaria genome is much faster. Only 5% of the reads belong to malaria. I thought I could use the exclude-seqs plugin to do this.
I imported my FASTQ files, made a FeatureData[Sequence] artifact of the host genome, and then ran into a roadblock. I got the error:
(1/1) Invalid value for "--i-query-sequences": Expected an artifact of at
least type FeatureData[Sequence]. An artifact of type
SampleData[PairedEndSequencesWithQuality] was provided.
I guess the plugin is expecting a FeatureData artifact. I was not planning on using QIIME 2 denoising or any other processing, just removing most of the host reads. Can someone help, or suggest another tool?
I believe the error I got is the expected behaviour. I suspect that QIIME 2 does not have the capability to filter FASTQ reads based on alignment. At least I can't find a plugin that can do it, given that the input needs to be a FeatureData artifact rather than a SampleData artifact.
You are correct, that is the expected behavior. After the release of 2020.6 later this month, QIIME 2 should support that filtering. As of right now, you'll have to filter the reads outside of QIIME 2 and then import them.
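For anyone filtering outside QIIME 2 in the meantime, here is a minimal Python sketch of the last step only. It assumes you have already aligned the reads against the host genome with a tool of your choice and collected the IDs of the read pairs that did not hit the host. The function names (`open_text`, `read_id`, `fastq_records`, `filter_pairs`) and the `keep_ids` set are hypothetical, made up for this illustration, and are not part of QIIME 2 or of any aligner.

```python
import gzip
from itertools import islice


def open_text(path, mode="rt"):
    """Open a plain or gzip-compressed file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)


def read_id(header):
    """Extract the read ID from a FASTQ header line.

    Drops the leading '@', any description after whitespace,
    and a trailing /1 or /2 mate suffix if present.
    """
    rid = header[1:].split()[0]
    if rid.endswith("/1") or rid.endswith("/2"):
        rid = rid[:-2]
    return rid


def fastq_records(handle):
    """Yield FASTQ records as lists of four lines each."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def filter_pairs(fwd_in, rev_in, fwd_out, rev_out, keep_ids):
    """Write only the read pairs whose ID is in keep_ids.

    Assumes both input files list the mates in the same order.
    Returns (kept, dropped) pair counts.
    """
    kept = dropped = 0
    with open_text(fwd_in) as f1, open_text(rev_in) as f2, \
            open_text(fwd_out, "wt") as o1, open_text(rev_out, "wt") as o2:
        for r1, r2 in zip(fastq_records(f1), fastq_records(f2)):
            if read_id(r1[0]) in keep_ids:
                o1.writelines(r1)
                o2.writelines(r2)
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

The sketch streams both files record by record, so only the ID set is held in memory. It takes the IDs to keep rather than the IDs to drop because, with about 5% of roughly 72 million reads being non-host, the keep list is by far the smaller of the two. The filtered files can then be imported into QIIME 2 as usual.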
Interesting. I'm curious about how it will perform on my data. The host data accounts for about 95% of the reads. Each compressed fq file has about 72 million reads.
We will be very curious to hear your feedback on that method as well! With such a high contamination rate, yours will be a good dataset for "battle testing" this new method. Please let us know what you find.