Host genes removal from a SampleData artifact

Dear Qiimers,

I am new to this amazing tool. My current goal is to remove host genes from my data, as shown in "Evaluating and controlling data quality with q2-quality-control" (QIIME 2 2024.2.0 documentation). My main issue (or rather my misconception) is that the quality-control exclude-seqs action requires --i-query-sequences to be a FeatureData[Sequence] artifact, and I am confused as to how to create this type of artifact in my workflow. I understand from this post, https://forum.qiime2.org/t/create-featuredata-sequence/4913, that the denoise step can produce it, but I feel it makes more sense to remove host genes before denoising. Please let me know if this is an incorrect assessment.

Here is my current working code. I would appreciate it if any expert qiimers could identify where I am going wrong.

#set working directory
import os
script_directory = "/Home/"
os.chdir(script_directory)

#import (my import data is paired so my columns are sample-id, forward-full-filepath, reverse-full-filepath)
!qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path qiime_input/manifest2.txt \
  --output-path qiime_input/paired_end_demux.qza \
  --input-format PairedEndFastqManifestPhred33V2

#perform initial QC
!qiime demux summarize \
  --i-data qiime_input/paired_end_demux.qza \
  --o-visualization qiime_input/paired_end_demux.qzv

#use merge-pairs to merge paired ends
!qiime vsearch merge-pairs \
    --i-demultiplexed-seqs qiime_input/pe_demux.qza \
    --o-merged-sequences qiime_input/merged_pe_demux.qza \
    --o-unmerged-sequences qiime_input/unmerged_pe_demux.qza 

#create host gene sequence file
!qiime tools import \
    --input-path qiime_input/Gallus_gallus.bGalGal1.mat.broiler.GRCg7b.dna.toplevel.fa \
    --output-path qiime_input/ref_sequences.qza \
    --type 'FeatureData[Sequence]'

#remove host genes from qiime seqs
!qiime quality-control exclude-seqs \
    --i-query-sequences qiime_input/merged_pe_demux.qza \
    --i-reference-sequences qiime_input/ref_sequences.qza \
    --p-method blast \
    --p-perc-identity 0.97 \
    --p-perc-query-aligned 0.97 \
    --o-sequence-hits qiime_input/hits97.qza \
    --o-sequence-misses qiime_input/misses97.qza

I consistently get the error below, which makes me think I am getting something very wrong in this pre-processing pipeline. How can I get the FeatureData[Sequence] artifact that is required for host removal in QIIME 2?

There was a problem with the command:

(1/1) Invalid value for '--i-query-sequences': Expected an artifact of at
least type FeatureData[Sequence]. An artifact of type
SampleData[PairedEndSequencesWithQuality] was provided.

Many Thanks for any help,
Krutik

Hi @KpatelBio ,

You imported your data as SampleData[PairedEndSequencesWithQuality], but exclude-seqs only accepts FeatureData[Sequence], so it is the wrong action for filtering these reads, as the error message indicates.
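
If you are ever unsure which type an artifact has, `qiime tools peek` prints its UUID, type, and data format, so you can check it before passing it to an action. A minimal sketch, assuming the file names from your post and an activated QIIME 2 environment (run it from a terminal or a `%%bash` notebook cell):

```shell
# File names are taken from the post above; adjust them to your own paths.
for artifact in qiime_input/paired_end_demux.qza qiime_input/merged_pe_demux.qza; do
    # Only call qiime when it is available in the current environment.
    if command -v qiime >/dev/null 2>&1; then
        # Prints the UUID, semantic type, and data format of the artifact.
        qiime tools peek "$artifact"
    else
        echo "qiime not found: cannot peek at $artifact" >&2
    fi
done
```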

To filter a SampleData[PairedEndSequencesWithQuality] use this action:
https://docs.qiime2.org/2024.2/plugins/available/quality-control/filter-reads/

Note that you will first need to make a bowtie2 index of your host reference sequences using this action:
https://docs.qiime2.org/2024.2/plugins/available/quality-control/bowtie2-build/

Unfortunately we do not have a tutorial yet describing these steps, but you can read the documentation to see the expected inputs and outputs.
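
As a rough starting point, the two steps could look something like the sketch below. It reuses the input file names from your post; the output names `host_bowtie2_index.qza` and `host_filtered_demux.qza` are hypothetical, and every other parameter is left at its default, so please check the documentation pages above before relying on it.

```shell
# Inputs reuse the file names from the question.
DEMUX=qiime_input/paired_end_demux.qza      # SampleData[PairedEndSequencesWithQuality]
HOST_REF=qiime_input/ref_sequences.qza      # FeatureData[Sequence] host genome
# Hypothetical output names, chosen only for this example.
HOST_INDEX=qiime_input/host_bowtie2_index.qza
FILTERED=qiime_input/host_filtered_demux.qza

if command -v qiime >/dev/null 2>&1; then
    # Step 1: build a bowtie2 index from the host reference sequences.
    qiime quality-control bowtie2-build \
        --i-sequences "$HOST_REF" \
        --o-database "$HOST_INDEX"

    # Step 2: align the demultiplexed reads to the index and keep only
    # the reads that do NOT align (--p-exclude-seqs), i.e. drop host reads.
    qiime quality-control filter-reads \
        --i-demultiplexed-sequences "$DEMUX" \
        --i-database "$HOST_INDEX" \
        --p-exclude-seqs \
        --o-filtered-sequences "$FILTERED"
else
    echo "qiime not found: activate your QIIME 2 environment first" >&2
fi
```

Because filter-reads works directly on the demultiplexed reads, this would run right after import and before any merging or denoising, which matches the order you had in mind.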

I have edited the title of this topic to better reflect what you are trying to accomplish.

Good luck!
