Removal of mitochondria and chloroplast - filter sequence vs filter taxonomy

Hi,

I'd like to remove mitochondria and chloroplast from my data. I followed the "Filtering data" tutorial in the QIIME 2 2023.7.0 documentation and noticed that there are two ways to remove them.

One is taxonomy-based filtering, which simply removes the features named mitochondria and chloroplast from the feature data (correct me if I'm wrong). Next, I ran feature-classifier classify-sklearn on the new feature data and got an updated taxonomy table without the mitochondria and chloroplast names. This data is ready for analysis.

The other is filtering sequences, which removes the mitochondria and chloroplast sequences themselves. If I follow the regular steps in the tutorial but use the no-mitochondria-no-chloroplast sequence file from this step, would it result in the same feature and taxonomy tables as the taxonomy-based filtering did?

In other words, are these two methods equivalent? Do they eventually produce the same feature and taxonomy tables, just through different filtering steps? Any suggestions would be helpful.

Thank you for your help!

Hi @Sihan_Bu,

One thing to keep in mind is that you are dealing with two separate files: the feature table and the sequences. It is best to keep these two files in sync with one another.

If you only filter the table, your sequence file will still contain the mitochondrial and chloroplast sequences. That is a problem when building a phylogeny, because the retained sequences can slightly alter the tree. So you need to run both commands (below) to make sure your two files stay in sync:

qiime taxa filter-table \
  --i-table table.qza \
  --i-taxonomy taxonomy.qza \
  --p-exclude mitochondria,chloroplast \
  --o-filtered-table table-no-mitochondria-no-chloroplast.qza

qiime taxa filter-seqs \
  --i-sequences sequences.qza \
  --i-taxonomy taxonomy.qza \
  --p-exclude mitochondria,chloroplast \
  --o-filtered-sequences sequences-no-mitochondria-no-chloroplast.qza

You can also achieve the same result by running the following commands:

qiime taxa filter-table \
  --i-table table.qza \
  --i-taxonomy taxonomy.qza \
  --p-exclude mitochondria,chloroplast \
  --o-filtered-table table-no-mitochondria-no-chloroplast.qza

qiime feature-table filter-seqs \
  --i-data sequences.qza \
  --i-table table-no-mitochondria-no-chloroplast.qza \
  --o-filtered-data sequences-no-mitochondria-no-chloroplast.qza

Note the difference in the second command here. I prefer this approach, as I am performing the explicit filtering once, and then using that filtered table to filter my sequences. This reduces mistakes in typing. I've caught myself not filtering these two files the same way, which can cause conflicts later. This approach minimizes any mistakes as I am only keeping sequences that are contained within my feature-table.
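To make the "only keep sequences that are in the table" idea concrete, here is a toy sketch in plain shell. It is not how QIIME 2 implements filter-seqs, and the file names and feature IDs (kept_ids.txt, seqs.fasta, feat1 and so on) are made up for illustration. It assumes a FASTA file with one sequence line per record.

```shell
# Toy data: hypothetical feature IDs and sequences, not real QIIME 2 output.
workdir=$(mktemp -d)
printf 'feat1\nfeat3\n' > "$workdir/kept_ids.txt"
printf '>feat1\nACGT\n>feat2\nGGCC\n>feat3\nTTAA\n' > "$workdir/seqs.fasta"

# First file: remember every ID that survived table filtering.
# Second file: at each FASTA header, decide whether to print that record.
filtered=$(awk 'NR == FNR { keep[$1] = 1; next }
     /^>/ { printing = (substr($1, 2) in keep) }
     printing' "$workdir/kept_ids.txt" "$workdir/seqs.fasta")

printf '%s\n' "$filtered"
rm -rf "$workdir"
```

Because the list of IDs comes from the already-filtered table, the exclusion terms are typed only once, which is the point of the approach described above.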

You can also go the other way around: filter the sequences first, and then filter the table based on your new sequence file. Hope this helps!

-Cheers!

Hi,

Thank you for the reply!!

I got an error when I was running this:

(qiime2-2023.5) busihan@SihandeMacBook-Pro Rerun_3rd_final % qiime feature-table filter-seqs \ 
  --i-sequences representative_sequences.qza\
  --m-metadata-file table-no-mitochondria-no-chloroplast.qza\
  --o-filtered-sequences sequences-no-mitochondria-no-chloroplast.qza

 (1/3) Missing option '--i-data'.
 (2/3) Missing option '--o-filtered-data'. ("--output-dir" may also be used)
 (3/3) Got unexpected extra argument ( )

The representative_sequences.qza file comes from DADA2:

(qiime2-2023.5) busihan@SihandeMacBook-Pro Rerun_3rd_final % qiime dada2 denoise-paired --i-demultiplexed-seqs per_sample_sequences.qza --p-trim-left-f 13 --p-trim-left-r 13 --p-trunc-len-f 150 --p-trunc-len-r 150 --output-dir denoise_output

Saved FeatureTable[Frequency] to: denoise_output/table.qza
Saved FeatureData[Sequence] to: denoise_output/representative_sequences.qza
Saved SampleData[DADA2Stats] to: denoise_output/denoising_stats.qza

Any suggestions would be very helpful. I appreciate your help.

Hi @Sihan_Bu,

Sorry about that, my fault. I mixed up the options of qiime feature-table filter-seqs with those of qiime taxa filter-seqs.

I fixed my initial post. So, the command is not:

qiime feature-table filter-seqs \
  --i-sequences sequences.qza \
  --m-metadata-file table-no-mitochondria-no-chloroplast.qza \
  --o-filtered-sequences sequences-no-mitochondria-no-chloroplast.qza

but should be:

qiime feature-table filter-seqs \
  --i-data sequences.qza \
  --i-table table-no-mitochondria-no-chloroplast.qza \
  --o-filtered-data sequences-no-mitochondria-no-chloroplast.qza

Hi Mike,

Thank you!

I ran these commands and got the same error:

(qiime2-2023.5) busihan@SihandeMacBook-Pro Rerun_3rd_final % qiime feature-table filter-seqs \ 
  --i-data  representative_sequences.qza \
  --i-table  table-no-mitochondria-no-chloroplast.qza \
  --o-filtered-data sequences-no-mitochondria-no-chloroplast.qza

 (1/3) Missing option '--i-data'.
 (2/3) Missing option '--o-filtered-data'. ("--output-dir" may also be used)
 (3/3) Got unexpected extra argument ( )
(qiime2-2023.5) busihan@SihandeMacBook-Pro Rerun_3rd_final % qiime feature-table filter-seqs \ 
  --i-data /Volumes/Extreme\ SSD/University\ of\ Connecticut_2023/Rerun_3rd_final/representative_sequences.qza \ 
  --i-table  /Volumes/Extreme\ SSD/University\ of\ Connecticut_2023/Rerun_3rd_final/table-no-mitochondria-no-chloroplast.qza \ 
  --o-filtered-data sequences-no-mitochondria-no-chloroplast.qza

 (1/3) Missing option '--i-data'.
 (2/3) Missing option '--o-filtered-data'. ("--output-dir" may also be used)
 (3/3) Got unexpected extra argument ( )

However, when I put all the commands in the same line, the error disappeared.

(qiime2-2023.5) busihan@SihandeMacBook-Pro Rerun_3rd_final % qiime feature-table filter-seqs --i-data representative_sequences.qza --i-table table-no-mitochondria-no-chloroplast.qza --o-filtered-data sequences-no-mitochondria-no-chloroplast.qza

Saved FeatureData[Sequence] to: sequences-no-mitochondria-no-chloroplast.qza

The only difference is whether the command is split across several lines or written on a single line. I think I have run into the same issue before. Could you explain this for me?

Thanks!

No problem @Sihan_Bu,

I think the issue has to do with having spaces in your folder and file names. I highly recommend that you avoid using any kind of spaces or special characters as part of your folder and file names.

One way most shells handle this is to escape each space in a file path with a \ character. However, a \ at the very end of a line is also what lets a command continue onto the next line. As you can see, your shell used \ to escape the spaces in your file path here:

/Volumes/Extreme\ SSD/University\ of\ Connecticut_2023/Rerun_3rd_final

I think this might be causing the issue, when combined with using \ for spreading your commands over multiple lines. I'd recommend replacing the spaces with underscores _.
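One more thing worth checking, offered as an assumption based on the error text rather than something confirmed in this thread: the first failing command above used no paths with spaces, and "Got unexpected extra argument ( )" looks like what a shell produces when a space follows the \ at the end of a line. In that case the \ escapes the space instead of the newline, so the command ends on that line and receives a single space as an extra argument. The sketch below uses a hypothetical helper, show_args, in place of qiime so it can run anywhere. The command lines are built with printf so that the characters after each backslash are visible.

```shell
# Hypothetical helper (not part of QIIME 2): prints each argument in brackets.
show_args() {
  for arg in "$@"; do
    printf '[%s]' "$arg"
  done
  printf '\n'
}

# In a printf format string, \\ is one backslash and \n is a newline.

# Backslash directly followed by a newline: the command continues on the next line.
good=$(eval "$(printf 'show_args --i-data a.qza \\\n  --i-table b.qza')")

# Backslash followed by a space: the space is escaped and becomes its own argument.
bad=$(eval "$(printf 'show_args --i-data a.qza \\ ')")

# A quoted path containing a space (made-up path) works with a proper continuation.
quoted=$(eval "$(printf 'show_args --i-data "/Volumes/Extreme SSD/a.qza" \\\n  --i-table b.qza')")

printf 'good:   %s\nbad:    %s\nquoted: %s\n' "$good" "$bad" "$quoted"
```

The bad case ends with a bracketed single space as its last argument, while the quoted case passes the whole path as one argument. If this is what happened, deleting any space after each trailing \ should let the multi-line form run. Avoiding spaces in folder names remains a sensible habit either way.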

Thank you, Mike!! This is really helpful.
