One is taxonomy-based filtering, which simply removes the names of mitochondria and chloroplast from the feature data (correct me if I am wrong). Next, I ran feature-classifier classify-sklearn on the new feature data and got an updated taxonomy table without mitochondria and chloroplast names. This data is ready for analysis.
The other is filtering sequences, which removes the mitochondria and chloroplast sequences themselves. If I follow the regular steps in the tutorial but use the no-mitochondria-no-chloroplast sequence files from this step, would I end up with the same feature and taxonomy tables as with taxonomy-based filtering?
In other words, are these two methods the same? Do they eventually produce the same feature and taxonomy tables, just through different filtering methods? Any suggestions would be helpful.
One thing to keep in mind is that you are dealing with two separate files: the feature table and the sequences. It is best to keep these two files in sync with one another.
If you only filter the table, your sequence file will still contain the mitochondrial and chloroplast sequences. That can be a problem when building a phylogeny, because retaining those sequences may slightly alter the tree. So you need to run both commands (below) to make sure your two files stay in sync:
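As a rough sketch of what keeping both files in sync can look like, here are two variants written with the `qiime taxa` and `qiime feature-table` plugins. All file names are placeholders, the exclude terms assume your taxonomy labels actually contain `mitochondria` and `chloroplast`, and the function names are made up just to keep the two variants side by side:

```shell
# Placeholder artifact names (assumptions); substitute your own files.
TABLE=table.qza
SEQS=rep-seqs.qza
TAXONOMY=taxonomy.qza

# Variant A: apply the same taxonomy filter to the table and to the sequences.
filter_both_by_taxonomy() {
  qiime taxa filter-table \
    --i-table "$TABLE" \
    --i-taxonomy "$TAXONOMY" \
    --p-exclude mitochondria,chloroplast \
    --o-filtered-table table-no-mito-chloro.qza &&
  qiime taxa filter-seqs \
    --i-sequences "$SEQS" \
    --i-taxonomy "$TAXONOMY" \
    --p-exclude mitochondria,chloroplast \
    --o-filtered-sequences rep-seqs-no-mito-chloro.qza
}

# Variant B: filter the table by taxonomy once, then keep only the
# sequences whose feature IDs are still present in the filtered table.
filter_table_then_seqs() {
  qiime taxa filter-table \
    --i-table "$TABLE" \
    --i-taxonomy "$TAXONOMY" \
    --p-exclude mitochondria,chloroplast \
    --o-filtered-table table-no-mito-chloro.qza &&
  qiime feature-table filter-seqs \
    --i-data "$SEQS" \
    --i-table table-no-mito-chloro.qza \
    --o-filtered-data rep-seqs-no-mito-chloro.qza
}

# Run one of the two, for example:
# filter_table_then_seqs
```

In variant B the exclude terms are typed only once, so the table and the sequences cannot end up filtered with different criteria.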
Note the difference in the second command here. I prefer this approach, as I am performing the explicit filtering once, and then using that filtered table to filter my sequences. This reduces mistakes in typing. I've caught myself not filtering these two files the same way, which can cause conflicts later. This approach minimizes any mistakes as I am only keeping sequences that are contained within my feature-table.
You can also go the other way around: filter the sequences first, and then filter the table based on your new sequence file. Hope this helps!
The only difference is whether the commands are split across several lines or kept on a single line. I think I have run into the same issue before. Could you explain this for me?
I think the issue has to do with having spaces in your folder and file names. I highly recommend that you avoid using any kind of spaces or special characters as part of your folder and file names.
On most systems, a \ character is used to escape any space (whitespace) character in a file path. However, the same character is also what lets us spread a command over multiple lines. As you can see, your system uses \ to escape the spaces in your file path here:
I think this, combined with using \ to spread your commands over multiple lines, might be causing the issue. I'd recommend replacing the spaces with underscores (_).
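To make the two roles of \ concrete, here is a small sketch you can paste into a terminal. The helper `count_args` and the paths `my folder` and `my_folder` are made up for illustration; nothing here touches your actual files:

```shell
# Hypothetical helper: print how many arguments it receives.
count_args() { echo "$#"; }

# 1) Backslash + newline: the shell joins the two lines into one command.
joined=$(echo one \
  two)
echo "$joined"

# 2) Backslash + space: the space stays inside a single argument.
escaped=$(count_args my\ folder/table.qza)

# 3) Unescaped space: the path is split into two separate arguments.
split=$(count_args my folder/table.qza)

# 4) Underscore instead of a space: one argument, nothing to escape.
underscored=$(count_args my_folder/table.qza)

echo "escaped=$escaped split=$split underscored=$underscored"
```

This prints `one two` and then `escaped=1 split=2 underscored=1`. With underscores in the names, the only \ left in a command is the one at the end of each line, which is much harder to get wrong.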