Do I then just use the new filtered-seq.qza file to create new taxa barplots? I would still be using the same unfiltered feature table. Is this okay, or should I also filter the feature table (table.qza)?
So, finally, when we generate taxa barplots after filtering out chloroplasts, we use the filtered table.qza but still the unfiltered taxonomy file. Is this correct? We are only filtering table.qza and seq.qza, not taxonomy.qza, correct?
Then I believe I should run phylogeny, alpha and beta diversity again using the new filtered table.qza and seq.qza files, correct?
There is no need to filter the taxonomy.qza file. It is not a problem when ASVs present in the taxonomy file are missing from the feature table and representative sequences. The problem is when ASVs from the feature table are missing from the rep-seqs or taxonomy files.
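The rule above (table IDs must be covered by rep-seqs and taxonomy, but not the other way round) can be checked with a minimal sketch like the one below. It assumes you have already filtered the table (for example with qiime taxa filter-table) and the rep-seqs (for example with qiime feature-table filter-seqs --i-table), and that you have written the feature IDs of each artifact to plain-text files, one ID per line. The file and function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: every feature ID in the (filtered) table should also be present in
# the rep-seqs and taxonomy ID lists; extra IDs in those lists are harmless.
# Inputs: plain-text files with one feature ID per line (hypothetical names,
# e.g. IDs taken from exported artifacts). Requires bash for <( ).
missing_ids() {
  # IDs found in file $1 but not in file $2
  comm -23 <(sort -u "$1") <(sort -u "$2")
}

check_feature_ids() {
  local table_ids="$1" seq_ids="$2" tax_ids="$3" missing status=0
  missing="$(missing_ids "$table_ids" "$seq_ids")"
  if [ -n "$missing" ]; then
    printf 'In table but not in rep-seqs:\n%s\n' "$missing"
    status=1
  fi
  missing="$(missing_ids "$table_ids" "$tax_ids")"
  if [ -n "$missing" ]; then
    printf 'In table but not in taxonomy:\n%s\n' "$missing"
    status=1
  fi
  return "$status"
}

# Example: asv3 exists only in the taxonomy, which is fine.
demo="$(mktemp -d)"
printf 'asv1\nasv2\n' > "$demo/table-ids.txt"
printf 'asv1\nasv2\n' > "$demo/seq-ids.txt"
printf 'asv1\nasv2\nasv3\n' > "$demo/taxonomy-ids.txt"
check_feature_ids "$demo/table-ids.txt" "$demo/seq-ids.txt" "$demo/taxonomy-ids.txt" \
  && echo "table IDs are covered"
```

The check is deliberately one-directional: an unfiltered taxonomy file with extra entries passes, while a table feature without a sequence or taxonomy is reported.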
No worries, I am learning here as well by helping others.
Also, if these are gut samples, it is entirely possible these could be dietary 'bycatch', that is, cyanobacteria living on surfaces that the host is eating. Possibly contamination too, but hard to say without knowing more about your study system.
I am sampling the guts of wild sea cucumbers and dividing each gut into three compartments: foregut, midgut, and hindgut. I am also sampling seawater and sediment.
It would make sense to me that these cyanobacteria come from feeding on sediment if the foregut were also dominated by them, but that is not the case. So these cyanobacteria are enriched in the midgut, but if they are photosynthetic cyanobacteria, that makes no sense.
How could I assess their metabolic activity? Any suggestions? I cannot find them in the literature in any gut study.
Not necessarily true. The foregut may simply not be the right environment for them to "stay around", so they keep moving through the gut until they reach a region where they can do well and become enriched there (the midgut). Or perhaps it was indeed diet, and you happened to sample at a point in time when the DNA from the Cyanobacteria had moved to the midgut. Of course, these are just a few thoughts. Sounds like you may have an interesting research question blooming.
Remember that identifying very specific taxa at the genus and species level can be difficult with amplicon sequencing data. It could be that there are no good representatives of non-photosynthetic cyanobacteria in the reference database. I've not looked very thoroughly myself, though you can look here. Note that I did not find any Melainabacteria within SILVA, so they may be present under another name or may simply not have been included yet. Thus, it is possible that the assigned taxonomy is simply a spurious result of the query sequence consistently mapping to the closest, but unrelated, taxon in the reference database. Also keep in mind that taxonomy is always changing and new sequence data are constantly being generated.
You can always use the various tools within QIIME 2 and RESCRIPt to fetch any 16S rRNA gene data for Melainabacteria from GenBank and append it to your existing SILVA or other reference database. (This assumes the taxonomy labelling scheme is consistent between the files; GenBank and SILVA differ in some respects, but you can use qiime rescript edit-taxonomy ... to help.) You can then merge the new data into the existing reference database with the qiime feature-table merge-seqs ... and qiime feature-table merge-taxa ... commands, and then train your new classifier.
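Before merging, it may be worth checking whether any feature IDs occur in both reference files, since a merged artifact can hold only one sequence and one taxonomy string per ID. The sketch below assumes both taxonomies have been exported to TSV files with the feature ID in column 1 and a header line; the file names are hypothetical. The QIIME 2 commands for the merge and training steps are shown only as comments and are not run here.

```shell
#!/usr/bin/env bash
# Sketch: list feature IDs present in both reference taxonomy TSVs
# (e.g. exported from the SILVA and GenBank .qza files; names hypothetical).
# Column 1 is the feature ID; the first line is assumed to be a header.
shared_ids() {
  comm -12 <(tail -n +2 "$1" | cut -f1 | sort -u) \
           <(tail -n +2 "$2" | cut -f1 | sort -u)
}

# Example with two tiny made-up files: prints B2.
demo="$(mktemp -d)"
printf 'Feature ID\tTaxon\nA1\td__Bacteria\nB2\td__Archaea\n' > "$demo/silva.tsv"
printf 'Feature ID\tTaxon\nB2\tk__Archaea\nC3\tk__Bacteria\n' > "$demo/ncbi.tsv"
shared_ids "$demo/silva.tsv" "$demo/ncbi.tsv"

# If nothing is shared, merge and train (for reference, not run here):
#   qiime feature-table merge-seqs --i-data silva-seqs.qza --i-data ncbi-seqs.qza \
#     --o-merged-data merged-seqs.qza
#   qiime feature-table merge-taxa --i-data silva-tax.qza --i-data ncbi-tax.qza \
#     --o-merged-data merged-tax.qza
#   qiime feature-classifier fit-classifier-naive-bayes \
#     --i-reference-reads merged-seqs.qza --i-reference-taxonomy merged-tax.qza \
#     --o-classifier merged-classifier.qza
```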
I have a question regarding training my classifier, as I fear it does not fit my sequences.
Here, in this step in RESCRIPt:
qiime rescript filter-seqs-length-by-taxon \
--p-labels Archaea Bacteria Eukaryota \
--p-min-lens 900 1200 1400 \
--o-discarded-seqs silva-138.1-ssu-nr99-seqs-discard.qza
Does --p-min-lens 900 1200 1400 mean I am losing any bacterial reference sequences shorter than 1200 bp for 16S? Could that be limiting my classifier to full-length reference 16S sequences? Would lowering it increase my chances, and the number of reference sequences, by including shorter sequences?
I am trying to understand why all my sequences are classified as bacteria and I have no unclassified results at all.
This is explained in the RESCRIPt documentation here, as well as in the help text, which you can access by running:
qiime rescript filter-seqs-length-by-taxon --help
Potentially. You can play around with the settings. If you do not need to length-filter the sequences differently by taxonomy, you can simply use qiime rescript filter-seqs-length ... instead. This way you can apply a single minimum length to everything, e.g. discarding all sequences shorter than 900 bases, or whatever length you choose.
Likely not, mainly because these labels exist only as organism names that have been used as species labels. It is often quite difficult to obtain species-level classification with amplicon reads.
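To make the difference between the two approaches concrete, here is a minimal sketch of the filtering logic only, not RESCRIPt's implementation. It assumes a hypothetical TSV with columns feature ID, sequence length and taxonomy string, and uses the thresholds from the command in the question (Archaea 900, Bacteria 1200, Eukaryota 1400). In this sketch, sequences matching none of the labels are kept, and label matching is a simple substring test.

```shell
#!/usr/bin/env bash
# Sketch of per-taxon vs. global minimum lengths (illustration only).
# Input: hypothetical TSV, no header: feature ID <TAB> length <TAB> taxonomy.

# Per-label minimums, mirroring --p-labels Archaea Bacteria Eukaryota
# and --p-min-lens 900 1200 1400. Every matching label's minimum is
# enforced; sequences matching no label are kept in this sketch.
keep_by_taxon() {
  awk -F'\t' '
    BEGIN { min["Archaea"] = 900; min["Bacteria"] = 1200; min["Eukaryota"] = 1400 }
    {
      keep = 1
      for (label in min)
        if (index($3, label) > 0 && $2 < min[label]) keep = 0
      if (keep) print $1
    }' "$1"
}

# One global minimum for everything, like a single length cutoff.
keep_by_global_min() {
  awk -F'\t' -v gmin="$2" '$2 >= gmin { print $1 }' "$1"
}

demo="$(mktemp)"
printf 'r1\t1000\td__Bacteria; p__Cyanobacteria\nr2\t1000\td__Archaea\nr3\t1300\td__Bacteria\n' > "$demo"
keep_by_taxon "$demo"            # r1 is dropped: a 1000-bp bacterium is below 1200
keep_by_global_min "$demo" 900   # all three are kept
```

With the per-taxon thresholds, a 1000-bp bacterial reference is discarded even though a 1000-bp archaeal one survives; a single global cutoff of 900 keeps both.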
I now want to add Melainabacteria sequences from NCBI to my classifier, but I am stuck on how to start.
How do I add only Melainabacteria and not all of the 16S references? Or should I add all of the 16S?
Here it adds all of the 16S:
qiime rescript get-ncbi-data \
--p-query '33175[BioProject] OR 33317[BioProject]'
Thanks in Advance
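One way to restrict the download to a single group is to combine the BioProject query with an Entrez organism term ([ORGN]). The sketch below only builds the query string; the helper name is hypothetical, and the organism name is an assumption that should be checked against NCBI Taxonomy first, since the group may be listed under a different name there. These BioProjects may also hold few or no sequences for a largely uncultured group, in which case the BioProject restriction may be too narrow.

```shell
#!/usr/bin/env bash
# Hypothetical helper: restrict the RefSeq 16S BioProject query to one
# organism via the Entrez [ORGN] field. The organism name is an assumption;
# verify the exact name in NCBI Taxonomy before downloading.
build_ncbi_query() {
  local organism="$1"
  printf '(33175[BioProject] OR 33317[BioProject]) AND "%s"[ORGN]\n' "$organism"
}

query="$(build_ncbi_query Melainabacteria)"
echo "$query"

# Then, for example (not run here):
#   qiime rescript get-ncbi-data \
#     --p-query "$query" \
#     --o-sequences ncbi-melaina-seqs.qza \
#     --o-taxonomy ncbi-melaina-taxonomy.qza
```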
Everything looks fine to me. However, be aware that GenBank and SILVA use slightly different taxonomic nomenclature and different prefixes. Prior to merging the taxonomies, you'll likely want to run rescript edit-taxonomy ... as I've described here:
@SoilRotifer yes true, I got different prefixes in my final classification.
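The prefix mismatch can be previewed on an exported taxonomy TSV before editing the .qza itself. This is a minimal plain-text sketch, assuming a header line, the taxonomy string in column 2, and a k__ to d__ mapping for the first rank; check which prefixes your SILVA and NCBI files actually use. Inside QIIME 2, qiime rescript edit-taxonomy performs this kind of replacement on the artifact, as shown in the comment.

```shell
#!/usr/bin/env bash
# Sketch: replace the leading rank prefix of each taxonomy string in an
# exported taxonomy TSV (header assumed; taxonomy string in column 2).
# The k__ -> d__ mapping is an assumption; only the first rank is changed.
# The QIIME 2 route on the artifact would be, e.g. (not run here):
#   qiime rescript edit-taxonomy --i-taxonomy ncbi-taxonomy.qza \
#     --p-search-strings 'k__' --p-replacement-strings 'd__' \
#     --o-edited-taxonomy ncbi-taxonomy-edited.qza
swap_prefix() {
  local tsv="$1" from="$2" to="$3"
  awk -F'\t' -v OFS='\t' -v from="$from" -v to="$to" '
    NR == 1 { print; next }
    index($2, from) == 1 { $2 = to substr($2, length(from) + 1) }
    { print }' "$tsv"
}

demo="$(mktemp -d)"
printf 'Feature ID\tTaxon\nX1\tk__Bacteria; p__Cyanobacteria\n' > "$demo/ncbi.tsv"
swap_prefix "$demo/ncbi.tsv" 'k__' 'd__'
```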
First question: what if I want to keep the short sequences and not filter them? Can I just skip step 2 (qiime rescript filter-seqs-length-by-taxon) and step 3, and use ncbi-refseqs-unfiltered.qza and ncbi-refseqs-taxonomy-unfiltered.qza directly for step 4 (qiime rescript evaluate-fit-classifier)?
Second question: I am now trying to use qiime rescript edit-taxonomy to fix the different-prefixes issue. So I should change the prefix K__ in the NCBI taxonomy file before merging it with the SILVA taxonomy file, like this: