Large portion of Cyanobacteria

I have several Cyanobacteria subclassifications in my taxonomy, listed below, and I am unsure whether I should exclude all of them:
d__Bacteria;p__Cyanobacteria;c__Cyanobacteriia;o__Chloroplast;f__Chloroplast;g__Chloroplast | 19.457%

d__Bacteria;p__Cyanobacteria;c__Cyanobacteriia;o__Cyanobacteriales;f__Xenococcaceae;g__Xenococcus_PCC-7305

d__Bacteria;p__Cyanobacteria;c__Cyanobacteriia;o__Cyanobacteriales;f__Xenococcaceae;g__Pleurocapsa_PCC-7319

d__Bacteria;p__Cyanobacteria;c__Cyanobacteriia;o__Synechococcales;f__Cyanobiaceae;g__Synechococcus_CC9902

d__Bacteria;p__Cyanobacteria;c__Cyanobacteriia;o__Cyanobacteriales;f__Xenococcaceae;g__Chroococcidiopsis_PCC-6712
d__Bacteria;p__Cyanobacteria;c__Cyanobacteriia;o__Cyanobacteriales;f__Cyanobacteriaceae;g__Cyanobacterium_CLg1
d__Bacteria;p__Cyanobacteria;c__Cyanobacteriia;o__Cyanobacteriales;f__Phormidiaceae;g__Trichodesmium_IMS101

So should I exclude all the orders of Cyanobacteria, or just chloroplast?

What confuses me is that their presence in sediment or seawater samples could be normal, but finding them at high abundance in midgut samples does not seem normal to me. How did they get there?

No, you do not need to exclude all of the Cyanobacteria. Just chloroplast and mitochondria, as shown in the tutorial.
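
To illustrate the scope of that exclusion, here is a minimal sketch in plain shell (not QIIME 2 itself) that applies a case-insensitive substring match to the taxonomy strings from the question. In QIIME 2 this kind of filtering is typically done with qiime taxa filter-table and its --p-exclude option; the grep call below only mimics the matching so you can preview which labels would be dropped.

```shell
# Taxonomy strings copied from the question above (abundance omitted).
taxa='d__Bacteria;p__Cyanobacteria;c__Cyanobacteriia;o__Chloroplast;f__Chloroplast;g__Chloroplast
d__Bacteria;p__Cyanobacteria;c__Cyanobacteriia;o__Cyanobacteriales;f__Xenococcaceae;g__Xenococcus_PCC-7305
d__Bacteria;p__Cyanobacteria;c__Cyanobacteriia;o__Cyanobacteriales;f__Xenococcaceae;g__Pleurocapsa_PCC-7319
d__Bacteria;p__Cyanobacteria;c__Cyanobacteriia;o__Synechococcales;f__Cyanobiaceae;g__Synechococcus_CC9902
d__Bacteria;p__Cyanobacteria;c__Cyanobacteriia;o__Cyanobacteriales;f__Xenococcaceae;g__Chroococcidiopsis_PCC-6712
d__Bacteria;p__Cyanobacteria;c__Cyanobacteriia;o__Cyanobacteriales;f__Cyanobacteriaceae;g__Cyanobacterium_CLg1
d__Bacteria;p__Cyanobacteria;c__Cyanobacteriia;o__Cyanobacteriales;f__Phormidiaceae;g__Trichodesmium_IMS101'

# Drop only labels that contain "chloroplast" or "mitochondria" (case-insensitive);
# every other Cyanobacteria lineage is kept.
kept=$(printf '%s\n' "$taxa" | grep -i -v -E 'chloroplast|mitochondria')

printf '%s\n' "$kept"
kept_count=$(printf '%s\n' "$kept" | wc -l | tr -d ' ')
echo "kept $kept_count of 7 labels"
```

Only the o__Chloroplast lineage matches the pattern, so the remaining Cyanobacteria orders stay in the data.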

I know that a lot of chloroplast can be found in human mouth samples just because the subject ate some salad that day. Could they be coming from the food?

Do I then just use the new filtered-seq.qza file to create new taxa barplots? I would still be using the same unfiltered feature table. Is this okay, or should I also filter the feature table.qza?

Filter your feature table first, and then (optionally) filter the rep-seq.qza file based on the filtered feature table.

qiime taxa barplot \
--i-table table.qza \
--i-taxonomy taxonomy.qza \
--m-metadata-file sample-metadata.tsv \
--o-visualization taxa-bar-plots.qzv

So, finally, when we generate taxa barplots after filtering out chloroplast, we use the filtered table.qza but still use the unfiltered taxonomy file. Is this correct, given that we are only filtering table.qza and seq.qza but not taxonomy.qza?

Then, I believe, I should run phylogeny, alpha diversity, and beta diversity again using the new filtered table.qza and seq.qza files, correct?

Thank you so much for your patience with me.

There is no need to filter the taxonomy.qza file. It is not a problem when ASVs present in the taxonomy file are missing from the feature table and representative sequences. The problem arises when ASVs from the feature table are missing from the rep-seqs or taxonomy files.
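
The direction of that rule can be checked with a short sketch. This is plain shell on hypothetical feature IDs (asv1, asv2, asv3 are made-up names, and the ID lists are assumed to have been exported to text files, one ID per line); it is not a QIIME 2 command.

```shell
# Hypothetical ID lists, one feature ID per line.
workdir=$(mktemp -d)
printf '%s\n' asv1 asv3 > "$workdir/table_ids.txt"            # filtered feature table
printf '%s\n' asv1 asv2 asv3 > "$workdir/taxonomy_ids.txt"    # unfiltered taxonomy

# IDs in the table with no taxonomy entry: this is the direction that causes problems.
missing_in_taxonomy=$(awk 'NR==FNR { seen[$1]; next } !($1 in seen)' \
    "$workdir/taxonomy_ids.txt" "$workdir/table_ids.txt")

# IDs in the taxonomy that are no longer in the table: harmless leftovers.
extra_in_taxonomy=$(awk 'NR==FNR { seen[$1]; next } !($1 in seen)' \
    "$workdir/table_ids.txt" "$workdir/taxonomy_ids.txt")

echo "missing in taxonomy: ${missing_in_taxonomy:-none}"
echo "extra in taxonomy: ${extra_in_taxonomy:-none}"
rm -rf "$workdir"
```

Here the taxonomy carries one extra ID (asv2) that was filtered out of the table, which is fine; an ID listed under "missing in taxonomy" would be the case to fix.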

That's correct!

No worries, I am learning here as well by helping others.

Hi again,

After filtering my taxonomy file and removing chloroplast and mitochondria, I still have Cyanobacteria dominating my midgut samples, which makes no sense to me.

atra-filtered-no-mitochondria-no-chloroplast-taxa-bar-plots.qzv (828.2 KB)

How could I proceed with this without ruining my data, please? I also have Cyanobacteria in the seawater and sediment samples, but their presence in the midgut samples is not logical.

Where could that weird classification come from?

Hi @Sabrin,

That is fine, as not all cyanobacteria are photosynthetic. Many are known to live in the gut too. See:

Also, if these are gut samples, it is entirely possible these could be dietary 'bycatch', that is, cyanobacteria living on surfaces that the host is eating. Possibly contamination too, but hard to say without knowing more about your study system.

-Mike

Thank you so much for your quick response.

I am sampling the gut of sea cucumbers collected from nature and dividing it into three compartments: foregut, midgut, and hindgut. I am also sampling seawater and sediment.

It would make sense to me that these Cyanobacteria come from feeding on sediment if the foregut were also dominated by them, but that is not the case. So this means these Cyanobacteria are enriched in the midgut, but if they are photosynthetic Cyanobacteria then it makes no sense.

How could I investigate their metabolic activity? Any suggestions? I cannot find them in the literature in any gut study.

Not necessarily true. The foregut may simply not be the right environment for them to "stay around", so they just keep moving through the gut until they find a region where they can do well and become enriched there (midgut). Or perhaps it was indeed diet, and you happened to sample at a point in time when the DNA from the Cyanobacteria had moved to the midgut. Of course, these are just a few thoughts. Sounds like you may have an interesting research question blooming.

Remember trying to identify very specific taxa at the genus and species level can be difficult with amplicon sequencing data. It could be there are no good representatives of non-photosynthetic cyanobacteria within the reference database. I've not looked very thoroughly myself. Though you can look here. Note, I did not find any Melainabacteria within SILVA. So they may be present under another name or may simply not have been included yet. Thus, it is possible that the assigned taxonomy is simply a spurious result of the query sequence consistently mapping to the closest, but unrelated, taxon within the reference database. Also, keep in mind that taxonomy is always changing, and new sequence data is being generated.

You can always use the various tools within QIIME 2 and RESCRIPt to fetch any 16S rRNA gene data of Melainabacteria from GenBank and append it to your existing SILVA or other reference database. This assumes the taxonomy labelling schema is consistent between the files; GenBank and SILVA differ in some respects, but you can use qiime rescript edit-taxonomy ... to help. Then you can merge the new data into the existing reference database by using the qiime feature-table merge-seqs ... and qiime feature-table merge-taxa ... commands. Then you can train your new classifier.

@SoilRotifer thank you so much for your thoughts.

I have a question about training my classifier, as I fear it does not fit my sequences.

Here in this step in rescript,
qiime rescript filter-seqs-length-by-taxon \
--i-sequences silva-138.1-ssu-nr99-seqs-cleaned.qza \
--i-taxonomy silva-138.1-ssu-nr99-tax.qza \
--p-labels Archaea Bacteria Eukaryota \
--p-min-lens 900 1200 1400 \
--o-filtered-seqs silva-138.1-ssu-nr99-seqs-filt.qza \
--o-discarded-seqs silva-138.1-ssu-nr99-seqs-discard.qza
Does --p-min-lens 900 1200 1400 mean I am losing any reference sequences shorter than 1200 bp for 16S? Could that be limiting my classifier to full-length reference 16S sequences? Would lowering it increase my chances, and the number of reference sequences, by including shorter amplicon reads?

I am trying to understand why all my sequences are classified as bacteria and I have no unclassified results at all.

This is explained in the RESCRIPt documentation here, as well as the help text, which you can access by:

qiime rescript filter-seqs-length-by-taxon --help

Potentially. You can play around with the settings. If you do not need to differentially length-filter the sequences by taxonomy, you can simply use qiime rescript filter-seqs-length ... instead. This way you can apply a single minimum length to everything, such as 900 bases or whatever length you choose.
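
As a rough illustration of what the per-taxon minimum lengths do, here is a sketch in plain shell and awk. It assumes each label in --p-labels is paired positionally with the value in --p-min-lens (please confirm this against the --help text), and the reference IDs and lengths are hypothetical.

```shell
# Hypothetical reference entries: ID, domain label, sequence length (tab-separated).
refs=$(printf '%s\t%s\t%s\n' \
    ref1 Archaea 950 \
    ref2 Bacteria 1100 \
    ref3 Bacteria 1450 \
    ref4 Eukaryota 1300)

# Pair each label with its minimum length, mirroring
# --p-labels Archaea Bacteria Eukaryota and --p-min-lens 900 1200 1400.
kept_ids=$(printf '%s\n' "$refs" | awk -F '\t' '
    BEGIN { min["Archaea"] = 900; min["Bacteria"] = 1200; min["Eukaryota"] = 1400 }
    ($3 + 0) >= min[$2] { kept = kept (kept ? " " : "") $1 }
    END { print kept }')

echo "kept: $kept_ids"
```

Under these assumptions the 1100 bp bacterial entry (ref2) is discarded, which is the effect the question asks about; lowering the Bacteria threshold would retain it.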

@SoilRotifer Hi again. As I still have doubts about my taxonomy, I am assigning taxonomy against the full-length classifier and not the specific V3V4 classifier, just to be sure.

Are these steps correct, especially step 2, to assign directly against silva-138-ssu-nr99-classifier.qza?

  1. qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads /user/asga9989/Taxonomy_output/training-feature-classifiers/silva-138-ssu-nr99-seqs-derep-uniq.qza --i-reference-taxonomy /user/asga9989/Taxonomy_output/training-feature-classifiers/silva-138-ssu-nr99-tax-derep-uniq.qza --o-classifier /user/asga9989/Taxonomy_output/training-feature-classifiers/silva-138-ssu-nr99-classifier.qza

  2. qiime feature-classifier classify-sklearn --i-classifier /user/asga9989/Taxonomy_output/training-feature-classifiers/silva-138-ssu-nr99-classifier.qza --i-reads /user/asga9989/atra-rep-seqs.qza --o-classification /user/asga9989/atra-vs-full-classifier-taxonomy.qza

  3. qiime metadata tabulate --m-input-file /user/asga9989/atra-vs-full-classifier-taxonomy.qza --o-visualization /user/asga9989/atra-vs-full-classifier-taxonomy.qzv

Is there a full-length classifier silva-138-ssu-nr99-classifier.qza available that I could use directly? Just to check that my full classifier file is not corrupted.

Best Regards,
Sabrin

@SoilRotifer I found Melainabacteria within SILVA , here it is

But there are only four entries, which may still not be enough?

You can find a few pre-made files on the Data resources page.

Likely not, mainly because these labels only exist as organism names that have been used as the species labels. It is often quite difficult to obtain species-level classification with amplicon reads.

@SoilRotifer
I now want to add the Melainabacteria from NCBI to my classifier, but I am stuck on how to start.
How do I add only Melainabacteria and not the whole set of 16S references, or should I add the whole 16S set?

Here it adds the whole 16S:
qiime rescript get-ncbi-data \
--p-query '33175[BioProject] OR 33317[BioProject]' \
--o-sequences ncbi-refseqs-unfiltered.qza \
--o-taxonomy ncbi-refseqs-taxonomy-unfiltered.qza
Thanks in Advance

@SoilRotifer,

Could you please check whether my code for merging the NCBI and SILVA reference sequences and taxonomy is correct? In steps 5 and 6, am I right to simply merge the corresponding sequence and taxonomy files from NCBI and SILVA?

qiime rescript get-ncbi-data \
--p-query '33175[BioProject] OR 33317[BioProject]' \
--o-sequences /user/asga9989/NCBI/ncbi-refseqs-unfiltered.qza \
--o-taxonomy /user/asga9989/NCBI/ncbi-refseqs-taxonomy-unfiltered.qza
qiime rescript filter-seqs-length-by-taxon \
--i-sequences /user/asga9989/NCBI/ncbi-refseqs-unfiltered.qza \
--i-taxonomy /user/asga9989/NCBI/ncbi-refseqs-taxonomy-unfiltered.qza \
--p-labels Archaea Bacteria \
--p-min-lens 900 1200 \
--o-filtered-seqs /user/asga9989/NCBI/ncbi-refseqs.qza \
--o-discarded-seqs /user/asga9989/NCBI/ncbi-refseqs-tooshort.qza
qiime rescript filter-taxa \
--i-taxonomy /user/asga9989/NCBI/ncbi-refseqs-taxonomy-unfiltered.qza \
--m-ids-to-keep-file /user/asga9989/NCBI/ncbi-refseqs.qza \
--o-filtered-taxonomy /user/asga9989/NCBI/ncbi-refseqs-taxonomy.qza
qiime rescript evaluate-fit-classifier \
--i-sequences /user/asga9989/NCBI/ncbi-refseqs.qza \
--i-taxonomy /user/asga9989/NCBI/ncbi-refseqs-taxonomy.qza \
--o-classifier /user/asga9989/NCBI/ncbi-refseqs-classifier.qza \
--o-evaluation /user/asga9989/NCBI/ncbi-refseqs-classifier-evaluation.qzv \
--o-observed-taxonomy /user/asga9989/NCBI/ncbi-refseqs-predicted-taxonomy.qza
qiime feature-table merge-seqs \
--i-data /user/asga9989/Taxonomy_output/training-feature-classifiers/silva-138-ssu-nr99-seqs-derep-uniq.qza \
--i-data /user/asga9989/NCBI/ncbi-refseqs.qza \
--o-merged-data /user/asga9989/NCBI/SILVA-NCBI-merged-rep-seqs.qza
qiime feature-table merge-taxa \
--i-data /user/asga9989/Taxonomy_output/training-feature-classifiers/silva-138-ssu-nr99-tax-derep-uniq.qza \
--i-data /user/asga9989/NCBI/ncbi-refseqs-taxonomy.qza \
--o-merged-data /user/asga9989/NCBI/SILVA-NCBI-merged-rep-taxonomy.qza
qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads /user/asga9989/NCBI/SILVA-NCBI-merged-rep-seqs.qza \
--i-reference-taxonomy /user/asga9989/NCBI/SILVA-NCBI-merged-rep-taxonomy.qza \
--o-classifier /user/asga9989/NCBI/SILVA-NCBI-mergerd-classifier.qza
qiime feature-classifier classify-sklearn \
--i-classifier /user/asga9989/NCBI/SILVA-NCBI-mergerd-classifier.qza \
--i-reads /user/asga9989/atra-rep-seqs.qza \
--o-classification /user/asga9989/NCBI/atra-vs-SILVA-NCBI-classifier-taxonomy.qza
qiime metadata tabulate \
--m-input-file /user/asga9989/NCBI/atra-vs-SILVA-NCBI-classifier-taxonomy.qza \
--o-visualization /user/asga9989/NCBI/atra-vs-SILVA-NCBI-classifier-taxonomy.qzv

Hi @Sabrin,

Everything looks fine to me. However, know that GenBank and SILVA use slightly different taxonomic nomenclature and also use different prefixes. Prior to merging the taxonomy, you'll likely want to run qiime rescript edit-taxonomy ... as I've described here:

-Mike

@SoilRotifer yes true, I got different prefixes in my final classification.

First question: if I want to keep the short reads and not filter them, can I just skip step 2 (qiime rescript filter-seqs-length-by-taxon) and step 3, and use ncbi-refseqs-unfiltered.qza and ncbi-refseqs-taxonomy-unfiltered.qza for the downstream analysis in step 4 (qiime rescript evaluate-fit-classifier)?

Second question: I am now trying to use qiime rescript edit-taxonomy to fix the prefix mismatch. Should I change the k__ prefix in the NCBI taxonomy file before merging it with the SILVA taxonomy file, like this?

qiime rescript edit-taxonomy \
--i-taxonomy ncbi-refseqs-taxonomy-unfiltered.qza \
--p-search-strings 'k__Bacteria' \
--p-replacement-strings 'd__Bacteria' \
--o-edited-taxonomy ncbi-refseqs-edited-taxonomy-unfiltered.qza

Finally, is there a simple tutorial I can follow to export my files into R formats for further visualization?

Thanks a lot for your continuous help.

You can do whatever you like to suit your needs. The tutorial just simply offers a series of examples to process your data.

I'd be more generic and simply do the following (just in case you have some Archaea):

qiime rescript edit-taxonomy \
    --i-taxonomy ncbi-refseqs-taxonomy-unfiltered.qza \
    --p-search-strings 'k__' \
    --p-replacement-strings 'd__' \
    --o-edited-taxonomy ncbi-refseqs-edited-taxonomy-unfiltered.qza
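
To see why the shorter search string is more general, here is a small sketch that mimics the replacement with sed on plain text. The two taxonomy labels are hypothetical examples of the k__ prefix style, and sed only stands in for the substring replacement; how edit-taxonomy matches strings should be confirmed in its help text.

```shell
# Hypothetical labels using the k__ prefix.
ncbi_labels='k__Bacteria; p__Cyanobacteria
k__Archaea; p__Euryarchaeota'

# Replacing the bare prefix covers every domain, not only Bacteria.
edited=$(printf '%s\n' "$ncbi_labels" | sed 's/k__/d__/g')

# Replacing the full 'k__Bacteria' string would leave the Archaea line untouched.
bacteria_only=$(printf '%s\n' "$ncbi_labels" | sed 's/k__Bacteria/d__Bacteria/g')

printf '%s\n' "$edited"
printf '%s\n' "$bacteria_only"
```

With the bare prefix both lines end up starting with d__, whereas the Bacteria-only pattern leaves k__Archaea in place and the prefixes would still be mixed after merging.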

Try to keep the questions on topic. Ideally, this question should be a separate post. But if you search this forum you'll be able to find many examples and discussions.