Hello,
I've tried to make a barplot (with qiime taxa barplot) from the results of a clustering analysis (performed with vsearch and with dada2; I tried both with different strategies and thresholds), but the plot seems to include only the clustered sequences.
I'd also like to see the unassigned sequences there.
I've noticed that qiime vsearch cluster-features-closed-reference in particular generates a file containing such sequences (from the --o-unmatched-sequences option), but I don't know how to use it for further analysis, such as creating a barplot that includes them.
For instance, in this picture found on the QIIME 2 website, the part I'd like to retain is shown in grey and labelled 'Unassigned;;'.
How do I make the unmatched sequences appear in the plot?
Also, how do I take them into account if I use the output of the option --o-clustered-sequences for the next chimera-filtering step, which is performed only on clustered sequences as a consequence?
Do I have to merge the files somehow?
Thanks in advance!
I'm not sure. I think when people build barplots of their data, they usually choose not to show low-quality reads, reads that do not pair, and reads that are chimeric. It's an interesting idea to show all your data (good data and errors, unpaired, and chimeras) in a single graph.
I didn't make it, it appeared on Google while looking for examples of barplots including them...
For example, dada2 will remove reads that are errors or noise, but it will keep every remaining read even if it cannot be assigned a taxonomy.
I tried that strategy: I used dada2, then trained classify-sklearn on the 99% OTUs from GreenGenes, and then made the plot with the resulting taxonomy file. But as you can see, no part of the plot is labelled 'Unassigned'. (Yes, those samples have a huge chloroplast DNA contamination and I'm just using them for training for now.)
I think I'm missing something, because I tried vsearch too, with all three possible approaches, and no 'Unassigned' appears in those plots either. Some assignments may be VERY generic (like 'Bacteria'), but none are Unassigned.
It’s an interesting idea to show all your data (good data and errors, unpaired, and chimeras) in a single graph.
I think you are going to have to do this manually. Qiime 2 sort of expects all the low quality, unpaired, and chimeric reads to be removed before graphing, so I think this graph will have to be made outside of Qiime. You could still use the Qiime 2 API in Python or the phyloseq package in R to make a graph like this.
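As a rough sketch of what "manually" could look like, here is a plain-Python example that adds an extra category to per-sample taxon counts and converts everything to proportions, ready to feed into whatever stacked-barplot tool you prefer. The function name, the 'Unmatched' label, and all the numbers are made up for illustration; none of this is part of Qiime 2, and in practice the counts would come from your exported feature table.

```python
def with_extra_category(taxon_counts, extra_counts, label="Unmatched"):
    """Return per-sample relative abundances, including one extra category.

    taxon_counts: {sample: {taxon: read count}} (hypothetical input layout)
    extra_counts: {sample: reads to show under `label`}
    """
    table = {}
    for sample, counts in taxon_counts.items():
        merged = dict(counts)  # copy so the caller's data is not modified
        merged[label] = merged.get(label, 0) + extra_counts.get(sample, 0)
        total = sum(merged.values())
        # Guard against empty samples to avoid dividing by zero.
        table[sample] = {
            taxon: (n / total if total else 0.0) for taxon, n in merged.items()
        }
    return table


# Illustrative numbers only, not real data.
taxon_counts = {"sample-1": {"Bacteria": 600, "Chloroplast": 300}}
extra_counts = {"sample-1": 100}

proportions = with_extra_category(taxon_counts, extra_counts)
for sample, row in proportions.items():
    print(sample, row)
```

Each row sums to 1, so the extra category shows up as its own slice of every bar.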
While we are discussing this, I guess I wanted to talk about the terminology a bit.
Denoising is a separate step from taxonomy assignment/sequence classification.
When a sequence cannot be classified into a taxonomy, it might be called Unclassified. But it's still in the data set and will appear on graphs.
When a sequence is removed by dada2, it's labelled by the reason it was removed ('unpaired, low quality, and chimeric reads'). These generally don't appear in graphs or stat tests at all, as they are considered non-informative noise.
Yes! Once I get the number of clustered reads (after running vsearch, for example) with the summarize table plugin, as well as the number of reads before running it, I can subtract the former from the latter and easily get the information I need.
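To make the subtraction concrete, here is a minimal Python sketch. It assumes the per-sample read counts before and after clustering have already been copied out of the two table summaries into dictionaries; the sample names and numbers are invented for illustration.

```python
def unmatched_counts(before, after):
    """Per-sample reads lost during clustering: count before minus count after."""
    result = {}
    for sample, total in before.items():
        kept = after.get(sample, 0)  # a sample missing afterwards kept no reads
        if kept > total:
            raise ValueError(f"{sample}: more reads after clustering than before")
        result[sample] = total - kept
    return result


# Illustrative numbers only, not real data.
reads_before = {"sample-1": 10000, "sample-2": 8000}
reads_after = {"sample-1": 7500, "sample-2": 8000}

unmatched = unmatched_counts(reads_before, reads_after)
print(unmatched)
```

The resulting per-sample numbers are what I would then want to show as the 'Unmatched' part of each bar.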
Qiime 2 sort of expects all the low quality, unpaired, and chimeric reads to be removed before graphing,
Indeed, so I guess this is also true for anything that can't be clustered at the chosen percent identity: those sequences are called 'Unmatched' and are output separately, for instance by vsearch.
When a sequence is removed by dada2, it’s labelled by the reason it was removed (‘unpaired, low quality, and chimeric reads’). These generally don’t appear in graphs or stat tests at all, as they are considered non-informative noise.
Indeed, I agree with you!
When a sequence cannot be classified into a taxonomy, it might be called Unclassified. But it’s still in the data set and will appear on graphs.
But if this is the case... why can't I see mine after performing a closed-reference clustering with vsearch? I mean, the 'Unmatched' ones.