I have 91 samples in total. I just finished the demultiplexing step and the DADA2 denoising step. I am wondering if I should start excluding the blank samples in the downstream analysis. Do you guys remove the blank samples at all?
My sample IDs are listed below. Anything that begins with DZ or BLANK is a blank sample.
I think the answer to your question depends a lot on what you want to do. I tend to work in a high biomass system, so I don't use my blanks a lot. I tend to discard those and my positive controls very early in the analysis process. However, if there's a reason you think you need to include your blanks, you should keep them. That might be the case if you're working with a low biomass community where you need them as a reference, or something.
Try looking at qiime feature-table filter-samples. I might try the --p-where option if you've got the blank information coded in the metadata. I think qiime diversity filter-distance-matrix probably behaves similarly, so you may want to calculate the distances first, so you have them, and then filter, but that's up to you.
Thanks for pointing me toward the feature-table command. I was checking the results from "denoising-stats.qzv" and downloaded the metadata.tsv. We had two QC samples with 5 replicates each, and their IDs begin with DZ. The rest of the samples, whose IDs begin with BLANK, are blank samples.
I found that 2 blank samples have a large number of sequences. Do you think it's contamination, or might I have done something wrong with the mapping file?
I would consider a few things. First, depending on your extraction method, well-to-well contamination can be an issue you want to think about. If you're working with high biomass samples, I wouldn't be concerned about it. If you're working with low biomass, I would check out some of the discussions around contamination filtering on the forum. You can always check it; I would use a PCoA to see if it clusters with the rest of your samples or if the high sequencing depth is related to something else. If you're not sure, PCoA is often a good way to do a quick visual check for patterns in your data. If you're comfortable filtering it there, I'd recommend filtering. Either way, in your analysis, you'll likely end up filtering it sooner or later…
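Before going to a PCoA, a quick numeric check of which blanks are unusually deep can help. Here is a minimal sketch, assuming you have copied per-sample read counts out of the denoising stats by hand. The sample IDs, the counts, the `flag_deep_blanks` helper and the 10% cutoff are all made up for illustration; none of this is a QIIME 2 function or a recommended threshold.

```python
from statistics import median

def flag_deep_blanks(depths, is_blank, fraction=0.1):
    """Return blanks whose read count exceeds `fraction` of the median real-sample depth."""
    real = [n for sid, n in depths.items() if not is_blank(sid)]
    cutoff = fraction * median(real)
    return sorted(sid for sid, n in depths.items() if is_blank(sid) and n > cutoff)

# Hypothetical read counts, NOT taken from the thread's data.
depths = {
    "11629.LEA001": 40000,
    "11629.LEA002": 52000,
    "11629.LEA003": 47000,
    "11629.BLANK1.12A": 21000,
    "11629.BLANK1.12B": 150,
    "11629.BLANK2": 90,
}

flagged = flag_deep_blanks(depths, is_blank=lambda sid: "BLANK" in sid)
print(flagged)
```

Any blank that gets flagged is a candidate for a closer look (mapping file mix-up, well-to-well contamination), not proof of either.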
While waiting for the results, can you please verify that I am doing things right? I put --p-where "SampleID = '11629.LEA'" \ in order to keep the sample IDs that begin with 11629.LEA.
I'd recommend reading the SQLite WHERE documentation. (I am perpetually reading this when I have to filter.) However, I can tell you it's pretty literal and will look for a sample ID that matches that exactly. I'm not sure if you have a wildcard character; again, check the documentation.
In general, I find it's easier to make a bunch of columns in my metadata for things like filtering. So, I might have a sample type column, a clinical site column, a combo of the two…
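To see the difference between an exact match and a wildcard match in plain SQLite, here is a small sketch using Python's built-in sqlite3 module. The table, the column name and the sample IDs are made up for illustration, and whether --p-where accepts exactly this column name for your metadata file is an assumption you should check against the QIIME 2 docs.

```python
import sqlite3

# Toy metadata table; the column name and sample IDs are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE metadata ("SampleID" TEXT PRIMARY KEY)')
conn.executemany(
    "INSERT INTO metadata VALUES (?)",
    [("11629.LEA001",), ("11629.LEA002",), ("11629.BLANK1.12A",), ("11629.DZ1",)],
)

def select_ids(where):
    """Return the sorted sample IDs matching a SQLite WHERE clause."""
    rows = conn.execute(f"SELECT SampleID FROM metadata WHERE {where}")
    return sorted(r[0] for r in rows)

exact = select_ids("SampleID = '11629.LEA'")       # literal, whole-string match
prefix = select_ids("SampleID LIKE '11629.LEA%'")  # % matches any run of characters

print(exact)
print(prefix)
```

In SQLite, `=` compares the whole string, so the first query matches nothing here, while `LIKE` with `%` matches every ID that starts with the prefix.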
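As a rough sketch of that idea, the snippet below adds a sample_type column to a tab-separated metadata table based on the ID naming described in this thread (BLANK for blanks, DZ for QC). The `sample-id` header, the `subject` column, the example rows and the helper names are assumptions for illustration; a real metadata file may have other columns or a types row that this sketch does not handle.

```python
import csv
import io

def sample_type(sample_id):
    """Classify a sample from its ID; the DZ/BLANK naming follows this thread."""
    name = sample_id.split(".", 1)[-1]  # drop the study prefix, e.g. "11629."
    if name.startswith("BLANK"):
        return "blank"
    if name.startswith("DZ"):
        return "qc"
    return "stool"

def add_sample_type(tsv_text):
    """Return the metadata text with an extra sample_type column appended."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    fields = reader.fieldnames + ["sample_type"]
    writer = csv.DictWriter(out, fieldnames=fields, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["sample_type"] = sample_type(row["sample-id"])
        writer.writerow(row)
    return out.getvalue()

# Hypothetical three-row metadata table.
example = "sample-id\tsubject\n11629.LEA001\tA\n11629.DZ1\tNA\n11629.BLANK1.12A\tNA\n"
annotated = add_sample_type(example)
print(annotated)
```

With a column like this in place, the filter becomes a simple equality test on sample_type instead of pattern matching on IDs.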
Hi ihl2016,
As Justine mentioned, contamination depends on how your samples are being processed. For example, it can happen when people handle different samples/matrices in the same place or in the same laminar flow cabinet, through cross-room contamination, or through well-to-well contamination during extraction/library prep.
Last week I got something close to what you're seeing.
In my case, we didn't run some samples but also didn't exclude them from the SampleSheet. It turns out they got reads and I could assign taxonomy to them, but I believe it was nothing more than noise.
I believe the number of reads from your blank samples is low and may be nothing, but if you suspect contamination, give what Nicholas suggested to me in that thread a try.
Since my previous code was taking a very long time, I ended up using the echo command to create "samples-to-keep.tsv". It kept the 44 stool samples that I am interested in and removed the QC and blank samples. I also successfully created the feature table and the feature table summary based on the 44 samples in "samples-to-keep.tsv". Below is my code:
qiime feature-table filter-samples \
  --i-table table.qza \
  --m-metadata-file samples-to-keep.tsv \
  --o-filtered-table id-filtered-table.qza
# This step created the feature table filtered by sample ID.
qiime feature-table summarize \
  --i-table id-filtered-table.qza \
  --o-visualization id-filtered-table.qzv \
  --m-sample-metadata-file samples-to-keep.tsv
# This step created the ID-filtered sequence and feature table summary.
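If typing the IDs out with echo gets tedious, the keep list can also be generated. This is a minimal sketch, assuming the IDs follow the naming in this thread and that a one-column file with a `sample-id` header is an acceptable ID list for your filtering command. The example IDs and the `write_ids_to_keep` helper are made up for illustration.

```python
import csv
import tempfile
from pathlib import Path

def write_ids_to_keep(sample_ids, path, exclude=("BLANK", "DZ")):
    """Write a one-column TSV of the IDs whose name part starts with none of `exclude`."""
    kept = [s for s in sample_ids
            if not s.split(".", 1)[-1].startswith(tuple(exclude))]
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["sample-id"])       # header line for the ID column
        writer.writerows([s] for s in kept)  # one ID per line
    return kept

# Hypothetical IDs standing in for the 91 real ones.
all_ids = ["11629.LEA001", "11629.DZ1", "11629.BLANK1.12A", "11629.LEA002"]
out_path = Path(tempfile.mkdtemp()) / "samples-to-keep.tsv"
kept = write_ids_to_keep(all_ids, out_path)
print(out_path.read_text())
```

The sketch writes to a temporary directory so it does not overwrite an existing samples-to-keep.tsv; point `out_path` wherever you need the file.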
Now I would like to move forward to the (1) phylogenetic diversity analyses and (2) Alpha and beta diversity analysis. And I just realized that the code to generate phylogenetic diversity is based on rep-seqs.qza with the original total 91 samples (44 stool samples + 10 QC samples + 37 blank samples).
Creating a filtering list is another tidy solution!
You don’t need to go back and prune your tree, the algorithms will do this for you automagically. Essentially, they’ll just ignore the unused leaves in the calculations.
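To illustrate why unused leaves don't matter, here is a toy version of a Faith's PD style calculation: it sums the branch lengths on the paths from the observed tips up to the root, so tips that are never observed contribute nothing. The tree, the branch lengths and the `faith_pd` helper are invented for this sketch; it is not how QIIME 2 implements the metric internally.

```python
# Toy rooted tree stored as child -> (parent, branch length); all values are made up.
full_tree = {
    "asv1": ("n1", 1.0),
    "asv2": ("n1", 2.0),
    "asv3": ("root", 4.0),  # a tip that the filtered table no longer contains
    "n1": ("root", 0.5),
}

# The same tree with the unused tip pruned away by hand.
pruned_tree = {k: v for k, v in full_tree.items() if k != "asv3"}

def faith_pd(observed_tips, tree):
    """Sum each branch once along the paths from the observed tips to the root."""
    seen = set()
    total = 0.0
    for tip in observed_tips:
        node = tip
        while node in tree and node not in seen:
            seen.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total

observed = ["asv1", "asv2"]
pd_full = faith_pd(observed, full_tree)
pd_pruned = faith_pd(observed, pruned_tree)
print(pd_full, pd_pruned)
```

Both calls give the same value because the branch leading to the unobserved tip is never visited.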
‘There are samples not included in the mapping file. Override this error by using the ignore_missing_samples argument. Offending samples: 11629.BLANK1.12A’
Debug info has been saved to /tmp/qiime2-q2cli-err-vggzq5p8.log
I think the error message showed up because "rooted-tree.qza" and "table.qza" had 91 total samples, but the metadata file "samples-to-keep.tsv" only contains the 44 stool samples that I would like to process.
Should I just add “ignore_missing_samples” in the command?
You're on the right track here! The code is set up to deal with samples that are in the metadata and not the feature table, but not the other way around. So, unfortunately, you need to filter your feature table, which you found with the
I guess I was confused earlier. I thought you'd used your new map for filtering. If that's not the case, I really recommend making a column in your full (original) map that's something like a sample type designation and then using that for filtering. Or, I think you can just pass in your new metadata file as a list of sample IDs (but please double check the doc string to be sure) and then work off that file.
You need to filter the feature table, but you don't need to filter your tree.
I think I am still a bit confused about filtering the table. I do have a new map file that contains 44 stool samples + 10 QC samples, and I ran the code below:
And the diversity command gave me the error:
Plugin error from diversity:
‘There are samples not included in the mapping file. Override this error by using the ignore_missing_samples argument. Offending samples: 11629.BLANK1.12A, 11629.BLANK1.12B’
Debug info has been saved to /tmp/qiime2-q2cli-err-_5m2kxmt.log
Do you mean that I need to filter "Lean2-table.qza" so the errors won't show up? Can you please point me toward the page that explains how to filter the table?
The easiest way to handle this is to have a matching mapping file and feature table. Your mapping file can be a superset of your feature table, in that it can have more samples, but all the samples in your feature table must be contained in your mapping file.
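The rule can be checked with plain set arithmetic before running anything expensive. This is a minimal sketch with made-up ID lists; in practice you would read the IDs from your table summary and your mapping file, and the `offending_samples` helper is hypothetical.

```python
def offending_samples(table_ids, metadata_ids):
    """Return the table sample IDs that are missing from the mapping file."""
    return sorted(set(table_ids) - set(metadata_ids))

# Hypothetical IDs: the unfiltered table still contains the blanks.
table_ids = ["11629.LEA001", "11629.LEA002", "11629.BLANK1.12A", "11629.BLANK1.12B"]
# The mapping file may hold extra samples (a superset is fine), but not fewer.
metadata_ids = ["11629.LEA001", "11629.LEA002", "11629.LEA003"]

missing = offending_samples(table_ids, metadata_ids)
print(missing)
```

An empty result means every sample in the table is covered by the mapping file; anything listed needs to be filtered out of the table or added to the mapping file.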
This means that you must generate a new filtered feature table with only the samples you want to analyze. Your errors stem from not using the correct table.
Based on what I'm seeing, the code for your filtering is good.
Here, you're filtering and checking your table. But the issue you run into with the core diversity command here is that you're not using the filtered table:
You're using your Lean2-table.qza in the command. QIIME is smart, but if you want it to work on a filtered set, you gotta hand it that filtered set. So, my suggestion would be to try this and see if it works a bit better.
I am currently re-running the step below in order to generate the demultiplexed artifact with only the 54 samples that I want to analyze.
I am not sure if it will run into issues, because the original file "Lean2-emp-paired-end-sequences.qza" has 91 samples and the mapping file contains only the subset of 54 samples.
If the above step works, then I will be able to run the following steps below and generate the table with the 54 samples that I want to analyze, named "Lean2-54-table.qza".