Remove blank samples

Hi,

I have 91 samples in total. I just finished the demultiplexing and DADA2 denoising steps. I am wondering whether I should start excluding the blank samples from the downstream analysis. Do you guys remove the blank samples at all?

My sample IDs are listed below. Anything beginning with DZ or BLANK is a blank sample.

Thank you!

#SampleID
11629.LEA55883.0001
11629.DZ35298.0100
11629.LEA58314.0001
11629.DZ35315.0004
11629.LEA55786.0001
11629.LEA58480.0001
11629.LEA58059.0001
11629.BLANK1.8A
11629.BLANK1.9A
11629.BLANK1.10A
11629.BLANK1.11A
11629.BLANK1.12A
11629.LEA57973.0001
11629.LEA56287.0001
11629.DZ35315.0008
11629.LEA58305.0001
11629.DZ35298.0074
11629.LEA56535.0001
11629.DZ35298.0012
11629.BLANK1.8B
11629.BLANK1.9B
11629.BLANK1.10B
11629.BLANK1.11B
11629.BLANK1.12B
11629.LEA57349.0001
11629.LEA60532.0001
11629.LEA53658.0001
11629.DZ35315.0016
11629.LEA59792.0001
11629.LEA56321.0001
11629.LEA60889.0001
11629.BLANK1.8C
11629.BLANK1.9C
11629.BLANK1.10C
11629.BLANK1.11C
11629.BLANK1.12C
11629.LEA58637.0001
11629.LEA58590.0001
11629.LEA59268.0001
11629.LEA57170.0001
11629.LEA58821.0001
11629.LEA59995.0001
11629.LEA60531.0001
11629.BLANK1.8D
11629.BLANK1.9D
11629.BLANK1.10D
11629.BLANK1.11D
11629.BLANK1.12D
11629.LEA56607.0001
11629.LEA57821.0001
11629.LEA58329.0001
11629.LEA56648.0001
11629.LEA61150.0001
11629.LEA54710.0001
11629.LEA55652.0001
11629.BLANK1.8E
11629.BLANK1.9E
11629.BLANK1.10E
11629.BLANK1.11E
11629.BLANK1.12E
11629.LEA58151.0001
11629.DZ35298.0071
11629.LEA59766.0001
11629.LEA55247.0001
11629.LEA53980.0001
11629.LEA61203.0001
11629.LEA54694.0001
11629.BLANK1.8F
11629.BLANK1.9F
11629.BLANK1.10F
11629.BLANK1.11F
11629.BLANK1.12F
11629.DZ35315.0006
11629.DZ35298.0033
11629.LEA57160.0001
11629.LEA58775.0001
11629.LEA58227.0001
11629.LEA56273.0001
11629.BLANK1.7G
11629.BLANK1.8G
11629.BLANK1.9G
11629.BLANK1.10G
11629.BLANK1.11G
11629.BLANK1.12G
11629.LEA53909.0001
11629.LEA59230.0001
11629.LEA57406.0001
11629.DZ35315.0018
11629.LEA58461.0001
11629.LEA59114.0001
11629.BLANK1.7H
11629.BLANK1.8H
11629.BLANK1.9H
11629.BLANK1.10H
11629.BLANK1.11H
11629.BLANK1.12H

Hi @ihl216,

I think the answer to your question depends a lot on what you want to do. I tend to work in a high biomass system, so I don’t use my blanks a lot. I tend to discard those and my positive controls very early on in the analysis process. However, if there’s a reason you think you need to include your blanks, you should keep them. For example, you might be working with a low biomass community where you need them as a reference.

Best,
Justine

So if I would like to discard those blanks early on, how can I do it in QIIME2? Is there a command or code that I can refer to? Thank you!

Try looking at qiime feature-table filter-samples. I might try the --p-where option if you’ve got the blank information coded in the metadata. I think qiime diversity filter-distance-matrix probably behaves similarly, so you may want to calculate distance first, so you have it, and then filter, but that’s up to you.
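A rough sketch of what such a filter might look like, assuming the metadata has a column (hypothetically named sample_type here) whose value is blank for every blank well. The file names are placeholders, and the --p-exclude-ids flag should be confirmed against qiime feature-table filter-samples --help for the installed release.

```shell
# Assumption: metadata.tsv has a column named sample_type (hypothetical name)
# holding the value "blank" for every blank sample.
where_clause="[sample_type]='blank'"

# Only attempt the filter when QIIME 2 and the input table are present.
if command -v qiime >/dev/null 2>&1 && [ -f table.qza ]; then
  # --p-exclude-ids drops the samples matched by the WHERE clause
  # instead of keeping them.
  qiime feature-table filter-samples \
    --i-table table.qza \
    --m-metadata-file metadata.tsv \
    --p-where "$where_clause" \
    --p-exclude-ids \
    --o-filtered-table no-blanks-table.qza
else
  echo "qiime or table.qza not found; would filter with: $where_clause"
fi
```

Without --p-exclude-ids the same clause would keep only the blanks, so it is worth checking the filtered table summary afterwards.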

Best,
Justine

Thanks for pointing me toward the feature-table command. I was checking the results from “denoising-stats.qzv” and downloaded the metadata.tsv. We had two QC samples with 5 replicates each, whose IDs begin with DZ. The remaining samples, whose IDs begin with BLANK, are blank samples.

I found that 2 blank samples have a large number of sequences. Do you think it’s contamination, or might I have done something wrong with the mapping file?

|sample-id|input|filtered|denoised|merged|non-chimeric|
|---|---|---|---|---|---|
|11629.BLANK1.12A|10883|9898|9898|9369|9369|
|11629.BLANK1.12B|2572|2352|2352|2110|2110|

sample-id input filtered denoised merged non-chimeric
11629.BLANK1.10A 1 1 1 1 1
11629.BLANK1.10E 55 51 51 38 38
11629.BLANK1.10F 8 7 7 5 5
11629.BLANK1.11A 3 2 2 2 2
11629.BLANK1.11B 19 19 19 12 12
11629.BLANK1.11C 6 5 5 0 0
11629.BLANK1.11D 5 4 4 0 0
11629.BLANK1.11E 18 18 18 16 16
11629.BLANK1.11F 16 13 13 9 9
11629.BLANK1.11G 18 17 17 6 6
11629.BLANK1.11H 23 23 23 12 12
11629.BLANK1.12A 10883 9898 9898 9369 9369
11629.BLANK1.12B 2572 2352 2352 2110 2110
11629.BLANK1.12C 7 4 4 4 4
11629.BLANK1.12D 9 7 7 7 7
11629.BLANK1.12E 18 17 17 13 13
11629.BLANK1.12F 34 31 31 5 5
11629.BLANK1.12G 9 3 3 0 0
11629.BLANK1.12H 6 6 6 0 0
11629.BLANK1.7G 20 20 20 17 17
11629.BLANK1.7H 15 15 15 6 6
11629.BLANK1.8A 4 4 4 4 4
11629.BLANK1.8B 2 1 1 1 1
11629.BLANK1.8C 4 4 4 4 4
11629.BLANK1.8D 4 3 3 0 0
11629.BLANK1.8E 45 33 33 14 14
11629.BLANK1.8F 136 112 112 73 73
11629.BLANK1.8G 13 8 8 0 0
11629.BLANK1.8H 11 8 8 3 3
11629.BLANK1.9A 6 2 2 2 2
11629.BLANK1.9B 8 4 4 0 0
11629.BLANK1.9C 6 2 2 0 0
11629.BLANK1.9D 12 9 9 6 6
11629.BLANK1.9E 23 21 21 9 9
11629.BLANK1.9F 35 29 29 20 20
11629.BLANK1.9G 9 8 8 0 0
11629.BLANK1.9H 4 3 3 0 0
11629.DZ35298.0012 24519 22397 22397 21946 20163
11629.DZ35298.0033 17569 16023 16023 15691 14700
11629.DZ35298.0071 20891 19035 19035 18733 17594
11629.DZ35298.0074 17952 16656 16656 16357 14947
11629.DZ35298.0100 21678 19838 19838 19489 18088
11629.DZ35315.0004 19505 15202 15202 15007 14445
11629.DZ35315.0006 24446 19361 19361 19016 18500
11629.DZ35315.0008 21305 16860 16860 16546 15889
11629.DZ35315.0016 12337 9581 9581 9478 9285
11629.DZ35315.0018 25257 19854 19854 19557 19009
11629.LEA53658.0001 20988 19680 19680 18244 16161
11629.LEA53909.0001 26769 26112 26112 25047 20848
11629.LEA53980.0001 24341 22967 22967 21280 20073
11629.LEA54694.0001 16813 15711 15711 14872 13823
11629.LEA54710.0001 23632 22798 22798 22019 20425
11629.LEA55247.0001 19514 17253 17253 16045 15202
11629.LEA55652.0001 23534 21659 21659 19844 19097
11629.LEA55786.0001 32313 30708 30708 28347 23565
11629.LEA55883.0001 30660 29555 29555 28291 25385
11629.LEA56273.0001 14787 14000 14000 12992 10811
11629.LEA56287.0001 24830 22570 22570 21075 19945
11629.LEA56321.0001 22609 21587 21587 20816 19171
11629.LEA56535.0001 21135 20305 20305 19429 16873
11629.LEA56607.0001 21914 20227 20227 18970 18503
11629.LEA56648.0001 27237 26027 26027 24717 22469
11629.LEA57160.0001 27212 23956 23956 21679 18968
11629.LEA57170.0001 15015 14357 14357 13833 13546
11629.LEA57349.0001 17657 15201 15201 14225 13807

Hi @ihl216,

I would consider a few things. First, depending on your extraction method, well-to-well contamination can be an issue you want to think about. If you’re working with high biomass samples, I wouldn’t be concerned about it. If you’re working with low biomass samples, I would check out some of the discussions around contamination filtering on the forum. You can always check it; I would use a PCoA to see whether the blank clusters with the rest of your samples or whether the high sequencing depth is related to something else. If you’re not sure, PCoA is often a good way to do a quick visual check for patterns in your data. If you’re comfortable filtering it there, I’d recommend filtering. Either way, you’ll likely end up filtering it sooner or later in your analysis…

Best,
Justine

An off-topic reply has been split into a new topic: How to identify low biomass samples?

Please keep replies on-topic in the future.

Hi @jwdebelius,

I am trying the feature-table filter-samples command below. It has been running for over an hour and still has not finished.

qiime feature-table filter-samples \
--i-table Lean2-table.qza \
--m-metadata-file metadata.tsv \
--p-where "SampleID = '11629.LEA'" \
--o-filtered-table clean-filtered-table.qza

While waiting for the results, can you please verify that I am doing things right? I put --p-where "SampleID = '11629.LEA'" in order to keep sample IDs that begin with 11629.LEA.

Thank you!

Hi @ihl216,

I’d recommend reading the SQLite WHERE documentation. (I am perpetually reading this when I have to filter.) However, I can tell you it’s pretty literal and will look for a sample ID that matches that exactly. I’m not sure whether a wildcard character is available; again, check the documentation.

In general, I find it’s easier to make a bunch of columns in my metadata for things like filtering. So, I might have a sample type column, a clinical site column, a combo of the two…
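One possible way to build such a column is to derive it from the sample ID prefixes used in this thread. The sketch below is only an illustration: the column name sample_type, its values, and the file names metadata-demo.tsv and metadata-typed.tsv are made up, and the demo file stands in for the real tab-separated metadata.

```shell
# Stand-in for the real metadata file; only the ID column is needed here.
printf '%s\n' SampleID 11629.LEA55883.0001 11629.DZ35298.0100 \
    11629.BLANK1.12A > metadata-demo.tsv

# Append a sample_type column based on the sample ID prefix.
awk 'BEGIN { FS = OFS = "\t" }
     NR == 1              { print $0, "sample_type"; next }
     $1 ~ /^11629\.LEA/   { print $0, "stool"; next }
     $1 ~ /^11629\.DZ/    { print $0, "qc"; next }
     $1 ~ /^11629\.BLANK/ { print $0, "blank"; next }
                          { print $0, "unknown" }' \
    metadata-demo.tsv > metadata-typed.tsv

cat metadata-typed.tsv
```

With a column like this in place, the filter can match an exact value, for example --p-where "[sample_type]='stool'", rather than relying on a pattern match against the ID. SQLite itself does have a LIKE operator that uses % as a wildcard, but an explicit column is usually easier to audit.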

Best,
Justine

Hi @ihl216,
As Justine mentioned, contamination depends on how your samples are being processed. For example, it can arise when people handle different samples/matrices in the same place or in the same laminar flow cabinet, from cross-room contamination, or from well contamination during extraction/library prep.
Last week I ran into something close to what you're seeing.

In my case, we didn't run some samples but also didn't exclude them from the SampleSheet. It turned out they got reads, and I could assign taxonomy to them. But I believe it was nothing more than noise.
I believe the number of reads from your blank samples is low and may be nothing, but if you suspect contamination, try what Nicholas suggested to me in that thread.

Cheers
Leo

Hi @jwdebelius,

Since my previous command was taking a very long time, I ended up using the echo command to create “samples-to-keep.tsv”. It kept the 44 stool samples that I am interested in and removed the QC and blank samples. I also successfully created the feature table and feature table summary based on the 44 samples in “samples-to-keep.tsv”. Below is my code:

echo SampleID > samples-to-keep.tsv
echo 11629.LEA53658.0001 >> samples-to-keep.tsv
echo 11629.LEA53909.0001 >> samples-to-keep.tsv

qiime feature-table filter-samples \
--i-table table.qza \
--m-metadata-file samples-to-keep.tsv \
--o-filtered-table id-filtered-table.qza
# This step created the feature table filtered by sample ID.

qiime feature-table summarize \
--i-table id-filtered-table.qza \
--o-visualization id-filtered-table.qzv \
--m-sample-metadata-file samples-to-keep.tsv
# This step created the ID-filtered sequence and feature table summary.

Now I would like to move forward to the (1) phylogenetic diversity analyses and (2) Alpha and beta diversity analysis. And I just realized that the code to generate phylogenetic diversity is based on rep-seqs.qza with the original total 91 samples (44 stool samples + 10 QC samples + 37 blank samples).

qiime phylogeny align-to-tree-mafft-fasttree \
--i-sequences rep-seqs.qza \
--o-alignment aligned-rep-seqs.qza \
--o-masked-alignment masked-aligned-rep-seqs.qza \
--o-tree unrooted-tree.qza \
--o-rooted-tree rooted-tree.qza

Should I go back to the very beginning and create the “rep-seqs.qza” with the 44 stool samples? Thank you!

Hi @ihl216,

Creating a filtering list is another tidy solution!
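For a long ID list, the same file could be generated from the full metadata instead of typing one echo per sample. This is a sketch under the assumption that the sample IDs sit in the first tab-separated column; metadata-demo.tsv is a made-up stand-in for the real file.

```shell
# Stand-in for the full metadata file; only the ID column matters here.
printf '%s\n' SampleID 11629.LEA53658.0001 11629.DZ35298.0012 \
    11629.BLANK1.12A 11629.LEA53909.0001 > metadata-demo.tsv

# Write the header, then every ID that starts with 11629.LEA.
{
  echo SampleID
  cut -f1 metadata-demo.tsv | grep '^11629\.LEA'
} > samples-to-keep.tsv

cat samples-to-keep.tsv
```

The resulting file has the same shape as the echo-built one, so it can be passed to --m-metadata-file in the same way.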

You don’t need to go back and prune your tree, the algorithms will do this for you automagically. Essentially, they’ll just ignore the unused leaves in the calculations.

Best,
Justine

Hi @jwdebelius,

So I ran the alpha diversity code below:

qiime diversity core-metrics-phylogenetic \
--i-phylogeny rooted-tree.qza \
--i-table table.qza \
--p-sampling-depth 9285 \
--m-metadata-file samples-to-keep.tsv \
--output-dir core-metrics-results

Then I got the error message here:

Plugin error from diversity:

‘There are samples not included in the mapping file. Override this error by using the ignore_missing_samples argument. Offending samples: 11629.BLANK1.12A’
Debug info has been saved to /tmp/qiime2-q2cli-err-vggzq5p8.log

I think the error message showed up because “rooted-tree.qza” and “table.qza” have all 91 samples, but the metadata file “samples-to-keep.tsv” only contains the 44 stool samples that I would like to process.

Should I just add “ignore_missing_samples” to the command?

I added -- ignore_missing_samples \ to my command, but it did not work.

qiime diversity core-metrics-phylogenetic \
--i-phylogeny rooted-tree.qza \
--i-table table.qza \
--p-sampling-depth 9285 \
-- ignore_missing_samples \
--m-metadata-file samples-to-keep.tsv \
--output-dir core-metrics-results

Hi @ihl216,

You're on the right track here! The code is set up to deal with samples that are in the metadata and not the feature table, but not the other way around. So, unfortunately, you need to filter your feature table, which you've already found with the feature-table filter-samples command.

I guess I was confused earlier. I thought you'd used your new map for filtering. If that's not the case, I really recommend making a column in your full (original) map that's something like a sample type designation and then using that for filtering. Or, I think you can just pass in your new metadata file as a list of sample IDs (but please double check the doc string to be sure) and then work off that file.

You need to filter the feature table, but you don't need to filter your tree.

Hope that helps clarify the question.

Best,
Justine

Hi @jwdebelius,

I think I am still a bit confused about filtering the table. I do have a new map file that contains 44 stool samples + 10 QC samples, and I ran the code below:

qiime feature-table filter-samples \
--i-table Lean2-table.qza \
--m-metadata-file new_map54.tsv \
--o-filtered-table type-filtered-table.qza

qiime feature-table summarize \
--i-table type-filtered-table.qza \
--o-visualization type-filtered-table.qzv \
--m-sample-metadata-file new_map54.tsv

qiime diversity core-metrics-phylogenetic \
--i-phylogeny Lean2-rooted-tree.qza \
--i-table Lean2-table.qza \
--p-sampling-depth 1109 \
--m-metadata-file new_map54.tsv \
--output-dir type-core-metrics-results

And the diversity command gave me the error:
Plugin error from diversity:

‘There are samples not included in the mapping file. Override this error by using the ignore_missing_samples argument. Offending samples: 11629.BLANK1.12A, 11629.BLANK1.12B’

Debug info has been saved to /tmp/qiime2-q2cli-err-_5m2kxmt.log

Do you mean that I need to filter “Lean2-table.qza” so the errors won’t show up? Can you please point me toward the page that explains how to filter the table?

Thanks so much again!

Hi @ihl216,

The easiest way to handle this is to have a matching mapping file and feature table. Your mapping file can be a superset of your feature table, in that it can have more samples, but all the samples in your feature table must be contained in your mapping file.
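A quick way to test that superset condition before running the diversity command is to compare the two ID lists. The sketch below assumes two hypothetical plain-text files with one sample ID per line: table-ids.txt (IDs present in the feature table, for example copied from the table summary) and metadata-ids.txt (IDs in the mapping file). The demo contents are taken from IDs mentioned in this thread.

```shell
# Hypothetical inputs, one sample ID per line.
printf '%s\n' 11629.LEA53658.0001 11629.DZ35298.0012 \
    11629.BLANK1.12A 11629.BLANK1.12B > table-ids.txt
printf '%s\n' 11629.LEA53658.0001 11629.DZ35298.0012 > metadata-ids.txt

# comm expects sorted input.
sort table-ids.txt > table-ids.sorted
sort metadata-ids.txt > metadata-ids.sorted

# comm -23 prints lines found only in the first file: samples that are in
# the table but not covered by the mapping file.
comm -23 table-ids.sorted metadata-ids.sorted > missing-from-metadata.txt
cat missing-from-metadata.txt
```

An empty missing-from-metadata.txt would mean the mapping file covers every sample in the table; any IDs listed are the ones the diversity command would report as offending samples.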

This means that you must generate a new filtered feature table with only the samples you want to analyze. Your errors stem from not using the correct table.

Based on what I'm seeing, the code for your filtering is good.

Here, you're filtering and checking your table. But the issue you run into with the core diversity command is that you're not using the filtered table.

You're using your Lean2-table.qza in the command; QIIME is smart, but if you want it to work on a filtered set, you gotta hand it that filtered set. So, my suggestion would be to try this, and see if it works a bit better.

qiime diversity core-metrics-phylogenetic \
--i-phylogeny Lean2-rooted-tree.qza \
--i-table type-filtered-table.qza \
--p-sampling-depth 1109 \
--m-metadata-file new_map54.tsv \
--output-dir type-core-metrics-results

Best,
Justine

Hi @jwdebelius,

I am currently re-running the step below in order to generate the demultiplexed artifact with only the 54 samples that I want to analyze.

I am not sure whether it will run into an issue, because the original file “Lean2-emp-paired-end-sequences.qza” has 91 samples and the mapping file contains only the subset of 54 samples.

qiime demux emp-paired \
--m-barcodes-file new_map54.tsv \
--m-barcodes-column BarcodeSequence \
--i-seqs Lean2-emp-paired-end-sequences.qza \
--o-per-sample-sequences Lean2-54-demux.qza

If the above step works, then I will be able to run the following steps and generate the table with the 54 samples that I want to analyze, named “Lean2-54-table.qza”.

qiime demux summarize \
--i-data Lean2-54-demux.qza \
--o-visualization Lean2-54-demux.qzv

qiime dada2 denoise-paired \
--i-demultiplexed-seqs Lean2-54-demux.qza \
--p-trim-left-f 13 \
--p-trim-left-r 13 \
--p-trunc-len-f 150 \
--p-trunc-len-r 150 \
--o-table Lean2-54-table.qza \
--o-representative-sequences Lean2-54-rep-seqs.qza \
--o-denoising-stats Lean2-54-denoising-stats.qza

At the moment I am still waiting for QIIME2 to finish.

Two hours have passed, but I still have not gotten the demux.qza. Is it normal for this step to take this long?