Remove blank samples

ihl216 · April 3, 2019, 3:42pm

Hi,

I have 91 samples in total. I just finished the demultiplexing step and the DADA2 denoising step. I am wondering if I should start excluding the blank samples in the downstream analysis. Do you guys remove the blank samples at all?

My sample IDs are listed below. Anything beginning with DZ or BLANK is a blank sample.

Thank you!

#SampleID
11629.LEA55883.0001
11629.DZ35298.0100
11629.LEA58314.0001
11629.DZ35315.0004
11629.LEA55786.0001
11629.LEA58480.0001
11629.LEA58059.0001
11629.BLANK1.8A
11629.BLANK1.9A
11629.BLANK1.10A
11629.BLANK1.11A
11629.BLANK1.12A
11629.LEA57973.0001
11629.LEA56287.0001
11629.DZ35315.0008
11629.LEA58305.0001
11629.DZ35298.0074
11629.LEA56535.0001
11629.DZ35298.0012
11629.BLANK1.8B
11629.BLANK1.9B
11629.BLANK1.10B
11629.BLANK1.11B
11629.BLANK1.12B
11629.LEA57349.0001
11629.LEA60532.0001
11629.LEA53658.0001
11629.DZ35315.0016
11629.LEA59792.0001
11629.LEA56321.0001
11629.LEA60889.0001
11629.BLANK1.8C
11629.BLANK1.9C
11629.BLANK1.10C
11629.BLANK1.11C
11629.BLANK1.12C
11629.LEA58637.0001
11629.LEA58590.0001
11629.LEA59268.0001
11629.LEA57170.0001
11629.LEA58821.0001
11629.LEA59995.0001
11629.LEA60531.0001
11629.BLANK1.8D
11629.BLANK1.9D
11629.BLANK1.10D
11629.BLANK1.11D
11629.BLANK1.12D
11629.LEA56607.0001
11629.LEA57821.0001
11629.LEA58329.0001
11629.LEA56648.0001
11629.LEA61150.0001
11629.LEA54710.0001
11629.LEA55652.0001
11629.BLANK1.8E
11629.BLANK1.9E
11629.BLANK1.10E
11629.BLANK1.11E
11629.BLANK1.12E
11629.LEA58151.0001
11629.DZ35298.0071
11629.LEA59766.0001
11629.LEA55247.0001
11629.LEA53980.0001
11629.LEA61203.0001
11629.LEA54694.0001
11629.BLANK1.8F
11629.BLANK1.9F
11629.BLANK1.10F
11629.BLANK1.11F
11629.BLANK1.12F
11629.DZ35315.0006
11629.DZ35298.0033
11629.LEA57160.0001
11629.LEA58775.0001
11629.LEA58227.0001
11629.LEA56273.0001
11629.BLANK1.7G
11629.BLANK1.8G
11629.BLANK1.9G
11629.BLANK1.10G
11629.BLANK1.11G
11629.BLANK1.12G
11629.LEA53909.0001
11629.LEA59230.0001
11629.LEA57406.0001
11629.DZ35315.0018
11629.LEA58461.0001
11629.LEA59114.0001
11629.BLANK1.7H
11629.BLANK1.8H
11629.BLANK1.9H
11629.BLANK1.10H
11629.BLANK1.11H
11629.BLANK1.12H

jwdebelius · April 3, 2019, 4:26pm

Hi @ihl216,

I think the answer to your question depends a lot on what you want to do. I tend to work in a high biomass system, so I don't use my blanks a lot. I tend to discard those and my positive controls very early on in the analysis process. However, if there's a reason you think you need to include your blanks, you should keep them. For example, you might be working with a low biomass community where you need them as a reference.

Best,
Justine

ihl216 · April 3, 2019, 4:29pm

So if I would like to discard those blanks early on, how can I do it in QIIME2? Is there a command or code that I can refer to? Thank you!

jwdebelius · April 3, 2019, 4:31pm

Try looking at qiime feature-table filter-samples. I might try the --p-where option if you've got the blank information coded in the metadata. I think qiime diversity filter-distance-matrix probably behaves similarly, so you may want to calculate the distances first, so you have them, and then filter, but that's up to you.

Best,
Justine

ihl216 · April 3, 2019, 5:49pm

Thanks for pointing me toward the feature-table command. I was checking the results from “denoising-stats.qzv” and downloaded the metadata.tsv. We had two QC samples with 5 replicates each, whose IDs begin with DZ. The rest of the samples, whose IDs begin with BLANK, are blank samples.

I found that 2 blank samples have a large number of sequences. Do you think it's contamination, or might I have done something wrong with the mapping file?

|11629.BLANK1.12A|10883|9898|9898|9369|9369|
|11629.BLANK1.12B|2572|2352|2352|2110|2110|

sample-id	input	filtered	denoised	merged	non-chimeric
11629.BLANK1.10A	1	1	1	1	1
11629.BLANK1.10E	55	51	51	38	38
11629.BLANK1.10F	8	7	7	5	5
11629.BLANK1.11A	3	2	2	2	2
11629.BLANK1.11B	19	19	19	12	12
11629.BLANK1.11C	6	5	5	0	0
11629.BLANK1.11D	5	4	4	0	0
11629.BLANK1.11E	18	18	18	16	16
11629.BLANK1.11F	16	13	13	9	9
11629.BLANK1.11G	18	17	17	6	6
11629.BLANK1.11H	23	23	23	12	12
11629.BLANK1.12A	10883	9898	9898	9369	9369
11629.BLANK1.12B	2572	2352	2352	2110	2110
11629.BLANK1.12C	7	4	4	4	4
11629.BLANK1.12D	9	7	7	7	7
11629.BLANK1.12E	18	17	17	13	13
11629.BLANK1.12F	34	31	31	5	5
11629.BLANK1.12G	9	3	3	0	0
11629.BLANK1.12H	6	6	6	0	0
11629.BLANK1.7G	20	20	20	17	17
11629.BLANK1.7H	15	15	15	6	6
11629.BLANK1.8A	4	4	4	4	4
11629.BLANK1.8B	2	1	1	1	1
11629.BLANK1.8C	4	4	4	4	4
11629.BLANK1.8D	4	3	3	0	0
11629.BLANK1.8E	45	33	33	14	14
11629.BLANK1.8F	136	112	112	73	73
11629.BLANK1.8G	13	8	8	0	0
11629.BLANK1.8H	11	8	8	3	3
11629.BLANK1.9A	6	2	2	2	2
11629.BLANK1.9B	8	4	4	0	0
11629.BLANK1.9C	6	2	2	0	0
11629.BLANK1.9D	12	9	9	6	6
11629.BLANK1.9E	23	21	21	9	9
11629.BLANK1.9F	35	29	29	20	20
11629.BLANK1.9G	9	8	8	0	0
11629.BLANK1.9H	4	3	3	0	0
11629.DZ35298.0012	24519	22397	22397	21946	20163
11629.DZ35298.0033	17569	16023	16023	15691	14700
11629.DZ35298.0071	20891	19035	19035	18733	17594
11629.DZ35298.0074	17952	16656	16656	16357	14947
11629.DZ35298.0100	21678	19838	19838	19489	18088
11629.DZ35315.0004	19505	15202	15202	15007	14445
11629.DZ35315.0006	24446	19361	19361	19016	18500
11629.DZ35315.0008	21305	16860	16860	16546	15889
11629.DZ35315.0016	12337	9581	9581	9478	9285
11629.DZ35315.0018	25257	19854	19854	19557	19009
11629.LEA53658.0001	20988	19680	19680	18244	16161
11629.LEA53909.0001	26769	26112	26112	25047	20848
11629.LEA53980.0001	24341	22967	22967	21280	20073
11629.LEA54694.0001	16813	15711	15711	14872	13823
11629.LEA54710.0001	23632	22798	22798	22019	20425
11629.LEA55247.0001	19514	17253	17253	16045	15202
11629.LEA55652.0001	23534	21659	21659	19844	19097
11629.LEA55786.0001	32313	30708	30708	28347	23565
11629.LEA55883.0001	30660	29555	29555	28291	25385
11629.LEA56273.0001	14787	14000	14000	12992	10811
11629.LEA56287.0001	24830	22570	22570	21075	19945
11629.LEA56321.0001	22609	21587	21587	20816	19171
11629.LEA56535.0001	21135	20305	20305	19429	16873
11629.LEA56607.0001	21914	20227	20227	18970	18503
11629.LEA56648.0001	27237	26027	26027	24717	22469
11629.LEA57160.0001	27212	23956	23956	21679	18968
11629.LEA57170.0001	15015	14357	14357	13833	13546
11629.LEA57349.0001	17657	15201	15201	14225	13807

ihl216 · April 3, 2019, 9:58pm

Hi,

I was looking at the results from “denoising-stats.qzv” and downloaded the metadata.tsv. We had two QC samples with 5 replicates each, whose IDs begin with DZ. The rest of the samples, whose IDs begin with BLANK, are blank samples.

I found that 2 blank samples have a large number of sequences. Do you think it's contamination, or might I have done something wrong with the mapping file?

|11629.BLANK1.12A|10883|9898|9898|9369|9369|
|11629.BLANK1.12B|2572|2352|2352|2110|2110|

jwdebelius · April 4, 2019, 9:02am

Hi @ihl216,

I would consider a few things. First, depending on your extraction method, well-to-well contamination can be an issue you want to think about. If you're working with high biomass samples, I wouldn't be concerned about it. If you're working with low biomass samples, I would check out some of the discussions around contamination filtering on the forum. You can always check it; I would use a PCoA to see if it clusters with the rest of your samples or if the high sequencing depth is related to something else. If you're not sure, PCoA is often a good way to do a quick visual check for patterns in your data. If you're comfortable filtering it there, I'd recommend filtering. However, in your analysis, you'll likely end up filtering it sooner or later…
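Before deciding, it can help to list which blanks actually carry a lot of reads. Here is a minimal sketch using plain awk on the denoising stats. It assumes the stats are tab-separated with the non-chimeric count in column 6, as in the table posted above; the file names and the threshold of 1000 reads are made up for illustration, not a recommended cutoff.

```shell
# A few rows copied from the stats table in this thread (tab-separated).
printf 'sample-id\tinput\tfiltered\tdenoised\tmerged\tnon-chimeric\n' > stats-demo.tsv
printf '11629.BLANK1.11H\t23\t23\t23\t12\t12\n' >> stats-demo.tsv
printf '11629.BLANK1.12A\t10883\t9898\t9898\t9369\t9369\n' >> stats-demo.tsv
printf '11629.BLANK1.12B\t2572\t2352\t2352\t2110\t2110\n' >> stats-demo.tsv
printf '11629.LEA57170.0001\t15015\t14357\t14357\t13833\t13546\n' >> stats-demo.tsv

# Keep only blanks whose non-chimeric count (column 6) exceeds the threshold.
# "min" is an arbitrary, hypothetical cutoff; the header row is skipped.
awk -F '\t' -v min=1000 \
  'NR > 1 && $1 ~ /\.BLANK/ && $6 > min { print $1, $6 }' \
  stats-demo.tsv > high-blanks.txt

cat high-blanks.txt
```

On the full table this should point at the same two wells (12A and 12B) that stand out by eye, which are the ones worth checking on the PCoA.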

Best,
Justine

thermokarst · April 4, 2019, 4:25pm

An off-topic reply has been split into a new topic: How to identify low biomass samples?

Please keep replies on-topic in the future.

ihl216 · April 4, 2019, 8:09pm

Hi @jwdebelius,

I am trying the feature-table filter-samples command below. It has been running for over an hour and has not finished.

qiime feature-table filter-samples \
--i-table Lean2-table.qza \
--m-metadata-file metadata.tsv \
--p-where "SampleID = '11629.LEA'" \
--o-filtered-table clean-filtered-table.qza

While waiting for the results, can you please verify that I am doing things right? I put --p-where "SampleID = '11629.LEA'" \ in order to keep sample IDs that begin with 11629.LEA.

Thank you!

jwdebelius · April 5, 2019, 7:59am

Hi @ihl216,

I'd recommend reading the SQLite WHERE documentation. (I am perpetually reading this when I have to filter.) However, I can tell you it's pretty literal and will look for a sample ID that matches that exactly. I'm not sure if you have a wildcard character; again, check the documentation.

In general, I find it's easier to make a bunch of columns in my metadata for things like filtering. So, I might have a sample type column, a clinical site column, a combo of the two…
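As a rough sketch of the "extra column" idea, the snippet below adds a sample type column derived from the ID prefixes used in this thread. The column name sample_type, the labels stool/qc/blank, and the file names are all hypothetical. It assumes a tab-separated metadata file whose first column holds the sample ID, and it fills in "categorical" on a #q2:types row if one is present.

```shell
# Toy metadata so the sketch is self-contained (IDs copied from this thread).
printf '#SampleID\n#q2:types\n11629.LEA55883.0001\n11629.DZ35298.0100\n11629.BLANK1.8A\n' > metadata-demo.tsv

# Append a sample_type column based on the ID prefix.
awk 'BEGIN { FS = OFS = "\t" }
     NR == 1           { print $0, "sample_type"; next }   # header row
     $1 == "#q2:types" { print $0, "categorical"; next }   # optional types row
     $1 ~ /\.BLANK/    { print $0, "blank"; next }
     $1 ~ /\.DZ/       { print $0, "qc"; next }
                       { print $0, "stool" }' metadata-demo.tsv > metadata-typed.tsv

cat metadata-typed.tsv

# With a real metadata file and table, the filter would then look like this
# (left commented out because it needs QIIME 2 and an actual .qza table):
# qiime feature-table filter-samples \
#   --i-table Lean2-table.qza \
#   --m-metadata-file metadata-typed.tsv \
#   --p-where "[sample_type]='stool'" \
#   --o-filtered-table stool-table.qza
```

Because the comparison in --p-where is an exact match, filtering on a short label like this avoids having to match part of a sample ID.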

Best,
Justine

lca123 · April 5, 2019, 1:42pm

Hi ihl216,
As Justine mentioned, contamination depends on how your samples are being processed. For example, it can happen when people deal with different samples/matrices in the same place or in the same laminar flow cabinet, through cross-room contamination, or through well contamination during extraction/library prep.
Last week I got something close to what you're seeing.

In my case, we didn't run some samples but didn't exclude them from the SampleSheet either. It turns out they got reads, and I could assign taxonomy to them. But I believe it was nothing more than noise.
I believe the number of reads from your blank samples is low and may be nothing, but if you suspect contamination, give what Nicholas suggested to me in that thread a try.

Cheers
Leo

ihl216 · April 8, 2019, 3:17pm

Hi @jwdebelius,

Since my previous code was taking a very long time, I ended up using the echo command to create “samples-to-keep.tsv”, and it did keep the 44 stool samples that I am interested in and remove the QC and blank samples. I also successfully created the feature table and feature table summary based on the 44 samples in “samples-to-keep.tsv”. Below is my code:

echo SampleID > samples-to-keep.tsv
echo 11629.LEA53658.0001 >> samples-to-keep.tsv
echo 11629.LEA53909.0001 >> samples-to-keep.tsv

qiime feature-table filter-samples \
--i-table table.qza \
--m-metadata-file samples-to-keep.tsv \
--o-filtered-table id-filtered-table.qza
# This step created the feature table filtered by sample ID

qiime feature-table summarize \
--i-table id-filtered-table.qza \
--o-visualization id-filtered-table.qzv \
--m-sample-metadata-file samples-to-keep.tsv
# This step created the ID-filtered sequence and feature table summary

Now I would like to move forward to the (1) phylogenetic diversity analyses and (2) Alpha and beta diversity analysis. And I just realized that the code to generate phylogenetic diversity is based on rep-seqs.qza with the original total 91 samples (44 stool samples + 10 QC samples + 37 blank samples).

qiime phylogeny align-to-tree-mafft-fasttree \
--i-sequences rep-seqs.qza \
--o-alignment aligned-rep-seqs.qza \
--o-masked-alignment masked-aligned-rep-seqs.qza \
--o-tree unrooted-tree.qza \
--o-rooted-tree rooted-tree.qza

Should I go back to the very beginning and create the “rep-seqs.qza” with the 44 stool samples? Thank you!

jwdebelius · April 8, 2019, 3:32pm

Hi @ihl216,

Creating a filtering list is another tidy solution!
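For a longer list, the same file can be built without typing one echo per sample. This is only a sketch: it assumes a tab-separated metadata file with the sample ID in the first column, and metadata-demo.tsv is a stand-in file created here so the example runs on its own.

```shell
# Toy metadata so the sketch is self-contained (IDs copied from this thread).
printf '#SampleID\n11629.LEA53658.0001\n11629.DZ35298.0012\n11629.BLANK1.8A\n11629.LEA53909.0001\n' > metadata-demo.tsv

# Header first, then every ID that starts with 11629.LEA.
echo SampleID > samples-to-keep.tsv
awk -F '\t' '$1 ~ /^11629\.LEA/ { print $1 }' metadata-demo.tsv >> samples-to-keep.tsv

cat samples-to-keep.tsv
```

The resulting samples-to-keep.tsv has the same shape as the echo-built one, so it can be passed to --m-metadata-file in the same way.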

You don’t need to go back and prune your tree, the algorithms will do this for you automagically. Essentially, they’ll just ignore the unused leaves in the calculations.

Best,
Justine

ihl216 · April 8, 2019, 4:26pm

Hi @jwdebelius,

So I ran the alpha diversity code below:

qiime diversity core-metrics-phylogenetic \
--i-phylogeny rooted-tree.qza \
--i-table table.qza \
--p-sampling-depth 9285 \
--m-metadata-file samples-to-keep.tsv \
--output-dir core-metrics-results

Then I got the error message here:

Plugin error from diversity:

‘There are samples not included in the mapping file. Override this error by using the ignore_missing_samples argument. Offending samples: 11629.BLANK1.12A’
Debug info has been saved to /tmp/qiime2-q2cli-err-vggzq5p8.log

I think the error message showed up because “rooted-tree.qza” and “table.qza” had 91 total samples, but the metadata file “samples-to-keep.tsv” only contains the 44 stool samples that I would like to process.

Should I just add “ignore_missing_samples” in the command?

ihl216 · April 8, 2019, 4:38pm

I added -- ignore_missing_samples \ to my command, but it did not work.

qiime diversity core-metrics-phylogenetic \
--i-phylogeny rooted-tree.qza \
--i-table table.qza \
--p-sampling-depth 9285 \
-- ignore_missing_samples \
--m-metadata-file samples-to-keep.tsv \
--output-dir core-metrics-results

jwdebelius · April 8, 2019, 8:28pm

Hi @ihl216,

You're on the right track here! The code is set up to deal with samples that are in the metadata and not the feature table, but not the other way around. So, unfortunately, you need to filter your feature table, which you found with the

I guess I was confused earlier. I thought you'd used your new map for filtering. If that's not the case, I really recommend making a column in your full (original) map that's something like a sample type designation and then using that for filtering. Or, I think you can just pass in your new metadata file as a list of sample IDs (but please double check the doc string to be sure) and then work off that file.

You need to filter the feature table, but you don't need to filter your tree.

Hope that helps clarify the question.

Best,
Justine

ihl216 · April 9, 2019, 9:37pm

Hi @jwdebelius,

I think I am still a bit confused about filtering the table. I do have a new map file that contains 44 stool samples + 10 QC samples, and I ran the code below:

qiime feature-table filter-samples \
--i-table Lean2-table.qza \
--m-metadata-file new_map54.tsv \
--o-filtered-table type-filtered-table.qza

qiime feature-table summarize \
--i-table type-filtered-table.qza \
--o-visualization type-filtered-table.qzv \
--m-sample-metadata-file new_map54.tsv

qiime diversity core-metrics-phylogenetic \
--i-phylogeny Lean2-rooted-tree.qza \
--i-table Lean2-table.qza \
--p-sampling-depth 1109 \
--m-metadata-file new_map54.tsv \
--output-dir type-core-metrics-results

And the diversity command gave me the error:
Plugin error from diversity:

‘There are samples not included in the mapping file. Override this error by using the ignore_missing_samples argument. Offending samples: 11629.BLANK1.12A, 11629.BLANK1.12B’

Debug info has been saved to /tmp/qiime2-q2cli-err-_5m2kxmt.log

Do you mean that I need to filter “Lean2-table.qza” so the errors won't show up? Can you please point me toward the page that explains how to filter the table?

Thanks so much again!

jwdebelius · April 9, 2019, 10:42pm

Hi @ihl216,

The easiest way to handle this is to have a matching mapping file and feature table. Your mapping file can be a superset of your feature table, in that it can have more samples, but all the samples in your feature table must be contained in your mapping file.
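One way to check this rule before running the diversity command is to compare the two ID lists directly. The sketch below assumes you have each list as a plain text file with one ID per line (for instance copied from the table summary and from the first column of the mapping file); table-ids.txt and map-ids.txt are hypothetical names, filled here with IDs from this thread.

```shell
# Hypothetical ID lists, one sample ID per line.
printf '11629.LEA53658.0001\n11629.BLANK1.12A\n11629.BLANK1.12B\n' > table-ids.txt
printf '11629.LEA53658.0001\n11629.DZ35298.0012\n' > map-ids.txt

# comm needs sorted input.
sort table-ids.txt > table-ids.sorted
sort map-ids.txt > map-ids.sorted

# comm -23 keeps lines found only in the first file:
# samples in the feature table that the mapping file does not cover.
comm -23 table-ids.sorted map-ids.sorted > missing-from-map.txt

cat missing-from-map.txt
```

If missing-from-map.txt is empty, the mapping file covers the table. Any IDs listed there are the ones the diversity command would report as offending samples, and extra IDs that appear only in the mapping file are fine.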

This means that you must generate a new filtered feature table with only the samples you want to analyze. Your errors stem from not using the correct table.

Based on what I'm seeing, the code for your filtering is good.

ihl216:

qiime feature-table filter-samples \
--i-table Lean2-table.qza \
--m-metadata-file new_map54.tsv \
--o-filtered-table type-filtered-table.qza

qiime feature-table summarize \
--i-table type-filtered-table.qza \
--o-visualization type-filtered-table.qzv \
--m-sample-metadata-file new_map54.tsv

Here, you're filtering and checking your table. But the issue you run into with the core diversity command is that you're not using the filtered table:

ihl216:

qiime diversity core-metrics-phylogenetic \
--i-phylogeny Lean2-rooted-tree.qza \
--i-table Lean2-table.qza \
--p-sampling-depth 1109 \
--m-metadata-file new_map54.tsv \
--output-dir type-core-metrics-results

You're using your Lean2-table.qza in the command; QIIME is smart, but if you want it to work on a filtered set, you gotta hand it that filtered set. So, my suggestion would be to try this, and see if it works a bit better.

qiime diversity core-metrics-phylogenetic \
--i-phylogeny Lean2-rooted-tree.qza \
--i-table type-filtered-table.qza \
--p-sampling-depth 1109 \
--m-metadata-file new_map54.tsv \
--output-dir type-core-metrics-results

Best,
Justine

ihl216 · April 10, 2019, 2:53pm

Hi @jwdebelius,

I am currently re-running the step below in order to generate the demultiplexed artifact with only the 54 samples that I want to analyze.

I am not sure if it will run into issues, because the original file “Lean2-emp-paired-end-sequences.qza” has 91 samples, and the mapping file contains only the subset of 54 samples.

qiime demux emp-paired \
--m-barcodes-file new_map54.tsv \
--m-barcodes-column BarcodeSequence \
--i-seqs Lean2-emp-paired-end-sequences.qza \
--o-per-sample-sequences Lean2-54-demux.qza

If the above step works, then I will be able to run the following steps and generate the table with the 54 samples that I want to analyze, named “Lean2-54-table.qza”.

qiime demux summarize \
--i-data Lean2-54-demux.qza \
--o-visualization Lean2-54-demux.qzv

qiime dada2 denoise-paired \
--i-demultiplexed-seqs Lean2-demux.qza \
--p-trim-left-f 13 \
--p-trim-left-r 13 \
--p-trunc-len-f 150 \
--p-trunc-len-r 150 \
--o-table Lean2-54-table.qza \
--o-representative-sequences Lean2-54-rep-seqs.qza \
--o-denoising-stats Lean2-54-denoising-stats.qza

At the moment I am still waiting for QIIME 2 to finish running.

ihl216 · April 10, 2019, 4:48pm

Two hours have passed, but I have not gotten the demux.qza yet. Is it normal for this to take so long?