MOCK sample and PCR blanks

Hi Everyone,

In my dataset, besides my samples, we also have a Mock sample (with forward and reverse reads) in order to assess our sequencing error rates and some controls (PCR blanks, with forward and reverse reads) to address for any contamination that might have happened during the processing of our samples.

Is it possible in Qiime2 to use our Mock sample to estimate sequencing errors? And, we are interested in removing the sequences that appeared in our PCR blanks from the sequences we get from our real samples. Can we do this?

Thanks in advance,
Fernando Studart

2 Likes

Hi @fstudart!

I'm not sure I follow your question, can you provide some additional detail about what it is that you are trying to do? Thanks!

Cool! I think I can help with this, although it is a bit of a roundabout solution, because we haven't yet implemented filtering of FeatureData[Sequence] :frowning:. What we can do today is filter your feature table to remove the features found in the "blanks":

Step 0) I am going to use the Moving Pictures tutorial dataset for this example, and for the sake of illustration, will consider Samples L6S93 and L6S68 as "blanks". The general strategy is to filter our feature table to just those samples, then we will grab the features that are found in those samples, and finally filter the original table using those feature IDs. Let's go!!

Step 1) Filter our feature table to grab just the "blanks"

$ qiime feature-table filter-samples \
  --i-table table.qza \
  --m-metadata-file sample-metadata.tsv \
  --p-where '"#SampleID" IN ("L6S68", "L6S93")' \
  --o-filtered-table table-blanks.qza
# Note, the single vs double quotes matter above!

$ qiime feature-table summarize \
  --i-table table-blanks.qza \
  --o-visualization table-blanks.qzv

Step 2) Prep our list of feature IDs to filter

$ qiime tools view table-blanks.qzv
# Note, you could also take a look at this file at view.qiime2.org, your call!

You should see something that looks like this (note though, this is the Moving Pictures dataset):

A quick scan tells us that there are two samples in this filtered feature table (which makes sense, we asked it to filter our table to only two samples), and in those two samples there are 60 unique features. Let's scroll all the way down and grab the "Frequency per feature detail" CSV file.

So the first few lines of this file look like this:

$ head feature-frequency-detail.csv
fe30ff0f71a38a39cf1717ec2be3a2fc,3109.0
1d2e5f3444ca750c85302ceee2473331,1756.0
997056ba80681bbbdd5d09aa591eadc0,1413.0
d29fe3c70564fc0f69f2c03e0d1e5561,965.0
3c9c437f27aca05f8db167cd080ff1ec,945.0
f35ce9c514e1398308f5f84ed50b260f,566.0
6edca9464612efff71d8f97299f01663,428.0
699071a4a7ce68918e4cd93a61275ecd,314.0
fba3d9b3872ff9021d66434323fb9565,288.0
9f718b8f66b79c3de80546ca9a54539c,288.0

The first column is the feature ID, the second observed frequency (we don't need this column for this procedure, just wanted to note it). So, what we need to do to make this file useful is to convert it to TSV, and add a header row, that way we can use this file as QIIME 2 metadata (specifically as feature metadata!). You can do this in Excel, or use some command line magic, your call. I will use a command line one-liner to set this up:

$  echo 'Feature ID\tFrequency' | cat - feature-frequency-detail.csv | tr "," "\\t" > features-to-filter.tsv

The new features-to-filter.tsv file should look like this:

$ head features-to-filter.tsv
Feature ID      Frequency
fe30ff0f71a38a39cf1717ec2be3a2fc        3109.0
1d2e5f3444ca750c85302ceee2473331        1756.0
997056ba80681bbbdd5d09aa591eadc0        1413.0
d29fe3c70564fc0f69f2c03e0d1e5561        965.0
3c9c437f27aca05f8db167cd080ff1ec        945.0
f35ce9c514e1398308f5f84ed50b260f        566.0
6edca9464612efff71d8f97299f01663        428.0
699071a4a7ce68918e4cd93a61275ecd        314.0
fba3d9b3872ff9021d66434323fb9565        288.0

Step 3) Filter out the features found in the "blanks"

# First, remove the blanks from the feature table
$ qiime feature-table filter-samples \
  --i-table table.qza \
  --m-metadata-file sample-metadata.tsv \
  --p-where '"#SampleID" IN ("L6S68", "L6S93")' \
  --p-exclude-ids \
  --o-filtered-table table-sans-blanks.qza

# Then, remove the features that are identified in `features-to-filter.tsv`
$ qiime feature-table filter-features \
  --i-table table-sans-blanks.qza \
  --m-metadata-file features-to-filter.tsv \
  --p-exclude-ids \
  --o-filtered-table filtered-table.qza

# This isn't necessary, just want to view the stats about our `filtered-table.qza`
$ qiime feature-table summarize \
  --i-table filtered-table.qza \
  --o-visualization filtered-table.qzv

$ qiime tools view filtered-table.qzv
# Note, you could also take a look at this file at view.qiime2.org, your call!

The stats look like this:

And the original feature table:

So it looks like we filtered out the samples that had the blanks (32 vs 34 samples), and we have removed the 60 features that were found in those blanks *phew*!

Let us know if you have any questions or get stuck! :tada:

6 Likes

QIIME 2 now supports filtering of FeatureData[Sequence]! Check out the release notes to learn more! :tada:

1 Like

Hi Dillon,

Just one more question: after removing the sequences found in my PCR blanks from table.qza, do I also have to remove the same sequences from rep-seqs.qza? if so, is that possible? Because, I’ll use this file later on in the analysis to build the tree.

Thanks very much,
FS

Hi @fstudart!

You could probably get away with not removing those features, but your phylogenetic tree will have some extra tips in it, which may or may not impact downstream analyses.

Yes! As I mentioned above, we now support this kind of filtering in QIIME 2 2017.9. Check out the docs for some more info on how to perform this, and let us know if you have any questions.

I would suggest trying building a tree and running your analyses both ways (with and without filtering), and see how the results compare, that way you can be confident you are making the right decision.

Thanks! :t_rex:

2 Likes

An off-topic reply has been split into a new topic: Filtering FeatureData[Sequence] using metadata file

Please keep replies on-topic in the future.

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.

@fstudart,
Just to follow up, methods to assess accuracy using mock communities are now available as new actions evaluate-composition and evaluate-seqs as of the 2017.11 release. These are designed with mock communities in mind, but could also be useful for testing simulated communities or other samples types with an “expected” composition/sequences.

I hope these help! :fireworks::tada::full_moon_with_face: