QIIME 2 user documentation. 11. Filtering data

https://docs.qiime2.org/2018.11/tutorials/filtering/

Obtain the data

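``````
# Create a working directory and enter it
mkdir qiime2-filtering-tutorial
cd qiime2-filtering-tutorial

# Download the feature table, distance matrix, taxonomy, and representative sequences.
# The download command for the sample metadata (sample-metadata.tsv), which later
# commands read, is missing here; obtain that file from the tutorial page linked above.
wget \
-O "table.qza" \
"https://data.qiime2.org/2018.11/tutorials/filtering/table.qza"
wget \
-O "distance-matrix.qza" \
"https://data.qiime2.org/2018.11/tutorials/filtering/distance-matrix.qza"
wget \
-O "taxonomy.qza" \
"https://data.qiime2.org/2018.11/tutorials/filtering/taxonomy.qza"
wget \
-O "sequences.qza" \
"https://data.qiime2.org/2018.11/tutorials/filtering/sequences.qza"
``````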
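The four artifact downloads differ only in the file name, so they can be driven by a loop. The following is a minimal sketch, assuming the base URL used in the commands above; the `artifact_url` helper and the `DRY_RUN` variable are illustrative names introduced here, not part of QIIME 2. By default it only prints the `wget` commands it would run.

```shell
# Base URL taken from the download commands above.
base_url="https://data.qiime2.org/2018.11/tutorials/filtering"

# Hypothetical helper: build the download URL for one artifact name.
artifact_url() {
    printf '%s/%s\n' "$base_url" "$1"
}

# DRY_RUN=1 (the default in this sketch) only prints the commands;
# set DRY_RUN=0 to actually download the files.
DRY_RUN="${DRY_RUN:-1}"

for name in table.qza distance-matrix.qza taxonomy.qza sequences.qza; do
    if [ "$DRY_RUN" = "1" ]; then
        echo "wget -O \"$name\" \"$(artifact_url "$name")\""
    else
        wget -O "$name" "$(artifact_url "$name")"
    fi
done
```

Running it with `DRY_RUN=0` performs the same four downloads as the explicit commands.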

Filtering feature tables

Total-frequency-based filtering

``````
# Keep only samples whose total frequency is at least 1500
qiime feature-table filter-samples \
--i-table table.qza \
--p-min-frequency 1500 \
--o-filtered-table sample-frequency-filtered-table.qza
``````

• distance-matrix.qza: distance matrix
• taxonomy.qza: taxonomic annotations
• sequences.qza: representative sequences
• table.qza: feature table
• sample-frequency-filtered-table.qza: filtered feature table

``````
# Keep only features whose total frequency across all samples is at least 10
qiime feature-table filter-features \
--i-table table.qza \
--p-min-frequency 10 \
--o-filtered-table feature-frequency-filtered-table.qza
``````
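To make the two thresholds concrete, the sketch below applies the same idea to a tiny made-up table with `awk`: a sample is kept if its column sum reaches the minimum, and a feature is kept if its row sum does. This only illustrates the arithmetic on invented counts; it is not how QIIME 2 implements the filters, and the table and variable names are assumptions of the example.

```shell
# Toy feature table (rows = features, columns = samples); the counts are made up.
table=$(printf '%s\n' \
    'FeatureID S1 S2 S3' \
    'F1 1000 5 0' \
    'F2 600 3 2' \
    'F3 0 1 0')

# Samples whose total frequency (column sum) is at least 1500.
kept_samples=$(printf '%s\n' "$table" | awk -v min=1500 '
    NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; n = NF; next }
    { for (i = 2; i <= NF; i++) total[i] += $i }
    END { for (i = 2; i <= n; i++) if (total[i] + 0 >= min) print name[i] }')

# Features whose total frequency (row sum) is at least 10.
kept_features=$(printf '%s\n' "$table" | awk -v min=10 '
    NR > 1 { s = 0; for (i = 2; i <= NF; i++) s += $i; if (s >= min) print $1 }')

echo "samples kept: $(echo $kept_samples)"
echo "features kept: $(echo $kept_features)"
```

In this toy table only S1 (total 1600) passes the sample threshold, while F1 and F2 pass the feature threshold and F3 (total 1) is dropped.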

Contingency-based filtering

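``````
# Keep features that are present in at least 2 samples, removing features that occur only by chance
qiime feature-table filter-features \
--i-table table.qza \
--p-min-samples 2 \
--o-filtered-table sample-contingency-filtered-table.qza

# Remove samples that contain fewer than 10 features. Judge case by case: a sample with
# very few microbial taxa may be abnormal (for example after antibiotic use or a failed
# PCR amplification), and such samples are usually filtered out before analysis.
qiime feature-table filter-samples \
--i-table table.qza \
--p-min-features 10 \
--o-filtered-table feature-contingency-filtered-table.qza
``````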
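Contingency-based filtering counts occurrences rather than summing frequencies. The sketch below shows that distinction on a made-up table with `awk`: a feature is counted once per sample in which it is observed, and a sample is counted once per feature it contains. The toy uses a threshold of 2 for both filters so the small table gives a visible result; the table and names are assumptions of the example, not QIIME 2 internals.

```shell
# Toy feature table (rows = features, columns = samples); the counts are made up.
table=$(printf '%s\n' \
    'FeatureID S1 S2 S3' \
    'F1 1000 5 0' \
    'F2 600 3 2' \
    'F3 0 1 0')

# Features observed (count > 0) in at least 2 samples.
kept_features=$(printf '%s\n' "$table" | awk -v min=2 '
    NR > 1 { n = 0; for (i = 2; i <= NF; i++) if ($i > 0) n++; if (n >= min) print $1 }')

# Samples that contain at least 2 observed features (the command above uses 10).
kept_samples=$(printf '%s\n' "$table" | awk -v min=2 '
    NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; n = NF; next }
    { for (i = 2; i <= NF; i++) if ($i > 0) seen[i]++ }
    END { for (i = 2; i <= n; i++) if (seen[i] + 0 >= min) print name[i] }')

echo "features kept: $(echo $kept_features)"
echo "samples kept: $(echo $kept_samples)"
```

Here F3 is dropped because it appears in a single sample, and S3 is dropped because it contains a single feature, regardless of how large the counts are.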

Identifier-based filtering

``````
# Create a list of the samples to keep or remove (the text file can also be written by hand)
echo SampleID > samples-to-keep.tsv
echo L1S8 >> samples-to-keep.tsv
echo L1S105 >> samples-to-keep.tsv

# Keep only the two listed samples, L1S8 and L1S105
qiime feature-table filter-samples \
--i-table table.qza \
--m-metadata-file samples-to-keep.tsv \
--o-filtered-table id-filtered-table.qza
``````
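The ID list can also be written in a single command and checked before it is handed to QIIME 2. A minimal sketch, using the same file name and sample IDs as above; the `n_ids` variable is only an illustrative name.

```shell
# Write the header line and the two sample IDs in one command.
printf '%s\n' SampleID L1S8 L1S105 > samples-to-keep.tsv

# Count the IDs (every line after the header) as a quick sanity check.
n_ids=$(tail -n +2 samples-to-keep.tsv | wc -l | tr -d ' ')
echo "IDs to keep: $n_ids"
```

A count that differs from the number of samples you meant to list usually points to a missing newline or a stray blank line in the file.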

Metadata-based filtering

``````
# Select one category of a metadata column: all samples whose Subject is subject-1
qiime feature-table filter-samples \
--i-table table.qza \
--m-metadata-file sample-metadata.tsv \
--p-where "Subject='subject-1'" \
--o-filtered-table subject-1-filtered-table.qza
``````

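``````
# Select several categories of a metadata column: samples whose BodySite is left palm or right palm
qiime feature-table filter-samples \
--i-table table.qza \
--m-metadata-file sample-metadata.tsv \
--p-where "BodySite IN ('left palm', 'right palm')" \
--o-filtered-table skin-filtered-table.qza
``````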

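`--p-where` expressions can be combined with the `AND` and `OR` keywords. In the command below, the samples described in sample-metadata.tsv are filtered so that only those whose `Subject` is `subject-1` and whose `BodySite` is `gut` are kept. With the `AND` keyword, both expressions must evaluate to true for a sample to be retained. A sample whose body site is gut but whose subject is subject-2 therefore does not appear in the resulting table. Likewise, a sample whose subject is subject-1 but whose body site is not gut does not appear in the resulting table.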

``````
# Require both conditions (AND, intersection): Subject is subject-1 and BodySite is gut
qiime feature-table filter-samples \
--i-table table.qza \
--m-metadata-file sample-metadata.tsv \
--p-where "Subject='subject-1' AND BodySite='gut'" \
--o-filtered-table subject-1-gut-filtered-table.qza
``````

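The `OR` keyword has the same syntax as `AND`, but a sample is retained if either of the two expressions is true. For lack of an application more relevant to the example data, `OR` is used here to keep all samples in `sample-metadata.tsv` whose `BodySite` is `gut` or whose `ReportedAntibioticUsage` is `Yes`. Unlike with `AND`, a sample whose body site is gut but whose reported antibiotic usage is `No` appears in the resulting table. Likewise, a sample whose reported antibiotic usage is `Yes` but whose body site is not gut also appears in the resulting table.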

``````
# Accept either condition (OR, union): BodySite is gut or ReportedAntibioticUsage is Yes
qiime feature-table filter-samples \
--i-table table.qza \
--m-metadata-file sample-metadata.tsv \
--p-where "BodySite='gut' OR ReportedAntibioticUsage='Yes'" \
--o-filtered-table gut-abx-positive-filtered-table.qza
``````

``````
# Negate a condition with NOT: samples from subject-1 that are not from the gut
qiime feature-table filter-samples \
--i-table table.qza \
--m-metadata-file sample-metadata.tsv \
--p-where "Subject='subject-1' AND NOT BodySite='gut'" \
--o-filtered-table subject-1-non-gut-filtered-table.qza
``````
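The effect of `AND`, `OR`, and `NOT` can be checked on a small made-up metadata table before running the real filters. The sketch below evaluates the three conditions from the commands above with `awk`. It is only a stand-in for the logic of the `--p-where` clause: the sample IDs and rows are invented, and `awk` operators (`&&`, `||`, `!`) replace the SQL keywords.

```shell
# Toy sample metadata; the IDs and values are made up for illustration.
# Columns: $1 = SampleID, $2 = Subject, $3 = BodySite, $4 = ReportedAntibioticUsage
metadata=$(printf '%s\n' \
    'SampleID Subject BodySite ReportedAntibioticUsage' \
    'A subject-1 gut Yes' \
    'B subject-1 tongue No' \
    'C subject-2 gut No' \
    'D subject-2 tongue Yes' \
    'E subject-2 tongue No')

# Subject='subject-1' AND BodySite='gut'
and_ids=$(printf '%s\n' "$metadata" |
    awk 'NR > 1 && $2 == "subject-1" && $3 == "gut" { print $1 }')

# BodySite='gut' OR ReportedAntibioticUsage='Yes'
or_ids=$(printf '%s\n' "$metadata" |
    awk 'NR > 1 && ($3 == "gut" || $4 == "Yes") { print $1 }')

# Subject='subject-1' AND NOT BodySite='gut'
not_ids=$(printf '%s\n' "$metadata" |
    awk 'NR > 1 && $2 == "subject-1" && !($3 == "gut") { print $1 }')

echo "AND keeps: $(echo $and_ids)"
echo "OR keeps:  $(echo $or_ids)"
echo "NOT keeps: $(echo $not_ids)"
```

Sample C shows the difference between the first two: it is a gut sample from subject-2 with no antibiotic usage, so the `OR` condition keeps it while the `AND` condition does not.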

Taxonomy-based filtering of tables and sequences

``````
# Remove all features whose taxonomic annotation contains "mitochondria"
qiime taxa filter-table \
--i-table table.qza \
--i-taxonomy taxonomy.qza \
--p-exclude mitochondria \
--o-filtered-table table-no-mitochondria.qza
``````

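``````
# Remove features whose annotation contains "mitochondria" or "chloroplast" (comma-separated terms)
qiime taxa filter-table \
--i-table table.qza \
--i-taxonomy taxonomy.qza \
--p-exclude mitochondria,chloroplast \
--o-filtered-table table-no-mitochondria-no-chloroplast.qza
``````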

``````
# Keep only features whose annotation contains a phylum-level label (p__)
qiime taxa filter-table \
--i-table table.qza \
--i-taxonomy taxonomy.qza \
--p-include p__ \
--o-filtered-table table-with-phyla.qza
``````

``````
# Combine both: keep features with a phylum-level label, then drop mitochondria and chloroplast
qiime taxa filter-table \
--i-table table.qza \
--i-taxonomy taxonomy.qza \
--p-include p__ \
--p-exclude mitochondria,chloroplast \
--o-filtered-table table-with-phyla-no-mitochondria-no-chloroplast.qza
``````

``````
# Exact mode: remove only features whose annotation equals the given string in full
qiime taxa filter-table \
--i-table table.qza \
--i-taxonomy taxonomy.qza \
--p-mode exact \
--p-exclude "k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; o__Rickettsiales; f__mitochondria" \
--o-filtered-table table-no-mitochondria-exact.qza
``````
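The difference between substring matching and `--p-mode exact` is easiest to see on a few annotation strings. The sketch below imitates both on a made-up taxonomy list with `awk`. It is a rough illustration only: the feature IDs and annotations are invented, and lower-casing the annotations before the substring test is an assumption of this sketch rather than a statement about how `q2-taxa` compares strings.

```shell
# Toy taxonomy: feature ID, a tab, then the annotation (all values made up).
taxonomy=$(printf '%s\t%s\n' \
    F1 'k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; o__Rickettsiales; f__mitochondria' \
    F2 'k__Bacteria; p__Cyanobacteria; c__Chloroplast; o__Streptophyta' \
    F3 'k__Bacteria; p__Firmicutes; c__Bacilli' \
    F4 'k__Bacteria' \
    F5 'Unassigned')

# Substring matching: keep annotations that contain "p__" and contain neither
# "mitochondria" nor "chloroplast" (compared in lower case in this sketch).
contains_kept=$(printf '%s\n' "$taxonomy" | awk -F '\t' '
    { t = tolower($2) }
    index(t, "p__") && !index(t, "mitochondria") && !index(t, "chloroplast") { print $1 }')

# Exact matching: drop only annotations equal to the full search string.
exact_term='k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; o__Rickettsiales; f__mitochondria'
exact_kept=$(printf '%s\n' "$taxonomy" |
    awk -F '\t' -v term="$exact_term" '$2 != term { print $1 }')

echo "include p__, exclude mitochondria,chloroplast: $(echo $contains_kept)"
echo "exact exclude of the mitochondria lineage:     $(echo $exact_kept)"
```

In the toy data the combined include and exclude leaves only F3, whereas the exact exclusion removes F1 alone and leaves the chloroplast and unclassified features in place.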

Filtering sequences

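The `q2-taxa` plugin provides a method, `filter-seqs`, for filtering `FeatureData[Sequence]` based on the taxonomic annotations of the features. It works very much like `qiime taxa filter-table`, so refer to the `qiime taxa filter-table` examples above to learn more about taxonomy-based filtering. Briefly, the following `filter-seqs` command keeps all features that have a phylum-level annotation but excludes all features whose annotation contains mitochondria or chloroplast.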

``````
qiime taxa filter-seqs \
--i-sequences sequences.qza \
--i-taxonomy taxonomy.qza \
--p-include p__ \
--p-exclude mitochondria,chloroplast \
--o-filtered-sequences sequences-with-phyla-no-mitochondria-no-chloroplast.qza
``````

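The `q2-feature-table` plugin also has a `filter-seqs` method, which lets users remove sequences based on various criteria, including which features are present in a feature table.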
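The idea of restricting sequences to the features that remain in a table can be sketched on plain FASTA text. The example below is a toy illustration with `awk`, not the plugin's implementation: the sequences, the feature IDs, and the variable names are all made up, and the parameters of the real method are not shown in this document.

```shell
# Toy representative sequences in FASTA format (IDs and sequences are made up).
fasta=$(printf '%s\n' '>F1' 'ACGT' '>F2' 'GGCC' '>F3' 'TTAA')

# Feature IDs assumed to remain in a filtered feature table.
table_ids='F1 F3'

# Keep a FASTA record only if its ID appears in the table's ID list.
filtered=$(printf '%s\n' "$fasta" | awk -v ids="$table_ids" '
    BEGIN { n = split(ids, a, " "); for (i = 1; i <= n; i++) keep[a[i]] = 1 }
    /^>/ { show = (substr($0, 2) in keep) }
    show { print }')

printf '%s\n' "$filtered"
```

Keeping the sequence file in step with the filtered table in this way avoids carrying sequences for features that no longer appear in any sample.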

Filtering distance matrices

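``````
# Filter by sample ID (samples-to-keep.tsv was created in the identifier-based filtering step)
qiime diversity filter-distance-matrix \
--i-distance-matrix distance-matrix.qza \
--m-metadata-file samples-to-keep.tsv \
--o-filtered-distance-matrix index-filtered-distance-matrix.qza
``````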

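``````
# Filter by a group within a metadata column: keep samples whose Subject is subject-2
qiime diversity filter-distance-matrix \
--i-distance-matrix distance-matrix.qza \
--m-metadata-file sample-metadata.tsv \
--p-where "Subject='subject-2'" \
--o-filtered-distance-matrix subject-2-filtered-distance-matrix.qza
``````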
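Filtering a distance matrix differs from filtering a table in that a removed sample must disappear from both the rows and the columns. The sketch below shows this on a made-up 3 x 3 matrix with `awk`; the sample IDs, distances, and variable names are assumptions of the example, and it is not how QIIME 2 performs the filtering.

```shell
# Toy symmetric distance matrix with a header row (values are made up).
matrix=$(printf '%s\n' \
    'ID A B C' \
    'A 0 0.2 0.5' \
    'B 0.2 0 0.7' \
    'C 0.5 0.7 0')

# Sample IDs to retain.
keep_ids='A C'

# Keep the header, then drop every row and every column of a removed sample.
filtered=$(printf '%s\n' "$matrix" | awk -v ids="$keep_ids" '
    BEGIN { n = split(ids, a, " "); for (i = 1; i <= n; i++) keep[a[i]] = 1 }
    NR == 1 { for (i = 2; i <= NF; i++) if ($i in keep) col[i] = 1 }
    NR == 1 || ($1 in keep) {
        line = $1
        for (i = 2; i <= NF; i++) if (i in col) line = line " " $i
        print line
    }')

printf '%s\n' "$filtered"
```

The result is still a square, symmetric matrix, now containing only the distance between A and C.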
