What type of sequencing data is used in this tutorial?

[Edited by moderator to show Google translation into English]:
Thanks to the developers and translators; the quality of the tutorial translation is great! For an introductory tutorial, I hope the developers or translators can also explain the structure of the reads in the sequencing data used: whether the data are single-end or paired-end, the read length, the position of the barcode in the read, and which region of the 16S gene the library covers. The tutorial does not explain the structure of the sequencing data. Even if learners can run the whole workflow, they are left unsure how to choose filtering methods and parameters, and it is unclear which genomic region the data actually cover.

Original text:
感谢开发者和翻译者,教程翻译质量很棒!入门学习教程的时候,希望开发者或者翻译者也能解释教程中涉及的测序数据中read的结构,数据是单端还是双端,read长度和barcode在read的位置,文库在16S基因组的覆盖区域,教程里没有对测序数据的结构做解释,即便学习者能把流程跑通,但对过滤方法和参数的选择却没有底,也不清楚这些数据到底覆盖了什么基因组区域。

Welcome to the forum, @sensan!

Thank you for your kind words! All credit goes to @Yong-Xin_Liu for translating so many of our tutorials :grin:

Note that this translation is of a rather old document (from 2017), so the most up-to-date information and tutorials are found at docs.qiime2.org, though those documents are in English. For example, here is the latest version of this tutorial, which explains many of these details early on:

https://docs.qiime2.org/2020.11/tutorials/moving-pictures/

The data used here are single-end. That does not necessarily mean the original study lacked paired-end reads; only single-end reads are used here for simplicity, and other tutorials describe how to analyze paired-end reads.

The reads are 150 nt long; this is mentioned further down in the tutorial.

This is explained in the latest version of this tutorial, but perhaps not in the older version that was translated. This tutorial uses data generated with the EMP protocols; you can find more details here: https://earthmicrobiome.org/protocols-and-standards/

With the EMP protocol, the primer is not found in the sequence read. The data were generated with the V4 primers 515f and 806r.
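
If you want to check these properties on your own data, here is a minimal Python sketch (not part of QIIME 2) that tallies read lengths in a FASTQ file and counts how many reads begin with a given primer. The function names `summarize_reads` and `open_fastq` are hypothetical helpers written for this post, and the code assumes plain four-line FASTQ records. The 515f sequence in the usage note below is the commonly cited one and is an assumption here; please verify it against the EMP protocol page before relying on it.

```python
import gzip
import io
import re
from collections import Counter

# IUPAC degenerate base codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_to_regex(primer):
    """Compile a (possibly degenerate) primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def summarize_reads(handle, primer=None):
    """Return (length_counts, n_reads, n_starting_with_primer).

    Assumes four-line FASTQ records (no wrapped sequence lines).
    """
    pattern = primer_to_regex(primer) if primer else None
    lengths = Counter()
    n_reads = 0
    n_primer = 0
    for i, line in enumerate(handle):
        if i % 4 != 1:  # the sequence is the second line of each record
            continue
        seq = line.strip().upper()
        lengths[len(seq)] += 1
        n_reads += 1
        # re.match anchors at the start of the read
        if pattern is not None and pattern.match(seq):
            n_primer += 1
    return lengths, n_reads, n_primer


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)
```

For example, with a hypothetical file name, `summarize_reads(open_fastq("sequences.fastq.gz"), primer="GTGCCAGCMGCCGCGGTAA")` returns the length counts, the total number of reads, and the number of reads starting with the primer. For EMP-style data like the tutorial's, you would expect almost no reads to start with the primer.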

These sorts of decisions are described in more detail in the current tutorials. The genomic region, etc., is actually of little consequence here; this general protocol is quite applicable to different marker-gene sequence targets.

Good luck!