Metadata file questions

Hi,

I have two questions about the metadata file. I am using qiime2-2018.6. Each sample has two fastq files (forward and reverse) from Illumina paired-end sequencing. The compressed fastq files are already demultiplexed; some examples are listed below.

EW_30_1_S25_L001_R1_001.fastq EW_30_1_S25_L001_R2_001.fastq
EW_30_2_S26_L001_R1_001.fastq EW_30_2_S26_L001_R2_001.fastq
EW_30_3_S27_L001_R1_001.fastq EW_30_3_S27_L001_R2_001.fastq
EW_30_4_S28_L001_R1_001.fastq EW_30_4_S28_L001_R2_001.fastq

LH_60_1_S61_L001_R1_001.fastq LH_60_1_S61_L001_R2_001.fastq
LH_60_2_S62_L001_R1_001.fastq LH_60_2_S62_L001_R2_001.fastq
LH_60_3_S63_L001_R1_001.fastq LH_60_3_S63_L001_R2_001.fastq
LH_60_4_S64_L001_R1_001.fastq LH_60_4_S64_L001_R2_001.fastq

Here there are 2 patients (EW_30 and LH_60), and each patient has samples collected at 4 time points. I use the Casava 1.8 paired-end format to import the files into QIIME2.

My questions are listed below.

  1. I tried to create a metadata file as a .txt file from Excel. Does QIIME2 accept a .txt file as a metadata file? If not, how do I convert a .txt file to a .tsv file for QIIME2?

  2. When preparing metadata (.tsv or .txt), how should I name the entries in the SampleID column? How do I know that the entries in the SampleID column of the metadata file match those in the imported files listed above? Are they matched sequentially, regardless of the values in the SampleID column? For the example above, should I use EW_30 and LH_60, in that order, in the SampleID column of the metadata file? I am not sure whether they would match the IDs in the files imported with the Casava format.

Many thanks in advance!

Hi @xjyang69,

These are great questions!

QIIME 2 doesn’t really care about the extension of files; what matters is the contents. I believe Excel saves TSV-formatted files with a .txt extension. So if you open your .txt file in something simple like Notepad and see whitespace (tabs) between your values, you should be good to go.
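If you'd rather script the Notepad check, here is a rough sketch. It only tests that every non-empty line splits on tabs into the same number of columns (more than one); it is not QIIME 2's own validator, it assumes the file is UTF-8 encoded (Excel may use a different encoding on some systems), and `looks_tab_delimited` is a hypothetical helper name.

```python
import csv
import os
import tempfile


def looks_tab_delimited(path):
    """Rough check: every non-empty line splits on tabs into the same
    number of columns, and that number is greater than one."""
    # Assumption: the file is UTF-8; adjust `encoding` if Excel used another one.
    with open(path, newline="", encoding="utf-8") as handle:
        rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    if not rows:
        return False
    widths = {len(row) for row in rows}
    return len(widths) == 1 and widths.pop() > 1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tab_file = os.path.join(tmp, "metadata.txt")
        with open(tab_file, "w", encoding="utf-8") as out:
            out.write("sample-id\tpatient\ttimepoint\nEW_30_1\tEW_30\t1\n")
        comma_file = os.path.join(tmp, "metadata_commas.txt")
        with open(comma_file, "w", encoding="utf-8") as out:
            out.write("sample-id,patient,timepoint\nEW_30_1,EW_30,1\n")
        print(looks_tab_delimited(tab_file))    # True
        print(looks_tab_delimited(comma_file))  # False: no tabs, one column
```

A comma-separated file saved with a .txt extension fails this check even though the extension looks right, which is the point: the contents matter, not the name.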

IDs (of both samples and features) are matched by their exact name, so generally you never have to worry about the order of anything.
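To illustrate what "matched by exact name" means, here is a small sketch that compares two ID lists as sets. It is only an illustration of the idea, not QIIME 2's implementation, and `compare_ids` is a hypothetical helper. Reordering the IDs changes nothing, but a difference in capitalization does.

```python
def compare_ids(data_ids, metadata_ids):
    """Compare IDs as sets: order is ignored, exact spelling (and case) is not."""
    data = set(data_ids)
    meta = set(metadata_ids)
    return {
        "missing_from_metadata": sorted(data - meta),
        "only_in_metadata": sorted(meta - data),
    }


if __name__ == "__main__":
    from_reads = ["EW_30_1", "EW_30_2", "LH_60_1"]
    # Same samples listed in a different order, with one ID in lowercase.
    from_metadata = ["LH_60_1", "ew_30_2", "EW_30_1"]
    print(compare_ids(from_reads, from_metadata))
    # {'missing_from_metadata': ['EW_30_2'], 'only_in_metadata': ['ew_30_2']}
```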

What will happen in the case above is the following:

An Illumina filepath has these pieces:

<sample id>_<barcode id>_<lane #>_<orientation (R1/R2/R3)>_001.fastq[.gz]

QIIME 2 knows this, so it will match the segments. It also knows that sample IDs often contain underscores, so it parses the name from right to left.

The file EW_30_1_S25_L001_R1_001.fastq will be parsed as:

orientation: R1
lane: L001
barcode ID: S25

and so what remains must be the sample ID:

sample ID: EW_30_1
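The right-to-left parsing described above can be sketched in a few lines of Python. This mirrors the filename layout given above rather than QIIME 2's actual importer code, and `parse_casava_name` is a hypothetical helper; it does not validate that each field looks like `S25`, `L001`, and so on.

```python
def parse_casava_name(filename):
    """Split a Casava 1.8-style FASTQ file name into its fields, right to left."""
    name = filename
    for ext in (".fastq.gz", ".fastq"):
        if name.endswith(ext):
            name = name[: -len(ext)]
            break
    else:
        raise ValueError(f"not a .fastq or .fastq.gz file: {filename}")
    # maxsplit=4 peels exactly four fields off the right-hand end, so any
    # underscores that remain stay inside the sample ID.
    parts = name.rsplit("_", 4)
    if len(parts) != 5:
        raise ValueError(f"unexpected file name layout: {filename}")
    sample_id, barcode_id, lane, orientation, set_number = parts
    return {
        "sample_id": sample_id,
        "barcode_id": barcode_id,
        "lane": lane,
        "orientation": orientation,
        "set_number": set_number,
    }


if __name__ == "__main__":
    for name in ["EW_30_1_S25_L001_R1_001.fastq", "LH_60_4_S64_L001_R2_001.fastq.gz"]:
        fields = parse_casava_name(name)
        print(fields["sample_id"], fields["orientation"])
    # EW_30_1 R1
    # LH_60_4 R2
```

Splitting from the left instead would cut `EW_30_1` apart at its first underscore, which is why the right-hand end is the reliable anchor.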

This sample ID (EW_30_1) is what gets matched against your metadata file, so the metadata must use exactly the same spelling (capitalization and all).
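As a sketch of what that means for the example files: the metadata would have one row per sample ID (EW_30_1, EW_30_2, ...), not one per patient. The `patient` and `timepoint` columns below are hypothetical, derived by assuming the last underscore-separated piece of each ID is the time point; `sample-id` is one of the ID column headers QIIME 2 metadata accepts, but check the metadata documentation for your release. Rows are sorted only for readability, since order does not matter.

```python
import csv
import io


def write_metadata(sample_ids, handle):
    """Write a minimal tab-separated metadata table, one row per sample ID."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample-id", "patient", "timepoint"])
    for sample_id in sorted(sample_ids):
        # Hypothetical naming convention: "EW_30_1" -> patient "EW_30", time point "1".
        patient, timepoint = sample_id.rsplit("_", 1)
        writer.writerow([sample_id, patient, timepoint])


if __name__ == "__main__":
    ids = ["EW_30_1", "EW_30_2", "LH_60_1", "LH_60_2"]
    buffer = io.StringIO()
    write_metadata(ids, buffer)
    print(buffer.getvalue(), end="")
```

Keeping the patient as its own column lets you group by patient later while the sample IDs still match the imported files exactly.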

Hope that helps!

Thank you very much Evan!
