How to build sample metadata.tsv in demux paired-end data?

Bing · February 23, 2017, 11:26pm

Continuing the discussion from Next steps following Importing demultiplexed sequences data in Casava format:

Here are two examples for the names of our demultiplexed paired-end data:
> DBR21181_GYN6002Visit5_12mosPost_RTAliquot_1_S4_L001_R1_001.fastq
> DBR21181_GYN6002Visit5_12mosPost_RTAliquot_1_S4_L001_R2_001.fastq
>
> DBR21181-GYN6004Visit5-12mosPost-RTAliquot-1_S3_L001_R1_001.fastq
> DBR21181-GYN6004Visit5-12mosPost-RTAliquot-1_S3_L001_R2_001.fastq

These are two samples. After the denoise-paired step, I noticed that DBR21181 appears as the sample ID in table.qza. Does that mean both samples will have the same sample ID? Do I need to rename the raw fastq files by removing DBR21181 from the sample names so that the sample IDs differ from each other? If so, the #SampleID values in the sample metadata could be GYN6002Visit5 and GYN6004Visit5 respectively, and the sample metadata could then be linked through these revised #SampleID values, right?

ebolyen · February 24, 2017, 4:24pm

What should be happening is that QIIME 2 uses DBR21181_GYN6002Visit5_12mosPost_RTAliquot_1 as the sample ID of your first example, and DBR21181-GYN6004Visit5-12mosPost-RTAliquot-1 as the sample ID of your second. (R1 and R2 are joined by qiime dada2 denoise-paired, so we don’t need to keep those as separate IDs in the table.)

But it sounds like this might not be the case. Would you be able to provide the results of qiime feature-table summarize for that table? It should have the sample IDs as QIIME 2 sees them at that step (and it’s easier to upload).

Bing · February 24, 2017, 10:58pm

Yes, I did the test using one paired sample and got the results as below:

It looks like QIIME automatically takes everything before the first _ as the sample ID. Since all of our file names begin with DBR21181, I guess all the sample IDs created in table.qza will be the same. I think the best approach is to rename all the files so that they produce different sample IDs. Then I can keep the sample IDs consistent between table.qza and the sample metadata.tsv. What do you think?
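To make the difference concrete, here is a minimal Python sketch that compares the two ways of deriving a sample ID from the file names in this thread: splitting at the first underscore (the behavior observed here) and stripping only the Casava-style suffix (the behavior expected). This is not QIIME 2's actual code; the function names and the regular expression are assumptions made for illustration.

```python
import re

# Casava-style suffix: _S<sample number>_L<lane>_R<read>_001.fastq(.gz)
# (assumed pattern, written to match the file names in this thread)
CASAVA_SUFFIX = re.compile(r"_S\d+_L\d{3}_R[12]_001\.fastq(?:\.gz)?$")


def id_before_first_underscore(filename):
    """Observed behavior: the sample ID is the text before the first '_'."""
    return filename.split("_", 1)[0]


def id_before_casava_suffix(filename):
    """Expected behavior: the sample ID is the name minus the Casava suffix."""
    return CASAVA_SUFFIX.sub("", filename)


files = [
    "DBR21181_GYN6002Visit5_12mosPost_RTAliquot_1_S4_L001_R1_001.fastq",
    "DBR21181-GYN6004Visit5-12mosPost-RTAliquot-1_S3_L001_R1_001.fastq",
]

for name in files:
    print(name)
    print("  observed:", id_before_first_underscore(name))
    print("  expected:", id_before_casava_suffix(name))
```

With the first-underscore rule, the underscore-style name collapses to DBR21181, so any two files named that way would share one sample ID, while the dash-style name keeps its full sample name.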

Thanks,
Bing

ebolyen · February 27, 2017, 4:12pm

Thanks @Bing! That screenshot is perfect.

You are correct, renaming everything is probably the best thing to do right now.
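One way to script the renaming is sketched below in Python. It assumes the Casava-style suffix seen in this thread and replaces underscores in the sample-name part with dashes, so that everything before the first underscore is already unique. The helper names `dashed_name` and `plan_renames` are hypothetical, and removing the DBR21181 prefix instead would work equally well. Whichever scheme is used, the #SampleID column in the metadata file has to match the resulting IDs.

```python
import re
from pathlib import Path

# Sample-name part followed by a Casava-style suffix (assumed pattern).
CASAVA_NAME = re.compile(
    r"^(?P<sample>.+)(?P<suffix>_S\d+_L\d{3}_R[12]_001\.fastq(?:\.gz)?)$"
)


def dashed_name(filename):
    """Replace '_' with '-' in the sample part only; keep the suffix as is."""
    match = CASAVA_NAME.match(filename)
    if match is None:
        return filename  # not a Casava-style name: leave it untouched
    return match.group("sample").replace("_", "-") + match.group("suffix")


def plan_renames(directory):
    """Dry run: list (old_path, new_path) pairs without renaming anything."""
    plan = []
    for path in sorted(Path(directory).iterdir()):
        new_name = dashed_name(path.name)
        if new_name != path.name:
            plan.append((path, path.with_name(new_name)))
    return plan


example = "DBR21181_GYN6002Visit5_12mosPost_RTAliquot_1_S4_L001_R1_001.fastq"
print(dashed_name(example))
```

Reviewing the output of `plan_renames` before calling `old.rename(new)` on each pair keeps the step reversible and makes it easy to spot two files that would map to the same new name.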

This shouldn’t be necessary, however, so I’ve filed a bug here.

Thanks for noticing this. We’re currently thinking about sample IDs and how we handle them in QIIME 2, so this is a really useful example for us!