Feature Table / Metadata Index Mismatch

KEDIR_HAMZA · September 19, 2017, 2:40pm

Hi again,
Like Lisa, I was stuck at the denoise step and got past it successfully. Now I am stuck again at the feature-table step!
I have run into a problem similar to the one described above, but in my case I had already created sample-metadata.tsv at the start of the analysis. The first column of the sample metadata is Sample ID.
When I tried to run

docker run -t -i -v C:\Users\ha1254ke\KHH006_Analysis:/data qiime2/core:2017.8 qiime feature-table summarize --i-table table.qza --o-visualization table.qzv --m-sample-metadata-file sample-metadata.tsv

It gives me the following error

Plugin error from feature-table:

  "None of [Index(['KHH006-11', 'KHH006-7', 'KHH006-10', 'KHH006-9',
  'KHH006-12',\n       'KHH006-8', 'KHH006-1', 'KHH006-5', 'KHH006-4',
  'KHH006-2', 'KHH006-6',\n       'KHH006-3'],\n      dtype='object')]
  are in the [index]"

I checked the forum for a solution and found that the most probable cause is a discrepancy between Sample Name (the first column in demux-paired-end.qzv) and Sample_ID (in the sample metadata). I saw a recommendation to rename the sample ID to sample name, which is not clear to me.
Could you please elaborate a little more? Does it mean renaming the sample metadata in the Google Sheet? Is it possible to use sample ID instead of sample name in the first column of demux-paired-end.qzv?
For your information, my samples were already demultiplexed when I got them from our Illumina sequencing facility.

ebolyen · September 19, 2017, 11:08pm

Hi @KEDIR_HAMZA,

Essentially, we need the IDs to match. There's no actual difference between "Sample Name" and "Sample ID"; those are the same thing to QIIME 2. Whatever you call them, so long as the rows in your sheet match what your sequence files are named, everything will be fine.
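
To make "the IDs need to match" concrete, here is a minimal Python sketch that reads the first column of a tab-separated metadata file and compares it with a list of IDs from a feature table. The function names, the example metadata, and the example table IDs are all made up for illustration, and the sketch does not open .qza files itself; you would have to supply the table's sample IDs from the table.qzv summary or similar.

```python
import csv
import io


def read_metadata_ids(handle):
    """Return first-column values from a TSV, skipping the header and '#' rows."""
    ids = []
    header_seen = False
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[0].strip():
            continue  # ignore blank lines
        if not header_seen:
            header_seen = True  # the first non-empty row is treated as the header
            continue
        if row[0].startswith("#"):
            continue  # ignore comment/directive rows
        ids.append(row[0].strip())
    return ids


def compare_ids(metadata_ids, table_ids):
    """Report which IDs are shared and which appear on only one side."""
    metadata_ids, table_ids = set(metadata_ids), set(table_ids)
    return {
        "shared": sorted(metadata_ids & table_ids),
        "only_in_metadata": sorted(metadata_ids - table_ids),
        "only_in_table": sorted(table_ids - metadata_ids),
    }


# Hypothetical data, for illustration only.
example_tsv = "Sample ID\tTreatment\nKHH006-1\tA\nKHH006-2\tB\n"
metadata_ids = read_metadata_ids(io.StringIO(example_tsv))
table_ids = ["KHH006-1", "some-other-id"]
report = compare_ids(metadata_ids, table_ids)
print(report)
```

If "shared" comes back empty, that would be consistent with a "None of [...] are in the [index]" error: none of the metadata IDs exist in the table.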

Would you be able to provide a couple of the filenames? Then I would be able to tell you what QIIME 2 thinks your Sample IDs should be; this is probably where the mismatch is happening.

KEDIR_HAMZA · September 20, 2017, 10:16am

Hi @ebolyen,
Here is the screenshot showing sample name from demux-paired-end.qzv

and screenshot of sample-metadata.tsv

N.B. I did not assign any sample names; I was using only sample IDs. The sample names were assigned by QIIME 2 itself in the following format: sample ID + string of numbers + sample ID = sample name.

ebolyen · September 20, 2017, 10:46pm

Thanks for the screenshots @KEDIR_HAMZA!

I think you've run into a bug we keep trying to track down. It looks like the importer messed up your Sample IDs; they should be what you have in the mapping file (although I think with a dash).

Could you provide the import command you used (if you still have it)?

Thanks so much!

KEDIR_HAMZA · September 21, 2017, 8:05am

Luckily, I had saved the scripts:

docker run -t -i -v C:\Users\ha1254ke\KHH006_Analysis:/data qiime2/core:2017.7 qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path casava-18-paired-end-demultiplexed --source-format CasavaOneEightSingleLanePerSampleDirFmt --output-path demux-paired-end.qza # importing paired-end demultiplexed fastq.

docker run -t -i -v C:\Users\ha1254ke\KHH006_Analysis:/data qiime2/core:2017.7 qiime demux summarize --i-data demux-paired-end.qza --o-visualization demux-paired-end.qzv # to generate a summary of the demultiplexed results.

Best regards,
Kedir

jairideout · September 23, 2017, 12:40am

Hi @KEDIR_HAMZA! Would you be able to provide your table.qza and sample-metadata.tsv files? You can send me those in a direct message. Can you also send me the output from running ls on the directory of fastq files you imported?

ls casava-18-paired-end-demultiplexed/

Thanks!

KEDIR_HAMZA · September 24, 2017, 1:37pm

hi @jairideout
I have sent all the files through direct message.

Many Thanks!

jairideout · September 25, 2017, 9:14pm

Thanks for the files and screenshot @KEDIR_HAMZA! From the screenshot of your import directory (casava-18-paired-end-demultiplexed/), there may be sub-directories in there that are confusing the import step and causing QIIME 2 to detect the wrong sample IDs.

Each of the entries in that listing is marked with a Mode of d, meaning it is a directory and not a file.

In order to import your sequence data, you'll need to have all of your per-sample FASTQ files in a single directory, without any sub-directories or extra files hanging around. Once you've worked through that, you have a couple of options:

If your per-sample FASTQ files match the Casava 1.8 naming convention, you can just import the directory like you've already been doing.
If your filenames don't match the Casava 1.8 naming convention, you can use one of the "fastq manifest" importers to import your data into QIIME 2.
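
As a rough way to tell which of these two cases applies, the Python sketch below pulls the sample ID out of a Casava 1.8-style filename. It assumes the five underscore-separated fields described in the importing tutorial (sample ID, barcode, lane, read direction, set number). The regular expression, the function name, and the example filenames are illustrative only and are not the code QIIME 2 uses.

```python
import re

# Assumed layout: <sample-id>_<barcode>_L<lane>_R<1|2>_<set>.fastq.gz
CASAVA_RE = re.compile(
    r"^(?P<sample_id>.+)_(?P<barcode>[^_]+)_L(?P<lane>\d+)"
    r"_R(?P<read>[12])_(?P<set>\d+)\.fastq\.gz$"
)


def casava_sample_id(filename):
    """Return the sample ID if the filename fits the assumed pattern, else None."""
    match = CASAVA_RE.match(filename)
    return match.group("sample_id") if match else None


# Hypothetical filenames, for illustration only.
print(casava_sample_id("KHH006-1_S1_L001_R1_001.fastq.gz"))  # KHH006-1
print(casava_sample_id("KHH006-1.fastq.gz"))                 # None
```

A filename that returns None here does not fit the assumed convention, which points toward the manifest route.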

To get a sense of the data formats QIIME 2 is expecting here, you might try downloading the example data from the importing tutorial I linked to above and playing around with that -- I often find this approach helpful when importing sequence data.

ebolyen · September 29, 2017, 7:36pm

QIIME 2 2017.9 now has improved validation for these formats, so it'll complain about the directory structure up front instead of erroring (or guessing your sample IDs incorrectly) in mysterious ways later on!
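
For anyone on an older release, a similar check can be done by hand before importing. The Python sketch below is only an illustration of the idea, not QIIME 2's actual validation code: it flags sub-directories and any file that does not end in .fastq.gz. The function name and the files created in the throwaway directory are hypothetical.

```python
import os
import tempfile


def find_import_problems(directory):
    """List entries that would break a flat, FASTQ-only import directory."""
    problems = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isdir(path):
            problems.append(f"sub-directory: {name}")
        elif not name.endswith(".fastq.gz"):
            problems.append(f"unexpected file: {name}")
    return problems


# Build a throwaway directory with one sub-directory, one FASTQ file,
# and one stray file, then run the check against it.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "KHH006-1"))
    open(os.path.join(tmp, "KHH006-2_S2_L001_R1_001.fastq.gz"), "w").close()
    open(os.path.join(tmp, "notes.txt"), "w").close()
    problems = find_import_problems(tmp)
    print(problems)
```

An empty list means the directory is flat and contains only .fastq.gz files; it says nothing about whether the filenames themselves follow the Casava 1.8 convention.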

KEDIR_HAMZA · September 30, 2017, 12:44pm

Thanks a lot @jairideout! The suggestion was very helpful and solved the problem. The sample names in the demux-paired-end.qzv summary and the sample IDs in the table.qzv summary now match.
@ebolyen I really appreciate the incorporation of this feature in QIIME 2 2017.9.

Many thanks to the QIIME team!!
