Plugin error from feature-table summarize

Craig_Ruaux · June 2, 2017, 4:50pm

Hi All,

I'm following the steps of the FMT tutorial, but using a subset of data from one of my own studies. At the feature-table summarize step, I am getting the following error (truncated for brevity):

Plugin error from feature-table:

"None of [Index(['S22-indexN719-B-S508-B-GCGTAGTA-CTAAGCCT-F3',\n
'S03-indexN716-B-S505-B-ACTCGCTA-GTAAGGAG-C1',\n
'S04-indexN716-B-S506-B-ACTCGCTA-ACTGCATA-D1',\n
'S18-indexN719-B-S503-B-GCGTAGTA-TATCCTCT-B3',\n
'S01-indexN716-B-S502-B-ACTCGCTA-CTCTCTAT-A1',\n

[...]

dtype='object')] are in the [index]"

These are the beginnings of the fastq file names from a MiSeq run, which I had imported into QIIME 2 using the steps for Casava 1.8 paired-end demultiplexed fastq laid out in the "Importing Data" tutorial. The only change made to these files is that they were renamed with a sample ID at the front of the file name (i.e. S01, S02, etc.). The sample IDs in the mapping file are also S01, S02, etc. The mapping file validates with Keemei with no problems.

In the full file name, there is also a sample ID towards the end of the name, as illustrated below for S01

S01-indexN716-B-S502-B-ACTCGCTA-CTCTCTAT-A1_S1_L001_R1_001.fastq.gz

The original files from the sequencing facility all started with a common lane designator, as below

lane1-s001-indexN716-B-S502-B-ACTCGCTA-CTCTCTAT-A1_S1_L001_R1_001.fastq.gz

Could this be a problem due to the renaming of the read files? Did I actually use the wrong import methodology?

I have analyzed the data from these files using QIIME 1 with no problems.

Thanks in advance for any suggestions or advice.

ebolyen · June 2, 2017, 5:05pm

Hey @Craig_Ruaux,

If you were to export your SampleData[SequencesWithQuality] artifact (the demuxed data) what does the MANIFEST file say? The IDs in that should match your mapping file. I suspect that they still have lane1 prefixed to the ID which is why nothing is matching.
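One way to run the check suggested above is to compare the IDs in the exported MANIFEST against the #SampleID column of the mapping file. The following is a minimal sketch, assuming the MANIFEST is a CSV file with a `sample-id` column; the helper name `compare_sample_ids` and the example MANIFEST contents are hypothetical, built from the ID and file name quoted in this thread.

```python
import csv
import io


def compare_sample_ids(manifest_text, mapping_ids):
    """Return (IDs only in the MANIFEST, IDs only in the mapping file)."""
    # Assumption: the MANIFEST is CSV with a 'sample-id' header column.
    reader = csv.DictReader(io.StringIO(manifest_text))
    manifest_ids = {row["sample-id"] for row in reader}
    mapping_ids = set(mapping_ids)
    return sorted(manifest_ids - mapping_ids), sorted(mapping_ids - manifest_ids)


# Hypothetical MANIFEST contents, using the ID from the error message.
manifest_text = (
    "sample-id,filename,direction\n"
    "S01-indexN716-B-S502-B-ACTCGCTA-CTCTCTAT-A1,"
    "S01-indexN716-B-S502-B-ACTCGCTA-CTCTCTAT-A1_S1_L001_R1_001.fastq.gz,forward\n"
)

only_in_manifest, only_in_mapping = compare_sample_ids(manifest_text, ["S01"])
print("Only in MANIFEST:", only_in_manifest)
print("Only in mapping file:", only_in_mapping)
```

If both lists are non-empty, as here, the two files are using different ID schemes, which would explain a "None of [...] are in the [index]" error.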

thermokarst · June 2, 2017, 5:16pm

Hi @Craig_Ruaux! I just wanted to point out that you can also view a list of the parsed Sample IDs by generating and viewing the demux summarize visualization: the bottom of the “Overview” tab has a section titled “Per-sample sequence counts.” An example of that can be seen here. Either solution for viewing the Sample IDs should work just fine. Thanks!

Craig_Ruaux · June 2, 2017, 8:58pm

Thank you both for your replies.

In the visualization summary the Sample IDs are listed as the first part of the file name, as given in the error log. I guess the issue is that in the mapping file, #SampleID is only the number of the sample, i.e. S01, S02, etc. I notice that the mapping file given for the FMT tutorial uses long sequences of numbers/letters for #SampleID, with a separate subject-id column to identify the patient(s). I don’t have my data set up exactly like that, as I don’t have repeated sampling of individuals, so S01 = subject-id 01.

Is it a requirement of QIIME 2 that the mapping file #SampleID field also be the file name of the sequences? I can see how that may be useful for tracking provenance, but it is also a little counterintuitive (to me at least).

I’ll remake my mapping file with #SampleID derived from the file names and a new field to identify the individuals and see what happens.

ebolyen · June 2, 2017, 9:11pm

That is almost correct. The Sample IDs don't need to match the filename, but the format the sequences are stored in encodes each Sample ID in the filename.

This format is just one way of storing your data, and we basically chose to match Casava's naming convention by default. So when you imported via the Casava format, it assumed your Sample IDs were the first section of each filename (like you've observed).
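To illustrate how the first section of a Casava 1.8 style filename ends up as the Sample ID, here is a rough sketch. It is an approximation for illustration, not QIIME 2's actual parser: it assumes names of the form `<sample>_<barcode>_L<lane>_R<read>_<set>.fastq.gz`, and the function name `casava_sample_id` is hypothetical.

```python
def casava_sample_id(filename):
    """Return the sample portion of a Casava 1.8 style fastq filename."""
    # Splitting four times from the right peels off the barcode, lane,
    # read, and set fields, leaving the sample portion intact even if it
    # contains hyphens.
    parts = filename.rsplit("_", 4)
    if len(parts) != 5:
        raise ValueError(f"not a Casava 1.8 style filename: {filename!r}")
    return parts[0]


# File name quoted earlier in this thread.
name = "S01-indexN716-B-S502-B-ACTCGCTA-CTCTCTAT-A1_S1_L001_R1_001.fastq.gz"
sample_id = casava_sample_id(name)
print(sample_id)                   # S01-indexN716-B-S502-B-ACTCGCTA-CTCTCTAT-A1
print(sample_id.split("-", 1)[0])  # S01
```

The first printed value is the ID that appears in the error message; the second is the short ID used in the mapping file, which is why the two did not match.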

You could also use one of the fastq manifest formats if you wanted to easily rename your samples (this will make your .qza store the new Sample IDs in the first section of the filenames).
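As a sketch of the manifest route, the snippet below generates manifest text that maps each fastq file to a short Sample ID. It assumes the CSV manifest layout with `sample-id`, `absolute-filepath`, and `direction` columns; check the current "Importing Data" documentation for the exact format your QIIME 2 release expects. The helper `build_manifest`, the `/data/run1` directory, and the R2 file name are hypothetical.

```python
import csv
import io
from pathlib import PurePosixPath


def build_manifest(fastq_paths):
    """Return manifest CSV text that assigns short Sample IDs to fastq files."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Assumed header for the CSV fastq manifest format.
    writer.writerow(["sample-id", "absolute-filepath", "direction"])
    for path in sorted(fastq_paths):
        name = PurePosixPath(path).name
        # Assumption: the short ID is everything before the first hyphen.
        short_id = name.split("-", 1)[0]
        direction = "forward" if "_R1_" in name else "reverse"
        writer.writerow([short_id, path, direction])
    return out.getvalue()


# Hypothetical absolute paths for one paired-end sample.
paths = [
    "/data/run1/S01-indexN716-B-S502-B-ACTCGCTA-CTCTCTAT-A1_S1_L001_R1_001.fastq.gz",
    "/data/run1/S01-indexN716-B-S502-B-ACTCGCTA-CTCTCTAT-A1_S1_L001_R2_001.fastq.gz",
]
manifest = build_manifest(paths)
print(manifest)
```

With a manifest like this, the imported data would carry S01, S02, and so on as Sample IDs, so the original mapping file could be used unchanged.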

Craig_Ruaux · June 2, 2017, 9:12pm

I did as indicated above: I changed #SampleID to the file names and added a new individual identifier field, and feature-table summarize ran to completion this time.

Thanks again for the suggestions.
