Identifying Data Format

First time user here.

I want to preface this by saying that the tutorials are wonderful; I have learned a lot by following the Moving Pictures tutorial, the Importing Data Tutorial, and the Atacama Soil Microbiome tutorial. Each was well done and very helpful. And each worked perfectly as I ran through each demonstration; they filled me with confidence to tackle my own data. …which didn’t go well…at all.

  • Version of QIIME 2 you are running, and how it is installed (e.g. Virtualbox, conda, etc.)

       Running version 2019.4 on Virtualbox
    

-File name format: RG01-Woods0104_R1_001.fastq.gz
RG01-Woods0104_R2_001.fastq.gz

-Data format:
@GWNJ-0478:651:GW1907152308th:2:1101:6263:2083 1:N:0:GCAGAAGA+GGCTCAGA
CGATCCCTACGGGTGGCAGCAGTGGGGAATCTTAGACAATGGGGGAAACCCTGATCTAGCCATGCCGCGTGAGCGATGAAGGCCTTAGGGTTGTAAAGCTCTTTCGTGGGGGAAGATAATGACTGTACCCCAAGAAGAAGCCCCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGGGCTAGCGTTGTTCGGAATTACTGGGCGTAAAGCGCACGTAGGCGGACCGGAAAGTCAGAGGTGAAATC
+
DDDDDIIIIIIIIIIIIIIIIIHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHIIIIHIIIIIIIIIIIIHHHIIGHIIIIIIIIHIIHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHIIIIIIIHIIHHHHHIIIIIIIIIIIHIIIIIIGHFHIIIIHIIHIIIIIIIIIIIIIIIGHIIIIIIIIHIIB

  • Based on the descriptions in the Importing Data Tutorial, I believe I have Casava 1.8 paired-end demultiplexed fastq, but I am hoping to get confirmation of that. I have also attempted importing as a Fastq Manifest format...but that went very badly, so I am back to Casava 1.8.

  • What is the exact command or commands you ran? Copy and paste please.

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path casava-18-paired-end-demultiplexed \
  --input-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path demux-paired-end.qza

  • What is the exact error message?

    There was a problem importing casava-18-paired-end-demultiplexed:

    Missing one or more files for CasavaOneEightSingleLanePerSampleDirFmt: '.+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz'

Any direction you can provide would be greatly appreciated.


Hi @rogergold,

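This is not Casava 1.8 format. The error message shows the pattern the importer is looking for and not finding, which means it can't parse out information it needs, such as the sample ID. Your file names (e.g. RG01-Woods0104_R1_001.fastq.gz) have no lane field such as _L001 before _R1, so they cannot match that pattern.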
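A quick way to see the mismatch is to test a file name against the pattern from the error message. This is only a sketch: anchoring the pattern with ^ and $ is an assumption about how the importer applies it, and the second file name is hypothetical (the real name with a barcode field and a lane number added).

```shell
# Pattern copied from the error message; the ^ and $ anchors are an assumption.
pattern='^.+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz$'

# Succeeds (exit status 0) when the file name matches the pattern.
matches_casava() {
  printf '%s\n' "$1" | grep -Eq "$pattern"
}

# First name is from the post; second is a hypothetical Casava-style variant.
for name in RG01-Woods0104_R1_001.fastq.gz RG01-Woods0104_S1_L001_R1_001.fastq.gz; do
  if matches_casava "$name"; then
    echo "$name: matches"
  else
    echo "$name: does not match"
  fi
done
```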

A manifest would be the most appropriate way to import your data, so let's start there: tell us what you tried and the error message you got.


Thank you for your reply. Following your advice I am adding information about my attempt to import the data as a manifest.

  1. The manifest file was created in Google Sheets with three columns (sample-id, forward-absolute-filepath, reverse-absolute-filepath) and filled in with the appropriate information. I then validated the file as a QIIME 2 metadata file using the Keemei add-on, and saved it as pe-33-manifest.tsv by downloading it as a .tsv file. This was then uploaded into the parent directory in the terminal.

Here is a link to the pe-33-manifest file: https://drive.google.com/open?id=1ybBemXkxVd9GzxaDGXvYDQriP90R3zp5

  2. The following command was used:

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path pe-33-manifest \
  --output-path paired-end-demux.qza \
  --input-format PairedEndFastqManifestPhred33V2

  3. The following error report was received (remember, I warned you: it is quite scary):

There was a problem importing pe-33-manifest:

pe-33-manifest is not a(n) PairedEndFastqManifestPhred33V2 file:

Found unrecognized ID column name '@font-face{font-family:'Roboto';font-style:italic;font-weight:400;src:local('Roboto Italic'),local('Roboto-Italic'),url(//fonts.gstatic.com/s/roboto/v18/KFOkCnqEu92Fr1Mu51xIIzc.ttf)format('truetype');}@font-face{font-family:'Roboto';font-style:normal;font-weight:300;src:local('Roboto Light'),local('Roboto-Light'),url(//fonts.gstatic.com/s/roboto/v18/KFOlCnqEu92Fr1MmSU5fBBc9.ttf)format('truetype');}@font-face{font-family:'Roboto';font-style:normal;font-weight:400;src:local('Roboto Regular'),local('Roboto-Regular'),url(//fonts.gstatic.com/s/roboto/v18/KFOmCnqEu92Fr1Mu4mxP.ttf)format('truetype');}@font-face{font-family:'Roboto';font-style:normal;font-weight:700;src:local('Roboto Bold'),local('Roboto-Bold'),url(//fonts.gstatic.com/s/roboto/v18/KFOlCnqEu92Fr1MmWUlfBBc9.ttf)format('truetype');}pe-33-manifest.tsv - Google Drive;this.gbar={CONFIG:[[[0,"www.gstatic.com","og.qtm.en_US.BJnnYjmnrk0.O","com","en","25",0,[4,2,".40.40.40.40.40.40.","","1300102,3700326","258077665","0"],null,"Bkg2XfbTJ4aPtAaJv6TQDQ",null,0,"og.qtm.11g357o8skakg.L.X.O","AA2YrTtujFLE_agOX4XQ6pkcmKh-1gCckg","AA2YrTtH1K0x-XRIz8VugsQA3BVE8zV85g","",2,1,200,"USA",null,null,"25","25",1],null,null,null,[0,0,0,null,"","","",""],[0,0,"",1,0,0,0,0,0,0,0,0,0,null,0,0,null,null,0,0,0,"","","","","","",null,0,0,0,0,0,null,null,null,"rgba(32,33,36,1)","rgba(255,255,255,1)",0,0],null,null,["1","gci_91f30755d6a6b787dcc2a4062e6e9824.js","googleapis.client:plusone:gapi.iframes","","en"],null,null,null,null,["m;//scs/abc-static//js/k=gapi.gapi.en.JNa9MntajDY.O/d=1/rs=AHpOoo_db4DX0hhorP4qsjM6Ki5qzOgeUA/m=features","https://apis.google.com","","","","",null,1,"es_plusone_gc_20190630.0_p0","en",null,0],[0.009999999776482582,"com","25",[null,"","w",null,1,5184000,1,0,"",0,1,"",0,0,0,0,0,0],null,[["","","0",0,0,-1]],null,0,null,null,["5061451","google\\.(com|ru|ca|by|kz|com\\.mx|com\\.tr)$",1]],[1,1,0,27043,25,"USA","en","258077665.0",8,0.009999999776482582,0,0,null,null,0,0,"",null,null,
1,"Bkg2XfbTJ4aPtAaJv6TQDQ"],[[null,null,null,"https://www.gstatic.com/og/_/js/k=og.qtm.en_US.BJnnYjmnrk0.O/rt=j/m=q_d,qmutsd/exm=qaaw,qabr,qadd,qaid,qebr,qein,qhaw,qhbr,qhch,qhga,qhid,qhin,qhpr/d=1/ed=1/rs=AA2YrTtujFLE_agOX4XQ6pkcmKh-1gCckg"]],null,null,[""]]],};this.gbar=this.gbar_||{};(function(_){var window=this;' while searching for header. The first column name in the header defines the ID column, and must be one of these values:

Case-insensitive: 'feature id', 'feature-id', 'featureid', 'id', 'sample id', 'sample-id', 'sampleid'

Case-sensitive: '#OTU ID', '#OTUID', '#Sample ID', '#SampleID', 'sample_name'

There may be more errors present in the metadata file. To get a full report, sample/feature metadata files can be validated with Keemei: https://keemei.qiime2.org

Find details on QIIME 2 metadata requirements here: Metadata in QIIME 2 — QIIME 2 2019.4.0 documentation

Thinking that hidden information may have been carried over when I was copying and pasting, or that I was missing some important bit of formatting, I made another attempt at creating a manifest file by downloading the pe-64-manifest file from the Importing Data Tutorial and replacing the provided information with my own sample names and absolute filepaths. This manifest file produced the same error message.
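For reference, a manifest with the three columns named above can also be generated directly in the terminal, which avoids any copy-and-paste or download step. The following is a minimal sketch, not an official QIIME 2 tool: the function name make_manifest is hypothetical, and it assumes every forward read ends in _R1_001.fastq.gz, has a matching _R2_001.fastq.gz beside it, and that the text before _R1 is a usable sample ID.

```shell
# Print a tab-separated paired-end manifest for the reads in directory $1.
# Assumption: file names look like <sample>_R1_001.fastq.gz / <sample>_R2_001.fastq.gz.
make_manifest() {
  dir=$(cd "$1" && pwd) || return 1        # absolute path to the reads directory
  printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n'
  for fwd in "$dir"/*_R1_001.fastq.gz; do
    [ -e "$fwd" ] || continue              # no forward reads found: skip the literal glob
    base=$(basename "$fwd" _R1_001.fastq.gz)
    rev="$dir/${base}_R2_001.fastq.gz"
    if [ ! -e "$rev" ]; then
      echo "missing reverse read for $base" >&2
      continue
    fi
    printf '%s\t%s\t%s\n' "$base" "$fwd" "$rev"
  done
}
```

It could be used as `make_manifest raw-reads > pe-33-manifest.tsv`, where raw-reads is a placeholder for the directory holding the fastq.gz files.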

Again, I sincerely appreciate any direction you can provide.

Holy mackerel! You weren't kidding! It looks like your manifest file is in HTML format for some reason. Could you please send the pe-33-manifest.tsv that you are using in your command (not just a link to the Google spreadsheet)? I want to confirm that something is going wrong between downloading and importing.


Your download client is messing this up for you. Take a look at the contents of the downloaded manifest for supporting evidence:

head pe-33-manifest
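The same inspection can be scripted so it runs before every import. This is a rough sketch under stated assumptions: check_manifest is a hypothetical helper, it only looks at the first header field and at a few HTML markers in the first 2048 bytes, and it accepts only the sample-id spelling of the ID column.

```shell
# Report whether a file's first line looks like a manifest header or like a web page.
# Returns 0 only when the first tab-separated field is exactly "sample-id".
check_manifest() {
  first_field=$(head -n 1 "$1" | cut -f1)
  if [ "$first_field" = "sample-id" ]; then
    echo "ok: header starts with sample-id"
  elif head -c 2048 "$1" | grep -qiE '<html|<!doctype|@font-face'; then
    echo "not a manifest: looks like a web page"
    return 1
  else
    echo "not a manifest: unexpected first column"
    return 1
  fi
}
```

Running `check_manifest pe-33-manifest` on a file that was saved from a Google Drive web page should report the second case.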



Gentlemen, you are amazing! Thank you for all your help with this. Because I was generating the manifest file in Google Drive, I had simply been storing the file there and downloading using:

curl -sL "https://drive.google.com/open?id=1Cj9o4domhaGOo7IWcH7TXvcL91gXXQUT" > "pe-33-manifest"

…but you are absolutely correct: the file was being messed up in the process. When I copied the manifest file to the shared folder on the Desktop and then moved it to the home directory from there, the subsequent import was successful!

This is all a new experience for me and I am sure to encounter every hurdle in the book, but thanks to the forum archives and your help, I really believe I will get to where I need to be! THANK YOU!
