I have paired-end, demultiplexed data from Illumina GAIIx sequencing. Similar to above, the samples are spread across multiple lanes, and I plan to create a FeatureTable for each lane and then merge them. However, I am unsure how to import the data, as the file structure and names do not match the tutorials.
File structure and names: each lane folder contains a separate folder for each sample. Each sample folder contains files with names similar to s_2_1_sequence.txt, and the same file names are present for each sample (sample a bin 1 and sample b bin 2 have identically named files, presumably with different contents). I would also really rather not rename all of these files by hand.
What is the best way for me to import these files? (Casava?)
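One way to avoid renaming by hand might be to generate a Fastq Manifest with a short script. The following is only a sketch under several assumptions that are not confirmed above: that each sample folder sits directly inside the lane folder, that the folder name can serve as the sample ID, and that files follow the pattern s_<lane>_<read>_sequence.txt with read 1 as forward and read 2 as reverse. The function names are hypothetical, and the header row is the comma-separated manifest layout (sample-id, absolute-filepath, direction).

```python
import csv
import re
from pathlib import Path

# Assumed naming pattern: s_<lane>_<read>_sequence.txt,
# where <read> is 1 (forward) or 2 (reverse).
READ_FILE = re.compile(r"^s_(\d+)_([12])_sequence\.txt$")


def build_manifest_rows(lane_dir):
    """Collect (sample-id, absolute path, direction) for one lane folder.

    Assumes every sub-folder of lane_dir is one sample and that the
    folder name is a usable sample ID.
    """
    lane_dir = Path(lane_dir).resolve()
    rows = []
    for sample_dir in sorted(p for p in lane_dir.iterdir() if p.is_dir()):
        for fastq in sorted(sample_dir.iterdir()):
            match = READ_FILE.match(fastq.name)
            if match is None:
                continue  # skip anything that is not a read file
            direction = "forward" if match.group(2) == "1" else "reverse"
            rows.append((sample_dir.name, str(fastq), direction))
    return rows


def write_manifest(rows, manifest_path):
    """Write the rows as a comma-separated manifest with a header line."""
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sample-id", "absolute-filepath", "direction"])
        writer.writerows(rows)
```

Calling `write_manifest(build_manifest_rows("Lane1"), "lane1_manifest.csv")` would then give one manifest per lane (both names are placeholders), so the identically named files are told apart by their folder rather than by their file name.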
Thanks for your help! However, I was mistaken - these reads are multiplexed paired-end fastq files, and do not seem to be compatible with a Fastq Manifest import. I’m considering trimming in QIIME 1 (per “How to demultiplex fastq file that still includes Barcodes and LinkerPrimer?”) but that seems to leave me in the same position I’m in now. (The data structure will be difficult to deal with, and it does not seem that QIIME 1 has a Fastq Manifest function?)
SO: I altered my data structure, renamed all files, concatenated them, and renamed the file type so that the data matches the EMP protocol format. The issue I’m having now is that when I try to import, I get the following error: There was a problem importing SD_Lane1: SD_Lane1/barcodes.fastq.gz is not a(n) FastqGzFormat file: Header on line 5 is not FASTQ, records may be misaligned
I’ve checked the header and this is what I have:
SampleID BarcodeSequence LinkerPrimerSequence
BM1Na CGTGAT caagcagaagacggcatacgagat
BM1Fa ACATCG caagcagaagacggcatacgagat
BM2Na GCCTAA caagcagaagacggcatacgagat
BM2Fa TGGTCA caagcagaagacggcatacgagat
BM3Na CACTGT caagcagaagacggcatacgagat
BM3Fa ATTGGC caagcagaagacggcatacgagat
BM4Na GATCTG caagcagaagacggcatacgagat
BM4Fa TCAAGT caagcagaagacggcatacgagat
BM6Na CTGATC caagcagaagacggcatacgagat
Any thoughts? Should I add more info to my barcodes file?
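Since the error names line 5 of barcodes.fastq.gz rather than the table above, it may be worth checking that file directly. Below is a minimal sketch of such a check, assuming the usual four-line FASTQ record layout (a header starting with "@", the sequence, a separator starting with "+", then the qualities). The function names are hypothetical, and this is not the validation QIIME 2 itself performs.

```python
import gzip


def is_gzipped(path):
    """True if the file starts with the gzip magic bytes (0x1f 0x8b)."""
    with open(path, "rb") as handle:
        return handle.read(2) == b"\x1f\x8b"


def first_bad_record(path):
    """Return (line_number, text) for the first line that breaks the
    four-line FASTQ layout, or None if no such line is found.

    Only the '@' header and '+' separator lines are checked.
    """
    # Decide by content, not extension, in case a file was only renamed.
    opener = gzip.open if is_gzipped(path) else open
    with opener(path, "rt") as handle:
        for number, line in enumerate(handle, start=1):
            line = line.rstrip("\r\n")
            position = (number - 1) % 4
            if position == 0 and not line.startswith("@"):
                return number, line
            if position == 2 and not line.startswith("+"):
                return number, line
    return None
```

If `is_gzipped` returns False for a file named .fastq.gz, the file was probably renamed without being compressed; if `first_bad_record` returns a line number, that is where the records stop lining up, which can happen when concatenated pieces do not each end with a newline.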
gah - I was right the first time, and then misread. These are demultiplexed fastq files, and I have begun to import them following the Fastq Manifest tutorial, removing the adapter on the forward read using cutadapt as you suggested, and trimming. This all seems to be going fine.
However, it takes over 8 hours to import a lane. I think that this is because I am running an Ubuntu subsystem on Windows 10 Home, and I’ve read that this does not work well with QIIME 2. Is this correct? I’ve been using this setup on smaller data sets (a 454 project) but did not encounter this issue. I’ve already looked into using Docker, but that package seems to require Windows 10 Pro.