Using a manifest file to import data, or just to specify which files are the forward and reverse reads

Maybe I'm missing something, but when I import data before demuxing, I always have to copy my original files to forward.fastq.gz and reverse.fastq.gz.

I sometimes have multiple files for a library, and in that case it would be handy if one could use a manifest file to specify the files to load. One reason would be to easily exclude files with poor-quality sequences.

something like

qiime tools import --type MultiplexedPairedEndBarcodeInSequence --input-manifest ./manifest.tsv --output-path ./qiime_artifacts/xyz

where the manifest file would look something like

forward_reads                reverse_reads
<path>/F123_1_1.fastq.gz     <path>/F123_1_2.fastq.gz
<path>/F123_2_1.fastq.gz     <path>/F123_2_2.fastq.gz
<path>/F123_3_1.fastq.gz     <path>/F123_3_2.fastq.gz


Hello @fenny,

What do you do now instead when you have multiple multiplexed files?

We currently have to concatenate all forward files and all reverse files before feeding them to the import.
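The concatenation workaround can be scripted. Below is a minimal sketch, not part of QIIME 2: it assumes a whitespace-delimited manifest with the `forward_reads` and `reverse_reads` columns proposed above, paths without spaces, and the hypothetical helper names `read_manifest` and `concatenate_for_import`. It relies on the fact that gzip files can be joined byte for byte into a valid multi-member gzip file.

```python
import shutil
from pathlib import Path


def read_manifest(manifest_path):
    """Return (forward, reverse) path pairs from a whitespace-delimited manifest."""
    pairs = []
    with open(manifest_path) as handle:
        header = handle.readline().split()
        fwd_col = header.index("forward_reads")
        rev_col = header.index("reverse_reads")
        for line in handle:
            fields = line.split()
            if not fields:  # skip blank lines
                continue
            pairs.append((fields[fwd_col], fields[rev_col]))
    return pairs


def concatenate_for_import(manifest_path, out_dir):
    """Write forward.fastq.gz and reverse.fastq.gz by joining the listed files."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    pairs = read_manifest(manifest_path)
    with open(out_dir / "forward.fastq.gz", "wb") as fwd_out, \
         open(out_dir / "reverse.fastq.gz", "wb") as rev_out:
        for fwd, rev in pairs:
            # Append both files of a pair in the same order so that
            # forward and reverse reads stay in sync.
            with open(fwd, "rb") as src:
                shutil.copyfileobj(src, fwd_out)
            with open(rev, "rb") as src:
                shutil.copyfileobj(src, rev_out)
    return len(pairs)
```

Leaving a bad-quality file out of the import then only requires deleting its row from the manifest. The output directory can be passed to `qiime tools import` with `--input-path`.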

For large projects it would be handy to have a manifest file that includes a library code as well, e.g.:

forward_reads                reverse_reads               libraryNumber
<path>/F123_1_1.fastq.gz     <path>/F123_1_2.fastq.gz    L1
<path>/F123_2_1.fastq.gz     <path>/F123_2_2.fastq.gz    L1
<path>/F123_3_1.fastq.gz     <path>/F123_3_2.fastq.gz    L2
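To illustrate how such a manifest could be consumed, here is a small sketch that groups the file pairs by library. The column names are taken from the example above; the function name `group_by_library` is hypothetical, and nothing here is an existing QIIME 2 feature.

```python
from collections import defaultdict


def group_by_library(manifest_text):
    """Map each library code to its list of (forward, reverse) file pairs."""
    lines = [ln for ln in manifest_text.splitlines() if ln.strip()]
    header = lines[0].split()
    fwd_col = header.index("forward_reads")
    rev_col = header.index("reverse_reads")
    lib_col = header.index("libraryNumber")

    groups = defaultdict(list)
    for line in lines[1:]:
        fields = line.split()
        groups[fields[lib_col]].append((fields[fwd_col], fields[rev_col]))
    return dict(groups)


# The <path> placeholders are kept exactly as in the example manifest.
EXAMPLE_MANIFEST = """\
forward_reads                reverse_reads               libraryNumber
<path>/F123_1_1.fastq.gz     <path>/F123_1_2.fastq.gz    L1
<path>/F123_2_1.fastq.gz     <path>/F123_2_2.fastq.gz    L1
<path>/F123_3_1.fastq.gz     <path>/F123_3_2.fastq.gz    L2
"""

for library, pairs in group_by_library(EXAMPLE_MANIFEST).items():
    print(library, len(pairs), "file pair(s)")
```

Each group could then be imported and demultiplexed separately, one library at a time.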

As far as I'm aware, demuxing should be done on a per-library basis, since barcodes may be shared among libraries. In the example below, Sample_1 and Sample_3 (and likewise Sample_2 and Sample_4) would get mixed in the current setup.

With library awareness, the samples could be demultiplexed correctly based on the file manifest and the demultiplexing sample sheet, provided the sample sheet contains the library number:

#SampleID    forwardBarcodeSequence    reverseBarcodeSequence    LibraryNumber
Sample_1     AACCAGAA                  AACCAGAA                  L1
Sample_2     AACCATGC                  AACCATGC                  L1
Sample_3     AACCAGAA                  AACCAGAA                  L2
Sample_4     AACCATGC                  AACCATGC                  L2
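The ambiguity in this sample sheet can be made concrete with a short sketch. It assumes the four whitespace-separated columns shown above; the helper names are hypothetical. Keyed on barcodes alone, several samples collide, while adding the library number to the key makes every entry unique.

```python
from collections import defaultdict

SAMPLE_SHEET = """\
#SampleID\tforwardBarcodeSequence\treverseBarcodeSequence\tLibraryNumber
Sample_1\tAACCAGAA\tAACCAGAA\tL1
Sample_2\tAACCATGC\tAACCATGC\tL1
Sample_3\tAACCAGAA\tAACCAGAA\tL2
Sample_4\tAACCATGC\tAACCATGC\tL2
"""


def parse_sample_sheet(text):
    """Return (sample, forward_barcode, reverse_barcode, library) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header
        sample, fwd, rev, library = line.split()
        rows.append((sample, fwd, rev, library))
    return rows


def shared_barcodes(rows):
    """Barcode pairs that map to more than one sample when libraries are ignored."""
    by_barcode = defaultdict(list)
    for sample, fwd, rev, _library in rows:
        by_barcode[(fwd, rev)].append(sample)
    return {bc: names for bc, names in by_barcode.items() if len(names) > 1}


def library_aware_lookup(rows):
    """Map (library, forward_barcode, reverse_barcode) to exactly one sample."""
    lookup = {}
    for sample, fwd, rev, library in rows:
        key = (library, fwd, rev)
        if key in lookup:
            raise ValueError(f"duplicate barcode pair within library {library}")
        lookup[key] = sample
    return lookup


rows = parse_sample_sheet(SAMPLE_SHEET)
print(shared_barcodes(rows))
print(library_aware_lookup(rows)[("L2", "AACCAGAA", "AACCAGAA")])
```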

A setup like this would make it possible to analyze, in one automated run, many samples that were sequenced over a longer period of time.

Hi @fenny,
Thanks for your great description!
Would you mind opening an issue on our GitHub repository, qiime2/q2-metadata? That way this will stay on our radar for when we have time to implement it.
Thank you again!