Importing demultiplexed sequence data from QIIME 1.9.1

wasade · February 9, 2017, 9:44pm

I'm attempting to import demultiplexed sequence data produced by QIIME 1.9.1's split_libraries_fastq.py, either seqs.fna or seqs.fastq. I've taken a few stabs at qiime tools import but haven't figured anything out yet. Is this type of import possible at this time?

I'd like to import at this stage because I'm sourcing data from Qiita, where the data have already been demultiplexed. It is possible to go back to the raw files, but I'd prefer to avoid that if possible. I'm not using the precomputed BIOM tables because I want to run the data through both q2-deblur and q2-dada2.

wasade · February 10, 2017, 12:20am

I had an out-of-band chat with @ebolyen and it sounds like this may not be currently possible.

As a stop-gap, below is a simple script that takes an existing seqs.fastq file from split_libraries_fastq.py and creates a directory that can be imported as a demux artifact. IMPORTANT: this assumes your data are Illumina 1.8 (i.e., PHRED+33, which is most likely the case) and forward reads only. The script can also consume seqs.fasta; however, there is currently no SequencesWithoutQuality property for the SampleData semantic type, so FASTA output can't be imported this way yet.
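Since the script hard-codes PHRED+33, it may be worth checking the offset before importing. Below is a rough heuristic sketch, not part of the original script; the function name guess_phred_offset is hypothetical. It assumes a plain 4-line-per-record FASTQ (no wrapped lines, as QIIME 1 writes) and looks at the lowest quality character: anything below ';' (ASCII 59) can only occur with PHRED+33, while a minimum of '@' (ASCII 64) or above hints at PHRED+64.

```shell
#!/usr/bin/env bash
# Heuristic PHRED offset check for a 4-line-per-record FASTQ (a sketch, not a
# validator). Prints 33, "64?", "ambiguous", or "unknown".
set -euo pipefail

guess_phred_offset() {
    awk 'BEGIN {
             # Build a character -> ASCII code lookup for printable characters.
             for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i
             min = 999
         }
         NR % 4 == 0 {
             # Every 4th line is a quality string; track the lowest code seen.
             for (j = 1; j <= length($0); j++) {
                 c = ord[substr($0, j, 1)]
                 if (c && c < min) min = c
             }
         }
         END {
             if (min == 999)     print "unknown"
             else if (min < 59)  print 33
             else if (min >= 64) print "64?"
             else                print "ambiguous"
         }' "$1"
}

# Allow direct use as a script: bash guess_phred.sh seqs.fastq
if [ "$#" -eq 1 ]; then
    guess_phred_offset "$1"
fi
```

For typical Illumina 1.8+ data, guess_phred_offset seqs.fastq should print 33; any other answer suggests the PHRED+33 assumption deserves a closer look before importing.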

To use:

$ # switch to an environment with QIIME 1.9.1
$ bash make_importable.sh <path/to/your/seqs.fastq>
$ # switch to a QIIME2 environment
$ qiime tools import --input-path q2_importable --type 'SampleData[SequencesWithQuality]' --output-path demux

The resulting demux.qza can be fed into either q2-deblur or q2-dada2.
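Before running the import, a quick sanity check of q2_importable can catch a broken MANIFEST early. This is a hedged sketch (check_manifest is a hypothetical helper, not a QIIME tool); it only checks the three columns the script writes: that every listed file exists and that every row says forward.

```shell
#!/usr/bin/env bash
# Sanity-check a directory produced by make_importable.sh: every MANIFEST row
# (sample-id,filename,direction) must point at an existing file and be "forward".
set -euo pipefail

check_manifest() {
    local dir=$1
    local manifest="$dir/MANIFEST"
    local n=0 bad=0
    local sample_id filename direction
    [ -f "$manifest" ] || { echo "missing $manifest" >&2; return 1; }
    # tail skips the header line; each remaining row is split on commas.
    while IFS=, read -r sample_id filename direction; do
        n=$((n + 1))
        if [ ! -f "$dir/$filename" ]; then
            echo "sample $sample_id: $filename not found" >&2
            bad=$((bad + 1))
        fi
        if [ "$direction" != "forward" ]; then
            echo "sample $sample_id: unexpected direction '$direction'" >&2
            bad=$((bad + 1))
        fi
    done < <(tail -n +2 "$manifest")
    echo "$n samples, $bad problems"
    # Non-zero exit status if anything looked wrong.
    [ "$bad" -eq 0 ]
}
```

For example, check_manifest q2_importable should report the expected number of samples and 0 problems before you run qiime tools import.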

The make_importable.sh script is below.

#!/bin/bash
# uses bash features ([[ ]], ${var: -6}, pushd), so run with bash, not sh
set -e

input=$1
mkdir -p q2_importable

# split_sequence_file_on_sample_ids.py needs to know the input format
if [[ ${input: -6} == ".fasta" ]];
then
    format=fasta
else
    format=fastq
fi

# write one file per sample into q2_importable
split_sequence_file_on_sample_ids.py -i "${input}" --file_type ${format} -o q2_importable

pushd q2_importable > /dev/null

echo "sample-id,filename,direction" > MANIFEST

if [[ ${format} == 'fastq' ]];
then
    # this is not universally true, but likely is accurate
    echo "{phred-offset: 33}" > metadata.yml
fi

for f in *.${format}
do
    # there are filename expectations, so we need to munge to conform
    filename=${f}_IGNORED_L000_R1_001.${format}
    mv "${f}" "${filename}"
    gzip "${filename}"
    echo "$(basename "${f}" .${format}),${filename}.gz,forward" >> MANIFEST
done
popd > /dev/null
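To see what the renaming loop will produce without moving or compressing anything, here is a dry-run sketch (preview_renames is a hypothetical helper, not part of the script). It mirrors the script's naming, `${f}_IGNORED_L000_R1_001.<format>.gz`, and prints the MANIFEST row each per-sample file would get.

```shell
#!/usr/bin/env bash
# Dry run of the renaming loop in make_importable.sh: print the MANIFEST row
# each per-sample file would get, without touching any files.
set -euo pipefail

preview_renames() {
    local dir=$1 format=${2:-fastq}
    local f sample_id
    for f in "$dir"/*."$format"; do
        [ -e "$f" ] || continue          # no matches: the glob stays literal
        f=$(basename "$f")
        sample_id=$(basename "$f" ".$format")
        echo "$sample_id,${f}_IGNORED_L000_R1_001.${format}.gz,forward"
    done
}
```

Running preview_renames q2_importable fastq right after split_sequence_file_on_sample_ids.py (before the loop runs) shows, for a sample file S1.fastq, the row S1,S1.fastq_IGNORED_L000_R1_001.fastq.gz,forward.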

thermokarst · March 15, 2018, 12:50pm

An off-topic reply has been split into a new topic: Importing seqs.fna

Please keep replies on-topic in the future.