I’m attempting to import demultiplexed sequence data from QIIME 1.9.1’s split_libraries_fastq.py, either the seqs.fna or the seqs.fastq. I’ve taken a few stabs at qiime tools import but haven’t figured out anything yet. Is this type of import possible at this time?
I’d like to import at this stage because I’m sourcing data from Qiita, where the data have already been demultiplexed. It is possible to go back to the raw files, but I’d prefer to avoid that. I’m not sourcing precomputed BIOM tables because I want to run the data through both q2-deblur and q2-dada2.
I had an out-of-band chat with @ebolyen and it sounds like this may not be currently possible.
As a stop-gap, below is a simple script which takes an existing seqs.fastq file from split_libraries_fastq.py and creates a directory which can be imported as a demux object. IMPORTANT: this assumes your data are Illumina 1.8 (i.e., PHRED+33; most likely the case) and forward reads only. The script can also consume seqs.fasta; however, at this time there is no SequencesWithoutQuality property for the SampleData semantic type, so the FASTA output cannot yet be imported this way.
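Because the PHRED+33 assumption is hard-coded, it may be worth a quick look at the quality lines before converting. Below is a minimal sketch, assuming a FASTQ file with exactly four lines per record. The function name min_quality_ascii is hypothetical and not part of QIIME. It prints the lowest quality character code found in the first 10000 records.

```shell
# Hypothetical helper: print the lowest ASCII code found on the quality
# lines (every 4th line) of the first 10000 records of a FASTQ file.
# Assumes exactly four lines per record.
min_quality_ascii() {
    awk 'NR % 4 == 0 { print } NR >= 40000 { exit }' "$1" \
        | od -An -v -tu1 \
        | awk '{
              # Codes below 33 (newline, carriage return) are not quality scores.
              for (i = 1; i <= NF; i++)
                  if ($i + 0 >= 33 && (min == "" || $i + 0 < min))
                      min = $i + 0
          }
          END { print min }'
}
```

Running min_quality_ascii on your seqs.fastq and getting a value below 59 means the file is PHRED+33, since no +64 encoding uses characters that low. A minimum of 64 or higher suggests PHRED+64 (though uniformly high-quality PHRED+33 data could look the same), in which case the phred-offset written by the script would need changing.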
To use the script:
$ # switch to an environment with QIIME 1.9.1
$ bash make_importable.sh <path/to/your/seqs.fastq>
$ # switch to a QIIME2 environment
$ qiime tools import --input-path q2_importable --type 'SampleData[SequencesWithQuality]' --output-path demux
The resulting demux.qza can be fed into either q2-deblur or q2-dada2.
The make_importable.sh script is below.
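#!/bin/bash
# Convert split_libraries_fastq.py output into a directory that
# qiime tools import accepts. Needs QIIME 1.9.1 on the PATH.
set -e

input=$1
mkdir -p q2_importable

# Pick the file type from the extension; anything else is treated as FASTQ.
# (.fna is what split_libraries_fastq.py names its FASTA output.)
case "${input}" in
    *.fasta|*.fna) format=fasta ;;
    *) format=fastq ;;
esac

# Write one file per sample ID into q2_importable.
split_sequence_file_on_sample_ids.py -i "${input}" --file_type "${format}" -o q2_importable

pushd q2_importable > /dev/null
echo "sample-id,filename,direction" > MANIFEST

if [[ ${format} == 'fastq' ]]; then
    # this is not universally true, but likely is accurate
    echo "{phred-offset: 33}" > metadata.yml
fi

for f in *."${format}"
do
    # there are filename expectations, so we need to munge to conform
    filename="${f}_IGNORED_L000_R1_001.${format}"
    mv "${f}" "${filename}"
    gzip "${filename}"
    echo "$(basename "${f}" ".${format}"),${filename}.gz,forward" >> MANIFEST
done
popd > /dev/null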
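Before switching environments for the import, it can be worth confirming that the MANIFEST and the renamed files agree. Here is a small sketch. check_manifest is a hypothetical helper, not part of QIIME, and it assumes the three-column, comma-separated MANIFEST with a single header line that the script above writes.

```shell
# Hypothetical helper: verify that every file named in MANIFEST exists.
# Prints a one-line summary; returns non-zero if any file is missing.
check_manifest() {
    dir=$1
    count=0
    missing=0
    {
        # Skip the "sample-id,filename,direction" header line.
        read -r header
        while IFS=, read -r sample filename direction; do
            count=$((count + 1))
            if [ ! -f "${dir}/${filename}" ]; then
                echo "missing: ${filename} (sample ${sample})" >&2
                missing=$((missing + 1))
            fi
        done
    } < "${dir}/MANIFEST"
    echo "${count} entries, ${missing} missing"
    [ "${missing}" -eq 0 ]
}
```

Running check_manifest q2_importable should report zero missing files before qiime tools import is attempted. Any file it cannot find is listed on stderr.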