I have 17 samples that I want to analyze in 3 groups or experiments (ExpA = 4 samples, ExpB = 8 samples and ExpC = 5 samples). For now I'd like to work with the ExpA samples (4 samples).
The barcodes of these 4 samples are:
GAGTAGAC
GAGTAGTG
GAGTCACT
GAGTCAGA
I got them from the mapping file provided by the sequencing service.
The sequences come from an Illumina MiSeq run and are paired-end, so I have 2 files containing the sequences from all 17 samples!!! That is why I want to extract the 4 samples from ExpA.
SAM1-17_S5_L001_R1_001.fastq and SAM1-17_S5_L001_R2_001.fastq are 2.1 GB each. I got these 2 files from Illumina BaseSpace.
I extracted the ExpA sequences from the SAM..... R1 and SAM.....R2 files using the barcodes listed above.
AHHHH!!! Now I realize!!! The extracts aren't in .fastq format!! They are only sequences... Sorry...
How can I get the barcode .gz file from the SAM....R1 and SAM....R2 files, so that I can go ahead?
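One way to keep the extracted reads in FASTQ format is to filter whole 4-line records instead of bare sequences, writing R1 and R2 of each pair together. Below is a minimal Python sketch, assuming the barcodes are inline at the start of each R1 read and that the headers are Casava 1.8+ style (R1 and R2 share the first header token). If the sequencing service put the index in the FASTQ header instead, the match would have to be done on the header. The function names are hypothetical.

```python
import gzip
from itertools import islice

# ExpA barcodes taken from the mapping file
EXPA_BARCODES = {"GAGTAGAC", "GAGTAGTG", "GAGTCACT", "GAGTCAGA"}


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a FASTQ text handle."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) < 4 or not record[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        yield tuple(record)


def filter_pairs(r1_in, r2_in, r1_out, r2_out, barcodes, bc_len=8):
    """Write every read pair whose R1 starts with one of `barcodes`.

    ASSUMPTION: barcodes are inline at the 5' end of R1 and are bc_len long.
    """
    kept = 0
    for rec1, rec2 in zip(read_fastq(r1_in), read_fastq(r2_in)):
        # Casava 1.8+ headers: R1 and R2 share the first token (the read ID)
        if rec1[0].split()[0] != rec2[0].split()[0]:
            raise ValueError(f"R1/R2 out of sync: {rec1[0]} vs {rec2[0]}")
        if rec1[1][:bc_len] in barcodes:
            r1_out.write("\n".join(rec1) + "\n")
            r2_out.write("\n".join(rec2) + "\n")
            kept += 1
    return kept


def filter_files(r1_path, r2_path, out_prefix, barcodes=EXPA_BARCODES):
    """Filter two uncompressed FASTQ files into gzipped R1/R2 outputs."""
    with open(r1_path) as r1, open(r2_path) as r2, \
            gzip.open(out_prefix + "_R1.fastq.gz", "wt") as o1, \
            gzip.open(out_prefix + "_R2.fastq.gz", "wt") as o2:
        return filter_pairs(r1, r2, o1, o2, barcodes)
```

For example, `filter_files("SAM1-17_S5_L001_R1_001.fastq", "SAM1-17_S5_L001_R2_001.fastq", "ExpA")` would write `ExpA_R1.fastq.gz` and `ExpA_R2.fastq.gz`, keeping quality lines intact so downstream tools still see valid FASTQ.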
I tried the QIIME 1 command:
extract_barcodes.py -f inseqs_R1.fastq -r inseqs_R2.fastq -c barcode_paired_end --bc1_len 6 --bc2_len 6 -o processed_seqs
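As I understand the QIIME 1 documentation, `-c barcode_paired_end` takes the first `--bc1_len` bases of R1 and the first `--bc2_len` bases of R2, writes them concatenated as one read to `barcodes.fastq`, and writes the remaining sequence to `reads1.fastq`/`reads2.fastq`. The traceback, however, suggests the installed scikit-bio was compiled against a different NumPy version, so the script fails before reading any data. As a stopgap, that step can be approximated in plain Python. This is a hypothetical re-implementation, not QIIME's code. Note that the ExpA barcodes listed above are 8 nt long, while the command uses `--bc1_len 6 --bc2_len 6`, so the lengths should be checked against the barcode design.

```python
def fastq_records(handle):
    """Group a FASTQ text handle into (header, seq, plus, qual) tuples.

    Note: a truncated final record (fewer than 4 lines) is silently dropped.
    """
    lines = (line.rstrip("\n") for line in handle)
    return zip(lines, lines, lines, lines)


def extract_paired_barcodes(r1_in, r2_in, bc_out, r1_out, r2_out,
                            bc1_len=6, bc2_len=6):
    """Approximate QIIME 1's barcode_paired_end mode (hypothetical sketch).

    Barcode read = first bc1_len bases of R1 + first bc2_len bases of R2;
    the barcode bases are trimmed from the reads that are written out.
    """
    n = 0
    for (h1, s1, p1, q1), (h2, s2, p2, q2) in zip(fastq_records(r1_in),
                                                  fastq_records(r2_in)):
        # Barcode record takes the R1 header and concatenated bases/qualities
        bc_out.write(f"{h1}\n{s1[:bc1_len]}{s2[:bc2_len]}\n+\n"
                     f"{q1[:bc1_len]}{q2[:bc2_len]}\n")
        r1_out.write(f"{h1}\n{s1[bc1_len:]}\n{p1}\n{q1[bc1_len:]}\n")
        r2_out.write(f"{h2}\n{s2[bc2_len:]}\n{p2}\n{q2[bc2_len:]}\n")
        n += 1
    return n
```

Opening `bc_out` with `gzip.open("barcodes.fastq.gz", "wt")` would give a gzipped barcodes file directly.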
And the result was:
Traceback (most recent call last):
  File "/home/rosana/miniconda3/envs/qiime1/bin/extract_barcodes.py", line 4, in <module>
    __import__('pkg_resources').run_script('qiime==1.9.1', 'extract_barcodes.py')
  File "/home/rosana/miniconda3/envs/qiime1/lib/python2.7/site-packages/pkg_resources/__init__.py", line 658, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/rosana/miniconda3/envs/qiime1/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1438, in run_script
    exec(code, namespace, namespace)
  File "/home/rosana/miniconda3/envs/qiime1/lib/python2.7/site-packages/qiime-1.9.1-py2.7.egg-info/scripts/extract_barcodes.py", line 12, in <module>
    from skbio.util import create_dir
  File "/home/rosana/miniconda3/envs/qiime1/lib/python2.7/site-packages/skbio/__init__.py", line 15, in <module>
    import skbio.io
  File "/home/rosana/miniconda3/envs/qiime1/lib/python2.7/site-packages/skbio/io/__init__.py", line 309, in <module>
    import_module('skbio.io.clustal')
  File "/home/rosana/miniconda3/envs/qiime1/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
  File "/home/rosana/miniconda3/envs/qiime1/lib/python2.7/site-packages/skbio/io/clustal.py", line 123, in <module>
    from skbio.alignment import Alignment
  File "/home/rosana/miniconda3/envs/qiime1/lib/python2.7/site-packages/skbio/alignment/__init__.py", line 230, in <module>
    from ._alignment import Alignment, SequenceCollection, StockholmAlignment
  File "/home/rosana/miniconda3/envs/qiime1/lib/python2.7/site-packages/skbio/alignment/_alignment.py", line 21, in <module>
    from skbio.stats.distance import DistanceMatrix
  File "/home/rosana/miniconda3/envs/qiime1/lib/python2.7/site-packages/skbio/stats/__init__.py", line 45, in <module>
    from ._subsample import subsample, subsample_counts, isubsample
  File "/home/rosana/miniconda3/envs/qiime1/lib/python2.7/site-packages/skbio/stats/_subsample.py", line 22, in <module>
    from .__subsample import _subsample_counts_without_replacement
  File "__init__.pxd", line 155, in init skbio.stats.__subsample (skbio/stats/__subsample.c:3964)
ValueError: numpy.dtype has the wrong size, try recompiling
What can I do to demultiplex the 17 samples while still being able to work on 3 independent experiments of 4, 8 and 5 samples each?
I'd really appreciate your help.
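One option is to demultiplex the whole run once and then group samples by experiment using the mapping file. Below is a sketch, assuming a hypothetical extra column named `Experiment` is added to the tab-separated QIIME mapping file alongside the standard `#SampleID` and `BarcodeSequence` columns. The resulting barcode sets can serve as the filter when splitting the R1/R2 files per experiment.

```python
from collections import defaultdict


def experiment_barcodes(mapping_handle, column="Experiment"):
    """Return {experiment: {barcode, ...}} from a tab-separated mapping file.

    ASSUMPTION: `column` (hypothetical, e.g. "Experiment") was added by hand.
    """
    header = None
    groups = defaultdict(set)
    for line in mapping_handle:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith("#SampleID"):
            header = line.lstrip("#").split("\t")
            continue
        if line.startswith("#"):
            continue  # other comment lines in the mapping file
        if header is None:
            raise ValueError("mapping file has no #SampleID header line")
        row = dict(zip(header, line.split("\t")))
        groups[row[column]].add(row["BarcodeSequence"])
    return dict(groups)


def barcode_lookup(groups):
    """Invert {experiment: barcodes} to {barcode: experiment}, rejecting clashes."""
    lookup = {}
    for exp, barcodes in groups.items():
        for bc in barcodes:
            if bc in lookup and lookup[bc] != exp:
                raise ValueError(f"barcode {bc} in both {lookup[bc]} and {exp}")
            lookup[bc] = exp
    return lookup
```

Checking for a barcode shared between experiments catches mapping-file mistakes before any reads are split.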
THANKS A LOT
Ro