Importing big data into qiime2

Hi,

I am using QIIME 2 2021.2 on an Ubuntu 20.04 LTS server, and I am trying to import FASTQ files for 1000 samples that I downloaded from the Human Microbiome Project database.

I have tried to use the Manifest import with the following command:

qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path "path/MANIFEST" --output-path "path/reads" --input-format PairedEndFastqManifestPhred33V2

There was a problem importing ./MANIFEST.txt:

MANIFEST.txt is not a(n) PairedEndFastqManifestPhred33V2 file:

Filepath on line 1 and column “forward-absolute-filepath” could not be found (path/HMP2_J79832_1_ST_T0_B0_0120_ZM8YXDM-12_B64B9_S141_R1.fastq) for sample “HMP2_J79832_1_ST_T0_B0_0120_ZM8YXDM-12_B64B9_S141”

I have tried to import with the 64V2 format and received the same error.

Since many of my files have the typical Casava naming (L001_R1.fastq), I also tried the Casava 1.8 import, but it printed:

There was a problem importing …/input/:

Missing one or more files for CasavaOneEightSingleLanePerSampleDirFmt: ‘.+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz’
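As a side check, the pattern in this error can be tried against a file name directly. The sketch below copies the regular expression from the message and uses one of the file names from this thread; that the importer requires the whole file name to match is an assumption, and `matches_casava` is a hypothetical helper, not part of QIIME 2.

```python
import re

# Pattern copied from the error message above.
CASAVA_PATTERN = re.compile(r".+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz")


def matches_casava(filename):
    """Return True if the whole file name matches the pattern from the error message."""
    return CASAVA_PATTERN.fullmatch(filename) is not None


name = "HMP2_J45119_1_ST_T0_B0_0120_ZRB0F6P-6016_B86HB_L001_R1_001.fastq"
print(matches_casava(name))          # uncompressed name: False
print(matches_casava(name + ".gz"))  # same name with a .gz suffix: True
```

This suggests the uncompressed .fastq files fail the pattern only because of the missing .gz suffix, so they would probably need to be gzip-compressed before a Casava import could find them.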

So, how can I determine the FASTQ format of my demultiplexed, uncompressed FASTQ files? Does it depend on how the files are named or on their content?

An example of one of my file names is:

HMP2_J45119_1_ST_T0_B0_0120_ZRB0F6P-6016_B86HB_L001_R1_001.fastq

And the head of this file is:

@MISEQ:143:000000000-B86HB:1:1101:16852:1065 1:N:0:TTCTTGGGACAGTTCC
AGAGTTTGATCCTGGCTCAGGATGAACGCTAGCTACAGGCTTAACACATGCAAGTCGAGGGGCAGCATGAACTTAGCTTGCTAAGTTTGATGGCGACCGGCGCACGGGTGAGTAACACGTATCCAACCTGCCGATGACTCGGGGATAGCCTTTCGAAAGAAAGATTAATACCCGATGGCATAGTTCTTCCGCATGGTAGAACTATTAAAGAATTTCGGTCATCGATGGGGATGCGTTCCATTAGGTTGTTGGCGGGGTAAAGGCCCACCAAGCCTTCGATGGATAGGGGTTCTGAGAGGAA
+
[email protected]GGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGDGGGGGGGGGGGGGGGFGGGGGGGGGGFFFGG>EGFDGCFGGGGGGGGGAFGGGFEGGGGFFGAGGC;@7:>8FGDDGGGGGGGGGGGGCGGGGGGGGGGFFGGGGGGGG>C9CG7788CF79<CFEGG=/9CDGGEC7+AFFG>FG=G6C<DGG5*:+++3;CEEGCFFC<3?F75*2:C4<),7>ECD<A44<A070(
@MISEQ:143:000000000-B86HB:1:1101:16870:1066 1:N:0:TTCTTGGGACAGTTCC
AGAGTTTGATCCTGGCTCAGGATGAACGCTAGCTACAGGCTTAACACATGCAAGTCGAGGGGCAGCATGAACTTAGCTTGCTAAGTTTGATGGCGACCGGCGCACGGGTGAGTAACACGTATCCAACCTGCCGATGACTCGGGGATAGCCTTTCGAAAGAAAGATTAATACCCGATGGCATAGTTCTTCCGCATGGTAGAACTATTAAAGAATTTCGGTCATCGATGGGGATGCGTTCCATTAGGTTGTTGGCGGGGTAACGGCCCACCAAGCCTTCGATGGATAGGGGTTCTGAGAGGAA
+

If I have files in different formats, is there an easy way to import them all, or how should I proceed?

Thank you very much for your help in advance.

Hi @apc,

Welcome to the QIIME 2 forum!

The error message is telling you what's wrong:

The computer cannot find your file at the address path/HMP2_J79832_1_ST_T0_B0_0120_ZM8YXDM-12_B64B9_S141_R1.fastq. You need to supply the full (absolute) path instead. On Linux, you can get it by going into the folder that holds all your files and running pwd. Then you need to replace "path" in every location in your manifest with that full path.
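To check every path in the manifest before importing, a short script along these lines may help. It is only a sketch: it assumes a tab-separated manifest with a `sample-id` column and filepath columns whose names end in `-absolute-filepath` (such as the `forward-absolute-filepath` column named in the error), and `find_missing_files` is a hypothetical helper, not part of QIIME 2.

```python
import csv
import os


def find_missing_files(manifest_path):
    """Return (sample_id, column, filepath) for each path that is not an existing absolute file."""
    missing = []
    with open(manifest_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            sample_id = row.get("sample-id", "")
            for column, value in row.items():
                # Only inspect the filepath columns, e.g. forward-absolute-filepath.
                if column is None or not column.endswith("-absolute-filepath"):
                    continue
                path = (value or "").strip()
                # A relative path or a path that does not point to a file is reported.
                if not os.path.isabs(path) or not os.path.isfile(path):
                    missing.append((sample_id, column, path))
    return missing
```

Calling `find_missing_files("MANIFEST.txt")` on the machine where you run the import returns an empty list when every listed file is visible from there.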

Best,
Justine


I wrote “path” here to avoid showing my computer’s path, but the actual path is correct:

There was a problem importing ./MANIFEST.txt:

MANIFEST.txt is not a(n) PairedEndFastqManifestPhred64V2 file:

Filepath on line 1 and column “forward-absolute-filepath” could not be found (/home/Arnau/Diabetes/input/HMP2_J79832_1_ST_T0_B0_0120_ZM8YXDM-12_B64B9_S141_R1.fastq) for sample “HMP2_J79832_1_ST_T0_B0_0120_ZM8YXDM-12_B64B9_S141”.

I guess it is something related to the FASTQ format, but I do not know… I have tried everything.

I have used QIIME several times, but never on a remote server, so I do not know if the problem is related to that.

Thanks for the help in any case @jwdebelius

Hi @apc,

The message says that the file does not exist at that location. If that is the correct path, check that you’re working in the right place. If you’re working on a remote server, make sure the files are actually on that server. The server may not be able to see your files, so you’ll need to either transfer the data or load the file system, or whatever is appropriate for your system. (I’d check with your sysadmins.)
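One way to see where a path stops resolving on the server is to walk it from the root down and report the first piece that is missing. This is a minimal sketch; `first_missing_component` is a hypothetical helper name, and it should be run on the same machine and in the same environment as the import.

```python
import os


def first_missing_component(path):
    """Return the shortest prefix of `path` that does not exist, or None if the whole path exists."""
    path = os.path.abspath(path)
    if os.path.exists(path):
        return None
    # Collect the path and all of its ancestors, ending at the filesystem root.
    prefixes = []
    current = path
    while True:
        prefixes.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    # Check from the root down and stop at the first prefix that is missing.
    for prefix in reversed(prefixes):
        if not os.path.exists(prefix):
            return prefix
    return None
```

If the first missing piece is a directory near the top of the path, the file system holding the data is probably not visible from the server; if only the file name itself is missing, a typo or an incomplete transfer is more likely.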

Best,
Justine
