Hello everyone! I am new to bioinformatics and have little to no background in it, and I have been assigned to help a professor with some taxonomic analysis. I installed QIIME2 v2024.2 (amplicon distribution). The FASTQ files are from the American Gut Project, and I believe they are labeled Human Gut Metagenome (or Human Metagenome).
Here is the process I am going through:
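Download FASTQ files for 600 samples (paired-end, so 1,200 FASTQ files: R1 and R2)
Import the files as a demultiplexed artifact using the "CasavaOneEightSingleLanePerSampleDirFmt" format
Trim the primers from the paired-end reads with cutadapt: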
a. Forward Primer: "GTGCCAGCMGCCGCGGTAA"
b. Reverse Primer: "GGACTACHVGGGTWTCTAAT"
I got the primer information from Qiita after deblur gave me an error saying I had to remove PhiX or adapter sequences from my reads. I am not even sure these primers are correct.
Merge the pairs
Run deblur (trim length 125). This is the error I get most often:
Plugin error from deblur:
No sequences passed the filter. It is possible the trim_length (125) may exceed the longest sequence, that all of the sequences are artifacts like PhiX or adapter, or that the positive reference used is not representative of the data being denoised.
See above for debug info.
The other error, which I get occasionally, is:
Plugin error from deblur:
max() arg is an empty sequence
See above for debug info.
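Since the first deblur error mentions PhiX/adapter artifacts, and I'm not sure the primers in steps a and b are right, one quick check is to count how many reads actually begin with each primer (these appear to be the common 515F/806R 16S V4 primers). Below is a minimal Python sketch, assuming the reads are already loaded as plain strings (FASTQ parsing is left out). All names are hypothetical. The match is exact with no mismatches, so it is stricter than cutadapt's error-tolerant search. Degenerate letters (M, H, V, W) are expanded with the standard IUPAC codes.

```python
import re

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

FORWARD = "GTGCCAGCMGCCGCGGTAA"   # forward primer from step a
REVERSE = "GGACTACHVGGGTWTCTAAT"  # reverse primer from step b


def primer_regex(primer):
    """Compile a degenerate primer into a regex anchored at the start of a read."""
    parts = []
    for code in primer.upper():
        bases = IUPAC[code]
        # A plain base matches itself; a degenerate code becomes a character class.
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return re.compile("^" + "".join(parts))


def fraction_starting_with(reads, primer):
    """Fraction of reads that begin with an exact (mismatch-free) primer match."""
    pattern = primer_regex(primer)
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(1 for read in reads if pattern.match(read.upper()))
    return hits / len(reads)


if __name__ == "__main__":
    r1_reads = [
        "GTGCCAGCAGCCGCGGTAATACGTAGGG",  # M read as A: primer present
        "GTGCCAGCCGCCGCGGTAATACGTAGGG",  # M read as C: primer present
        "TACGTAGGGGGCAAGCGTTATCCGGATT",  # starts after the primer
    ]
    print(fraction_starting_with(r1_reads, FORWARD))
```

The forward primer would be expected at the start of R1 and the reverse primer at the start of R2. If almost no reads match either one, the primers have probably already been removed and trimming them is likely unnecessary.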
I am using a class computer with many cores. I have a Python script that saves each sample's files into its own folder named by sample ID, and then runs each sample separately in parallel so that I can troubleshoot each one individually.
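A per-sample setup like the one described above might look roughly like this. It is a minimal, hypothetical sketch, not the actual script: it groups R1/R2 files by sample ID, flags samples that are missing a mate, and runs a placeholder job per sample in parallel. The filename pattern is an assumption based on the Casava 1.8 naming that CasavaOneEightSingleLanePerSampleDirFmt expects (sample_barcode_L001_R1_001.fastq.gz), and process_sample is a stand-in for whatever commands you actually run in each folder.

```python
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Assumed Casava 1.8 naming: <sample>_<barcode>_L<lane>_R<1|2>_001.fastq.gz
CASAVA_NAME = re.compile(
    r"^(?P<sample>.+)_(?P<barcode>[^_]+)_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq\.gz$"
)


def group_by_sample(filenames):
    """Map sample ID -> {"R1": name, "R2": name}; also return names that don't parse."""
    pairs = defaultdict(dict)
    unparsed = []
    for name in filenames:
        match = CASAVA_NAME.match(name)
        if match is None:
            unparsed.append(name)
            continue
        pairs[match.group("sample")]["R" + match.group("read")] = name
    return dict(pairs), unparsed


def incomplete_samples(pairs):
    """Sample IDs that are missing either R1 or R2."""
    return sorted(s for s, reads in pairs.items() if set(reads) != {"R1", "R2"})


def process_sample(sample_id, reads):
    """Hypothetical placeholder for the per-sample import/trim/merge/deblur steps
    (in practice these would be launched with subprocess inside the sample's folder)."""
    return sample_id, "ok"


def run_all(pairs, workers=8):
    """Run process_sample for every sample, at most `workers` at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(process_sample, s, r) for s, r in pairs.items()]
        return dict(f.result() for f in futures)
```

Threads are enough here because each job would mostly wait on an external command. Capping max_workers keeps one machine from being oversubscribed when each command is itself multi-threaded.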
I haven't tried DADA2 process.
When I ran the single-end files, I didn't trim anything with cutadapt or merge anything; I simply ran deblur and it worked fine. I tested it on 100 files and all of them worked. The issue only happens with the paired-end files. I ran the process on 300 files across 3 cluster computers (100 each), but after 24 hours only about 10 had finished successfully. Many failed with the error about the 125 trim length, which I saved to a log file. Many of the others were still running when the cluster time limit ran out.
Any perspective on trimming or possible issues would be greatly appreciated.
I have attached two more demux.qzv files for samples that failed in deblur.
All the failed ones that I have attached gave me this error:
Plugin error from deblur:
No sequences passed the filter. It is possible the trim_length (125) may exceed the longest sequence, that all of the sequences are artifacts like PhiX or adapter, or that the positive reference used is not representative of the data being denoised.
It looks like some samples do not have sequences long enough for the specified trim length. The primers have already been removed from these data, so that step is not needed. I've never attempted merging R1/R2 for these data; is it possible that one or a few of the samples had no successfully stitched reads?
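To check the point above for a particular failing sample, you can look at the read lengths directly and count how many reads are at least as long as the trim length. This is a minimal Python sketch, assuming plain or gzipped FASTQ with standard 4-line records and assuming, as the error message suggests, that reads shorter than trim_length are dropped. The function names are hypothetical. It also returns a zero-read summary instead of calling max() on an empty list. In Python that call raises ValueError with a message like "max() arg is an empty sequence", which hints that the occasional deblur error may also come from a sample with no reads left. That is a guess, not something confirmed from deblur's code.

```python
import gzip
import sys


def read_lengths(path):
    """Yield the length of each sequence in a FASTQ file (plain or .gz).

    Assumes standard 4-line records: header, sequence, '+', qualities.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # the sequence is line 2 of each record
                yield len(line.strip())


def length_report(lengths, trim_length=125):
    """Count reads at or above trim_length, guarding against zero reads."""
    lengths = list(lengths)
    if not lengths:
        # No reads at all (e.g. nothing merged): report it instead of calling max().
        return {"reads": 0, "longest": None, "kept": 0}
    kept = sum(1 for n in lengths if n >= trim_length)
    return {"reads": len(lengths), "longest": max(lengths), "kept": kept}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_lengths.py reads.fastq.gz [trim_length]
    trim = int(sys.argv[2]) if len(sys.argv) > 2 else 125
    print(length_report(read_lengths(sys.argv[1]), trim_length=trim))
```

Running this on the merged reads for a failing sample (for example after exporting the artifact with qiime tools export) should show whether any reads reached 125 bases, or whether the merge produced no reads at all.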
For context, I'm the Scientific Director for the American Gut Project.
Please note too that if you'd like pre-computed Deblur feature tables, they can be obtained using redbiom against study 10317.
Thank you for your reply. I was able to figure out the issue. I was trying to deblur the paired-end files, but the same subjects already have single-end files, and running deblur on the single-end files solved all the problems.
I am facing the same issue, and your expertise and insights on these sequencing data challenges are truly appreciated. Your proactive approach and offer of pre-computed Deblur feature tables show your dedication. Thank you for your valuable contributions to the American Gut Project.