While importing my data, I encountered an error saying "These samples do not have matching pairs of forward and reverse reads", followed by the IDs of the affected samples. When I tried importing a subset of the problematic samples, it worked fine. Any ideas what the problem could be?
Could you tell me a little more about how you imported your data? What command did you run, and what files did you pass? Did your sequencing core or another researcher do any filtering on the data before you imported it into QIIME 2?
So I am importing the data using a manifest file with this command:
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path manifest.tsv \
  --output-path demux.qza \
  --input-format PairedEndFastqManifestPhred64V2
I have already imported other files (from other cohorts in the same study) from the same sequencing center with a similar approach. I also tried with a toy subset of my files with a manifest and it worked just fine.
Because you are importing paired reads, QIIME 2 assumes you have a forward and a reverse file for each sample. It also assumes that the forward and reverse files contain the same number of reads.
Looks like some of the files are missing reads from one or both ends. Lots of things could cause this, including a partial download, or a prefiltering step that removed reads from one file but not the other. (Some programs also check that the read names match, but I'm not sure if that's checked here.)
This is great! We know the script is working on your system. Now we just need to figure out which files are missing reads and why.
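One way to narrow this down is to count the reads in each forward/reverse pair listed in the manifest. Here is a minimal sketch, assuming the manifest is tab-separated with the columns sample-id, forward-absolute-filepath and reverse-absolute-filepath, and that every FASTQ record spans exactly four lines. The function names are my own and are not part of QIIME 2.

```python
import csv
import gzip


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming four lines per record."""
    # Assumption: files ending in .gz are gzip-compressed, others are plain text.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def find_mismatched_pairs(manifest_path):
    """Return {sample_id: (n_forward, n_reverse)} for pairs whose counts differ."""
    mismatched = {}
    with open(manifest_path, newline="") as handle:
        # Assumption: these column names match the header of your manifest.
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            n_fwd = count_fastq_reads(row["forward-absolute-filepath"])
            n_rev = count_fastq_reads(row["reverse-absolute-filepath"])
            if n_fwd != n_rev:
                mismatched[row["sample-id"]] = (n_fwd, n_rev)
    return mismatched
```

Calling find_mismatched_pairs("manifest.tsv") should return an empty dictionary if every pair has equal counts. Note that this only compares read counts; it does not check that read names match between the two files.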
Thank you.
What I don't understand is that I got the error for the following samples (not all of my files, but a large share of them):
{'13', '34', '33', '12', '1', '38', '32', '17', '24', '15', '10', '18', '19', '35', '26', '30', '27', '22', '21', '29', '14', '37', '36', '16', '20', '11', '3', '28', '25', '31', '23', '2'}
However, when I tried with the first five samples ('13', '34', '33', '12', '1'), it worked just fine. So I was wondering why they were reported as problematic when I imported all of the files, but were fine on their own.
Since the error lists specific sample IDs, I assumed there would be a problem with every reported ID.
You could possibly have an "off-by-one" type of error, which basically cascades through the rest of the file. Could you share your manifest file with us? Maybe we can tune up the validation routine a bit to provide a cleaner error message. Thanks!
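In the meantime, a quick look at the manifest itself might rule out the simpler causes. The sketch below is only an illustration, not the validation QIIME 2 actually runs: it assumes a tab-separated manifest with the columns sample-id, forward-absolute-filepath and reverse-absolute-filepath, and check_manifest is a hypothetical helper name. It reports duplicated sample IDs, empty path fields, paths that do not exist, and rows where both columns point at the same file.

```python
import csv
from collections import Counter
from pathlib import Path

# Assumption: these are the header names used in your manifest.
EXPECTED_COLUMNS = [
    "sample-id",
    "forward-absolute-filepath",
    "reverse-absolute-filepath",
]


def check_manifest(manifest_path):
    """Return a list of human-readable problems found in the manifest."""
    problems = []
    sample_ids = []
    with open(manifest_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        header = reader.fieldnames or []
        missing = [c for c in EXPECTED_COLUMNS if c not in header]
        if missing:
            return ["missing column(s): " + ", ".join(missing)]
        for row in reader:
            line_no = reader.line_num
            sample_ids.append(row["sample-id"])
            # Short rows yield None for absent fields, so normalize to "".
            paths = [(row[c] or "").strip() for c in EXPECTED_COLUMNS[1:]]
            for column, value in zip(EXPECTED_COLUMNS[1:], paths):
                if not value:
                    problems.append(f"line {line_no}: empty {column}")
                elif not Path(value).is_file():
                    problems.append(f"line {line_no}: file not found: {value}")
            if paths[0] and paths[0] == paths[1]:
                problems.append(f"line {line_no}: forward and reverse paths are identical")
    for sample_id, count in Counter(sample_ids).items():
        if count > 1:
            problems.append(f"sample-id {sample_id!r} appears {count} times")
    return problems
```

Printing each entry of check_manifest("manifest.tsv") gives one line per suspected problem; an empty list means none of these particular issues were found, though other causes are still possible.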
Yes, the sequencing center has already done some quality filtering (trimming, adapter removal, etc.). However, I haven't had any problems with the reads from other cohorts that were sequenced at the same center.