Hi,
I have a set of shotgun metagenomic data. I used SortMeRNA to extract the 16S rRNA reads and started working with them in QIIME2. Thanks to this post (Problem with importing data), I was able to import the extracted FASTQ files into QIIME2. My QIIME2 version is 2022.8.
Could you please give me some suggestions on how to work with this type of data in QIIME2? I'm involved in a project with six cohorts of mouse gut microbiome samples, but only one cohort has shotgun data instead of 16S amplicon data. I am therefore trying to use the extracted 16S reads to make the microbiome compositions comparable across all cohorts.
So far, I have imported the files using:
qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path import-manifest-file.tsv --output-path paired-end-demux.qza --input-format PairedEndFastqManifestPhred33V2
Looking at the visualization of the imported data, both the forward and reverse reads have high quality scores. This makes sense, because the reads were extracted after the quality-control steps of the KneadData pipeline.
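To cross-check the quality summary outside QIIME2 (for example, against the 139.56 mean read length that vsearch reports in the log), a minimal standard-library Python sketch could walk the import manifest and summarize each FASTQ file. It assumes the manifest uses the tab-separated PairedEndFastqManifestPhred33V2 layout (one header row, then sample ID, forward path, reverse path, all absolute) and that qualities are Phred+33; the script name `summarize_fastq.py` is hypothetical.

```python
import gzip
import statistics
import sys


def parse_manifest(text):
    """Return {sample_id: (forward_path, reverse_path)} from paired-end manifest text.

    Assumes one header row followed by tab-separated rows of
    sample ID, forward absolute path, reverse absolute path.
    """
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    return {sample_id: (fwd, rev) for sample_id, fwd, rev in rows[1:]}


def read_fastq(path):
    """Yield (header, sequence, quality) tuples from a plain or gzipped FASTQ file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip()
            if not header:
                break
            seq = handle.readline().rstrip()
            handle.readline()  # '+' separator line
            qual = handle.readline().rstrip()
            yield header, seq, qual


def summarize_reads(records, offset=33):
    """Return (read count, mean read length, mean Phred score over all bases).

    Quality characters are decoded as Phred+33 by default.
    """
    lengths, scores = [], []
    for _, seq, qual in records:
        lengths.append(len(seq))
        scores.extend(ord(char) - offset for char in qual)
    if not lengths:
        return 0, 0.0, 0.0
    mean_q = statistics.mean(scores) if scores else 0.0
    return len(lengths), statistics.mean(lengths), mean_q


if __name__ == "__main__":
    # Hypothetical usage: python summarize_fastq.py import-manifest-file.tsv
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            samples = parse_manifest(handle.read())
        for sample_id, paths in samples.items():
            for direction, path in zip(("forward", "reverse"), paths):
                n, mean_len, mean_q = summarize_reads(read_fastq(path))
                print(f"{sample_id}\t{direction}\t{n} reads\t"
                      f"mean length {mean_len:.2f}\tmean Q {mean_q:.1f}")
```

Per-direction mean lengths well below the raw read length would point to trimming during quality control rather than to a problem with the import itself.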
I then tried to join the read pairs with q2-vsearch so that I could proceed to OTU picking, following the "Clustering sequences into OTUs using q2-vsearch" tutorial:
qiime vsearch join-pairs --i-demultiplexed-seqs sortmerna-mac-out-paired-end-demux.qza --p-truncqual 20 --o-joined-sequences vsearch-joined-pairs.qza
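The log shows that `--p-truncqual 20` reaches vsearch as `--fastq_truncqual 20`. The vsearch manual describes this option as truncating each read starting from the first base whose quality score is at or below the given value. A minimal sketch of that rule, assuming Phred+33 qualities; it illustrates the documented behavior and is not vsearch's own code. The length check only mirrors the `--fastq_minlen 1` value seen in the log.

```python
def truncate_at_quality(seq, qual, truncqual=20, offset=33):
    """Cut a read at the first base whose Phred score is <= truncqual.

    Everything from that base onward is discarded, so one low-quality
    call early in the read removes the rest of it.
    """
    for i, char in enumerate(qual):
        if ord(char) - offset <= truncqual:
            return seq[:i], qual[:i]
    return seq, qual


def is_too_short(seq, minlen=1):
    """Length check modeled on --fastq_minlen (the log shows --fastq_minlen 1)."""
    return len(seq) < minlen


if __name__ == "__main__":
    # '5' is Q20 in Phred+33, so the third base triggers truncation.
    seq, qual = truncate_at_quality("ACGTACGT", "II5IIIII")
    print(seq, qual, is_too_short(seq))
```

Because truncation starts at the first low-quality base rather than trimming only the 3' tail, even otherwise high-quality reads can lose much of their length with this setting.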
However, the log showed that almost none of the pairs could be merged (only 1 of 54,571):
Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.
Command: vsearch --fastq_mergepairs /var/folders/5r/s_dfhmpd21q479v2lf5cz3g40000gp/T/qiime2/sl1070/data/3056ef3e-b376-42f7-9a80-aca5f4517264/data/FF05465719_0_L001_R1_001.fastq.gz --reverse /var/folders/5r/s_dfhmpd21q479v2lf5cz3g40000gp/T/qiime2/sl1070/data/3056ef3e-b376-42f7-9a80-aca5f4517264/data/FF05465719_35_L001_R2_001.fastq.gz --fastqout /var/folders/5r/s_dfhmpd21q479v2lf5cz3g40000gp/T/q2-SingleLanePerSampleSingleEndFastqDirFmt-3sn8tlbn/FF05465719_0_L001_R1_001.fastq --fastq_ascii 33 --fastq_minlen 1 --fastq_minovlen 10 --fastq_maxdiffs 10 --fastq_qmin 0 --fastq_qminout 0 --fastq_qmax 41 --fastq_qmaxout 41 --fasta_width 0 --fastq_truncqual 20 --threads 1
vsearch v2.21.1_macos_x86_64, 32.0GB RAM, 10 cores
https://github.com/torognes/vsearch
Merging reads 100%
54571 Pairs
1 Merged (0.0%)
54570 Not merged (100.0%)
Pairs that failed merging due to various reasons:
2660 reads too short (after truncation)
47442 too few kmers found on same diagonal
4460 alignment score too low, or score drop too high
1 overlap too short
7 staggered read pairs
Statistics of all reads:
139.56 Mean read length
Statistics of merged reads:
254.00 Mean fragment length
0.00 Standard deviation of fragment length
0.03 Mean expected error in forward sequences
0.07 Mean expected error in reverse sequences
0.07 Mean expected error in merged sequences
0.00 Mean observed errors in merged region of forward sequences
0.00 Mean observed errors in merged region of reverse sequences
0.00 Mean observed errors in merged region
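The dominant failure reason in the log is "too few kmers found on same diagonal". One possible reading is that many pairs simply do not overlap, which happens when the sequenced fragment is longer than the two reads combined minus the minimum overlap. The post does not state the library's fragment lengths, so the fragment lengths below are hypothetical; the 140 nt read length is rounded from the 139.56 mean in the log, and the failure counts are copied from it.

```python
# Failure counts copied from the vsearch log above (54,571 pairs, 1 merged).
TOTAL_PAIRS = 54571
FAILURES = {
    "reads too short (after truncation)": 2660,
    "too few kmers found on same diagonal": 47442,
    "alignment score too low, or score drop too high": 4460,
    "overlap too short": 1,
    "staggered read pairs": 7,
}


def failure_fractions(failures, total):
    """Fraction of all input pairs lost to each failure reason."""
    return {reason: count / total for reason, count in failures.items()}


def expected_overlap(fragment_len, fwd_len, rev_len):
    """Bases covered by both reads of a pair; a negative value means a gap.

    Assumes the fragment is at least as long as each read (no staggering).
    """
    return fwd_len + rev_len - fragment_len


def can_overlap(fragment_len, fwd_len, rev_len, min_overlap=10):
    """True if the reads overlap by at least min_overlap bases (log: --fastq_minovlen 10)."""
    return expected_overlap(fragment_len, fwd_len, rev_len) >= min_overlap


if __name__ == "__main__":
    for reason, fraction in failure_fractions(FAILURES, TOTAL_PAIRS).items():
        print(f"{reason}: {fraction:.1%}")
    # With ~140 nt reads and a 10 nt minimum overlap, only fragments up to
    # 140 + 140 - 10 = 270 nt can be merged.
    for fragment_len in (250, 270, 300, 400):  # hypothetical fragment lengths
        overlap = expected_overlap(fragment_len, 140, 140)
        print(fragment_len, overlap, can_overlap(fragment_len, 140, 140))
```

If most fragments in the shotgun library are longer than about 270 nt, the reads of a pair cover different parts of the molecule and there is nothing for vsearch to align, regardless of read quality.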
Any suggestions or instructions would be appreciated!
Best,
Shuqi