Working with extracted 16S fastq files

Shuqi · April 12, 2023, 8:00pm

Hi,

I have a set of shotgun metagenomics data, and I used SortMeRNA to extract the 16S sequences and started working with them in QIIME 2. Thanks to this post (Problem with importing data), I was able to import the extracted fastq files into QIIME 2; my QIIME 2 version is 2022.8.

Could you give me some suggestions on how to work with this type of data in QIIME 2, please? I'm involved in a project with 6 cohorts of gut microbiome samples from mice, but one cohort has only shotgun data instead of 16S. I am trying to use the extracted 16S sequences to make the microbiome compositions comparable across all cohorts.

So far, I have imported the files using:

qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path import-manifest-file.tsv --output-path paired-end-demux.qza --input-format PairedEndFastqManifestPhred33V2

Looking at the visualization of the imported file, the sequences have fairly high quality scores on both the forward and reverse reads. This makes sense because those reads were extracted after the quality control steps in the kneaddata pipeline.
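For anyone reproducing this import: the file passed to --input-path for PairedEndFastqManifestPhred33V2 is a tab-separated manifest with the columns sample-id, forward-absolute-filepath and reverse-absolute-filepath. Below is a minimal sketch for generating one. The function name make_manifest and the `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` file naming are assumptions for illustration, not something taken from this thread.

```shell
# Write a PairedEndFastqManifestPhred33V2-style manifest to stdout for every
# <sample>_R1.fastq.gz / <sample>_R2.fastq.gz pair found in directory $1.
# The _R1/_R2 naming is an assumption; adjust the patterns to your files.
make_manifest() {
    dir=$(cd "$1" && pwd)    # the manifest columns expect absolute paths
    printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n'
    for fwd in "$dir"/*_R1.fastq.gz; do
        [ -e "$fwd" ] || continue    # nothing matched: the glob stays literal
        sample=$(basename "$fwd" _R1.fastq.gz)
        rev="$dir/${sample}_R2.fastq.gz"
        [ -e "$rev" ] || continue    # skip forward reads without a mate file
        printf '%s\t%s\t%s\n' "$sample" "$fwd" "$rev"
    done
}

# Example (the directory name is hypothetical):
# make_manifest sortmerna_paired > import-manifest-file.tsv
```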

I then tried to join the pairs in q2-vsearch so I might be able to proceed to otu-picking following Clustering sequences into OTUs using q2-vsearch

qiime vsearch join-pairs --i-demultiplexed-seqs sortmerna-mac-out-paired-end-demux.qza --p-truncqual 20 --o-joined-sequences vsearch-joined-pairs.qza

But the log indicated that most of the reads could not be merged:

Running external command line application. This may print messages to stdout
and/or stderr.
The command being run is below. This command cannot be manually re-run as it
will depend on temporary files that no longer exist.

Command: vsearch --fastq_mergepairs /var/folders/5r/s_dfhmpd21q479v2lf5cz3g40
000gp/T/qiime2/sl1070/data/3056ef3e-b376-42f7-9a80-aca5f4517264/data/FF054657
19_0_L001_R1_001.fastq.gz --reverse /var/folders/5r/s_dfhmpd21q479v2lf5cz3g40
000gp/T/qiime2/sl1070/data/3056ef3e-b376-42f7-9a80-aca5f4517264/data/FF054657
19_35_L001_R2_001.fastq.gz --fastqout /var/folders/5r/s_dfhmpd21q479v2lf5cz3g
40000gp/T/q2-SingleLanePerSampleSingleEndFastqDirFmt-3sn8tlbn/FF05465719_0_L0
01_R1_001.fastq --fastq_ascii 33 --fastq_minlen 1 --fastq_minovlen 10 --fastq
_maxdiffs 10 --fastq_qmin 0 --fastq_qminout 0 --fastq_qmax 41 --fastq_qmaxout
 41 --fasta_width 0 --fastq_truncqual 20 --threads 1

vsearch v2.21.1_macos_x86_64, 32.0GB RAM, 10 cores
https://github.com/torognes/vsearch

Merging reads 100%
     54571  Pairs
         1  Merged (0.0%)
     54570  Not merged (100.0%)

Pairs that failed merging due to various reasons:
      2660  reads too short (after truncation)
     47442  too few kmers found on same diagonal
      4460  alignment score too low, or score drop too high
         1  overlap too short
         7  staggered read pairs

Statistics of all reads:
    139.56  Mean read length

Statistics of merged reads:
    254.00  Mean fragment length
      0.00  Standard deviation of fragment length
      0.03  Mean expected error in forward sequences
      0.07  Mean expected error in reverse sequences
      0.07  Mean expected error in merged sequences
      0.00  Mean observed errors in merged region of forward sequences
      0.00  Mean observed errors in merged region of reverse sequences
      0.00  Mean observed errors in merged region

Any suggestions or instructions are appreciated!

Best,
Shuqi

colinbrislawn · April 16, 2023, 1:50pm

Hello Shuqi,

As you have noticed, most of the QIIME 2 tutorials use 16S V4 data, but there are plugins for working with shotgun data just like yours! Check out q2-shogun and this discussion of upcoming shotgun data support within QIIME 2.

Your import and quality control look good! OTU clustering was built for amplicons, so it may not be a great fit for shotgun reads.
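One possible reading of the merge log in the first post: the mean read length is about 140 nt and vsearch was run with --fastq_minovlen 10, so a pair can only be joined when its fragment is no longer than roughly 270 nt. Assuming most fragments in the shotgun library are longer than that (the thread does not state the insert size), the two reads of a pair would rarely overlap, which would be consistent with the many "too few kmers found on same diagonal" failures. The arithmetic, as a sketch with a hypothetical helper name:

```shell
# Longest fragment that a read pair can still span while keeping the
# required overlap: forward length + reverse length - minimum overlap.
max_mergeable_fragment() {
    # $1 = forward read length, $2 = reverse read length, $3 = minimum overlap
    echo $(( $1 + $2 - $3 ))
}

# Read lengths rounded from the 139.56 nt mean in the log; overlap from
# --fastq_minovlen 10.
max_mergeable_fragment 140 140 10   # prints 270
```

The single merged pair in that log has a fragment length of 254 nt, which falls under this limit.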

Shuqi · April 18, 2023, 1:22am

Hi Colin,

Thank you so much for your suggestion! I plan to try the q2-shogun plugin, but I have noticed that it requires

--i-query ARTIFACT PATH FeatureData[Sequence]

Do you know a way to import fastq sequences as "FeatureData[Sequence]"?

So far I have tried the three methods below, but none of them worked.

(qiime2-2022.8) sl@SHUQI-only-one-U 01-Input % qiime tools import --input-path sortmerna_rRNA/FF05465719_rRNA.fq --output-path FF05465719_rRNA.qza --type 'FeatureData[Sequence]'
There was a problem importing sortmerna_rRNA/FF05465719_rRNA.fq:

sortmerna_rRNA/FF05465719_rRNA.fq is not a(n) DNAFASTAFormat file:

First line of file is not a valid description. Descriptions must start with '>'

(qiime2-2022.8) sl@SHUQI-only-one-U 01-Input % qiime tools import --input-path sortmerna_rRNA/FF05465719_rRNA.fq --output-path FF05465719_rRNA.qza --type 'FeatureData[AlignedSequence]'
There was a problem importing sortmerna_rRNA/FF05465719_rRNA.fq:

sortmerna_rRNA/FF05465719_rRNA.fq is not a(n) AlignedDNAFASTAFormat file:

First line of file is not a valid description. Descriptions must start with '>'

(qiime2-2022.8) sl@SHUQI-only-one-U 01-Input % qiime tools import --input-path sortmerna_rRNA/FF05465719_rRNA.fq --output-path FF05465719_rRNA.qza --type 'FeatureData[PairedEndRNASequence]'
There was a problem importing sortmerna_rRNA/FF05465719_rRNA.fq:

Importing 'PairedRNASequencesDirectoryFormat' requires a directory, not sortmerna_rRNA/FF05465719_rRNA.fq

Best,
Shuqi

colinbrislawn · April 18, 2023, 2:33pm

This file uses the .fq file extension, which is short for .fastq. Each fastq record includes a sequence ID, the sequence, and its quality scores, whereas FeatureData[Sequence] expects fasta (description lines starting with '>'), which is why those imports failed.

Depending on how you want to do this, you can either import those fastq files with their quality scores, or convert fastq to fasta by dropping the quality scores and then import using the commands you had above.
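The second option could look something like the sketch below. It assumes an uncompressed fastq file with exactly four lines per record (no wrapped sequences); the helper name fastq_to_fasta and the .fasta output name are illustrative and not part of QIIME 2.

```shell
# Convert a 4-line-per-record fastq file ($1) to fasta on stdout.
fastq_to_fasta() {
    # Line 1 of each record is "@id ...": swap the "@" for ">".
    # Line 2 is the sequence: print it unchanged.
    # Lines 3 ("+") and 4 (quality scores) are dropped.
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }
         NR % 4 == 2 { print }' "$1"
}

# Example, using the input path from the post above:
# fastq_to_fasta sortmerna_rRNA/FF05465719_rRNA.fq > FF05465719_rRNA.fasta
# qiime tools import --input-path FF05465719_rRNA.fasta \
#   --output-path FF05465719_rRNA.qza --type 'FeatureData[Sequence]'
```

As far as I know, the import also checks that sequence IDs are unique, so if forward and reverse reads share an ID in the same file, the IDs may need a suffix before importing.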

Shuqi · April 21, 2023, 4:33pm

Hi Colin,

Thanks so much! I wonder whether there is a way to import multiple fasta files into one FeatureData[Sequence] artifact to work with in QIIME 2. For now I have only managed to import my original fastq files using the "manifest" method as type "SampleData[PairedEndSequencesWithQuality]", and I don't know how to convert that to a FeatureData[Sequence] artifact so that I can continue the analysis with the q2-shogun plugin.

Best,
Shuqi

Oddant1 · April 25, 2023, 5:36pm

Hello @Shuqi. You can combine all of the fasta files you want in a single artifact into one large fasta file and then import that; however, q2-shogun is pretty out of date at this point. @Nicholas_Bokulich's lab is currently working on improved shotgun functionality through QIIME 2.
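A minimal sketch of the concatenation step, assuming one fasta file per sample. The helper name combine_fasta is hypothetical, and prefixing each record ID with the file name is an added precaution (not something q2-shogun is known to require) so that IDs stay unique when different samples reuse the same read names.

```shell
# Concatenate the fasta files given as arguments into one stream on stdout,
# prefixing each record ID with the source file name minus its extension.
combine_fasta() {
    for f in "$@"; do
        sample=$(basename "$f")
        sample=${sample%.*}          # e.g. FF05465719_rRNA.fasta -> FF05465719_rRNA
        # Header lines get the prefix; sequence lines pass through unchanged.
        awk -v s="$sample" '/^>/ { print ">" s "_" substr($0, 2); next }
                            { print }' "$f"
    done
}

# Example (the directory and output names are hypothetical):
# combine_fasta sortmerna_fasta/*.fasta > combined.fasta
# qiime tools import --input-path combined.fasta \
#   --output-path combined.qza --type 'FeatureData[Sequence]'
```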
