Using the 'reads-deblur' that are not 'unique-reads-hit-reference'

Tyler_Carrier · April 2, 2019, 4:34pm

Hello,

I am attempting to assign taxa to my ITS reads. After deblur I have between ~2,000 and ~11,000 reads per sample (~500,000 in total), but only a single sample has any 'unique-reads-hit-reference', so that is the only sample with reads in the RepSeq output file.

What approach can be taken to use these 500,000 reads that don't hit reference?

My q2 code used for this was:
qiime deblur denoise-other \
  --i-demultiplexed-seqs "reads post pair, qc, and name modification" \
  --p-trim-length 400 \
  --i-reference-seqs "sh_refs_qiime_ver8_99_02.02.2019.qza" \
  --o-representative-sequences "RepSeq.qza" \
  --o-table "Table.qza" \
  --p-sample-stats \
  --o-stats "Stats.qza"

I've also been trying to use "qiime feature-classifier classify-consensus-blast" but can't work out which file should be used for "--i-query".

Regards,
Tyler

wasade · April 4, 2019, 8:20pm

Hi @Tyler_Carrier,

Right now, you'd need to run deblur directly (detail here) and not through qiime2. One of the outputs it produces is the "all" table, which does not apply the positive filtering. This will require qiime tools export of your demultiplexed data, and the resulting .biom files should be reimportable through qiime tools import. Happy to provide a concrete example if you'd like.

One reason so many sequences may be getting dropped is that the positive-filter reference sequences aren't reflective of the ITS region in your data.

All the best,
Daniel

Tyler_Carrier · April 8, 2019, 7:54pm

Hi Daniel,

Ah, yes that was what I feared would have to be done. The animal I am working with (as well as many relatives) has certainly not been sequenced before and the only common hits to UNITE are super common fungi.

I assume that Deblur would have to be run in R, or can this be run on the command line? I ask because I have no experience with R. If the former, could you please provide a concrete example?

Regards,
Tyler

Nicholas_Bokulich · April 8, 2019, 8:15pm

I suspect the sequences that are being thrown out could be non-fungal; most fungal ITS primers also amplify non-fungal DNA, e.g., host and dietary (plant, animal) ITS sequences, as well as non-fungal eukaryotes living in the gut.

deblur only uses the reference sequences for a rough positive filter — effectively, anything that looks remotely similar to the reference database will be kept and only very dissimilar sequences will be thrown out. If your sequences really were fungal, they should still pass even if you expect many uncharacterized fungal families or genera to be present! So either your primers are mostly amplifying non-fungal DNA, or as @wasade suggested:

(that does not seem likely, since UNITE covers both ITS1 and ITS2, or at least a mix of both, so this should be pretty straightforward)

https://github.com/biocore/deblur#example-usage

Try this file:

wasade · April 8, 2019, 11:40pm

Thanks @Nicholas_Bokulich! I agree the most likely reason here is that non-fungal DNA is being picked up.

@Tyler_Carrier, deblur can be run from the command line within a QIIME 2 environment, but you'll need to qiime tools export your demultiplexed sequence artifact first.

(base) 16:25:27 (dtmcdonald@here):~$ source activate qiime2-2019.1
(qiime2-2019.1) 16:25:33 (dtmcdonald@here):~$ deblur workflow --help | head
Usage: deblur workflow [OPTIONS]

  Launch deblur workflow

Options:
  --seqs-fp PATH                  Either a Demultiplexed FASTA or FASTQ file
                                  including all samples, or a directory of
                                  per-sample FASTA or FASTQ files. Gzip'd
                                  files are acceptable (e.g., .fastq.gz).
                                  [required]
(qiime2-2019.1) 16:26:14 (dtmcdonald@here):~$
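
The steps described in this thread (export the demultiplexed artifact, run deblur directly, re-import the "all" outputs) could look roughly like the sketch below. Treat it as a starting point under stated assumptions, not a tested recipe: the file and directory names (demux.qza, demux-exported, deblur-out, all-table.qza, all-rep-seqs.qza) are placeholders that do not come from the thread, and the BIOM import format is assumed to be BIOMV210Format. By default the script only prints the commands; set DRY_RUN=0 inside an activated QIIME 2 environment to execute them.

```shell
#!/bin/sh
# Sketch only: every file and directory name below is a placeholder.
set -eu

DEMUX_QZA="${DEMUX_QZA:-demux.qza}"          # hypothetical demultiplexed artifact
EXPORT_DIR="${EXPORT_DIR:-demux-exported}"   # hypothetical export directory
DEBLUR_DIR="${DEBLUR_DIR:-deblur-out}"       # hypothetical deblur output directory
TRIM_LENGTH="${TRIM_LENGTH:-400}"            # trim length used earlier in the thread
DRY_RUN="${DRY_RUN:-1}"                      # 1 = print commands only, 0 = run them

# Print a command, and execute it only when DRY_RUN=0.
# Note: the printed form drops shell quoting, so it is for reading, not pasting.
run() {
  printf '+ %s\n' "$*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

deblur_all_table() {
  # 1. Unpack the per-sample FASTQ files from the QIIME 2 artifact.
  run qiime tools export \
    --input-path "$DEMUX_QZA" --output-path "$EXPORT_DIR"

  # 2. Run deblur outside of QIIME 2. Among its outputs are all.biom and
  #    all.seqs.fa, which do not apply the positive (reference) filter.
  run deblur workflow \
    --seqs-fp "$EXPORT_DIR" --output-dir "$DEBLUR_DIR" -t "$TRIM_LENGTH"

  # 3. Re-import the unfiltered table (assumed to be BIOM 2.1 / HDF5).
  run qiime tools import \
    --input-path "$DEBLUR_DIR/all.biom" \
    --type 'FeatureTable[Frequency]' \
    --input-format BIOMV210Format \
    --output-path all-table.qza

  # 4. Re-import the matching representative sequences.
  run qiime tools import \
    --input-path "$DEBLUR_DIR/all.seqs.fa" \
    --type 'FeatureData[Sequence]' \
    --output-path all-rep-seqs.qza
}

deblur_all_table
```

When deblur is run this way, the sample IDs in the resulting table are derived from the exported file names, so they may need to be mapped back to the original sample IDs before downstream analysis.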
