Bioinformatics Beginner Needs Help

Hello,

I am just starting to learn bioinformatics and will be using QIIME2 2018.2 to process my MiSeq data. However, coming from basically no experience, I am very confused and frustrated :frowning: . I have a QIIME1 workflow from previous students in my lab, and I am trying to replicate their steps in QIIME2. I am aware that the commands are very different, and that some QIIME2 commands combine or split up specific QIIME1 commands. That is where my problem is: I’m not sure if I am doing the right things in the right order. I did the Moving Pictures tutorial and looked over the Atacama Soil Microbiome tutorial; they helped, but I’m still lost.

If anyone could help me, that would be really great.

The QIIME1 workflow goes like this:

  • join paired ends with multiple_join_paired_ends.py
  • split libraries with multiple_split_libraries_fastq.py with a Q-score threshold of 20
  • identify and remove chimeras with identify_chimeric_seqs.py and filter_fasta.py
  • pick OTUs with pick_open_reference_otus.py
  • set denoise parameters with filter_samples_from_otu_table.py

There’s more but that would be a good start for me. Also, my raw reads have already come demultiplexed from BaseSpace.

Anything helps, thank you!

Alan

Hi Alan

I figured I would see if I could help you get started. Enough has changed between QIIME1 and QIIME2 that it’s probably best to start completely fresh and work with the QIIME2 tutorials right off the bat. Let go of the QIIME1 workflow.

So basically - you have to know what type of data you have:

  1. Did you do paired-end or single-end sequencing?
  2. Is your data phred64 or phred33? (phred33 is more likely if it’s a recent Illumina instrument.) You can always check with the sequencing group.
  3. You said your data is demultiplexed and on BaseSpace, so I’m assuming you have either fastq or fastq.gz files, correct? QIIME2 can import both types.
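
If you are unsure about point 2, a rough check can be made from the files themselves. The sketch below is only a heuristic and not a QIIME2 feature: guess_phred is a made-up helper name, it assumes standard four-line FASTQ records, and it only inspects the first 1000 reads. It relies on the fact that quality characters below ASCII 59 can only occur with the phred33 offset, while characters above "J" (ASCII 74) point towards phred64.

```shell
#!/bin/sh
# guess_phred FILE: heuristic guess of the quality-score offset.
# Hypothetical helper, not part of QIIME2. Assumes 4-line FASTQ records.
guess_phred() {
  file=$1
  case "$file" in
    *.gz) reader="gzip -dc" ;;   # decompress gzipped input on the fly
    *)    reader="cat" ;;
  esac
  # Every 4th line of a record is the quality string; check 1000 reads.
  $reader "$file" | head -n 4000 | LC_ALL=C awk '
    NR % 4 == 0 {
      if ($0 ~ /[!-:]/) low = 1    # ASCII 33-58: only possible with phred33
      if ($0 ~ /[K-~]/) high = 1   # ASCII 75-126: suggests phred64
    }
    END {
      if (low) print "phred33"
      else if (high) print "phred64"
      else print "ambiguous"
    }'
}
```

For example, guess_phred Sample1_S1_L001_R1_001.fastq.gz (a hypothetical file name) prints one of phred33, phred64 or ambiguous. If the answer is ambiguous, asking the sequencing group is still the safest route.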

Basically you start with the import data tutorial: https://docs.qiime2.org/2018.4/tutorials/importing/

Scroll down to the “Fastq manifest” formats section. You will need to build a manifest.csv file for import into QIIME2. Follow the format exactly; it’s really picky! If you make your manifest in Excel, be sure to save it as a comma-separated file (.csv).
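
If typing the manifest by hand is error-prone, it can also be generated. The following is a minimal sketch, assuming the paired-end manifest layout described in the importing tutorial for the 2018 releases (header sample-id,absolute-filepath,direction) and Illumina-style file names such as Sample1_S1_L001_R1_001.fastq.gz. The function name write_manifest and the file-naming pattern are assumptions, so adjust them to your own files.

```shell
#!/bin/sh
# write_manifest DIR: print a paired-end Fastq manifest for DIR to stdout.
# Hypothetical helper; assumes names like Sample1_S1_L001_R1_001.fastq.gz
# with a matching _R2_ file for every _R1_ file.
write_manifest() {
  dir=$(cd "$1" && pwd)   # the manifest needs absolute file paths
  echo "sample-id,absolute-filepath,direction"
  for fwd in "$dir"/*_R1_001.fastq.gz; do
    [ -e "$fwd" ] || continue   # no match: the glob is left unexpanded
    rev=$(printf '%s\n' "$fwd" | sed 's/_R1_001\.fastq\.gz$/_R2_001.fastq.gz/')
    # Sample ID = file name without the _S<n>_L<n>_R1_001.fastq.gz suffix
    sample=$(basename "$fwd" | sed 's/_S[0-9]*_L[0-9]*_R1_001\.fastq\.gz$//')
    if [ ! -e "$rev" ]; then
      echo "missing reverse read for $sample, skipping" >&2
      continue
    fi
    echo "$sample,$fwd,forward"
    echo "$sample,$rev,reverse"
  done
}
```

Running write_manifest /path/to/fastq-dir > pe-33-manifest (the directory path is a placeholder) gives a file you can check against the tutorial's example before importing.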

Once you have the manifest file, scroll to the commands given in the tutorial. Pick which one to run based on whether you did paired-end or single-end sequencing, and make sure you specify the correct Phred offset.

For instance, if you have paired-end data that’s phred33, the command, per the tutorial, would be:

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path pe-33-manifest \
  --output-path paired-end-demux.qza \
  --source-format PairedEndFastqManifestPhred33

Notice this is identical to the command in the tutorial, except that I am using phred33 rather than phred64 in the source format, and my input path is pe-33-manifest.

Once you have your .qza, go back to the Moving Pictures tutorial and start with the second command in the “Demultiplexing sequences” section, the one that starts with qiime demux summarize. This will produce a .qzv file that you can look at using the qiime tools view command. It will let you assess your data and determine the best trimming/truncating parameters for the next steps.
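
As a sketch of that step, the summarize command can be wrapped like this. The wrapper name summarize_demux is made up for illustration; the qiime demux summarize call and its two options are the ones used in the Moving Pictures tutorial.

```shell
#!/bin/sh
# summarize_demux INPUT.qza OUTPUT.qzv
# Hypothetical wrapper around the tutorial's "qiime demux summarize" command.
summarize_demux() {
  qiime demux summarize \
    --i-data "$1" \
    --o-visualization "$2"
}
```

With the artifact from the import command above, that would be summarize_demux paired-end-demux.qza paired-end-demux.qzv, followed by qiime tools view paired-end-demux.qzv to open the quality plots.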

Then continue through the tutorial with your own data in the same way. Per your QIIME1 workflow, you will most likely denoise your data and get rid of chimeras in QIIME2 using dada2. But again, follow the tutorial, and best of luck. I hope this helps.
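
For paired-end data, the dada2 step could look roughly like the sketch below. The wrapper name denoise_paired is hypothetical, and the trim and truncation values are arguments that you have to choose from your own quality plots. The option names are those of the q2-dada2 denoise-paired action; --output-dir is used so that the plugin names the output files itself.

```shell
#!/bin/sh
# denoise_paired DEMUX.qza TRUNC_F TRUNC_R OUTDIR [TRIM_F] [TRIM_R]
# Hypothetical wrapper around "qiime dada2 denoise-paired".
# TRIM_F and TRIM_R default to 0 (nothing trimmed from the 5' end).
denoise_paired() {
  demux=$1; trunc_f=$2; trunc_r=$3; outdir=$4
  trim_f=${5:-0}; trim_r=${6:-0}
  qiime dada2 denoise-paired \
    --i-demultiplexed-seqs "$demux" \
    --p-trim-left-f "$trim_f" \
    --p-trim-left-r "$trim_r" \
    --p-trunc-len-f "$trunc_f" \
    --p-trunc-len-r "$trunc_r" \
    --output-dir "$outdir"
}
```

A call such as denoise_paired paired-end-demux.qza 240 200 dada2-out uses placeholder truncation lengths only; they are not a recommendation for any particular dataset.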

Thank you @mmelendrez for your help!

I echo @mmelendrez's suggestion: it would be easier and better (from a methodological standpoint) to write a fresh workflow in QIIME2, taking full advantage of the new methods that are not available in QIIME1.

Since it sounds like you have paired-end data, see this tutorial for examples.

And also check the importing tutorial to see which scheme is appropriate for your data (manifest format is the most universal format, as @mmelendrez advised).

Read joining and chimera checking are performed automatically as part of q2-dada2 (as described in that tutorial). If you do not want to use dada2, join reads as described in the "Alternative methods of read-joining in QIIME 2" tutorial.

Denoising (with dada2 or deblur) takes the place of OTU picking; it is much more effective for removing noisy sequences. If you insist on using OTU picking, use q2-vsearch for OTU picking and chimera filtering.

Check out the other tutorials and available plugins for other commands to use in your analysis downstream... we do not yet have a "qiime1 to qiime2" guide, but many of the command and tutorial names should be fairly clear... if you can't find anything or have more questions, you know where to find us :smile:

@Alan_Chan there’s a link that has floated around the forum before which attempts to compare QIIME1 and QIIME2 functions. To be honest, I don’t know whose it is, and I’m not sure whether it’s up to date. But if you really want a comparison between QIIME1 and QIIME2 commands, it might be worth looking at, though @mmelendrez’s and @Nicholas_Bokulich’s advice is still your best option.

@mmelendrez @Nicholas_Bokulich @Mehrbod_Estaki

Thanks a lot for the help!

In terms of using dada2, I am unsure about the trim/truncate options. I’ve been trying to read up on them but still don’t understand. Specifically, what is the difference between trim and truncate? And how do I decide how much to trim/truncate? Any links or explanations would be greatly appreciated!

You trim from the 5' end of the read, and truncate at the 3' end.
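
To make the difference concrete, here is a toy sketch that applies both parameters to a single read held in a shell variable. The helper name trim_and_truncate is invented for this example and it is not how dada2 is implemented. It only mirrors the documented behaviour: trimming removes bases from the start, truncation cuts the read off at a fixed position, reads shorter than the truncation length are discarded, and a truncation length of 0 means no truncation.

```shell
#!/bin/sh
# trim_and_truncate READ TRIM_LEFT TRUNC_LEN
# Toy illustration on one read (hypothetical helper, not dada2 itself).
#   TRIM_LEFT: number of bases removed from the 5' end (start of the read)
#   TRUNC_LEN: position at which the 3' end is cut off; 0 = no truncation
trim_and_truncate() {
  read_seq=$1; trim_left=$2; trunc_len=$3
  if [ "$trunc_len" -eq 0 ]; then
    trunc_len=${#read_seq}          # keep the read up to its last base
  elif [ "${#read_seq}" -lt "$trunc_len" ]; then
    return 0                        # too short: the read is discarded
  fi
  # Keep positions TRIM_LEFT+1 .. TRUNC_LEN (1-based, inclusive)
  printf '%s\n' "$read_seq" | cut -c "$((trim_left + 1))-$trunc_len"
}
```

For example, trim_and_truncate AACCGGTTAACC 2 10 prints CCGGTTAA: the first two bases are trimmed, everything after position 10 is cut off, and the result is 10 - 2 = 8 bases long. In practice the values come from the quality plots: truncate roughly where the quality drops off, while keeping forward and reverse reads long enough that they still overlap for joining.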

See the moving pictures tutorial for some description. There has been a lot of discussion of this topic on the forum, as well — take a look for some examples and let us know if you have more questions.

I hope that helps!
