Illumina MiSeq Paired End sequences (2X300bp) data preprocessing

Hi
Dear community members,

Sorry for this long posting!

I received fastq.gz files from the sequencing company, along with a separate Excel file listing the sample ID, index7, index5, and one column without a header, as shown below:

sample    index7    index5    (no header)
sample-1  CTCTCTAC  CGTCTAAT  N704-S506
sample-2  GGACTCCT  AAGGAGTA  N705-S507

When I open a FASTQ file, the index7 and index5 sequences appear at the end of each read header.

From the sequencing company I learned that the BCL (base call) binary files were converted to FASTQ with Illumina's bcl2fastq package (which is supposed to remove the barcodes), and that adapters were not trimmed from the reads.

So now I am puzzled: the header of each read contains index7 + index5,
but when I look at the sequence itself, I cannot pick out the adapter and primer sequences.
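To make the header layout concrete, here is a minimal Python sketch that pulls the index pair out of a header line. It assumes the usual Illumina header layout (read location, a space, then `read:filtered:control:index`). The example header is made up for illustration, since the real one from these files is not shown here, and the function name `parse_illumina_header` is hypothetical.

```python
def parse_illumina_header(header: str) -> dict:
    """Split an Illumina-style FASTQ header into its parts (assumed layout)."""
    if not header.startswith("@"):
        raise ValueError("FASTQ header lines start with '@'")
    # Two space-separated halves: read location, then read information.
    location, _, info = header[1:].strip().partition(" ")
    fields = info.split(":")
    if len(fields) < 4:
        raise ValueError("no index field found in header")
    read_number, is_filtered, _control, index = fields[:4]
    # Dual indexes are written as '<index7>+<index5>'.
    index7, _, index5 = index.partition("+")
    return {
        "location": location,
        "read": int(read_number),
        "filtered": is_filtered == "Y",
        "index7": index7,
        "index5": index5,
    }


# Hypothetical header; instrument, run and flowcell IDs are invented.
example = "@M00001:1:000000000-ABCDE:1:1101:15000:1500 1:N:0:CTCTCTAC+CGTCTAAT"
parsed = parse_illumina_header(example)
print(parsed["index7"], parsed["index5"])
```

The indexes live only in the header text, so they are not part of the sequence line that trimming and denoising tools work on.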

The following were the adapter + primer sequences:
V3-F : TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG+CCTACGGGNGGCWGCAG
V4-R: GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGG+ACTACHVGGGTATCTAATCC
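One way to check by eye whether the primers are still in the reads is to expand the degenerate IUPAC bases (N, W, H, V) into a regular expression and look for a match near the start of a read. The sketch below does this for the primer parts listed above (the text after the `+`). The helper names, the 40-base search window and the example reads are assumptions for illustration, not real data from these files.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

FWD_PRIMER = "CCTACGGGNGGCWGCAG"      # primer part of V3-F as posted
REV_PRIMER = "ACTACHVGGGTATCTAATCC"   # primer part of V4-R as posted


def primer_regex(primer: str) -> re.Pattern:
    """Compile a degenerate primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def primer_position(read: str, primer: str, window: int = 40):
    """Return the start of the first primer match within the first
    `window` bases of the read, or None if there is no match."""
    match = primer_regex(primer).search(read[:window].upper())
    return match.start() if match else None


# Made-up reads for illustration only.
with_primer = "CCTACGGGAGGCAGCAGTGGGGAATATTG"
without_primer = "TGGGGAATATTGCACAATGG"
print(primer_position(with_primer, FWD_PRIMER))
print(primer_position(without_primer, FWD_PRIMER))
```

Running this over the first few hundred reads of a forward and a reverse file gives a quick sense of whether primer trimming has anything to remove.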

So, assuming that my sequences are already free of adapters (preprocessed), I ran all the QIIME 2 steps, including qiime cutadapt trim-paired, which trimmed 0 sequences, followed by DADA2 denoising and the further steps, and ended up with 4851 features from 24 soil samples.

So is my analysis right, or am I missing a filtering step?

Once again sorry for this long posting!

At the same time, when I tried the QIIME 1 steps listed below, I ended up with 70,380 OTUs.

multiple_join_paired_ends.py, split_libraries_fastq.py,
identify_chimeric_seqs.py, filter_fasta.py, pick_open_reference_otus.py (using usearch) and summarize_taxa.py

So I suspect that my preprocessing steps are faulty. I would like to know how to preprocess my FASTQ files, which contain index7 and index5 in the header. I would also like to know whether the presence of index7 and index5 in the header affects my end results.

Although I've been following several tutorials found in forums, I still don't understand how to process this data. In addition, I do not know how to build the mapping file, given the structure of the sequences provided.

I would appreciate some help with preprocessing these FASTQ files, both to proceed with QIIME 1 and to import them into QIIME 2.

Thank you in advance!



Good afternoon,

Long posts are good! The more detail, the better!

Is 24 the right number of samples? If so, then I think all is working as intended and you are good to go!

So let's answer some of the smaller questions.

No, the index sequences in the headers will not affect your results. The headers can be named anything, and some later steps in the process may rename them.

QIIME 1 OTU picking with uclust is very different from denoising with DADA2 to make ASVs, so we expect the OTU and ASV counts to differ. The biological conclusions should be similar, but ASVs are less noisy and have higher resolution.

As seen in the tutorials, the sample metadata file includes biological information (clinical/environmental) about your samples, like timepoint, location, treatment, etc. Are your 24 samples identical, or are they from different groups? If they are from different groups, you can list which samples are in which groups inside your sample metadata file, and then compare these groups with the QIIME 2 plugins.
(QIIME 2 plugins are like QIIME 1 scripts.)
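As a rough illustration, a sample metadata file is just a tab-separated table with one row per sample. The Python sketch below writes one, using `sample-id` as the header of the identifier column. The sample IDs come from the question, but the `index-pair` column name and the `group` values are invented placeholders, so substitute your real study groups.

```python
import csv
import io


def write_metadata(rows: dict, columns: list, handle) -> None:
    """Write a tab-separated metadata table with a 'sample-id' first column."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample-id"] + columns)
    for sample_id, values in rows.items():
        writer.writerow([sample_id] + [values[col] for col in columns])


# 'index-pair' and 'group' are hypothetical column names; the group
# values are placeholders, not real information about these samples.
samples = {
    "sample-1": {"index-pair": "N704-S506", "group": "plot-A"},
    "sample-2": {"index-pair": "N705-S507", "group": "plot-B"},
}

buffer = io.StringIO()
write_metadata(samples, ["index-pair", "group"], buffer)
metadata_tsv = buffer.getvalue()
print(metadata_tsv)
```

To save the table to disk, pass an open file handle instead of the `StringIO` buffer.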

Let us know if you have any more questions!
Colin

