Automatic Manifest Maker in R

Micro_Biologist · February 7, 2018, 2:35pm

Here I present my tutorial for making a 'Manifest.csv' file automatically using R in a conda environment. The file is meant for importing data as described in the "Fastq manifest" formats section of the QIIME2 Importing data tutorial.

I wrote this script because we intended to run a lot of analyses through QIIME2, and I did not want to make a manifest file by hand every time we imported data.

This guide assumes you have a version of conda installed, and that your files are named samplename.R1.fastq.gz for forward reads and samplename.R2.fastq.gz for reverse reads, although it would be very easy to alter the code to work with similar naming schemes.

All your fastq.gz files must be in a folder called "Data" within your current directory.
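If you want to confirm that the Data folder actually follows this naming scheme before building a manifest, a quick check is sketched below. It is a minimal base-R sketch, not part of the original script: the helper name pair_samples() is hypothetical, and it assumes the .R1.fastq.gz / .R2.fastq.gz suffixes described above. Changing the two suffix arguments is one way to adapt it to a similar naming scheme.

```r
# Hypothetical helper (not part of the original script): derive sample names
# from read file names and report which samples are missing a mate.
# Assumes forward reads end in ".R1.fastq.gz" and reverse reads in ".R2.fastq.gz".
pair_samples <- function(files,
                         fwd_suffix = ".R1.fastq.gz",
                         rev_suffix = ".R2.fastq.gz") {
  files <- basename(files)
  # endsWith() is a literal (non-regex) suffix match
  fwd <- files[endsWith(files, fwd_suffix)]
  rev <- files[endsWith(files, rev_suffix)]
  # Strip the suffix to recover the sample name
  fwd_names <- substr(fwd, 1, nchar(fwd) - nchar(fwd_suffix))
  rev_names <- substr(rev, 1, nchar(rev) - nchar(rev_suffix))
  list(
    paired       = sort(intersect(fwd_names, rev_names)),
    forward_only = sort(setdiff(fwd_names, rev_names)),
    reverse_only = sort(setdiff(rev_names, fwd_names))
  )
}

# Usage from the directory that contains Data:
# str(pair_samples(list.files("Data")))
```

Any names listed under forward_only or reverse_only would end up in the manifest with only one read direction.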

First, create an R environment for the script to run in. I did this so that it wouldn't interfere with the version of R in the QIIME2 environment. To do this, run:

conda create -n R-Env -y

source activate R-Env

This creates a fresh conda environment for us to work in.

Install the r-essentials, tidyverse and gdata packages by running:

conda install r-essentials -y

conda install -c r r-tidyverse -y

conda install -c r r-gdata -y
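To confirm the installation worked, you can start R inside the activated R-Env environment and check that the packages load. This is a small sketch; missing_packages() is a hypothetical helper, not something provided by conda or the packages themselves.

```r
# Hypothetical helper: return the names of packages that cannot be loaded
# in the current R session (an empty vector means everything is available).
missing_packages <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

# Usage inside R, with R-Env activated:
# missing_packages(c("tidyverse"))
```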

Now that the environment is set up, create your R script:

#!/usr/bin/env Rscript
# Build a QIIME 2 "Fastq manifest" (Manifest.csv) from the paired-end reads in ./Data
library(tidyverse)

# Forward reads: list every file ending in .R1.fastq.gz
# (pattern is a regular expression, so the dots are escaped and the end is anchored)
SamplesF <- list.files(path = "Data", pattern = "\\.R1\\.fastq\\.gz$", all.files = FALSE,
       full.names = TRUE, recursive = FALSE,
       ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE)

TabF <- as.data.frame(SamplesF)

# File paths are written relative to $PWD, which QIIME 2 expands on import.
# "Data/" and "fastq.gz" are temporarily masked as "/N/" and "fastq.gip" so that
# the sample-id substitutions below, which run over every column, leave them alone.
PathF <- data.frame(lapply(TabF, function(TabF) {gsub("Data/", "$PWD/N/", TabF)}))
PathF <- data.frame(lapply(PathF, function(PathF) {gsub("fastq.gz", "fastq.gip", PathF)}))

names(PathF)[names(PathF)=="SamplesF"] <- "absolute-filepath"

PathF['direction']='forward'

PathF['sample-id']= SamplesF

# Turn "Data/<name>.R1.fastq.gz" into "sample-<name>", then undo the masking.
# Note: data.frame() converts hyphens in column names to dots (restored at the end).
PathF <- data.frame(lapply(PathF, function(PathF) {gsub("Data/", "sample-", PathF)}))
PathF <- data.frame(lapply(PathF, function(PathF) {gsub(".R1.fastq.gz", "", PathF)}))
PathF <- data.frame(lapply(PathF, function(PathF) {gsub("fastq.gip", "fastq.gz", PathF)}))
PathF <- data.frame(lapply(PathF, function(PathF) {gsub("/N/", "/Data/", PathF)}))

# Reorder columns to sample-id, absolute-filepath, direction
PathF <- PathF[,c(3,1,2)]


# Reverse reads: the same steps for files ending in .R2.fastq.gz
SamplesR <- list.files(path = "Data", pattern = "\\.R2\\.fastq\\.gz$", all.files = FALSE,
       full.names = TRUE, recursive = FALSE,
       ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE)

TabR <- as.data.frame(SamplesR)

PathR <- data.frame(lapply(TabR, function(TabR) {gsub("Data/", "$PWD/N/", TabR)}))
PathR <- data.frame(lapply(PathR, function(PathR) {gsub("fastq.gz", "fastq.gip", PathR)}))

names(PathR)[names(PathR)=="SamplesR"] <- "absolute-filepath"

PathR['direction']='reverse'

PathR['sample-id']= SamplesR

PathR <- data.frame(lapply(PathR, function(PathR) {gsub("Data/", "sample-", PathR)}))
PathR <- data.frame(lapply(PathR, function(PathR) {gsub(".R2.fastq.gz", "", PathR)}))
PathR <- data.frame(lapply(PathR, function(PathR) {gsub("fastq.gip", "fastq.gz", PathR)}))
PathR <- data.frame(lapply(PathR, function(PathR) {gsub("/N/", "/Data/", PathR)}))

PathR <- PathR[,c(3,1,2)]

# Stack forward and reverse rows into one manifest
Manifest <- rbind(PathF, PathR)

# Restore the hyphenated column names that QIIME 2 expects
names(Manifest)[names(Manifest)=="sample.id"] <- "sample-id"

names(Manifest)[names(Manifest)=="absolute.filepath"] <- "absolute-filepath"

write_csv(Manifest, "Manifest.csv")
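If you would rather avoid the masking steps, the same three-column manifest can be built more directly. The sketch below is an alternative, not the original script: it uses base R only (no tidyverse), the function names manifest_rows() and make_manifest() are hypothetical, and it assumes the same Data folder and .R1/.R2 naming. It could be saved as Taxonomy.R in place of the script above.

```r
#!/usr/bin/env Rscript
# Alternative sketch (hypothetical functions, base R only): same columns as the
# script above. Assumes reads live in ./Data as <sample>.R1.fastq.gz / <sample>.R2.fastq.gz.
manifest_rows <- function(data_dir, suffix, direction) {
  pattern <- paste0("\\.", suffix, "\\.fastq\\.gz$")
  files <- list.files(path = data_dir, pattern = pattern)
  samples <- sub(pattern, "", files)
  data.frame(
    # sprintf() returns character(0) for zero files, unlike paste0()
    `sample-id` = sprintf("sample-%s", samples),
    # "$PWD" is written literally, as in the original script
    `absolute-filepath` = file.path("$PWD", data_dir, files),
    direction = rep(direction, length(files)),
    check.names = FALSE,        # keep the hyphenated column names
    stringsAsFactors = FALSE
  )
}

make_manifest <- function(data_dir = "Data", out = "Manifest.csv") {
  manifest <- rbind(manifest_rows(data_dir, "R1", "forward"),
                    manifest_rows(data_dir, "R2", "reverse"))
  # quote = FALSE writes the values unquoted (assumes no commas in file names)
  write.csv(manifest, out, row.names = FALSE, quote = FALSE)
  invisible(manifest)
}

# Only build the manifest when a Data folder is present in the working directory
if (dir.exists("Data")) {
  make_manifest()
} else {
  message("No 'Data' folder found in ", getwd())
}
```

Keeping data_dir relative (the default "Data") matters here, because it is pasted after the literal $PWD.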

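Paste this code into a text editor and save it in your home directory as "Taxonomy.R" (its first line should be a shebang such as #!/usr/bin/env Rscript, because the script is run directly rather than through Rscript).
Then make it executable: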

chmod +x ~/Taxonomy.R

Whenever you need to make a manifest, run the following from your working directory (the one that contains the Data folder):

source activate R-Env

~/Taxonomy.R

And your "Manifest.csv" file will appear as if by magic in your working directory ready for importing your data.
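Before importing, it can be worth sanity-checking the file. The sketch below uses a hypothetical check_manifest() helper in base R: it reads Manifest.csv back, checks the header, checks that every sample-id has exactly one forward and one reverse file, and replaces the literal $PWD with getwd() to confirm each file exists. It assumes you run it from the same directory you will import from.

```r
# Hypothetical helper: returns a character vector of problems found in a
# manifest; an empty vector means no problems were detected.
check_manifest <- function(path = "Manifest.csv") {
  m <- read.csv(path, check.names = FALSE, stringsAsFactors = FALSE)
  expected <- c("sample-id", "absolute-filepath", "direction")
  if (!identical(names(m), expected)) {
    return(paste("unexpected header:", paste(names(m), collapse = ",")))
  }
  problems <- character(0)
  # Every sample-id should have exactly one forward and one reverse file
  counts <- table(m[["sample-id"]],
                  factor(m$direction, levels = c("forward", "reverse")))
  unpaired <- rownames(counts)[rowSums(counts != 1) > 0]
  if (length(unpaired) > 0) {
    problems <- c(problems, paste("not exactly one forward/reverse file:", unpaired))
  }
  # Resolve the literal $PWD to the current directory and check the files exist
  paths <- sub("$PWD", getwd(), m[["absolute-filepath"]], fixed = TRUE)
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    problems <- c(problems, paste("file not found:", missing))
  }
  problems
}

# Usage: check_manifest("Manifest.csv")
```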

EDIT: Please ask any questions. I hope it is OK to post this here?!

thermokarst · September 20, 2018, 3:51pm

An off-topic reply has been split into a new topic: Problems with Automatic Manifest Maker (Rlang)

Please keep replies on-topic in the future.

JoseM · October 19, 2018, 10:26am

Hi,

Thank you very much for the script. I just wanted to report that I also got the tidyverse conflict you reported here within the R environment.

The good news is that it works great in R when run from RStudio with minimal changes, so thanks for sharing!!
Best regards,

Jose

Micro_Biologist · October 20, 2018, 8:10am

Just a heads up, I have made some minor changes to the documentation of this script, hosted here as I can no longer edit the OP.

github.com/Micro-Biology/BasicBashCode/blob/master/BasicScripts/Q2_Manifest_Maker.R

#This is now obsolete and I will not be updating it. Please see Q2_manifest_maker.py for making your own manifest.csv for QIIME 2.

#Q2ManifestMaker
#Here I present my tutorial for making a QIIME2 ‘Manifest.csv’ automatically using an R conda environment, to be used to import data according to the "Fastq manifest" formats part of the Importing data tutorial on the QIIME2 website.

#I wrote this script because we intended to run a lot of analyses through QIIME2 and I did not want to manually make a manifest file to import samples every time. This guide assumes you have a version of conda installed, and that your files are named samplename.R1.fastq.gz for forward reads and samplename.R2.fastq.gz for reverse reads, although it would be very easy to alter the code to work with similar naming schemes. All your fastq.gz files must be in a folder called “Data” within your current directory.



# 'Install' Instructions:

#If you already happen to have a conda environment with these R packages installed, you can skip this step, provided you change some of the code.

    #conda create -n R-Env r-essentials -y

    #conda activate R-Env

    #conda install -c r r-tidyverse -y


The instructions, including the install instructions etc., are commented out (#) in the script itself.

JoseM · October 22, 2018, 9:50am

Awesome, thank you for sharing!
Jose

Charlie · September 4, 2019, 12:48am

Hello, you may want to take a look at this plugin, which generates metadata and manifest files from a folder structure :slight_smile:
https://library.qiime2.org/plugins/qiime2-manifest-metadata-generator/23/