Here is my tutorial for automatically generating a "Manifest.csv" file with R in a conda environment. The manifest is used to import data as described in the "Fastq manifest" formats part of the Importing data tutorial.
I wrote this script because we intended to run a lot of analyses through QIIME2, and I did not want to make a manifest file by hand for every import.
This guide assumes you have a version of conda installed, and that your files are named samplename.R1.fastq.gz for forward reads and samplename.R2.fastq.gz for reverse reads. It would be very easy to alter the code to work with similar naming schemes.
All your fastq.gz files must be in a folder called "Data" within your current directory.
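Before building a manifest, it can help to confirm that every forward read has a matching reverse read. The following is only a minimal sketch under the naming scheme assumed above; check_pairs is a hypothetical helper name of my own, not part of QIIME2 or conda.

```shell
# Hypothetical helper: list every forward read that has no reverse mate.
# Takes the folder to check (default "Data"); returns non-zero if any mate is missing.
check_pairs() {
  local dir="${1:-Data}"
  local fwd rev missing=0
  for fwd in "$dir"/*.R1.fastq.gz; do
    # If nothing matches, the glob stays literal, so skip non-existent entries
    [ -e "$fwd" ] || continue
    # Swap the R1 suffix for the R2 suffix to get the expected mate
    rev="${fwd%.R1.fastq.gz}.R2.fastq.gz"
    if [ ! -e "$rev" ]; then
      echo "missing reverse read for: ${fwd##*/}"
      missing=1
    fi
  done
  return "$missing"
}
```

Running check_pairs Data from your working directory prints nothing when all the pairs are complete.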
First, create an R environment for the script to run in. I did this so that it wouldn't interfere with the version of R that QIIME2 uses. To do this, run:
conda create -n R-Env -y
source activate R-Env
This creates a fresh conda environment for us to work in.
Install r-essentials and the tidyverse and gdata packages by running:
conda install r-essentials -y
conda install -c r r-tidyverse -y
conda install -c r r-gdata -y
Now that the environment is set up, make your R script:
#!/usr/bin/env Rscript
# Build a QIIME2 fastq manifest from the paired-end reads in ./Data
library(tidyverse)

# Build the manifest rows for one read direction.
# tag is "R1" or "R2"; direction is "forward" or "reverse".
manifest_rows <- function(tag, direction) {
  suffix <- paste0(".", tag, ".fastq.gz")
  # list.files() expects a regular expression, so escape the dots and anchor the end
  files <- list.files(path = "Data", pattern = paste0("\\.", tag, "\\.fastq\\.gz$"))
  data.frame(
    # Sample IDs are the file names without the suffix, prefixed with "sample-"
    `sample-id` = paste0("sample-", sub(suffix, "", files, fixed = TRUE)),
    # $PWD is written literally so the manifest works from any working directory
    `absolute-filepath` = paste0("$PWD/Data/", files),
    direction = rep(direction, length(files)),
    check.names = FALSE,  # keep the hyphens in the column names
    stringsAsFactors = FALSE
  )
}

# Forward rows first, then reverse rows
Manifest <- rbind(manifest_rows("R1", "forward"), manifest_rows("R2", "reverse"))
write_csv(Manifest, "Manifest.csv")
Paste this code into a text editor and save it in your home directory as "Taxonomy.R". The first line of the file must be #!/usr/bin/env Rscript, otherwise the shell will not know to run it with R.
Then make it executable by running:
chmod +x ~/Taxonomy.R
Whenever you need to make a manifest for your files, run the following from your working directory:
source activate R-Env
~/Taxonomy.R
And your "Manifest.csv" file will appear as if by magic in your working directory ready for importing your data.
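If you want to double-check the result before importing, something like the sketch below may be useful. It assumes the manifest has the header sample-id,absolute-filepath,direction and that the paths begin with a literal $PWD, as in the script above; check_manifest is a hypothetical helper name, not a QIIME2 command.

```shell
# Hypothetical helper: check the manifest header and that every listed file exists.
# Takes the manifest path (default "Manifest.csv"); returns non-zero on any problem.
check_manifest() {
  local manifest="${1:-Manifest.csv}"
  local header id path direction rest bad=0
  {
    # The first line must be the expected header
    IFS= read -r header
    if [ "$header" != "sample-id,absolute-filepath,direction" ]; then
      echo "unexpected header: $header"
      return 1
    fi
    # Every remaining line is sample-id,absolute-filepath,direction
    while IFS=, read -r id path direction; do
      case "$path" in
        # Replace a leading literal $PWD with the current directory
        \$PWD/*) rest=${path#\$PWD/}; path="$PWD/$rest" ;;
      esac
      if [ ! -e "$path" ]; then
        echo "file not found for $id ($direction): $path"
        bad=1
      fi
    done
  } < "$manifest"
  return "$bad"
}
```

Run check_manifest Manifest.csv from the same working directory you ran the R script in, since $PWD is resolved against the current directory.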
EDIT: Please ask any questions, and I hope I can post this here?!