Automatic Manifest Maker in R

(Jono Warren) #1

Here is my tutorial for making a ‘Manifest.csv’ automatically using an R conda environment. The file can then be used to import data as described in the “Fastq manifest” formats part of the Importing data tutorial.

I wrote this script because we intended to run a lot of analyses through QIIME2 and I did not want to manually make a manifest file for every import.

This guide assumes you have a version of conda installed, and that your files are named samplename.R1.fastq.gz for forward reads and samplename.R2.fastq.gz for reverse reads, although it would be very easy to alter the code to work with similar naming schemes.

All your fastq.gz files must be in a folder called “Data” within your current directory.
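Before building a manifest it can help to confirm that every forward file has a reverse partner. The following is only a minimal sketch under the naming scheme above; `check_pairs` is a hypothetical helper name, not part of the original script, and the file names in the example are made up.

```r
# Hypothetical helper (not part of the original script): given a vector of
# file names, report samples that have an R1 file but no R2 file, and vice versa.
check_pairs <- function(files) {
  fwd_pattern <- "\\.R1\\.fastq\\.gz$"
  rev_pattern <- "\\.R2\\.fastq\\.gz$"
  # Strip the read suffix to recover the sample names
  fwd <- sub(fwd_pattern, "", grep(fwd_pattern, files, value = TRUE))
  rev <- sub(rev_pattern, "", grep(rev_pattern, files, value = TRUE))
  list(missing_reverse = setdiff(fwd, rev),
       missing_forward = setdiff(rev, fwd))
}

# Example with made-up file names: sample B has no reverse read
res <- check_pairs(c("A.R1.fastq.gz", "A.R2.fastq.gz", "B.R1.fastq.gz"))
print(res)
```

To check a real folder you could call `check_pairs(list.files("Data"))` from your working directory.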

First you must create an R environment for the script to run in. I did this so that it wouldn’t interfere with the QIIME2 version of R. To do this run:

conda create -n R-Env -y

source activate R-Env

This creates a fresh conda environment for us to work in.

Install r-essentials and the tidyverse and gdata packages by running:

conda install r-essentials -y

conda install -c r r-tidyverse -y

conda install -c r r-gdata -y
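If you want to confirm the installs worked before running anything, a quick check from inside R might look like the sketch below. It only assumes the two package names installed above; `not_installed` is just an illustrative variable name.

```r
# Packages this guide installs with conda
needed <- c("tidyverse", "gdata")

# requireNamespace() returns TRUE when a package can be loaded, FALSE otherwise
available <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
not_installed <- needed[!available]

if (length(not_installed) > 0) {
  message("Not installed in this environment: ",
          paste(not_installed, collapse = ", "))
} else {
  message("All packages found.")
}
```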

Now that the environment is set up you must make your R script.

library(tidyverse)

# List the forward-read files; `pattern` is a regular expression, not a shell glob
SamplesF <- list.files(path = "Data", pattern = "\\.R1\\.fastq\\.gz$", all.files = FALSE,
       full.names = TRUE, recursive = FALSE,
       ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE)

TabF <- as.data.frame(SamplesF)

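# Build the path column; "/N/" and "fastq.gip" are temporary placeholders so that
# the later substitutions for the sample-id column leave the paths untouched
PathF <- data.frame(lapply(TabF, function(TabF) {gsub("Data/", "$PWD/N/", TabF)}))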
PathF <- data.frame(lapply(PathF, function(PathF) {gsub("fastq.gz", "fastq.gip", PathF)}))

names(PathF)[names(PathF)=="SamplesF"] <- "absolute-filepath" 

PathF['direction']='forward'

PathF['sample-id']= SamplesF

PathF <- data.frame(lapply(PathF, function(PathF) {gsub("Data/", "sample-", PathF)}))
PathF <- data.frame(lapply(PathF, function(PathF) {gsub(".R1.fastq.gz", "", PathF)}))
PathF <- data.frame(lapply(PathF, function(PathF) {gsub("fastq.gip", "fastq.gz", PathF)})) 
PathF <- data.frame(lapply(PathF, function(PathF) {gsub("/N/", "/Data/", PathF)})) 

PathF <- PathF[,c(3,1,2)]


# Same steps for the reverse-read files
SamplesR <- list.files(path = "Data", pattern = "\\.R2\\.fastq\\.gz$", all.files = FALSE,
       full.names = TRUE, recursive = FALSE,
       ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE)

TabR <- as.data.frame(SamplesR)

PathR <- data.frame(lapply(TabR, function(TabR) {gsub("Data/", "$PWD/N/", TabR)}))
PathR <- data.frame(lapply(PathR, function(PathR) {gsub("fastq.gz", "fastq.gip", PathR)}))

names(PathR)[names(PathR)=="SamplesR"] <- "absolute-filepath" 

PathR['direction']='reverse'

PathR['sample-id']= SamplesR

PathR <- data.frame(lapply(PathR, function(PathR) {gsub("Data/", "sample-", PathR)}))
PathR <- data.frame(lapply(PathR, function(PathR) {gsub(".R2.fastq.gz", "", PathR)}))
PathR <- data.frame(lapply(PathR, function(PathR) {gsub("fastq.gip", "fastq.gz", PathR)})) 
PathR <- data.frame(lapply(PathR, function(PathR) {gsub("/N/", "/Data/", PathR)})) 

PathR <- PathR[,c(3,1,2)]

Manifest <- rbind(PathF, PathR)

names(Manifest)[names(Manifest)=="sample.id"] <- "sample-id" 

names(Manifest)[names(Manifest)=="absolute.filepath"] <- "absolute-filepath" 

write_csv(Manifest, "Manifest.csv") 

Paste this code into a text editor, add #!/usr/bin/env Rscript as the very first line so that the shell runs the file with R, and save it in your home directory as “Taxonomy.R”.
Then make it executable by running:

chmod +x ~/Taxonomy.R

Whenever you need to make a manifest for your files, run the following from your working directory:

source activate R-Env

~/Taxonomy.R

And your “Manifest.csv” file will appear as if by magic in your working directory ready for importing your data.
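For anyone adapting this to another naming scheme, here is a more compact base R sketch of the same manifest layout (sample-id, absolute-filepath, direction, with a "sample-" prefix and "$PWD/Data/" paths). This is not the script above, just an illustration; `build_manifest` and the example file names are hypothetical, and it assumes at least one matching file.

```r
# Hypothetical helper (not the script above): build the manifest rows from a
# vector of file names that follow the samplename.R1/R2.fastq.gz scheme.
build_manifest <- function(files, dir = "Data") {
  is_fwd <- grepl("\\.R1\\.fastq\\.gz$", files)
  is_rev <- grepl("\\.R2\\.fastq\\.gz$", files)
  # Forward reads first, then reverse reads, as the rbind() in the script does
  files <- c(files[is_fwd], files[is_rev])
  direction <- rep(c("forward", "reverse"), c(sum(is_fwd), sum(is_rev)))
  manifest <- data.frame(
    id = paste0("sample-", sub("\\.R[12]\\.fastq\\.gz$", "", files)),
    path = paste0("$PWD/", dir, "/", files),
    direction = direction,
    stringsAsFactors = FALSE
  )
  # Set the hyphenated column names afterwards; data.frame() would turn
  # hyphens into dots if they were given directly
  names(manifest) <- c("sample-id", "absolute-filepath", "direction")
  manifest
}

# Example with made-up file names
m <- build_manifest(c("A.R1.fastq.gz", "A.R2.fastq.gz",
                      "B.R1.fastq.gz", "B.R2.fastq.gz"))
print(m)

# To write it out without tidyverse:
# write.csv(m, "Manifest.csv", row.names = FALSE, quote = FALSE)
```

With real data you could pass `list.files("Data")` instead of the example vector. Changing the two regular expressions is the only part that should need editing for a different naming scheme.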

EDIT: Please ask any questions, and I hope I can post this here?!


(Matthew Ryan Dillon) #2

An off-topic reply has been split into a new topic: Problems with Automatic Manifest Maker (Rlang)

Please keep replies on-topic in the future.


(Jose Miguel Seoane Redondo) #3

Hi,

Thank you very much for the script, just wanted to report that I also got the tidyverse conflict you reported here within the R environment


The good news is that it works great in R when run from RStudio with minimal changes, so thanks for sharing!!
Best regards,

Jose


(Jono Warren) #4

Just a heads up, I have made some minor changes to the documentation of this script, hosted here, as I can no longer edit the OP.

The instructions, including the install instructions etc., are just hashed out as comments in the actual script itself.


(Jose Miguel Seoane Redondo) #5

Awesome, thank you for sharing!
Jose