UGPp

UGPp is a lightweight NGS pipeline created for the Utah Genome Project (UGP).

UGPp - UGP pipeline

VERSION
This document describes version 1.4.0.

SYNOPSIS
./UGPp -cfg file.cfg -il interval_file > command_review.txt
./UGPp -cfg file.cfg -il interval_file --run
./UGPp -cfg file.cfg -il interval_file -f file_list --run
./UGPp -ec
./UGPp -c

DESCRIPTION
UGPp is an NGS pipeline created for the Utah Genome Project.

Currently UGPp incorporates the following tools:

FastQC

BWA

Samblaster

Sambamba

SAMtools

GATK 3.0+

Tabix

WHAM

This pipeline follows the GATK Best Practices throughout, with some modifications. Please refer to the GATK site and the UGP wiki for more information.

INSTALLATION
UGPp requires the following Perl modules, which can be installed using the provided Build script:

Moo

Parallel::ForkManager

Config::Std

IPC::System::Simple
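
To confirm that the Perl modules listed above can be loaded before starting a run, a check along the following lines may help. This is a minimal sketch and not part of UGPp: the function name missing_perl_modules is hypothetical, and it assumes the perl on your PATH is the one UGPp will use.

```shell
# Hypothetical helper (not shipped with UGPp): print every named Perl
# module that the perl on PATH cannot load.
missing_perl_modules() {
    for mod in "$@"; do
        # "perl -M<Module> -e 1" exits non-zero when the module cannot be loaded.
        perl -M"$mod" -e 1 >/dev/null 2>&1 || echo "$mod"
    done
}

# Check the four modules listed above; no output means all were found.
missing_perl_modules Moo Parallel::ForkManager Config::Std IPC::System::Simple
```

Any module name printed by this check still needs to be installed, for example with the provided Build script.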

Required external software:

FastQC

BWA

samblaster

Sambamba

SAMtools

GATK

R

Tabix

WHAM
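
Whether the external tools are reachable from the shell can be checked in a similar way. The sketch below is an assumption-laden example and not part of UGPp: the function name missing_tools is hypothetical, and the executable names passed to it (fastqc, bwa, samblaster, sambamba, samtools, Rscript, tabix) are the conventional command names for these tools, which may differ on your system. GATK 3 is normally run as a Java jar rather than a command on PATH, and no executable name is assumed here for WHAM, so both are left out.

```shell
# Hypothetical helper (not shipped with UGPp): print every named command
# that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        # "command -v" succeeds only if the shell can resolve the name.
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumed executable names; adjust them to match your installation.
missing_tools fastqc bwa samblaster sambamba samtools Rscript tabix
```

Because machine_config records software paths per machine, tools installed outside PATH may still be usable once their locations are entered in the config file.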

CONTENTS
Each script contains a usage statement.
UGPp's file structure:

UGPp/data:

exome.analysis.sequence.index

UGPp.cfg - The main template configuration file, used with machine_config to create a new config template for each server/system.

Region_File/exon_Region.list - Region list file for exome data.
Region_File/Region.list - Region list file for whole-genome (WG) data.

UGPp/bin:

UGPp - main script

machine_config - Interactive script that creates a hostname.cfg config file containing all path locations for a given machine.
project_config - Uses the config file generated by machine_config as a template for each dataset to run.

UGPp/bin/ugp_tools:

RegionMaker

Thousand_genome_recreator.pl

Thousand_genome_all_individuals.pl

UGP_Result_Cleanup.pl

RUNNING UGPp:
After downloading and installing all dependencies, a typical setup and run would follow these steps:

Setting up the config file:
machine_config helps create new configuration files as needed. Many of the values in the config file can be set on a per-machine basis, which essentially creates a new master file (hostname.cfg). Examples of such values are known indel files, VQSR VCFs, BAM background files, and software paths. project_config then creates a new project from the master file information: it outputs a new .cfg file with fastq paths, output directories, worker number, and ugp_id.

Running UGPp:
It is recommended that UGPp be run inside the Unix screen command. When UGPp runs, it creates a number of log, list, error, and report files. One of these is PROGRESS, which keeps track of each step of the pipeline in order and is the file you will review most often, typically after a failed run. In addition, a .log-txt file is generated that records the command lines used and the time of each command. Errors are usually tracked by reviewing the error, log, progress, and report files.
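
A screen-based session might look like the sketch below. The UGPp invocations are taken from the SYNOPSIS; the first one, which omits --run and redirects to command_review.txt, appears to print the commands for review rather than execute them. The session name ugpp, the helper show_progress, and the assumption that PROGRESS is written to the directory you pass in are all hypothetical and not part of UGPp.

```shell
# Typical interactive steps (shown as comments because they need a terminal):
#   screen -S ugpp                      # start a named session ("ugpp" is arbitrary)
#   ./UGPp -cfg file.cfg -il interval_file > command_review.txt
#   ./UGPp -cfg file.cfg -il interval_file --run
#   (detach with Ctrl-a d, reattach later with: screen -r ugpp)

# Hypothetical helper (not shipped with UGPp): print the last lines of the
# PROGRESS file found in a given directory.
show_progress() {
    dir=${1:-.}        # directory assumed to contain PROGRESS
    lines=${2:-20}     # number of trailing lines to show
    if [ -f "$dir/PROGRESS" ]; then
        tail -n "$lines" "$dir/PROGRESS"
    else
        echo "no PROGRESS file in $dir" >&2
        return 1
    fi
}
```

After reattaching to a session, calling show_progress with the run directory gives a quick view of the most recently recorded steps before you open the error and log files.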

INCOMPATIBILITIES
None known, although UGPp has not been tested on Microsoft Windows or OS X.

BUGS AND LIMITATIONS
Please report any bugs or feature requests to: shawn.rynearson@gmail.com

AUTHOR
Shawn Rynearson <shawn.rynearson@gmail.com>