UGPp

UGPp is a lightweight NGS pipeline created for the Utah Genome Project (UGP).

UGPp - UGP pipeline

VERSION
This document describes version 1.4.0.

SYNOPSIS
./UGPp -cfg file.cfg -il interval_file > command_review.txt
./UGPp -cfg file.cfg -il interval_file --run
./UGPp -cfg file.cfg -il interval_file -f file_list --run
./UGPp -ec
./UGPp -c
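
The first two forms above differ only in the `--run` flag and in the redirect to command_review.txt. This suggests that a call without `--run` prints the planned commands for review, although that is an inference from the synopsis rather than something stated here. The Python sketch below shows one way to script that review-then-run pattern. `ugpp_command` is a hypothetical helper, not part of UGPp, and it uses only the flags shown in the SYNOPSIS.

```python
import shlex


def ugpp_command(cfg, intervals, file_list=None, run=False, exe="./UGPp"):
    """Build a UGPp argument list from the flags shown in the SYNOPSIS.

    Hypothetical helper for illustration; it only assembles arguments
    and never executes anything.
    """
    cmd = [exe, "-cfg", cfg, "-il", intervals]
    if file_list is not None:
        cmd += ["-f", file_list]
    if run:
        cmd.append("--run")
    return cmd


# First pass: no --run, output redirected so it can be read before launching.
review = ugpp_command("file.cfg", "interval_file")
# Second pass: the same arguments plus --run.
launch = ugpp_command("file.cfg", "interval_file", run=True)

print(shlex.join(review), "> command_review.txt")
print(shlex.join(launch))
```

Building the argument list once and adding only `--run` keeps the reviewed command and the executed command identical apart from that flag.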

DESCRIPTION
UGPp is an NGS pipeline created for the Utah Genome Project.

Currently UGPp incorporates the following tools:
  • FastQC
  • BWA
  • Samblaster
  • Sambamba
  • SAMtools
  • GATK 3.0+
  • Tabix
  • WHAM
GATK's best practices (with some modifications) are followed throughout this pipeline. Please refer to the GATK site and the UGP wiki for more information.

INSTALLATION
Perl modules required by UGPp, which can be installed using the provided Build script:
  • Moo
  • Parallel::ForkManager
  • Config::Std
  • IPC::System::Simple
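
Before running the pipeline it can help to confirm that these four modules load. The sketch below is one possible check, not part of UGPp. It assumes a `perl` executable is on the PATH and relies on the standard `perl -M<module> -e 1` idiom, which exits non-zero when a module cannot be loaded. The function name and the injectable `runner` parameter are illustrative choices.

```python
import subprocess

# Module names taken from the list above.
REQUIRED_MODULES = [
    "Moo",
    "Parallel::ForkManager",
    "Config::Std",
    "IPC::System::Simple",
]


def missing_perl_modules(modules=REQUIRED_MODULES, perl="perl", runner=subprocess.run):
    """Return the modules that `perl -M<module> -e 1` fails to load.

    `runner` defaults to subprocess.run and can be swapped out in tests.
    Raises FileNotFoundError if the perl executable itself is not found.
    """
    missing = []
    for module in modules:
        result = runner([perl, f"-M{module}", "-e", "1"], capture_output=True)
        if result.returncode != 0:
            missing.append(module)
    return missing
```

Calling `missing_perl_modules()` returns an empty list when every module is available. Any names it returns are the ones still to be installed.
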
External required software:

CONTENTS
Each script contains a usage statement.
UGPp's file structure:

UGPp/data:
  • exome.analysis.sequence.index
  • UGPp.cfg - The main template configuration file, used with machine_config to create a new config template for each server/system.
  • Region_File/exon_Region.list - Region list file for exome data.
  • Region_File/Region.list - Region list file for whole-genome (WG) data.
UGPp/bin:
  • UGPp - main script
  • machine_config - Interactive script that creates a hostname.cfg config file containing all path locations for a given machine.
  • project_config - Can use the config file generated by machine_config as a template for each dataset to run.
UGPp/bin/ugp_tools:
  • RegionMaker
  • Thousand_genome_recreator.pl
  • Thousand_genome_all_individuals.pl
  • UGP_Result_Cleanup.pl

RUNNING UGPp:
After downloading and installing all dependencies, a typical setup and run follows these steps:

Setting up the config file:
machine_config helps create new configuration files as needed. Many of the values in the config file can be set once per machine, which effectively creates a new master file (hostname.cfg). Examples include known indel files, VQSR VCFs, BAM background files, and software paths. project_config then creates a new project based on the master file. It outputs a new .cfg file with the FASTQ paths, output directories, worker number, and ugp_id.
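
The master-then-project layering can be illustrated with a short Python sketch. It does not reproduce UGPp's scripts or its real config layout. The section names (`main`, `software`), the key names, the BWA path, and the `make_project_config` function are all hypothetical, chosen only to mirror the kinds of values named above. Python's `configparser` stands in for Config::Std here.

```python
import configparser
import io

# Hypothetical master template: per-machine values are filled in,
# per-project values are left empty. Not the real UGPp.cfg layout.
MASTER_CFG = """\
[main]
ugp_id =
fastq_dir =
output_dir =
workers = 1

[software]
bwa = /usr/local/bin/bwa
"""


def make_project_config(master_text, overrides):
    """Copy the master settings and fill in per-project values."""
    parser = configparser.ConfigParser()
    parser.read_string(master_text)
    for key, value in overrides.items():
        parser.set("main", key, str(value))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


project_text = make_project_config(
    MASTER_CFG,
    {"ugp_id": "UGP0001", "fastq_dir": "/data/fastq", "output_dir": "/data/out", "workers": 4},
)
print(project_text)
```

The per-machine values (here the hypothetical software path) pass through unchanged, and only the per-project values are replaced.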

Running UGPp:
It is recommended that the Unix command screen be used. When UGPp runs it creates a number of log, list, error, and report files. One of these is PROGRESS, which tracks each step of the pipeline in order. It is the file you will review most often, typically after a failed run. In addition, a .log-txt file is generated that records each command line used and its timing. Error tracking is usually done by reviewing the error, log, progress, and report files.
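
When tracking down a failed run, it can be convenient to group the output files by kind first. The Python sketch below is one way to do that and is not part of UGPp. Only the PROGRESS name and the .log-txt suffix come from the text above. Matching error and report files by a substring of the file name is an assumption, as is the `classify_run_files` helper itself.

```python
def classify_run_files(names):
    """Group file names from a run directory by the kinds named in the text.

    PROGRESS and the .log-txt suffix come from the documentation; matching
    "error" and "report" as substrings of the name is an assumption.
    Names that match none of the rules are left out.
    """
    groups = {"progress": [], "log": [], "error": [], "report": []}
    for name in sorted(names):
        lowered = name.lower()
        if name == "PROGRESS":
            groups["progress"].append(name)
        elif lowered.endswith(".log-txt"):
            groups["log"].append(name)
        elif "error" in lowered:
            groups["error"].append(name)
        elif "report" in lowered:
            groups["report"].append(name)
    return groups
```

A typical call would pass in a directory listing, for example `classify_run_files(os.listdir(run_dir))` after `import os`, where `run_dir` is the directory the run wrote to.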

INCOMPATIBILITIES
None known, although not tested on Microsoft Windows or OS X.
BUGS AND LIMITATIONS
Please report any bugs or feature requests to: shawn.rynearson@gmail.com
AUTHOR
Shawn Rynearson <shawn.rynearson@gmail.com>