VCF Filter: Available Documentation

This module provides a form interface so users can custom-filter existing VCF files and export them in a variety of formats. The form simply provides an interface to VCFtools and uses the Tripal Download API to deliver the filtered file to the user.

Features

  • User “Filter VCF” form providing well-documented filter options (with examples) and a variety of export formats.
    • Basic filter options include: Only bi-allelic SNPs, Minimum SNP Call Read Depth, Minor Allele Frequency, Maximum Missing Count, Maximum Missing Frequency.
    • More filter options include: regions and germplasm.
    • Export Formats include: VCF, Quality Matrix (read depth only), A/B Biparental Matrix, Hapmap, Bgzipped VCF.
  • All filtering and format conversion is done within a Tripal Job to support large files.
  • Administrative interface for exposing VCF files to users. Extensive configuration options allow each VCF file to be described comprehensively, which improves the user experience.
    • In addition to the path of the VCF file to expose, you can record helpful information such as a friendly name, the assembly it was aligned to, and the number of SNPs.
    • The methods used to generate each VCF file, a statistical summary, and further description can be included.
    • All germplasm names and the chromosome name format can be included as additional help.
  • Per-file permissions allow you to restrict access to a given VCF file to specific users or roles.

Various Filter Options

Many filter options are available in this module. Each one is documented with a description, an example, or even a warning, because users may not be familiar with every option.

Restrict dataset to specific germplasm or regions

  • This section will be collapsed if no file is selected.
  • Germplasm names from the file are provided to the user, who can then make changes and copy those they want to the textarea below.
  • Users can follow the example format provided to keep only sites in one specific region or multiple regions.
  • Help information can be configured to improve user experience.
_images/filter_options.1.RegGerm.png

Basic Filtering Options

Basic filter options include:
  • Bi-allelic
  • Read Depth
  • Minor Allele Frequency
  • Site Missing Count
  • Site Missing Frequency
_images/filter_options.2.basic.png

Note

Filtering of VCF files is performed with the bioinformatics tool VCFtools.
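As a rough illustration of how the basic options could be expressed with VCFtools, the sketch below prints (but does not run) a command line. The helper name `print_vcftools_cmd`, the argument order, and the mapping from form fields to flags are assumptions for illustration only; the module's actual invocation may differ. The flags themselves are standard VCFtools options.

```shell
# Hypothetical helper: prints a VCFtools command for the basic filters.
# Arguments: input VCF, output prefix, minimum read depth,
#            minor allele frequency, maximum missing count.
# Assumes file names without spaces.
print_vcftools_cmd() {
  in_vcf=$1
  out_prefix=$2
  min_dp=$3
  maf=$4
  max_missing_count=$5
  # --remove-indels with --min-alleles 2 --max-alleles 2 keeps bi-allelic SNPs only.
  echo "vcftools --vcf $in_vcf --remove-indels --min-alleles 2 --max-alleles 2" \
       "--minDP $min_dp --maf $maf --max-missing-count $max_missing_count" \
       "--recode --recode-INFO-all --out $out_prefix"
}

# Example: print the command for a depth of 3, MAF of 0.05, at most 10 missing calls.
print_vcftools_cmd input.vcf filtered 3 0.05 10
```

Running the printed command requires VCFtools to be installed. Note that VCFtools' own `--max-missing` option is a proportion where 1 means no missing data is allowed, so a "maximum missing frequency" chosen in a form would need to be converted before being passed to it.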

Configuration Options

As shown in the screenshot below, each file can be given a detailed description to help users. This is achieved through the configuration options in VCF Filter:
  • the name of the file, the assembly it was aligned to, and the number of SNPs
  • a description, which can include a basic introduction as well as details of the file
  • a statistical summary, which gives users an intuitive basis for choosing filter criteria
  • the chromosome name format, for filtering by region
  • the germplasm names, for filtering by specific germplasm
_images/configuration_options.1.display.png

Restrict Access by Permissions

Per file access can be managed in Home » Administration » Tripal » Extensions » VCF Filter.

_images/restrick_access.1.png

Installation

Note

It is recommended to clear caches regularly during the installation process.

Download VCF Filter

The module is available as a repository of Pulse Bioinformatics, University of Saskatchewan, on GitHub. The recommended method of downloading and installing it is with git:

cd [your drupal root]/sites/all/modules

git clone https://github.com/UofS-Pulse-Binfo/vcf_filter.git

Dependencies

Required dependencies for VCF Filter
  • Tripal Core (utilizes the Tripal API)
  • Tripal Download API

We can check the status of modules in “Home » Administration » Tripal » Modules”.

_images/install.2.dependency.png

In this example, Trpdownload_api is required but not available in the system. Trpdownload_api is available on GitHub and can be installed with the following commands:

cd [your drupal root]/sites/all/modules

git clone https://github.com/tripal/trpdownload_api.git

drush pm-enable trpdownload_api

Note

VCFtools is required for VCF Filter.

Enable VCF Filter

After all dependencies are installed and enabled, VCF Filter can be enabled in “Home » Administration » Tripal » Modules” on your site.

VCF Filter can also be enabled with a drush command:

drush pm-enable vcf_filter

This command enables the module, after which we should be able to find it in Home » Administration » Tripal » Extensions.

_images/install.1.menubar.png

Configuration

The module can be configured in Home » Administration » Tripal » Extensions » VCF Filter by editing a file.

Required information for Adding a file

Only site admins can configure VCF Filter in Home » Administration » Tripal » Extensions » VCF Filter. The following information is required for adding a VCF file:
  • Absolute path of the file
  • Human-readable Name
  • Number of SNPs (sites) of the file
  • Backbone
_images/required_info.1.blank.png
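If the number of sites is not already known, it can be counted from the file itself. The following is a minimal sketch, assuming an uncompressed VCF; the function name `vcf_site_count` is hypothetical and not part of the module.

```shell
# Count data records (sites) in a plain-text VCF:
# every line that does not start with '#' is one site.
vcf_site_count() {
  grep -vc '^#' "$1"
}

# Usage (the path is a placeholder):
#   vcf_site_count /path/to/file.vcf
```

This counts every record, including indels, so the result matches the number of SNPs only when the file contains SNPs alone. For a bgzipped file, the same `grep` can read from `zcat` instead.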

Optional information for Adding a file

The module can work without optional configuration, but it is highly recommended to provide it for better user experience. Instructions are provided for each configuration option.

The following screenshot is an example:

_images/optional_info.1.filled.png

Description

What we could include in the description:
  • Background information about the project/experiment and the researchers/institution, which helps users understand the file
  • The bioinformatic tools, and their parameters, that were applied in generating the VCF file
  • The number of germplasm (individuals) included in the file, and the names of the maternal and paternal parents
  • A statistical summary related to the filter criteria (the summary in the example can be generated by a PHP script)

Germplasm From Header

The names of all germplasm (individuals) in this VCF file. The germplasm list must be newline-separated, without any header or empty lines.

Note

If this textarea is not filled in, the module can find the list from the selected VCF file. However, the waiting time for extracting the germplasm list from a selected file can be significant for large VCF files. Loading time for a 10 GB VCF file will be about 3 seconds.

Because the module can generate the germplasm list itself, there is no need to compile it by other means for configuration. We can leave this section blank, select the file in the filter form, and copy the generated list back into the configuration.
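For anyone who prefers to prepare the list on the command line, the sample names can also be read from the `#CHROM` header line of the VCF, where they occupy column 10 onward. This is a minimal sketch assuming an uncompressed, tab-delimited VCF; the function name `vcf_germplasm_list` is hypothetical and this is not necessarily how the module extracts the list.

```shell
# Print one sample (germplasm) name per line, with no header and no empty lines.
# Stops reading at the first '#CHROM' line, so large files are not scanned fully.
vcf_germplasm_list() {
  grep -m 1 '^#CHROM' "$1" | cut -f 10- | tr '\t' '\n'
}

# Usage (the path is a placeholder):
#   vcf_germplasm_list /path/to/file.vcf
```

The output is already in the newline-separated form the textarea expects.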

Chromosome format

  • Chromosome names can take various formats. For example, chromosome 1 of one lentil cultivar could be chr1, Chr1, CHR1, LcChr1, Lcchr, and so on. It is therefore important to provide this information so users can filter the VCF file by region properly.
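To find out which naming format a given file actually uses, the distinct values of the first column can be listed. This is a minimal sketch assuming an uncompressed, tab-delimited VCF; the function name `vcf_chrom_names` is hypothetical.

```shell
# List the distinct chromosome/contig names used in the data records.
# 'uniq' collapses consecutive repeats first so that 'sort -u' has little work to do.
vcf_chrom_names() {
  grep -v '^#' "$1" | cut -f 1 | uniq | sort -u
}

# Usage (the path is a placeholder):
#   vcf_chrom_names /path/to/file.vcf
```

The names printed here are the exact strings users need to type when filtering by region.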

Test before Publication

A comprehensive test of your configuration is recommended before making this module public to users. Some good things to check include:
  • test if all added files are downloadable
  • test if downloaded files have proper contents
  • test if access is given to the proper groups and/or individuals
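For the content check, a quick structural test of a downloaded VCF can catch truncated or empty exports. The sketch below only confirms that the file begins with a `##fileformat=VCF` line and contains a `#CHROM` header; it is not a full validation, and the function name `vcf_looks_valid` is hypothetical. It applies to the plain VCF export only, not to the other export formats.

```shell
# Return success (0) if the file has the two header lines every VCF needs.
vcf_looks_valid() {
  head -n 1 "$1" | grep -q '^##fileformat=VCF' && grep -q '^#CHROM' "$1"
}

# Usage (the path is a placeholder):
#   vcf_looks_valid /path/to/download.vcf && echo "header looks fine"
```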

Note

It is recommended to give permissions to site admins for testing before release.

Note

We would appreciate it if you reported any issues found while using this module. You can reach us at knowpulse@usask.ca or report the issue on GitHub. Screenshots and an informative description of the issue are especially helpful.

Thank you for using VCF Filter!

Have a wonderful day!

After configuration, the description of a file can be very informative and can help users choose filtering options.

_images/configuration.1.example.png