VCF Filter: Available Documentation¶
This module provides a form interface so users can apply custom filters to existing VCF files and export the results in a variety of formats. The form simply provides an interface to VCFtools and uses the Tripal Download API to deliver the filtered file to the user.
Features¶
- A user-facing “Filter VCF” form providing well-documented filter options (with examples) and a variety of export formats.
- Basic filter options include: Only bi-allelic SNPs, Minimum SNP Call Read Depth, Minor Allele Frequency, Maximum Missing Count, Maximum Missing Frequency.
- Additional filter options: restricting the output to specific regions and germplasm.
- Export formats include: VCF, Quality Matrix (read depth only), A/B Biparental Matrix, Hapmap, Bgzipped VCF.
- All filtering and format conversion is done within a Tripal Job to support large files.
- An administrative interface for exposing VCF files to users. Extensive configuration options allow a comprehensive description of each VCF file, which improves the user experience.
- In addition to the path of the VCF file to expose, admins can record helpful information such as a human-readable name, the assembly it was aligned to and the number of SNPs.
- The methods used to generate each VCF file, a statistical summary and a longer description can also be included.
- The names of all germplasm in the file and the chromosome naming format can be recorded as further help.
- Per-file permissions allow you to restrict access to a given VCF file to specific users or roles.
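Because the form is an interface to VCFtools, each basic filter option ends up as a VCFtools flag. The exact command the module builds is not documented here, so the following is only a sketch of how the options could map onto real VCFtools flags; the function name build_vcftools_args and its argument order are hypothetical, and treating "Minimum SNP Call Read Depth" as a per-genotype depth (--minDP) is an assumption. One detail worth noting: VCFtools' --max-missing takes the minimum fraction of called genotypes (1 means no missing data allowed), so a maximum missing frequency f has to be passed as 1 - f.

```shell
#!/usr/bin/env bash
# Sketch: map the "Filter VCF" basic options onto VCFtools flags.
# build_vcftools_args is hypothetical; it is not taken from the module.
# Usage: build_vcftools_args VCF OUT_PREFIX MIN_DP MAF MAX_MISSING_COUNT MAX_MISSING_FREQ
# Pass "" for any filter that should not be applied.
build_vcftools_args() {
  local vcf="$1" out_prefix="$2" min_dp="$3" maf="$4"
  local max_missing_count="$5" max_missing_freq="$6"
  local args=()

  # Input: bgzipped files use --gzvcf, plain-text files use --vcf.
  case "$vcf" in
    *.gz) args+=(--gzvcf "$vcf") ;;
    *) args+=(--vcf "$vcf") ;;
  esac

  # Only bi-allelic SNPs: exactly two alleles and no indels.
  args+=(--min-alleles 2 --max-alleles 2 --remove-indels)

  # Minimum read depth per genotype call (assumed meaning of the form option).
  if [ -n "$min_dp" ]; then args+=(--minDP "$min_dp"); fi
  if [ -n "$maf" ]; then args+=(--maf "$maf"); fi
  if [ -n "$max_missing_count" ]; then args+=(--max-missing-count "$max_missing_count"); fi

  # --max-missing wants the minimum fraction of called genotypes,
  # so a maximum missing frequency f becomes 1 - f.
  if [ -n "$max_missing_freq" ]; then
    args+=(--max-missing "$(awk -v f="$max_missing_freq" 'BEGIN { printf "%g", 1 - f }')")
  fi

  # Write OUT_PREFIX.recode.vcf, keeping all INFO fields.
  args+=(--recode --recode-INFO-all --out "$out_prefix")
  printf '%s\n' "${args[*]}"
}
```

For example, vcftools $(build_vcftools_args /data/lentil.vcf.gz filtered 5 0.05 10 0.2) would write filtered.recode.vcf (the input path is a placeholder). Since the printed argument string is split on whitespace, this simple version assumes paths without spaces.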
Various Filter Options¶
Many filter options are available in this module. Each option is documented with a description, an example and, where needed, a warning, since users may not be familiar with every filter option.
Restrict dataset to specific germplasm or regions¶
- This section is collapsed if no file is selected.
- Germplasm names from the selected file are shown to the user, who can edit them and copy the ones they want to keep into the textarea below.
- Users can follow the example format provided to keep only sites within one or more specific regions.
- Help information can be configured to improve user experience.
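Under the hood, a germplasm selection and a set of regions correspond to VCFtools' --keep (a file with one individual per line) and --bed options. The sketch below shows one way to turn pasted text into those files. The function names are hypothetical, and the region format (CHROM:START-END, 1-based and inclusive) is an assumption rather than the exact format the form expects.

```shell
#!/usr/bin/env bash
# Sketch: turn the germplasm textarea and a list of regions into files
# VCFtools can read. Function names and the region format are hypothetical.

# Germplasm: trim surrounding whitespace (including a Windows \r) and drop
# empty lines, giving one name per line for vcftools --keep.
germplasm_to_keep_file() {
  sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' | grep -v '^$'
}

# Regions: assumes one CHROM:START-END per line, 1-based and inclusive.
# BED is 0-based and half-open, so START is shifted down by one; END stays.
# The VCFtools manual says the --bed file is expected to have a header line.
regions_to_bed() {
  awk -F'[:-]' '
    BEGIN { print "chrom\tchromStart\tchromEnd" }
    NF == 3 { printf "%s\t%d\t%d\n", $1, $2 - 1, $3 }'
}
```

Typical use: germplasm_to_keep_file < pasted.txt > keep.txt and regions_to_bed < regions.txt > regions.bed, then pass --keep keep.txt --bed regions.bed to VCFtools. A single region can also be given directly with --chr, --from-bp and --to-bp. Chromosome names containing ":" or "-" would break this simple parser.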

Configuration Options¶
- As shown in the screenshot below, each file can be given its own description to help users. This is done through the following configuration options in VCF Filter:
- the name of the file, the assembly it was aligned to and the number of SNPs
- a description, which can include a basic introduction as well as details of the file
- a statistical summary, which gives users an intuitive basis for choosing filter criteria
- the chromosome naming format, used when filtering by region
- the germplasm names, used when filtering to specific germplasm

Restrict Access by Permissions¶
Per-file access can be managed in Home » Administration » Tripal » Extensions » VCF Filter.

Installation¶
Note
It is recommended to clear caches regularly during the installation process.
Download VCF Filter¶
The module is available on GitHub as a repository of Pulse Bioinformatics, University of Saskatchewan. The recommended method of downloading and installing it is with git:
cd [your drupal root]/sites/all/modules
git clone https://github.com/UofS-Pulse-Binfo/vcf_filter.git
Dependencies¶
- Required dependencies for VCF Filter:
- Tripal Core (utilizes the Tripal API)
- Tripal Download API
You can check the status of modules in “Home » Administration » Tripal » Modules”.

In this example, trpdownload_api is required but not available on the system. It is available on GitHub and can be installed with the following commands:
cd [your drupal root]/sites/all/modules
git clone https://github.com/tripal/trpdownload_api.git
drush pm-enable trpdownload_api
Note
VCFtools must also be installed on the server, since VCF Filter uses it to perform the filtering.
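Before enabling the module, it can help to confirm from the shell that VCFtools is on the PATH and that the dependency modules were cloned where Drupal expects them. A minimal sketch, assuming the modules live under sites/all/modules as in the commands above and that each Drupal 7 module ships a NAME/NAME.info file; check_tool and check_module are hypothetical names. A module being present on disk does not mean it is enabled, so the Modules page remains the authoritative check.

```shell
#!/usr/bin/env bash
# Sketch: pre-installation checks. Function names are hypothetical.

# Report whether a command-line tool (e.g. vcftools) is on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Report whether a Drupal 7 module directory is present, by looking for
# the NAME/NAME.info file that Drupal 7 modules conventionally ship with.
check_module() {
  local modules_dir="$1" name="$2"
  if [ -f "$modules_dir/$name/$name.info" ]; then
    echo "found: $name"
  else
    echo "missing: $name"
    return 1
  fi
}
```

For example: check_tool vcftools, then check_module /var/www/drupal/sites/all/modules trpdownload_api, where /var/www/drupal is a placeholder for your Drupal root.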
Enable VCF Filter¶
After all dependencies are installed and enabled, VCF Filter can be enabled in “Home » Administration » Tripal » Modules” of your site.
Alternatively, VCF Filter can be enabled with drush:
drush pm-enable vcf_filter
Once enabled, the module can be found in Home » Administration » Tripal » Extensions.

Configuration¶
The module is configured in Home » Administration » Tripal » Extensions » VCF Filter, where VCF files are added and edited.
Required information for Adding a file¶
- Only site admins can configure VCF Filter in Home » Administration » Tripal » Extensions » VCF Filter. The following information is required for adding a VCF file:
- Absolute path of the file
- Human-readable Name
- Number of SNPs (sites) in the file
- Backbone
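The absolute path and the number of SNPs can both be checked from the shell before filling in the form. A minimal sketch, assuming "number of SNPs" means the number of data lines (every line not starting with #); count_vcf_sites is a hypothetical helper name. Counting scans the whole file, so expect it to take a while on large files.

```shell
#!/usr/bin/env bash
# Sketch: validate the path and count the sites of a VCF before adding it.
# count_vcf_sites is a hypothetical helper, not part of the module.
count_vcf_sites() {
  local vcf="$1"
  # The admin form asks for an absolute path.
  case "$vcf" in
    /*) ;;
    *) echo "not an absolute path: $vcf" >&2; return 1 ;;
  esac
  if [ ! -r "$vcf" ]; then
    echo "cannot read: $vcf" >&2
    return 1
  fi
  # Bgzipped files are gzip-compatible, so gzip -dc can stream them.
  # Every non-header line is one site.
  case "$vcf" in
    *.gz) gzip -dc "$vcf" ;;
    *) cat "$vcf" ;;
  esac | grep -vc '^#'
}
```

For example, count_vcf_sites /data/lentil.vcf.gz prints the value to enter as the number of SNPs (the path is a placeholder).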

Optional information for Adding a file¶
The module works without the optional configuration, but providing it is highly recommended for a better user experience. Instructions are provided for each configuration option.
The following screenshot is an example:

Description¶
- Things worth including in the description:
- Background information about the project/experiment and the researchers/institution, which helps users understand the file
- Bioinformatics tools and the parameters applied in generating the VCF file
- Number of germplasm (individuals) included in the file, and the names of the maternal and paternal parents
- A statistical summary related to the filter criteria (the summary in the example can be generated by a PHP script)
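The PHP script mentioned above is not reproduced here. As an illustration of the kind of filter-related statistic a summary could contain, the following shell sketch reports how many sites would pass a given Maximum Missing Count. It counts genotypes whose GT value is missing (".", "./." or ".|."), relying on the VCF rule that GT is the first FORMAT field when present. missing_count_summary is a hypothetical name, and VCFtools' own missing-data accounting may differ in detail, so treat the numbers as approximate.

```shell
#!/usr/bin/env bash
# Sketch: count sites with at most MAX missing genotypes (reads VCF on stdin).
# missing_count_summary is hypothetical, not the module's PHP script.
missing_count_summary() {
  local max_missing="$1"
  awk -F'\t' -v max="$max_missing" '
    /^#/ { next }                      # skip header lines
    {
      missing = 0
      for (i = 10; i <= NF; i++) {     # genotype columns start after FORMAT
        split($i, f, ":")              # GT is the first FORMAT sub-field
        if (f[1] == "." || f[1] == "./." || f[1] == ".|.") missing++
      }
      total++
      if (missing <= max) kept++
    }
    END { printf "%d of %d sites have at most %d missing genotypes\n", kept, total, max }'
}
```

For a bgzipped file: gzip -dc lentil.vcf.gz | missing_count_summary 10. Running it for a few cutoffs gives users a feel for how strict each value is.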
Germplasm From Header¶
The names of all germplasm (individuals) in this VCF file. The list must contain one name per line, with no header and no empty lines.
Note
If this textarea is not filled, the module can extract the list from the selected VCF file. However, the time needed to extract the germplasm list can be significant for large VCF files.
Loading time for a 10G VCF file is about 3 seconds.
Since the module can generate the germplasm list, there is no need to prepare one by other means: leave this field blank, select the file in the “Filter VCF” form and copy the generated list back into the configuration.
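If you prefer to fill this field directly from the command line, the germplasm names are the columns after FORMAT (column 10 onward) on the #CHROM header line. A minimal sketch (list_germplasm is a hypothetical name) that prints them one per line, which is the format this field expects; awk stops at the header line, so the data lines are not scanned. If bcftools is installed, bcftools query -l gives the same list.

```shell
#!/usr/bin/env bash
# Sketch: print germplasm (sample) names from a VCF header, one per line.
# list_germplasm is hypothetical, not part of the module.
list_germplasm() {
  local vcf="$1"
  case "$vcf" in
    *.gz) gzip -dc "$vcf" ;;
    *) cat "$vcf" ;;
  esac | awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }'
}
```

For example, list_germplasm /data/lentil.vcf.gz > germplasm.txt, then paste the contents of germplasm.txt into the field (paths are placeholders).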
Chromosome format¶
- Chromosome names come in various formats; for example, chromosome 1 of a lentil cultivar could be named chr1, Chr1, CHR1, LcChr1, Lcchr1 and so on. It is therefore important to provide this information so users can filter the VCF file by region correctly.
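To find out which naming format a file actually uses, you can list the distinct values of its CHROM column. A minimal sketch (list_chromosomes is a hypothetical name) that prints each chromosome name once, in order of first appearance. It scans every data line, so it is slow on large files; if the file has ##contig header lines, those list the same names more cheaply.

```shell
#!/usr/bin/env bash
# Sketch: list distinct chromosome names from the CHROM column of a VCF.
# list_chromosomes is hypothetical, not part of the module.
list_chromosomes() {
  local vcf="$1"
  case "$vcf" in
    *.gz) gzip -dc "$vcf" ;;
    *) cat "$vcf" ;;
  esac | awk -F'\t' '!/^#/ && !seen[$1]++ { print $1 }'
}
```

The output can be pasted into the chromosome format field as-is, or summarised (for example "LcChr1 to LcChr7").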
Test before Publication¶
- A comprehensive test of your configuration is recommended before making this module public. Some good things to check include:
- that all added files are downloadable
- that downloaded files have the proper contents
- that access is granted to the proper groups and/or individuals
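When checking that downloaded files have proper contents, a few structural properties of a downloaded VCF can be verified automatically. A minimal sketch (check_vcf_structure is a hypothetical name) that confirms the ##fileformat line, the #CHROM header and a consistent column count on every data line, then reports the number of sites and germplasm. It does not confirm that the chosen filters were applied correctly; that still needs a manual look.

```shell
#!/usr/bin/env bash
# Sketch: basic structural checks on a downloaded VCF read from stdin.
# Exits non-zero and prints the problems if any check fails.
check_vcf_structure() {
  awk -F'\t' '
    NR == 1 && $0 !~ /^##fileformat=VCF/ { print "missing ##fileformat line"; bad = 1 }
    /^#CHROM/ { ncol = NF; next }      # header defines the expected column count
    /^#/ { next }                      # other meta-information lines
    {
      sites++
      if (NF != ncol) { printf "line %d: %d columns, expected %d\n", NR, NF, ncol; bad = 1 }
    }
    END {
      if (ncol == 0) { print "no #CHROM header line"; bad = 1 }
      if (!bad) printf "ok: %d sites, %d germplasm\n", sites, ncol - 9
      exit bad
    }'
}
```

For example, check_vcf_structure < filtered.recode.vcf; the reported germplasm count should match the selection made in the form.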
Note
It is recommended to give site admins permission to test files before they are released.
Note
We would appreciate it if you report any issues found while using this module. You can reach us at knowpulse@usask.ca or report the issue on GitHub. Screenshots and an informative description of the issue are especially helpful.
Thank you for using VCF Filter!
Have a wonderful day!
After configuration, a file's description can be very informative and help users choose filter options.
