Project 9: Selection of tag SNPs for an African SNP array by LD and haplotype based methods #2

@ttimbers

Project:
The genetic diversity in Africa is immense, and the diversity across the entire continent has not yet been captured on any commercial SNP array. Developing a cost-efficient and representative genotype array with SNPs that provide good coverage across the African continent is key to conducting large-scale medical genetic studies in Africa. I propose a project in which we write a SNP selection algorithm applicable to whole genome sequence (WGS) data. The algorithm will choose the SNPs that most efficiently tag other SNPs across individuals from several African populations. It will be applied in conjunction with commercial lists of pre-approved SNPs, lists of SNPs of general interest, and various lists ranking SNPs, in order to select a set of tag SNPs that can be put on a commercial SNP array. We intend to write the code in Python. Other tag SNP selection algorithms exist, but none of them is geared towards handling WGS data efficiently. By making use of random access to block-gzipped files, we intend to write a memory-efficient algorithm applicable to WGS data. We envision this algorithm being used in combination with existing imputation methods, so that haplotype (multi-marker) tagging can be used in addition to simple pairwise LD. We aim to provide a fully functional piece of software and a list of tag SNPs at the end of hackseq.
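
To make the pairwise LD part of the idea concrete, here is a minimal sketch of greedy tag SNP selection. It is an illustration only, not the project's algorithm: the function names (`r_squared`, `greedy_tag_snps`), the r² threshold of 0.8, and the in-memory dictionary of unphased genotype dosages are all assumptions made for this example. It does not stream from block-gzipped files, and it ignores the pre-approved lists and SNP rankings mentioned above.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (var_x * var_y)


def greedy_tag_snps(genotypes, threshold=0.8):
    """Return a list of tag SNP IDs chosen greedily by pairwise r^2.

    genotypes: dict mapping SNP ID -> list of dosages, one per individual
               (hypothetical input format for this sketch).
    threshold: assumed r^2 cutoff above which one SNP is said to tag another.
    """
    # Every SNP tags itself; add every partner that passes the threshold.
    tagged_by = {snp: {snp} for snp in genotypes}
    for a, b in combinations(genotypes, 2):
        if r_squared(genotypes[a], genotypes[b]) >= threshold:
            tagged_by[a].add(b)
            tagged_by[b].add(a)

    untagged = set(genotypes)
    tags = []
    while untagged:
        # Pick the SNP covering the most still-untagged SNPs;
        # ties go to the alphabetically first ID so the result is deterministic.
        best = max(sorted(genotypes), key=lambda s: len(tagged_by[s] & untagged))
        tags.append(best)
        untagged -= tagged_by[best]
    return tags


if __name__ == "__main__":
    # Toy data: rs1, rs2 and rs3 are in perfect LD; rs4 is nearly independent.
    example = {
        "rs1": [0, 1, 2, 0, 1, 2],
        "rs2": [0, 1, 2, 0, 1, 2],
        "rs3": [2, 1, 0, 2, 1, 0],
        "rs4": [0, 0, 1, 2, 1, 0],
    }
    print(greedy_tag_snps(example))
```

The all-pairs comparison here is quadratic in the number of SNPs, so a WGS-scale version would presumably restrict comparisons to a sliding window of nearby positions and read genotypes region by region rather than loading everything at once.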

Project Lead: Tommy Carstensen / @tommycarstensen / Bioinformatician / Wellcome Trust Sanger Institute
