HLA Variation Project
Purpose
The purpose of this project is to investigate the genetic variation of HLA genes and to compare levels of variation between genes in the Major Histocompatibility Complex (MHC). This is important as it can help us understand how the level of variation differs from gene to gene. This is a small side project that I have been doing in order to apply and improve both my bioinformatics and Python skills.
Academic Background
A number of genes within the MHC, such as HLA-A, HLA-B, HLA-C and HLA-DRB1, are amongst the most well-studied genes in the human genome due to their role in the immune system and their impact on transplant compatibility. Extensive study has shown these genes to have incredible levels of variation, and they were the first genes to be described as hyperpolymorphic. Variants of these genes are recorded in the IPD-IMGT/HLA Database and all allele names are maintained by the WHO Nomenclature Committee for Factors of the HLA System.
In contrast, other genes within the MHC, such as HLA-E, HLA-F and HLA-G, are less well studied and have previously been described as monomorphic or as having low levels of polymorphism. Polymorphism has been found in these genes, but at a much lower level, as shown by the low number of alleles recorded for them in the IPD-IMGT/HLA Database. This project aims to investigate the levels of variation in these genes in comparison to the more well-studied genes using data from the IPD-IMGT/HLA Database.
HLA variation is a key factor in the immune response and is important for understanding disease susceptibility and treatment outcomes. This variation can impact the ability of the immune system to recognize and respond to pathogens, as well as the effectiveness of immunotherapies. Understanding levels of variation between genes in the MHC can help us to better understand the genotypes of different populations and of those who are undergoing haematopoietic stem cell transplantation (HSCT).
Future Work
The current version of this code only produces JSON files containing bootstrapped subsampling data from IPD-IMGT/HLA alignment files. It does not yet calculate final statistics for direct comparison between genes.
Code
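As a rough illustration of the bootstrapped subsampling mentioned under Future Work, the sketch below draws equally sized random subsamples (with replacement) from each gene's aligned sequences and counts the variable alignment columns in each subsample, so that genes with very different allele counts can be compared on the same footing. This is not the project's actual implementation: the function names, the toy sequences and the choice of "number of variable columns" as the measure are all assumptions made for illustration, and real IPD-IMGT/HLA alignments would also need gap and unknown-base handling that is left out here.

```python
import json
import random


def count_variable_sites(sequences):
    """Count alignment columns in which more than one character is observed."""
    return sum(1 for column in zip(*sequences) if len(set(column)) > 1)


def bootstrap_variable_sites(sequences, sample_size, replicates, seed=0):
    """Draw subsamples with replacement and count variable sites in each one."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        count_variable_sites(rng.choices(sequences, k=sample_size))
        for _ in range(replicates)
    ]


# Toy aligned sequences (made up for illustration, not real HLA alleles).
toy_alignments = {
    "GENE_X": ["ACGTAC", "ACGTAA", "ATGTAC", "ACCTAC", "ACGTTC"],
    "GENE_Y": ["GGCATT", "GGCATT", "GGCATA"],
}

# Use the same subsample size for every gene so the counts are comparable.
results = {
    gene: bootstrap_variable_sites(seqs, sample_size=3, replicates=100)
    for gene, seqs in toy_alignments.items()
}

# Serialise the per-replicate counts as JSON text.
summary_json = json.dumps(results, indent=2)
```

Keeping the subsample size identical across genes is the point of the exercise: a gene with thousands of recorded alleles would otherwise look more variable simply because it has been sampled more deeply.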
The code for this project is publicly available in a GitHub repository linked below. For instructions on using this tool, please follow the README.md file.
References
- Robinson J, Guethlein LA, Cereb N, Yang SY, Norman PJ, et al. (2017) Distinguishing functional polymorphism from random variation in the sequences of >10,000 HLA-A, -B and -C alleles. PLOS Genetics 13(6): e1006862. https://doi.org/10.1371/journal.pgen.1006862
- Robinson J, Malik A, Parham P, Bodmer JG, Marsh SGE (2000) IMGT/HLA - a sequence database for the human major histocompatibility complex. Tissue Antigens 55: 280-7
- Barker D, Maccari G, Georgiou X, Cooper M, Flicek P, Robinson J, Marsh SGE (2023) The IPD-IMGT/HLA Database. Nucleic Acids Research 51(D1): D948-D955
- Robinson J, Barker D, Marsh SGE (2024) 25 years of the IPD-IMGT/HLA Database. HLA 103(6): e15549
Disclaimer
The code for this project is a work in progress and is absolutely not intended for clinical use. The code is provided as is, and I take no responsibility for any errors or omissions in the code or in the results it produces.
The code is provided for educational purposes only and should not be used for any other purpose.