A Study of the Structure of the UAE Population Using Ancestry SNPs

Sallam, Mariam Ahmed kasim (2019) A Study of the Structure of the UAE Population Using Ancestry SNPs. Doctoral thesis, University of Central Lancashire.

Single Nucleotide Polymorphisms (SNPs) have become a gold standard owing to their wide distribution across the human genome, small amplicon sizes, biallelic nature and significant differences in allele frequencies among ethnic groups. Combining these markers into a large multiplex has further increased their power of discrimination, making it possible to classify individuals into a specific population and to support identification based on ethnicity. Next Generation Sequencing (NGS) technologies have been accepted for typing these SNP markers, and many markers can be incorporated into a single multiplex kit for forensic analyses. One of the commercially available kits for ancestry inference is the HID Ion AmpliSeq™ Ancestry Panel (Thermo Fisher Scientific, Waltham, MA, USA). It comprises a total of 165 autosomal SNPs, including 55 markers from Kidd’s panel and 128 markers from Seldin’s set, with some of the markers overlapping. In this study, the use of Ancestry Informative Single Nucleotide Polymorphisms (AISNPs) typed with NGS technologies was assessed for providing biogeographic information on three Middle Eastern populations (UAE, Oman and Yemen) and two Indian subcontinent populations (India and Pakistan) for forensic purposes. A population genetics study was also conducted. A total of 414 buccal swab samples were collected from healthy, unrelated volunteers who gave informed consent. These included UAE (n=224), Omani (n=74), Yemeni (n=56), Indian (n=29) and Pakistani (n=30) samples. The samples were assessed using the 165 Ancestry Informative Markers (AIMs) in the HID Ion AmpliSeq™ Ancestry Panel and sequenced on the Ion PGM™ and Ion S5™ platforms. Sequencing on both platforms generated high coverage and read length with over 99% aligned accuracy. The primary sequencing analysis of DAT files was performed with Torrent Suite™ Software Version 4.4.3 and Version 5.2.1 (Thermo Fisher Scientific).
Biogeographic ancestry assignments for the five populations indicated that most samples were assigned a South-West Asian origin. All Indian samples were reported to have Asian ancestry. Arlequin software was used to perform population genetic analyses. A few significant departures from Hardy-Weinberg equilibrium (HWE) (p<0.05) were observed in some samples. However, after Bonferroni correction, all markers were in HWE except the SNPs rs4984913, rs2702414 and rs6464211. It was also confirmed that the Middle Eastern populations are genetically close and are difficult to distinguish using the 165 AISNPs, although the Indian subcontinent populations could be clearly separated from the Middle Eastern populations. However, the current population databases included in the Thermo Fisher and Illumina software need to be enhanced with appropriate databases from world populations. This study has contributed to these efforts and has, for the first time, developed and analysed three Middle Eastern and two Indian subcontinent population databases for 165 AISNPs.
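
As an illustration of the HWE screening step described above, the following is a minimal sketch of testing a single biallelic SNP for departure from Hardy-Weinberg equilibrium and applying a Bonferroni-corrected threshold across 165 markers. It is not the procedure used in the thesis: the analyses there were run in Arlequin, whereas this sketch assumes a simple chi-square goodness-of-fit test with one degree of freedom as a stand-in. The function names and the genotype counts are hypothetical and chosen only for illustration.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for HWE at one biallelic SNP.

    Takes observed genotype counts (AA, AB, BB) and returns the
    chi-square statistic and its p-value (1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    # Hypothetical genotype counts for one SNP; not data from the study.
    chi2, p_value = hwe_chi_square(60, 100, 64)
    threshold = bonferroni_threshold(0.05, 165)
    print(f"chi2={chi2:.3f}, p={p_value:.4f}, threshold={threshold:.6f}")
    print("departure from HWE" if p_value < threshold else "consistent with HWE")
```

With 165 markers and a nominal level of 0.05, the corrected per-marker threshold is 0.05/165, roughly 0.0003, which is why markers that appear significant at p<0.05 can still be consistent with HWE after correction.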

