Sequence variations, flanking region mutations, and allele frequency at 31 autosomal STRs in the central Indian population by next generation sequencing (NGS)

Dash, Hirak Ranjan; Kaitholia, Kamlesh; Kumawat, R. K.; Singh, Anil Kumar; Shrivastava, Pankaj; Chaubey, Gyaneshwer; Das, Surajit

doi:10.1038/s41598-021-02690-5

Article
Open access
Published: 01 December 2021

Scientific Reports volume 11, Article number: 23238 (2021)

Abstract

Capillary electrophoresis-based analysis does not reflect the exact allele number variation at the STR loci due to the non-availability of the data on sequence variation in the repeat region and the SNPs in flanking regions. Herein, this study reports the length-based and sequence-based allelic data of 138 central Indian individuals at 31 autosomal STR loci by NGS. The sequence data at each allele was compared to the reference hg19 sequence. The length-based allelic results were found in concordance with the CE-based results. 20 out of 31 autosomal STR loci showed an increase in the number of alleles by the presence of sequence variation and/or SNPs in the flanking regions. The highest gain in the heterozygosity and allele numbers was observed in D5S2800, D1S1656, D16S539, D5S818, and vWA. rs25768 (A/G) at D5S818 was found to be the most frequent SNP in the studied population. Allele no. 15 of D3S1358, allele no. 19 of D2S1338, and allele no. 22 of D12S391 showed 5 isoalleles each with the same size and with different intervening sequences. Length-based determination of the alleles showed Penta E to be the most useful marker in the central Indian population among 31 STRs studied; however, sequence-based analysis advocated D2S1338 to be the most useful marker in terms of various forensic parameters. Population genetics analysis showed a shared genetic ancestry of the studied population with other Indian populations. This first-ever study to the best of our knowledge on sequence-based STR analysis in the central Indian population is expected to prove the use of NGS in forensic case-work and in forensic DNA laboratories.

Introduction

Analysis of targeted STRs by capillary electrophoresis (CE) is currently considered the gold standard in forensic DNA analysis. This technology employs the polymerase chain reaction (PCR) followed by CE to detect individual-specific length variations at the STR markers. Despite its many advantages, CE is poorly suited to analyzing multiple types of genetic polymorphism (STRs/SNPs) in a single reaction, cannot generate sequence information for STR alleles, loses valuable genetic information from degraded samples, and gives low-resolution results in mtDNA and mixture analysis¹. Because CE does not provide information on base-pair variation within STR alleles, it underestimates the genetic diversity present at a locus. Moreover, size homoplasy, i.e., similar-sized DNA fragments with different sequence compositions, can cause a heterozygote to be misinterpreted as a homozygote because only a single peak appears in the CE result².

Given these drawbacks of CE, next-generation sequencing (NGS) appears to be a suitable alternative technique. It provides information from numerous STRs and SNPs simultaneously. Sequencing of STR alleles provides in-depth genetic information on internal sequence variation and mutations in the samples. NGS is also useful for mtDNA sequencing, which is valuable for degraded samples because each cell carries thousands of copies of mtDNA³. In addition to routine forensic identification, NGS technology promises many other applications of forensic relevance, such as age estimation⁴, body fluid identification⁵, forensic genealogy⁶, DNA phenotyping⁷, and detection of the geographic origin and ancestry of an individual⁷. NGS also decreases the probability of false-positive matches in DNA profiling because of its high resolution in distinguishing between contributors to DNA mixtures⁸. Based on these merits, the technology has been considered the future of forensic DNA analysis.

The major advantage of NGS over CE technology is that there is no practical limit on the number of STR markers that can be multiplexed in a single reaction. Therefore, many new STR markers have been included in commercially available sequencing kits in addition to the recommended 20 core CODIS STR loci. However, before their forensic application, these loci and their suitability at the population level should be thoroughly understood. The inclusion of more markers can increase the discrimination power of a multiplex system. In CE-based systems, only a limited number of genetic markers can be accommodated in a single multiplex reaction because of the different dye sets involved and the limited number of detection channels. NGS overcomes this limitation, as numerous genetic markers can be analyzed simultaneously.

Several attempts have been made to generate sequence-based allele frequency data for the autosomal STR markers. Most of these studies, such as those on the US population⁹, Native Americans from West-Central Arizona¹⁰, Yavapai Native Americans¹¹, White British and British Chinese populations¹², and the Danish population¹³, used the ForenSeq DNA Signature Prep Kit on a MiSeq FGx instrument (Illumina, San Diego, CA). By contrast, only a few studies report sequence-based allele data obtained with the Precision ID GlobalFiler™ NGS STR panel (Thermo Scientific, US), namely for the Spanish population¹⁴ and the Han population¹⁵. The Indian population has not yet been explored for sequence-based STR allelic data. Therefore, the present study analyzed 31 autosomal STR markers simultaneously, i.e., D12S391, D13S317, D8S1179, D21S11, D3S1358, D5S818, D1S1656, D2S1338, vWA, D2S441, D5S2800, D7S820, D16S539, D6S474, D12ATA63, D4S2408, D6S1043, D19S433, D14S1434, CSF1PO, D10S1248, D18S51, D1S1677, D22S1045, D2S1776, D3S4529, FGA, Penta D, Penta E, TH01 and TPOX, in the central Indian population (Fig. 1). Madhya Pradesh is the second largest state of India by area and the fifth largest by population. Located in the middle of India, Madhya Pradesh shares its boundary with five other states: Uttar Pradesh to the north, Chhattisgarh to the east, Maharashtra to the south, and Gujarat and Rajasthan to the west. For this reason, the state has an admixture of populations and represents a mini-India. Understanding the genetic diversity of the central Indian population therefore gives a picture of the genetic makeup of India as a whole. The study aimed to generate sequence-based allele frequency data, population-specific characteristics, sequence variations, and SNPs in the flanking regions for forensic casework applications in the studied population.

Results and discussion

Sequencing performance of precision ID NGS STR panel v2

Quality control parameters, namely locus balance (LB), heterozygous balance (HB), and stutter ratio, of the 31 autosomal STR markers are shown in Fig. 2. Of all the STR markers, D4S2408 showed the average LB value closest to the ideal of 1.0 (0.992), whereas D16S539 showed the greatest deviation, with an average value of 1.925. Other markers that deviated substantially from the ideal LB value included D18S51 (0.394), D2S1338 (0.411), D3S1358 (1.513), FGA (0.371), Penta D (0.167), Penta E (0.371), TH01 (1.708), and TPOX (1.579). Against an ideal value of 1.0, the HB values of the STR markers ranged from 1.031 (D8S1179) to 1.722 (TH01). Of the 31 STR markers tested, relatively high heterozygous imbalance was observed at D12ATA63 (1.396), D19S433 (1.307), D1S1656 (1.376), D22S1045 (1.325), and TH01 (1.722). None of the markers exceeded the threshold set for the stutter ratio, i.e., 0.14. Stutter products were most numerous at D1S1656, and no stutter product was observed at D3S4529. The average stutter ratio ranged from 0.104 (D16S539) to 0.127 (D6S474). As the use of NGS technology is still at a nascent stage in forensic DNA applications, quality issues of some STR markers need to be addressed by the kit manufacturers before their efficient use in routine forensic casework.
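The three quality metrics discussed above can be sketched from per-allele read counts. This is a minimal illustration only: the definitions used here (LB as locus coverage divided by mean coverage across loci, HB as higher-coverage allele over lower-coverage allele, stutter ratio as stutter reads over parent allele reads) are assumptions chosen to be consistent with the ideal value of 1.0 and the reported ranges, not necessarily the exact formulas applied in the study, and the read counts in any call are hypothetical.

```python
from statistics import mean


def locus_balance(locus_reads: dict[str, int]) -> dict[str, float]:
    """Coverage of each locus divided by the mean coverage across loci (ideal 1.0).

    Assumed definition for illustration only.
    """
    avg = mean(locus_reads.values())
    return {locus: reads / avg for locus, reads in locus_reads.items()}


def heterozygous_balance(reads_a: int, reads_b: int) -> float:
    """Higher-coverage allele divided by lower-coverage allele (ideal 1.0)."""
    high, low = max(reads_a, reads_b), min(reads_a, reads_b)
    if low == 0:
        raise ValueError("both alleles need non-zero coverage")
    return high / low


def stutter_ratio(stutter_reads: int, parent_reads: int) -> float:
    """Reads of the stutter product divided by reads of its parent allele."""
    if parent_reads == 0:
        raise ValueError("parent allele needs non-zero coverage")
    return stutter_reads / parent_reads
```

Under these assumed definitions, a heterozygote with 1722 and 1000 reads has an HB of 1.722, and a stutter product with 104 reads next to a parent allele with 1000 reads has a stutter ratio of 0.104, below the 0.14 threshold.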

Concordance study, allele frequency, forensic and paternity parameters

All 31 autosomal STR markers, viz. CSF1PO, D10S1248, D12ATA63, D12S391, D13S317, D14S1434, D16S539, D18S51, D19S433, D1S1656, D1S1677, D21S11, D22S1045, D2S1338, D2S1776, D2S441, D3S1358, D3S4529, D4S2408, D5S2800, D5S818, D6S1043, D6S474, D7S820, D8S1179, FGA, Penta D, Penta E, TH01, TPOX and vWA, were analyzed in this study, and the 22 STRs shared with the CE panel were compared with the length-based allele data obtained by CE analysis. For all the samples, the length-based allele data were consistent between the CE analysis and the NGS data. To the best of our knowledge, this is the first report of sequence-based analysis of these 31 STR markers in any Indian population. It is also the first allelic report on nine STR markers, i.e., D12ATA63, D14S1434, D1S1677, D2S1776, D3S4529, D4S2408, D5S2800, D6S1043, and D6S474, in the Indian population. The calculated length-based allele frequencies are given in Supplementary Table S1. Forensic and paternity parameters of the length-based and sequence-based alleles are provided in Table 1. The average allele number across all the genetic markers was 9.26; the highest number of size-based alleles (18) was observed at Penta E, whereas D1S1677, D4S2408, and D6S474 showed the lowest number of alleles, i.e., 6 (Fig. 3). The newly analyzed markers, i.e., D12ATA63, D14S1434, D1S1677, D2S1776, D3S4529, D4S2408, D5S2800, D6S1043, and D6S474, yielded 8, 7, 6, 8, 7, 6, 8, 11, and 6 alleles, respectively. Penta E showed the highest power of discrimination (0.978), polymorphic information content (0.90), and expected heterozygosity (0.905), and the lowest matching probability (0.022), whereas FGA showed the highest power of exclusion (0.778), typical paternity index (4.60), and observed heterozygosity (0.891). These findings suggest that Penta E and FGA are useful markers in the central Indian population on the basis of length-based alleles. D2S441 was the least useful marker in terms of polymorphic information content (0.64), power of exclusion (0.329), typical paternity index (1.35), and observed and expected heterozygosity (0.630 and 0.690, respectively). Similarly, the power of discrimination (0.855) and matching probability (0.145) of D5S818 indicated limited usefulness of this marker in the studied population. By contrast, when sequence-based forensic and paternity parameters were calculated for the 31 autosomal STR markers, D2S1338 emerged as the most useful marker in the studied population, with the highest power of discrimination (0.984), polymorphic information content (0.920), power of exclusion (0.822), and typical paternity index (5.75), and the lowest matching probability (0.016). This suggests that individual markers should be assessed on the basis of sequence-based alleles to get a clear picture of their usefulness in a specific population.
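The parameters compared above can be computed from a list of genotypes at one locus. The sketch below uses the commonly cited textbook formulas (matching probability as the sum of squared genotype frequencies, PE = h²(1 − 2hH²), TPI = 1/(2H), where h and H are the heterozygote and homozygote proportions); that the study's software applies exactly these formulas is an assumption, and the function name and input layout are hypothetical.

```python
from collections import Counter
from itertools import combinations


def forensic_parameters(genotypes: list[tuple[str, str]]) -> dict[str, float]:
    """Forensic and paternity parameters for one locus from diploid genotypes."""
    n = len(genotypes)
    # Allele frequencies from 2n chromosomes.
    allele_counts = Counter(a for g in genotypes for a in g)
    freqs = [c / (2 * n) for c in allele_counts.values()]
    # Genotype frequencies, ignoring allele order within a genotype.
    genotype_counts = Counter(tuple(sorted(g)) for g in genotypes)

    pm = sum((c / n) ** 2 for c in genotype_counts.values())  # matching probability
    sum_p2 = sum(p ** 2 for p in freqs)
    pic = 1 - sum_p2 - sum(2 * p**2 * q**2 for p, q in combinations(freqs, 2))
    h = sum(1 for a, b in genotypes if a != b) / n  # observed heterozygosity
    hom = 1 - h
    return {
        "PM": pm,
        "PD": 1 - pm,
        "PIC": pic,
        "PE": h**2 * (1 - 2 * h * hom**2),
        "TPI": 1 / (2 * hom) if hom > 0 else float("inf"),
        "Ho": h,
        "He": 1 - sum_p2,  # uncorrected expected heterozygosity
    }
```

As a consistency check under these formulas, a locus with 123 heterozygotes among 138 individuals (observed heterozygosity 0.891) gives a typical paternity index of 4.60 and a power of exclusion of 0.778, which agrees with the values reported above for FGA. Passing sequence strings instead of repeat numbers as allele labels gives the sequence-based version of the same parameters.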

Table 1 Calculated forensic and paternity parameters of the 31 autosomal STR based on length-based (LB) and sequence-based (SB) alleles in the central Indian population (n = 138).

Previous studies also reported the utility of the Penta E marker, with high forensic and paternity parameters, in the Indian population¹⁶,¹⁷,¹⁸. This marker has already been established as highly efficient for personal identification in the Portuguese population¹⁹, the Austrian Caucasian population²⁰, the Northern Italian population²¹ and the Mexican population²². The newly included STR markers, i.e., D12ATA63, D14S1434, D1S1677, D2S1776, D3S4529, D4S2408, D5S2800, D6S1043, and D6S474, showed an allelic range and other statistical parameters similar to those in the limited published literature from Inner Mongolia, China²³ and the Tujia population²⁴.

Of the 81 male samples, four showed AMELY deletion: AMELY could not be amplified, but the three alternative sex-determining markers, i.e., DYS391, SRY, and Y InDel, amplified successfully. This result was consistent with the corresponding CE data. At DYS391, allele 10 was predominant (63 samples), followed by allele 11 (16 samples) and allele 9 (2 samples). Similarly, Y InDel showed allele 2 in 74 samples and allele 1 in only 7 male samples. AMELY deletion is a global problem²⁵, and simultaneous amplification of alternative sex-determining markers²⁶,²⁷ is highly useful for assigning the sex of a sample correctly, as evidenced by four samples in the current study.
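The reasoning above, where Y-linked markers rescue the sex call when AMELY fails to amplify, can be written as a small decision rule. This is an illustrative sketch only: the function name, the input layout, and the requirement that all three alternative markers be detected before flagging a possible AMELY deletion are assumptions, not the rule used by the analysis software.

```python
# Alternative Y-linked markers named in the text.
Y_MARKERS = ("DYS391", "SRY", "Y InDel")


def assign_sex(detected: dict[str, bool]) -> str:
    """Assign sex from marker detection flags (hypothetical decision rule)."""
    y_hits = sum(detected.get(marker, False) for marker in Y_MARKERS)
    if detected.get("AMELY", False):
        return "male"
    if y_hits == len(Y_MARKERS):
        # AMELY missing but every alternative Y marker amplified.
        return "male (possible AMELY deletion)"
    if y_hits == 0:
        return "female"
    return "inconclusive"
```

A sample with no AMELY signal but positive DYS391, SRY, and Y InDel would be reported as a male with a possible AMELY deletion, which mirrors the four cases described above.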

Increment in allele number by sequencing

A marked increase in allele number was detected with sequence-based typing of the studied STRs compared with length-based typing (Fig. 3). Previous work has shown that SNPs in STR flanking regions and sequence variation among alleles of the same length are the major contributors to such an increase in allele numbers²⁸. A substantial gain in allele numbers was detected at D13S317, D16S539, D1S1656, D5S2800, D5S818, D7S820, and vWA; D5S2800 showed a significant increase in allele numbers due to variation in the flanking region, and D3S1358 showed the highest allele gain due to differences in the repeat sequence. By contrast, the markers that showed no gain in allele number, either from SNPs in flanking regions or from sequence variation in the repeat region, were CSF1PO, D18S51, D19S433, D1S1677, D22S1045, D3S4529, D6S1043, FGA, Penta D, Penta E, and TPOX. The markers that showed an increase in allele number only because of SNPs in flanking regions were D10S1248, D13S317, D14S1434, and D7S820. The increased allele number at D12ATA63, D12S391, D21S11, D2S1338, D3S1358, D4S2408, D8S1179, and TH01 was due to variation in the repeat sequences only.

Single nucleotide polymorphisms (SNPs) in the flanking regions of STRs have been widely reported across the globe¹³,²⁹,³⁰. An SNP-STR links an SNP with the STR polymorphism, which allows an STR allele subtype to be defined on the basis of the SNP allele observed in the flanking region. Although other marker combinations, such as deletion-insertion polymorphisms amplified with STRs (DIP-STR), are widely used, a recent study advocated the use of SNP-STRs for forensic applications where an imbalanced DNA mixture is expected³¹. In this regard, the current study revealed many SNPs in the flanking regions of STRs in the studied population (Table 2). rs25768, located upstream of the D5S818 marker, showed the highest occurrence in the central Indian population, whereas rs73250432, rs369257353, and rs561924992, located upstream of D13S317, downstream of D5S818, and downstream of D16S539, respectively, showed the lowest occurrence.

Table 2 Observed SNPs and their characteristics in flanking regions of STRs in the current study.

Detection of alleles with identical size but different internal sequences has been acknowledged as one of the advantages of using NGS for studying STRs³²,³³. The marker-wise isoalleles observed in the central Indian population are reported in Table S2. Of the 31 autosomal STR markers analyzed in this study, an isometric heterozygous pattern was observed at only 16 loci, i.e., D3S1358, D21S11, vWA, D5S2800, D6S474, D2S441, D12ATA63, D2S1338, D1S1656, D16S539, D8S1179, D12S391, D2S1776, TH01, D5S818, and D4S2408. Allele 15 of D3S1358, allele 19 of D2S1338, and allele 22 of D12S391 showed the maximum number of isoalleles, i.e., alleles of the same size with different intervening sequences (Fig. 4).
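The difference between length-based and sequence-based allele counts can be illustrated by grouping observed sequences by their length-based allele call. The sketch below is a minimal example; the function names and the bracketed repeat strings in the usage example are illustrative placeholders rather than data from this study, and flanking-region SNPs could be appended to the sequence string to treat them as part of the allele.

```python
from collections import defaultdict


def isoallele_summary(observations: list[tuple[str, str]]) -> dict:
    """Summarize (length_allele, sequence) observations at one locus."""
    by_length: dict[str, set[str]] = defaultdict(set)
    for length_allele, sequence in observations:
        by_length[length_allele].add(sequence)
    return {
        # Alleles distinguishable by size alone (what CE can resolve).
        "length_based": len(by_length),
        # Alleles distinguishable once the sequence is known.
        "sequence_based": sum(len(seqs) for seqs in by_length.values()),
        # Length classes that contain more than one distinct sequence.
        "isoalleles": {a: sorted(s) for a, s in by_length.items() if len(s) > 1},
    }


def is_isometric_heterozygote(genotype) -> bool:
    """True when both alleles share a length but differ in sequence."""
    (len_a, seq_a), (len_b, seq_b) = genotype
    return len_a == len_b and seq_a != seq_b


# Illustrative observations only (not taken from the study's data).
obs = [
    ("15", "TCTA [TCTG]2 [TCTA]12"),
    ("15", "TCTA [TCTG]3 [TCTA]11"),
    ("16", "TCTA [TCTG]3 [TCTA]12"),
    ("15", "TCTA [TCTG]2 [TCTA]12"),
]
summary = isoallele_summary(obs)
```

In this toy input, two length-based alleles become three sequence-based alleles, and an individual carrying the two different allele 15 sequences would look homozygous by CE but is an isometric heterozygote by sequence.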

A previous report suggested a correlation between the allele number of an STR marker and its paternity and forensic parameters, such as total possible genotypes, power of discrimination, matching probability, polymorphic information content, power of exclusion, typical paternity index, and gene diversity¹⁸. In view of this, the substantial increase in sequence-based allele numbers observed in the present study increases the evidentiary value of these STRs. As the allele number increases, the potential forensic and paternity applications of the STR markers increase substantially. An increase in allele number has further been correlated with an increase in the heterozygosity of an STR marker, which also increases its informativeness⁹.

Population genetics

When the observed size-based allelic data were compared across populations at 15 shared STR markers and a neighbor-joining tree was constructed (Fig. 5a), the dendrogram showed two distinct branches of population clusters. One cluster included the Tibetan population of Nepal, the Han population from Yunnan Province, Southwest China, the northeastern Thai people of Thailand, the Hainan Li population from China, and the Kathmandu and Newar populations of Nepal. The studied central Indian population showed a close affinity with the populations of Rajasthan and Odisha, India. A consistent result was obtained in the PCA plot based on components 1 and 2 (Fig. 5b), in which the populations from Madhya Pradesh (Gond), Jharkhand, Uttar Pradesh, Tamil Nadu, Rajasthan, Himachal Pradesh and Odisha clustered together. Therefore, genetic sharing largely mirrored geographic clustering. The heat map drawn using the Nei’s Da distance matrix is shown in Fig. 6. The overall result of the heat map was concordant with the outcomes of the NJ tree and PCA plot for the interpopulation comparison.
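The Da distance behind the heat map can be sketched directly from allele frequency tables. The example below implements the standard definition, Da = 1 − (1/r) Σ over loci Σ over alleles √(x·y), where x and y are the frequencies of the same allele in the two populations and r is the number of loci. The function name and the nested-dictionary input layout are assumptions made for illustration, and the study's own software may handle missing alleles or loci differently.

```python
from math import sqrt

# Allele frequencies per locus: {locus: {allele: frequency}}.
FreqTable = dict[str, dict[str, float]]


def nei_da(pop_x: FreqTable, pop_y: FreqTable) -> float:
    """Nei's Da distance over the loci shared by two populations."""
    loci = pop_x.keys() & pop_y.keys()
    if not loci:
        raise ValueError("populations share no loci")
    total = 0.0
    for locus in loci:
        fx, fy = pop_x[locus], pop_y[locus]
        # Alleles absent from either population contribute sqrt(0) = 0.
        total += sum(sqrt(fx[a] * fy[a]) for a in fx.keys() & fy.keys())
    return 1 - total / len(loci)
```

Two populations with identical frequencies at every locus have Da = 0, and two populations that share no alleles have Da = 1; a pairwise matrix of such values is what a heat map or neighbor-joining tree would take as input.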

Conclusions

To the best of our knowledge, this is the first report of sequence-based allelic data on the central Indian population, and it is of considerable use in forensic casework. The data obtained in this study further support the implementation of NGS-based STR analysis for forensic applications. The size-based alleles were concordant between the CE analysis and the NGS data. Some STR markers showed substantial variation in the repeat motifs as well as SNPs in the STR flanking regions. A significant increase in allele number further increased the statistical values of the studied forensic and paternity parameters of the STRs, thereby increasing their usefulness in forensic applications. As per the recommendations of the ISFG, it is of utmost importance to enrich the allelic data on sequence-based STR genotypes. The increase in allele number evidenced in the present study also points to the need for population-specific, sequence-based studies of STR markers. In this context, the present study provides pioneering sequence-based data on the central Indian population.

Materials and methods

Sample collection and ethical statement

The current study and the experimental protocols were approved by the Ethics Committee of Banaras Hindu University, Varanasi, India (Ref. No. I.Sc./ECM-XII/2018–19/06). All the experimental procedures were carried out in accordance with the relevant guidelines and regulations laid down by the ethics committee. Before the collection of blood samples, written informed consent was obtained from each sample donor. Peripheral blood samples of 138 unrelated adult individuals (81 males and 57 females) were collected in K₂EDTA vacutainers and stored at 4 °C until further use. The samples were drawn from routine forensic cases at the DNA Fingerprinting Unit, Forensic Science Laboratory, Bhopal, Madhya Pradesh, India. The study was conducted following all the required quality control measures at the Forensic Science Laboratory, Bhopal, M.P., India.

DNA extraction and quantification

Genomic DNA was extracted using the PrepFiler Express™ Forensic DNA Extraction Kit (Thermo Scientific, US) following the manufacturer’s guidelines. The extracted DNA was quantified using the Quantifiler® Trio DNA Quantification Kit (Thermo Scientific, US) and the QuantStudio™ 3 Real-Time PCR System (Thermo Scientific, US) according to the manufacturer’s instructions. The concentration of the DNA samples was then adjusted to 1.0 ng/µl using TE buffer, and the samples were stored at −20 °C until further use. The authors have passed the Academia Iberoamericana de Criminalística y Estudios Forenses (AICEF) DNA Proficiency test of the de BIOLOGIA y QUÍMICA FORENSE (GITAD), Spain (http://gitad.ugr.es/principal.htm).

Library preparation and quantitation

Genomic DNA isolated from each sample was converted to a sequencing library by targeted amplification of the regions of interest using the Precision ID DL8 kit and the Precision ID GlobalFiler™ NGS STR panel v2 (Thermo Scientific, US), following the manufacturer’s protocol on the HID Ion Chef™ System (Thermo Scientific, US). Beforehand, each DNA sample was normalized to 1 ng in a 15 µl volume, and the 15 µl of normalized DNA was transferred into one of eight wells (positions A1-H1) of the IonCode™ Barcode Adapters plate. The plate with the loaded DNA and the other consumables were then placed at their designated positions in the HID Ion Chef™ System to start library preparation. Library preparation was carried out in the same way across 18 runs. Each library contained eight samples except the 18th, which contained only 2 samples. Once prepared, the libraries were stored at −20 °C until further use. The pooled libraries were then quantified on the QuantStudio 5 Real-Time PCR system using the Ion Library TaqMan® Quantitation kit (Thermo Scientific, US) following the manufacturer’s recommendations.
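The run layout described above follows from simple arithmetic on the sample count and the eight-well format. The short sketch below makes that check explicit; the function name is hypothetical.

```python
def plan_runs(n_samples: int, per_run: int = 8) -> list[int]:
    """Return the number of samples in each library preparation run."""
    full_runs, remainder = divmod(n_samples, per_run)
    # Full runs first, then one partial run if any samples are left over.
    return [per_run] * full_runs + ([remainder] if remainder else [])


runs = plan_runs(138)
```

For 138 samples this gives 17 runs of eight samples and a final run of two, 18 runs in total, which matches the description of the 18th library.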

Template preparation, sequencing, and data analysis

The automatically prepared libraries were clonally amplified on the Ion Chef System by emulsion PCR of library molecules captured on beads. The pooled libraries were diluted to 50 pM and mixed according to the group of barcode adaptors to accommodate 32 samples. 25 µl of each diluted library pool was loaded into Position A and Position B of the Ion S5™ Precision ID Chef Reagents cartridge, and the other recommended plasticware and reagents were placed at their designated positions on the Ion Chef™ system. The Ion Chef System automated all template preparation steps, including creating the emulsion mixture, performing the PCR, carrying out the post-PCR purifications, and finally loading the purified templated beads onto two Ion 530 chips, according to the manufacturer’s guidelines.

Sequencing

A sequencing run on the Ion S5 system was initiated by loading a reagent cartridge, buffer, cleaning solution, and waste container as per the manufacturer’s Ion S5™ Precision ID Sequencing Kit protocol. The Ion S5 chip was then loaded and the run was started using 200 bp chemistry with 650 flows, according to the human identification GlobalFiler™ NGS STR sequencing format.

The raw data were extracted from the S5 Torrent Server v5.10.0 (Thermo Fisher Scientific) and imported into the Converge™ software v2.1 (Thermo Fisher Scientific) for sequence analysis against the Homo sapiens hg19 reference genome. The HID Genotyper plugin v2.1 (Thermo Fisher Scientific) was applied at the default thresholds: the relative analytical and stochastic thresholds were both 0.05, and the stutter ratio was set to 0.14. The sequencing performance of the Precision ID NGS STR panel v2 was further assessed by analyzing the locus balance (LB), heterozygous balance (HB), and stutter ratio of the obtained sequences following Avila et al.³⁴ and Brookes et al.³⁵.
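To show how thresholds of this kind act on read counts, the sketch below applies a relative analytical threshold of 0.05 and a stutter ratio of 0.14 to a single locus. It is a simplified illustration and not the logic of the HID Genotyper plugin: the assumptions are that the analytical threshold is taken relative to total locus coverage, that only n−1 repeat stutter is considered, and that alleles are whole repeat numbers (microvariants are not handled). The function name and read counts are hypothetical.

```python
def call_alleles(
    read_counts: dict[int, int],
    analytical: float = 0.05,
    stutter: float = 0.14,
) -> list[int]:
    """Return allele calls for one locus after threshold and stutter filtering."""
    total = sum(read_counts.values())
    if total == 0:
        return []
    # Step 1: drop signals below the relative analytical threshold.
    kept = {a: r for a, r in read_counts.items() if r / total >= analytical}
    calls = []
    for allele, reads in sorted(kept.items()):
        # Step 2: an n-1 stutter sits one repeat below its parent allele.
        parent_reads = kept.get(allele + 1)
        if parent_reads is not None and reads <= stutter * parent_reads:
            continue  # treated as stutter of the allele one repeat longer
        calls.append(allele)
    return calls


# Hypothetical read counts at one locus.
calls = call_alleles({12: 20, 14: 120, 15: 1000, 16: 900})
```

Here the 20-read signal at allele 12 falls below 5% of locus coverage, the 120-read signal at allele 14 is at most 14% of its neighbor at allele 15 and is treated as stutter, and alleles 15 and 16 are called.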

Concordance analysis with capillary electrophoresis (CE)

All the 138 samples were studied to assess the concordance between CE-STR data and NGS-STR data. All these samples were analyzed using the PowerPlex Fusion 6C System (Promega, USA) following the manufacturer’s guidelines. 0.5–1.0 ng of genomic DNA was used to amplify the samples on Veriti 96 well Thermal Cycler (Thermo Scientific, USA). Capillary electrophoresis of the amplified DNA fragments was performed using a 3500xL Genetic Analyzer (Thermo Scientific, USA). The generated STR fragments were analyzed using GeneMapper ID-X v.1.5 software maintaining a threshold of 200 RFU for all the dye sets. The CE based allelic values were compared with the sequencing-based allelic values at 23 consistent loci between Fusion 6C System and GlobalFiler NGS STR panel i.e., CSF1PO, D10S1248, D12S391, D13S317, D16S539, D18S51, D19S433, D1S1656, D21S11, D22S1045, D2S1338, D2S441, D3S1358, D5S818, D7S820, D8S1179, DYS391, FGA, Penta D, Penta E, TH01, TPOX, vWA, and sex determining marker Amelogenin.
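The concordance check described above amounts to comparing, for every sample and shared locus, the CE genotype with the length-based NGS genotype. The sketch below shows one way to do this; the function name, the (sample, locus) key layout, and the example genotypes are assumptions for illustration.

```python
Genotypes = dict[tuple[str, str], tuple[str, ...]]  # (sample, locus) -> alleles


def concordance(ce: Genotypes, ngs: Genotypes) -> tuple[float, list[tuple[str, str]]]:
    """Compare genotypes at (sample, locus) pairs typed by both methods.

    Returns the concordance rate and the sorted list of discordant pairs.
    Allele order within a genotype is ignored.
    """
    shared = ce.keys() & ngs.keys()
    if not shared:
        raise ValueError("no (sample, locus) pairs typed by both methods")
    discordant = sorted(k for k in shared if sorted(ce[k]) != sorted(ngs[k]))
    return 1 - len(discordant) / len(shared), discordant


# Hypothetical genotypes for two samples.
ce_calls = {("S1", "TH01"): ("6", "9.3"), ("S1", "vWA"): ("16", "17"), ("S2", "TH01"): ("7", "9")}
ngs_calls = {("S1", "TH01"): ("9.3", "6"), ("S1", "vWA"): ("16", "17"), ("S2", "TH01"): ("7", "9"),
             ("S2", "D6S474"): ("14", "15")}  # locus not in the CE kit, ignored
rate, mismatches = concordance(ce_calls, ngs_calls)
```

Loci typed by only one method, such as the NGS-only markers, are skipped automatically because only keys present in both inputs are compared. Full concordance, as reported in this study, corresponds to a rate of 1.0 and an empty mismatch list.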

Statistical analysis

The sequence and allele data obtained were evaluated for the presence of isometric heterozygous alleles and the presence or absence of SNPs in the flanking regions. In addition, forensic and paternity parameters such as allele frequency, power of discrimination (PD), polymorphism information content (PIC), power of exclusion (PE), typical paternity index (PI), observed heterozygosity (Ho), and matching probability (Pm) were calculated using GenAlEx 6.5 software³⁶, Arlequin v3.5 software³⁷ and AMOVA for both length-based and sequence-based alleles. The observed size-based allele frequencies of the 15 shared genetic markers were compared with data from previously published literature using the pairwise Fst distance.

The published populations whose allele frequency data were compared included the Balmiki population, Punjab, India³⁸, Konkanastha Brahmin population, Maharashtra, India³⁸, Naikpod Gond, Andhra Pradesh, India³⁹, Gond, Madhya Pradesh, India⁴⁰, population of Jharkhand, India⁴¹, populations of Uttar Pradesh, India⁴², population of Himachal Pradesh, India⁴³, Tamil population, Tamil Nadu, India⁴⁴, Tibetan population, Nepal⁴⁵, population of Newar, Nepal⁴⁶, population of Rajasthan, India⁴⁷, population of Odisha, India⁴⁸, Nepalese population⁴⁹, Chinese Han population from Yunnan Province, Southwest China⁵⁰, northeastern Thai people of Thailand⁵¹ and Hainan Li population from China⁵².

References

Yang, Y., Xie, B. & Yan, J. Application of next generation sequencing technology in forensic science. Genom. Proteom. Bioinform. 12, 190–197. https://doi.org/10.1016/j.gpb.2014.09.001 (2014).
de Knijff, P. From next generation sequencing to now generation sequencing in forensics. Forensic Sci. Int. Genet. 38, 175–180. https://doi.org/10.1016/j.fsigen.2018.10.017 (2019).
Butler, J. M. The future of forensic DNA analysis. Philos. Trans. R. Soc. B 370, 20140252. https://doi.org/10.1098/rstb.2014.0252 (2015).
Vidaki, A. et al. DNA methylation-based forensic age prediction using artificial neural networks and next generation sequencing. Forensic Sci. Int. Genet. 28, 225–236. https://doi.org/10.1016/j.fsigen.2017.02.009 (2017).
Dørum, G. et al. Predicting the origin of stains from next generation sequencing mRNA data. Forensic Sci. Int. Genet. 37, 37–48. https://doi.org/10.1016/j.fsigen.2018.01.001 (2018).
Erlich, Y., Shor, T., Pe’er, I. & Carmi, S. Identity inference of genomic data using long-range familial searches. Science 362, 690–694. https://doi.org/10.1126/science.aau4832 (2018).
Schneider, P. M., Prainsack, B. & Kayser, M. The use of forensic DNA phenotyping in predicting appearance and biogeographic ancestry. Dtsch. Arztebl. Int. 116, 873–880. https://doi.org/10.3238/arztebl.2019.0873 (2019).
Bruijns, B., Tiggelaar, R. & Gardeniers, H. Massively parallel sequencing techniques for forensics: A review. Electrophoresis 39, 2642–2654. https://doi.org/10.1002/elps.201800082 (2018).
Gettings, K. B. et al. Sequence-based U.S. population data for 27 autosomal STR loci. Forensic Sci. Int. Genet. 37, 106–115. https://doi.org/10.1016/j.fsigen.2018.07.013 (2018).
Wendt, F. R. et al. Genetic analysis of the Yavapai Native Americans from West-Central Arizona using the Illumina MiSeq FGx™ forensic genomics system. Forensic Sci. Int. Genet. 24, 18–23. https://doi.org/10.1016/j.fsigen.2016.05.008 (2016).
Wendt, F. R. et al. Flanking region variation of ForenSeq™ DNA Signature Prep Kit STR and SNP loci in Yavapai Native Americans. Forensic Sci. Int. Genet. 28, 146–154. https://doi.org/10.1016/j.fsigen.2017.02.014 (2017).
Devesse, L. et al. Concordance of the ForenSeq™ system and characterisation of sequence-specific autosomal STR alleles across two major population groups. Forensic Sci. Int. Genet. 34, 57–61. https://doi.org/10.1016/j.fsigen.2017.10.012 (2018).
Hussing, C. et al. Sequencing of 231 forensic genetic markers using the MiSeq FGx™ forensic genomics system—An evaluation of the assay and software. Forensic Sci. Res. 3, 111–123. https://doi.org/10.1080/20961790.2018.1446672 (2018).
Barrio, P. A. et al. Massively parallel sequence data of 31 autosomal STR loci from 496 Spanish individuals revealed concordance with CE-STR technology and enhanced discrimination power. Forensic Sci. Int. Genet. 42, 49–55. https://doi.org/10.1016/j.fsigen.2019.06.009 (2019).
Wang, Z. et al. Massively parallel sequencing of 32 forensic markers using the Precision ID GlobalFiler™ NGS STR Panel and the Ion PGM™ System. Forensic Sci. Int. Genet. 31, 126–134. https://doi.org/10.1016/j.fsigen.2017.09.004 (2017).
Dixit, S. et al. Forensic genetic analysis of population of Madhya Pradesh with PowerPlex Fusion 6C™ multiplex system. Int. J. Leg. Med. 133, 803–805. https://doi.org/10.1007/s00414-019-02017-0 (2019).
Dash, H. R., Shrivastava, P. & Das, S. Expediency of tetra- and pentanucleotide repeat autosomal STR markers for DNA typing in central Indian population. Proc. Natl. Acad. Sci., India, Sect. B Biol. Sci. 90, 819–824. https://doi.org/10.1007/s40011-019-01156-z (2020).
Dash, H. R., Rawat, N., Vajpayee, K., Shrivastava, P. & Das, P. Useful autosomal STR marker sets for forensic and paternity applications in the central Indian population. Ann. Hum. Biol. 48, 37–48. https://doi.org/10.1080/03014460.2021.1877353 (2021).
Abrantes, D. et al. Analysis of Penta D and Penta E STR loci in a Northern Portuguese population. Int. Cong. Ser. 1239, 223–223. https://doi.org/10.1016/S0531-5131(02)00344-8 (2003).
Steinlechner, M., Grubwieser, P., Scheithauer, R. & Parson, W. STR loci Penta D and Penta E: Austrian Caucasian population data. Int. J. Leg. Med. 116, 174–175. https://doi.org/10.1007/s004140100231 (2002).
Turrina, S., Ferrian, M., Caratti, S. & Leo, D. D. Evaluation of genetic parameters of 22 autosomal STR loci (PowerPlex® Fusion System) in a population sample from Northern Italy. Int. J. Leg. Med. 128, 281–283. https://doi.org/10.1007/s00414-013-0934-4 (2014).
Gonzalez-Herrera, L. et al. Forensic parameters and genetic variation of 15 autosomal STR loci in Mexican Mestizo populations from the States of Yucatan and Nayarit. Open Forensic Sci. J. 3, 57–63. https://doi.org/10.2174/1874402801003010057 (2010).
Wang, H. et al. Allelic frequency distributions of 21 non-combined DNA index system STR loci in a Russian ethnic minority group from Inner Mongolia, China. J. Zhejiang Univ. Sci. B. 14, 533–540. https://doi.org/10.1631/jzus.B1200262 (2013).
Zhang, L., Yang, F., Bai, X., Yao, Y. & Li, J. Genetic polymorphism analysis of 23 STR loci in the Tujia population from Chongqing, Southwest China. Int. J. Leg. Med. 135, 761–763. https://doi.org/10.1007/s00414-020-02287-z (2020).
Mitchell, R. J., Kreskas, M., Baxter, E., Buffalino, L. & Van Oorschot, R. A. H. An investigation of sequence deletions of amelogenin (AMELY), a Y-chromosome locus commonly used for gender determination. Ann. Hum. Biol. 33, 227–240. https://doi.org/10.1080/03014460600594620 (2006).
Masuyama, K., Shojo, H., Nakanishi, H., Inokuchi, S. & Adachi, N. Sex determination from fragmented and degenerated DNA by amplified product-length polymorphism bidirectional SNP analysis of amelogenin and SRY genes. PLoS ONE 12, e0169348. https://doi.org/10.1371/journal.pone.0169348 (2017).
Dash, H. R., Rawat, N. & Das, S. Alternatives to amelogenin markers for sex determination in humans and their forensic relevance. Mol. Biol. Rep. 47, 2347–2360. https://doi.org/10.1007/s11033-020-05268-y (2020).
Peng, D. et al. Identification of sequence polymorphisms at 58 STRs and 94 iiSNPs in a Tibetan population using massively parallel sequencing. Sci. Rep. https://doi.org/10.1038/s41598-020-69137-1 (2020).
Wang, L. et al. SNP–STR polymorphism: A sensitive compound marker for forensic genetic applications. Forensic Sci. Int. Genet. Suppl. Ser. 4, e206–e207. https://doi.org/10.1016/j.fsigss.2013.10.106 (2013).
Gettings, K. B., Aponte, R. A., Kiesler, K. M. & Vallone, P. M. The next dimension in STR sequencing: Polymorphisms in flanking regions and their allelic associations. Forensic Sci. Int. Genet. Suppl. Ser. 5, e121–e123. https://doi.org/10.1016/j.fsigss.2015.09.049 (2015).
Wei, T. et al. A novel multiplex assay of SNP-STR markers for forensic purpose. PLoS ONE 13, e0200700. https://doi.org/10.1371/journal.pone.0200700 (2018).
Alonso, A. et al. Current state-of-art of STR sequencing in forensic genetics. Electrophoresis 39, 2655–2668. https://doi.org/10.1002/elps.201800030 (2018).
Müller, P. et al. Inter-laboratory study on standardized MPS libraries: Evaluation of performance, concordance, and sensitivity using mixtures and degraded DNA. Int. J. Leg. Med. 134, 185–198. https://doi.org/10.1007/s00414-019-02201-2 (2020).
Avila, E., Felkl, A. B., Graebin, P., Nunes, C. P. & Alho, C. S. Forensic characterization of Brazilian regional populations through massive parallel sequencing of 124 SNPs included in HID ion Ampliseq identity panel. Forensic Sci. Int. Genet. 40, 74–84. https://doi.org/10.1016/j.fsigen.2019.02.012 (2019).
Fan, H. et al. The forensic landscape and the population genetic analyses of Hainan Li based on massively parallel sequencing DNA profiling. Int. J. Leg. Med. https://doi.org/10.1007/s00414-021-02590-3 (2021).
Peakall, R. O. D. & Smouse, P. E. GENALEX 6: Genetic analysis in Excel. Population genetic software for teaching and research. Mol. Ecol. Resour. 6, 288–295. https://doi.org/10.1111/j.1471-8286.2005.01155.x (2006).
Excoffier, L., Laval, G. & Schneider, S. Arlequin (version 3.0): an integrated software package for population genetics data analysis. Evol. Bioinform. 1, 47–50 (2005).
Ghosh, T. et al. Genetic diversity of autosomal STRs in eleven populations of India. Forensic Sci. Int. Genet. 5, 259–261. https://doi.org/10.1016/j.fsigen.2010.01.005 (2011).
Bindu, G. H., Trivedi, R. & Kashyap, V. K. Genotypic polymorphisms at fifteen tetranucleotides and two pentanucleotide repeat loci in four tribal populations of Andhra Pradesh, southern India. J. Forensic Sci. 50, 978–983 (2005).
Shrivastava, P., Jain, T. & Trivedi, V. B. Structure and genetic relationship of five populations from Central India based on 15 autosomal STR loci. Ann. Hum. Biol. 44, 74–86. https://doi.org/10.3109/03014460.2016.1151932 (2017).
Imam, J., Reyaz, R., Singh, R. S., Bapuly, A. K. & Shrivastava, P. Genomic portrait of population of Jharkhand, India, drawn with 15 autosomal STRs and 17 Y-STRs. Int. J. Leg. Med. 132, 139–140. https://doi.org/10.1007/s00414-017-1610-x (2018).
Srivastava, A. et al. Genetic data for PowerPlex 21™ autosomal and PowerPlex 23 Y-STR™ loci from population of the state of Uttar Pradesh, India. Int. J. Leg. Med. 133, 1381–1383. https://doi.org/10.1007/s00414-018-01993-z (2019).
Mohapatra, B. K. et al. A genomic exploration of 15 autosomal STR loci for establishment of a DNA profile database of the population of Himachal Pradesh. Leg. Med. 46, 101719. https://doi.org/10.1016/j.legalmed.2020.101719 (2020).
Balamurugan, K. et al. Genetic variation of 15 autosomal microsatellite loci in a Tamil population from Tamil Nadu, Southern India. Leg. Med. 12, 320–323. https://doi.org/10.1016/j.legalmed.2010.07.004 (2010).
Kido, A. et al. STR data for 15 AmpFLSTR identifiler loci in a Tibetan population (Nepal). Int. Congr. Ser. 1288, 349–351. https://doi.org/10.1016/j.ics.2005.08.037 (2006).
Gayden, T. et al. Genetic insights into the origins of Tibeto-Burman populations in the Himalayas. J. Hum. Genet. 54, 216–223. https://doi.org/10.1038/jhg.2009.14 (2009).
Kumawat, R. K., Shrivastava, P., Shrivastava, D., Mathur, G. K. & Dixit, S. Genomic blueprint of population of Rajasthan based on autosomal STR markers. Ann. Hum. Biol. 47, 70–75. https://doi.org/10.1080/03014460.2019.1705390 (2020).
Sahoo, S. et al. Genomic portrait of Odisha, India drawn by using 21 autosomal STR markers. Int. J. Leg. Med. 134, 1671–1673. https://doi.org/10.1007/s00414-020-02281-5 (2020).
Kraaijenbrink, T., van Driem, G. L., Opgenort, J. R. M. L., Tuladhar, N. M. & de Knijff, P. Allele frequency distribution for 21 autosomal STR loci in Nepal. Forensic Sci. Int. 168, 227–231. https://doi.org/10.1016/j.forsciint.2006.02.014 (2007).
Zhang, X. et al. Population data and mutation rates of 20 autosomal STR loci in a Chinese Han population from Yunnan Province, Southwest China. Int. J. Leg. Med. 132, 1083–1085. https://doi.org/10.1007/s00414-017-1675-6 (2018).
Muisuk, K., Srithawong, S. & Kutanan, W. Allelic frequencies of fifteen autosomal STRs in the northeastern Thai people. Int. J. Leg. Med. 134, 1331–1332. https://doi.org/10.1007/s00414-019-02229-4 (2020).
Huang, Y. et al. Population genetic data for 17 autosomal STR markers in the Hani population from China. Int. J. Leg. Med. 129, 995–996. https://doi.org/10.1007/s00414-015-1176-4 (2015).

Acknowledgements

The authors gratefully acknowledge the Director, State Forensic Science Laboratory, Sagar, M. P., India, and the Joint Director, Regional Forensic Science Laboratory, Bhopal, M. P., India, for providing the infrastructure to carry out the research work. Our sincere thanks go to Dr. Atima Agrawal, Dr. Neeraj Chauhan, Dr. Sanjib Dey, and the entire technical team of Thermo Scientific for their constant technical support during the research work.

Author information

Authors and Affiliations

DNA Fingerprinting Unit, Integrated High-Tech Complex, Forensic Science Laboratory, Bhopal, Madhya Pradesh, 462003, India
Hirak Ranjan Dash, Kamlesh Kaitholia & Anil Kumar Singh
DNA Division, State Forensic Science Laboratory, Jaipur, Rajasthan, 302016, India
R. K. Kumawat
DNA Fingerprinting Unit, State Forensic Science Laboratory, Sagar, Madhya Pradesh, 470001, India
Pankaj Shrivastava
Cytogenetics Laboratory, Department of Zoology, Banaras Hindu University, Varanasi, 221005, India
Gyaneshwer Chaubey
Department of Life Science, National Institute of Technology, Rourkela, Odisha, 769008, India
Surajit Das

Authors

Hirak Ranjan Dash
Kamlesh Kaitholia
R. K. Kumawat
Anil Kumar Singh
Pankaj Shrivastava
Gyaneshwer Chaubey
Surajit Das

Contributions

H.R.D. conceived and designed the analysis. H.R.D. and K.K. performed the experiments and collected the data. H.R.D. and R.K.K. performed data analysis. H.R.D., A.K.S., P.S., G.C. and S.D. prepared the manuscript.

Corresponding author

Correspondence to Hirak Ranjan Dash.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Tables.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Dash, H.R., Kaitholia, K., Kumawat, R.K. et al. Sequence variations, flanking region mutations, and allele frequency at 31 autosomal STRs in the central Indian population by next generation sequencing (NGS). Sci Rep 11, 23238 (2021). https://doi.org/10.1038/s41598-021-02690-5

Received: 09 March 2021
Accepted: 18 November 2021
Published: 01 December 2021
DOI: https://doi.org/10.1038/s41598-021-02690-5

This article is cited by

CRISPR-CasB technology in forensic DNA analysis: challenges and solutions
- Hirak Ranjan Dash
- Mansi Arora
Applied Microbiology and Biotechnology (2022)
