Ancestry Prediction Tool #2

rnmitchell · 2025-01-13T18:50:35Z

Add ancestry prediction using PCA to MixDeR. Reference data from 1000 Genomes is used in PCA.

The method first performs an initial deconvolution, using the same approach as the mixture deconvolution step, with the 1000 Genomes global population allele frequency data. The 54 ancestry SNPs are then extracted from each inferred single-source SNP profile and added to the 1000 Genomes reference dataset, and PCA is performed separately for each contributor. The PCA plots are saved to the specified output directory. The user can then examine the plots and decide whether the contributor's ancestry can be determined. For the mixture deconvolution step, the user can choose either the predicted superpopulation allele frequency data (which will also be provided in MixDeR) or the global allele frequency data.
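As a rough illustration of the PCA step described above, here is a minimal sketch in base R. It uses simulated genotypes rather than the real 1000 Genomes data, and every name in it (`sim_group`, `run_ancestry_pca`, `GroupA`, `GroupB`) is hypothetical and not taken from the MixDeR code. The only point is to show an inferred profile being appended to a reference panel before PCA is run.

```r
# Simulated stand-in for reference genotypes (NOT real 1000 Genomes data).
# Rows are samples, columns are SNPs, values are allele counts (0, 1 or 2).
set.seed(1)
n_snps <- 54

# Hypothetical helper: draw n samples given one allele frequency per SNP.
sim_group <- function(n, freq) {
  matrix(rbinom(n * n_snps, size = 2, prob = rep(freq, each = n)), nrow = n)
}

freq_a <- runif(n_snps, 0.05, 0.95)
freq_b <- runif(n_snps, 0.05, 0.95)
reference <- rbind(sim_group(50, freq_a), sim_group(50, freq_b))
colnames(reference) <- paste0("snp", seq_len(n_snps))
ref_labels <- rep(c("GroupA", "GroupB"), each = 50)

# One inferred contributor profile, simulated from the GroupA frequencies.
unknown <- sim_group(1, freq_a)
colnames(unknown) <- colnames(reference)

# Append the unknown profile to the reference panel, then run PCA on all rows.
run_ancestry_pca <- function(reference, ref_labels, unknown) {
  combined <- rbind(reference, unknown)
  pca <- prcomp(combined, center = TRUE, scale. = FALSE)
  data.frame(
    group = c(ref_labels, "Unk"),
    PC1 = pca$x[, 1],
    PC2 = pca$x[, 2]
  )
}

scores <- run_ancestry_pca(reference, ref_labels, unknown)
tail(scores, 1)  # PC coordinates of the unknown contributor
```

Whether the unknown sample should be included in the PCA fit or projected onto a reference-only fit is a design choice; the sketch follows the description above and includes it.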

- Update the Shiny app to allow the user to select the ancestry prediction step and specify settings for the initial mixture deconvolution and the subsequent inferred allele filtering.
- Run mixture deconvolution using the settings specified by the user and the global 1000 Genomes allele frequency data, apply the allele 1 and allele 2 probability thresholds, and create the final inferred genotypes for each contributor.
- For each contributor separately: extract the ancestry SNPs, merge with the known 1000 Genomes data, run PCA, and create the PC1 vs. PC2 plot.
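The per-contributor step (extract, merge, PCA, plot) could be sketched roughly as follows. This is not the code in this PR: `plot_contributor_pca`, the random genotype data, the SNP names, and the PDF output format are all assumptions made for illustration.

```r
# Hypothetical helper: one PC1 vs. PC2 plot per contributor.
# `profile` is a named numeric vector of allele counts for one contributor.
plot_contributor_pca <- function(profile, reference, ref_labels,
                                 ancestry_snps, out_dir, id) {
  # Keep only ancestry SNPs present in both the profile and the reference.
  shared <- intersect(ancestry_snps,
                      intersect(names(profile), colnames(reference)))
  combined <- rbind(reference[, shared, drop = FALSE], profile[shared])
  pca <- prcomp(combined, center = TRUE, scale. = FALSE)

  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  out_file <- file.path(out_dir, paste0(id, "_PC1_vs_PC2.pdf"))
  groups <- factor(c(ref_labels, "Unk"))

  pdf(out_file)
  plot(pca$x[, 1], pca$x[, 2], col = as.integer(groups), pch = 19,
       xlab = "PC1", ylab = "PC2", main = paste("Contributor", id))
  legend("topright", legend = levels(groups),
         col = seq_along(levels(groups)), pch = 19)
  invisible(dev.off())
  out_file
}

# Random demo data (not real genotypes); 54 of 60 SNPs play the ancestry panel.
set.seed(2)
snp_ids <- paste0("snp", 1:60)
reference <- matrix(sample(0:2, 40 * 60, replace = TRUE), nrow = 40,
                    dimnames = list(NULL, snp_ids))
ref_labels <- rep(c("GroupA", "GroupB"), each = 20)
ancestry_snps <- snp_ids[1:54]
contributors <- list(
  C1 = setNames(sample(0:2, 60, replace = TRUE), snp_ids),
  C2 = setNames(sample(0:2, 60, replace = TRUE), snp_ids)
)

out_dir <- file.path(tempdir(), "pca_plots")
plot_files <- vapply(names(contributors), function(id) {
  plot_contributor_pca(contributors[[id]], reference, ref_labels,
                       ancestry_snps, out_dir, id)
}, character(1))
```

Running PCA once per contributor keeps each plot independent of the other contributors' inferred profiles, which matches the "for each contributor separately" wording above.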

rnmitchell · 2025-10-03T19:48:38Z

This is ready for review @standage. Let's talk next week about it!

standage · 2025-10-15T15:26:12Z

R/centroids.R

+centroids = function(groups, pca, inpath, ID) {
+  dir.create(file.path(inpath, "Centroids_Plots"), showWarnings = FALSE, recursive=TRUE)
+
+  ancestry_colors = read.table("/Users/rebecca.mitchell/Desktop/ancestry_colors.txt", header=T, sep="\t") %>%


To fix, if not already: hard-coded path
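One possible way to drop the hard-coded path, sketched under assumptions: take the file location as an argument so the caller (or the package) decides where the table lives. The helper name `load_ancestry_colors` is hypothetical, the sketch assumes the table has exactly the five columns used in the `add_row()` calls, and it uses base R `rbind()` instead of `add_row()` to stay dependency-free. Inside a package, the path could come from something like `system.file("extdata", "ancestry_colors.txt", package = "mixder")`, where both the `extdata` location and the package name are guesses.

```r
# Hypothetical helper: read the colour table from a caller-supplied path.
load_ancestry_colors <- function(path) {
  if (!file.exists(path)) {
    stop("ancestry colour table not found: ", path)
  }
  # comment.char = "" so hex colours such as "#1f77b4" are not read as comments.
  colors <- read.table(path, header = TRUE, sep = "\t",
                       stringsAsFactors = FALSE, comment.char = "")
  # Extra rows for the unknown sample and the centroid, as in the diff above.
  extra <- data.frame(
    id = c("Unk", "Centroid"),
    reg = c("Unk", "Centroid"),
    population = c("Unk", "Centroid"),
    color = c("red", "black"),
    superpop_color = c("red", "black"),
    stringsAsFactors = FALSE
  )
  rbind(colors, extra)
}

# Demo with a throwaway file; the row content is made up.
demo_path <- tempfile(fileext = ".txt")
writeLines(c("id\treg\tpopulation\tcolor\tsuperpop_color",
             "sample1\tregA\tpopA\t#1f77b4\t#1f77b4"), demo_path)
colors <- load_ancestry_colors(demo_path)
```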

standage · 2025-10-15T15:40:45Z

R/ancestry_prediction.R

+  ncols=ncol(geno)
+  geno_filt=geno[,c(7:ncols)]
+  snps = data.frame("snp_id"=colnames(geno_filt))
+  snps = snps %>%


Code autoformatting could give a more consistent style in these files. Something to consider.
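For example, the `styler` package is one option for this (an assumption; any formatter would do). The sketch below restyles the first two lines of the snippet above as text, and only runs if `styler` is installed. For whole files, `styler::style_dir("R")` would rewrite the scripts in place.

```r
# Two lines from the diff above, held as text so nothing is evaluated.
snippet <- c(
  "ncols=ncol(geno)",
  "geno_filt=geno[,c(7:ncols)]"
)

if (requireNamespace("styler", quietly = TRUE)) {
  # Applies the default tidyverse style, which among other things
  # normalizes spacing around operators and after commas.
  styled <- styler::style_text(snippet)
  print(styled)
} else {
  message("styler is not installed; run install.packages('styler') to try this")
}
```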

standage · 2025-10-15T15:41:11Z

R/centroids.R

+
+  ancestry_colors = read.table("/Users/rebecca.mitchell/Desktop/ancestry_colors.txt", header=T, sep="\t") %>%
+    add_row(id = "Unk", reg = "Unk", population = "Unk", color="red", superpop_color="red") %>%
+    add_row(id= "Centroid", reg = "Centroid", population = "Centroid", color = "black", superpop_color="black")


Showing up as white for some reason?

rnmitchell added 30 commits January 13, 2025 13:40

Add ancestry prediction options/text to shiny app interface

9a348c2

Building in 1000 G genotype data for ancestry SNPs

8caf6bf

began adding ancestry prediction to run_workflow script

57c1664

began integrating ancestry prediction

5dee69a

update run_workflow

6aa6cf1

update .gitignore

db3847c

ancestry prediction running correctly for unconditioned analyses

15ebe59

merge master

8f0e653

ancestry prediction for conditioned analyses

5508300

fix PCA plot title, add ancestry prediction step to config file settings

2e0bca8

update test

5d16bfd

testing ancestry pred with all snps

f48c1bc

3D PCA plots

7fdda0a

merge main

355debc

merge main

2f1e3e7

updated with 3D plotting

8be5bf6

merge main

418a8c4

option to use either ancestry SNPs or all SNPs for PCA

3ed7429

updated shiny app for multiple features with PCA plots

53d088f

including necessary data

a4ac85b

data.R updated with included data in package

a1d6246

updated description/news with new version #

625332e

centroid analysis

705b0ec

fixed bug with loading AF

9c38c51

added superpopulation AF datasets

2d170d9

added line breaks to pop up messages

41c2ca0

removed unnecessary data; cleaned up scripts

f367f6e

begin adding tests for ancestry

a845c4e

added tests

4f093b8

updated config

a8e1c58

rnmitchell added 8 commits July 18, 2025 11:20

updated with test

0ff2530

readthedocs

12998e7

updated readme

b9cbe33

remove readthedocs

3e61cf1

merge main

e953da1

updated docs

d2fc98c

removed hard coded path

d948b6f

update scripts to pass checks

b95d031

rnmitchell marked this pull request as ready for review October 3, 2025 19:48

rnmitchell requested a review from standage October 3, 2025 19:48

standage reviewed Oct 15, 2025
