Update to docs, allele formatting #159
def __init__(self, name, rsids, index, xrefs=None, source=None):
    self.name = Marker.check_name(name)
    self.source_name = str(self.name)
Bug fix part 1
        self.source_name_map[marker.source.name][marker.name] = self.definition_names[marker.posstr()]
        self.source_name_map[marker.source.name][marker.source_name] = self.definition_names[marker.posstr()]
        continue
    else:
        new_name = marker.name
        if len(self.markers_by_definition) > 1:
            new_name = f"{marker.name}.v{len(self.definition_names) + 1}"
        self.definition_names[marker.posstr()] = new_name
        self.source_name_map[marker.source.name][marker.name] = new_name
        self.source_name_map[marker.source.name][marker.source_name] = new_name
Bug fix part 2
- 2413 distinct loci
[frequencies]
- 59753 haplotypes
- 59704 haplotypes
Correcting for frequency records using deprecated marker identifiers
microhapdb/tests/test_frequency.py
Outdated
def test_marker_names_valid():
    freq_markers = set(microhapdb.frequencies.Marker)
    markers = set(microhapdb.markers.Name)
    invalid = freq_markers - markers
    print(invalid)
    assert len(invalid) == 0
Added this regression test
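
To illustrate what this kind of check catches, here is a minimal standalone sketch of the same set-difference idea. It uses plain lists instead of the real `microhapdb.markers` and `microhapdb.frequencies` tables, and every identifier in it (including the helper `find_invalid_markers`) is made up for illustration.

```python
# Toy stand-ins for the marker Name column and the frequency Marker column.
# All identifiers below are hypothetical and chosen only for illustration.
marker_names = ["mhA-001.v1", "mhA-001.v2", "mhB-002"]
frequency_markers = ["mhA-001.v1", "mhA-001", "mhB-002", "mhB-002"]


def find_invalid_markers(frequency_markers, marker_names):
    """Return frequency marker identifiers missing from the marker table, sorted."""
    return sorted(set(frequency_markers) - set(marker_names))


invalid = find_invalid_markers(frequency_markers, marker_names)
print(invalid)  # identifiers that no longer resolve to a current marker name
```

In this toy data the unversioned name `mhA-001` is flagged because only its versioned forms exist in the marker list, which is the sort of stale identifier the regression test is meant to surface.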
Additional issues discovered after running the regression test on the master branch.
In this PR I'm updating the binder demo notebook. In the process, I changed the allele formatting from `A|T|T|A` to `A:T:T:A` to avoid confusion with conventional genetic notation for haplotype phases. (I would love to have dropped the separators altogether, but some legacy functions of the database still need to handle microhaps with indels correctly.)

I also found a bug with how non-1KGP allele frequencies were being renamed post-resolution of locus and allele definition identifiers. It only affected four allele definitions at two loci, and was resolved with a simple change to the build procedure. None of the standard 1KGP allele frequencies or Ae scores were affected.

Update: Actually, after running the new regression test on the master branch, I found three more affected loci (see comment below). As before, the 1KGP allele frequencies remain unaffected.
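
For readers holding allele strings in the old notation, the formatting change amounts to swapping the delimiter. The sketch below is not code from this PR: the helper name `to_colon_format` and the multi-base example allele are assumptions made for illustration.

```python
def to_colon_format(allele: str) -> str:
    """Convert a pipe-delimited allele string to the colon-delimited form."""
    return ":".join(allele.split("|"))


# Example from the PR description.
print(to_colon_format("A|T|T|A"))

# Hypothetical allele with a multi-base variant: the separators keep the
# per-variant boundaries unambiguous, which a bare "ATCAT" would not.
print(to_colon_format("A|TCA|T"))
```

The second call shows why the separators cannot simply be dropped: once a variant spans more than one base, a delimiter is the only thing marking where one variant ends and the next begins.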