Aves use case scalability challenge #8

@nfranz

For the current Aves use case, we have single, working input datasets that cover the entire use case from the root (Class) down to the Order level, and also down to the Family level. However, at present we seemingly cannot scale to the species level with a single input file using the Euler/X default reasoners. We therefore need to partition the root-to-species level file into two complementary datasets, each of which is consistent and "solvable". They are provisionally called:

(1) 2015-Pala_Neoa_Grade_Species_Complete.txt and
(2) 2015-Acci_Aust_Clade_Species_Complete.txt

Originally there were three species-level partitions (each of these also completes well):

(A) 2015-Pala_Gall_Grade-Species-Complete.txt => 6 kb
(B) 2015-Neoaves-Part-Species-Complete.txt => 22 kb
(C) 2015-Acci_Aust_Clade-Species-Complete.txt => 23 kb

(2) and (C) above are identical.
(1) above is a merge of (A) and (B), with 174 x 409 = 71,166 MIR. Running (1) on my laptop with "euler2 align" took 10.5 hours but was successful. However, running a merge of (B) and (C) above, called..

(3) 2015-Neoaves-All-Species-Cannot-Process.txt

..produced an "inconsistent/repair" output, I believe also after more than 8-10 hours (overnight). Assuming that the (3) merge is actually consistent (it should be), this might mean that our scalability limits currently lie somewhere in the complexity range between (1) and (3).
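The MIR figure quoted for (1) is just the product of the two numbers given above. A minimal sketch of that arithmetic, assuming 174 and 409 are the concept counts of the two input taxonomies and that there is one MIR per cross-taxonomy concept pair (the function name `mir_count` is hypothetical, not part of Euler/X):

```python
def mir_count(n_concepts_t1: int, n_concepts_t2: int) -> int:
    """Return the number of cross-taxonomy concept pairs.

    Assumption: one MIR is reported per pair of concepts, one concept
    taken from each input taxonomy.
    """
    return n_concepts_t1 * n_concepts_t2


# Sizes quoted in this issue for dataset (1).
print(mir_count(174, 409))  # prints 71166
```

Because the count is a product, enlarging both taxonomies at once grows the number of relations multiplicatively, which is one plausible reason why a larger merge such as (3) is so much harder than (1).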

The aforementioned input files and the successful 10.5 hour run of (1) are in the following Dropbox folder:

Dropbox/Euler-Runs/BirdPhylogenies/Scalability-Challenge

Issues:

(i) Can others replicate these results?
(ii) Can we overcome the challenge of scaling to the level of complexity of (3), either with conventional or with custom reasoners?
(iii) Notice that "no coverage" is used 85 times in (3), to account for differential species-level sampling across the two input trees.
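For anyone attempting the replication in (i), it may help to record wall-clock time and exit status in a comparable way across machines. The following is only a sketch of a generic timing wrapper; `time_command` is a hypothetical helper, and the demo deliberately runs a trivial Python command rather than Euler/X itself:

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run a command and return (elapsed_seconds, return_code, stdout)."""
    start = time.perf_counter()
    result = subprocess.run(cmd, capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    return elapsed, result.returncode, result.stdout


# Harmless demo: time a trivial Python invocation.
elapsed, code, out = time_command([sys.executable, "-c", "print('ok')"])
print(f"exit code {code} after {elapsed:.2f} s")
```

For an actual run, the command list would be replaced by the "euler2 align" invocation on one of the input files above; any further options that invocation needs are not specified in this issue and are left out here.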
