Skip to content

cleanPath behavior has troubling issues #2401

@opotowsky

Description

@opotowsky

Problem Description

Here's the code:

def cleanPath(path, mpiRank=0):
"""Recursively delete a path.
!!! Be careful with this !!! It can delete the entire cluster.
We add copious os.path.exists checks in case an MPI set of things is trying to delete everything
at the same time. Always check filenames for some special flag when calling this, especially
with full permissions on the cluster. You could accidentally delete everyone's work with one
misplaced line! This doesn't ask questions.
Safety nets include an allow-list of paths.
This makes use of shutil.rmtree and os.remove
Returns
-------
success : bool
True if file was deleted. False if it was not.
"""
valid = False
if not os.path.exists(path):
return True
for validPath in DO_NOT_CLEAN_PATHS:
if validPath in path.lower():
valid = True
if pathlib.Path(context.APP_DATA) in pathlib.Path(path).parents:
valid = True
if not valid:
raise Exception("You tried to delete {0}, but it does not seem safe to do so.".format(path))

Note the variable valid. I assume this means "valid for deletion". The if statement on line 219 backs up this assumption of mine. If valid = False, you get the warning that the path cannot be deleted, and if valid = True, the path is deleted.

But the checks that set valid to True are the opposite of what you'd expect! This caused me to lose a lot of time until I took a 3rd look at the code.

Issue 1

Look at the check on line 212. It is checking for paths in DO_NOT_CLEAN_PATHS:

DO_NOT_CLEAN_PATHS = [
"armiruns",
"failedruns",
"mc2run",
"mongoose",
"shufflebranches",
"snapshot",
"tests",
]

These are actually being processed as YES_CLEAN_THESE_PATHS. And anything under tests should be cleaned (these are always from TempDirChanger uses). But maybe some of these others we'd want to keep around, although I assume they are mostly irrelevant / out of date.

Issue 2

The check on line 216 is problematic. It's checking if APP_DATA is in ANY parent of path. Who cares about APP_DATA at all? Except....it's a default for activateLocalFastPath:

armi/armi/context.py

Lines 177 to 184 in 8586f00

_FAST_PATH = os.path.join(
APP_DATA,
"{}{}-{}".format(
MPI_RANK,
os.environ.get("PYTEST_XDIST_WORKER", ""), # for parallel unit testing,
datetime.datetime.now().strftime("%Y%m%d%H%M%S%f"),
),
)

But, like, it's only a part of the default fast path. OR it's not even fast path at all because much of the time it's set to the cwd:

_FAST_PATH = os.path.join(os.getcwd())
. Either way, this has no bearing on the delete-ability of path.

Also, APP_DATA will always be in parents if they are on the same system. E.g., parents includes the file path /, and if APP_DATA is /tmp/.armi then it's in parents.

Anyway, I think I've made the case that we shouldn't care if APP_DATA is in any parent of path. Can we just remove the check? I don't think it's doing anything and maybe if nothing else could be dangerous?

Issue 3

There's a lot of code here on down in context.py:

def cleanTempDirs(olderThanDays=None):

This is performing a more sweeping service, but still there is overlap with the cleanPath code and it hasn't been changed functionally for 4 years. It is run on Exceptions in main.py (as well as in downstream apps). Seems odd to only clean up on exceptions?

Solution

  1. Obviously, clear up the code that is giving people the wrong idea. Update DO_NOT_CLEAN_PATHS and make sure it actually makes sense for modern ARMI uses. Make sure the code is processing it properly (right now the code is allowing deletions of anything in this list).
  2. I think any TemporaryDirectory path should be valid for deletion, as well as anything under tests. We should streamline these deletions cause they are always going to be temporary meaning WE WANT THEM GONE.
  3. Get rid of the APP_DATA check. Instead, ensure that the path for deletion is underneath _FAST_PATH (this should greenlight any deletion)
    4.The main issue is being careful that we aren't deleting anything outside of a given user's context. So we could consider greenlighting getpass.getuser() as well. (Update: nah prob not needed)
  4. Figure out how to handle issue # 3.

I've already got a feature branch I'm working on to fix this. Just thought the bug should be described thoroughly in a ticket before I made a PR. Obviously this is a change we need to make very carefully.

Metadata

Metadata

Assignees

Labels

bugSomething is wrong: Highest Priority

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions