Merged
42 changes: 32 additions & 10 deletions bagger/README.md
@@ -96,7 +96,7 @@ class Status(IntEnum):
DRY_RUN = SUCCESS
```

Code that imports the `Bagger` module can use the name or value of the `Status` object:

```python
if status == Status.INVALID_PATH:
@@ -140,6 +140,8 @@ workflow = "default_workflow.json" # Path to the DART workflow file
dart_command = "dart-runner" # Command or path to DART executable
```

There are currently two workflows: `default_workflow.json` and `noupload_workflow.json`. The only difference is that the latter instructs dart-runner not to upload files to any remote storage location.

### Logging

ReBACH-Bagger logs errors, debug messages, and DART output to disk.
@@ -152,12 +154,12 @@ logfile_prefix = "ReBACH-Bagger" # Log filename prefix

### Wasabi

Both DART and ReBACH-Bagger use the credentials in this section to authenticate to Wasabi.
ReBACH-Bagger checks Wasabi for duplicate bags. See [DART Workflow](#dart-workflow) for details on how
these variables are used in DART.

If the `dart_workflow_hostbucket_override` variable is set to `true`
(default), the values of `host` and `bucket` defined here are used in the DART workflow defined in the
`workflow` variable above. If set to `false`, the values defined in the workflow itself are used instead. This
option can only be set in the configuration file.
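
The override behaviour described above can be sketched as follows. This is only an illustration, not ReBACH-Bagger's actual code: the function name `apply_hostbucket_override` is hypothetical, and the assumption that the workflow keeps its storage targets in a `storageServices` list with `host` and `bucket` keys is inferred from the workflow JSON rather than confirmed.

```python
import copy


def apply_hostbucket_override(workflow: dict, host: str, bucket: str,
                              override: bool = True) -> dict:
    """Return a copy of the workflow with host/bucket replaced when override is on.

    Hypothetical helper: the "storageServices" key is an assumption about
    the workflow layout, not something this README states.
    """
    result = copy.deepcopy(workflow)  # leave the caller's workflow untouched
    if override:
        for service in result.get("storageServices", []):
            service["host"] = host
            service["bucket"] = bucket
    return result


# Placeholder values, mirroring the "***Override***" marker in the workflow file
workflow = {"storageServices": [{"protocol": "s3",
                                 "host": "***Override***",
                                 "bucket": "***Override***"}]}
overridden = apply_hostbucket_override(workflow, "s3.example.com", "example-bucket")
```

With `override=False` the copy is returned unchanged, which corresponds to using the values defined in the workflow itself.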

@@ -192,7 +194,7 @@ the tag.

### Metadata from JSON

Users may also use an inline table to define a dot-notation `tag_path` to the tag's corresponding
key in the package's metadata file. Take the following abbreviated example of a metadata JSON file:

```json
@@ -214,15 +216,35 @@ key in the package's metadata file. Take the following abbreviated example of a
}
```

To define a set of tags based on this metadata named "First-Author", "License", and "DOI" in the
"bag-info.txt" tag file, users can define the following relationships in the config file:

```toml
[Metadata]
bag-info.First-Author = { tag_path = "authors.0.full_name" }
bag-info.License = { tag_path = "license.name" }
bag-info.DOI = { tag_path = "doi" }
bag-info.First-Author = { tag_path = ["authors.0.full_name"] }
bag-info.License = { tag_path = ["license.name"] }
bag-info.DOI = { tag_path = ["doi"] }
```
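
As a rough illustration of how a dot-notation `tag_path` can be followed through the metadata, here is a minimal sketch. The function name `resolve_tag_path` is hypothetical and this is not necessarily how ReBACH-Bagger implements the lookup; the `doi` and `license` values in the sample data are made up for the example.

```python
from typing import Any


def resolve_tag_path(metadata: Any, tag_path: str) -> Any:
    """Follow a dot-notation path through nested dicts and lists."""
    current = metadata
    for part in tag_path.split("."):
        if isinstance(current, list):
            current = current[int(part)]  # numeric segments index into lists
        else:
            current = current[part]       # other segments are dictionary keys
    return current


# Sample metadata shaped like the abbreviated JSON example (values are illustrative)
metadata = {
    "doi": "10.0000/example",
    "license": {"name": "CC BY 4.0"},
    "authors": [{"full_name": "Brian Avants"}],
}
first_author = resolve_tag_path(metadata, "authors.0.full_name")
license_name = resolve_tag_path(metadata, "license.name")
```

A missing key or out-of-range index raises an exception in this sketch; a real implementation would need to decide how to report that.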

To extract multiple items and concatenate their values into a single tag, add more items to the list:

```toml
bag-info.External-Identifier = { tag_path = ["authors.0.full_name", "#hash#"] }
```

Note the special value `#hash#`. It is not extracted from the JSON; it is taken from the name of the bag that will be created. The available special values are:

- `#id#`: the article ID
- `#version#`: the article version (in `vXX` format, where XX is a zero-padded number from 1 to 99)
- `#hash#`: the metadata hash

In the example, the value of External-Identifier will be set to `Brian Avants-<md5>`, where `<md5>` is the 32-character MD5 hash computed by bagger for the bag name. To include literal text in the tag, enclose it in `@` characters:

```toml
bag-info.External-Identifier = { tag_path = ["@azu_@", "authors.0.full_name", "#hash#"] }
```
This sets External-Identifier to `azu_Brian Avants-<md5>`.
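
The list form of `tag_path` can be pictured with the sketch below. It is an illustration under assumptions, not the project's implementation: the names `build_tag_value` and `lookup` are hypothetical, `version` is assumed to be an integer, and the joining rule (a `-` between extracted values, no separator after `@literal@` text) is only inferred from the two example outputs above.

```python
from typing import Any


def lookup(metadata: Any, path: str) -> Any:
    """Follow a dot-notation path through nested dicts and lists."""
    current = metadata
    for part in path.split("."):
        current = current[int(part)] if isinstance(current, list) else current[part]
    return current


def build_tag_value(tag_path: list, metadata: dict, article_id: int,
                    version: int, metadata_hash: str) -> str:
    """Concatenate JSON values, #special# values, and @literal@ text."""
    special = {
        "#id#": str(article_id),
        "#version#": f"v{version:02d}",  # zero-padded, e.g. v01
        "#hash#": metadata_hash,
    }
    result = ""
    needs_separator = False  # assumed rule: "-" only between non-literal items
    for item in tag_path:
        if len(item) >= 2 and item.startswith("@") and item.endswith("@"):
            result += item[1:-1]  # literal text, appended as-is
            needs_separator = False
            continue
        value = special[item] if item in special else str(lookup(metadata, item))
        if needs_separator:
            result += "-"
        result += value
        needs_separator = True
    return result


# Illustrative inputs; the hash is a stand-in for a real 32-character MD5
metadata = {"authors": [{"full_name": "Brian Avants"}]}
fake_hash = "0" * 32
plain = build_tag_value(["authors.0.full_name", "#hash#"],
                        metadata, 12345, 1, fake_hash)
prefixed = build_tag_value(["@azu_@", "authors.0.full_name", "#hash#"],
                           metadata, 12345, 1, fake_hash)
```

How a literal placed after an extracted value should be joined is not covered by the examples, so the sketch simply appends it without a separator.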


### Metadata Utilities

2 changes: 1 addition & 1 deletion bagger/bag.py
@@ -125,7 +125,7 @@ def _init_dart(self, package_path: PathLike) -> Union[Status, tuple[str, list]]:
if not self.validate_package(metadata_path):
return Status.INVALID_PACKAGE

metadata_tags = Metadata(self.config, metadata_path,
metadata_tags = Metadata(self.config, metadata_path, article_id, version, metadata_hash,
self.log).parse_metadata()

if not metadata_tags:
9 changes: 5 additions & 4 deletions bagger/config/default.example.toml
@@ -1,6 +1,6 @@
[Defaults]
output_dir = "out"
workflow = "bagger/config/default_workflow.json"
workflow = "bagger/config/noupload_workflow.json"
dart_command = "dart-runner"
overwrite = false
delete = true
@@ -26,6 +26,7 @@ aptrust-info.Description = { tag_path = "description", strip_html = true }

bag-info.Contact-Name = "ReDATA Administrator"
bag-info.Contact-Email = "redata@arizona.edu"
bag-info.Internal-Sender-Identifier = { tag_path = "doi" }
bag-info.License-Name = { tag_path = "license.name" }
bag-info.Published-Date = { tag_path = "published_date" }
bag-info.Internal-Sender-Identifier = { tag_path = ["doi"] }
bag-info.License-Name = { tag_path = ["license.name"] }
bag-info.Published-Date = { tag_path = ["published_date"] }
bag-info.External-Identifier = { tag_path = ["id", "#version#", "authors.0.first_name", "authors.0.last_name", "#hash#"] }
40 changes: 28 additions & 12 deletions bagger/config/default_workflow.json
@@ -6,14 +6,14 @@
"packagePluginId": "BagIt",
"packagePluginName": null,
"bagItProfile": {
"id": "80dda49d-96c9-46dd-91bf-7f57053854d4",
"id": "7f0cd963-4750-4c10-a3de-0c65bb0d0dc5",
"userCanDelete": true,
"required": [
"name",
"id"
],
"errors": {},
"name": "ReDATA BagIt Profile",
"name": "ReDATA",
"description": "BagIt Profile for ReDATA BagIt preservation using ReBACH (based on APTrust BagIt Profile v2.2)",
"acceptBagItVersion": [
"0.97",
@@ -24,13 +24,13 @@
],
"allowFetchTxt": false,
"bagItProfileInfo": {
"bagItProfileIdentifier": "https://raw.githubusercontent.com/UAL-RE/ReBACH/main/profiles/redata-bagit-dart-v2.2.json",
"bagItProfileIdentifier": "https://raw.githubusercontent.com/UAL-RE/ReBACH/main/bagger/profiles/redata-bagit-dart-v3.0.json",
"bagItProfileVersion": "",
"contactEmail": "redata@arizona.edu",
"contactName": "ReDATA Administrator",
"externalDescription": "BagIt profile for creating bags from ReDATA content. Based on APTrust BagIt profile v2.2.",
"externalDescription": "Profile for ReDATA content. Based on APTrust BagIt profile v2.2. bagItProfileIdentifier points to a DART settings object which contains this profile.",
"sourceOrganization": "redata.arizona.edu",
"version": "2.2"
"version": "3.0"
},
"manifestsRequired": [
"md5"
@@ -106,14 +106,15 @@
"tagName": "Bag-Count",
"required": false,
"values": [],
"defaultValue": "",
"defaultValue": null,
"userValue": "",
"help": "The number of bags that make up this object. Set this only if you are packaging a single object into multiple bags. See https://wiki.aptrust.org/Bagging_specifications for info on naming multi-part APTrust bags.",
"isBuiltIn": true,
"isUserAddedFile": false,
"isUserAddedTag": false,
"wasAddedForJob": false,
"errors": {}
"errors": {},
"emptyOk": true
},
{
"id": "41b75504-e54d-49a1-aad4-c8a4921d15ce",
@@ -287,7 +288,7 @@
"errors": {}
},
{
"id": "16772027-e33b-416b-9eb1-9d50b02224f7",
"id": "22f51ea6-de98-494e-a876-dbaeb004cc9f",
"tagFile": "bag-info.txt",
"tagName": "License-Name",
"required": false,
@@ -305,15 +306,30 @@
],
"defaultValue": "",
"userValue": "",
"help": "The name of the license assigned to the record in ReDATA. For items with multiple licenses,",
"help": "The name of the license assigned to the record in ReDATA.",
"isBuiltIn": false,
"isUserAddedFile": false,
"isUserAddedTag": true,
"wasAddedForJob": false,
"errors": {}
},
{
"id": "317ede75-de6f-4bd5-9af0-67c2a24f9174",
"tagFile": "bag-info.txt",
"tagName": "External-Identifier",
"required": true,
"values": [],
"defaultValue": "",
"userValue": "",
"help": "Format: {id of item on figshare}-{version}-{first author firstname}-{first author lastname}-{hash}",
"isBuiltIn": false,
"isUserAddedFile": false,
"isUserAddedTag": true,
"wasAddedForJob": false,
"errors": {}
},
{
"id": "d9044d6b-818c-4c3b-b7ff-ab867c525249",
"id": "8a264c2d-9a75-4a07-917e-1763e2b57993",
"tagFile": "bag-info.txt",
"tagName": "Published-Date",
"required": false,
@@ -344,8 +360,8 @@
"id"
],
"errors": {},
"name": "Wasabi Main",
"description": "Main Wasabi storage endpoint",
"name": "Wasabi (ReDATA)",
"description": "Wasabi storage for ReDATA. Override necessary settings using ReBACH bagger config.",
"protocol": "s3",
"host": "***Override***",
"port": 0,