A Clojure library for parsing and writing PDBx/mmCIF files.
PDBx/mmCIF is the standard format for macromolecular structure data, used by the Protein Data Bank (PDB). This library provides:
- Parsing: Convert PDBx/mmCIF files into nested Clojure maps
- Writing: Convert Clojure maps back to valid PDBx/mmCIF format
- Round-trip support: Comments are preserved as attributes of the elements they precede
Add to your deps.edn:
{:deps {io.github.Schmoho/clj-PDBx-mmCIF {:git/tag "v0.1.0" :git/sha "..."}}}

(require '[pdbx-mmcif.core :as mmcif])
;; Parse from string
(def doc (mmcif/parse "data_1ABC\n_entry.id 1ABC\n"))
;; Parse from file
(def doc (mmcif/parse-file "structure.cif"))
;; Write to string
(def output (mmcif/write doc))
;; Write to file
(mmcif/write-file doc "output.cif")

;; Get the first (or only) data block
(def block (mmcif/first-data-block doc))
;; Get a specific item value
(mmcif/get-item block "_entry.id")
;; => {:value "1ABC" :type :unquoted}
;; Get a loop by category
(def atom-loop (mmcif/get-loop block "atom_site"))
;; Convert loop to sequence of maps
(mmcif/loop-as-maps atom-loop)
;; => ({"id" "1" "type_symbol" "N" ...}
;;     {"id" "2" "type_symbol" "C" ...})
;; Group items by category
(mmcif/items-by-category block)
;; => {"entry" [...] "cell" [...] ...}

The parsed document has this structure:

{:data-blocks [{:type :data-block
                :name "1ABC"
                :items [{:type :item
                         :name "_entry.id"
                         :category "entry"
                         :item "id"
                         :value {:value "1ABC" :type :unquoted}
                         :comments ["# Optional preceding comment"]}
                        {:type :loop
                         :columns [{:name "_atom_site.id" :category "atom_site" :item "id"}
                                   {:name "_atom_site.type_symbol" ...}]
                         :rows [[{:value "1" :type :unquoted} {:value "N" :type :unquoted}]
                                [{:value "2" :type :unquoted} {:value "C" :type :unquoted}]]
                         :comments []}]
                :save-frames [{:type :save-frame
                               :name "frame_name"
                               :items [...]}]}]
 :comments ["# File-level comments"]}

Values preserve their original quoting style for round-trip compatibility:
- :unquoted - Simple values without spaces: 1ABC
- :single-quoted - Single-quoted strings: 'Hello World'
- :double-quoted - Double-quoted strings: "Hello World"
- :semicolon - Multi-line text values between semicolons
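
To illustrate why the quoting style is stored next to each value, here is a minimal sketch of how a value map could be turned back into mmCIF text according to its :type. The name render-value is hypothetical and not part of this library's API, and the newline handling for :semicolon values is an assumption rather than a description of what the library's writer does.

```clojure
(defn render-value
  "Hypothetical helper (not a library function): re-emits a value map
  as mmCIF text using its recorded quoting style."
  [{:keys [value type]}]
  (case type
    :unquoted      value
    :single-quoted (str "'" value "'")
    :double-quoted (str "\"" value "\"")
    ;; Assumption: the stored text has no leading or trailing newline,
    ;; so the delimiting semicolons are placed on their own line starts.
    :semicolon     (str "\n;" value "\n;")))

(render-value {:value "1ABC" :type :unquoted})
;; => "1ABC"
(render-value {:value "Hello World" :type :single-quoted})
;; => "'Hello World'"
```

Because the style travels with the value, a value parsed from 'Hello World' can be written back with single quotes instead of being normalized to another style.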
Special values:
- ? - Unknown/missing value
- . - Inapplicable value
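
When consuming parsed data, it can be convenient to map the special values to nil. The sketch below uses only clojure.core and operates on the value maps and loop structure described in this README. The names special-value?, value->clj, and loop-rows->maps are hypothetical, not library functions, and the example loop is made-up illustration data. It assumes that ? and . are special only when unquoted, so a quoted '?' is treated as an ordinary string.

```clojure
(defn special-value?
  "True for an unquoted ? or . (assumption: quoted forms are plain strings)."
  [{:keys [value type]}]
  (and (= type :unquoted)
       (contains? #{"?" "."} value)))

(defn value->clj
  "Returns the raw string, or nil for a special value."
  [v]
  (when-not (special-value? v)
    (:value v)))

(defn loop-rows->maps
  "Zips each row of a loop with its column item names."
  [{:keys [columns rows]}]
  (map (fn [row]
         (zipmap (map :item columns) (map value->clj row)))
       rows))

;; Illustration data following the loop structure shown in this README.
(def example-loop
  {:type :loop
   :columns [{:name "_atom_site.id" :category "atom_site" :item "id"}
             {:name "_atom_site.type_symbol" :category "atom_site" :item "type_symbol"}]
   :rows [[{:value "1" :type :unquoted} {:value "N" :type :unquoted}]
          [{:value "2" :type :unquoted} {:value "?" :type :unquoted}]]})

(loop-rows->maps example-loop)
;; => ({"id" "1", "type_symbol" "N"} {"id" "2", "type_symbol" nil})
```

This is independent of mmcif/loop-as-maps; how that function treats special values is not specified here.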
This library supports the full PDBx/mmCIF syntax:
- Data blocks: data_blockname
- Items: _category.item value
- Loops: loop_ _category.item1 _category.item2 value1a value1b value2a value2b
- Save frames: save_framename ... save_
- Comments: Lines starting with #
- Quoted values: Single quotes, double quotes, or semicolon-delimited for multi-line
See the official mmCIF syntax documentation for more details.
# Run tests
clojure -X:test
# Start a REPL with test paths
clojure -A:dev

MIT License