clj-PDBx-mmCIF

A Clojure library for parsing and writing PDBx/mmCIF files.

Overview

PDBx/mmCIF is the standard format for macromolecular structure data, used by the Protein Data Bank (PDB). This library provides:

  1. Parsing: Convert PDBx/mmCIF files into nested Clojure maps
  2. Writing: Convert Clojure maps back to valid PDBx/mmCIF format
  3. Round-trip support: Comments are preserved as attributes of the elements they precede, and values keep their original quoting style

Installation

Add to your deps.edn:

{:deps {io.github.Schmoho/clj-PDBx-mmCIF {:git/tag "v0.1.0" :git/sha "..."}}}

Usage

Basic Parsing and Writing

(require '[pdbx-mmcif.core :as mmcif])

;; Parse from string
(def doc (mmcif/parse "data_1ABC\n_entry.id 1ABC\n"))

;; Parse from file
(def doc (mmcif/parse-file "structure.cif"))

;; Write to string
(def output (mmcif/write doc))

;; Write to file
(mmcif/write-file doc "output.cif")

Working with the Parsed Data

;; Get the first (or only) data block
(def block (mmcif/first-data-block doc))

;; Get a specific item value
(mmcif/get-item block "_entry.id")
;; => {:value "1ABC" :type :unquoted}

;; Get a loop by category
(def atom-loop (mmcif/get-loop block "atom_site"))

;; Convert loop to sequence of maps
(mmcif/loop-as-maps atom-loop)
;; => ({"id" "1" "type_symbol" "N" ...}
;;     {"id" "2" "type_symbol" "C" ...})

;; Group items by category
(mmcif/items-by-category block)
;; => {"entry" [...] "cell" [...] ...}
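
Since the loop-as-maps output shown above is a sequence of plain maps with string keys and string values, ordinary Clojure sequence functions apply to it. The following is a minimal sketch that uses a hand-written literal in place of real parser output. The "Cartn_x" column and every value in it are illustrative assumptions, not data from a real entry.

```clojure
;; Hand-written stand-in for the result of (mmcif/loop-as-maps atom-loop).
;; The "Cartn_x" key and all values are made up for illustration.
(def atoms
  [{"id" "1" "type_symbol" "N" "Cartn_x" "10.5"}
   {"id" "2" "type_symbol" "C" "Cartn_x" "11.25"}
   {"id" "3" "type_symbol" "C" "Cartn_x" "12.0"}])

;; Keep only the carbon atoms.
(def carbons
  (filter #(= "C" (get % "type_symbol")) atoms))

;; Values arrive as strings, so numeric columns need explicit coercion.
(def carbon-xs
  (mapv #(Double/parseDouble (get % "Cartn_x")) carbons))

(println (map #(get % "id") carbons)) ; prints (2 3)
(println carbon-xs)                   ; prints [11.25 12.0]
```

Coercion is left to the caller here because the documented output keeps every value as a string.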

Data Structure

The parsed document has this structure:

{:data-blocks [{:type :data-block
                :name "1ABC"
                :items [{:type :item
                         :name "_entry.id"
                         :category "entry"
                         :item "id"
                         :value {:value "1ABC" :type :unquoted}
                         :comments ["# Optional preceding comment"]}
                        {:type :loop
                         :columns [{:name "_atom_site.id" :category "atom_site" :item "id"}
                                   {:name "_atom_site.type_symbol" ...}]
                         :rows [[{:value "1" :type :unquoted} {:value "N" :type :unquoted}]
                                [{:value "2" :type :unquoted} {:value "C" :type :unquoted}]]
                         :comments []}]
                :save-frames [{:type :save-frame
                               :name "frame_name"
                               :items [...]}]}]
 :comments ["# File-level comments"]}
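
Because the document is ordinary nested Clojure data, it can also be traversed without the helper functions. As a sketch of how the :columns and :rows of a loop relate, the hypothetical helper below (rows->maps is not part of the library) pairs each row's raw values with the column item names. It runs on a hand-written loop node in the shape documented above.

```clojure
;; A loop node written out by hand in the documented shape.
(def example-loop
  {:type :loop
   :columns [{:name "_atom_site.id" :category "atom_site" :item "id"}
             {:name "_atom_site.type_symbol" :category "atom_site" :item "type_symbol"}]
   :rows [[{:value "1" :type :unquoted} {:value "N" :type :unquoted}]
          [{:value "2" :type :unquoted} {:value "C" :type :unquoted}]]
   :comments []})

(defn rows->maps
  "Hypothetical helper: pairs each row's raw :value strings with the
  :item names of the loop's columns, in column order."
  [{:keys [columns rows]}]
  (let [ks (mapv :item columns)]
    (mapv (fn [row] (zipmap ks (map :value row))) rows)))

(rows->maps example-loop)
;; => [{"id" "1", "type_symbol" "N"} {"id" "2", "type_symbol" "C"}]
```

This sketch discards each value's :type, so the result is convenient for analysis but is not enough to write the loop back out unchanged.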

Value Types

Values preserve their original quoting style for round-trip compatibility:

  • :unquoted - Simple values without spaces: 1ABC
  • :single-quoted - Single-quoted strings: 'Hello World'
  • :double-quoted - Double-quoted strings: "Hello World"
  • :semicolon - Multi-line text values between semicolons

Special values:

  • ? - Unknown/missing value
  • . - Inapplicable value
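
Keeping the quoting style alongside each value makes it possible to tell a special marker from a literal string: in CIF syntax, a quoted '?' is ordinary text, while an unquoted ? means the value is unknown. The README does not say whether the library offers a helper for this, so the function below is a hypothetical sketch built only on the documented {:value ... :type ...} shape.

```clojure
(defn interpret-value
  "Hypothetical helper, not part of the library: returns nil for an
  unknown value (?), :inapplicable for an inapplicable value (.), and
  the raw string otherwise. Only unquoted markers count as special."
  [{:keys [value type]}]
  (cond
    (and (= type :unquoted) (= value "?")) nil
    (and (= type :unquoted) (= value ".")) :inapplicable
    :else value))

(interpret-value {:value "?" :type :unquoted})      ; => nil
(interpret-value {:value "." :type :unquoted})      ; => :inapplicable
(interpret-value {:value "?" :type :single-quoted}) ; => "?"
(interpret-value {:value "1ABC" :type :unquoted})   ; => "1ABC"
```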

mmCIF Syntax Reference

This library supports the full PDBx/mmCIF syntax:

  • Data blocks: data_blockname
  • Items: _category.item value
  • Loops:
    loop_
    _category.item1
    _category.item2
    value1a value1b
    value2a value2b
    
  • Save frames: save_framename ... save_
  • Comments: Lines starting with #
  • Quoted values: Single quotes, double quotes, or semicolon-delimited for multi-line

See the official mmCIF syntax documentation for more details.
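
To connect the loop syntax above with the parsed representation, here is a minimal sketch that renders a loop node back into text. It is not the library's writer: loop->text is a hypothetical name, and the sketch assumes every value is :unquoted. Quoted and semicolon-delimited values would need extra handling.

```clojure
(require '[clojure.string :as str])

;; A loop node written out by hand in the documented shape.
(def example-loop
  {:type :loop
   :columns [{:name "_atom_site.id" :category "atom_site" :item "id"}
             {:name "_atom_site.type_symbol" :category "atom_site" :item "type_symbol"}]
   :rows [[{:value "1" :type :unquoted} {:value "N" :type :unquoted}]
          [{:value "2" :type :unquoted} {:value "C" :type :unquoted}]]
   :comments []})

(defn loop->text
  "Hypothetical sketch: emits the loop_ keyword, one column name per
  line, then one space-separated line per row. Assumes all values are
  :unquoted and ignores comments."
  [{:keys [columns rows]}]
  (str/join "\n"
            (concat ["loop_"]
                    (map :name columns)
                    (map (fn [row] (str/join " " (map :value row))) rows))))

(println (loop->text example-loop))
;; prints:
;; loop_
;; _atom_site.id
;; _atom_site.type_symbol
;; 1 N
;; 2 C
```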

Development

# Run tests
clojure -X:test

# Start a REPL with test paths
clojure -A:dev

License

MIT License
