404Wolf/mdvalidate

Welcome to mdvalidate

mdvalidate is an early-stage work in progress!

MDS is a tiny language for describing how Markdown should look. With mdvalidate, you write schemas that define a shape of Markdown, and MDS checks real documents against them.

It's designed for validating a stream of Markdown via stdin, so you can pipe input (like LLM output) and validate the shape of its response.

mdvalidate schemas consist of many "matcher" patterns, and every matcher has a label. This means that validating a Markdown file can also produce a JSON object containing the matches found along the way, keyed by label.

We plan to eventually support converting a Markdown schema into a JSON schema describing the shape of the output that it produces once it has validated some Markdown file.

mdvalidate is written in 100% safe Rust and is 🔥 blazingly fast 🔥.

You can find the full docs here!

Kitchen Sink Example (current + planned)

Schema:

# Release Notes `version:/v\d+\.\d+\.\d+/`

> Build `build:/[A-F0-9]{7}/` by `_:/\w+/`

## Highlights

- `feature:/[A-Za-z][\w -]+/`{2,4}
  - `detail:/[a-z][\w -]+/`{,2}

Inline: `code`! and `bang`!!

```{lang:/\w+/}
{snippet}
```

```{runtime:/\w+/}
{checked:!python -m py_compile -}
```

`html_block:html`d2

| Key | Value |
| :-- | :---- |
| `key:/\w+/` | `value:/.+/` |
| `key:/\w+/` | `value:/.+/` |{1,3}

```mds
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="h1" type="xs:string"/>
</xs:schema>
```

Input:

# Release Notes v1.4.2

> Build 7A9F3C1 by wolf

## Highlights

- Fast paths
  - fewer allocations
- Safer IO

Inline: `code` and `bang`!

```rust
fn main() {}
```

```python
print("ok")
```

<div>
  <img src="./logo.png" />
</div>

| Key | Value |
| :-- | :---- |
| host | localhost |
| port | 8080 |

<h1>Hello</h1>

Output:

{
  "build": "7A9F3C1",
  "checked": "print(\"ok\")",
  "detail": [
    "fewer allocations"
  ],
  "feature": [
    "Fast paths",
    "Safer IO"
  ],
  "html_block": "<div>\n  <img src=\"./logo.png\" />\n</div>",
  "key": [
    "host",
    "port"
  ],
  "lang": "rust",
  "runtime": "python",
  "snippet": "fn main() {}",
  "value": [
    "localhost",
    "8080"
  ],
  "version": "v1.4.2"
}

Notes:

  • Planned but not implemented yet: html matcher, table row repetition, mds XML schema blocks, execution validation.

Mini Example

Here’s a simple schema that will validate all grocery lists of a specific shape.

# Grocery List

- `item:/[A-Z][a-z]+/`{2,2}
  - `note:/\w+/`{,2}

A passing document:

# Grocery List

- Apples
  - organic
  - local
- Bananas
  - ripe

A failing document (too few items: the schema requires exactly two, and this list has only one):

# Grocery List

- Apples
  - organic
  - local
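To make the two documents above concrete, here is a rough Python model of the checks this schema implies. It is only a sketch, not how mdvalidate is implemented: the function name `check_grocery_list` and the dictionary representation of the list are hypothetical, and it assumes each regex must match the whole item text and that `{,2}` applies to the sub-notes of each item separately.

```python
import re

# Patterns copied from the schema above.
ITEM = re.compile(r"[A-Z][a-z]+")  # `item:/[A-Z][a-z]+/`{2,2}
NOTE = re.compile(r"\w+")          # `note:/\w+/`{,2}


def check_grocery_list(items):
    """Return a list of problems; an empty list means the document passes.

    `items` maps each top-level list item to its list of sub-notes
    (a hypothetical stand-in for a parsed Markdown list).
    """
    problems = []
    # {2,2}: exactly two top-level items.
    if len(items) != 2:
        problems.append(f"expected exactly 2 items, found {len(items)}")
    for item, notes in items.items():
        # Assumption: the regex must match the entire item text.
        if not ITEM.fullmatch(item):
            problems.append(f"item {item!r} does not match the item pattern")
        # {,2}: at most two sub-notes under each item.
        if len(notes) > 2:
            problems.append(f"item {item!r} has more than 2 notes")
        for note in notes:
            if not NOTE.fullmatch(note):
                problems.append(f"note {note!r} does not match the note pattern")
    return problems


passing = {"Apples": ["organic", "local"], "Bananas": ["ripe"]}
failing = {"Apples": ["organic", "local"]}

print(check_grocery_list(passing))
print(check_grocery_list(failing))
```

Under these assumptions the first document yields no problems, while the second is rejected because it has one item where the schema asks for exactly two.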

Some examples of what you can match

  • Literal Matching: By default, text is matched literally -- if the schema says # Title, the document must say exactly that.
  • Matchers: Use `label:/regex/` to define rules for dynamic content.
  • Optional or Repeated Items: Add ? for optional things, + for one or more.
  • Lists & Sublists: Validate nested lists with pattern control.
  • Escaping: Add ! to disable regex interpretation -- great for examples.
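The matcher syntax used throughout this README (a backticked `label:/regex/` optionally followed by a `{min,max}` repeat) can be pulled apart with a short regular expression. The sketch below is illustrative only: `parse_matcher` is a hypothetical helper, not part of mdvalidate, and it covers just the `{min,max}` form that appears in the examples here.

```python
import re

# Matches `label:/regex/` with an optional trailing {min,max} repeat.
# Either bound may be empty, as in {,2} or {2,}.
MATCHER = re.compile(
    r"`(?P<label>\w+):/(?P<regex>.+)/`"
    r"(?:\{(?P<lo>\d*),(?P<hi>\d*)\})?"
)


def parse_matcher(text):
    """Split a matcher string into label, regex, and repeat bounds.

    Returns None if the text is not a matcher. A missing bound is
    reported as None rather than guessed.
    """
    m = MATCHER.fullmatch(text.strip())
    if m is None:
        return None
    return {
        "label": m["label"],
        "regex": m["regex"],
        "min": int(m["lo"]) if m["lo"] else None,
        "max": int(m["hi"]) if m["hi"] else None,
    }


print(parse_matcher(r"`feature:/[A-Za-z][\w -]+/`{2,4}"))
print(parse_matcher(r"`detail:/[a-z][\w -]+/`{,2}"))
print(parse_matcher("# Title"))
```

A plain line such as `# Title` is not a matcher, so the helper returns `None` for it; in a schema that line would be matched literally.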

Crazy cool recursive schema declaration!

Here's a fun example of a schema that validates multiple list levels and collects labeled matches very deeply!

- `test:/test\d/`{2,2}
- `barbar:/barbar\d/`{2,2}
    + `deep:/deep\d/`{1,1}
        - `deeper:/deeper\d/`{2,2}
        - `deepest:/deepest\d/`{2,}

A passing document:

- test1
- test2
- barbar1
- barbar2
    + deep1
        - deeper1
        - deeper2
        - deepest1
        - deepest2
        - deepest3
        - deepest4

The captured matches:

{
  "barbar": [
    "barbar1",
    "barbar2",
    {
      "deep": [
        "deep1",
        {
          "deeper": [
            "deeper1",
            "deeper2"
          ],
          "deepest": [
            "deepest1",
            "deepest2",
            "deepest3",
            "deepest4"
          ]
        }
      ]
    }
  ],
  "test": [
    "test1",
    "test2"
  ]
}

We're validating:

  • All of the actual list groups, making sure the regex passes
  • The number of list items for each group
  • And capturing it all into a structured output object!
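Because nested lists produce nested capture objects, code that consumes the output often wants a flat view. The following is a small sketch of one way to walk the structure shown above; `flatten_captures` is a hypothetical helper, and it assumes the output always has the shape seen in this example (labels mapping to strings, or to lists of strings and nested objects).

```python
import json


def flatten_captures(captures, out=None):
    """Collect every captured string under its label, ignoring nesting."""
    if out is None:
        out = {}
    for label, value in captures.items():
        # A label may hold a single value or a list of values.
        values = value if isinstance(value, list) else [value]
        for v in values:
            if isinstance(v, dict):
                # Nested object: captures from a deeper list level.
                flatten_captures(v, out)
            else:
                out.setdefault(label, []).append(v)
    return out


# The captured matches from the example above.
raw = """
{
  "barbar": ["barbar1", "barbar2",
    {"deep": ["deep1",
      {"deeper": ["deeper1", "deeper2"],
       "deepest": ["deepest1", "deepest2", "deepest3", "deepest4"]}]}],
  "test": ["test1", "test2"]
}
"""

flat = flatten_captures(json.loads(raw))
print(flat["deepest"])
```

Flattening discards which parent each nested match belonged to, so keep the original nested object if that relationship matters.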

Get started!

Installation

You can build mdvalidate with nix using nix build github:404wolf/mdvalidate.

Alternatively, download a pre-built (static) binary from the releases page for use on x86 or Mac (Apple silicon).

It is not officially supported, but you can also build directly with cargo via cargo build --bin mdv.

Using mdvalidate

mdvalidate defines a very simple language for describing the shape of Markdown documents that looks like Markdown itself. You use mdvalidate via a command line tool (CLI).

In every case, you have a schema, written in mdschema (mdvalidate's schema definition language), and an input, which may or may not conform to the schema. You can invoke mdvalidate by running:

mdv path/to/schema.md path/to/input.md
echo $?

The command exits with 0 if validation succeeds, or 1 if there were errors. Errors are reported to stderr.

You can use - in place of a path to use standard input or output instead of a file. If you include a third positional argument, mdvalidate will also extract data from documents that conform to the schema. For example,

echo "# Hi Wolf" | mdv path/to/schema.md - -
echo $?

For the schema

# Hi `name:/[A-Za-z]+/`

this prints the captured data, followed by the exit code:

mdv examples/cli/schema.md examples/cli/input.md - 
{"name":"Wolf"}
0
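If you are validating LLM output from a program rather than a shell, the same invocation can be wrapped in a few lines of Python. This is a minimal sketch based only on the behaviour described above (positional arguments, `-` for stdio, exit code 0 or 1, errors on stderr). The helper names `mdv_command` and `validate` are hypothetical, and it assumes `mdv` is on your `PATH` and that stdout contains nothing but the JSON object when extraction is requested.

```python
import json
import subprocess


def mdv_command(schema_path, extract=False, binary="mdv"):
    """Build the argument list: schema path, then '-' to read input from stdin."""
    argv = [binary, schema_path, "-"]
    if extract:
        # Third positional argument: also write extracted data to stdout.
        argv.append("-")
    return argv


def validate(markdown, schema_path, binary="mdv"):
    """Pipe `markdown` through mdv.

    Returns (True, captures) on success or (False, error_text) on failure.
    """
    proc = subprocess.run(
        mdv_command(schema_path, extract=True, binary=binary),
        input=markdown,
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        # Validation errors are reported on stderr.
        return False, proc.stderr
    # Assumption: stdout is exactly the JSON object of captures.
    return True, json.loads(proc.stdout)
```

For example, `validate("# Hi Wolf\n", "path/to/schema.md")` would run the same command as the shell example above and, if the schema matches, hand back the parsed captures instead of printed text.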
