
Crate html_generator



html-generator

Pure Rust library for transforming Markdown into SEO-optimized, accessible HTML. Zero unsafe code.




§Install

[dependencies]
html-generator = "0.0.5"

§Optional async support

[dependencies]
html-generator = { version = "0.0.5", features = ["async"] }

§Build from source

git clone https://github.com/sebastienrousseau/html-generator.git
cd html-generator
make          # check + clippy + test

Requires Rust 1.80.0+. Tested on Linux, macOS, and Windows.


§Quick Start

use html_generator::{generate_html, HtmlConfig};

fn main() -> Result<(), html_generator::error::HtmlError> {
    let markdown = "# Hello\n\nThis is **bold** text.";
    let config = HtmlConfig::default();
    let html = generate_html(markdown, &config)?;
    println!("{html}");
    Ok(())
}

§Overview

html-generator converts Markdown into production-ready HTML with a configurable pipeline that applies accessibility, SEO, table of contents, math, diagrams, and minification in a single pass. No raw HTML passthrough by default — safe for untrusted input. Runs natively or as WebAssembly in browsers, Cloudflare Workers, and edge runtimes.

  • Full CommonMark with extensions (tables, strikethrough, task lists, superscript)
  • Front matter extraction from YAML (---), TOML (+++), and JSON ({...})
  • WCAG-compliant output with automatic ARIA attribute injection
  • JSON-LD structured data appended for rich search results
  • Table of contents injected at [[TOC]] placeholder
  • Server-side LaTeX → MathML for $..$ and $$..$$ (no client-side JS needed)
  • Mermaid diagram passthrough for ```mermaid fenced blocks
  • In-memory minification without disk I/O
  • WebAssembly bindings via wasm-bindgen (browsers, Workers, Edge)
  • Optional async via tokio spawn_blocking (behind async feature)
  • Zero unsafe code via #![forbid(unsafe_code)] at crate root

| Metric | Value |
| --- | --- |
| Source | ~12,900 lines across 11 modules (src/yaml/ is a vendored snapshot, see FAQ) |
| Test suite | 533 unit/integration tests + 163 doctests + 4 WASM smoke tests = 700 total |
| Coverage | 98.18% line coverage (cargo llvm-cov); Codecov project ≥95%, patch ≥90% gates |
| Examples | 14 branded examples covering every public surface |
| Dependencies | 13 native runtime + 1 optional async (tokio) + 2 optional WASM (wasm-bindgen, js-sys) |
| MSRV | Rust 1.80.0 |
| WASM bundle | 5.8 MB raw / 2.0 MB gzipped (after wasm-opt -Os) |
| CI gates | 10 distinct checks including end-to-end wasm-pack test --node against Node 20 |

§Features

| Feature | Description |
| --- | --- |
| Markdown to HTML | Full CommonMark via mdx-gen with extensions: tables, strikethrough, task lists, autolinks, superscript. Custom class blocks via :::class syntax. Image class attributes via ![alt](url).class="cls". |
| Accessibility | Automatic ARIA attribute injection for buttons, navs, forms, inputs, tabs, modals, accordions, tooltips. WCAG 2.1 validation (Levels A, AA, AAA). Heading structure checks. Language attribute validation. |
| Front matter | YAML (---), TOML (+++), JSON ({...}) delimiters. extract_front_matter strips metadata and returns body. extract_front_matter_data parses metadata into serde_json::Value. |
| Table of contents | generate_table_of_contents builds <ul> from headings. Pipeline injects at [[TOC]] placeholder when generate_toc is enabled. |
| SEO | MetaTagsBuilder for meta tag generation. generate_structured_data for JSON-LD <script> output with configurable @type and additional properties. HTML entity escaping via escape_html. |
| Math (MathML) | enable_math flag converts $..$ and $$..$$ LaTeX spans to native <math> MathML via pulldown-latex. Server-side, no JS bundle. Conservative regex matchers leave $5 currency literals alone. Behind the math feature (default-on). |
| Diagrams (Mermaid) | enable_diagrams flag rewrites ```mermaid fenced blocks to <pre class="mermaid"> for the standard client-side mermaid.js bundle. Diagram source flows through verbatim. |
| Minification | File-based minify_html(path) and in-memory minify_html_string(html). Preserves HTML semantics, strips comments, minifies CSS/JS. Configurable via MinifyConfig. |
| WebAssembly | wasm feature exposes generateHtml, generateHtmlFullDocument, generateHtmlWithOptions to JavaScript via wasm-bindgen. Build with wasm-pack build --target web --features wasm --no-default-features. |
| Performance | Regexes and CSS selectors compiled once into static Lazy. SIMD-backed str::contains short-circuits before any html5ever parse. DOM-aware element replacement handles attribute reordering. 2.09 ms full pipeline on an 8 KB blog payload (comrak parse alone is 172 µs). |
| Async | Optional async feature enables async_generate_html via tokio spawn_blocking. Synchronous users pay zero cost: tokio is not compiled without the feature. |
| Security | #![forbid(unsafe_code)]. Raw HTML stripped by default (allow_unsafe_html: false). All user-controlled attributes escaped. NUL-byte rejection on file paths. Directory traversal blocked. Input size limits enforced. |

§Library Usage

Full pipeline

use html_generator::{generate_html, HtmlConfig};

let config = HtmlConfig {
    add_aria_attributes: true,
    generate_toc: true,
    generate_structured_data: true,
    minify_output: true,
    ..HtmlConfig::default()
};

let markdown = "[[TOC]]\n\n# Introduction\n\nWelcome to the guide.\n\n## Getting Started\n\nFollow these steps.";
let html = generate_html(markdown, &config)?;
// Output includes: ARIA attributes, TOC at [[TOC]], JSON-LD, minified

The pipeline applies steps in order:

  1. Markdown → HTML (with extensions)
  2. Accessibility (ARIA attributes)
  3. Table of contents (inject at [[TOC]])
  4. Structured data (append JSON-LD)
  5. Minification (compress)
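
The ordering above can be pictured as a fixed sequence of optional stages gated by configuration flags. The following is a minimal std-only sketch, not the crate's implementation: `Stage`, `Flags`, and `planned_stages` are hypothetical names introduced here for illustration, and the real pipeline may differ in detail.

```rust
/// Hypothetical stage names mirroring the five steps listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    Markdown,
    Accessibility,
    TableOfContents,
    StructuredData,
    Minification,
}

/// Hypothetical subset of the flags that gate the optional stages.
#[derive(Debug, Clone, Copy, Default)]
pub struct Flags {
    pub add_aria_attributes: bool,
    pub generate_toc: bool,
    pub generate_structured_data: bool,
    pub minify_output: bool,
}

/// Returns the stages that would run, always in the documented order.
pub fn planned_stages(flags: Flags) -> Vec<Stage> {
    let mut stages = vec![Stage::Markdown]; // parsing always runs first
    if flags.add_aria_attributes {
        stages.push(Stage::Accessibility);
    }
    if flags.generate_toc {
        stages.push(Stage::TableOfContents);
    }
    if flags.generate_structured_data {
        stages.push(Stage::StructuredData);
    }
    if flags.minify_output {
        // Last, so it compresses everything the earlier stages produced.
        stages.push(Stage::Minification);
    }
    stages
}
```

Keeping minification last means the TOC and JSON-LD markup are compressed along with the rest of the document.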

Front matter

use html_generator::utils::extract_front_matter_data;

// YAML front matter
let content = "---\ntitle: My Page\nauthor: Jane Doe\n---\n# Hello";
let (metadata, body) = extract_front_matter_data(content)?;
assert_eq!(metadata["title"], "My Page");
assert_eq!(body, "# Hello");

// TOML front matter
let content = "+++\ntitle = \"My Page\"\nauthor = \"Jane Doe\"\n+++\n# Hello";
let (metadata, body) = extract_front_matter_data(content)?;
assert_eq!(metadata["title"], "My Page");

// JSON front matter
let content = "{\"title\": \"My Page\"}\n# Hello";
let (metadata, body) = extract_front_matter_data(content)?;
assert_eq!(metadata["title"], "My Page");
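
To make the delimiter handling concrete, here is a minimal std-only sketch of splitting fenced front matter from the body. It is an illustration under stated assumptions, not the crate's parser: `FrontMatterKind` and `split_front_matter` are hypothetical names, only `---` and `+++` fences with `\n` line endings are handled, and JSON front matter is left out because it needs brace matching rather than a closing fence.

```rust
/// Front matter formats recognised by their opening fence (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum FrontMatterKind {
    Yaml,
    Toml,
}

/// Splits `content` into (kind, raw metadata, body) when it opens with a
/// `---` (YAML) or `+++` (TOML) fence. Returns `None` when no fence is
/// present or the closing fence is missing.
pub fn split_front_matter(content: &str) -> Option<(FrontMatterKind, &str, &str)> {
    let (kind, fence) = if content.starts_with("---\n") {
        (FrontMatterKind::Yaml, "---")
    } else if content.starts_with("+++\n") {
        (FrontMatterKind::Toml, "+++")
    } else {
        return None;
    };
    let rest = &content[4..]; // skip the opening fence and its newline
    let mut offset = 0;
    for line in rest.split_inclusive('\n') {
        // The closing fence must sit on a line of its own.
        if line.trim_end() == fence {
            let metadata = rest[..offset].trim_end();
            let body = &rest[offset + line.len()..];
            return Some((kind, metadata, body));
        }
        offset += line.len();
    }
    None
}
```

The raw metadata string would then be handed to a YAML or TOML parser; that step is omitted here.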

Table of contents

use html_generator::{generate_html, HtmlConfig};

let markdown = "[[TOC]]\n\n# Chapter 1\n\n## Section 1.1\n\n# Chapter 2";
let config = HtmlConfig {
    generate_toc: true,
    ..HtmlConfig::default()
};
let html = generate_html(markdown, &config)?;
assert!(html.contains(r#"<ul>"#));
assert!(html.contains(r##"<a href="#chapter-1">"##));
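
For intuition about how headings turn into the anchors asserted above, here is a small std-only sketch. The slug rule (lowercase, non-alphanumerics collapsed to `-`) and the flat list layout are assumptions for illustration; `slugify` and `toc_from_markdown` are hypothetical helpers, not the crate's `generate_table_of_contents`. The sketch also ignores fenced code blocks and does not HTML-escape titles.

```rust
/// Turns heading text into an anchor slug: lowercase, alphanumerics kept,
/// every other run of characters collapsed to a single `-` (assumed rule).
pub fn slugify(text: &str) -> String {
    let mut slug = String::new();
    for ch in text.trim().chars() {
        if ch.is_alphanumeric() {
            slug.extend(ch.to_lowercase());
        } else if !slug.is_empty() && !slug.ends_with('-') {
            slug.push('-');
        }
    }
    slug.trim_end_matches('-').to_string()
}

/// Builds a flat `<ul>` of links from ATX headings (`#` to `######`).
pub fn toc_from_markdown(markdown: &str) -> String {
    let mut items = String::new();
    for line in markdown.lines() {
        let hashes = line.chars().take_while(|c| *c == '#').count();
        // An ATX heading needs 1 to 6 hashes followed by a space.
        if (1..=6).contains(&hashes) && line[hashes..].starts_with(' ') {
            let title = line[hashes..].trim();
            items.push_str(&format!(
                "<li><a href=\"#{}\">{}</a></li>",
                slugify(title),
                title
            ));
        }
    }
    if items.is_empty() {
        String::new()
    } else {
        format!("<ul>{items}</ul>")
    }
}
```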

SEO and structured data

use html_generator::seo::{MetaTagsBuilder, generate_structured_data, StructuredDataConfig};
use std::collections::HashMap;

// Meta tags
let meta = MetaTagsBuilder::new()
    .with_title("My Page")
    .with_description("A great page")
    .add_meta_tag("author", "Jane Doe")
    .build()?;

// JSON-LD structured data
let html = r#"<html><head><title>My Page</title></head><body><p>Content</p></body></html>"#;
let config = StructuredDataConfig {
    page_type: "Article".to_string(),
    additional_data: Some(HashMap::from([("author".to_string(), "Jane".to_string())])),
    ..Default::default()
};
let json_ld = generate_structured_data(html, Some(config))?;
assert!(json_ld.contains("application/ld+json"));
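
Escaping is what keeps user-supplied meta values from breaking out of an attribute. The sketch below shows the general technique with std only; `escape_html_sketch` and `meta_tag` are hypothetical names, and the exact entity set is an assumption rather than a description of the crate's `escape_html`.

```rust
/// Escapes the five characters that are significant in HTML text and
/// attribute values. Illustrative only; not the crate's `escape_html`.
pub fn escape_html_sketch(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#x27;"),
            other => out.push(other),
        }
    }
    out
}

/// Renders one `<meta>` tag with both attribute values escaped.
pub fn meta_tag(name: &str, content: &str) -> String {
    format!(
        "<meta name=\"{}\" content=\"{}\">",
        escape_html_sketch(name),
        escape_html_sketch(content)
    )
}
```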

Accessibility

use html_generator::accessibility::{add_aria_attributes, validate_wcag, AccessibilityConfig};

let html = r#"<button>Submit</button><nav><ul><li>Home</li></ul></nav>"#;

// Enhance with ARIA attributes
let enhanced = add_aria_attributes(html, None)?;
assert!(enhanced.contains("aria-label"));

// Validate WCAG compliance
let config = AccessibilityConfig::default();
let report = validate_wcag(&enhanced, &config, None)?;
println!("Issues found: {}", report.issue_count);

Minification

use html_generator::performance::minify_html_string;

let html = "<html>  <body>  <p>Hello</p>  </body>  </html>";
let minified = minify_html_string(html)?;
assert_eq!(minified, "<html><body><p>Hello</p></body></html>");
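
The whitespace removal in that assertion can be approximated with a few lines of std-only Rust. This is a deliberately naive sketch: `strip_inter_tag_whitespace` is a hypothetical helper, it only drops whitespace that sits directly between `>` and `<`, and it does not protect `<pre>`, `<textarea>`, or inline scripts the way a real minifier must.

```rust
/// Drops whitespace-only runs that sit between a closing `>` and the next `<`.
/// Whitespace inside text content is preserved.
pub fn strip_inter_tag_whitespace(html: &str) -> String {
    let mut out = String::with_capacity(html.len());
    let mut pending_ws = String::new();
    for ch in html.chars() {
        if ch.is_whitespace() {
            // Hold whitespace back until we know what follows it.
            pending_ws.push(ch);
            continue;
        }
        let between_tags = out.ends_with('>') && ch == '<';
        if !between_tags {
            out.push_str(&pending_ws);
        }
        pending_ws.clear();
        out.push(ch);
    }
    // Trailing whitespace at the end of the document is dropped.
    out
}
```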

Diagnostics

The default generate_html silently degrades when optional steps fail. Use generate_html_with_diagnostics to inspect which steps succeeded:

use html_generator::{generate_html_with_diagnostics, HtmlConfig};

let config = HtmlConfig {
    add_aria_attributes: true,
    generate_toc: true,
    generate_structured_data: true,
    minify_output: true,
    ..HtmlConfig::default()
};

let output = generate_html_with_diagnostics("# Hello", &config)?;
println!("HTML: {} bytes", output.html.len());
for d in &output.diagnostics {
    eprintln!("warning: {d}");
}

Async (optional)

Enable with features = ["async"]:

use html_generator::performance::async_generate_html;

#[tokio::main]
async fn main() -> Result<(), html_generator::error::HtmlError> {
    let html = async_generate_html("# Hello\n\nWorld").await?;
    println!("{html}");
    Ok(())
}

§Configuration

use html_generator::HtmlConfig;

let config = HtmlConfig {
    enable_syntax_highlighting: true,       // Syntax-highlighted code blocks
    syntax_theme: Some("github".into()),    // Highlighting theme
    minify_output: false,                   // Compress output HTML
    add_aria_attributes: true,              // Inject ARIA attributes
    generate_structured_data: false,        // Append JSON-LD
    generate_toc: false,                    // Inject TOC at [[TOC]]
    allow_unsafe_html: false,               // Strip raw HTML (XSS-safe default)
    sanitize_html: false,                   // Sanitize via ammonia (when unsafe is on)
    generate_full_document: false,          // Wrap in HTML5 boilerplate
    max_input_size: 5 * 1024 * 1024,        // 5MB input limit
    max_buffer_size: 16 * 1024 * 1024,      // 16MB I/O buffer
    language: "en-GB".into(),               // Content language (used in html lang attr)
    encoding: "utf-8".into(),               // File I/O encoding
    enable_math: false,                     // LaTeX → MathML for $..$ / $$..$$
    enable_diagrams: false,                 // Mermaid passthrough for ```mermaid blocks
};

Use the builder for validated configuration:

use html_generator::HtmlConfig;

let config = HtmlConfig::builder()
    .with_syntax_highlighting(true, Some("monokai".into()))
    .with_language("en-US")
    .build()?;
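
As a rough picture of what "validated" means here, the following std-only sketch shows a builder whose `build` step rejects bad values. Every name in it (`SketchConfig`, `SketchConfigBuilder`, `BuildError`, `looks_like_language_code`) is hypothetical, and the language check accepts only the simple two-letter language plus two-letter region shape such as "en-GB"; real BCP 47 tags are broader, and the crate's own validation rules may differ.

```rust
/// Hypothetical validation errors for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum BuildError {
    InvalidLanguage(String),
    ZeroInputSize,
}

/// Hypothetical validated configuration.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SketchConfig {
    pub language: String,
    pub max_input_size: usize,
}

/// Hypothetical builder collecting unvalidated values.
#[derive(Debug, Clone)]
pub struct SketchConfigBuilder {
    language: String,
    max_input_size: usize,
}

/// Accepts only the simple `ll-RR` shape, e.g. "en-GB" (simplifying assumption).
pub fn looks_like_language_code(code: &str) -> bool {
    match code.split_once('-') {
        Some((lang, region)) => {
            lang.len() == 2
                && lang.chars().all(|c| c.is_ascii_lowercase())
                && region.len() == 2
                && region.chars().all(|c| c.is_ascii_uppercase())
        }
        None => false,
    }
}

impl SketchConfigBuilder {
    pub fn new() -> Self {
        Self { language: "en-GB".into(), max_input_size: 5 * 1024 * 1024 }
    }

    pub fn with_language(mut self, language: &str) -> Self {
        self.language = language.into();
        self
    }

    pub fn with_max_input_size(mut self, bytes: usize) -> Self {
        self.max_input_size = bytes;
        self
    }

    /// Validation happens once, here, so a built config is always usable.
    pub fn build(self) -> Result<SketchConfig, BuildError> {
        if !looks_like_language_code(&self.language) {
            return Err(BuildError::InvalidLanguage(self.language));
        }
        if self.max_input_size == 0 {
            return Err(BuildError::ZeroInputSize);
        }
        Ok(SketchConfig { language: self.language, max_input_size: self.max_input_size })
    }
}
```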

§Examples

| Example | Description |
| --- | --- |
| hello | Heading, lists, code blocks, links: basic Markdown to HTML |
| pipeline | Full pipeline: ARIA + TOC + JSON-LD + minification in one pass |
| frontmatter | YAML, TOML, JSON front matter extraction and parsing |
| accessibility | ARIA injection for buttons, navs, forms; WCAG validation |
| seo | Meta tags, JSON-LD structured data, HTML entity escaping |
| toc | Table of contents from headings, [[TOC]] placeholder |
| minify | In-memory HTML minification with size savings |
| errors | Error variants, type matching, graceful recovery patterns |
| config | HtmlConfig builder, validation, field inspection |
| headers | Custom ID and class generators for heading elements |
| custom_syntax | Triple-colon blocks (:::warning) and image classes |
| emojis | Bundled emoji data, emoji-to-ARIA-label mapping |
| math_and_diagrams | LaTeX → MathML and ```mermaid passthrough |
| async | Asynchronous generation via tokio (requires --features async) |

Run any example:

cargo run --example hello
cargo run --example pipeline
cargo run --example accessibility
cargo run --example math_and_diagrams
cargo run --example async --features async

§Performance

Comparative throughput on the same realistic 8 KB blog payload (Apple M-series, criterion --quick, [profile.bench] with opt-level = 3 + fat LTO):

| Engine | Time / iter | What it does |
| --- | --- | --- |
| pulldown_cmark (parse only) | 45 µs | Pull-parser, no post-processing. Fastest plain CommonMark in Rust. |
| comrak (parse only) | 172 µs | The CommonMark/GFM parser this crate wraps. |
| html_generator (full pipeline) | 2.09 ms | Parse + ARIA injection + TOC + JSON-LD + minification. |

Pure parsers will always be faster — they don’t do ARIA, JSON-LD, TOC, or minification. html-generator does all four in one pass; the ~2 ms overhead is what buys WCAG-compliant output without a downstream post-processing layer. Reproduce with:

cargo bench --bench competitors

§Math and diagrams

Two opt-in post-processors turn ordinary Markdown into rich technical documentation without client-side JavaScript for math:

use html_generator::{generate_html, HtmlConfig};

let md = r"

In a right triangle, $a^2 + b^2 = c^2$.

```mermaid
graph LR
    A --> B
```";
let cfg = HtmlConfig {
    enable_math: true,        // $..$ and $$..$$ → <math> MathML
    enable_diagrams: true,    // ```mermaid → <pre class="mermaid">
    ..HtmlConfig::default()
};
let html = generate_html(md, &cfg)?;

  • Math: server-side LaTeX → MathML via pulldown-latex (gated behind the math feature, on by default). Browsers render MathML natively, so no client-side bundle is required. Parse errors are encoded inline as <merror> markers rather than crashing the build.
  • Diagrams: ```mermaid fenced blocks become <pre class="mermaid">…</pre> so the standard mermaid.js loader picks them up. Drop a single <script type="module">import mermaid from "https://…/mermaid.esm.mjs"; mermaid.initialize({startOnLoad:true});</script> in your page and you’re done.

§WebAssembly

The same pipeline runs in Cloudflare Workers, Vercel Edge, browsers, and Node without changing the API:

cargo build --release --target wasm32-unknown-unknown \
  --features wasm --no-default-features
# or, to publish an npm bundle:
wasm-pack build --target web --features wasm --no-default-features

Three JS-friendly entry points are exposed via wasm-bindgen:

| JS name | Description |
| --- | --- |
| generateHtml(markdown) | Render Markdown to an accessible HTML fragment with default config. |
| generateHtmlFullDocument(markdown) | Same but wrapped in <!DOCTYPE html><html>…</html>. |
| generateHtmlWithOptions(markdown, optionsJson) | Pass a JSON object configuring add_aria_attributes, generate_toc, enable_math, enable_diagrams, etc. |

WASM builds drop mdx-gen’s :::class blocks, image-class syntax, and syntect syntax highlighting, because the underlying tokio and onig dependencies (onig wraps a C library) do not compile to wasm32-unknown-unknown. CommonMark + GFM (tables, strikethrough, autolinks, tasklists, superscript) plus the full ARIA / TOC / JSON-LD / math / mermaid post-processing layer render identically.

Use it from JavaScript:

// pkg/ generated by `wasm-pack build --target web ...`
import init, {
  generateHtml,
  generateHtmlWithOptions,
} from "./pkg/html_generator.js";

await init();

// Simple render with defaults (ARIA on):
const fragment = generateHtml("# Hello, **world**!");

// Render with custom options:
const article = generateHtmlWithOptions(
  "# Math\n\n$$E = mc^2$$",
  JSON.stringify({
    enable_math: true,
    generate_full_document: true,
    language: "en-GB",
  }),
);

From Cloudflare Workers / Vercel Edge: use wasm-pack build --target bundler and import the generated module from your worker entry point. The JS-side API is identical to the browser case.

§Bundle size

Measured wasm-pack build --release --target web output, post wasm-opt -Os:

| Feature set | .wasm raw | .wasm gzipped |
| --- | --- | --- |
| --features wasm,math | 5.8 MB | 2.0 MB |
| --features wasm (no math) | 5.7 MB | 1.96 MB |

The math feature adds ~40 KB gzipped. Both bundles fit comfortably in Cloudflare Workers’ paid plan (10 MB compressed); the free plan (1 MB compressed) requires further trimming — the ammonia, minify-html, and scraper-on-html5ever deps account for the bulk of the binary.
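
The size claims above reduce to simple arithmetic, worked through below as a small check. The constants are the gzipped figures and plan limits quoted in this section, converted to kilobytes with the assumption that 1 MB = 1000 KB; the function names are hypothetical.

```rust
/// Gzipped bundle sizes from the table above, in KB (assuming 1 MB = 1000 KB).
pub const WITH_MATH_GZ_KB: u32 = 2_000;
pub const WITHOUT_MATH_GZ_KB: u32 = 1_960;

/// Compressed-size limits quoted above for the two Workers plans, in KB.
pub const PAID_LIMIT_KB: u32 = 10_000;
pub const FREE_LIMIT_KB: u32 = 1_000;

/// Extra gzipped size attributable to the `math` feature.
pub fn math_overhead_kb() -> u32 {
    WITH_MATH_GZ_KB - WITHOUT_MATH_GZ_KB
}

/// True when a bundle of `bundle_kb` fits under `limit_kb`.
pub fn fits(bundle_kb: u32, limit_kb: u32) -> bool {
    bundle_kb <= limit_kb
}
```

Under these figures the math feature costs 40 KB gzipped, both bundles fit the 10 MB limit, and neither fits the 1 MB limit even with math disabled.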

Smoke tests live in tests/wasm_smoke.rs and run under wasm-pack test --node --no-default-features --features wasm,math. The CI’s wasm-build job exercises this exact command on every push.


§FAQ

Why this crate over `comrak`, `pulldown-cmark`, or `markdown-it`?

Those are pure CommonMark parsers: they hand you raw HTML. html-generator is the layer above: parse + ARIA injection + JSON-LD structured data + table of contents + math + mermaid + minification, all in one call. The benchmarks in Performance show the trade-off explicitly. If you only need parse-to-HTML, prefer pulldown-cmark (45 µs on the same payload) and write your own post-processing. If you want a single-call content pipeline that ships WCAG 2.1 structure and SEO markup out of the box, this is it.

Is the output really WCAG-compliant?

Yes for the structural conformance items: ARIA labels, roles, landmarks, heading hierarchy, language declarations. validate_wcag(html, &config, None) returns an AccessibilityReport with any remaining issues (missing alt text, color contrast — which html-generator can’t infer from Markdown). Output passes WCAG 2.1 Levels A and AA out of the box; Level AAA requires opt-in via WcagLevel::AAA in the config because some AAA criteria (heading-jump strictness, contrast ratio 7.0:1) reject otherwise-valid documents.

How do I render math without a JavaScript bundle?

Set enable_math: true (it’s behind the math feature, on by default). $..$ and $$..$$ LaTeX spans become <math>...</math> MathML, which modern browsers render natively — no MathJax, no KaTeX, no client-side script tag. Parse errors are encoded inline as <merror> markers so broken LaTeX is visible in the page rather than crashing the build. Currency-style $5 is left literal (the matcher requires a non-digit after the closing $).
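
The currency rule is easier to see in code. The sketch below is a hand-rolled std-only scanner, not the crate's regex: `inline_math_spans` is a hypothetical function, and its rules (opener followed by non-whitespace, closer preceded by non-whitespace and not followed by a digit, no line breaks, `$$..$$` skipped entirely) are assumptions chosen to reproduce the behaviour described above.

```rust
/// Finds inline `$..$` spans and returns the LaTeX between the delimiters.
/// Display math (`$$..$$`) is skipped; it is out of scope for this sketch.
pub fn inline_math_spans(text: &str) -> Vec<&str> {
    let bytes = text.as_bytes();
    let mut spans = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] != b'$' {
            i += 1;
            continue;
        }
        if bytes.get(i + 1) == Some(&b'$') {
            // Jump past the closing `$$`, or to the end if it never closes.
            i = match text[i + 2..].find("$$") {
                Some(offset) => i + 2 + offset + 2,
                None => bytes.len(),
            };
            continue;
        }
        // The opener must be followed by a non-whitespace byte.
        let opens = matches!(bytes.get(i + 1), Some(b) if !b.is_ascii_whitespace());
        if !opens {
            i += 1;
            continue;
        }
        // Search for a closer on the same line.
        let mut close = None;
        let mut j = i + 1;
        while j < bytes.len() && bytes[j] != b'\n' {
            if bytes[j] == b'$' {
                let tight = !bytes[j - 1].is_ascii_whitespace();
                let digit_follows = matches!(bytes.get(j + 1), Some(b) if b.is_ascii_digit());
                if tight && !digit_follows {
                    close = Some(j);
                    break;
                }
            }
            j += 1;
        }
        match close {
            Some(end) => {
                spans.push(&text[i + 1..end]);
                i = end + 1;
            }
            None => i += 1,
        }
    }
    spans
}
```

With these rules, "It costs $5 today and $10 tomorrow." yields no spans because the only candidate closer is preceded by a space.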

Do I have to manage Mermaid rendering myself?

For Mermaid, yes — html-generator only rewrites the markup so the standard mermaid.js bundle finds it. Set enable_diagrams: true and the pipeline emits <pre class="mermaid"> instead of <pre><code class="language-mermaid">. Then drop a single <script type="module">import mermaid from "https://…/mermaid.esm.mjs"; mermaid.initialize({startOnLoad:true});</script> in your page. Server-side mermaid rendering would require running a headless browser or porting the diagram engine to Rust — out of scope for this crate.
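
The markup rewrite itself is a small string transformation. Here is a std-only sketch of the idea; `rewrite_mermaid_blocks` is a hypothetical helper, and it assumes the renderer emits exactly `<pre><code class="language-mermaid">` with no extra attributes, which may not hold for the real pipeline.

```rust
/// Rewrites `<pre><code class="language-mermaid">..</code></pre>` blocks to
/// `<pre class="mermaid">..</pre>`, leaving the diagram source untouched.
pub fn rewrite_mermaid_blocks(html: &str) -> String {
    const OPEN: &str = "<pre><code class=\"language-mermaid\">";
    const CLOSE: &str = "</code></pre>";
    let mut out = String::with_capacity(html.len());
    let mut rest = html;
    while let Some(start) = rest.find(OPEN) {
        let after_open = &rest[start + OPEN.len()..];
        // An unclosed block is left exactly as it was.
        let Some(end) = after_open.find(CLOSE) else { break };
        out.push_str(&rest[..start]);
        out.push_str("<pre class=\"mermaid\">");
        out.push_str(&after_open[..end]); // diagram source, verbatim
        out.push_str("</pre>");
        rest = &after_open[end + CLOSE.len()..];
    }
    out.push_str(rest);
    out
}
```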

Can I run this in Cloudflare Workers / Vercel Edge / a browser?

Yes — wasm-pack build --release --target web --no-default-features --features wasm,math produces a 5.8 MB raw / 2.0 MB gzipped bundle plus ~13 KB of JS bindings. The exposed JS surface is generateHtml, generateHtmlFullDocument, and generateHtmlWithOptions(markdown, optionsJson). Workers’ paid plan allows 10 MB compressed scripts, fitting comfortably; the free tier (1 MB compressed) requires further trimming and is not currently a supported configuration.

What's missing on the WASM target compared to native?

Three things, all from mdx-gen’s extension layer (which doesn’t compile to wasm32-unknown-unknown because of an unconditional tokio dep): :::class custom blocks, image-class syntax (![alt](url).class="…"), and syntect syntax highlighting. CommonMark + GFM (tables, strikethrough, autolinks, tasklists, superscript) plus the full ARIA / TOC / JSON-LD / math / mermaid post-processing renders identically.

Why is raw HTML in Markdown stripped by default?

Untrusted Markdown that contains raw <script> tags is an XSS vector. HtmlConfig::default() sets allow_unsafe_html = false so <script> and friends never make it to the output. If you control the Markdown source (e.g. site authors you trust), set allow_unsafe_html = true. For user-submitted Markdown, set both allow_unsafe_html = true and sanitize_html = true — the pipeline runs ammonia over the final HTML to strip dangerous elements while keeping safe ones.

Why does the same Markdown produce identical HTML on every run now?

Earlier versions (≤ 0.0.4) used uuid::Uuid::new_v4() for auto-generated ARIA IDs, so two runs over the same input produced different HTML — bad for content-addressable caching, deterministic builds, and snapshot testing. v0.0.5 replaced UUIDs with per-call counters so byte-identical input produces byte-identical output. The uuid runtime dependency was dropped in the same commit.
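
A per-call counter is enough to make generated IDs reproducible. The following std-only sketch illustrates the technique; `IdCounter`, `tag_buttons`, and the `aria-button-N` ID format are hypothetical and are not taken from the crate's source.

```rust
/// Per-call ID generator: a fresh instance starts counting from 1 every time.
#[derive(Debug, Default)]
pub struct IdCounter {
    next: usize,
}

impl IdCounter {
    pub fn new() -> Self {
        Self::default()
    }

    /// Returns IDs such as `aria-button-1`, `aria-button-2`, and so on.
    pub fn next_id(&mut self, prefix: &str) -> String {
        self.next += 1;
        format!("aria-{prefix}-{}", self.next)
    }
}

/// Adds an `id` to every bare `<button>` using a counter created for this call only.
pub fn tag_buttons(html: &str) -> String {
    let mut ids = IdCounter::new();
    let mut out = String::with_capacity(html.len());
    let mut rest = html;
    while let Some(pos) = rest.find("<button>") {
        out.push_str(&rest[..pos]);
        out.push_str(&format!("<button id=\"{}\">", ids.next_id("button")));
        rest = &rest[pos + "<button>".len()..];
    }
    out.push_str(rest);
    out
}
```

Because no state survives between calls, running the function twice on the same input gives byte-identical output, which is the property snapshot tests and content-addressable caches rely on.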

How does the pipeline handle errors gracefully?

Use generate_html_with_diagnostics instead of generate_html. It returns an HtmlOutput with html: String and diagnostics: Vec<Diagnostic>. Each diagnostic records which pipeline step (accessibility, toc, structured_data, minification, etc.) emitted it and at what severity. Non-fatal failures degrade rather than abort — e.g., if ARIA injection fails on malformed HTML the unenhanced HTML is returned with an Error-level diagnostic, and the rest of the pipeline continues.
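
The degrade-rather-than-abort behaviour can be sketched with std only. All names below (`Level`, `StepDiagnostic`, `Step`, `run_steps`, and the two sample steps) are hypothetical stand-ins for illustration, not the crate's `Diagnostic` or `HtmlOutput` types.

```rust
/// Hypothetical severity levels for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Warning,
    Error,
}

/// Hypothetical diagnostic record: which step failed, how badly, and why.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StepDiagnostic {
    pub step: &'static str,
    pub level: Level,
    pub message: String,
}

/// A step takes the current HTML and returns new HTML or an error message.
pub type Step = fn(&str) -> Result<String, String>;

/// Runs each step in order. A failing step leaves the HTML untouched,
/// records a diagnostic, and the remaining steps still run.
pub fn run_steps(input: &str, steps: &[(&'static str, Step)]) -> (String, Vec<StepDiagnostic>) {
    let mut html = input.to_string();
    let mut diagnostics = Vec::new();
    for (name, run) in steps {
        match run(&html) {
            Ok(next) => html = next,
            Err(message) => diagnostics.push(StepDiagnostic {
                step: *name,
                level: Level::Error,
                message,
            }),
        }
    }
    (html, diagnostics)
}

/// Sample step that succeeds.
pub fn wrap_in_main(html: &str) -> Result<String, String> {
    Ok(format!("<main>{html}</main>"))
}

/// Sample step that always fails, standing in for a step hitting malformed HTML.
pub fn failing_step(_html: &str) -> Result<String, String> {
    Err("malformed HTML".to_string())
}
```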

What's `src/yaml/`? It's massive.

A vendored, pure-Rust YAML parser kept verbatim from upstream (yaml_safe@0.1.0, in turn a fork-and-rename of serde_yml away from the unsound libyml C dependency). It exists as a private mod yaml inside the crate (~2 700 lines) so the crate compiles without taking on the unsound serde_yml registry dependency or its RUSTSEC-2025-0068 advisory. Excluded from coverage and clippy in CI; not part of the public API surface. Will be replaced with the crates.io-published yaml_safe = "0.1" registry dependency once that ships.

Is `cargo publish` supported?

Yes — cargo publish --dry-run succeeds as of v0.0.5. The earlier blocker (path-only crates/yaml_safe/ without a version = field) was closed by inlining the YAML implementation into src/yaml/.

What's the MSRV policy?

Rust 1.80.0 is the floor. Bumps require a minor-version increment and a CHANGELOG entry. Linting and formatting follow the latest stable (cargo fmt --all -- --check and cargo clippy -- -D warnings are expected to pass on the toolchain pinned in mise.toml/the CI config).


§Development

make              # check + clippy + test
make build        # cargo build
make test         # run all tests
make lint         # clippy with strict flags
make format       # rustfmt
make deny         # supply-chain audit
make outdated     # dependency freshness check
make help         # list all targets

§CI

| Workflow | Trigger | Purpose |
| --- | --- | --- |
| ci.yml | push, PR | Clippy, fmt, test (all features), coverage, audit |
| docs.yml | push to main | Build and deploy API docs to GitHub Pages |
| security.yml | push, PR | Dependency review, CodeQL, cargo-audit, cargo-deny |

See CONTRIBUTING.md for signed commits and PR guidelines.


§Security

  • #![forbid(unsafe_code)] at crate root and in Cargo.toml lints
  • Raw HTML stripped by default — opt-in via allow_unsafe_html: true
  • All user-controlled attributes escaped via escape_html
  • Directory traversal (..) blocked in file path validation
  • Input size limits enforced at all boundaries
  • cargo audit clean (transitive advisory ignores documented in .cargo/audit.toml)
  • cargo deny – license, advisory, and ban checks
  • SPDX license headers on all source files
  • Signed commits enforced via CI
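
Two of the checks above (path validation and size limits) are simple enough to sketch with the standard library. This is an illustration of the general technique, not the crate's code: `InputError`, `validate_path`, and `validate_size` are hypothetical names.

```rust
use std::path::{Component, Path};

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum InputError {
    NulByte,
    DirectoryTraversal,
    TooLarge { size: usize, limit: usize },
}

/// Rejects paths containing NUL bytes or any `..` component.
pub fn validate_path(path: &str) -> Result<(), InputError> {
    if path.contains('\0') {
        return Err(InputError::NulByte);
    }
    let has_parent = Path::new(path)
        .components()
        .any(|component| matches!(component, Component::ParentDir));
    if has_parent {
        return Err(InputError::DirectoryTraversal);
    }
    Ok(())
}

/// Rejects input whose byte length exceeds `limit`.
pub fn validate_size(input: &str, limit: usize) -> Result<(), InputError> {
    if input.len() > limit {
        Err(InputError::TooLarge { size: input.len(), limit })
    } else {
        Ok(())
    }
}
```

Checking path components rather than searching for the substring ".." avoids rejecting legitimate file names such as "notes..md".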

§License

Dual-licensed under Apache 2.0 or MIT, at your option.


Re-exports§

pub use crate::error::HtmlError;
pub use accessibility::add_aria_attributes;
pub use accessibility::validate_wcag;
pub use emojis::load_emoji_sequences;
pub use generator::generate_html;
pub use generator::generate_html_with_diagnostics;
pub use generator::Diagnostic;
pub use generator::DiagnosticLevel;
pub use generator::HtmlOutput;
pub use performance::async_generate_html;
pub use performance::minify_html;
pub use performance::minify_html_string;
pub use seo::generate_meta_tags;
pub use seo::generate_structured_data;
pub use utils::extract_front_matter;
pub use utils::extract_front_matter_data;
pub use utils::format_header_with_id_class;

Modules§

accessibility
Accessibility-related functionality for HTML processing.
constants
Common constants used throughout the library.
elements
HTML5 semantic element builders.
emojis
Emoji Sequences Loader
error
Error types for HTML generation and processing.
generator
HTML generation module for converting Markdown to HTML.
math
Server-side LaTeX → MathML and Mermaid diagram passthrough.
performance
Performance optimization functionality for HTML processing.
seo
Search Engine Optimization (SEO) functionality for HTML processing.
utils
Utility functions for HTML and Markdown processing.
wasm
WebAssembly bindings.

Structs§

HtmlConfig
Configuration options for HTML generation.
HtmlConfigBuilder
Builder for constructing HtmlConfig instances.
MarkdownConfig (Deprecated)
Legacy configuration type — use HtmlConfig directly instead.

Enums§

ConfigError
Errors that can occur during configuration.
OutputDestination
Output destination for HTML generation.

Functions§

markdown_file_to_html
Converts a Markdown file to HTML.
markdown_to_html
Converts Markdown content to HTML.
validate_language_code
Validates that a language code matches the BCP 47 format (e.g., “en-GB”).

Type Aliases§

Result
Result type alias for library operations.