TikZero: Generate Editable, Precise Scientific Figures from Text—No Paired Training Data Needed

Paper & Code

TikZero: Zero-Shot Text-Guided Graphics Program Synthesis

2025 • potamides/DeTikZify

Creating publication-ready scientific diagrams often requires deep familiarity with vector graphics tools or typesetting systems like LaTeX and TikZ. While sketching an idea is easy, translating a textual description—like “a neural network with two hidden layers”—into an accurate, editable TikZ program is time-consuming and error-prone. Worse, most AI-powered figure generators output only raster images, which lack semantic structure and can’t be edited or scaled without quality loss.

TikZero changes this. Developed as part of the DeTikZify project, TikZero enables zero-shot text-guided synthesis of editable TikZ graphics programs without needing large datasets of captioned TikZ code. Instead of relying on scarce paired text–TikZ examples, it bridges two abundant but unaligned data sources: standalone TikZ programs and captioned raster images. By using image representations as an intermediary, TikZero decouples text understanding from program generation, and its zero-shot results are reported to rival or even surpass commercial systems like GPT-4o.

How TikZero Works: Decoupling Vision, Language, and Code

At its core, TikZero separates graphics program synthesis from text understanding instead of learning a direct text-to-code mapping. Direct approaches require aligned training pairs, in which each TikZ program comes with a matching caption, and such data is scarce in scientific domains.

TikZero sidesteps this bottleneck by training two components independently:

A vision-language model that learns to map textual descriptions to image embeddings (using captioned raster figures).
A program synthesis model that learns to generate TikZ code from visual inputs (using unlabeled TikZ-rendered images).

During inference, a text caption is first converted into a visual representation, which then guides the generation of a TikZ program. This indirect pathway (text → image embedding → TikZ) is what makes the approach zero-shot: the system produces TikZ from captions without having been trained on aligned caption–TikZ pairs.
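
The composition described above can be illustrated with a toy sketch. Everything in it is hypothetical: `toy_text_encoder`, `toy_program_decoder`, `TEMPLATES`, and `TextToTikz` are illustrative stand-ins, not part of DeTikZify, and the two-dimensional "embedding" only mimics the idea of a shared image-embedding space. The point is that the two components only need to agree on the embedding format, so each can be trained on its own data.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical 2-d "image embedding": (circle-ness, square-ness).
Embedding = Tuple[float, float]


def toy_text_encoder(caption: str) -> Embedding:
    """Stand-in for the component that maps captions to image embeddings."""
    text = caption.lower()
    return (float("circle" in text), float("square" in text))


# Stand-in for what a program synthesis model learns from rendered TikZ:
# each entry pairs an image embedding with the code that draws that image.
TEMPLATES: List[Tuple[Embedding, str]] = [
    ((1.0, 0.0), r"\draw (0,0) circle (1);"),
    ((0.0, 1.0), r"\draw (0,0) rectangle (1,1);"),
]


def toy_program_decoder(embedding: Embedding) -> str:
    """Stand-in for the component that maps image embeddings to TikZ."""
    def sq_dist(a: Embedding, b: Embedding) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Pick the template whose embedding is closest to the requested one.
    _, body = min(TEMPLATES, key=lambda item: sq_dist(item[0], embedding))
    return "\\begin{tikzpicture}\n  " + body + "\n\\end{tikzpicture}"


@dataclass
class TextToTikz:
    """Chains the two independently built components: text -> embedding -> TikZ."""
    text_encoder: Callable[[str], Embedding]
    program_decoder: Callable[[Embedding], str]

    def __call__(self, caption: str) -> str:
        return self.program_decoder(self.text_encoder(caption))


synthesize = TextToTikz(toy_text_encoder, toy_program_decoder)
print(synthesize("A circle centered at the origin"))
```

Because the decoder never sees text and the encoder never sees code, either side could be swapped out without retraining the other, which is the property the adapter variant described next relies on.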

Two variants are available:

TikZero: An adapter that plugs into an existing DeTikZify v2 model, enabling text conditioning without retraining the entire system.
TikZero+: A fully fine-tuned version that integrates text guidance end-to-end for even higher fidelity.

Both approaches leverage the underlying strength of DeTikZify—a multimodal model trained on millions of scientific figures—and enhance it with text-driven control.

Key Advantages Over Existing Solutions

No Need for Paired Text–TikZ Data

Text-conditioned program synthesis models typically depend on large aligned datasets. TikZero instead trains on unaligned data, standalone TikZ programs on one side and captioned raster images on the other, and connects the two through a shared image-embedding space.

Editable, Reproducible Outputs

Unlike diffusion models or GANs that produce PNGs or JPEGs, TikZero outputs real TikZ code—fully editable, scalable, and compatible with LaTeX workflows. This is critical for academic publishing, where figures must be revised, localized, or integrated into templates.

Competitive Performance Without Massive Scale

TikZero builds on comparatively small open models (the DeTikZify v2 checkpoint referenced in this article, nllg/detikzify-v2-8b, has a name that suggests roughly 8 billion parameters), yet it is reported to match or exceed much larger commercial systems on scientific figure synthesis benchmarks. This efficiency makes it practical for labs and individual researchers without access to enterprise-grade AI infrastructure.

Built-in Refinement via MCTS

TikZero inherits DeTikZify’s Monte Carlo Tree Search (MCTS)-based inference, which iteratively refines candidate programs over time. This leads to higher compilation success rates and geometric precision without additional training.
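
To make the search idea concrete, here is a minimal, generic MCTS (UCT) sketch over a toy problem. It is not DeTikZify's implementation: the article does not describe its reward function or search details, so `toy_reward`, `VOCAB`, `TARGET`, and the fixed program length are hypothetical. In the toy, a "program" is a short sequence of fragment names, and the reward stands in for whatever compile-and-compare signal the real system uses.

```python
import math
import random
from typing import Callable, Dict, Optional, Sequence, Tuple

Program = Tuple[str, ...]


class Node:
    """One partial program in the search tree."""

    def __init__(self, prefix: Program, parent: Optional["Node"] = None):
        self.prefix = prefix
        self.parent = parent
        self.children: Dict[str, "Node"] = {}
        self.visits = 0
        self.total_reward = 0.0


def mcts(vocab: Sequence[str], length: int, reward: Callable[[Program], float],
         iterations: int = 300, c: float = 1.4, seed: int = 0) -> Tuple[Program, float]:
    """Return the best complete program found and its reward."""
    rng = random.Random(seed)
    root = Node(())
    best: Tuple[Program, float] = ((), float("-inf"))
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes using UCB1.
        while len(node.prefix) < length and len(node.children) == len(vocab):
            parent = node
            node = max(
                parent.children.values(),
                key=lambda ch: ch.total_reward / ch.visits
                + c * math.sqrt(math.log(parent.visits) / ch.visits),
            )
        # 2. Expansion: add one untried fragment below the selected node.
        if len(node.prefix) < length:
            token = rng.choice([t for t in vocab if t not in node.children])
            node.children[token] = Node(node.prefix + (token,), node)
            node = node.children[token]
        # 3. Rollout: complete the program at random and score it.
        program = node.prefix + tuple(
            rng.choice(vocab) for _ in range(length - len(node.prefix))
        )
        score = reward(program)
        if score > best[1]:
            best = (program, score)
        # 4. Backpropagation: update statistics on the path back to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += score
            node = node.parent
    return best


# Hypothetical toy task: recover a hidden sequence of fragments.
TARGET: Program = ("node", "edge", "node", "label")
VOCAB = ["node", "edge", "label"]


def toy_reward(program: Program) -> float:
    """Fraction of positions that match TARGET (stand-in for a real score)."""
    return sum(a == b for a, b in zip(program, TARGET)) / len(TARGET)


best_program, best_score = mcts(VOCAB, len(TARGET), toy_reward)
print(best_program, best_score)
```

The search spends more rollouts on prefixes that have scored well so far, which is why a longer time budget tends to yield better candidates without any change to the underlying model.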

Ideal Use Cases

TikZero is especially valuable in the following scenarios:

Academic writing: Automatically generate LaTeX-compatible figures from paper descriptions (e.g., “a bar chart comparing model accuracy across five datasets”).
Prototyping: Turn hand-drawn sketches or rough textual ideas into candidate TikZ diagrams that can be compiled, checked, and refined.
Reproducibility: Reconstruct figures from papers when only raster images are available—producing editable vector code instead of static pixels.
Educational tools: Help students learn TikZ by showing how abstract descriptions map to concrete code structures.

Because outputs are real programs—not black-box images—they integrate seamlessly into version control, collaborative editing, and automated document pipelines.

Getting Started

TikZero is accessible via the DeTikZify Python package and Hugging Face models. Two main pathways exist:

Use TikZero+ directly for text-to-TikZ synthesis (pipeline here is a DetikzifyPipeline built around a TikZero+ checkpoint):

fig = pipeline.sample(text="A decision tree with three levels.")

Or attach the TikZero adapter to a DeTikZify v2 model if you already use one:

pipeline = DetikzifyPipeline(*load_adapter(*load("nllg/detikzify-v2-8b"), "nllg/tikzero-adapter"))

The system supports MCTS-based refinement for higher-quality outputs and can rasterize successful programs for visual validation. While a web UI exists for sketch-based input, text conditioning is currently only available via the programming interface.
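
Putting the pieces above together, an end-to-end call might look like the sketch below. The pipeline construction and the `sample(text=...)` call are copied from the snippets in this section. The import paths (`detikzify.infer`, `detikzify.model`) and the result attributes `is_rasterizable`, `rasterize()`, and `save()` are assumptions and should be checked against the DeTikZify README. The wrapper function `generate_tikz` is hypothetical.

```python
def generate_tikz(caption: str, output_path: str = "fig.tex"):
    """Generate one TikZ program from a caption via the TikZero adapter."""
    # Imported lazily so this file can be loaded without the package installed.
    # ASSUMPTION: these module paths match the installed DeTikZify version.
    from detikzify.infer import DetikzifyPipeline
    from detikzify.model import load, load_adapter

    # Same construction as shown in the article: base model plus adapter.
    pipeline = DetikzifyPipeline(
        *load_adapter(*load("nllg/detikzify-v2-8b"), "nllg/tikzero-adapter")
    )

    # Single sample, as in the article's one-line example.
    fig = pipeline.sample(text=caption)

    # ASSUMPTION: the result exposes is_rasterizable, rasterize(), and save().
    # Compilation is not guaranteed, so only save programs that rendered.
    if fig.is_rasterizable:
        fig.rasterize().show()
        fig.save(output_path)
    return fig


# Example call (downloads model weights and needs the LaTeX toolchain):
# generate_tikz("A decision tree with three levels.")
```

Checking that the program rasterizes before saving reflects the point made under Limitations: some generated programs will not compile, and a failed sample is a cue to resample or switch to MCTS-based refinement.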

Note: A complete TeX Live 2023 installation, along with Ghostscript and Poppler, is required to compile and render TikZ outputs.
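
A quick pre-flight check can catch a missing toolchain before any model is downloaded. The sketch below only looks for executables on the PATH. The executable names (`pdflatex`, `gs`, `pdftoppm`) are assumptions about a typical Unix-like installation, and the function `check_toolchain` is hypothetical. On Windows, Ghostscript is usually installed under a different name such as `gswin64c`.

```python
import shutil
from typing import Dict

# ASSUMPTION: typical executable names for each required dependency.
REQUIRED_TOOLS = {
    "TeX Live": "pdflatex",
    "Ghostscript": "gs",
    "Poppler": "pdftoppm",
}


def check_toolchain() -> Dict[str, bool]:
    """Report, per dependency, whether its executable is on the PATH."""
    return {
        name: shutil.which(exe) is not None
        for name, exe in REQUIRED_TOOLS.items()
    }


for name, found in check_toolchain().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

A passing check shows only that the tools are installed, not that the TeX Live installation is complete or that it is the 2023 release.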

Limitations and Practical Notes

Compilation is not guaranteed: Some generated TikZ programs fail to compile, as is common in code synthesis. MCTS raises the success rate but does not eliminate failures.
Infrastructure requirements: Users must install LaTeX dependencies, which may be nontrivial on some systems.
Training data constraints: Due to arXiv licensing, the public DaTikZ datasets exclude some original figures, though scripts are provided to recreate the full datasets.
No web UI for text mode (yet): Text-guided generation requires coding; the visual UI only supports image/sketch input.

These are trade-offs for a system that prioritizes editability and precision over plug-and-play simplicity.

Summary

TikZero addresses a real pain point in scientific communication: turning textual descriptions into precise, editable graphics without requiring paired caption–TikZ training data. By decoupling language understanding from program synthesis through visual intermediaries, it achieves strong zero-shot performance while outputting native TikZ code, which supports reproducibility, scalability, and LaTeX integration. For researchers, educators, and technical writers who depend on vector-quality figures, TikZero offers a practical, open alternative to raster-only AI tools.