Paper2Poster: Automate Scientific Poster Creation from PDFs—Editable, Accurate, and Under $0.01

Paper & Code

Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

2025 • Paper2Poster/Paper2Poster

Creating professional academic posters from dense, multi-page scientific papers is a universal pain point for researchers, PhD students, and lab teams. The task demands not only deep content understanding but also visual design skills, spatial reasoning, and ruthless prioritization—all under tight conference deadlines. Most researchers lack formal design training, leading to cluttered layouts, inconsistent formatting, and posters that fail to convey core insights effectively.

Paper2Poster directly addresses this challenge by offering a fully automated, end-to-end solution: it takes a raw paper.pdf as input and outputs a fully editable .pptx poster that balances visual clarity, textual coherence, and scientific fidelity. Built on a novel multimodal multi-agent architecture called PosterAgent, the system goes beyond simple summarization or static templating. Instead, it intelligently parses, plans, and refines content using visual-in-the-loop feedback, ensuring the final output is both accurate and aesthetically balanced.

Paper2Poster is not just a generator; it is also a rigorously evaluated framework. It introduces the first comprehensive benchmark for scientific poster automation, complete with automated metrics like PaperQuiz (testing knowledge retention) and VLM-as-Judge (assessing layout quality and information design). And perhaps most compelling for budget-conscious labs: open-source variants powered by models like Qwen-2.5 outperform GPT-4o-based pipelines across nearly all metrics while using 87% fewer tokens and costing just $0.005 per poster.

From Paper to Editable Poster in One Click

Unlike tools that output static images or require manual copy-paste into design software, Paper2Poster generates a real PowerPoint (.pptx) file. This means users can immediately open the result in Microsoft PowerPoint or LibreOffice Impress and tweak colors, fonts, positions, or content—without losing design integrity or starting from scratch.

The input is simple: a folder containing a single paper.pdf. The output? A professionally structured poster at customizable dimensions (e.g., 48×36 inches), complete with title, authors, abstract, methodology, results, and conclusions—each section intelligently compressed and visually arranged. This eliminates hours of formatting work while preserving the paper’s logical flow and scientific contribution.

How PosterAgent Works: A Visual-In-the-Loop Multi-Agent Pipeline

Paper2Poster’s intelligence lies in its PosterAgent system—a top-down, three-stage pipeline that mimics how expert designers think:

Parser: Reads the full paper (often 20+ pages) and distills it into a structured asset library—extracting text, figures, tables, and captions while preserving semantic relationships.
Planner: Constructs a binary-tree layout that enforces reading order (top-left to bottom-right) and spatial balance. This ensures panels don’t crowd each other and that visual weight is distributed harmoniously.
Painter-Commenter Loop: Renders each panel as editable PowerPoint shapes, then uses a Vision-Language Model (VLM) to inspect the rendered result. If text overflows, figures are misaligned, or the panel's meaning drifts from the source, the system revises the rendering code for that panel and re-renders, repeating until the text and visuals align.

This closed-loop refinement is what separates Paper2Poster from naive template-fillers: it doesn’t just place content—it validates and corrects it using multimodal understanding.
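
To make the Planner's binary-tree idea concrete, here is a minimal sketch of one way such a layout could be computed. It is not PosterAgent's actual algorithm: the Node class, the per-panel weights, and the rule of alternating vertical and horizontal splits are all assumptions chosen for illustration. Each internal node splits its rectangle between two children in proportion to their weights, and the first child always takes the left or top part so that reading order is preserved.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (left, top, width, height) in inches


@dataclass
class Node:
    """Leaf: a named panel with a weight. Internal node: two children and a split."""
    name: Optional[str] = None
    weight: float = 0.0
    split: str = "v"  # "v" = left | right, "h" = top / bottom (hypothetical encoding)
    first: Optional["Node"] = None
    second: Optional["Node"] = None


def build_tree(panels: List[Tuple[str, float]], depth: int = 0) -> Node:
    """Recursively halve the ordered panel list, alternating the split direction."""
    if len(panels) == 1:
        name, weight = panels[0]
        return Node(name=name, weight=weight)
    mid = len(panels) // 2
    first = build_tree(panels[:mid], depth + 1)
    second = build_tree(panels[mid:], depth + 1)
    return Node(weight=first.weight + second.weight,
                split="v" if depth % 2 == 0 else "h",
                first=first, second=second)


def assign_boxes(node: Node, rect: Rect, out: Dict[str, Rect]) -> Dict[str, Rect]:
    """Give each child a share of rect proportional to its weight."""
    left, top, width, height = rect
    if node.name is not None:
        out[node.name] = rect
        return out
    share = node.first.weight / node.weight
    if node.split == "v":
        assign_boxes(node.first, (left, top, width * share, height), out)
        assign_boxes(node.second,
                     (left + width * share, top, width * (1 - share), height), out)
    else:
        assign_boxes(node.first, (left, top, width, height * share), out)
        assign_boxes(node.second,
                     (left, top + height * share, width, height * (1 - share)), out)
    return out


# Illustrative weights only: a heavier panel gets more area.
panels = [("Abstract", 1.0), ("Method", 2.0), ("Results", 2.0), ("Conclusion", 1.0)]
boxes = assign_boxes(build_tree(panels), (0.0, 0.0, 48.0, 36.0), {})
for name, box in boxes.items():
    print(name, [round(v, 2) for v in box])
```

Because every split partitions its parent rectangle exactly, panels cannot overlap and the leaves always tile the full canvas, which is the property that keeps panels from crowding each other.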

Built-In Evaluation: Beyond “Looks Nice”

Many generative tools produce visually appealing but content-poor outputs. Paper2Poster combats this with four evaluation pillars:

PaperQuiz: Automatically generates quiz questions from the original paper and tests whether a VLM can answer them correctly using only the poster. This measures knowledge preservation.
VLM-as-Judge: A fine-tuned VLM scores posters on six criteria: layout balance, information hierarchy, visual-text alignment, readability, conciseness, and engagement.
Textual Coherence: Evaluates language fluency and logical flow using perplexity and semantic similarity metrics.
Visual Quality: Compares structural and color similarity against human-designed reference posters.

In evaluations, even GPT-4o-generated posters—while visually polished—often score poorly on PaperQuiz, revealing gaps in content fidelity. Paper2Poster’s open-source versions, by contrast, excel at retaining core paper knowledge without sacrificing design.
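
The PaperQuiz idea reduces to a simple accuracy computation, sketched below under stated assumptions. The QuizItem structure, the quiz_accuracy function, and the toy keyword_reader are hypothetical names invented for this example; in the real benchmark a VLM reads the poster, whereas here a trivial string matcher stands in for it so the snippet runs without any model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QuizItem:
    question: str
    options: List[str]
    answer: str  # label of the correct option, e.g. "B"


def quiz_accuracy(items: List[QuizItem], poster_text: str,
                  reader: Callable[[str, QuizItem], str]) -> float:
    """Fraction of questions the reader answers correctly from the poster alone."""
    if not items:
        return 0.0
    correct = sum(1 for item in items if reader(poster_text, item) == item.answer)
    return correct / len(items)


def keyword_reader(poster_text: str, item: QuizItem) -> str:
    """Toy stand-in for a VLM: pick the first option whose text appears on the poster."""
    lowered = poster_text.lower()
    for label, option in zip("ABCD", item.options):
        if option.lower() in lowered:
            return label
    return "A"  # arbitrary guess when nothing on the poster matches


poster = "PosterAgent uses a binary-tree layout and outputs an editable pptx file."
items = [
    QuizItem("Which layout structure is used?",
             ["grid", "binary-tree", "free-form", "stack"], "B"),
    QuizItem("Which file format is produced?", ["pdf", "png", "pptx", "svg"], "C"),
    QuizItem("How many fewer tokens are used?", ["10%", "50%", "87%", "99%"], "C"),
]
score = quiz_accuracy(items, poster, keyword_reader)
print(f"PaperQuiz-style accuracy: {score:.2f}")
```

The third question is missed because the poster text omits the token figure, which illustrates how a polished poster can still lose quiz points when it drops content from the paper.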

Cost, Performance, and Accessibility

Running expensive proprietary models for every poster isn’t scalable. Paper2Poster solves this with flexible model backends:

Use GPT-4o for maximum ease (API-only).
Combine Qwen-2.5-7B-Instruct (LLM) with GPT-4o (VLM) for cost efficiency.
Go fully local with Qwen-2.5-7B-Instruct + Qwen-VL via vLLM—ideal for offline or privacy-sensitive environments.

The fully open-source pipeline uses 87% fewer tokens than GPT-4o-only systems and costs approximately $0.005 per poster. This makes large-scale poster generation feasible even for underfunded research groups.
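
As a back-of-the-envelope check on what these figures mean at scale, the arithmetic can be written out as follows. Only the two constants come from the article; the function names and the batch size of 200 are illustrative assumptions, and real costs will vary with paper length and model pricing.

```python
COST_PER_POSTER_USD = 0.005  # approximate figure quoted above
TOKEN_REDUCTION = 0.87       # "87% fewer tokens" than a GPT-4o-only pipeline


def batch_cost(num_posters: int, cost_per_poster: float = COST_PER_POSTER_USD) -> float:
    """Estimated total cost in USD for a batch of posters."""
    return num_posters * cost_per_poster


def relative_token_usage(reduction: float = TOKEN_REDUCTION) -> float:
    """Share of the baseline's tokens that remains after the reduction."""
    return 1.0 - reduction


print(f"200 posters: ${batch_cost(200):.2f}")
print(f"Token usage vs. baseline: {relative_token_usage():.0%}")
```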

Deployment and Customization Made Easy

Paper2Poster supports multiple deployment modes:

Local installation with Python and vLLM for open-source models.
Docker container for dependency-free execution.
Cloud API mode using OpenAI keys for rapid prototyping.

Customization is handled via YAML configuration:

Global defaults in config/poster.yaml apply to all posters.
Per-paper overrides let you place a poster.yaml next to paper.pdf for unique styling (fonts, margins, color schemes).
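
The override behavior described above can be pictured as a recursive merge, sketched here with plain dictionaries standing in for the parsed YAML files. The keys (fonts, margin_in, colors) and the merge_config helper are hypothetical; the project's actual schema and merge rules may differ.

```python
from copy import deepcopy


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Return defaults with overrides applied; nested dicts are merged key by key."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Stand-in for config/poster.yaml (illustrative keys, not the real schema).
global_defaults = {
    "fonts": {"title": 80, "body": 32},
    "margin_in": 1.0,
    "colors": {"accent": "#003366"},
}
# Stand-in for a poster.yaml placed next to paper.pdf.
per_paper = {"fonts": {"title": 96}, "colors": {"accent": "#8C1515"}}

config = merge_config(global_defaults, per_paper)
print(config)
```

Merging key by key means a per-paper file only needs to list the settings it changes; everything else falls through to the global defaults.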

Additionally, the system automatically fetches institutional and conference logos:

It first checks a local logo_store/.
If missing, it searches the web via DuckDuckGo (no API needed) or Google Custom Search (with API keys).
You can also provide custom logo paths to bypass auto-detection entirely.
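
The lookup order above amounts to a fallback chain, which the following sketch illustrates. The resolve_logo function, the file extensions it tries, and the web_search callable are assumptions made for this example; the real tool's search backends are replaced by a stub so the snippet runs offline.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional


def resolve_logo(name: str,
                 store: Path,
                 web_search: Callable[[str], Optional[Path]],
                 custom_path: Optional[Path] = None) -> Optional[Path]:
    """Custom path first, then the local store, then a web search fallback."""
    if custom_path is not None:
        return custom_path  # user-supplied logo bypasses auto-detection
    for ext in (".png", ".jpg", ".svg"):  # hypothetical set of extensions
        candidate = store / f"{name}{ext}"
        if candidate.exists():
            return candidate
    return web_search(name)  # may return None if nothing is found


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    (store / "mit.png").write_bytes(b"")  # pretend one logo is already cached

    found_local = resolve_logo("mit", store, web_search=lambda name: None)
    found_web = resolve_logo(
        "neurips", store,
        web_search=lambda name: Path("downloaded") / f"{name}.png")
    found_custom = resolve_logo("mit", store, web_search=lambda name: None,
                                custom_path=Path("my_logo.png"))

print(found_local.name, found_web, found_custom)
```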

Who Should Use Paper2Poster—and When

This tool shines in high-throughput, time-sensitive academic scenarios:

Conference submissions: Generate consistent, professional posters for multiple papers in a lab.
Grant applications: Quickly produce visual summaries for review panels.
Teaching and demos: Convert student papers or research prototypes into presentation-ready formats.
Multilingual labs: Automate layout so non-native speakers can focus on content, not design.

It’s ideal for users who value accuracy, speed, and editability over highly artistic, hand-crafted designs.

Limitations and Realistic Expectations

While Paper2Poster significantly reduces manual effort, it’s not a magic bullet:

Reader engagement remains a subtle bottleneck. Human designers often use creative visuals (icons, infographics, custom diagrams) that current AI systems can’t replicate. Paper2Poster prioritizes fidelity over flair.
Local open-source VLMs require a GPU and vLLM setup, which may be a barrier for some users.
Logo auto-detection depends on internet access and search engine reliability—offline use requires pre-downloaded assets.

That said, for the vast majority of academic use cases—where clarity, correctness, and consistency matter most—Paper2Poster delivers exceptional value with minimal overhead.

Summary

Paper2Poster transforms a tedious, skill-intensive task into an automated, reliable workflow. By combining structured parsing, intelligent layout planning, and vision-guided refinement, it generates editable PowerPoint posters that faithfully represent scientific papers—both in content and form. Backed by a rigorous evaluation framework and optimized for low cost and open-source accessibility, it’s a practical, production-ready solution for researchers drowning in deadlines but unwilling to compromise on quality.