Science-Parse / SPv2 (AllenAI)
Parses scientific papers into structured fields: title, authors, sections, and references
- Repository: github.com/allenai/science-parse
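As a rough illustration of what "structured fields" could look like downstream, here is a minimal Python sketch that loads a parsed-paper record into typed objects. The key names (`title`, `authors`, `sections`, `references`, `heading`, `text`), the `ParsedPaper` and `Section` classes, and the `paper_from_dict` helper are all assumptions made for this example; they are not taken from Science-Parse's documented output schema, so check the repository for the real field names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Section:
    heading: str
    text: str


@dataclass
class ParsedPaper:
    # Hypothetical container mirroring the four fields named in the description.
    title: str
    authors: List[str] = field(default_factory=list)
    sections: List[Section] = field(default_factory=list)
    references: List[str] = field(default_factory=list)


def paper_from_dict(data: dict) -> ParsedPaper:
    """Build a ParsedPaper from a dict, treating missing or null keys as empty."""
    sections = [
        Section(heading=s.get("heading") or "", text=s.get("text") or "")
        for s in data.get("sections") or []
    ]
    return ParsedPaper(
        title=data.get("title") or "",
        authors=list(data.get("authors") or []),
        sections=sections,
        references=list(data.get("references") or []),
    )


# Hand-written sample record for illustration; not real parser output.
sample = {
    "title": "An Example Paper",
    "authors": ["A. Author", "B. Author"],
    "sections": [
        {"heading": "Introduction", "text": "Opening paragraph."},
        {"heading": None, "text": "Untitled section body."},
    ],
    "references": ["A cited work"],
}

paper = paper_from_dict(sample)
print(paper.title)
print([s.heading for s in paper.sections])
```

Treating missing or null keys as empty values keeps the loader usable on papers where a parser could not recover a given field, which is common for section headings and reference lists.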
Source attribution
- Awesome AI for Science — github.com/allenai/science-parse
Related resources
- State-of-the-art multimodal document parsing with a 1.2B-parameter model, reported to outperform GPT-4o; converts PDFs to LLM-ready Markdown/JSON
- Comprehensive toolkit for high-quality PDF content extraction, with layout detection, formula recognition, and OCR
- Neural optical understanding for academic documents; transforms scientific PDFs to Markdown with support for mathematical formulas
- Toolkit for linearizing academic PDFs into LLM-ready text with high accuracy and preserved structure, optimized for extracting scientific literature
- Advanced OCR with PP-StructureV3 document parsing and a reported 13% accuracy improvement; supports 80+ languages
- Production-grade ETL for transforming complex documents into structured formats, with an open-source API