相关标签
htmlmarkdownpdfaiconvertxlsxpdf-converterdocxdocumentspptx

Here are 108 public repositories matching this topic...

Convert documents to structured data effortlessly. Unstructured is open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise grade Platform product for production grade workflows, partitioning, enrichments, chunking and embedding.

  • Updated Jun 4, 2026
  • HTML

Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pipelines (GenAI, LLM, VLLM) into your applications, supporting various tasks such as document cleanup, optical character recognition (OCR), classification, splitting, named entity recognition, and form processing

  • Updated Jun 4, 2026
  • Python