Extract multiple FHIR resources from a document
/lang2fhir/document/multi
Extracts text from a document (PDF or image) and converts it into multiple FHIR resources, returned as a transaction Bundle. Combines document text extraction with multi-resource detection. Automatically detects Patient, Condition, MedicationRequest, Observation, and other resource types. Resources are linked with proper references (e.g., Conditions reference the Patient).
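A minimal sketch of calling this endpoint from Python using only the standard library. The base URL, the bearer-token `Authorization` header, and the use of POST are assumptions (this page documents only the path and the body parameters), and `"R4"` as a `version` value is illustrative. `build_request` and `extract_bundle` are hypothetical helper names.

```python
import base64
import json
import urllib.request

# Hypothetical values: the base URL and bearer-token auth are assumptions,
# not documented on this page.
BASE_URL = "https://api.example.com"
ENDPOINT = "/lang2fhir/document/multi"


def build_request(file_bytes: bytes, version: str, token: str,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON request for the multi-resource document endpoint."""
    body = {
        "version": version,  # FHIR version to use
        # The endpoint expects the file as base64-encoded text.
        "content": base64.b64encode(file_bytes).decode("ascii"),
    }
    return urllib.request.Request(
        url=base_url + ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",  # assumed HTTP method
    )


def extract_bundle(file_bytes: bytes, version: str, token: str) -> dict:
    """Send the request and return the decoded JSON response."""
    req = build_request(file_bytes, version, token)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

`build_request` is kept separate from the network call so the outgoing request can be inspected or logged before anything is sent.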
Patient identifier handling. US Core requires Patient.identifier (a business identifier such as an MRN). When the source text contains an identifier, it is extracted with an appropriate URI system. When the source text does not contain a detectable identifier, a synthetic one is generated with system: "urn:phenoml:lang2fhir-generated-id" and a UUID value so the bundle remains FHIR-valid and US Core conformant. Callers who need a tenant-specific namespace should rewrite the synthetic system after extraction.
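The post-extraction rewrite mentioned above can be sketched as follows, assuming Patient resources sit under `entry[].resource` as in a standard FHIR Bundle. `rewrite_synthetic_ids` and the tenant system URI used with it are hypothetical; only identifiers carrying the synthetic system are touched, so identifiers extracted from the source text keep their original system.

```python
import copy

SYNTHETIC_SYSTEM = "urn:phenoml:lang2fhir-generated-id"


def rewrite_synthetic_ids(bundle: dict, tenant_system: str) -> dict:
    """Return a copy of the bundle with synthetic Patient identifier systems replaced.

    The generated UUID value is left unchanged; only the system URI is swapped.
    """
    out = copy.deepcopy(bundle)  # leave the caller's bundle untouched
    for entry in out.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") != "Patient":
            continue
        for identifier in resource.get("identifier", []):
            if identifier.get("system") == SYNTHETIC_SYSTEM:
                identifier["system"] = tenant_system
    return out
```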
Body parameters
- version (string, required): FHIR version to use.
- content (string, required): Base64-encoded file content. Supported file types: PDF (application/pdf), PNG (image/png), JPEG (image/jpeg). The file type is auto-detected from the content's magic bytes.
- provider (string, optional): FHIR provider name, used to apply provider-specific profiles.
- implementation_guide (string, optional): Custom Implementation Guide name. When specified, profiles from this IG are included alongside US Core profiles during resource detection. US Core is always the base layer; custom IG profiles are additive.
- detection_effort (string, optional, default: standard; allowed values: standard, deep): Detection effort. 'standard' runs detection once; 'deep' runs detection multiple times for higher recall.
- validation_method (string, optional, default: none; allowed values: none, check, fix): FHIR validation method to apply to the generated bundle. 'none' skips validation (default). 'check' runs the bundle through a FHIR structure validator and includes the results in the response. 'fix' runs validation and attempts to auto-correct errors using an LLM (up to 3 validation passes); the response includes results from each pass. Warning: 'fix' can significantly increase latency due to multiple LLM and validation round-trips.
- config (object, optional): Optional processing configuration shared across document endpoints.
  - page_filter (object, optional): Configures per-page pre-extraction filtering. When set, each page of text extracted from the document is classified by an LLM, and pages classified as irrelevant to the supplied context are dropped before FHIR extraction.
    - context (string, required): Natural-language description of what IS relevant to the extraction goal. Pages that do not match are dropped from downstream FHIR extraction.
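The sketch below assembles a request body from these parameters and rejects invalid values before upload. `build_body` and `sniff_type` are hypothetical helpers. The signatures in `MAGIC_PREFIXES` are the standard ones for PDF, PNG, and JPEG, but the service's own detection rules are not documented here, so the client-side check is a convenience rather than a guarantee that the server will accept the file. It also assumes `page_filter` is nested inside `config`, as the parameter list indicates.

```python
import base64
from typing import Optional

# Standard file signatures for the three supported types.
MAGIC_PREFIXES = {
    b"%PDF": "application/pdf",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
}
DETECTION_EFFORTS = {"standard", "deep"}
VALIDATION_METHODS = {"none", "check", "fix"}


def sniff_type(data: bytes) -> str:
    """Client-side pre-check mirroring magic-byte detection."""
    for prefix, mime in MAGIC_PREFIXES.items():
        if data.startswith(prefix):
            return mime
    raise ValueError("unsupported file type: expected PDF, PNG, or JPEG")


def build_body(data: bytes, version: str, *,
               provider: Optional[str] = None,
               implementation_guide: Optional[str] = None,
               detection_effort: str = "standard",
               validation_method: str = "none",
               page_filter_context: Optional[str] = None) -> dict:
    sniff_type(data)  # fail fast before uploading an unsupported file
    if detection_effort not in DETECTION_EFFORTS:
        raise ValueError(f"detection_effort must be one of {sorted(DETECTION_EFFORTS)}")
    if validation_method not in VALIDATION_METHODS:
        raise ValueError(f"validation_method must be one of {sorted(VALIDATION_METHODS)}")
    body = {
        "version": version,
        "content": base64.b64encode(data).decode("ascii"),
        "detection_effort": detection_effort,
        "validation_method": validation_method,
    }
    # Optional fields are omitted rather than sent as null.
    if provider is not None:
        body["provider"] = provider
    if implementation_guide is not None:
        body["implementation_guide"] = implementation_guide
    if page_filter_context is not None:
        # context is required whenever page_filter is present.
        body["config"] = {"page_filter": {"context": page_filter_context}}
    return body
```

Because 'fix' can add several LLM and validation round-trips, a caller might default to 'check' during development and reserve 'fix' for batch jobs where latency matters less.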
Successfully extracted FHIR resources from document
Response fields
- success (boolean, optional)
- message (string, optional)
- bundle (object, optional)
  - resourceType (string, optional)
  - type (string, optional)
  - entry (array<object>, optional)
    - fullUrl (string, optional)
    - resource (object, optional)
    - request (object, optional)
      - method (string, optional)
      - url (string, optional)
- resources (array<object>, optional)
  - tempId (string, optional)
  - resourceType (string, optional)
  - description (string, optional)
- originalText (string, optional)
- validation (object, optional)
  - passes (array<object>, optional)
    - issues (array<object>, optional)
      - severity (string, optional)
      - code (string, optional)
      - diagnostics (string, optional)
      - expression (array<string>, optional)
      - source (string, optional)
    - stats (object, optional)
      - resource_type (string, optional)
      - profile_url (string, optional)
      - is_custom_profile (boolean, optional)
      - duration_ms (number, optional)
  - fixed (boolean, optional)
  - attempts (integer, optional)
  - summary (string, optional)
- page_classifications (array<object>, optional)
  - page_number (integer, optional)
  - include (boolean, optional)
  - reason (string, optional)
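A sketch of condensing a response into resource counts, remaining validation errors, and dropped pages. It assumes issue `severity` values follow FHIR's OperationOutcome issue severities (`fatal`, `error`, `warning`, `information`), that the last entry in `validation.passes` reflects the final bundle, and that `page_classifications` is populated when `config.page_filter` was set; none of these is stated on this page. `summarize` is a hypothetical helper, and every field is read defensively because all response fields are optional.

```python
from collections import Counter


def summarize(response: dict) -> dict:
    """Condense a multi-resource extraction response for logging or review."""
    bundle = response.get("bundle") or {}
    entries = bundle.get("entry") or []
    counts = Counter((e.get("resource") or {}).get("resourceType") for e in entries)

    # Assumption: the last validation pass describes the final bundle.
    passes = (response.get("validation") or {}).get("passes") or []
    final_issues = (passes[-1].get("issues") or []) if passes else []
    errors = [i for i in final_issues if i.get("severity") in ("error", "fatal")]

    # Pages the page filter classified as irrelevant (include == False).
    dropped = [p.get("page_number")
               for p in response.get("page_classifications") or []
               if p.get("include") is False]

    return {
        "resource_counts": dict(counts),
        "final_errors": [i.get("diagnostics") for i in errors],
        "dropped_pages": dropped,
    }
```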