The Document Center accepts several file formats for processing, with the following limitations:
- For data extraction, the API accepts PDF, JPG, PNG and TIFF files.
- The maximum file size is 10 MB.
- The maximum decompressed image size is 178 megapixels.
- Documents are stored on the Document Center for at least 30 days after processing has finished. After 30 days, the files may be removed.
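The limits above can be checked on the client before a file is submitted. The following is a minimal sketch, not part of the Document Center API: the function name `check_limits` is hypothetical, "10 MB" is read conservatively as 10,000,000 bytes, "178 megapixels" as 178,000,000 pixels, and the `.jpeg` and `.tif` extension variants are assumed to be accepted alongside `.jpg` and `.tiff`.

```python
import os

# Limits taken from the list above; the exact byte and pixel interpretation is an assumption.
ALLOWED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".tif", ".tiff"}
MAX_FILE_BYTES = 10 * 1000 * 1000  # "10 MB", read conservatively as decimal megabytes
MAX_PIXELS = 178 * 1000 * 1000     # "178 megapixels"


def check_limits(filename, size_bytes, width=None, height=None):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    # Pixel dimensions are optional because reading them requires decoding the image header.
    if width is not None and height is not None and width * height > MAX_PIXELS:
        problems.append(f"image too large: {width * height} pixels")
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every violated limit at once.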
Processing and converting files
The following procedure is applied to every incoming file:
- If possible, the background is cropped and the image is de-skewed.
- If more than one file is provided, they are merged as separate pages into one document.
- Unless a native PDF is provided, OCR is performed. OCR is also performed on scanned PDFs that already contain OCR text.
- Any file that is not already a PDF is converted to PDF.
- All PDFs are made searchable using the OCR text and saved as PDF/A (archivable PDF).
- Depending on the agreement in place, PDFs are signed with either a self-signed certificate or a validated certificate.
We use the results of the conversion process for further analysis and data extraction.
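The rules in the list above can be summarised as a small decision sketch. It only models which steps apply to a given upload, in the order the list gives them, which may not match the service's internal order. The names `IncomingFile`, `needs_ocr`, `needs_pdf_conversion` and `plan` are hypothetical and not part of the API.

```python
from dataclasses import dataclass


@dataclass
class IncomingFile:
    name: str
    kind: str                    # "pdf", "jpg", "png" or "tiff"
    is_native_pdf: bool = False  # True for PDFs generated digitally rather than scanned


def needs_ocr(file):
    # OCR is skipped only for native PDFs; scanned PDFs are OCRed again
    # even if they already contain OCR text.
    return not (file.kind == "pdf" and file.is_native_pdf)


def needs_pdf_conversion(file):
    return file.kind != "pdf"


def plan(files):
    """Return the steps that apply to one upload, following the list above."""
    steps = ["crop background and de-skew (if possible)"]
    if len(files) > 1:
        steps.append(f"merge {len(files)} files into one document")
    if any(needs_ocr(f) for f in files):
        steps.append("run OCR")
    if any(needs_pdf_conversion(f) for f in files):
        steps.append("convert to PDF")
    steps.append("make searchable and save as PDF/A")
    steps.append("sign PDF")
    return steps
```

For example, `plan([IncomingFile("a.jpg", "jpg")])` includes both the OCR and conversion steps, while a single native PDF skips both.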
Make use of the converted PDFs
You can also use these files in place of the originals you provided. This comes with a couple of benefits:
- You get a PDF no matter what you have sent.
- The PDF is searchable.
- The PDF is of type PDF/A and digitally signed.
The resulting PDF files are available via the API; whether you retrieve them is up to you.
To have documents processed, you need to provide them to our API as publicly reachable URLs. You can protect these URLs with, for example, temporary URLs or basic authentication.
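One common way to implement temporary URLs on your own file server is to append an expiry time and an HMAC signature to each link. The sketch below illustrates that idea only. The query parameter names `expires` and `signature`, the helper names, and the host `files.example.com` are assumptions for illustration; the Document Center does not prescribe this scheme, and your server has to perform the validation itself.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlsplit


def _sign(path, expires, secret):
    # The signature covers the URL path and the expiry time.
    message = f"{path}:{expires}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def make_temporary_url(base_url, secret, ttl_seconds=900, now=None):
    """Append an expiry timestamp and signature; base_url must have no query string."""
    expires = int(now if now is not None else time.time()) + ttl_seconds
    signature = _sign(urlsplit(base_url).path, expires, secret)
    return f"{base_url}?{urlencode({'expires': expires, 'signature': signature})}"


def is_valid(url, secret, now=None):
    """Server-side check: the signature matches and the link has not expired."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    try:
        expires = int(query["expires"][0])
        signature = query["signature"][0]
    except (KeyError, ValueError):
        return False
    expected = _sign(parts.path, expires, secret)
    current = now if now is not None else time.time()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(signature.encode(), expected.encode()) and current <= expires
```

A link created with `make_temporary_url("https://files.example.com/docs/invoice.pdf", secret)` can then be passed to the API as the document URL; once the expiry time has passed, `is_valid` returns `False` and the server can refuse the request.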