[WD17] - Extracting Text from a PDF file
Started by guest, 13 Oct. 2015 06:43 - 6 replies
Posted on 13 October 2015 - 06:43
In an application I am building, I have to provide a facility to import data from a PDF.

I checked the WD docs. There is a facility to load and show a PDF, but I could not find anything that would let me extract the text content of a PDF in a well-formatted manner.

Any ideas as to how I can extract the content of a PDF file?

TIA

YogiYang
Posted on 13 October 2015 - 08:49
Hi, extracting text from a PDF together with its exact position depends on a lot of unknown factors, such as the version of the PDF converter that was used. The best approach is to scan the PDF (or put it into a picture control if it is already scanned) and use an OCR program / library (BCL?) to read the text. However, the x/y position of the text within the document is then lost.
Posted on 13 October 2015 - 09:21
Hi,

In WD20 there is a PDFToText function, but I don't know whether it is already in WD17.

See the help for the PDFToText function.

Regards,

Joris
Posted on 13 October 2015 - 09:55
Thanks everyone for the suggestions.

I wanted to use features offered by WD, but it seems there is no facility for this in WD17, so I will have to resort to using an ActiveX control instead.

TIA

Yogi Yang
Posted on 13 October 2015 - 12:28
Hi,

PDFToText is available from version 14 (see the bottom of the help page for the PDFToText function).
Posted on 13 October 2015 - 13:22
You're right!
Posted on 13 October 2015 - 16:52
Thanks. I checked on the site, but it seems to state this for WD20. Or I probably misread it!