Extract text from PDF
Started by guest, 31 Mar. 2011 18:42 - 7 replies
Posted on 31 March 2011 - 18:42
Hi,
I need to extract the text from a PDF file.
AFAIK there are two types of PDF files: image and text.
The text-PDF is readable (you can for instance select and copy text from it).
Does anyone know how I can do this?
I have already found some libraries on the internet, but maybe someone has done this before in WD?
Posted on 31 March 2011 - 18:42
Hi Arie,
In WD16, I suppose you can use PDFToText() for text PDFs. Maybe also in WD15?
Posted on 1 April 2011 - 01:27
Hi Arie,
There is a PDFToText function in WinDEV and WebDEV (starting from v14).
1) It won't work with image-type PDFs.
For those, you must use an OCR library to translate the graphics to text.
2) It doesn't work with Unicode.
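
A minimal sketch of such a call in WLanguage, for illustration only. The file path and variable names are hypothetical, and it assumes the basic syntax PDFToText(<PDF file>), which returns the extracted text as a string. An empty result is treated here as either a failed extraction or an image-type PDF with no text layer; that interpretation is an assumption, so check ErrorInfo() for the actual cause.

```wlanguage
// Hypothetical path, used only for this example
sPDFFile is string = "C:\Temp\document.pdf"

// Extract the text of the PDF (assumed basic syntax: PDFToText(<PDF file>))
sText is string = PDFToText(sPDFFile)

IF sText = "" THEN
	// Nothing came back: either the call failed or the PDF holds only images.
	// An image-type PDF would need an OCR library instead.
	Error("No text could be extracted." + CR + ErrorInfo())
ELSE
	// Show the extracted text
	Info(sText)
END
```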

Steven Sitas
Posted on 1 April 2011 - 01:27
What about that - 10x faster again
Thanks
Posted on 1 April 2011 - 01:27
Hi Arie,
although WinDEV is a great tool, sometimes its limitations drive me crazy :)
There are many issues with international support (like Greek).
Did you know Java generation won't work with Greek (or any other non-Western language)?
Or that Java doesn't support Unicode with HF?
And probably the new Linux apps (in V16) won't support non-Western languages?
They are really close to a PERFECT TOOL; they just need to give a little more time to non-Western language issues ...
Steven Sitas
Posted on 1 April 2011 - 01:28
Limitations yes.
The very first PDF I try gives an error "invalid page number in de PDF" :-(
Posted on 2 April 2011 - 02:43
It is only valid if you don't speak English.
Posted on 15 December 2015 - 14:45
You can try this free online PDF-to-text converter, http://www.online-code.net/pdf-to-word.html, to convert PDF to TXT online.