PC SOFT

PROFESSIONAL FORUMS
WINDEV, WEBDEV and WINDEV Mobile

Read text from pdf.
Started by guest, May 16, 2012 08:34 - 1 reply
Posted on May 16, 2012 - 08:34
Hi.

I would like to extract text from a PDF file.
There is a table inside the PDF, and when I use pdftotext, the text is extracted but in the wrong order: it reads all the cells of column 1, then all the cells of column 2, and so on. I cannot manage to reorder it, because pdftotext doesn't even put a CR (carriage return) character after each row. So after pdftotext I only get one very long string without any CR.
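A minimal sketch of one thing worth trying, assuming the pdftotext in question is the Poppler command-line tool: its `-layout` option asks it to keep the original physical layout, so each table row should come out on its own line instead of column by column. Whether that works for this particular PDF is not confirmed in the thread. The function names below are hypothetical, and splitting cells on runs of two or more spaces is a heuristic assumption (it breaks if a cell itself contains double spaces or if columns are only one space apart).

```python
import re
import subprocess


def pdf_to_layout_text(pdf_path: str) -> str:
    """Run Poppler's pdftotext with -layout; the output file '-' means stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout


def layout_text_to_rows(text: str) -> list[list[str]]:
    """Split layout-preserved text into table rows.

    Heuristic (assumption): cells in the same row are separated by two or
    more spaces; blank lines (and page breaks) are skipped.
    """
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        rows.append(re.split(r" {2,}", stripped))
    return rows
```

Usage: `layout_text_to_rows(pdf_to_layout_text("table.pdf"))` returns one list of cell strings per row, which can then be joined with a tab or CR as needed.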

If I convert the file using Adobe Acrobat (File > Save As > TXT), the conversion is correct.


I read in the French forum (with Google Translate) about "ABBYY PDF Transformer". It's a DLL that I can use in WINDEV, but it costs $1600.

Thanks.
Posted on October 24, 2015 - 17:43
I think you need a good OCR component to read text from the PDF. You could also try the free trial packages of some third-party PDF converters: http://www.pqscan.com/convert-pdf/
I hope you succeed. Good luck.