Started by guest, 02 Sep 2017 14:20 - 1 reply
Posted on 02 September 2017 - 14:20
Hi
PDFToText works well on a one-page PDF, but it does not work for me on a two-page scanned PDF.
The PDFToText help says that specific pages can be converted, e.g. PDFToText(sfilename,"1") or "2", etc.
But it isn't working. Am I doing something wrong?
Thanks in advance.
Ericus Steyn
Posted on 02 September 2017 - 14:35
Hi Ericus,
you are not doing anything wrong per se, but in this case there is no text in the PDF to extract.
A PDF can contain text just like a text file, and that is what you get when you print to PDF or save as PDF from Word or LibreOffice.
However, when you SCAN to PDF, NO OCR is performed, so each page of the PDF contains only an image. PDFToText does not perform any OCR either; it just extracts the text content, if there is any.
That's why PDFToText will not give you any result on a scanned PDF. You will need to run OCR on the PDF file instead.
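
To illustrate the point, here is a minimal sketch of how one might check which pages came back with no text and therefore need OCR. It is written in Python rather than WLanguage, and it assumes you have already collected the extracted text of each page in a list (for example by calling a text extractor once per page). The function name `pages_needing_ocr` and the sample data are hypothetical, purely for illustration.

```python
def pages_needing_ocr(page_texts):
    """Return the 1-based numbers of pages whose extracted text is empty.

    page_texts: list of strings (or None), one entry per page, holding
    whatever the text extractor returned for that page (assumed input).
    """
    return [
        page_number
        for page_number, text in enumerate(page_texts, start=1)
        if not (text or "").strip()  # empty, whitespace-only, or None
    ]


# Hypothetical results: page 1 has a text layer, pages 2 and 3 are scans.
extracted = ["Invoice 2017-09\nTotal: 120.00", "", "   \n"]
print(pages_needing_ocr(extracted))  # prints [2, 3]
```

If every page of a document ends up in that list, the PDF is most likely a pure scan and has to go through OCR before any text can be extracted.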
Best regards