PC SOFT

PROFESSIONAL NEWSGROUPS
WINDEV, WEBDEV and WINDEV Mobile

Tesseract / OCRtools .net or other OCR
Started by Noel Tanti, Jul., 15 2013 2:51 PM - 12 replies
Posted on July, 15 2013 - 2:51 PM
Hi,

I am looking to integrate OCR into my WinDev app, and basically there are two options:

- Tesseract, which is free.
- OCRTools. Is it possible to integrate a .NET component (OCRTools) into WinDev?

Has anybody implemented either of these in WinDev?

Any help appreciated.

Regards
Noel
Posted on July, 15 2013 - 8:24 PM
Hi. For simple jobs you can look at http://pumanet.codeplex.com/

It's completely free and simple to integrate with WinDev.


Rubén
Posted on July, 16 2013 - 8:45 AM
I have tried Tesseract OCR, but I used the command-line version because it is easier to use.

I would suggest shelling out from your program and running the command-line version with the necessary parameters.
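A minimal WLanguage sketch of that command-line approach, assuming Tesseract is installed on the machine. The procedure name OCR_TesseractCLI, its parameters and the example paths are hypothetical. It relies on Tesseract's standard command form "tesseract <image> <outputbase> -l <lang>", which writes the recognized text, in UTF-8, to <outputbase>.txt.

```wlanguage
// Sketch: OCR an image by shelling out to the Tesseract command-line tool.
// Assumptions: Tesseract is installed, sTesseractExe is the full path to tesseract.exe,
// and sLang is a Tesseract language code ("eng", "fra", "spa"...) whose
// .traineddata file is installed. Procedure and parameter names are hypothetical.
PROCEDURE OCR_TesseractCLI(sImage is string, sTesseractExe is string, sLang is string = "eng")

sQ is string = Charact(34)	// double quote, to wrap paths that may contain spaces
// Tesseract appends ".txt" to the output base name itself
sOutBase is string = CompleteDir(fTempPath()) + "ocr_result"
sOutFile is string = sOutBase + ".txt"
sRes is string = ""

// Remove a result file left over from a previous run
IF fFileExist(sOutFile) THEN
	fDelete(sOutFile)
END

// Command line: "tesseract.exe" "image" "outputbase" -l lang
sCmd is string = sQ + sTesseractExe + sQ + " " + sQ + sImage + sQ + " " + sQ + sOutBase + sQ + " -l " + sLang

// Start tesseract (a console window is shown) and wait until it has finished
ExeRun(sCmd, exeActive, exeWait)

// No output file means recognition failed (bad path, missing language data...)
IF fFileExist(sOutFile) THEN
	// Tesseract writes UTF-8; characters without an ANSI equivalent are lost here
	sRes = UTF8ToAnsi(fLoadBuffer(sOutFile))
	fDelete(sOutFile)
END

RESULT sRes
```

For example, with hypothetical paths: sText is string = OCR_TesseractCLI("C:\Scans\invoice.png", "C:\Program Files\Tesseract-OCR\tesseract.exe", "fra"). Checking for the output file instead of the ExeRun return value keeps the sketch simple; a missing file is treated as "no text".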
Posted on July, 16 2013 - 1:50 PM
Hi Noel

FYI:
- there is an example of Tesseract usage in the latest LST...
- I know, it's in French, but you can translate the whole code with 3 clicks in your UI, and the article itself shouldn't be THAT important
- also, you can buy THIS issue alone from the French web site, in the LST section

So if you want to save time...

Best regards
Posted on July, 16 2013 - 4:15 PM
@Ruben Sanchez

Could you please share an example written in WinDev that uses Puma.dll?

Thanks in advance.

Gianni
Posted on July, 18 2013 - 2:16 PM
Hi. This procedure does the whole job:

PROCEDURE OCR(sFichero is string, sIdioma is string="Spanish")

//PumaLanguage
//English(PumaLanguage) = 0
//German(PumaLanguage) = 1
//French(PumaLanguage) = 2
//Russian(PumaLanguage) = 3
//Swedish(PumaLanguage) = 4
//Spanish(PumaLanguage) = 5
//Italian(PumaLanguage) = 6
//RussianEnglish(PumaLanguage) = 7
//Ukrainian(PumaLanguage) = 8
//Serbian(PumaLanguage) = 9
//Croatian(PumaLanguage) = 10
//Polish(PumaLanguage) = 11
//Danish(PumaLanguage) = 12
//Portuguese(PumaLanguage) = 13
//Dutch(PumaLanguage) = 14
//Digits(PumaLanguage) = 15
//Czech(PumaLanguage) = 19
//Romanian(PumaLanguage) = 20
//Hungarian(PumaLanguage) = 21
//Bulgarian(PumaLanguage) = 22
//Slovenian(PumaLanguage) = 23
//Lettish(PumaLanguage) = 24
//Lithuanian(PumaLanguage) = 25
//Estonian(PumaLanguage) = 26
//Turkish(PumaLanguage) = 27

sRes is string = ""
//sFichero = fExeDir() + "\Test.png"

// Create the Puma page from the image file and ask for plain ANSI text output
pclPumaPage is object "PumaPage" dynamic = new "PumaPage"(sFichero)
pclPumaPage.FileFormat = PumaFileFormat.TxtAnsi
pclPumaPage.EnableSpeller = True

// Convert the language name (e.g. "Spanish") into the matching PumaLanguage value
pclPumaPage.Language = LanguageStrings.GetSupportedLanguage(pclPumaPage,sIdioma)
WHEN EXCEPTION IN
	Cadena is UNICODE string = pclPumaPage.RecognizeToString()
DO
	Info("The image contains no recognizable text.")
ELSE
	sRes = UnicodeToAnsi(Cadena,charsetAnsi)
END

// Free the Puma page resources, then the .NET object itself
pclPumaPage.Dispose()
delete pclPumaPage
pclPumaPage = Null

RESULT sRes


A procedure to list the supported languages (you can load the result into a combo box, for example):

PROCEDURE LenguajesOCR()

//PumaLanguage
//English(PumaLanguage) = 0
//German(PumaLanguage) = 1
//French(PumaLanguage) = 2
//Russian(PumaLanguage) = 3
//Swedish(PumaLanguage) = 4
//Spanish(PumaLanguage) = 5
//Italian(PumaLanguage) = 6
//RussianEnglish(PumaLanguage) = 7
//Ukrainian(PumaLanguage) = 8
//Serbian(PumaLanguage) = 9
//Croatian(PumaLanguage) = 10
//Polish(PumaLanguage) = 11
//Danish(PumaLanguage) = 12
//Portuguese(PumaLanguage) = 13
//Dutch(PumaLanguage) = 14
//Digits(PumaLanguage) = 15
//Czech(PumaLanguage) = 19
//Romanian(PumaLanguage) = 20
//Hungarian(PumaLanguage) = 21
//Bulgarian(PumaLanguage) = 22
//Slovenian(PumaLanguage) = 23
//Lettish(PumaLanguage) = 24
//Lithuanian(PumaLanguage) = 25
//Estonian(PumaLanguage) = 26
//Turkish(PumaLanguage) = 27

sRes is string = ""

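pclPumaPage is object "PumaPage" dynamic = new "PumaPage"

// Get the list of language names supported by the Puma engine
pclLista is object "System.Collections.Generic.iList" dynamic = LanguageStrings.GetSupportedLanguages(pclPumaPage)

// Read the items one by one: get_Item raises an exception once n is past the
// end of the list, which ends the loop (100 is just a safe upper bound)
FOR n = 0 TO 100
	// if pclLista.get_Item(n) = null then
	// break
	// END
	WHEN EXCEPTION IN
		sTr is string = pclLista.get_Item(n)
	DO
		BREAK
	END

	// Build a CR-separated list of language names
	IF sRes = "" THEN
		sRes = sTr
	ELSE
		sRes += CR + sTr
	END

END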

pclPumaPage.Dispose()

delete pclPumaPage

pclPumaPage = Null

RESULT sRes
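To illustrate the combo box idea: LenguajesOCR() returns one language name per line (CR-separated), so a FOR EACH STRING loop can feed each name to ListAdd. COMBO_Language is a hypothetical control name; the rest uses the standard WLanguage list functions.

```wlanguage
// Sketch: fill a combo box with the languages supported by Puma.
// COMBO_Language is a hypothetical combo box control in the current window.
ListDeleteAll(COMBO_Language)

// LenguajesOCR() returns one language name per line, separated by CR
FOR EACH STRING sLang OF LenguajesOCR() SEPARATED BY CR
	ListAdd(COMBO_Language, sLang)
END
```

The selected language name can then be passed as the sIdioma parameter of OCR().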


Rubén
Posted on July, 18 2013 - 5:47 PM
Thank you, Ruben.

:-)

Gianni
Posted on July, 19 2013 - 10:23 AM
Belated thanks for all your help.
Noel
Posted on July, 28 2015 - 3:45 PM
Hi, about Puma.dll: is it possible to read a scanned PDF file?
Posted on October, 15 2015 - 5:22 PM
Hello Ruben,

I am only getting garbage in the resulting string after the call to RecognizeToString. I have tried many documents and checked the language setting, and it should work according to the documentation. I am running WinDev 20 (32-bit) on a Windows 10 x64 PC. Any idea why the OCR engine outputs garbage (but no error)?


FUNCTION OCR(sNomFichier is a string, sLangue is a string="French")

//PumaLanguage
//English(PumaLanguage) = 0
//French(PumaLanguage) = 2

sRes is a string
sUConversion is a UNICODE string
pclPumaPage is an object "PumaPage" dynamic = new "PumaPage"(sNomFichier)

pclPumaPage.FileFormat = PumaFileFormat.TxtAnsi
pclPumaPage.EnableSpeller = True
pclPumaPage.Language = LanguageStrings.GetSupportedLanguage(pclPumaPage,sLangue)
//pclPumaPage.set_AutoRotateImage(True)
//pclPumaPage.ImproveFax100=True
//pclPumaPage.LoadImage(sNomFichier)
//pclPumaPage.RecognizePictures=False
//pclPumaPage.RecognizeTables=False
//pclPumaPage.UseTextFormating=False

WHEN EXCEPTION IN

	sUConversion = pclPumaPage.RecognizeToString()

DO

	// Report the exception actually raised: a separately created
	// "new RecognitionEngineException" object is empty and carries no error details
	Error(ExceptionInfo())
	sRes=""

ELSE

	sRes = UnicodeToAnsi(sUConversion,charsetAnsi)

END

pclPumaPage.Dispose()

Delete pclPumaPage

pclPumaPage = Null

RETURN sRes
Posted on January, 08 2017 - 6:20 AM
Can someone upload a working example of a WinDev project (WD20 or lower) with an integrated OCR function?
I recently stumbled on WinDev, and getting fast results despite my limited background got me intrigued enough to dive deeper into W5 code. I want to learn how to include external functions (.dll) in WinDev.
I strongly believe basic OCR functions should be integrated into WinDev.

> there is an example of tesseract usage in the latest LST...
What is LST? A magazine? Where can I download or buy a copy of it?
Posted on January, 09 2017 - 1:18 PM
Hi,

> there is an example of tesseract usage in the latest LST...
> What is LST? A magazine? Where can I download or buy a copy of it?


On the PC SOFT web site (the French one, pcsoft.fr):
http://pcsoft.fr/lst/bdc-old-lst.htm

By the way, according to this page (a search engine for examples), the
Tesseract example was in LST 93 (the latest is 106):
http://pcsoft.fr/st/nouveautes-st.html


Best regards

--
Fabrice Harari
International WinDev, WebDev and WinDev mobile Consulting

Ready for you: WXShowroom.com, WXReplication (open source) and now WXEDM (open source)

More information on http://www.fabriceharari.com
Posted on March, 25 2021 - 9:27 PM
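// Before WinDev 26, you can do OCR using Tesseract.
// You need to add some DLLs to your project to do so:
// include the .NET assemblies mscorlib and Tesseract.

// Paths and result used below (set the paths before running)
sImagePath is string			// image file to recognize
sPathTessData is string		// tessdata folder with the fra.traineddata language file
sPathTessConfig is string		// Tesseract configuration file
sUTF16String is UNICODE string	// recognized text

pclPixObjet is Pix dynamic // Class instance: the image to recognize
pclSegMode is PageSegMode dynamic
pclTesseEngine is TesseractEngine dynamic // Class instance: the OCR engine
pclIEnumPropriétés is System.Collections.IEnumerator dynamic, useful // Enumerator over the properties
pclIEnumString is System.Collections.IEnumerator dynamic, useful
pclPage is Page dynamic

// Load the image file
pclPixObjet = Pix::LoadFromFile(sImagePath)

// Create the engine for French ("fra") with the given tessdata folder and config file
pclTesseEngine = new TesseractEngine(sPathTessData,"fra",EngineMode.Default,sPathTessConfig)

// Run the recognition ("page" is the input name given to Tesseract)
pclPage = pclTesseEngine.Process(pclPixObjet, "page", pclSegMode)

sUTF16String = pclPage.GetText()

// Free the native resources (Page, TesseractEngine and Pix are disposable)
pclPage.Dispose()
pclTesseEngine.Dispose()
pclPixObjet.Dispose()


My program ScanToPay.ch contains the DLLs; it also runs with older versions of WinDev.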