Tesseract / OCRtools .net or other OCR - WINDEV 2024 - Developer forums

PROFESSIONAL NEWSGROUPS
WINDEV, WEBDEV and WINDEV Mobile

Tesseract / OCRtools .net or other OCR

Started by Noel Tanti, Jul., 15 2013 2:51 PM - 12 replies

Noel Tanti

Posted on July, 15 2013 - 2:51 PM

Hi,

I am looking to integrate OCR into my WINDEV app, and I see basically two options:

Tesseract, which is free. Has anybody implemented it in WINDEV?

OCRTools. Is it possible to integrate a .NET component (OCRTools) into WINDEV?

Any help appreciated.

Regards
Noel

Ruben Sanchez Peña

Posted on July, 15 2013 - 8:24 PM

Hi. For simple jobs you can look at http://pumanet.codeplex.com/

It is completely free and simple to integrate with WINDEV.

Rubén

Yogi Yang

Posted on July, 16 2013 - 8:45 AM

I have tried Tesseract OCR, but I used the command-line version as it is easier to use.

I would suggest shelling out from your program and executing the command-line version with the necessary parameters.
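
The shell-out approach described above can be sketched in WLanguage roughly as follows. This is only a minimal sketch under stated assumptions, not code from this thread: the procedure name OCRTesseractCLI and the output file name "ocr_result" are hypothetical, tesseract.exe is assumed to be installed and reachable through the PATH, and fTempPath may be named fTempDir in recent WINDEV versions. It relies on the standard Tesseract command line form "tesseract <image> <output base> -l <language>", which writes <output base>.txt.

```wlanguage
// Hypothetical helper (not from the thread): runs the Tesseract command line
// on an image and returns the recognized text, or "" if the call fails.
PROCEDURE OCRTesseractCLI(sImage is string, sLang is string = "eng")

// Assumption: the temporary directory is writable
sOutBase is string = CompleteDir(fTempPath()) + "ocr_result"

// Tesseract appends ".txt" to the output base name itself
sCmd is string = "tesseract """ + sImage + """ """ + sOutBase + """ -l " + sLang

// exeReturnValue waits for the process to end and returns its exit code
nExit is int = ExeRun(sCmd, exeInactive, exeReturnValue)
IF nExit <> 0 THEN
    RESULT ""
END

// Tesseract writes its text output in UTF-8
sText is string = UTF8ToString(fLoadText(sOutBase + ".txt"))
fDelete(sOutBase + ".txt")

RESULT sText
```

It could then be called with something like OCRTesseractCLI(fExeDir() + "\Test.png", "spa"), provided the matching Tesseract language data is installed.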

Fabrice Harari

Posted on July, 16 2013 - 1:50 PM

Hi Noel

FYI:
- there is an example of tesseract usage in the latest LST...
- I know, it's in french, but you can translate the whole code with 3 clicks in your UI, and the article itself shouldn't be THAT important
- also, you can buy THIS installment only from the french web site, in the LST part

so if you want to save time...

Best regards

Gianni Spano

Posted on July, 16 2013 - 4:15 PM

@Ruben Sanchez

Please, can you share an example written in WINDEV that uses Puma.dll?

Thanks in advance.

Gianni

Ruben Sanchez Peña

Posted on July, 18 2013 - 2:16 PM

Hi. This procedure will do the whole job:

PROCEDURE OCR(sFichero is string, sIdioma is string="Spanish")

//PumaLanguage
//English(PumaLanguage) = 0
//German(PumaLanguage) = 1
//French(PumaLanguage) = 2
//Russian(PumaLanguage) = 3
//Swedish(PumaLanguage) = 4
//Spanish(PumaLanguage) = 5
//Italian(PumaLanguage) = 6
//RussianEnglish(PumaLanguage) = 7
//Ukrainian(PumaLanguage) = 8
//Serbian(PumaLanguage) = 9
//Croatian(PumaLanguage) = 10
//Polish(PumaLanguage) = 11
//Danish(PumaLanguage) = 12
//Portuguese(PumaLanguage) = 13
//Dutch(PumaLanguage) = 14
//Digits(PumaLanguage) = 15
//Czech(PumaLanguage) = 19
//Romanian(PumaLanguage) = 20
//Hungarian(PumaLanguage) = 21
//Bulgarian(PumaLanguage) = 22
//Slovenian(PumaLanguage) = 23
//Lettish(PumaLanguage) = 24
//Lithuanian(PumaLanguage) = 25
//Estonian(PumaLanguage) = 26
//Turkish(PumaLanguage) = 27

sRes is string = ""
//sFichero = fExeDir() + "\Test.png"
pclPumaPage is object "PumaPage" dynamic = new "PumaPage"(sFichero)
pclPumaPage.FileFormat = PumaFileFormat.TxtAnsi
pclPumaPage.EnableSpeller = True
pclPumaPage.Language = LanguageStrings.GetSupportedLanguage(pclPumaPage,sIdioma)

WHEN EXCEPTION IN
    Cadena is UNICODE string = pclPumaPage.RecognizeToString()
DO
    // Recognition raised an exception
    Info("The image has no recognizable text.")
ELSE
    sRes = UnicodeToAnsi(Cadena,charsetAnsi)
END

// Release the .NET object
pclPumaPage.Dispose()
delete pclPumaPage
pclPumaPage = Null

RESULT sRes

A procedure to list the supported languages (you can load the result into a drop-down list, for example):

PROCEDURE LenguajesOCR()

sRes is string = ""

pclPumaPage is object "PumaPage" dynamic = new "PumaPage"

pclLista is object "System.Collections.Generic.iList" dynamic = LanguageStrings.GetSupportedLanguages(pclPumaPage)

FOR n = 0 TO 100
    // if pclLista.get_Item(n) = null then
    // break
    // END
    // Reading past the end of the list raises an exception, which ends the loop
    WHEN EXCEPTION IN
        sTr is string = pclLista.get_Item(n)
    DO
        BREAK
    END

    // Build a CR-separated list of language names
    IF sRes = "" THEN
        sRes = sTr
    ELSE
        sRes += CR + sTr
    END
END

// Release the .NET object
pclPumaPage.Dispose()
delete pclPumaPage
pclPumaPage = Null

RESULT sRes

Rubén

Gianni Spano

Posted on July, 18 2013 - 5:47 PM

Thank you Ruben..

Gianni

Noel Tanti

Posted on July, 19 2013 - 10:23 AM

Belated thanks for all your help.
Noel

PAOLOMARENGONI

Posted on July, 28 2015 - 3:45 PM

Hi. About Puma.dll: is it possible to read a scanned PDF file?

Jose Dubois

#10

Posted on October, 15 2015 - 5:22 PM

Hello Ruben,

I am only getting garbage from the resulting string after the call to RecognizeToString. I have tried many documents, I have checked the language setting and it should work according to the documentation. I am running WinDev 20 (32 bits) on a Windows 10 x64 PC. Any idea why the OCR engine outputs garbage (but no error)?

FUNCTION OCR(sNomFichier is a string, sLangue is a string="French")

//PumaLanguage
//English(PumaLanguage) = 0
//French(PumaLanguage) = 2

sRes is a string
sMess is a string
sUConversion is a UNICODE string
nErrCode is an integer
pclPumaPage is an object "PumaPage" dynamic = new "PumaPage"(sNomFichier)

pclPumaPage.FileFormat = PumaFileFormat.TxtAnsi
pclPumaPage.EnableSpeller = True
pclPumaPage.Language = LanguageStrings.GetSupportedLanguage(pclPumaPage,sLangue)
//pclPumaPage.set_AutoRotateImage(True)
//pclPumaPage.ImproveFax100=True
//pclPumaPage.LoadImage(sNomFichier)
//pclPumaPage.RecognizePictures=False
//pclPumaPage.RecognizeTables=False
//pclPumaPage.UseTextFormating=False

WHEN EXCEPTION IN
    sUConversion = pclPumaPage.RecognizeToString()
DO
    // Read the details of the exception that was actually raised,
    // rather than from a separately created exception object
    nErrCode=ExceptionInfo(errCode)
    sMess=ExceptionInfo(errMessage)
    Error(sMess,nErrCode)
    sRes=""
ELSE
    sRes = UnicodeToAnsi(sUConversion,charsetAnsi)
END

// Release the .NET object
pclPumaPage.Dispose()
Delete pclPumaPage
pclPumaPage = Null

RETURN sRes

6r6

#11

Posted on January, 08 2017 - 6:20 AM

Can someone upload a working example of a WINDEV project (WD20 or lower) with an integrated OCR function?
I recently stumbled on WINDEV, and the fast results I got despite my limited background intrigued me enough to dive deeper into the W5 code. I want to learn how to include external functions (.dll) in WINDEV.
I strongly believe an OCR function should be integrated into WINDEV as a basic function.

- there is an example of tesseract usage in the latest LST...
What is LST? A magazine? Where can I download or buy a copy of it?

Fabrice Harari

#12

Posted on January, 09 2017 - 1:18 PM

Hi,

- there is an example of tesseract usage in the latest LST...
What is LST? A magazine? Where can I download or buy a copy of it?

on pcsoft web site (on the french one, pcsoft.fr) :
http://pcsoft.fr/lst/bdc-old-lst.htm

By the way, according to this page (search engine for examples), the
tesseract example was in LST 93 (last one is 106):
http://pcsoft.fr/st/nouveautes-st.html

Best regards

--
Fabrice Harari
International WinDev, WebDev and WinDev mobile Consulting

Ready for you: WXShowroom.com, WXReplication (open source) and now WXEDM
(open source)

More information on http://www.fabriceharari.com

Réginald Dupuis

#13

Posted on March, 25 2021 - 9:27 PM

// Before WINDEV 26, you can do OCR using Tesseract
// You need to add some DLLs to your project to do so
// Include the .NET assemblies mscorlib and Tesseract
// sImagePath, sPathTessData and sPathTessConfig are not declared in this extract;
// they are presumably defined elsewhere in the program

pclPixObjet is Pix dynamic // Instance of the class
pclSegMode is PageSegMode dynamic
pclTesseEngine is TesseractEngine dynamic // Instance of the class
pclIEnumPropriétés is System.Collections.IEnumerator dynamic, useful // Enumerator over the properties
pclIEnumString is System.Collections.IEnumerator dynamic, useful
pclPage is Page dynamic

pclPixObjet = Pix::LoadFromFile(sImagePath)

pclTesseEngine = new TesseractEngine(sPathTessData,"fra",EngineMode.Default,sPathTessConfig)

pclPage = pclTesseEngine.Process (pclPixObjet, "page", pclSegMode)

sUTF16String = pclPage.GetText()

My program ScanToPay .ch contains the DLLs; it also runs with older versions of WINDEV.
