Tesseract / OCRtools .net or other OCR - WINDEV 2024 - Developer forums

Tesseract / OCRtools .net or other OCR

Started by Noel Tanti, 15 Jul 2013 14:51 - 12 replies

Noel Tanti

Posted on 15 July 2013 - 14:51

Hi,

I am looking to integrate OCR into my WinDev app, and I see basically two options:

Tesseract, which is free. Has anybody implemented it in WinDev?

OCRtools. Is it possible to integrate a .NET component (OCRtools) into WinDev?

Any help appreciated.

Regards
Noel

Ruben Sanchez Peña

Posted on 15 July 2013 - 20:24

Hi. For simple jobs you can look at http://pumanet.codeplex.com/

It is completely free and simple to integrate with WinDev.

Rubén

Yogi Yang

Posted on 16 July 2013 - 08:45

I have tried Tesseract OCR, but I used the command-line version because it is easier to use.

I would suggest shelling out from your program and running the command-line version with the necessary parameters.
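
As a rough illustration of the shell-out approach described above, here is a minimal sketch. It is written in Python rather than WLanguage, purely to show the pattern; from WinDev the same command line would be launched with the equivalent WLanguage call (for instance ExeRun). It assumes the tesseract executable is installed and on the PATH. The function names and the file names in the usage note are hypothetical and do not come from this thread.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Build the argument list for: tesseract <image> <outputbase> -l <lang>."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_with_tesseract_cli(image_path, output_base, lang="eng"):
    """Run the tesseract executable and return the recognized text.

    Assumption: tesseract is installed and reachable through the PATH.
    """
    command = build_tesseract_command(image_path, output_base, lang)
    # check=True raises CalledProcessError if tesseract exits with an error code.
    subprocess.run(command, check=True, capture_output=True)
    # Tesseract writes its plain-text result to <output_base>.txt.
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

For example, ocr_with_tesseract_cli("scan.png", "scan_result", lang="fra") would run tesseract on scan.png and return the content of scan_result.txt. Building the command as a list, rather than as one string, avoids quoting problems when paths contain spaces.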

Fabrice Harari

Posted on 16 July 2013 - 13:50

Hi Noel

FYI:
- There is an example of Tesseract usage in the latest LST...
- I know, it's in French, but you can translate the whole code with 3 clicks in your UI, and the article itself shouldn't be THAT important
- Also, you can buy THIS installment only from the French web site, in the LST section

So if you want to save time...

Best regards

Gianni Spano

Posted on 16 July 2013 - 16:15

@Ruben Sanchez

Could you please share an example written in WinDev that uses Puma.dll?

Thanks in advance.

Gianni

Ruben Sanchez Peña

Posted on 18 July 2013 - 14:16

Hi. This procedure does the whole job:

PROCEDURE OCR(sFichero is string, sIdioma is string="Spanish")

//PumaLanguage
//English(PumaLanguage) = 0
//German(PumaLanguage) = 1
//French(PumaLanguage) = 2
//Russian(PumaLanguage) = 3
//Swedish(PumaLanguage) = 4
//Spanish(PumaLanguage) = 5
//Italian(PumaLanguage) = 6
//RussianEnglish(PumaLanguage) = 7
//Ukrainian(PumaLanguage) = 8
//Serbian(PumaLanguage) = 9
//Croatian(PumaLanguage) = 10
//Polish(PumaLanguage) = 11
//Danish(PumaLanguage) = 12
//Portuguese(PumaLanguage) = 13
//Dutch(PumaLanguage) = 14
//Digits(PumaLanguage) = 15
//Czech(PumaLanguage) = 19
//Romanian(PumaLanguage) = 20
//Hungarian(PumaLanguage) = 21
//Bulgarian(PumaLanguage) = 22
//Slovenian(PumaLanguage) = 23
//Lettish(PumaLanguage) = 24
//Lithuanian(PumaLanguage) = 25
//Estonian(PumaLanguage) = 26
//Turkish(PumaLanguage) = 27

sRes is string = ""
//sFichero = fExeDir() + "\Test.png"
pclPumaPage is object "PumaPage" dynamic = new "PumaPage"(sFichero)
pclPumaPage.FileFormat = PumaFileFormat.TxtAnsi
pclPumaPage.EnableSpeller = True

pclPumaPage.Language = LanguageStrings.GetSupportedLanguage(pclPumaPage,sIdioma)
WHEN EXCEPTION IN
    Cadena is UNICODE string = pclPumaPage.RecognizeToString()
DO
    Info("The image has no recognizable text.")
ELSE
    sRes = UnicodeToAnsi(Cadena,charsetAnsi)
END

pclPumaPage.Dispose()
delete pclPumaPage
pclPumaPage = Null

RESULT sRes

A procedure to list the supported languages (you can load the result into a drop-down list, for example):

PROCEDURE LenguajesOCR()

sRes is string = ""

pclPumaPage is object "PumaPage" dynamic = new "PumaPage"

pclLista is object "System.Collections.Generic.iList" dynamic = LanguageStrings.GetSupportedLanguages(pclPumaPage)

FOR n = 0 TO 100
    // if pclLista.get_Item(n) = null then
    // break
    // END
    WHEN EXCEPTION IN
        sTr is string = pclLista.get_Item(n)
    DO
        BREAK
    END

    IF sRes = "" THEN
        sRes = sTr
    ELSE
        sRes += CR + sTr
    END
END

pclPumaPage.Dispose()
delete pclPumaPage
pclPumaPage = Null

RESULT sRes

Rubén

Gianni Spano

Posted on 18 July 2013 - 17:47

Thank you, Ruben.

Gianni

Noel Tanti

Posted on 19 July 2013 - 10:23

Belated thanks for all your help.
Noel

PAOLOMARENGONI

Registered member
1 post

Posted on 28 July 2015 - 15:45

Hi. Regarding Puma.dll, is it possible to read a scanned PDF file?

Jose Dubois

#10

Posted on 15 October 2015 - 17:22

Hello Ruben,

I am only getting garbage from the resulting string after the call to RecognizeToString. I have tried many documents, I have checked the language setting and it should work according to the documentation. I am running WinDev 20 (32 bits) on a Windows 10 x64 PC. Any idea why the OCR engine outputs garbage (but no error)?

FUNCTION OCR(sNomFichier is a string, sLangue is a string="French")

//PumaLanguage
//English(PumaLanguage) = 0
//French(PumaLanguage) = 2

sRes is a string
sMess is a string
sUConversion is a UNICODE string
nErrCode is an integer
pclOCRException is an object "RecognitionEngineException" dynamic = new "RecognitionEngineException"
pclPumaPage is an object "PumaPage" dynamic = new "PumaPage"(sNomFichier)

pclPumaPage.FileFormat = PumaFileFormat.TxtAnsi
pclPumaPage.EnableSpeller = True
pclPumaPage.Language = LanguageStrings.GetSupportedLanguage(pclPumaPage,sLangue)
//pclPumaPage.set_AutoRotateImage(vrai)
//pclPumaPage.ImproveFax100=vrai
//pclPumaPage.LoadImage(sNomFichier)
//pclPumaPage.RecognizePictures=faux
//pclPumaPage.RecognizeTables=faux
//pclPumaPage.UseTextFormating=faux

WHEN EXCEPTION IN
    sUConversion = pclPumaPage.RecognizeToString()
DO
    nErrCode=pclOCRException.get_ErrorCode()
    sMess=pclOCRException.get_Message()
    Error(sMess,nErrCode)
    sRes=""
ELSE
    sRes = UnicodeVersAnsi(sUConversion,charsetAnsi)
END

pclPumaPage.Dispose()
Delete pclPumaPage
pclPumaPage = Null

RETURN sRes

6r6

#11

Posted on 08 January 2017 - 06:20

Can someone upload a working example of a WinDev project (WD20 or lower) with an integrated OCR function?
I recently stumbled on WinDev, and the fast results I got despite limited background knowledge got me intrigued enough to dive deeper into the W5 code. I want to learn how to include external functions (.dll) in WinDev.
I strongly believe an OCR function should be integrated into WinDev among its basic functions.

- there is an example of tesseract usage in the latest LST...
What is LST? A magazine? Where can I download or buy a copy of it?

Fabrice Harari

#12

Posted on 09 January 2017 - 13:18

Hi,

- there is an example of tesseract usage in the latest LST...
What is LST? A magazine? Where can I download or buy a copy of it?

On the PC SOFT web site (the French one, pcsoft.fr):
http://pcsoft.fr/lst/bdc-old-lst.htm

By the way, according to this page (a search engine for examples), the Tesseract example was in LST 93 (the latest one is 106):
http://pcsoft.fr/st/nouveautes-st.html

Best regards

--
Fabrice Harari
International WinDev, WebDev and WinDev mobile Consulting

Ready for you: WXShowroom.com, WXReplication (open source) and now WXEDM
(open source)

More information on http://www.fabriceharari.com

Réginald Dupuis

#13

Posted on 25 March 2021 - 21:27

//Before WINDEV 26, you can do OCR using Tesseract
//You need to add some DLLs to your project to do so
//Include the .NET assemblies mscorlib and Tesseract

pclPixObjet is Pix dynamic // Class instance
pclSegMode is PageSegMode dynamic
pclTesseEngine is TesseractEngine dynamic // Class instance
pclIEnumPropriétés is System.Collections.IEnumerator dynamic, useful // Enumerator over the properties
pclIEnumString is System.Collections.IEnumerator dynamic, useful
pclPage is Page dynamic

pclPixObjet = Pix::LoadFromFile(sImagePath)

pclTesseEngine = new TesseractEngine(sPathTessData,"fra",EngineMode.Default,sPathTessConfig)

pclPage = pclTesseEngine.Process (pclPixObjet, "page", pclSegMode)

sUTF16String = pclPage.GetText()

My program ScanToPay .ch contains the DLLs; it also runs with older versions of WINDEV.
