PC SOFT

FOROS PROFESIONALES
WINDEVWEBDEV y WINDEV Mobile

Inicio → WINDEV 2024 → Split a string
Split a string
Iniciado por antonino.neri, 21,ene. 2015 09:19 - 6 respuestas
Publicado el 21,enero 2015 - 09:19
Hi guys,

I have the need to correctly split the following string coming from a csv file:


POO0132,MN2111AB,"02 - 03","167,66","391,44","61,15","405,95","61,97","174,35","73,27","250,93","0,84","0,85","0,35","20,42","1,04","0,21","0,57","0,75","0,7788","0,46","567,4",17,0,0,1,0,"0,34","1310,1","0,31","0,02",0,0,"7,54","5,76","41,48","44,81","0,08","-107,19","-107,4",-108,"23,96","2,52","26,48","8,6","12,5","118833,3","51100,8","0,02","0,01","0,05","0,93","0,58","0,16","0,7","0,03",4,5,"9,51","1,33","3105,1","445,1","118,7","3549,4","47,9"


As we can see the comma ( , ) is used as the separator for the columns but it also is used as the separator for the decimals. The columns where the numeric values are located are identified with the double quote.
If I open this file in Microsoft Excel it automatically opens it in the correct way. Is there any built in function in Windev 19 that does the same thing? My need is to extract some specific values from this string but if I use ExtractString using the comma as separator I don't get the correct values (except the first two sub-strings).

I have written a small routine that converts the original string in a way that I can use ExtractString with no problems but I need to cycle on the whole string and this takes a very long time to parse the file because I have almost 100000 rows.

Any alternative way?


Kindest Regards,
Antonino
Publicado el 21,enero 2015 - 20:03
Hi Antonino

if the first two columns are the only one without commas, you can do
extractstring on commas twice, then extractstring on ", for the rest of
the line

Best regards


--
Fabrice Harari
International WinDev, WebDev and WinDev mobile Consulting

NEW: WXReplication, your open source replication system is available on
my web site!!!
WXShowroom.com: Show your projects!
More information on http://www.fabriceharari.com


On 1/21/2015 3:19 AM, Antonino wrote:
Hi guys,

I have the need to correctly split the following string coming from a
csv file:


POO0132,MN2111AB,"02 -
03","167,66","391,44","61,15","405,95","61,97","174,35","73,27","250,93","0,84","0,85","0,35","20,42","1,04","0,21","0,57","0,75","0,7788","0,46","567,4",17,0,0,1,0,"0,34","1310,1","0,31","0,02",0,0,"7,54","5,76","41,48","44,81","0,08","-107,19","-107,4",-108,"23,96","2,52","26,48","8,6","12,5","118833,3","51100,8","0,02","0,01","0,05","0,93","0,58","0,16","0,7","0,03",4,5,"9,51","1,33","3105,1","445,1","118,7","3549,4","47,9"



As we can see the comma ( , ) is used as the separator for the columns
but it also is used as the separator for the decimals. The columns where
the numeric values are located are identified with the double quote.
If I open this file in Microsoft Excel it automatically opens it in the
correct way. Is there any built in function in Windev 19 that does the
same thing? My need is to extract some specific values from this string
but if I use ExtractString using the comma as separator I don't get the
correct values (except the first two sub-strings).

I have written a small routine that converts the original string in a
way that I can use ExtractString with no problems but I need to cycle on
the whole string and this takes a very long time to parse the file
because I have almost 100000 rows.

Any alternative way?


Kindest Regards,
Antonino
Publicado el 22,enero 2015 - 09:36
Hi Fabrice,

thanks a lot for yourcare and for your idea. I'll try to implement it in hope it reduces the computational time.

Kindest Regards
Publicado el 22,enero 2015 - 10:58
Hi Antonino

I do not know how you implemented your process, so, just in case...

A mistake that lots of people are doing when it comes to working on
lines of a text file is to use freadline to go through the file's lines...

It is orders of magnitude faster to do a floadtext, then fexatractstring
for the line (on CR), then process the line

In the rare cases where the file is huge (too big to be loaded at once
in memory), it is still worth it to load it by BIG chunks and process
till the last line (no CR) then reload the next chunk

Best regards

--
Fabrice Harari
International WinDev, WebDev and WinDev mobile Consulting

NEW: WXReplication, your open source replication system is available on
my web site!!!
WXShowroom.com: Show your projects!
More information on http://www.fabriceharari.com


On 1/22/2015 3:36 AM, Antonino wrote:
Hi Fabrice,

thanks a lot for yourcare and for your idea. I'll try to implement it in
hope it reduces the computational time.

Kindest Regards
Publicado el 23,enero 2015 - 10:07
Hi Fabrice,

once again thanks for your care.

In fact I am one of those who make the mistake you are talking about.
At the moment my routine uses fReadLine to go through the file and read one line at a time.
The file I need to parse is very big and I don't know how I can load in memory one chunk after the other.
In this specific case another problem that I encountered while trying to implement your idea is that unfortunately in some columns (that have integer values) there are no double quote and the comma is used just to separate the column from the next one. That's the solution I am using to convert each line in a way that I can easily separate sub-strings:


// Read the next lines of the file
sCurrentLineFormatted = ""
sCurrentLine = fReadLine(nFileID)

nSeparatorsCounter = StringCount(sCurrentLine,",",IgnoreCase) + 1
FOR i = 1 TO nSeparatorsCounter
sPortion = ExtractString(sCurrentLine,i,",",FromBeginning)

IF bToSkip = False THEN
IF Modulo(StringCount(sPortion,"""",IgnoreCase),2) > 0 THEN
sCurrentLineFormatted = sCurrentLineFormatted + "|" + sPortion
sPortion = ExtractString(sCurrentLine,i+1,",",FromBeginning)
sCurrentLineFormatted = sCurrentLineFormatted + "." + sPortion
bToSkip = True
ELSE
IF sCurrentLineFormatted = "" THEN
sCurrentLineFormatted = sPortion
ELSE
sCurrentLineFormatted = sCurrentLineFormatted + "|" + sPortion
END
END
ELSE
bToSkip = False
END
END
sCurrentLineFormatted = Replace(sCurrentLineFormatted,",",".",IgnoreCase)



Do you see a fastest way to achieve the same result?

Kindest Regards,
Antonino
Publicado el 23,enero 2015 - 11:45
Hi Antonino

believe me when I say that you are barking up the wrong tree... Your
speed problem is the freadline, NOT the processing of the line itself...

What do you call very big?
100 000 rows, of 1000 bytes each, that would be 100 000 000 bytes, so
100 MB...

You can load 100 MB with a floadtext in a string without any problem,
then a first loop to extract lines, then a second with your current line
processing

You'll be amazed at the speed difference

Best regards

--
Fabrice Harari
International WinDev, WebDev and WinDev mobile Consulting

NEW: WXReplication, your open source replication system is available on
my web site!!!
WXShowroom.com: Show your projects!
More information on http://www.fabriceharari.com


On 1/23/2015 4:07 AM, Antonino wrote:
Hi Fabrice,

once again thanks for your care.

In fact I am one of those who make the mistake you are talking about.
At the moment my routine uses fReadLine to go through the file and read
one line at a time.
The file I need to parse is very big and I don't know how I can load in
memory one chunk after the other.
In this specific case another problem that I encountered while trying to
implement your idea is that unfortunately in some columns (that have
integer values) there are no double quote and the comma is used just to
separate the column from the next one. That's the solution I am using to
convert each line in a way that I can easily separate sub-strings:


// Read the next lines of the file
sCurrentLineFormatted = ""
sCurrentLine = fReadLine(nFileID)

nSeparatorsCounter = StringCount(sCurrentLine,",",IgnoreCase) + 1
FOR i = 1 TO nSeparatorsCounter
sPortion = ExtractString(sCurrentLine,i,",",FromBeginning)

IF bToSkip = False THEN
IF modulo(StringCount(sPortion,"""",IgnoreCase),2) > 0 THEN
sCurrentLineFormatted = sCurrentLineFormatted + "|" + sPortion
sPortion = ExtractString(sCurrentLine,i+1,",",FromBeginning)
sCurrentLineFormatted = sCurrentLineFormatted + "." + sPortion
bToSkip = True
ELSE
IF sCurrentLineFormatted = "" THEN
sCurrentLineFormatted = sPortion
ELSE
sCurrentLineFormatted = sCurrentLineFormatted + "|" +
sPortion
END END
ELSE
bToSkip = False
END
END
sCurrentLineFormatted = Replace(sCurrentLineFormatted,",",".",IgnoreCase)



Do you see a fastest way to achieve the same result?

Kindest Regards,
Antonino
Publicado el 23,enero 2015 - 15:55
Hi Fabrice,

actually the file I need to parse has 91000 rows and its size is, more or less, 45 MB. So according your post I may use fLoadText to load it in memory in one chunck. I'll give that a try and let you know.

Thanks again for your support.


Kindest Regards,
Antonino