|
| Iniciado por antonino.neri, 21,ene. 2015 09:19 - 6 respuestas |
| |
| | | |
|
| |
| Publicado el 21,enero 2015 - 09:19 |
Hi guys,
I have the need to correctly split the following string coming from a csv file:
POO0132,MN2111AB,"02 - 03","167,66","391,44","61,15","405,95","61,97","174,35","73,27","250,93","0,84","0,85","0,35","20,42","1,04","0,21","0,57","0,75","0,7788","0,46","567,4",17,0,0,1,0,"0,34","1310,1","0,31","0,02",0,0,"7,54","5,76","41,48","44,81","0,08","-107,19","-107,4",-108,"23,96","2,52","26,48","8,6","12,5","118833,3","51100,8","0,02","0,01","0,05","0,93","0,58","0,16","0,7","0,03",4,5,"9,51","1,33","3105,1","445,1","118,7","3549,4","47,9"
As we can see the comma ( , ) is used as the separator for the columns but it also is used as the separator for the decimals. The columns where the numeric values are located are identified with the double quote. If I open this file in Microsoft Excel it automatically opens it in the correct way. Is there any built in function in Windev 19 that does the same thing? My need is to extract some specific values from this string but if I use ExtractString using the comma as separator I don't get the correct values (except the first two sub-strings).
I have written a small routine that converts the original string in a way that I can use ExtractString with no problems but I need to cycle on the whole string and this takes a very long time to parse the file because I have almost 100000 rows.
Any alternative way?
Kindest Regards, Antonino |
| |
| |
| | | |
|
| | |
| |
| Publicado el 21,enero 2015 - 20:03 |
Hi Antonino
if the first two columns are the only one without commas, you can do extractstring on commas twice, then extractstring on ", for the rest of the line
Best regards
-- Fabrice Harari International WinDev, WebDev and WinDev mobile Consulting
NEW: WXReplication, your open source replication system is available on my web site!!! WXShowroom.com: Show your projects! More information on http://www.fabriceharari.com
On 1/21/2015 3:19 AM, Antonino wrote:
Hi guys,
I have the need to correctly split the following string coming from a csv file:
POO0132,MN2111AB,"02 - 03","167,66","391,44","61,15","405,95","61,97","174,35","73,27","250,93","0,84","0,85","0,35","20,42","1,04","0,21","0,57","0,75","0,7788","0,46","567,4",17,0,0,1,0,"0,34","1310,1","0,31","0,02",0,0,"7,54","5,76","41,48","44,81","0,08","-107,19","-107,4",-108,"23,96","2,52","26,48","8,6","12,5","118833,3","51100,8","0,02","0,01","0,05","0,93","0,58","0,16","0,7","0,03",4,5,"9,51","1,33","3105,1","445,1","118,7","3549,4","47,9"
As we can see the comma ( , ) is used as the separator for the columns but it also is used as the separator for the decimals. The columns where the numeric values are located are identified with the double quote. If I open this file in Microsoft Excel it automatically opens it in the correct way. Is there any built in function in Windev 19 that does the same thing? My need is to extract some specific values from this string but if I use ExtractString using the comma as separator I don't get the correct values (except the first two sub-strings).
I have written a small routine that converts the original string in a way that I can use ExtractString with no problems but I need to cycle on the whole string and this takes a very long time to parse the file because I have almost 100000 rows.
Any alternative way?
Kindest Regards, Antonino |
| |
| |
| | | |
|
| | |
| |
| Publicado el 22,enero 2015 - 09:36 |
Hi Fabrice,
thanks a lot for yourcare and for your idea. I'll try to implement it in hope it reduces the computational time.
Kindest Regards |
| |
| |
| | | |
|
| | |
| |
| Publicado el 22,enero 2015 - 10:58 |
Hi Antonino
I do not know how you implemented your process, so, just in case...
A mistake that lots of people are doing when it comes to working on lines of a text file is to use freadline to go through the file's lines...
It is orders of magnitude faster to do a floadtext, then fexatractstring for the line (on CR), then process the line
In the rare cases where the file is huge (too big to be loaded at once in memory), it is still worth it to load it by BIG chunks and process till the last line (no CR) then reload the next chunk
Best regards
-- Fabrice Harari International WinDev, WebDev and WinDev mobile Consulting
NEW: WXReplication, your open source replication system is available on my web site!!! WXShowroom.com: Show your projects! More information on http://www.fabriceharari.com
On 1/22/2015 3:36 AM, Antonino wrote:
Hi Fabrice,
thanks a lot for yourcare and for your idea. I'll try to implement it in hope it reduces the computational time.
Kindest Regards |
| |
| |
| | | |
|
| | |
| |
| Publicado el 23,enero 2015 - 10:07 |
Hi Fabrice,
once again thanks for your care.
In fact I am one of those who make the mistake you are talking about. At the moment my routine uses fReadLine to go through the file and read one line at a time. The file I need to parse is very big and I don't know how I can load in memory one chunk after the other. In this specific case another problem that I encountered while trying to implement your idea is that unfortunately in some columns (that have integer values) there are no double quote and the comma is used just to separate the column from the next one. That's the solution I am using to convert each line in a way that I can easily separate sub-strings:
sCurrentLineFormatted = "" sCurrentLine = fReadLine(nFileID)
nSeparatorsCounter = StringCount(sCurrentLine,",",IgnoreCase) + 1 FOR i = 1 TO nSeparatorsCounter sPortion = ExtractString(sCurrentLine,i,",",FromBeginning)
IF bToSkip = False THEN IF Modulo(StringCount(sPortion,"""",IgnoreCase),2) > 0 THEN sCurrentLineFormatted = sCurrentLineFormatted + "|" + sPortion sPortion = ExtractString(sCurrentLine,i+1,",",FromBeginning) sCurrentLineFormatted = sCurrentLineFormatted + "." + sPortion bToSkip = True ELSE IF sCurrentLineFormatted = "" THEN sCurrentLineFormatted = sPortion ELSE sCurrentLineFormatted = sCurrentLineFormatted + "|" + sPortion END END ELSE bToSkip = False END END sCurrentLineFormatted = Replace(sCurrentLineFormatted,",",".",IgnoreCase)
Do you see a fastest way to achieve the same result?
Kindest Regards, Antonino |
| |
| |
| | | |
|
| | |
| |
| Publicado el 23,enero 2015 - 11:45 |
Hi Antonino
believe me when I say that you are barking up the wrong tree... Your speed problem is the freadline, NOT the processing of the line itself...
What do you call very big? 100 000 rows, of 1000 bytes each, that would be 100 000 000 bytes, so 100 MB...
You can load 100 MB with a floadtext in a string without any problem, then a first loop to extract lines, then a second with your current line processing
You'll be amazed at the speed difference
Best regards
-- Fabrice Harari International WinDev, WebDev and WinDev mobile Consulting
NEW: WXReplication, your open source replication system is available on my web site!!! WXShowroom.com: Show your projects! More information on http://www.fabriceharari.com
On 1/23/2015 4:07 AM, Antonino wrote:
Hi Fabrice, once again thanks for your care. In fact I am one of those who make the mistake you are talking about. At the moment my routine uses fReadLine to go through the file and read one line at a time. The file I need to parse is very big and I don't know how I can load in memory one chunk after the other. In this specific case another problem that I encountered while trying to implement your idea is that unfortunately in some columns (that have integer values) there are no double quote and the comma is used just to separate the column from the next one. That's the solution I am using to convert each line in a way that I can easily separate sub-strings: sCurrentLineFormatted = "" sCurrentLine = fReadLine(nFileID)
nSeparatorsCounter = StringCount(sCurrentLine,",",IgnoreCase) + 1 FOR i = 1 TO nSeparatorsCounter sPortion = ExtractString(sCurrentLine,i,",",FromBeginning)
IF bToSkip = False THEN IF modulo(StringCount(sPortion,"""",IgnoreCase),2) > 0 THEN sCurrentLineFormatted = sCurrentLineFormatted + "|" + sPortion sPortion = ExtractString(sCurrentLine,i+1,",",FromBeginning) sCurrentLineFormatted = sCurrentLineFormatted + "." + sPortion bToSkip = True ELSE IF sCurrentLineFormatted = "" THEN sCurrentLineFormatted = sPortion ELSE sCurrentLineFormatted = sCurrentLineFormatted + "|" + sPortion END END ELSE bToSkip = False END END sCurrentLineFormatted = Replace(sCurrentLineFormatted,",",".",IgnoreCase)
Do you see a fastest way to achieve the same result? Kindest Regards, Antonino |
| |
| |
| | | |
|
| | |
| |
| Publicado el 23,enero 2015 - 15:55 |
Hi Fabrice,
actually the file I need to parse has 91000 rows and its size is, more or less, 45 MB. So according your post I may use fLoadText to load it in memory in one chunck. I'll give that a try and let you know.
Thanks again for your support.
Kindest Regards, Antonino |
| |
| |
| | | |
|
| | | | |
| | |
|