Discussion:
read CSV data consistently from string or file
Browning, Robert S IV ERDC-RDE-GSL-MS CIV
2018-10-12 20:10:11 UTC
I need to read a text file that contains numerical data separated by keywords. Because of the keywords, I need to read each line as text first, test whether it is a keyword, and then call the appropriate subroutine based on the keyword value. Within the subroutines I also need to read each line as text first so I can stop the subroutine when the next keyword is found.
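
The read-as-text-then-dispatch pattern described above might look roughly like the following. This is only a sketch: the module keyword_reader, the routines is_keyword and read_block, and the rule "a keyword is a line whose first non-blank character is a letter" are all assumptions made for illustration, since the actual keyword syntax is not given here.

```fortran
module keyword_reader
  implicit none
contains

  ! Hypothetical keyword test: a line counts as a keyword when its first
  ! non-blank character is a letter. The real file format may differ.
  logical function is_keyword(text)
    character(*), intent(in) :: text
    character(1) :: c
    is_keyword = .false.
    if (len_trim(text) == 0) return
    c = adjustl(text)          ! keeps only the first non-blank character
    is_keyword = (c >= 'A' .and. c <= 'Z') .or. (c >= 'a' .and. c <= 'z')
  end function is_keyword

  ! Read lines from unit u until the next keyword or the end of the file.
  ! ndata    : number of data lines seen in this block
  ! next_key : keyword that ended the block (blank at end of file)
  ! iost     : 0 if a keyword ended the block, otherwise the read status
  subroutine read_block(u, ndata, next_key, iost)
    integer, intent(in) :: u
    integer, intent(out) :: ndata, iost
    character(*), intent(out) :: next_key
    character(200) :: text

    ndata = 0
    next_key = ''
    do
      read(u, '(A)', iostat=iost) text
      if (iost /= 0) exit
      if (is_keyword(text)) then
        next_key = adjustl(text)
        exit
      end if
      ! A real reader would parse the numbers out of text here.
      ndata = ndata + 1
    end do
  end subroutine read_block

end module keyword_reader
```

A caller would read the first keyword, select the matching subroutine, and let that subroutine hand back next_key so the dispatch loop can continue without re-reading the line.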
 
The numerical data can be in either fixed format or comma-separated-value (CSV) format. In the CSV data, a zero could be represented by a pair of commas with no number between them.
 
The issue I have encountered is that the read statement appears to behave differently when it is reading from a character string than when it is reading directly from the input file. If I were able to read directly from the input file, then a simple formatted read statement does everything I need. But if I am reading from a string, then things get a bit more complicated.
 
I have constructed an example program that demonstrates the inconsistencies. I have also run this program using Intel’s ifort compiler, where it works as I expected, which makes me think that this is a bug in gfortran.
 
I have copied the text of my code (read_csv.f90), an input file (input.txt), and my results from execution (results.txt) at the bottom of this email.
 
My code consists of three test cases:
 
Case 1: Reads values from the input file directly into the assigned variables. This works as I expected and is the behavior desired. However, in my actual program I can’t read directly from the input file because I need to first check for keywords.
 
Case 2: Reads a line into a character string variable, and then attempts to read from the string into the numerical variables in the same fashion as Case 1. This approach completely fails for all CSV data.
 
Case 3: Same as Case 2 but this approach explicitly calls a list-directed read command on the string if iostat /= 0. However, this still produces different results than Case 1. Here, instead of missing values being read in as zero, they are completely skipped and the initial value of the variable is unchanged. I can see value in both options, but it seems like the read statement should do the same thing regardless of where it’s reading from.
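
One possible way to get the Case 1 results from a string on any compiler is sketched below. It relies on two properties of list-directed input: a null value (two adjacent commas) leaves the corresponding variable unchanged, so presetting every variable to zero turns missing fields into zeros, and a slash terminates the read with all remaining items treated as null. The module csv_or_fixed and the routine parse_record are hypothetical names, and the rule "a record containing a comma is CSV" is an assumption that holds for the sample input but may not hold for every file.

```fortran
module csv_or_fixed
  implicit none
  integer, parameter :: dbl = selected_real_kind(p=14, r=99)
contains

  ! Parse one record that is either fixed format (I8,3ES16.8,2I8) or CSV.
  ! parse_record is a hypothetical helper, not part of the original program.
  subroutine parse_record(text, i1, r1, r2, r3, i2, i3, iost)
    character(*), intent(in) :: text
    integer, intent(out) :: i1, i2, i3
    real(dbl), intent(out) :: r1, r2, r3
    integer, intent(out) :: iost
    character(len=len(text)+2) :: buffer

    if (index(text, ',') > 0) then
      ! CSV record (assumption: only CSV records contain commas).
      ! Null values leave a variable unchanged, so preset all to zero.
      i1 = 0; i2 = 0; i3 = 0
      r1 = 0._dbl; r2 = 0._dbl; r3 = 0._dbl
      ! The trailing slash ends list-directed input; any items not yet
      ! read are treated as null, so short records do not hit end of file.
      buffer = trim(text) // ' /'
      read(buffer, *, iostat=iost) i1, r1, r2, r3, i2, i3
    else
      ! Fixed-format record: blank fields are read as zero.
      read(text, '(I8,3ES16.8,2I8)', iostat=iost) i1, r1, r2, r3, i2, i3
    end if
  end subroutine parse_record

end module csv_or_fixed
```

In the test program, the two reads from text in Case 3 could be replaced by a single call such as call parse_record(text, I1, R1, R2, R3, I2, I3, iost). The branch avoids the formatted read of comma-separated data entirely, so it does not depend on how a compiler treats a comma inside a fixed-width field.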
 
Here is the content contained in the attached files:
 
!main program => read_csv.f90 
program read_csv
implicit none
integer, parameter :: dbl = selected_real_kind(p=14, r=99)
integer :: iost=0
integer :: I1, I2, I3
real(dbl) :: R1, R2, R3
character(80) :: text
character(90) :: line="------------------------------------------------------------------------------------------"
 
!Character formats
1  format (A)
2  format (/A/)
3  format (A/)
4  format (/A)
!Combined format
10 format (I8,3ES16.8,2I8)
 
open (unit=1, status="old", file="input.txt", action="read")
 
print 4, line
print 3, "CASE 1"
print 3, "Read numbers from file into int/real variables:"
print 1, "Results:"
print 1, "      I1              R1              R2              R3      I2      I3"
read_loop_1: do
  I1=-99;       I2=-99;       I3=-99
  R1=-99._DBL;  R2=-99._DBL;  R3=-99._DBL
  read(1,10,iostat=iost) I1, R1, R2, R3, I2, I3
  if ( iost /= 0 ) then
    if ( iost < 0 ) then
      print 4, "End Of File Reached"
      print 3, line
    else if ( iost > 0 ) then
      print *, "ERROR reading in CASE 1:  iost = ", iost
    end if
    exit read_loop_1
  else
    print 10, I1, R1, R2, R3, I2, I3
  end if
end do read_loop_1
 
rewind (1)
 
print 4, line
print 3, "CASE 2"
print 3, "Read numbers from file into string and then read from string into int/real variables:"
print 1, "Results:"
print 1, "      I1              R1              R2              R3      I2      I3"
read_loop_2: do
  I1=-99;       I2=-99;       I3=-99
  R1=-99._DBL;  R2=-99._DBL;  R3=-99._DBL
  read(1,1,iostat=iost) text
  if ( iost /= 0 ) then
    if ( iost < 0 ) then
      print 4, "End Of File Reached"
      print 3, line
    else if ( iost > 0 ) then
      print *, "ERROR reading in CASE 2:  iost = ", iost
    end if
    exit read_loop_2
  else
    read(text,10,iostat=iost) I1, R1, R2, R3, I2, I3
    print 10, I1, R1, R2, R3, I2, I3
  end if
end do read_loop_2
 
rewind (1)
 
print 4, line
print 3, "CASE 3"
print 3, "Read numbers from file into string and then read from string into int/real variables:"
print 3, "Add condition that if formatted read fails, then do list-directed read using *"
print 1, "Results:"
print 1, "      I1              R1              R2              R3      I2      I3"
read_loop_3: do
  I1=-99;       I2=-99;       I3=-99
  R1=-99._DBL;  R2=-99._DBL;  R3=-99._DBL
  read(1,1,iostat=iost) text
  if ( iost /= 0 ) then
    if ( iost < 0 ) then
      print 4, "End Of File Reached"
      print 3, line
    else if ( iost > 0 ) then
      print *, "ERROR reading in CASE 3:  iost = ", iost
    end if
    exit read_loop_3
  else
    read(text,10,iostat=iost) I1, R1, R2, R3, I2, I3
    if ( iost /= 0 ) then
      read(text,*,iostat=iost) I1, R1, R2, R3, I2, I3
    end if
    print 10, I1, R1, R2, R3, I2, I3
  end if
end do read_loop_3
 
print 2, "NORMAL TERMINATION"
 
close (1)
end program read_csv
 
!input data in input.txt
       1  1.00000000E+00  2.00000000E+00  3.00000000E+00       7       7
       2  1.00000000E+00                  3.00000000E+00               7
          1.00000000E+00                  3.00000000E+00
101,1.,2.,3.,7,7
102,1.,,3.,,7
,1.,,3.,,
 
!results using gfortran
browning$ gfortran read_csv.f90
browning$ ./a.out
 
------------------------------------------------------------------------------------------
CASE 1
 
Read numbers from file into int/real variables:
 
Results:
      I1              R1              R2              R3      I2      I3
       1  1.00000000E+00  2.00000000E+00  3.00000000E+00       7       7
       2  1.00000000E+00  0.00000000E+00  3.00000000E+00       0       7
       0  1.00000000E+00  0.00000000E+00  3.00000000E+00       0       0
     101  1.00000000E+00  2.00000000E+00  3.00000000E+00       7       7
     102  1.00000000E+00  0.00000000E+00  3.00000000E+00       0       7
       0  1.00000000E+00  0.00000000E+00  3.00000000E+00       0       0
 
End Of File Reached
------------------------------------------------------------------------------------------
 
 
------------------------------------------------------------------------------------------
CASE 2
 
Read numbers from file into string and then read from string into int/real variables:
 
Results:
      I1              R1              R2              R3      I2      I3
       1  1.00000000E+00  2.00000000E+00  3.00000000E+00       7       7
       2  1.00000000E+00  0.00000000E+00  3.00000000E+00       0       7
       0  1.00000000E+00  0.00000000E+00  3.00000000E+00       0       0
     -99 -9.90000000E+01 -9.90000000E+01 -9.90000000E+01     -99     -99
     -99 -9.90000000E+01 -9.90000000E+01 -9.90000000E+01     -99     -99
     -99 -9.90000000E+01 -9.90000000E+01 -9.90000000E+01     -99     -99
 
End Of File Reached
------------------------------------------------------------------------------------------
 
 
------------------------------------------------------------------------------------------
CASE 3
 
Read numbers from file into string and then read from string into int/real variables:
 
Add condition that if formatted read fails, then do list-directed read using *
 
Results:
      I1              R1              R2              R3      I2      I3
       1  1.00000000E+00  2.00000000E+00  3.00000000E+00       7       7
       2  1.00000000E+00  0.00000000E+00  3.00000000E+00       0       7
       0  1.00000000E+00  0.00000000E+00  3.00000000E+00       0       0
     101  1.00000000E+00  2.00000000E+00  3.00000000E+00       7       7
     102  1.00000000E+00 -9.90000000E+01  3.00000000E+00     -99       7
     -99  1.00000000E+00 -9.90000000E+01  3.00000000E+00     -99     -99
 
End Of File Reached
------------------------------------------------------------------------------------------
 
 
NORMAL TERMINATION
 
!results using ifort
browning$ ifort read_csv.f90
browning$ ./a.out
 
------------------------------------------------------------------------------------------
CASE 1
 
Read numbers from file into int/real variables:
 
Results:
      I1              R1              R2              R3      I2      I3
       1  1.00000000E+00  2.00000000E+00  3.00000000E+00       7       7
       2  1.00000000E+00  0.00000000E+00  3.00000000E+00       0       7
       0  1.00000000E+00  0.00000000E+00  3.00000000E+00       0       0
     101  1.00000000E+00  2.00000000E+00  3.00000000E+00       7       7
     102  1.00000000E+00  0.00000000E+00  3.00000000E+00       0       7
       0  1.00000000E+00  0.00000000E+00  3.00000000E+00       0       0
 
End Of File Reached
------------------------------------------------------------------------------------------
 
 
------------------------------------------------------------------------------------------
CASE 2
 
Read numbers from file into string and then read from string into int/real variables:
 
Results:
      I1              R1              R2              R3      I2      I3
       1  1.00000000E+00  2.00000000E+00  3.00000000E+00       7       7
       2  1.00000000E+00  0.00000000E+00  3.00000000E+00       0       7
       0  1.00000000E+00  0.00000000E+00  3.00000000E+00       0       0
     101  1.00000000E+00  2.00000000E+00  3.00000000E+00       7       7
     102  1.00000000E+00  0.00000000E+00  3.00000000E+00       0       7
       0  1.00000000E+00  0.00000000E+00  3.00000000E+00       0       0
 
End Of File Reached
------------------------------------------------------------------------------------------
 
 
------------------------------------------------------------------------------------------
CASE 3
 
Read numbers from file into string and then read from string into int/real variables:
 
Add condition that if formatted read fails, then do list-directed read using *
 
Results:
      I1              R1              R2              R3      I2      I3
       1  1.00000000E+00  2.00000000E+00  3.00000000E+00       7       7
       2  1.00000000E+00  0.00000000E+00  3.00000000E+00       0       7
       0  1.00000000E+00  0.00000000E+00  3.00000000E+00       0       0
     101  1.00000000E+00  2.00000000E+00  3.00000000E+00       7       7
     102  1.00000000E+00  0.00000000E+00  3.00000000E+00       0       7
       0  1.00000000E+00  0.00000000E+00  3.00000000E+00       0       0
 
End Of File Reached
------------------------------------------------------------------------------------------
 
 
NORMAL TERMINATION
 
 
 
 
-Bob
 
___________________________________________________
Robert S. Browning IV
Research Structural Engineer
Structural Mechanics Branch
Geotechnical and Structures Laboratory
U.S. Army Engineer Research and Development Center
3909 Halls Ferry Road, Vicksburg, MS  39180-6199
ATTN: CEERD-GSM, BLDG 5008
 
Office:  601-634-5228
iPhone:  601-630-7276
Fax:  601-634-2309 (email preferred)
Email:  mailto:***@usace.army.mil
Jerry DeLisle
2018-10-12 23:52:54 UTC
Hi Robert,

This may be a known bug. It sounds familiar. Let me do some checking.

Fortranners, if I confirm this is the bug I think it is, I think I can
squeeze some time to fix it.

Jerry
Jerry DeLisle
2018-10-13 16:23:14 UTC
I am fairly certain this is PR78351. As far as we can tell, the code is
non-conforming Fortran, but using a comma to terminate a read has been
around forever, and gfortran supported it before. I broke it while
working to improve string I/O performance, so I feel obligated to set
it straight.

Jerry
Jerry DeLisle
2018-10-13 19:43:09 UTC
The patch attached to PR78351 fixes this and regression tests cleanly.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78351

I am wondering if I should put the check behind some condition related
to the standards to avoid the extra string scan or just leave it as is
in the patch.

Regards,

Jerry
Bernhard Reutner-Fischer
2018-10-14 10:02:20 UTC
Post by Jerry DeLisle
The patch attached to pr78351 fixes this and regression tests cleanly.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78351
I am wondering if I should put the check behind some condition related
to the standards to avoid the extra string scan or just leave it as is
in the patch.
Please do the check only for compile_options.allow_std & GFC_STD_LEGACY

Could you use memchr to look for the comma, eventually?

TIA
