Discussion:
[PATCH, Fortran] pad char to int conversions with spaces instead of zeros (legacy)
Mark Eggleston
2018-12-04 14:47:25 UTC
Permalink
Here is a patch to considered for incorporation into gfortran adding to
its legacy support. It pads character to integer conversions using
spaces instead of zeros when enabled.

The pad character is 'undefined' or 'processor dependent' depending on
which standard you read. This makes it 0x20 which matches the Oracle
Fortran compiler.

Enabled using -fdec-pad-with-spaces and -fdec.

Please find attached a patch file and text files containing change log
entries for gcc/fortran and gcc/testsuite.

Note: I do not have write access so can not commit changes. Someone else
will have to commit the change if it is acceptable.

There are several more patches to support legacy Fortran which will be
sent in order.

regards,

Mark Eggleston
--
https://www.codethink.co.uk/privacy.html
Jakub Jelinek
2018-12-04 15:11:30 UTC
Permalink
Here is a patch to considered for incorporation into gfortran adding to its
legacy support. It pads character to integer conversions using spaces
instead of zeros when enabled.
The pad character is 'undefined' or 'processor dependent' depending on which
standard you read. This makes it 0x20 which matches the Oracle Fortran
compiler.
Trying fortran.godbolt.org, I think ifort pads this with spaces too.
Enabled using -fdec-pad-with-spaces and -fdec.
Just a couple of random comments.
-fdec-pad-with-spaces option name doesn't look right, because it doesn't say
what the option affects. So perhaps have transfer in the option name?
Wouldn't it be better to allow specifying whatever character you want to pad
with, so that users can choose something even different?
--- a/gcc/fortran/simplify.c
+++ b/gcc/fortran/simplify.c
@@ -7830,7 +7830,7 @@ gfc_simplify_transfer (gfc_expr *source, gfc_expr *mold, gfc_expr *size)
/* Allocate the buffer to store the binary version of the source. */
buffer_size = MAX (source_size, result_size);
buffer = (unsigned char*)alloca (buffer_size);
- memset (buffer, 0, buffer_size);
+ memset (buffer, (flag_dec_pad_with_spaces ? 0x20 : 0x0), buffer_size);
This affects just the simplification when the argument is a known constant.
Shouldn't we handle it also when it is not a constant? Like I've tried:
program test
integer(kind=8) :: a, b, c
character(len=4) :: e
a = transfer("ABCE", 1_8)
e = "ABCE"
b = transfer(e, 1_8)
c = transfer("ABCE ", 1_8)
print *, a, b, c
end
and for a the result is on little-endian indeed z'45434241', for b
the upper 32 bits are completely random:
D.3854 = 4;
D.3856 = 8;
_1 = MIN_EXPR <D.3856, D.3854>;
_2 = MAX_EXPR <_1, 0>;
_3 = (unsigned long) _2;
__builtin_memcpy (&transfer.0, &e, _3);
transfer.3_4 = transfer.0;
b = transfer.3_4;
and for c it is the padding with zeros I assume you want for -fdec.
So, what does Oracle fortran or ifort do for this b case above?
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/dec-pad-with-spaces-1.f90
@@ -0,0 +1,17 @@
+! { dg-do run }
+! { dg-options "-fdec-pad-with-spaces" }
+!
+
+program test
+ integer(kind=8) :: a
+ a = transfer("ABCE", 1_8)
+ ! If a has not been converted into big endian
+ ! or little endian integer it has failed.
+ if ((a.ne.int(z'4142434520202020',kind=8)).and. &
+ (a.ne.int(z'2020202045434241',kind=8))) then
The tests look too much big-endian vs. little-endian dependent and
ascii dependent. We have pdp-endian and doesn't s390x TPF use EBCDIC?

Wouldn't it be better to compare transfer("ABCE", 1_8) with transfer("ABCE ", 1_8)
?

Jakub
Fritz Reese
2018-12-04 17:04:40 UTC
Permalink
Post by Jakub Jelinek
Just a couple of random comments.
-fdec-pad-with-spaces option name doesn't look right, because it doesn't say
what the option affects. So perhaps have transfer in the option name?
[...]
Post by Jakub Jelinek
Wouldn't it be better to allow specifying whatever character you want to pad
with, so that users can choose something even different?
I concur with this. In that case some option like -ftransfer-pad-char=
may be a more appriopriate name, where -fdec may enable a default
transfer-pad-char of \x20 rather than \x00.
Post by Jakub Jelinek
--- a/gcc/fortran/simplify.c
+++ b/gcc/fortran/simplify.c
@@ -7830,7 +7830,7 @@ gfc_simplify_transfer (gfc_expr *source, gfc_expr *mold, gfc_expr *size)
[...]
Post by Jakub Jelinek
This affects just the simplification when the argument is a known constant.
Shouldn't we handle it also when it is not a constant?
Yes. Mark, you'll need to also patch iresolve.c (gfc_resolve_transfer)
to affect non-constant resolution.
Post by Jakub Jelinek
The tests look too much big-endian vs. little-endian dependent and
ascii dependent. We have pdp-endian and doesn't s390x TPF use EBCDIC?
Wouldn't it be better to compare transfer("ABCE", 1_8) with transfer("ABCE ", 1_8)
Agreed.

---
Fritz
Jerry DeLisle
2018-12-06 02:27:00 UTC
Permalink
Post by Fritz Reese
Post by Jakub Jelinek
Just a couple of random comments.
-fdec-pad-with-spaces option name doesn't look right, because it doesn't say
what the option affects. So perhaps have transfer in the option name?
[...]
Post by Jakub Jelinek
Wouldn't it be better to allow specifying whatever character you want to pad
with, so that users can choose something even different?
I concur with this. In that case some option like -ftransfer-pad-char=
may be a more appriopriate name, where -fdec may enable a default
transfer-pad-char of \x20 rather than \x00.
I disagree completely. I assume the idea of -fdec-pad-with-spaces is to
accomodate some old dec fortran code. The only reason to use some other
character is if someone is writing new dec fortran code, which is
implying encouraging people to be writing non standard conforming code.

Even if it is conforming in the sense that it is processor dependent you
are encouraging people to create new non portable code across compilers.
Please just stay consistent with Intel.

This whole padding business just stinks to me. There are better ways to
accomplish these results without using transfer to convert ascii strings
into bit patterns or whatever the heck some programmer is trying to do
here, like maybe use 'ABCE ' if thats what is really needed. And,
please everyone please quit fiddling with the compiler to fix problems
in the source code. Are people so lazy or such cheapskates they won't do
this the right way and update the damn source code if it needs to be used.

We have truly more serious and real problems/bugs in gfortran that
people should be spending the scarce resources on and not this junk.

Jerry
Jakub Jelinek
2018-12-06 10:33:12 UTC
Permalink
Post by Jerry DeLisle
I disagree completely. I assume the idea of -fdec-pad-with-spaces is to
accomodate some old dec fortran code. The only reason to use some other
character is if someone is writing new dec fortran code, which is implying
encouraging people to be writing non standard conforming code.
Even if it is conforming in the sense that it is processor dependent you are
encouraging people to create new non portable code across compilers. Please
just stay consistent with Intel.
So do you prefer to always use ' ' instead of '\0', or decide based on -fdec
without a separate option controlling that?

Jakub
Mark Eggleston
2018-12-06 10:54:07 UTC
Permalink
Post by Jakub Jelinek
Post by Jerry DeLisle
I disagree completely. I assume the idea of -fdec-pad-with-spaces is to
accomodate some old dec fortran code. The only reason to use some other
character is if someone is writing new dec fortran code, which is implying
encouraging people to be writing non standard conforming code.
I should have made this clear.
Post by Jakub Jelinek
Post by Jerry DeLisle
Even if it is conforming in the sense that it is processor dependent you are
encouraging people to create new non portable code across compilers. Please
just stay consistent with Intel.
I agree.
Post by Jakub Jelinek
So do you prefer to always use ' ' instead of '\0', or decide based on -fdec
without a separate option controlling that?
It would be simpler to dispense with the extra option and just set space
padding with -fdec. I think it would be odd to use something other than
a space or '\0' for padding.
Post by Jakub Jelinek
Jakub
Mark
--
https://www.codethink.co.uk/privacy.html
Jerry DeLisle
2018-12-07 01:58:21 UTC
Permalink
Post by Mark Eggleston
Post by Jakub Jelinek
Post by Jerry DeLisle
I disagree completely. I assume the idea of -fdec-pad-with-spaces is to
accomodate some old dec fortran code. The only reason to use some other
character is if someone is writing new dec fortran code, which is implying
encouraging people to be writing non standard conforming code.
I should have made this clear.
Post by Jakub Jelinek
Post by Jerry DeLisle
Even if it is conforming in the sense that it is processor dependent you are
encouraging people to create new non portable code across compilers. Please
just stay consistent with Intel.
I agree.
Post by Jakub Jelinek
So do you prefer to always use ' ' instead of '\0', or decide based on -fdec
without a separate option controlling that?
It would be simpler to dispense with the extra option and just set space
padding with -fdec. I think it would be odd to use something other than
a space or '\0' for padding.
Post by Jakub Jelinek
    Jakub
Mark
Agree with Mark on this.

Jerry
Jerry DeLisle
2018-12-07 01:57:21 UTC
Permalink
Post by Jakub Jelinek
Post by Jerry DeLisle
I disagree completely. I assume the idea of -fdec-pad-with-spaces is to
accomodate some old dec fortran code. The only reason to use some other
character is if someone is writing new dec fortran code, which is implying
encouraging people to be writing non standard conforming code.
Even if it is conforming in the sense that it is processor dependent you are
encouraging people to create new non portable code across compilers. Please
just stay consistent with Intel.
So do you prefer to always use ' ' instead of '\0', or decide based on -fdec
without a separate option controlling that?
Keep current default of '\0' and use ' ' when -fdec given.

Regards,

Jerry
Mark Eggleston
2018-12-06 10:20:35 UTC
Permalink
Post by Fritz Reese
Post by Jakub Jelinek
Just a couple of random comments.
-fdec-pad-with-spaces option name doesn't look right, because it doesn't say
what the option affects. So perhaps have transfer in the option name?
[...]
Post by Jakub Jelinek
Wouldn't it be better to allow specifying whatever character you want to pad
with, so that users can choose something even different?
I concur with this. In that case some option like -ftransfer-pad-char=
may be a more appriopriate name, where -fdec may enable a default
transfer-pad-char of \x20 rather than \x00.
How would the value be specified?

-ftransfer-pad-char=0x20
-ftransfer-pad-char=32
-ftransfer-pad-char=' '

If -fdec and -ftransfer-pad-char are both specified which should have
precedence?

For -fdec and -fno-dec both specified would I be correct in assuming 0x0
should be used?
Post by Fritz Reese
Post by Jakub Jelinek
--- a/gcc/fortran/simplify.c
+++ b/gcc/fortran/simplify.c
@@ -7830,7 +7830,7 @@ gfc_simplify_transfer (gfc_expr *source, gfc_expr *mold, gfc_expr *size)
[...]
Post by Jakub Jelinek
This affects just the simplification when the argument is a known constant.
Shouldn't we handle it also when it is not a constant?
Yes. Mark, you'll need to also patch iresolve.c (gfc_resolve_transfer)
to affect non-constant resolution.
Thanks for the hint.
Post by Fritz Reese
Post by Jakub Jelinek
The tests look too much big-endian vs. little-endian dependent and
ascii dependent. We have pdp-endian and doesn't s390x TPF use EBCDIC?
Wouldn't it be better to compare transfer("ABCE", 1_8) with transfer("ABCE ", 1_8)
Agreed.
---
Fritz
Thanks for the feedback.

Mark
--
https://www.codethink.co.uk/privacy.html
Jakub Jelinek
2018-12-10 17:12:29 UTC
Permalink
Post by Mark Eggleston
Post by Fritz Reese
Yes. Mark, you'll need to also patch iresolve.c (gfc_resolve_transfer)
to affect non-constant resolution.
Thanks for the hint.
I've looked at gfc_resolve_transfer regarding handling of padding when a
character variable is passed to transfer instead of a literal. This routine
is not called so can't be where a variable would be handled.
Don't yet know where to make the change.
I think you want to change gfc_conv_intrinsic_transfer
For the scalar case, which is the one for which I've posted testcase, there
is:
extent = fold_build2_loc (input_location, MIN_EXPR, gfc_array_index_type,
dest_word_len, source_bytes);
extent = fold_build2_loc (input_location, MAX_EXPR, gfc_array_index_type,
extent, gfc_index_zero_node);
and later:
tmp = build_call_expr_loc (input_location,
builtin_decl_explicit (BUILT_IN_MEMCPY), 3,
fold_convert (pvoid_type_node, tmp),
fold_convert (pvoid_type_node, ptr),
fold_convert (size_type_node, extent));
gfc_add_expr_to_block (&se->pre, tmp);
I guess you want to add after this, guarded with flag_dec only,
code to compare (at runtime) if extent < dest_word_len and if so,
use fill_with_spaces to pad it with spaces at the end (from
that first memcpy's argument + extent dest_word_len - extent bytes),
with a comment why you are doing it. Guess the array case will need
something similar. But, for these runtime checks, guess you should start by
looking what the other compilers are exactly doing for the different kinds
of transfers, if they pad for non-constants at all, if they only pad if
transfer is from a character object to whatever, etc.

Looking at ifort on godbolt
function foo (e)
integer(kind=8) :: foo
character(len=4) :: e
foo = transfer(e, 1_8)
end
I see it calls for_cpystr (..., 8, ..., 4, ...)
so I guess it uses standard library function to copy string into the
destination in that case, so essentially what gfc_trans_string_copy
does inline or our fstrcpy (internal library function).

Jakub
Fritz Reese
2018-12-10 19:43:59 UTC
Permalink
Post by Jakub Jelinek
Post by Mark Eggleston
Post by Fritz Reese
Yes. Mark, you'll need to also patch iresolve.c (gfc_resolve_transfer)
to affect non-constant resolution.
Thanks for the hint.
I've looked at gfc_resolve_transfer regarding handling of padding when a
character variable is passed to transfer instead of a literal. This routine
is not called so can't be where a variable would be handled.
Don't yet know where to make the change.
It actually is called through a function pointer
(gfc_intrinsic_sym->resolve) in intrinsic.c (resolve_intrinsic), as
with all intrinsics. You can find the backtrace by running f951
(`gfortran -print-prog-name=f951`) through gdb and setting a
breakpoint on gfc_resolve_transfer. That being said...
Post by Jakub Jelinek
I think you want to change gfc_conv_intrinsic_transfer
... in this case Jakub is right. I hadn't looked at the body of the
resolve function yet, but peeking at it tells me the transfer
expression doesn't contain support for padding information, so you'd
have to deal with it yourself in translation.
Post by Jakub Jelinek
I guess you want to add after this, guarded with flag_dec only,
code to compare (at runtime) if extent < dest_word_len and if so,
use fill_with_spaces to pad it with spaces at the end (from
that first memcpy's argument + extent dest_word_len - extent bytes),
with a comment why you are doing it. Guess the array case will need
something similar.
Alternatively you could call BUILT_IN_MEMSET(&tmpdecl, '\x20',
dest_word_len) just prior to the MEMCPY whenever flag_dec is set.

Fritz

Mark Eggleston
2018-12-06 10:14:26 UTC
Permalink
Post by Jakub Jelinek
Here is a patch to considered for incorporation into gfortran adding to its
legacy support. It pads character to integer conversions using spaces
instead of zeros when enabled.
The pad character is 'undefined' or 'processor dependent' depending on which
standard you read. This makes it 0x20 which matches the Oracle Fortran
compiler.
Trying fortran.godbolt.org, I think ifort pads this with spaces too.
Enabled using -fdec-pad-with-spaces and -fdec.
Just a couple of random comments.
-fdec-pad-with-spaces option name doesn't look right, because it doesn't say
what the option affects. So perhaps have transfer in the option name?
Wouldn't it be better to allow specifying whatever character you want to pad
with, so that users can choose something even different?
Fritz Reese agrees with you here and suggests -ftransfer-pad-char= with
-fdec setting it to 0x20.
Post by Jakub Jelinek
--- a/gcc/fortran/simplify.c
+++ b/gcc/fortran/simplify.c
@@ -7830,7 +7830,7 @@ gfc_simplify_transfer (gfc_expr *source, gfc_expr *mold, gfc_expr *size)
/* Allocate the buffer to store the binary version of the source. */
buffer_size = MAX (source_size, result_size);
buffer = (unsigned char*)alloca (buffer_size);
- memset (buffer, 0, buffer_size);
+ memset (buffer, (flag_dec_pad_with_spaces ? 0x20 : 0x0), buffer_size);
This affects just the simplification when the argument is a known constant.
program test
integer(kind=8) :: a, b, c
character(len=4) :: e
a = transfer("ABCE", 1_8)
e = "ABCE"
b = transfer(e, 1_8)
c = transfer("ABCE ", 1_8)
print *, a, b, c
end
and for a the result is on little-endian indeed z'45434241', for b
D.3854 = 4;
D.3856 = 8;
_1 = MIN_EXPR <D.3856, D.3854>;
_2 = MAX_EXPR <_1, 0>;
_3 = (unsigned long) _2;
__builtin_memcpy (&transfer.0, &e, _3);
transfer.3_4 = transfer.0;
b = transfer.3_4;
and for c it is the padding with zeros I assume you want for -fdec.
Clearly insufficient testing let this through. The padding should be
done for both literals and variables.
Post by Jakub Jelinek
So, what does Oracle fortran or ifort do for this b case above?
Don't have access to either of those compilers. It may be possible to
check the ifort compiler.
Post by Jakub Jelinek
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/dec-pad-with-spaces-1.f90
@@ -0,0 +1,17 @@
+! { dg-do run }
+! { dg-options "-fdec-pad-with-spaces" }
+!
+
+program test
+ integer(kind=8) :: a
+ a = transfer("ABCE", 1_8)
+ ! If a has not been converted into big endian
+ ! or little endian integer it has failed.
+ if ((a.ne.int(z'4142434520202020',kind=8)).and. &
+ (a.ne.int(z'2020202045434241',kind=8))) then
The tests look too much big-endian vs. little-endian dependent and
ascii dependent. We have pdp-endian and doesn't s390x TPF use EBCDIC?
I hadn't considered those.
Post by Jakub Jelinek
Wouldn't it be better to compare transfer("ABCE", 1_8) with transfer("ABCE ", 1_8)
?
That simplifies things.
Post by Jakub Jelinek
Jakub
Thanks for the feedback. I inherited this patch, my addition of
-fdec-pad-with-spaces and improved testing were insufficient.

Will resubmit the patch after taking these issues into account.

Mark.
--
https://www.codethink.co.uk/privacy.html
Loading...