Discussion:
[PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate
Kyrill Tkachov
2018-07-17 12:35:15 UTC
Hi all,

This is my first Fortran patch, so apologies if I'm missing something.
The current expansion of the min and max intrinsics explicitly expands
the comparisons between each argument to calculate the global min/max.
Some targets, like aarch64, have instructions that can calculate the min/max
of two real (floating-point) numbers with the proper NaN-handling semantics
(if both inputs are NaN, return NaN. If one is NaN, return the other) and those
are the semantics provided by the __builtin_fmin/max family of functions that expand
to these instructions.

This patch makes the frontend emit __builtin_fmin/max directly to compare each
pair of numbers when the numbers are floating-point, and use MIN_EXPR/MAX_EXPR otherwise
(integral types and -ffast-math) which should hopefully be easier to recognise in the
midend and optimise. The previous approach of generating the open-coded version of that
is used when we don't have an appropriate __builtin_fmin/max available.
For example, for a configuration of x86_64-unknown-linux-gnu that I tested there was no
128-bit __builtin_fminl available.

With this patch I'm seeing more than 7000 FMINNM/FMAXNM instructions being generated at -O3
on aarch64 for 521.wrf from fprate SPEC2017 where none before were generated
(we were generating explicit comparisons and NaN checks). This gave a 2.4% improvement
in performance on a Cortex-A72.

Bootstrapped and tested on aarch64-none-linux-gnu and x86_64-unknown-linux-gnu.

Ok for trunk?
Thanks,
Kyrill

2018-07-17 Kyrylo Tkachov <***@arm.com>

* f95-lang.c (gfc_init_builtin_functions): Define __builtin_fmin,
__builtin_fminf, __builtin_fminl, __builtin_fmax, __builtin_fmaxf,
__builtin_fmaxl.
* trans-intrinsic.c: Include builtins.h.
(gfc_conv_intrinsic_minmax): Emit __builtin_fmin/max or MIN/MAX_EXPR
functions to calculate the min/max.

2018-07-17 Kyrylo Tkachov <***@arm.com>

* gfortran.dg/max_fmaxf.f90: New test.
* gfortran.dg/min_fminf.f90: Likewise.
* gfortran.dg/minmax_integer.f90: Likewise.
* gfortran.dg/max_fmaxl_aarch64.f90: Likewise.
* gfortran.dg/min_fminl_aarch64.f90: Likewise.
Richard Biener
2018-07-17 13:27:25 UTC
On Tue, Jul 17, 2018 at 2:35 PM Kyrill Tkachov
Post by Kyrill Tkachov
Hi all,
This is my first Fortran patch, so apologies if I'm missing something.
The current expansion of the min and max intrinsics explicitly expands
the comparisons between each argument to calculate the global min/max.
Some targets, like aarch64, have instructions that can calculate the min/max
of two real (floating-point) numbers with the proper NaN-handling semantics
(if both inputs are NaN, return NaN. If one is NaN, return the other) and those
are the semantics provided by the __builtin_fmin/max family of functions that expand
to these instructions.
This patch makes the frontend emit __builtin_fmin/max directly to compare each
pair of numbers when the numbers are floating-point, and use MIN_EXPR/MAX_EXPR otherwise
(integral types and -ffast-math) which should hopefully be easier to recognise in the
What is Fortran's requirement on min/max intrinsics? Doesn't it only
require things that
are guaranteed by MIN/MAX_EXPR anyways? The only restriction here is

/* Minimum and maximum values. When used with floating point, if both
operands are zeros, or if either operand is NaN, then it is unspecified
which of the two operands is returned as the result. */

which means MIN/MAX_EXPR are not strictly IEEE compliant with signed
zeros or NaNs.
Thus the correct test would be !HONOR_SIGNED_ZEROS && !HONOR_NANS if signed
zeros are significant.

I'm not sure if using fmin/max calls when we cannot use MIN/MAX_EXPR
is a good idea,
this may both generate bigger code and be slower.

Richard.
Kyrill Tkachov
2018-07-17 13:46:13 UTC
Hi Richard,
Post by Richard Biener
On Tue, Jul 17, 2018 at 2:35 PM Kyrill Tkachov
Post by Kyrill Tkachov
Hi all,
This is my first Fortran patch, so apologies if I'm missing something.
The current expansion of the min and max intrinsics explicitly expands
the comparisons between each argument to calculate the global min/max.
Some targets, like aarch64, have instructions that can calculate the min/max
of two real (floating-point) numbers with the proper NaN-handling semantics
(if both inputs are NaN, return NaN. If one is NaN, return the other) and those
are the semantics provided by the __builtin_fmin/max family of functions that expand
to these instructions.
This patch makes the frontend emit __builtin_fmin/max directly to compare each
pair of numbers when the numbers are floating-point, and use MIN_EXPR/MAX_EXPR otherwise
(integral types and -ffast-math) which should hopefully be easier to recognise in the
What is Fortran's requirement on min/max intrinsics? Doesn't it only
require things that
are guaranteed by MIN/MAX_EXPR anyways? The only restriction here is
The current implementation expands to:
mvar = a1;
if (a2 .op. mvar || isnan (mvar))
mvar = a2;
if (a3 .op. mvar || isnan (mvar))
mvar = a3;
...
return mvar;

That is, if one of the operands is a NaN it will return the other argument.
If both (all) are NaNs, it will return NaN. This is the same as the semantics of fmin/max
as far as I can tell.
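
A minimal C sketch of that open-coded sequence, written as a loop over the arguments with ".op." instantiated as ">" for MAX. The function name open_coded_max is made up for illustration; this is not the code gfortran emits.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Open-coded MAX over n >= 1 values, mirroring the pseudocode above.
   A NaN held in mvar is replaced by the next argument; a NaN argument
   never replaces a number because "NaN > x" is false.
   The name open_coded_max is hypothetical.  */
double open_coded_max(const double *args, size_t n)
{
    double mvar = args[0];

    for (size_t i = 1; i < n; i++)
        if (args[i] > mvar || isnan(mvar))
            mvar = args[i];
    return mvar;
}

/* Small usage example.  */
void check_open_coded_max(void)
{
    const double with_nan[] = { NAN, 2.0, 1.0 };
    assert(open_coded_max(with_nan, 3) == 2.0);
}
```

With all arguments NaN the result stays NaN, matching the description above.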
Post by Richard Biener
/* Minimum and maximum values. When used with floating point, if both
operands are zeros, or if either operand is NaN, then it is unspecified
which of the two operands is returned as the result. */
which means MIN/MAX_EXPR are not strictly IEEE compliant with signed
zeros or NaNs.
Thus the correct test would be !HONOR_SIGNED_ZEROS && !HONOR_NANS if signed
zeros are significant.
True, MIN/MAX_EXPR would not be appropriate in that condition. I guarded their use
on !HONOR_NANS (type) only. I'll update it to !HONOR_SIGNED_ZEROS (type) && !HONOR_NANS (type).
Post by Richard Biener
I'm not sure if using fmin/max calls when we cannot use MIN/MAX_EXPR
is a good idea,
this may both generate bigger code and be slower.
The patch will generate fmin/fmax calls (or the fminf,fminl variants) when mathfn_built_in advertises
them as available (does that mean they'll have a fast inline implementation?)

If the above doesn't hold and we can't use either MIN/MAX_EXPR or fmin/fmax then the patch falls back
to the existing expansion.

FWIW, this patch does improve performance on 521.wrf from SPEC2017 on aarch64.

Thanks,
Kyrill
Thomas Koenig
2018-07-17 15:36:59 UTC
Hi Kyrill,
    mvar = a1;
    if (a2 .op. mvar || isnan (mvar))
      mvar = a2;
    if (a3 .op. mvar || isnan (mvar))
      mvar = a3;
    ...
    return mvar;
That is, if one of the operands is a NaN it will return the other argument.
If both (all) are NaNs, it will return NaN. This is the same as the semantics of fmin/max
as far as I can tell.
I've looked at the F2008 standard, and, interestingly enough, the
requirements on MIN and MAX do not mention NaNs at all. 13.7.106
has, for MAX,

Result Value. The value of the result is that of the largest argument.

plus some stuff about character variables (not relevant here). Similar
for MIN.

Also, the section on IEEE_ARITHMETIC (14.9) does not mention
comparisons; also, "Complete conformance with IEC 60559:1989 is not
required", what is required is the correct support for +,-, and *,
plus support for / if IEEE_SUPPORT_DIVIDE is covered.

So, the Fortran standard does not impose many requirements. I do think
that a patch such as yours should not change the current behavior unless
we know what it does and do think it is a good idea. Hmm...

Having said that, I think we pretty much cover all the corner cases
in nan_1.f90, so if that test passes without regression, then that
aspect should be fine.

Question: You have found an advantage on AArch64. Do you have
access to other architectures to see if there is also a speed
advantage, or maybe a disadvantage?

Regards

Thomas
Kyrill Tkachov
2018-07-17 16:16:20 UTC
Hi Thomas,
Post by Thomas Koenig
Hi Kyrill,
Post by Kyrill Tkachov
mvar = a1;
if (a2 .op. mvar || isnan (mvar))
mvar = a2;
if (a3 .op. mvar || isnan (mvar))
mvar = a3;
...
return mvar;
That is, if one of the operands is a NaN it will return the other argument.
If both (all) are NaNs, it will return NaN. This is the same as the semantics of fmin/max
as far as I can tell.
I've looked at the F2008 standard, and, interestingly enough, the
requirements on MIN and MAX do not mention NaNs at all. 13.7.106
has, for MAX,
Result Value. The value of the result is that of the largest argument.
plus some stuff about character variables (not relevant here). Similar
for MIN.
Also, the section on IEEE_ARITHMETIC (14.9) does not mention
comparisons; also, "Complete conformance with IEC 60559:1989 is not
required", what is required is the correct support for +,-, and *,
plus support for / if IEEE_SUPPORT_DIVIDE is covered.
Thanks for checking this.
Post by Thomas Koenig
So, the Fortran standard does not impose many requirements. I do think
that a patch such as yours should not change the current behavior unless
we know what it does and do think it is a good idea. Hmm...
Having said that, I think we pretty much cover all the corner cases
in nan_1.f90, so if that test passes without regression, then that
aspect should be fine.
Looking at the test it looks like there is a de facto expected behaviour.
For example it contains:
if (max(2.d0, nan) /= 2.d0) STOP 9

So it definitely expects comparison with NaN to return the non-NaN result,
which is the behaviour that my patch preserves.

On integral arguments or when we don't care about NaNs (-Ofast and such) we'll be using
the MIN/MAX_EXPR, which doesn't specify what's returned on a NaN argument, thus allowing
for more aggressive optimisations.
Post by Thomas Koenig
Question: You have found an advantage on AArch64. Do you have
access to other architectures to see if there is also a speed
advantage, or maybe a disadvantage?
Because the expansion now emits straightline code rather than conditionals and branches
it should be easier to optimise in general, so I'd expect this to be an improvement overall.
That said, I have benchmarked it on SPEC2017 on aarch64.

If you (or somebody else) can run any benchmarks of interest on a target that you
care about, I would be very grateful for any results.

Thanks,
Kyrill
Thomas Koenig
2018-07-17 17:42:03 UTC
Hi Kyrill,
Post by Kyrill Tkachov
Because the expansion now emits straightline code rather than
conditionals and branches
it should be easier to optimise in general, so I'd expect this to be an
improvement overall.
That said, I have benchmarked it on SPEC2017 on aarch64.
If you (or somebody else) can run any benchmarks of interest on a target that you
care about, I would be very grateful for any results.
Well, most people currently use x86_64 for scientific computing, so I
would be concerned most about this architecture. As for the test case,
min / max performance clearly has an effect on 521.wrf, so this would
be ideal.

If you could run 521.wrf on x86_64, and find that it does not
regress measurably (or even shows an improvement), the patch is OK.
I'd be interested in the timings you get.

Regards

Thomas
Janne Blomqvist
2018-07-17 20:06:19 UTC
Post by Thomas Koenig
Hi Kyrill,
Post by Kyrill Tkachov
mvar = a1;
if (a2 .op. mvar || isnan (mvar))
mvar = a2;
if (a3 .op. mvar || isnan (mvar))
mvar = a3;
...
return mvar;
That is, if one of the operands is a NaN it will return the other argument.
If both (all) are NaNs, it will return NaN. This is the same as the semantics of fmin/max
as far as I can tell.
I've looked at the F2008 standard, and, interestingly enough, the
requirements on MIN and MAX do not mention NaNs at all. 13.7.106
has, for MAX,
Result Value. The value of the result is that of the largest argument.
plus some stuff about character variables (not relevant here). Similar
for MIN.
FWIW, this has not changed in the latest(?) draft for F2018 (N2146), see
16.9.125.

Also, the section on IEEE_ARITHMETIC (14.9) does not mention
Post by Thomas Koenig
comparisons; also, "Complete conformance with IEC 60559:1989 is not
required", what is required is the correct support for +,-, and *,
plus support for / if IEEE_SUPPORT_DIVIDE is covered.
Interestingly, here the F2018 draft has new intrinsics in the
IEEE_ARITHMETIC module, IEEE_MAX_NUM, IEEE_MAX_NUM_MAG, IEEE_MIN_NUM,
IEEE_MIN_NUM_MAG. These correspond to the {max,min}num{,_mag} operations in
IEEE 754-2008, which AFAICT has the same NaN semantics as __builtin_fmax
etc.
Post by Thomas Koenig
So, the Fortran standard does not impose many requirements.
If so, why don't we just use {MAX,MIN}_EXPR unconditionally? Those who
worry about the behavior wrt. NaNs, infinities etc. can use the intrinsics
from IEEE_ARITHMETIC?


This thread also has some interesting discussion on the topic:
https://github.com/JuliaLang/julia/issues/7866
--
Janne Blomqvist
Janne Blomqvist
2018-07-17 20:35:42 UTC
Post by Janne Blomqvist
Post by Thomas Koenig
Hi Kyrill,
Post by Kyrill Tkachov
mvar = a1;
if (a2 .op. mvar || isnan (mvar))
mvar = a2;
if (a3 .op. mvar || isnan (mvar))
mvar = a3;
...
return mvar;
That is, if one of the operands is a NaN it will return the other argument.
If both (all) are NaNs, it will return NaN. This is the same as the
semantics of fmin/max
as far as I can tell.
I've looked at the F2008 standard, and, interestingly enough, the
requirements on MIN and MAX do not mention NaNs at all. 13.7.106
has, for MAX,
Result Value. The value of the result is that of the largest argument.
plus some stuff about character variables (not relevant here). Similar
for MIN.
FWIW, this has not changed in the latest(?) draft for F2018 (N2146), see
16.9.125.
Also, the section on IEEE_ARITHMETIC (14.9) does not mention
Post by Thomas Koenig
comparisons; also, "Complete conformance with IEC 60559:1989 is not
required", what is required is the correct support for +,-, and *,
plus support for / if IEEE_SUPPORT_DIVIDE is covered.
Interestingly, here the F2018 draft has new intrinsics in the
IEEE_ARITHMETIC module, IEEE_MAX_NUM, IEEE_MAX_NUM_MAG, IEEE_MIN_NUM,
IEEE_MIN_NUM_MAG. These correspond to the {max,min}num{,_mag} operations in
IEEE 754-2008, which AFAICT has the same NaN semantics as __builtin_fmax
etc.
Post by Thomas Koenig
So, the Fortran standard does not impose many requirements.
If so, why don't we just use {MAX,MIN}_EXPR unconditionally? Those who
worry about the behavior wrt. NaNs, infinities etc. can use the intrinsics
from IEEE_ARITHMETIC?
https://github.com/JuliaLang/julia/issues/7866
Oh, and on http://754r.ucbtest.org/ there is information about the next
update after IEEE 754-2008. In particular,
http://754r.ucbtest.org/changes.html notes that the above mentioned
{max,min}num{,_mag} have been deleted, and "new
{min,max}imum{,Number,Magnitude,MagnitudeNumber} operations are
recommended; NaN and signed zero handling are changed from 754-2008 5.3.1."
--
Janne Blomqvist
Kyrill Tkachov
2018-07-18 11:17:52 UTC
Hi all,

Thank you for the feedback so far.
This version of the patch doesn't try to emit fmin/fmax function calls but instead
emits MIN/MAX_EXPR sequences unconditionally.
I think a source of confusion in the original proposal (for me at least) was
that on aarch64 (that I primarily work on) we implement the fmin/fmax optabs
and therefore these calls are expanded to a single instruction.
But on x86_64 these optabs are not implemented and therefore expand to actual library calls.
Therefore at -O3 (no -ffast-math) I saw a gain on aarch64. But I measured today
on x86_64 and saw a regression.

Thomas and Janne suggested that the Fortran standard does not impose a requirement
on NaN handling for the min/max intrinsics, which would make emitting MIN/MAX_EXPR
sequences unconditionally a valid approach.

However, the gfortran.dg/nan_1.f90 test checks that handling of NaN values in
these intrinsics follows the IEEE semantics (min (nan, 2.0) == 2.0, for example).
This is not required by the standard, but is the existing gfortran behaviour.

If we end up always emitting MIN/MAX_EXPR sequences, like this version of the patch does,
then that test fails on some configurations of x86_64 and not others (for me it FAILs
by default, but passes with -march=native on my machine) and passes on AArch64.
This is expected since MIN/MAX_EXPR doesn't enforce IEEE behaviour on its arguments.

However, by always emitting MIN/MAX_EXPR the gfc_conv_intrinsic_minmax function is
simplified and, perhaps more importantly, generates faster code in the -O3 case.
With this patch I see performance improvement on 521.wrf on both AArch64 (3.7%)
and x86_64 (5.4%).

Thomas, Janne, would this relaxation of NaN handling be acceptable given the benefits
mentioned above? If so, what would be the recommended adjustment to the nan_1.f90 test?

Thanks,
Kyrill

2018-07-18 Kyrylo Tkachov <***@arm.com>

* trans-intrinsic.c: (gfc_conv_intrinsic_minmax): Emit MIN_MAX_EXPR
sequence to calculate the min/max.

2018-07-18 Kyrylo Tkachov <***@arm.com>

* gfortran.dg/max_float.f90: New test.
* gfortran.dg/min_float.f90: Likewise.
* gfortran.dg/minmax_integer.f90: Likewise.
Thomas König
2018-07-18 13:26:04 UTC
Hi Kyrill,
Post by Kyrill Tkachov
Thomas, Janne, would this relaxation of NaN handling be acceptable given the benefits
mentioned above? If so, what would be the recommended adjustment to the nan_1.f90 test?
I would be a bit careful about changing behavior in such a major way. What would the results with NaN and infinity then be, with or without optimization? Would the results be consistent with min(nan,num) vs min(num,nan)? Would they be consistent with the new IEEE standard?

In general, I think that min(nan,num) should be nan and that our current behavior is not the best.

Does anybody have data points on how this is handled by other compilers?

Oh, and if anything is changed, then compile and runtime behavior should always be the same.

Regards, Thomas
Kyrill Tkachov
2018-07-18 14:03:19 UTC
Post by Thomas König
Hi Kyrill,
Post by Kyrill Tkachov
Thomas, Janne, would this relaxation of NaN handling be acceptable given the benefits
mentioned above? If so, what would be the recommended adjustment to the nan_1.f90 test?
I would be a bit careful about changing behavior in such a major way. What would the results with NaN and infinity then be, with or without optimization? Would the results be consistent with min(nan,num) vs min(num,nan)? Would they be consistent with the new IEEE standard?
In general, I think that min(nan,num) should be nan and that our current behavior is not the best.
Does anybody have data points on how this is handled by other compilers?
Oh, and if anything is changed, then compile and runtime behavior should always be the same.
Thanks, that makes it clearer what behaviour is acceptable.

So this v3 patch follows Richard Sandiford's suggested approach of emitting IFN_FMIN/FMAX
when dealing with floating-point values and NaN handling is important and the target
supports the IFN_FMIN/FMAX. Otherwise the current explicit comparison sequence is emitted.
For integer types and -ffast-math floating-point it will emit MIN/MAX_EXPR.
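
The selection logic described in this paragraph can be sketched in plain C as below. The enum, the function name and the three boolean parameters are hypothetical stand-ins for the checks made in trans-intrinsic.c; this is not the actual GCC code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three expansions discussed in the thread.  */
typedef enum {
    USE_MINMAX_EXPR,   /* MIN_EXPR/MAX_EXPR */
    USE_IFN_FMINMAX,   /* IFN_FMIN/IFN_FMAX, expanded inline by the target */
    USE_OPEN_CODED     /* existing explicit comparison + isnan sequence */
} minmax_strategy;

/* is_float:             the arguments have a floating-point type
   nan_handling_matters: NaNs must be honoured (no -ffast-math)
   target_has_ifn:       the target can expand IFN_FMIN/FMAX inline
   All three parameters are assumptions made for this sketch.  */
minmax_strategy choose_minmax_strategy(bool is_float,
                                       bool nan_handling_matters,
                                       bool target_has_ifn)
{
    if (!is_float || !nan_handling_matters)
        return USE_MINMAX_EXPR;
    if (target_has_ifn)
        return USE_IFN_FMINMAX;
    return USE_OPEN_CODED;
}

/* Small usage example: integer arguments always take MIN/MAX_EXPR.  */
void check_strategy(void)
{
    assert(choose_minmax_strategy(false, true, false) == USE_MINMAX_EXPR);
}
```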

With this patch the nan_1.f90 behaviour is preserved on all targets, we get the optimal
sequence on aarch64 and on x86_64 we avoid the function call, with no changes in code generation.

This gives the performance improvement on 521.wrf on aarch64 and leaves it unchanged on x86_64.

I'm hoping this addresses all the concerns raised in this thread:
* The NaN-handling behaviour is unchanged on all platforms.
* The fast inline sequence is emitted where it is available.
* No calls to library fmin*/fmax* are emitted where there were none.
* MIN/MAX_EXPR sequences are emitted where possible.

Is this acceptable?

Thanks,
Kyrill

2018-07-18 Kyrylo Tkachov <***@arm.com>

* trans-intrinsic.c: (gfc_conv_intrinsic_minmax): Emit MIN_MAX_EXPR
or IFN_FMIN/FMAX sequence to calculate the min/max when possible.

2018-07-18 Kyrylo Tkachov <***@arm.com>

* gfortran.dg/max_fmaxl_aarch64.f90: New test.
* gfortran.dg/min_fminl_aarch64.f90: Likewise.
* gfortran.dg/minmax_integer.f90: Likewise.
Janne Blomqvist
2018-07-18 14:55:30 UTC
Post by Kyrill Tkachov
Post by Thomas König
Hi Kyrill,
On 18.07.2018 at 13:17, Kyrill Tkachov wrote:
Post by Kyrill Tkachov
Thomas, Janne, would this relaxation of NaN handling be acceptable given the benefits
mentioned above? If so, what would be the recommended adjustment to the nan_1.f90 test?
I would be a bit careful about changing behavior in such a major way.
What would the results with NaN and infinity then be, with or without
optimization? Would the results be consistent with min(nan,num) vs
min(num,nan)? Would they be consistent with the new IEEE standard?
In general, I think that min(nan,num) should be nan and that our current
behavior is not the best.
Does anybody have data points on how this is handled by other compilers?
Oh, and if anything is changed, then compile and runtime behavior should
always be the same.
Thanks, that makes it clearer what behaviour is acceptable.
So this v3 patch follows Richard Sandiford's suggested approach of emitting IFN_FMIN/FMAX
when dealing with floating-point values and NaN handling is important and the target
supports the IFN_FMIN/FMAX. Otherwise the current explicit comparison sequence is emitted.
For integer types and -ffast-math floating-point it will emit MIN/MAX_EXPR.
With this patch the nan_1.f90 behaviour is preserved on all targets, we get the optimal
sequence on aarch64 and on x86_64 we avoid the function call, with no
changes in code generation.
This gives the performance improvement on 521.wrf on aarch64 and leaves it
unchanged on x86_64.
* The NaN-handling behaviour is unchanged on all platforms.
* The fast inline sequence is emitted where it is available.
* No calls to library fmin*/fmax* are emitted where there were none.
* MIN/MAX_EXPR sequences are emitted where possible.
Is this acceptable?
So if I understand it correctly, the "internal fn" thing is a mechanism
that allows checking whether the target supports expanding a builtin inline
or whether it requires a call to an external library function?

If so, then yes, Ok, thanks for the patch!
--
Janne Blomqvist
Richard Sandiford
2018-07-18 15:27:54 UTC
Thanks for doing this.
+ calc = build_call_expr_internal_loc (input_location, ifn, type,
+ 2, mvar, convert (type, val));
(indentation looks off)
diff --git a/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90 b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90
new file mode 100644
index 0000000000000000000000000000000000000000..8c8ea063e5d0718dc829c1f5574c5b46040e6786
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90
@@ -0,0 +1,9 @@
+! { dg-do compile { target aarch64*-*-* } }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine fool (a, b, c, d, e, f, g, h)
+ real (kind=16) :: a, b, c, d, e, f, g, h
+ a = max (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "__builtin_fmaxl " 7 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90 b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90
new file mode 100644
index 0000000000000000000000000000000000000000..92368917fb48e0c468a16d080ab3a9ac842e01a7
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90
@@ -0,0 +1,9 @@
+! { dg-do compile { target aarch64*-*-* } }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine fool (a, b, c, d, e, f, g, h)
+ real (kind=16) :: a, b, c, d, e, f, g, h
+ a = min (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "__builtin_fminl " 7 "optimized" } }
Do these still pass? I wouldn't have expected us to use __builtin_fmin*
and __builtin_fmax* now.

It would be good to have tests that we use ".FMIN" and ".FMAX" for kind=4
and kind=8 on AArch64, since that's really the end goal here.

Thanks,
Richard
Kyrill Tkachov
2018-07-18 16:04:23 UTC
Hi Richard,
Post by Richard Sandiford
Thanks for doing this.
+ calc = build_call_expr_internal_loc (input_location, ifn, type,
+ 2, mvar, convert (type, val));
(indentation looks off)
diff --git a/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90 b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90
new file mode 100644
index 0000000000000000000000000000000000000000..8c8ea063e5d0718dc829c1f5574c5b46040e6786
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90
@@ -0,0 +1,9 @@
+! { dg-do compile { target aarch64*-*-* } }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine fool (a, b, c, d, e, f, g, h)
+ real (kind=16) :: a, b, c, d, e, f, g, h
+ a = max (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "__builtin_fmaxl " 7 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90 b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90
new file mode 100644
index 0000000000000000000000000000000000000000..92368917fb48e0c468a16d080ab3a9ac842e01a7
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90
@@ -0,0 +1,9 @@
+! { dg-do compile { target aarch64*-*-* } }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine fool (a, b, c, d, e, f, g, h)
+ real (kind=16) :: a, b, c, d, e, f, g, h
+ a = min (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "__builtin_fminl " 7 "optimized" } }
Do these still pass? I wouldn't have expected us to use __builtin_fmin*
and __builtin_fmax* now.
It would be good to have tests that we use ".FMIN" and ".FMAX" for kind=4
and kind=8 on AArch64, since that's really the end goal here.
Doh, yes. I had spotted that myself after I had sent out the patch.
I've fixed that and the indentation issue in this small revision.

Given Janne's comments I will commit this tomorrow if there are no objections.
This patch should be a conservative improvement. If the Fortran folks decide
to sacrifice the more predictable NaN handling in favour of more optimisation
leeway by using MIN/MAX_EXPR unconditionally we can do that as a follow-up.

Thanks for the help,
Kyrill

2018-07-18 Kyrylo Tkachov <***@arm.com>

* trans-intrinsic.c: (gfc_conv_intrinsic_minmax): Emit MIN_MAX_EXPR
or IFN_FMIN/FMAX sequence to calculate the min/max when possible.

2018-07-18 Kyrylo Tkachov <***@arm.com>

* gfortran.dg/max_fmax_aarch64.f90: New test.
* gfortran.dg/min_fmin_aarch64.f90: Likewise.
* gfortran.dg/minmax_integer.f90: Likewise.
Janne Blomqvist
2018-07-18 15:10:05 UTC
Post by Thomas König
Hi Kyrill,
Post by Thomas König
On 18.07.2018 at 13:17, Kyrill Tkachov wrote:
Thomas, Janne, would this relaxation of NaN handling be acceptable given
the benefits
Post by Thomas König
mentioned above? If so, what would be the recommended adjustment to the
nan_1.f90 test?
I would be a bit careful about changing behavior in such a major way. What
would the results with NaN and infinity then be, with or without
optimization? Would the results be consistent with min(nan,num) vs
min(num,nan)? Would they be consistent with the new IEEE standard?
AFAIU, MIN/MAX_EXPR do the right thing when comparing a normal number with
Inf. For NaN the result is undefined, and you might indeed have

min(a, NaN) = a
min(NaN, a) = NaN

where "a" is a normal number.

(I think that happens at least on x86 if MIN_EXPR is expanded to
minsd/minpd.)
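
The order dependence can be reproduced in plain C with a compare-and-select minimum, which, as far as I know, matches how minsd behaves (it yields its second operand whenever either operand is NaN). The function name is made up for this sketch, and which argument order produces the NaN depends on how the operands end up placed.

```c
#include <assert.h>
#include <math.h>

/* Minimum written as one ordered comparison.  Every comparison with NaN
   is false, so the second operand is returned whenever either operand
   is NaN.  The name compare_select_min is hypothetical.  */
double compare_select_min(double a, double b)
{
    return a < b ? a : b;
}

/* Small usage example showing that the result depends on operand order.  */
void check_compare_select_min(void)
{
    assert(compare_select_min(NAN, 2.0) == 2.0);    /* number survives */
    assert(isnan(compare_select_min(2.0, NAN)));    /* NaN survives */
}
```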

Apparently what the proper result for min(a, NaN) should be is contentious
enough that minnum was removed from the upcoming IEEE 754 revision, and new
operations AFAICS have the semantics

minimum(a, NaN) = minimum(NaN, a) = NaN
minimumNumber(a, NaN) = minimumNumber(NaN, a) = a

That is minimumNumber corresponds to minnum in IEEE 754-2008 and fmin* in
C, and to the current behavior of gfortran.
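
To make the two sets of semantics concrete, here is a rough C sketch. The function names are hypothetical, and the sketch deliberately ignores signaling NaNs, exceptions and the ordering of signed zeros, so it only illustrates the quiet-NaN cases written above.

```c
#include <assert.h>
#include <math.h>

/* "minimum": any NaN operand makes the result NaN.  */
double sketch_minimum(double a, double b)
{
    if (isnan(a) || isnan(b))
        return NAN;
    return a < b ? a : b;
}

/* "minimumNumber": a single NaN operand is ignored, so the number is
   returned; only NaN/NaN yields NaN.  */
double sketch_minimum_number(double a, double b)
{
    if (isnan(a))
        return b;
    if (isnan(b))
        return a;
    return a < b ? a : b;
}

/* Small usage example: both variants are symmetric in their operands.  */
void check_minimum_variants(void)
{
    assert(isnan(sketch_minimum(1.0, NAN)) && isnan(sketch_minimum(NAN, 1.0)));
    assert(sketch_minimum_number(1.0, NAN) == 1.0);
    assert(sketch_minimum_number(NAN, 1.0) == 1.0);
}
```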
Post by Thomas König
In general, I think that min(nan,num) should be nan and that our current
behavior is not the best.
There was some extensive discussion of that in the Julia bug report I
linked to in an earlier message, and they came to the same conclusion and
changed their behavior.
Post by Thomas König
Does anybody have data points on how this is handled by other compilers?
The only other compiler I have access to at the moment is ifort (and not
the latest version), but maybe somebody has access to a wider variety?
Post by Thomas König
Oh, and if anything is changed, then compile and runtime behavior should
always be the same.
Well, IFF we place some weight on the runtime behavior being particularly
sensible wrt NaN's, which it wouldn't be if we just use a plain
MIN/MAX_EXPR. Is it worth taking a performance hit for, though? In
particular, if other compilers are inconsistent, we might as well do
whatever is fastest.
--
Janne Blomqvist
Joseph Myers
2018-07-26 20:35:59 UTC
Post by Janne Blomqvist
minimumNumber(a, NaN) = minimumNumber(NaN, a) = a
That is minimumNumber corresponds to minnum in IEEE 754-2008 and fmin* in
No, it differs in the handling of signaling NaNs (with minimumNumber, if
the NaN argument is signaling, it results in the "invalid" exception but
the non-NaN argument is still returned, whereas with minNum, a quiet NaN
was returned in that case). A new fminimum_num function is proposed as a
C binding to the new operation.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2273.pdf

(The new operations are also more strictly defined regarding zero
arguments, to treat -0 as less than +0, which was unspecified for minNum
and fmin.)
--
Joseph S. Myers
***@codesourcery.com
Janne Blomqvist
2018-08-06 12:04:54 UTC
Post by Janne Blomqvist
Post by Thomas König
Hi Kyrill,
Post by Thomas König
On 18.07.2018 at 13:17, Kyrill Tkachov wrote:
Thomas, Janne, would this relaxation of NaN handling be acceptable
given the benefits
Post by Thomas König
mentioned above? If so, what would be the recommended adjustment to the
nan_1.f90 test?
I would be a bit careful about changing behavior in such a major way.
What would the results with NaN and infinity then be, with or without
optimization? Would the results be consistent with min(nan,num) vs
min(num,nan)? Would they be consistent with the new IEEE standard?
AFAIU, MIN/MAX_EXPR do the right thing when comparing a normal number with
Inf. For NaN the result is undefined, and you might indeed have
min(a, NaN) = a
min(NaN, a) = NaN
where "a" is a normal number.
(I think that happens at least on x86 if MIN_EXPR is expanded to
minsd/minpd.)
The testcase below (the functions are in a separate file to prevent
inter-procedural and constant-propagation optimizations):

! main.f90
program main
  implicit none
  real :: a, b = 1., mymax, mydiv
  external mymax, mydiv
  ! Produce a NaN at run time so the compiler cannot fold it.
  a = mydiv(0., 0.)
  print *, 'Verify that the following value is a NaN: ', a
  print *, 'max(', a, ',', b, ') = ', mymax(a, b)
  print *, 'max(', b, ',', a, ') = ', mymax(b, a)

  ! Same checks with an infinity.
  a = mydiv(1., 0.)
  print *, 'Verify that the following is a Inf: ', a
  print *, 'max(', a, ',', b, ') = ', mymax(a, b)
  print *, 'max(', b, ',', a, ') = ', mymax(b, a)
end program main

! my.f90
real function mymax(a, b)
  implicit none
  real :: a, b
  mymax = max(a, b)
end function mymax

real function mydiv(a, b)
  implicit none
  real :: a, b
  mydiv = a/b
end function mydiv


With gfortran 6.2 (didn't bother to check other versions as it shouldn't
have changed lately) and Intel Fortran 17.0.1 I get the following:

% gfortran main.f90 my.f90 && ./a.out
Verify that the following value is a NaN: NaN
max( NaN , 1.00000000 ) = 1.00000000
max( 1.00000000 , NaN ) = 1.00000000
Verify that the following is a Inf: Infinity
max( Infinity , 1.00000000 ) = Infinity
max( 1.00000000 , Infinity ) = Infinity

% gfortran -ffast-math main.f90 my.f90 && ./a.out
Verify that the following value is a NaN: NaN
max( NaN , 1.00000000 ) = NaN
max( 1.00000000 , NaN ) = 1.00000000
Verify that the following is a Inf: Infinity
max( Infinity , 1.00000000 ) = Infinity
max( 1.00000000 , Infinity ) = Infinity


% ifort main.f90 my.f90 && ./a.out
Verify that the following value is a NaN: NaN
max( NaN , 1.000000 ) = 1.000000
max( 1.000000 , NaN ) = NaN
Verify that the following is a Inf: Infinity
max( Infinity , 1.000000 ) = Infinity
max( 1.000000 , Infinity ) = Infinity


% ifort -fp-model strict main.f90 my.f90 && ./a.out
Verify that the following value is a NaN: NaN
max( NaN , 1.000000 ) = 1.000000
max( 1.000000 , NaN ) = NaN
Verify that the following is a Inf: Infinity
max( Infinity , 1.000000 ) = Infinity
max( 1.000000 , Infinity ) = Infinity


For brevity I have omitted tests with various -O[N] optimization levels,
which didn't affect the results with either gfortran or ifort.

This suggests that ifort does the equivalent of MAX_EXPR unconditionally.

Does anyone have access to other compilers? What results do they give?
--
Janne Blomqvist
Richard Biener
2018-07-18 09:44:29 UTC
On Tue, Jul 17, 2018 at 3:46 PM Kyrill Tkachov
Post by Kyrill Tkachov
Hi Richard,
Post by Richard Biener
On Tue, Jul 17, 2018 at 2:35 PM Kyrill Tkachov
Post by Kyrill Tkachov
Hi all,
This is my first Fortran patch, so apologies if I'm missing something.
The current expansion of the min and max intrinsics explicitly expands
the comparisons between each argument to calculate the global min/max.
Some targets, like aarch64, have instructions that can calculate the min/max
of two real (floating-point) numbers with the proper NaN-handling semantics
(if both inputs are NaN, return NaN. If one is NaN, return the other) and those
are the semantics provided by the __builtin_fmin/max family of functions that expand
to these instructions.
This patch makes the frontend emit __builtin_fmin/max directly to compare each
pair of numbers when the numbers are floating-point, and use MIN_EXPR/MAX_EXPR otherwise
(integral types and -ffast-math) which should hopefully be easier to recognise in the
What is Fortran's requirement on min/max intrinsics? Doesn't it only
require things that
are guaranteed by MIN/MAX_EXPR anyways? The only restriction here is
mvar = a1;
if (a2 .op. mvar || isnan (mvar))
mvar = a2;
if (a3 .op. mvar || isnan (mvar))
mvar = a3;
...
return mvar;
That is, if one of the operands is a NaN it will return the other argument.
If both (all) are NaNs, it will return NaN. This is the same as the semantics of fmin/max
as far as I can tell.
Post by Richard Biener
/* Minimum and maximum values. When used with floating point, if both
operands are zeros, or if either operand is NaN, then it is unspecified
which of the two operands is returned as the result. */
which means MIN/MAX_EXPR are not strictly IEEE compliant with signed
zeros or NaNs.
Thus the correct test would be !HONOR_SIGNED_ZEROS && !HONOR_NANS if signed
zeros are significant.
True, MIN/MAX_EXPR would not be appropriate in that condition. I guarded their use
on !HONOR_NANS (type) only. I'll update it to !HONOR_SIGNED_ZEROS (type) && !HONOR_NANS (type).
Post by Richard Biener
I'm not sure if using fmin/max calls when we cannot use MIN/MAX_EXPR
is a good idea,
this may both generate bigger code and be slower.
The patch will generate fmin/fmax calls (or the fminf,fminl variants) when mathfn_built_in advertises
them as available (does that mean they'll have a fast inline implementation?)
This doesn't mean anything given you make them available with your
patch ;) So I expect it may
cause issues for !c99_runtime targets (and long double at least).
Post by Kyrill Tkachov
If the above doesn't hold and we can't use either MIN/MAX_EXPR or fmin/fmax then the patch falls back
to the existing expansion.
As said I would not use fmin/fmax calls here at all.
Post by Kyrill Tkachov
FWIW, this patch does improve performance on 521.wrf from SPEC2017 on aarch64.
You said that, yes. Even without -ffast-math?

Richard.
Kyrill Tkachov
2018-07-18 09:50:18 UTC
Post by Richard Biener
On Tue, Jul 17, 2018 at 3:46 PM Kyrill Tkachov
Post by Kyrill Tkachov
Hi Richard,
Post by Richard Biener
On Tue, Jul 17, 2018 at 2:35 PM Kyrill Tkachov
Post by Kyrill Tkachov
Hi all,
This is my first Fortran patch, so apologies if I'm missing something.
The current expansion of the min and max intrinsics explicitly expands
the comparisons between each argument to calculate the global min/max.
Some targets, like aarch64, have instructions that can calculate the min/max
of two real (floating-point) numbers with the proper NaN-handling semantics
(if both inputs are NaN, return NaN. If one is NaN, return the other) and those
are the semantics provided by the __builtin_fmin/max family of functions that expand
to these instructions.
This patch makes the frontend emit __builtin_fmin/max directly to compare each
pair of numbers when the numbers are floating-point, and use MIN_EXPR/MAX_EXPR otherwise
(integral types and -ffast-math) which should hopefully be easier to recognise in the
What is Fortran's requirement on min/max intrinsics? Doesn't it only
require things that
are guaranteed by MIN/MAX_EXPR anyways? The only restriction here is
mvar = a1;
if (a2 .op. mvar || isnan (mvar))
mvar = a2;
if (a3 .op. mvar || isnan (mvar))
mvar = a3;
...
return mvar;
That is, if one of the operands is a NaN it will return the other argument.
If both (all) are NaNs, it will return NaN. This is the same as the semantics of fmin/max
as far as I can tell.
Post by Richard Biener
/* Minimum and maximum values. When used with floating point, if both
operands are zeros, or if either operand is NaN, then it is unspecified
which of the two operands is returned as the result. */
which means MIN/MAX_EXPR are not strictly IEEE compliant with signed
zeros or NaNs.
Thus the correct test would be !HONOR_SIGNED_ZEROS && !HONOR_NANS if signed
zeros are significant.
True, MIN/MAX_EXPR would not be appropriate in that condition. I guarded their use
on !HONOR_NANS (type) only. I'll update it to !HONOR_SIGNED_ZEROS (type) && !HONOR_NANS (type).
Post by Richard Biener
I'm not sure if using fmin/max calls when we cannot use MIN/MAX_EXPR
is a good idea,
this may both generate bigger code and be slower.
The patch will generate fmin/fmax calls (or the fminf,fminl variants) when mathfn_built_in advertises
them as available (does that mean they'll have a fast inline implementation?)
This doesn't mean anything given you make them available with your
patch ;) So I expect it may
cause issues for !c99_runtime targets (and long double at least).
Urgh, that can cause headaches...
Post by Richard Biener
Post by Kyrill Tkachov
If the above doesn't hold and we can't use either MIN/MAX_EXPR or fmin/fmax then the patch falls back
to the existing expansion.
As said I would not use fmin/fmax calls here at all.
... Given the comments from Thomas and Janne, maybe we should just emit MIN/MAX_EXPRs here
since there is no language requirement on NaN/signed zero handling on these intrinsics?
That should make it simpler and more portable.
Post by Richard Biener
Post by Kyrill Tkachov
FWIW, this patch does improve performance on 521.wrf from SPEC2017 on aarch64.
You said that, yes. Even without -ffast-math?
It improves at -O3 without -ffast-math in particular. With -ffast-math phiopt optimisation
is more aggressive and merges the conditionals into MIN/MAX_EXPRs (minmax_replacement in tree-ssa-phiopt.c)

Thanks,
Kyrill
Richard Biener
2018-07-18 10:06:15 UTC
On Wed, Jul 18, 2018 at 11:50 AM Kyrill Tkachov
Post by Kyrill Tkachov
Post by Richard Biener
On Tue, Jul 17, 2018 at 3:46 PM Kyrill Tkachov
Post by Kyrill Tkachov
Hi Richard,
Post by Richard Biener
On Tue, Jul 17, 2018 at 2:35 PM Kyrill Tkachov
Post by Kyrill Tkachov
Hi all,
This is my first Fortran patch, so apologies if I'm missing something.
The current expansion of the min and max intrinsics explicitly expands
the comparisons between each argument to calculate the global min/max.
Some targets, like aarch64, have instructions that can calculate the min/max
of two real (floating-point) numbers with the proper NaN-handling semantics
(if both inputs are NaN, return NaN. If one is NaN, return the other) and those
are the semantics provided by the __builtin_fmin/max family of functions that expand
to these instructions.
This patch makes the frontend emit __builtin_fmin/max directly to compare each
pair of numbers when the numbers are floating-point, and use MIN_EXPR/MAX_EXPR otherwise
(integral types and -ffast-math) which should hopefully be easier to recognise in the
What is Fortran's requirement on min/max intrinsics? Doesn't it only
require things that
are guaranteed by MIN/MAX_EXPR anyways? The only restriction here is
mvar = a1;
if (a2 .op. mvar || isnan (mvar))
mvar = a2;
if (a3 .op. mvar || isnan (mvar))
mvar = a3;
...
return mvar;
That is, if one of the operands is a NaN it will return the other argument.
If both (all) are NaNs, it will return NaN. This is the same as the semantics of fmin/max
as far as I can tell.
Post by Richard Biener
/* Minimum and maximum values. When used with floating point, if both
operands are zeros, or if either operand is NaN, then it is unspecified
which of the two operands is returned as the result. */
which means MIN/MAX_EXPR are not strictly IEEE compliant with signed
zeros or NaNs.
Thus the correct test would be !HONOR_SIGNED_ZEROS && !HONOR_NANS if signed
zeros are significant.
True, MIN/MAX_EXPR would not be appropriate in that condition. I guarded their use
on !HONOR_NANS (type) only. I'll update it to !HONOR_SIGNED_ZEROS (type) && !HONOR_NANS (type).
Post by Richard Biener
I'm not sure if using fmin/max calls when we cannot use MIN/MAX_EXPR
is a good idea,
this may both generate bigger code and be slower.
The patch will generate fmin/fmax calls (or the fminf,fminl variants) when mathfn_built_in advertises
them as available (does that mean they'll have a fast inline implementation?)
This doesn't mean anything given you make them available with your
patch ;) So I expect it may
cause issues for !c99_runtime targets (and long double at least).
Urgh, that can cause headaches...
Post by Richard Biener
Post by Kyrill Tkachov
If the above doesn't hold and we can't use either MIN/MAX_EXPR or fmin/fmax then the patch falls back
to the existing expansion.
As said I would not use fmin/fmax calls here at all.
... Given the comments from Thomas and Janne, maybe we should just emit MIN/MAX_EXPRs here
since there is no language requirement on NaN/signed zero handling on these intrinsics?
That should make it simpler and more portable.
That's the Fortran maintainers' call.
Post by Kyrill Tkachov
Post by Richard Biener
Post by Kyrill Tkachov
FWIW, this patch does improve performance on 521.wrf from SPEC2017 on aarch64.
You said that, yes. Even without -ffast-math?
It improves at -O3 without -ffast-math in particular. With -ffast-math phiopt optimisation
is more aggressive and merges the conditionals into MIN/MAX_EXPRs (minmax_replacement in tree-ssa-phiopt.c)
The question is whether it will be slower without -ffast-math, that is, when
fmin/max() calls are emitted rather
than inline conditionals.

I think a patch just using MAX/MIN_EXPR within the existing
constraints and otherwise falling back to
the current code would be more obvious, and other changes should be
made independently.

Richard.
Richard Sandiford
2018-07-18 11:44:09 UTC
Post by Richard Biener
On Wed, Jul 18, 2018 at 11:50 AM Kyrill Tkachov
Post by Kyrill Tkachov
Post by Richard Biener
On Tue, Jul 17, 2018 at 3:46 PM Kyrill Tkachov
Post by Kyrill Tkachov
Hi Richard,
Post by Richard Biener
On Tue, Jul 17, 2018 at 2:35 PM Kyrill Tkachov
Post by Kyrill Tkachov
Hi all,
This is my first Fortran patch, so apologies if I'm missing something.
The current expansion of the min and max intrinsics explicitly expands
the comparisons between each argument to calculate the global min/max.
Some targets, like aarch64, have instructions that can calculate the min/max
of two real (floating-point) numbers with the proper NaN-handling semantics
(if both inputs are NaN, return NaN. If one is NaN, return the other) and those
are the semantics provided by the __builtin_fmin/max family of
functions that expand
to these instructions.
This patch makes the frontend emit __builtin_fmin/max directly to compare each
pair of numbers when the numbers are floating-point, and use
MIN_EXPR/MAX_EXPR otherwise
(integral types and -ffast-math) which should hopefully be easier
to recognise in the
What is Fortran's requirement on min/max intrinsics? Doesn't it only
require things that
are guaranteed by MIN/MAX_EXPR anyways? The only restriction here is
mvar = a1;
if (a2 .op. mvar || isnan (mvar))
mvar = a2;
if (a3 .op. mvar || isnan (mvar))
mvar = a3;
...
return mvar;
That is, if one of the operands is a NaN it will return the other argument.
If both (all) are NaNs, it will return NaN. This is the same as the semantics of fmin/max
as far as I can tell.
Post by Richard Biener
/* Minimum and maximum values. When used with floating point, if both
operands are zeros, or if either operand is NaN, then it is unspecified
which of the two operands is returned as the result. */
which means MIN/MAX_EXPR are not strictly IEEE compliant with signed
zeros or NaNs.
Thus the correct test would be !HONOR_SIGNED_ZEROS && !HONOR_NANS if signed
zeros are significant.
True, MIN/MAX_EXPR would not be appropriate in that condition. I guarded their use
on !HONOR_NANS (type) only. I'll update it to !HONOR_SIGNED_ZEROS
(type) && !HONOR_NANS (type).
Post by Richard Biener
I'm not sure if using fmin/max calls when we cannot use MIN/MAX_EXPR
is a good idea,
this may both generate bigger code and be slower.
The patch will generate fmin/fmax calls (or the fminf,fminl
variants) when mathfn_built_in advertises
them as available (does that mean they'll have a fast inline implementation?)
This doesn't mean anything given you make them available with your
patch ;) So I expect it may
cause issues for !c99_runtime targets (and long double at least).
Urgh, that can cause headaches...
Post by Richard Biener
Post by Kyrill Tkachov
If the above doesn't hold and we can't use either MIN/MAX_EXPR or
fmin/fmax then the patch falls back
to the existing expansion.
As said I would not use fmin/fmax calls here at all.
... Given the comments from Thomas and Janne, maybe we should just
emit MIN/MAX_EXPRs here
since there is no language requirement on NaN/signed zero handling on these intrinsics?
That should make it simpler and more portable.
That's the Fortran maintainers' call.
Post by Kyrill Tkachov
Post by Richard Biener
Post by Kyrill Tkachov
FWIW, this patch does improve performance on 521.wrf from SPEC2017 on aarch64.
You said that, yes. Even without -ffast-math?
It improves at -O3 without -ffast-math in particular. With -ffast-math
phiopt optimisation
is more aggressive and merges the conditionals into MIN/MAX_EXPRs
(minmax_replacement in tree-ssa-phiopt.c)
The question is whether it will be slower without -ffast-math, that is, when
fmin/max() calls are emitted rather
than inline conditionals.
I think a patch just using MAX/MIN_EXPR within the existing
constraints and otherwise falling back to
the current code would be more obvious, and other changes should be
made independently.
If going to MIN_EXPR and MAX_EXPR unconditionally isn't acceptable,
maybe an alternative would be to go straight to internal functions,
under the usual:

direct_internal_fn_supported_p (IFN_F{MIN,MAX}, type, OPTIMIZE_FOR_SPEED)

condition.

Thanks,
Richard