Discussion:
RFR: JDK-8210106: sun/tools/jps/TestJps.java timed out
Gary Adams
2018-12-06 13:27:45 UTC
Permalink
On a local linux-x64-debug build this test consistently runs in less
than 40 seconds.
On the mach5 testing machines there was a large fluctuation in the time
to complete this test.

Since the test runs jps with different combinations of arguments, the total
test time is dependent on the number of processes running java programs.
Since the mach5 test infrastructure runs java programs I have seen a 3X
in the amount of output the test produces compared to local test
runs.

I've run the test several hundred times through mach5 on the slower
platforms
and then examined the test logs using a 3X setting from the default 120
second
jtreg timeout. The slowest reported elapse time from the test logs showed
280 seconds to complete.

To improve the reliability of the test to complete, I'd like to increase the
timeout to 360 seconds.

Issue: https://bugs.openjdk.java.net/browse/JDK-8210106

Proposed fix:

diff --git a/test/jdk/sun/tools/jps/TestJps.java
b/test/jdk/sun/tools/jps/TestJps.java
--- a/test/jdk/sun/tools/jps/TestJps.java
+++ b/test/jdk/sun/tools/jps/TestJps.java
@@ -27,7 +27,7 @@
* @modules jdk.jartool/sun.tools.jar
* @build LingeredAppForJps
* @build LingeredApp
- * @run main/othervm TestJps
+ * @run main/othervm/timeout=360 TestJps
*/
Daniel D. Daugherty
2018-12-06 14:57:32 UTC
Permalink
Post by Gary Adams
On a local linux-x64-debug build this test consistently runs in less
than 40 seconds.
On the mach5 testing machines there was a large fluctuation in the
time to complete this test.
Since the test runs jps with different combinations of arguments, the total
test time is dependent on the number of processes running java programs.
Since the mach5 test infrastructure runs java programs I have seen a 3X
in the amount of output the test produces compared to local test
runs.
I've run the test several hundred times through mach5 on the slower
platforms
and then examined the test logs using a 3X setting from the default
120 second
jtreg timeout. The slowest reported elapse time from the test logs showed
280 seconds to complete.
To improve the reliability of the test to complete, I'd like to increase the
timeout to 360 seconds.
  Issue: https://bugs.openjdk.java.net/browse/JDK-8210106
diff --git a/test/jdk/sun/tools/jps/TestJps.java
b/test/jdk/sun/tools/jps/TestJps.java
--- a/test/jdk/sun/tools/jps/TestJps.java
+++ b/test/jdk/sun/tools/jps/TestJps.java
@@ -27,7 +27,7 @@
  */
Thumbs up! And trivial.

Dan
David Holmes
2018-12-07 00:52:31 UTC
Permalink
Hi Gary,
Post by Gary Adams
On a local linux-x64-debug build this test consistently runs in less
than 40 seconds.
On the mach5 testing machines there was a large fluctuation in the time
to complete this test.
Since the test runs jps with different combinations of arguments, the total
test time is dependent on the number of processes running java programs.
Since the mach5 test infrastructure runs java programs I have seen a 3X
in the amount of output the test produces compared to local test
runs.
I've run the test several hundred times through mach5 on the slower
platforms
and then examined the test logs using a 3X setting from the default 120
second
jtreg timeout. The slowest reported elapse time from the test logs showed
280 seconds to complete.
To improve the reliability of the test to complete, I'd like to increase the
timeout to 360 seconds.
  Issue: https://bugs.openjdk.java.net/browse/JDK-8210106
diff --git a/test/jdk/sun/tools/jps/TestJps.java
b/test/jdk/sun/tools/jps/TestJps.java
--- a/test/jdk/sun/tools/jps/TestJps.java
+++ b/test/jdk/sun/tools/jps/TestJps.java
@@ -27,7 +27,7 @@
  */
Doesn't that then get scaled by the timeout factor resulting in a much
longer timeout than the 360 seconds you intended?

For other timeout adjustments the needed time has been divided by the
timeout factor to get the actual intended timeout.

Cheers,
David
Gary Adams
2018-12-07 12:58:53 UTC
Permalink
Post by David Holmes
Hi Gary,
Post by Gary Adams
On a local linux-x64-debug build this test consistently runs in less
than 40 seconds.
On the mach5 testing machines there was a large fluctuation in the
time to complete this test.
Since the test runs jps with different combinations of arguments, the total
test time is dependent on the number of processes running java programs.
Since the mach5 test infrastructure runs java programs I have seen a 3X
in the amount of output the test produces compared to local test
runs.
I've run the test several hundred times through mach5 on the slower
platforms
and then examined the test logs using a 3X setting from the default
120 second
jtreg timeout. The slowest reported elapse time from the test logs showed
280 seconds to complete.
To improve the reliability of the test to complete, I'd like to increase the
timeout to 360 seconds.
Issue: https://bugs.openjdk.java.net/browse/JDK-8210106
diff --git a/test/jdk/sun/tools/jps/TestJps.java
b/test/jdk/sun/tools/jps/TestJps.java
--- a/test/jdk/sun/tools/jps/TestJps.java
+++ b/test/jdk/sun/tools/jps/TestJps.java
@@ -27,7 +27,7 @@
*/
Doesn't that then get scaled by the timeout factor resulting in a much
longer timeout than the 360 seconds you intended?
For other timeout adjustments the needed time has been divided by the
timeout factor to get the actual intended timeout.
This bug was filed fairly recently in Aug 2018.
At that time the timeout and timeout factor were not sufficient
to avoid the test failing.

The mach5 timeout factors were adjusted recently, so this test may
no longer be an issue.

If that is true, then we could simply close this bug as "cannot reproduce".
An argument could be made that the change in timeout factor may be
responsible for fixing a lot more of the intermittent bugs and that they
should be closed in a similar manner.

Historically, we could say this particular bug should have had timeouts
reassessed when the infrastructure switched from jprt to mach5 testing
where there were more visible Java processes running.

Using a higher explicit timeout will not make the test run any longer
than it needs. It will simply allow the test to not be terminated sooner
in a hung test scenario.

What is your preference for this particular issue:
- increase the explicit timeout
- close as cannot reproduce attributed to the timeout factor adjustments

What would you recommend going forward for other similar issues:
- determine a new explicit timeout
- close if no timeout failures have been observed since the timeout
factor
was raised
Post by David Holmes
Cheers,
David
Daniel D. Daugherty
2018-12-07 15:12:37 UTC
Permalink
Post by Gary Adams
Post by David Holmes
Hi Gary,
Post by Gary Adams
On a local linux-x64-debug build this test consistently runs in less
than 40 seconds.
On the mach5 testing machines there was a large fluctuation in the
time to complete this test.
Since the test runs jps with different combinations of arguments, the total
test time is dependent on the number of processes running java programs.
Since the mach5 test infrastructure runs java programs I have seen a 3X
in the amount of output the test produces compared to local test
runs.
I've run the test several hundred times through mach5 on the slower
platforms
and then examined the test logs using a 3X setting from the default
120 second
jtreg timeout. The slowest reported elapse time from the test logs showed
280 seconds to complete.
To improve the reliability of the test to complete, I'd like to increase the
timeout to 360 seconds.
   Issue: https://bugs.openjdk.java.net/browse/JDK-8210106
diff --git a/test/jdk/sun/tools/jps/TestJps.java
b/test/jdk/sun/tools/jps/TestJps.java
--- a/test/jdk/sun/tools/jps/TestJps.java
+++ b/test/jdk/sun/tools/jps/TestJps.java
@@ -27,7 +27,7 @@
   */
Doesn't that then get scaled by the timeout factor resulting in a
much longer timeout than the 360 seconds you intended?
For other timeout adjustments the needed time has been divided by the
timeout factor to get the actual intended timeout.
This bug was filed fairly recently in Aug 2018.
At that time the timeout and timeout factor were not sufficient
to avoid the test failing.
The mach5 timeout factors were adjusted recently, so this test may
no longer be an issue.
If that is true, then we could simply close this bug as "cannot reproduce".
An argument could be made that the change in timeout factor may be
responsible for fixing a lot more of the intermittent bugs and that they
should be closed in a similar manner.
Historically, we could say this particular bug should have had timeouts
reassessed when the infrastructure switched from jprt to mach5 testing
where there were more visible Java processes running.
Using a higher explicit timeout will not make the test run any longer
than it needs. It will simply allow the test to not be terminated sooner
in a hung test scenario.
   - increase the explicit timeout
   - close as cannot reproduce attributed to the timeout factor
adjustments
   - determine a new explicit timeout
   - close if no timeout failures have been observed since the timeout
factor
      was raised
General guidance that I've been giving folks is that you bump the
default timeout when you see timeouts with 'release' bits with
a timeout factor of '1'. The default timeout factor in Mach5 is 4
so we run both 'release' and 'fastdebug' bits with that timeout
factor. That means our total timeout value for 'release' bits is
probably a little long and our total timeout value for 'fastdebug'
bits is probably just right for a concurrency factor of 12.

On my personal servers I use the following timeout factors:

        release)
            TIMEOUT_FACTOR="4"
            ;;
        fastdebug)
            TIMEOUT_FACTOR="6"
            ;;
        slowdebug)
            TIMEOUT_FACTOR="12"
            ;;

However, I run with a concurrency factor of 16. I rarely see
timeout failures.

All that is well and good, but a 'jps' test is a different
beast since it is affected by factors outside of most tests.
Since the number of java processes running on the system can
affect the test, I'm okay with using a default timeout of '360'
for this test as a guard against increases in load or...

Gary saw a longest time value of 280 seconds in his testing with
fastdebug bits. The default timeout value is 120 seconds with a
default timeout factor of 4 in Mach5 gives a total timeout of
480 seconds (8 minutes). With Gary's new default timeout of
360 seconds, we'll have a total timeout value of 1440 seconds
(24 minutes).

Before the recent changes to Mach5, the default timeout factor
was 10 so we had a total timeout value of 1200 seconds (20
minutes). Gary's change only bumps the total timeout value
by 4 minutes from what we had back in August... when this bug
was filed... unfortunately, the log for that sighting is gone
so all we know is that the test timed out after 20 minutes
Post by Gary Adams
Timeout refired 1200 times
We know that JTREG timeout handling kicks in at timeout expiration,
but without the log we don't know how much longer than 20 minutes
the test ran before JTREG killed it. 24 minutes might do it, but...

Dan
Post by Gary Adams
Post by David Holmes
Cheers,
David
David Holmes
2018-12-10 07:55:16 UTC
Permalink
Hi Dan,
Post by Daniel D. Daugherty
Gary saw a longest time value of 280 seconds in his testing with
fastdebug bits. The default timeout value is 120 seconds with a
default timeout factor of 4 in Mach5 gives a total timeout of
480 seconds (8 minutes). With Gary's new default timeout of
360 seconds, we'll have a total timeout value of 1440 seconds
(24 minutes).
So based on that it seems Gary's observations are not relevant to what
has been seen in mach5 because at 280 seconds we should never have been
timing out!

I guess these changes won't hurt.

Thanks,
David
Post by Daniel D. Daugherty
Post by Gary Adams
Post by David Holmes
Hi Gary,
Post by Gary Adams
On a local linux-x64-debug build this test consistently runs in less
than 40 seconds.
On the mach5 testing machines there was a large fluctuation in the
time to complete this test.
Since the test runs jps with different combinations of arguments, the total
test time is dependent on the number of processes running java programs.
Since the mach5 test infrastructure runs java programs I have seen a 3X
in the amount of output the test produces compared to local test
runs.
I've run the test several hundred times through mach5 on the slower
platforms
and then examined the test logs using a 3X setting from the default
120 second
jtreg timeout. The slowest reported elapse time from the test logs showed
280 seconds to complete.
To improve the reliability of the test to complete, I'd like to increase the
timeout to 360 seconds.
   Issue: https://bugs.openjdk.java.net/browse/JDK-8210106
diff --git a/test/jdk/sun/tools/jps/TestJps.java
b/test/jdk/sun/tools/jps/TestJps.java
--- a/test/jdk/sun/tools/jps/TestJps.java
+++ b/test/jdk/sun/tools/jps/TestJps.java
@@ -27,7 +27,7 @@
   */
Doesn't that then get scaled by the timeout factor resulting in a
much longer timeout than the 360 seconds you intended?
For other timeout adjustments the needed time has been divided by the
timeout factor to get the actual intended timeout.
This bug was filed fairly recently in Aug 2018.
At that time the timeout and timeout factor were not sufficient
to avoid the test failing.
The mach5 timeout factors were adjusted recently, so this test may
no longer be an issue.
If that is true, then we could simply close this bug as "cannot reproduce".
An argument could be made that the change in timeout factor may be
responsible for fixing a lot more of the intermittent bugs and that they
should be closed in a similar manner.
Historically, we could say this particular bug should have had timeouts
reassessed when the infrastructure switched from jprt to mach5 testing
where there were more visible Java processes running.
Using a higher explicit timeout will not make the test run any longer
than it needs. It will simply allow the test to not be terminated sooner
in a hung test scenario.
   - increase the explicit timeout
   - close as cannot reproduce attributed to the timeout factor
adjustments
   - determine a new explicit timeout
   - close if no timeout failures have been observed since the timeout
factor
      was raised
General guidance that I've been giving folks is that you bump the
default timeout when you see timeouts with 'release' bits with
a timeout factor of '1'. The default timeout factor in Mach5 is 4
so we run both 'release' and 'fastdebug' bits with that timeout
factor. That means our total timeout value for 'release' bits is
probably a little long and our total timeout value for 'fastdebug'
bits is probably just right for a concurrency factor of 12.
        release)
            TIMEOUT_FACTOR="4"
            ;;
        fastdebug)
            TIMEOUT_FACTOR="6"
            ;;
        slowdebug)
            TIMEOUT_FACTOR="12"
            ;;
However, I run with a concurrency factor of 16. I rarely see
timeout failures.
All that is well and good, but a 'jps' test is a different
beast since it is affected by factors outside of most tests.
Since the number of java processes running on the system can
affect the test, I'm okay with using a default timeout of '360'
for this test as a guard against increases in load or...
Gary saw a longest time value of 280 seconds in his testing with
fastdebug bits. The default timeout value is 120 seconds with a
default timeout factor of 4 in Mach5 gives a total timeout of
480 seconds (8 minutes). With Gary's new default timeout of
360 seconds, we'll have a total timeout value of 1440 seconds
(24 minutes).
Before the recent changes to Mach5, the default timeout factor
was 10 so we had a total timeout value of 1200 seconds (20
minutes). Gary's change only bumps the total timeout value
by 4 minutes from what we had back in August... when this bug
was filed... unfortunately, the log for that sighting is gone
so all we know is that the test timed out after 20 minutes
Post by Gary Adams
Timeout refired 1200 times
We know that JTREG timeout handling kicks in at timeout expiration,
but without the log we don't know how much longer than 20 minutes
the test ran before JTREG killed it. 24 minutes might do it, but...
Dan
Post by Gary Adams
Post by David Holmes
Cheers,
David
Daniel D. Daugherty
2018-12-10 16:24:41 UTC
Permalink
Post by David Holmes
Hi Dan,
Post by Daniel D. Daugherty
Gary saw a longest time value of 280 seconds in his testing with
fastdebug bits. The default timeout value is 120 seconds with a
default timeout factor of 4 in Mach5 gives a total timeout of
480 seconds (8 minutes). With Gary's new default timeout of
360 seconds, we'll have a total timeout value of 1440 seconds
(24 minutes).
So based on that it seems Gary's observations are not relevant to what
has been seen in mach5 because at 280 seconds we should never have
been timing out!
Correct. Gary's test runs never ran into the same state as the
end-of-August run where the test ran into the 20 minute timeout.
Since those logs are now gone, we cannot discern any more information
about what happened back then...

Dan
Post by David Holmes
I guess these changes won't hurt.
Thanks,
David
Post by Daniel D. Daugherty
Post by Gary Adams
Post by David Holmes
Hi Gary,
Post by Gary Adams
On a local linux-x64-debug build this test consistently runs in
less than 40 seconds.
On the mach5 testing machines there was a large fluctuation in the
time to complete this test.
Since the test runs jps with different combinations of arguments, the total
test time is dependent on the number of processes running java programs.
Since the mach5 test infrastructure runs java programs I have seen a 3X
in the amount of output the test produces compared to local test
runs.
I've run the test several hundred times through mach5 on the
slower platforms
and then examined the test logs using a 3X setting from the
default 120 second
jtreg timeout. The slowest reported elapse time from the test logs showed
280 seconds to complete.
To improve the reliability of the test to complete, I'd like to increase the
timeout to 360 seconds.
   Issue: https://bugs.openjdk.java.net/browse/JDK-8210106
diff --git a/test/jdk/sun/tools/jps/TestJps.java
b/test/jdk/sun/tools/jps/TestJps.java
--- a/test/jdk/sun/tools/jps/TestJps.java
+++ b/test/jdk/sun/tools/jps/TestJps.java
@@ -27,7 +27,7 @@
   */
Doesn't that then get scaled by the timeout factor resulting in a
much longer timeout than the 360 seconds you intended?
For other timeout adjustments the needed time has been divided by
the timeout factor to get the actual intended timeout.
This bug was filed fairly recently in Aug 2018.
At that time the timeout and timeout factor were not sufficient
to avoid the test failing.
The mach5 timeout factors were adjusted recently, so this test may
no longer be an issue.
If that is true, then we could simply close this bug as "cannot reproduce".
An argument could be made that the change in timeout factor may be
responsible for fixing a lot more of the intermittent bugs and that they
should be closed in a similar manner.
Historically, we could say this particular bug should have had timeouts
reassessed when the infrastructure switched from jprt to mach5 testing
where there were more visible Java processes running.
Using a higher explicit timeout will not make the test run any longer
than it needs. It will simply allow the test to not be terminated sooner
in a hung test scenario.
   - increase the explicit timeout
   - close as cannot reproduce attributed to the timeout factor
adjustments
   - determine a new explicit timeout
   - close if no timeout failures have been observed since the
timeout factor
      was raised
General guidance that I've been giving folks is that you bump the
default timeout when you see timeouts with 'release' bits with
a timeout factor of '1'. The default timeout factor in Mach5 is 4
so we run both 'release' and 'fastdebug' bits with that timeout
factor. That means our total timeout value for 'release' bits is
probably a little long and our total timeout value for 'fastdebug'
bits is probably just right for a concurrency factor of 12.
         release)
             TIMEOUT_FACTOR="4"
             ;;
         fastdebug)
             TIMEOUT_FACTOR="6"
             ;;
         slowdebug)
             TIMEOUT_FACTOR="12"
             ;;
However, I run with a concurrency factor of 16. I rarely see
timeout failures.
All that is well and good, but a 'jps' test is a different
beast since it is affected by factors outside of most tests.
Since the number of java processes running on the system can
affect the test, I'm okay with using a default timeout of '360'
for this test as a guard against increases in load or...
Gary saw a longest time value of 280 seconds in his testing with
fastdebug bits. The default timeout value is 120 seconds with a
default timeout factor of 4 in Mach5 gives a total timeout of
480 seconds (8 minutes). With Gary's new default timeout of
360 seconds, we'll have a total timeout value of 1440 seconds
(24 minutes).
Before the recent changes to Mach5, the default timeout factor
was 10 so we had a total timeout value of 1200 seconds (20
minutes). Gary's change only bumps the total timeout value
by 4 minutes from what we had back in August... when this bug
was filed... unfortunately, the log for that sighting is gone
so all we know is that the test timed out after 20 minutes
 > Timeout refired 1200 times
We know that JTREG timeout handling kicks in at timeout expiration,
but without the log we don't know how much longer than 20 minutes
the test ran before JTREG killed it. 24 minutes might do it, but...
Dan
Post by Gary Adams
Post by David Holmes
Cheers,
David
Loading...