Discussion:
RFR(S) 8021335: Missing synchronization when reading counters for live threads and peak thread count
(too old to reply)
d***@oracle.com
2018-10-16 16:51:43 UTC
Permalink
https://bugs.openjdk.java.net/browse/JDK-8021335
http://cr.openjdk.java.net/~dlong/8021335/webrev.3/

This change attempts to fix an old and intermittent problem with
ThreadMXBean thread counts.
Changes have 3 main parts:
1. Make sure hidden threads don't affect counts (even when exiting)
2. Fix race reading counts using atomic counters
3. Remove some workarounds in test (because they hide bugs)

dl
JC Beyler
2018-10-16 17:28:40 UTC
Permalink
Hi Dean,

I noticed a typo here :
"are atomically" is in the comment but should no longer be there I believe;
the sentence probably should be:
// These 2 counters are like the above thread counts, but are

Any way we could move this test into a helper method and both
add_thread/current_thread_exiting
could use the same "ignore" thread code so that we ensure the two never get
out of sync?

+ if (jt->is_hidden_from_external_view() ||
+ jt->is_jvmti_agent_thread()) {
+ return;
+ }

(Also by curiosity, add_thread has an assert on the Threads_lock but not
thread_exiting?)

Thanks,
Jc
Post by d***@oracle.com
https://bugs.openjdk.java.net/browse/JDK-8021335
http://cr.openjdk.java.net/~dlong/8021335/webrev.3/
This change attempts to fix an old and intermittent problem with
ThreadMXBean thread counts.
1. Make sure hidden threads don't affect counts (even when exiting)
2. Fix race reading counts using atomic counters
3. Remove some workarounds in test (because they hide bugs)
dl
--
Thanks,
Jc
d***@oracle.com
2018-10-16 18:59:38 UTC
Permalink
Post by JC Beyler
Hi Dean,
"are atomically" is in the comment but should no longer be there I
// These 2 counters are like the above thread counts, but are
Fixed!
Post by JC Beyler
Any way we could move this test into a helper method and both
add_thread/current_thread_exiting could use the same "ignore" thread
code so that we ensure the two never get out of sync?
+  if (jt->is_hidden_from_external_view() ||
+      jt->is_jvmti_agent_thread()) {
+    return;
+  }
Good idea.  Fixed.
Post by JC Beyler
(Also by curiosity, add_thread has an assert on the Threads_lock but
not thread_exiting?)
Right, I followed the existing pattern where current_thread_exiting()
only uses the atomic counters, so it doesn't need Threads_lock.

dl
Post by JC Beyler
Thanks,
Jc
https://bugs.openjdk.java.net/browse/JDK-8021335
http://cr.openjdk.java.net/~dlong/8021335/webrev.3/
<http://cr.openjdk.java.net/%7Edlong/8021335/webrev.3/>
This change attempts to fix an old and intermittent problem with
ThreadMXBean thread counts.
1. Make sure hidden threads don't affect counts (even when exiting)
2. Fix race reading counts using atomic counters
3. Remove some workarounds in test (because they hide bugs)
dl
--
Thanks,
Jc
d***@oracle.com
2018-10-16 23:43:16 UTC
Permalink
New webrev:

http://cr.openjdk.java.net/~dlong/8021335/webrev.4/

dl
Post by d***@oracle.com
Post by JC Beyler
Hi Dean,
"are atomically" is in the comment but should no longer be there I
// These 2 counters are like the above thread counts, but are
Fixed!
Post by JC Beyler
Any way we could move this test into a helper method and both
add_thread/current_thread_exiting could use the same "ignore" thread
code so that we ensure the two never get out of sync?
+  if (jt->is_hidden_from_external_view() ||
+      jt->is_jvmti_agent_thread()) {
+    return;
+  }
Good idea.  Fixed.
Post by JC Beyler
(Also by curiosity, add_thread has an assert on the Threads_lock but
not thread_exiting?)
Right, I followed the existing pattern where current_thread_exiting()
only uses the atomic counters, so it doesn't need Threads_lock.
dl
Post by JC Beyler
Thanks,
Jc
https://bugs.openjdk.java.net/browse/JDK-8021335
http://cr.openjdk.java.net/~dlong/8021335/webrev.3/
<http://cr.openjdk.java.net/%7Edlong/8021335/webrev.3/>
This change attempts to fix an old and intermittent problem with
ThreadMXBean thread counts.
1. Make sure hidden threads don't affect counts (even when exiting)
2. Fix race reading counts using atomic counters
3. Remove some workarounds in test (because they hide bugs)
dl
--
Thanks,
Jc
David Holmes
2018-10-17 02:33:55 UTC
Permalink
Hi Dean,

Thanks for tackling this.

I'm still struggling to fully grasp why we need both the PerfCounters
and the regular counters. I get that we have to decrement the live
counts before ensure_join() has allowed Thread.join() to return, to
ensure that if we then check the number of threads it has dropped by
one. But I don't understand why that means we need to manage the thread
count in two parts. Particularly as now you don't use the PerfCounter to
return the live count, so it makes me wonder what role the PerfCounter
is playing as it is temporarily inconsistent with the reported live
count? Is it simply that we can't atomically decrement the PerfCounters,
and we can't require the Threads_lock when decrementing?

On the code itself one thing I find irksome is that we already have a
daemon flag in remove_thread but decrement_thread_counts doesn't get
passed that and so always re-examines the thread to see if it is a
daemon. I know only one of the code paths has the daemon flag, so
perhaps we should hoist the daemon check up into JavaThread::exit and
then pass a daemon parameter to decrement_thread_counts.

There's also some ambiguity (existing) in relation to the existence of
the jt->threadObj() and its affect on whether the thread is considered a
daemon or not. Your check has to mirror the existing checks - the
threadObj may be NULL at removal if it was a failed attach, but the
addition considers a JNI-attached thread to be non-daemon. Overall this
seems buggy to me but as long as your check follows the same rules that
is okay. In that regard JavaThread::exit should never encounter a NULL
threadObj().

Minor nit: I suggest moving the two occurrences of the comment

// Do not count VM internal or JVMTI agent threads

inside is_hidden_thread so that we don't have to remember to update the
comments if the definition changes.

Thanks,
David
-----
Post by d***@oracle.com
http://cr.openjdk.java.net/~dlong/8021335/webrev.4/
dl
Post by d***@oracle.com
Post by JC Beyler
Hi Dean,
"are atomically" is in the comment but should no longer be there I
// These 2 counters are like the above thread counts, but are
Fixed!
Post by JC Beyler
Any way we could move this test into a helper method and both
add_thread/current_thread_exiting could use the same "ignore" thread
code so that we ensure the two never get out of sync?
+  if (jt->is_hidden_from_external_view() ||
+      jt->is_jvmti_agent_thread()) {
+    return;
+  }
Good idea.  Fixed.
Post by JC Beyler
(Also by curiosity, add_thread has an assert on the Threads_lock but
not thread_exiting?)
Right, I followed the existing pattern where current_thread_exiting()
only uses the atomic counters, so it doesn't need Threads_lock.
dl
Post by JC Beyler
Thanks,
Jc
https://bugs.openjdk.java.net/browse/JDK-8021335
http://cr.openjdk.java.net/~dlong/8021335/webrev.3/
<http://cr.openjdk.java.net/%7Edlong/8021335/webrev.3/>
This change attempts to fix an old and intermittent problem with
ThreadMXBean thread counts.
1. Make sure hidden threads don't affect counts (even when exiting)
2. Fix race reading counts using atomic counters
3. Remove some workarounds in test (because they hide bugs)
dl
--
Thanks,
Jc
Mandy Chung
2018-10-17 20:41:55 UTC
Permalink
Post by JC Beyler
Hi Dean,
Thanks for tackling this.
I'm still struggling to fully grasp why we need both the PerfCounters
and the regular counters. I get that we have to decrement the live
counts before ensure_join() has allowed Thread.join() to return, to
ensure that if we then check the number of threads it has dropped by
one. But I don't understand why that means we need to manage the
thread count in two parts. Particularly as now you don't use the
PerfCounter to return the live count, so it makes me wonder what role
the PerfCounter is playing as it is temporarily inconsistent with the
reported live count?
Perf counters were added long time back in JDK 1.4.2 for performance
measurement before java.lang.management API.  One can use jstat tool to
monitor VM perf counters of a running VM.   One could look into the
possibility of deprecating these counters and remove them over time.
Post by JC Beyler
http://cr.openjdk.java.net/~dlong/8021335/webrev.4/
When the perf counters are updated when a thread is added/removed, it's
holding Threads_lock.  Are the asserts in ThreadService::remove_thread
necessary?

For clarify, I think we could simply set _live_threads_count to the
value of _atomic_threads_count and set _daemon_threads_count to the
value of _atomic_daemon_threads_count.

Mandy
d***@oracle.com
2018-10-17 21:13:57 UTC
Permalink
Post by Mandy Chung
Post by JC Beyler
Hi Dean,
Thanks for tackling this.
I'm still struggling to fully grasp why we need both the PerfCounters
and the regular counters. I get that we have to decrement the live
counts before ensure_join() has allowed Thread.join() to return, to
ensure that if we then check the number of threads it has dropped by
one. But I don't understand why that means we need to manage the
thread count in two parts. Particularly as now you don't use the
PerfCounter to return the live count, so it makes me wonder what role
the PerfCounter is playing as it is temporarily inconsistent with the
reported live count?
Perf counters were added long time back in JDK 1.4.2 for performance
measurement before java.lang.management API.  One can use jstat tool
to monitor VM perf counters of a running VM.   One could look into the
possibility of deprecating these counters and remove them over time.
Post by JC Beyler
http://cr.openjdk.java.net/~dlong/8021335/webrev.4/
When the perf counters are updated when a thread is added/removed,
it's holding Threads_lock.  Are the asserts in
ThreadService::remove_thread necessary?
Not really.  They were intended to catch the case where the atomic
counters weren't decremented for some reason, not for the perf counters.
Should I remove them?
Post by Mandy Chung
For clarify, I think we could simply set _live_threads_count to the
value of _atomic_threads_count and set _daemon_threads_count to the
value of _atomic_daemon_threads_count.
I think that works, even inside decrement_thread_counts() without
holding the Threads_lock.  If you agree, I'll make that change.

dl
Post by Mandy Chung
Mandy
Mandy Chung
2018-10-17 21:38:41 UTC
Permalink
Post by d***@oracle.com
Post by Mandy Chung
Post by JC Beyler
Hi Dean,
Thanks for tackling this.
I'm still struggling to fully grasp why we need both the
PerfCounters and the regular counters. I get that we have to
decrement the live counts before ensure_join() has allowed
Thread.join() to return, to ensure that if we then check the number
of threads it has dropped by one. But I don't understand why that
means we need to manage the thread count in two parts. Particularly
as now you don't use the PerfCounter to return the live count, so it
makes me wonder what role the PerfCounter is playing as it is
temporarily inconsistent with the reported live count?
Perf counters were added long time back in JDK 1.4.2 for performance
measurement before java.lang.management API.  One can use jstat tool
to monitor VM perf counters of a running VM.   One could look into
the possibility of deprecating these counters and remove them over time.
Post by JC Beyler
http://cr.openjdk.java.net/~dlong/8021335/webrev.4/
When the perf counters are updated when a thread is added/removed,
it's holding Threads_lock.  Are the asserts in
ThreadService::remove_thread necessary?
Not really.  They were intended to catch the case where the atomic
counters weren't decremented for some reason, not for the perf counters.
Should I remove them?
Hmm... when remove_thread is called but decrement_thread_counts has not
been called.   It's a bug in thread accounting.  It happens to have the
perf counters that can be compared to assert.  It seems not obvious. 
Setting the perf counters same values as _atomic_threads_count and
_atomic_daemon_threads_count makes sense to me.

I would opt for removing the asserts but I can't think of an alternative
how to catch the issue you concern about.
Post by d***@oracle.com
Post by Mandy Chung
For clarify, I think we could simply set _live_threads_count to the
value of _atomic_threads_count and set _daemon_threads_count to the
value of _atomic_daemon_threads_count.
I think that works, even inside decrement_thread_counts() without
holding the Threads_lock.  If you agree, I'll make that change.
+1

Mandy
d***@oracle.com
2018-10-17 22:18:32 UTC
Permalink
Post by Mandy Chung
Post by d***@oracle.com
Post by Mandy Chung
Post by JC Beyler
Hi Dean,
Thanks for tackling this.
I'm still struggling to fully grasp why we need both the
PerfCounters and the regular counters. I get that we have to
decrement the live counts before ensure_join() has allowed
Thread.join() to return, to ensure that if we then check the number
of threads it has dropped by one. But I don't understand why that
means we need to manage the thread count in two parts. Particularly
as now you don't use the PerfCounter to return the live count, so
it makes me wonder what role the PerfCounter is playing as it is
temporarily inconsistent with the reported live count?
Perf counters were added long time back in JDK 1.4.2 for performance
measurement before java.lang.management API.  One can use jstat tool
to monitor VM perf counters of a running VM.   One could look into
the possibility of deprecating these counters and remove them over time.
Post by JC Beyler
http://cr.openjdk.java.net/~dlong/8021335/webrev.4/
When the perf counters are updated when a thread is added/removed,
it's holding Threads_lock.  Are the asserts in
ThreadService::remove_thread necessary?
Not really.  They were intended to catch the case where the atomic
counters weren't decremented for some reason, not for the perf counters.
Should I remove them?
Hmm... when remove_thread is called but decrement_thread_counts has
not been called.   It's a bug in thread accounting.  It happens to
have the perf counters that can be compared to assert. It seems not
obvious.  Setting the perf counters same values as
_atomic_threads_count and _atomic_daemon_threads_count makes sense to me.
I would opt for removing the asserts but I can't think of an
alternative how to catch the issue you concern about.
Post by d***@oracle.com
Post by Mandy Chung
For clarify, I think we could simply set _live_threads_count to the
value of _atomic_threads_count and set _daemon_threads_count to the
value of _atomic_daemon_threads_count.
I think that works, even inside decrement_thread_counts() without
holding the Threads_lock.  If you agree, I'll make that change.
+1
New webrevs, full and incremental:

http://cr.openjdk.java.net/~dlong/8021335/webrev.6/
http://cr.openjdk.java.net/~dlong/8021335/webrev.6.diff/

I like it better without all the asserts too.

dl
Post by Mandy Chung
Mandy
JC Beyler
2018-10-18 00:06:40 UTC
Permalink
Hi Dean,

The new webrev looks much better :) LGTM (not a reviewer though :-)).

Thanks,
Jc
Post by JC Beyler
Hi Dean,
Thanks for tackling this.
I'm still struggling to fully grasp why we need both the PerfCounters and
the regular counters. I get that we have to decrement the live counts
before ensure_join() has allowed Thread.join() to return, to ensure that if
we then check the number of threads it has dropped by one. But I don't
understand why that means we need to manage the thread count in two parts.
Particularly as now you don't use the PerfCounter to return the live count,
so it makes me wonder what role the PerfCounter is playing as it is
temporarily inconsistent with the reported live count?
Perf counters were added long time back in JDK 1.4.2 for performance
measurement before java.lang.management API. One can use jstat tool to
monitor VM perf counters of a running VM. One could look into the
possibility of deprecating these counters and remove them over time.
http://cr.openjdk.java.net/~dlong/8021335/webrev.4/
When the perf counters are updated when a thread is added/removed, it's
holding Threads_lock. Are the asserts in ThreadService::remove_thread
necessary?
Not really. They were intended to catch the case where the atomic
counters weren't decremented for some reason, not for the perf counters.
Should I remove them?
Hmm... when remove_thread is called but decrement_thread_counts has not
been called. It's a bug in thread accounting. It happens to have the
perf counters that can be compared to assert. It seems not obvious.
Setting the perf counters same values as _atomic_threads_count and
_atomic_daemon_threads_count makes sense to me.
I would opt for removing the asserts but I can't think of an alternative
how to catch the issue you concern about.
For clarify, I think we could simply set _live_threads_count to the value
of _atomic_threads_count and set _daemon_threads_count to the value of
_atomic_daemon_threads_count.
I think that works, even inside decrement_thread_counts() without holding
the Threads_lock. If you agree, I'll make that change.
+1
http://cr.openjdk.java.net/~dlong/8021335/webrev.6/
http://cr.openjdk.java.net/~dlong/8021335/webrev.6.diff/
I like it better without all the asserts too.
dl
Mandy
--
Thanks,
Jc
d***@oracle.com
2018-10-18 00:38:14 UTC
Permalink
Thanks JC!

dl
Post by JC Beyler
Hi Dean,
The new webrev looks much better :) LGTM (not a reviewer though :-)).
Thanks,
Jc
Post by Mandy Chung
Post by d***@oracle.com
Post by Mandy Chung
Post by JC Beyler
Hi Dean,
Thanks for tackling this.
I'm still struggling to fully grasp why we need both the
PerfCounters and the regular counters. I get that we have to
decrement the live counts before ensure_join() has allowed
Thread.join() to return, to ensure that if we then check the
number of threads it has dropped by one. But I don't
understand why that means we need to manage the thread count
in two parts. Particularly as now you don't use the
PerfCounter to return the live count, so it makes me wonder
what role the PerfCounter is playing as it is temporarily
inconsistent with the reported live count?
Perf counters were added long time back in JDK 1.4.2 for
performance measurement before java.lang.management API.  One
can use jstat tool to monitor VM perf counters of a running
VM.   One could look into the possibility of deprecating these
counters and remove them over time.
Post by JC Beyler
http://cr.openjdk.java.net/~dlong/8021335/webrev.4/
<http://cr.openjdk.java.net/%7Edlong/8021335/webrev.4/>
When the perf counters are updated when a thread is
added/removed, it's holding Threads_lock.  Are the asserts in
ThreadService::remove_thread necessary?
Not really.  They were intended to catch the case where the
atomic counters weren't decremented for some reason, not for the
perf counters.
Should I remove them?
Hmm... when remove_thread is called but decrement_thread_counts
has not been called.   It's a bug in thread accounting.  It
happens to have the perf counters that can be compared to
assert.  It seems not obvious.  Setting the perf counters same
values as _atomic_threads_count and _atomic_daemon_threads_count
makes sense to me.
I would opt for removing the asserts but I can't think of an
alternative how to catch the issue you concern about.
Post by d***@oracle.com
Post by Mandy Chung
For clarify, I think we could simply set _live_threads_count to
the value of _atomic_threads_count and set
_daemon_threads_count to the value of _atomic_daemon_threads_count.
I think that works, even inside decrement_thread_counts()
without holding the Threads_lock.  If you agree, I'll make that
change.
+1
http://cr.openjdk.java.net/~dlong/8021335/webrev.6/
<http://cr.openjdk.java.net/%7Edlong/8021335/webrev.6/>
http://cr.openjdk.java.net/~dlong/8021335/webrev.6.diff/
<http://cr.openjdk.java.net/%7Edlong/8021335/webrev.6.diff/>
I like it better without all the asserts too.
dl
Post by Mandy Chung
Mandy
--
Thanks,
Jc
Mandy Chung
2018-10-18 01:35:21 UTC
Permalink
Post by d***@oracle.com
http://cr.openjdk.java.net/~dlong/8021335/webrev.6/
http://cr.openjdk.java.net/~dlong/8021335/webrev.6.diff/
I like it better without all the asserts too.
Looks good to me.

Mandy
d***@oracle.com
2018-10-18 03:09:18 UTC
Permalink
Thanks, Mandy!

dl
Post by Mandy Chung
Post by d***@oracle.com
http://cr.openjdk.java.net/~dlong/8021335/webrev.6/
http://cr.openjdk.java.net/~dlong/8021335/webrev.6.diff/
I like it better without all the asserts too.
Looks good to me.
Mandy
David Holmes
2018-10-18 02:07:27 UTC
Permalink
Hi Dean,

This still seems racy to me. We increment all counts under the
Threads_lock but we still decrement without the Threads_lock. So we can
lose updates to the perfCounters.

117 _total_threads_count->inc();
118 Atomic::inc(&_atomic_threads_count);
119 int count = _atomic_threads_count;
<= context switch here
120 _live_threads_count->set_value(count);

If a second thread now exits while the above thread is descheduled, it
will decrement _atomic_threads_count and _live_threads_count, but when
the first thread resumes at line 120 above we will set
_live_threads_count to the wrong value!

You can't maintain two counters in sync without all changes using
locking across both.

David
Post by d***@oracle.com
Post by Mandy Chung
Post by Mandy Chung
Post by JC Beyler
Hi Dean,
Thanks for tackling this.
I'm still struggling to fully grasp why we need both the
PerfCounters and the regular counters. I get that we have to
decrement the live counts before ensure_join() has allowed
Thread.join() to return, to ensure that if we then check the number
of threads it has dropped by one. But I don't understand why that
means we need to manage the thread count in two parts. Particularly
as now you don't use the PerfCounter to return the live count, so
it makes me wonder what role the PerfCounter is playing as it is
temporarily inconsistent with the reported live count?
Perf counters were added long time back in JDK 1.4.2 for performance
measurement before java.lang.management API.  One can use jstat tool
to monitor VM perf counters of a running VM.   One could look into
the possibility of deprecating these counters and remove them over time.
Post by JC Beyler
http://cr.openjdk.java.net/~dlong/8021335/webrev.4/
When the perf counters are updated when a thread is added/removed,
it's holding Threads_lock.  Are the asserts in
ThreadService::remove_thread necessary?
Not really.  They were intended to catch the case where the atomic
counters weren't decremented for some reason, not for the perf counters.
Should I remove them?
Hmm... when remove_thread is called but decrement_thread_counts has
not been called.   It's a bug in thread accounting.  It happens to
have the perf counters that can be compared to assert. It seems not
obvious.  Setting the perf counters same values as
_atomic_threads_count and _atomic_daemon_threads_count makes sense to me.
I would opt for removing the asserts but I can't think of an
alternative how to catch the issue you concern about.
Post by Mandy Chung
For clarify, I think we could simply set _live_threads_count to the
value of _atomic_threads_count and set _daemon_threads_count to the
value of _atomic_daemon_threads_count.
I think that works, even inside decrement_thread_counts() without
holding the Threads_lock.  If you agree, I'll make that change.
+1
http://cr.openjdk.java.net/~dlong/8021335/webrev.6/
http://cr.openjdk.java.net/~dlong/8021335/webrev.6.diff/
I like it better without all the asserts too.
dl
Post by Mandy Chung
Mandy
d***@oracle.com
2018-10-18 04:06:41 UTC
Permalink
Post by JC Beyler
Hi Dean,
This still seems racy to me. We increment all counts under the
Threads_lock but we still decrement without the Threads_lock. So we
can lose updates to the perfCounters.
 117   _total_threads_count->inc();
 118   Atomic::inc(&_atomic_threads_count);
 119   int count = _atomic_threads_count;
<= context switch here
 120   _live_threads_count->set_value(count);
If a second thread now exits while the above thread is descheduled, it
will decrement _atomic_threads_count and _live_threads_count, but when
the first thread resumes at line 120 above we will set
_live_threads_count to the wrong value!
You can't maintain two counters in sync without all changes using
locking across both.
You're right, I missed that.  I think the right thing to do is call
current_thread_exiting while holding the Threads_lock.
Then we can get rid of the parallel atomic counters.  So, here's one
more try:

    http://cr.openjdk.java.net/~dlong/8021335/webrev.7/

dl
Post by JC Beyler
David
Post by d***@oracle.com
Post by Mandy Chung
Post by Mandy Chung
Post by JC Beyler
Hi Dean,
Thanks for tackling this.
I'm still struggling to fully grasp why we need both the
PerfCounters and the regular counters. I get that we have to
decrement the live counts before ensure_join() has allowed
Thread.join() to return, to ensure that if we then check the
number of threads it has dropped by one. But I don't understand
why that means we need to manage the thread count in two parts.
Particularly as now you don't use the PerfCounter to return the
live count, so it makes me wonder what role the PerfCounter is
playing as it is temporarily inconsistent with the reported live
count?
Perf counters were added long time back in JDK 1.4.2 for
performance measurement before java.lang.management API. One can
use jstat tool to monitor VM perf counters of a running VM.   One
could look into the possibility of deprecating these counters and
remove them over time.
Post by JC Beyler
http://cr.openjdk.java.net/~dlong/8021335/webrev.4/
When the perf counters are updated when a thread is added/removed,
it's holding Threads_lock.  Are the asserts in
ThreadService::remove_thread necessary?
Not really.  They were intended to catch the case where the atomic
counters weren't decremented for some reason, not for the perf counters.
Should I remove them?
Hmm... when remove_thread is called but decrement_thread_counts has
not been called.   It's a bug in thread accounting.  It happens to
have the perf counters that can be compared to assert. It seems not
obvious.  Setting the perf counters same values as
_atomic_threads_count and _atomic_daemon_threads_count makes sense to me.
I would opt for removing the asserts but I can't think of an
alternative how to catch the issue you concern about.
Post by Mandy Chung
For clarify, I think we could simply set _live_threads_count to
the value of _atomic_threads_count and set _daemon_threads_count
to the value of _atomic_daemon_threads_count.
I think that works, even inside decrement_thread_counts() without
holding the Threads_lock.  If you agree, I'll make that change.
+1
http://cr.openjdk.java.net/~dlong/8021335/webrev.6/
http://cr.openjdk.java.net/~dlong/8021335/webrev.6.diff/
I like it better without all the asserts too.
dl
Post by Mandy Chung
Mandy
David Holmes
2018-10-18 07:27:29 UTC
Permalink
Hi Dean,
Post by d***@oracle.com
Post by JC Beyler
Hi Dean,
This still seems racy to me. We increment all counts under the
Threads_lock but we still decrement without the Threads_lock. So we
can lose updates to the perfCounters.
 117   _total_threads_count->inc();
 118   Atomic::inc(&_atomic_threads_count);
 119   int count = _atomic_threads_count;
<= context switch here
 120   _live_threads_count->set_value(count);
If a second thread now exits while the above thread is descheduled, it
will decrement _atomic_threads_count and _live_threads_count, but when
the first thread resumes at line 120 above we will set
_live_threads_count to the wrong value!
You can't maintain two counters in sync without all changes using
locking across both.
You're right, I missed that.  I think the right thing to do is call
current_thread_exiting while holding the Threads_lock.
Then we can get rid of the parallel atomic counters.  So, here's one
    http://cr.openjdk.java.net/~dlong/8021335/webrev.7/
Okay that is the simple and obvious solution that doesn't require split
counts. So I have to ask Mandy if she recalls why this approach wasn't
taken 15 years ago when the exit counts were added as part of:

https://bugs.openjdk.java.net/browse/JDK-4530538 ?

Does taking the Threads_lock here cost too much and cause a thread
termination bottleneck?

Thanks,
David
-----
Post by d***@oracle.com
dl
Post by JC Beyler
David
Post by d***@oracle.com
Post by Mandy Chung
Post by Mandy Chung
Post by JC Beyler
Hi Dean,
Thanks for tackling this.
I'm still struggling to fully grasp why we need both the
PerfCounters and the regular counters. I get that we have to
decrement the live counts before ensure_join() has allowed
Thread.join() to return, to ensure that if we then check the
number of threads it has dropped by one. But I don't understand
why that means we need to manage the thread count in two parts.
Particularly as now you don't use the PerfCounter to return the
live count, so it makes me wonder what role the PerfCounter is
playing as it is temporarily inconsistent with the reported live
count?
Perf counters were added long time back in JDK 1.4.2 for
performance measurement before java.lang.management API. One can
use jstat tool to monitor VM perf counters of a running VM.   One
could look into the possibility of deprecating these counters and
remove them over time.
Post by JC Beyler
http://cr.openjdk.java.net/~dlong/8021335/webrev.4/
When the perf counters are updated when a thread is added/removed,
it's holding Threads_lock.  Are the asserts in
ThreadService::remove_thread necessary?
Not really.  They were intended to catch the case where the atomic
counters weren't decremented for some reason, not for the perf counters.
Should I remove them?
Hmm... when remove_thread is called but decrement_thread_counts has
not been called.   It's a bug in thread accounting.  It happens to
have the perf counters that can be compared to assert. It seems not
obvious.  Setting the perf counters same values as
_atomic_threads_count and _atomic_daemon_threads_count makes sense to me.
I would opt for removing the asserts but I can't think of an
alternative how to catch the issue you concern about.
Post by Mandy Chung
For clarify, I think we could simply set _live_threads_count to
the value of _atomic_threads_count and set _daemon_threads_count
to the value of _atomic_daemon_threads_count.
I think that works, even inside decrement_thread_counts() without
holding the Threads_lock.  If you agree, I'll make that change.
+1
http://cr.openjdk.java.net/~dlong/8021335/webrev.6/
http://cr.openjdk.java.net/~dlong/8021335/webrev.6.diff/
I like it better without all the asserts too.
dl
Post by Mandy Chung
Mandy
Mandy Chung
2018-10-19 01:12:31 UTC
Permalink
Post by JC Beyler
Hi Dean,
You're right, I missed that.  I think the right thing to do is call
current_thread_exiting while holding the Threads_lock.
Then we can get rid of the parallel atomic counters.  So, here's one
     http://cr.openjdk.java.net/~dlong/8021335/webrev.7/
Okay that is the simple and obvious solution that doesn't require
split counts. So I have to ask Mandy if she recalls why this approach
It has been so long.  I think it's likely an oversight.
Post by JC Beyler
https://bugs.openjdk.java.net/browse/JDK-4530538 ?
Does taking the Threads_lock here cost too much and cause a thread
termination bottleneck?
If the contention on Threads_lock is not high (that seems to me), it
should be okay.   I'm not close to the VM implementation (lot of changes
since then) and I don't have a definitive answer unless I study the code
closely.   You and others have a better judgement on this.

AFAICT the change is okay.

Mandy
d***@oracle.com
2018-10-20 03:28:48 UTC
Permalink
Post by Mandy Chung
Post by JC Beyler
Hi Dean,
You're right, I missed that.  I think the right thing to do is call
current_thread_exiting while holding the Threads_lock.
Then we can get rid of the parallel atomic counters.  So, here's one
http://cr.openjdk.java.net/~dlong/8021335/webrev.7/
Okay that is the simple and obvious solution that doesn't require
split counts. So I have to ask Mandy if she recalls why this approach
It has been so long.  I think it's likely an oversight.
Post by JC Beyler
https://bugs.openjdk.java.net/browse/JDK-4530538 ?
Does taking the Threads_lock here cost too much and cause a thread
termination bottleneck?
If the contention on Threads_lock is not high (that seems to me), it
should be okay.   I'm not close to the VM implementation (lot of
changes since then) and I don't have a definitive answer unless I
study the code closely.   You and others have a better judgement on this.
AFAICT the change is okay.
Thanks Mandy.  David, OK to push?

dl
Post by Mandy Chung
Mandy
David Holmes
2018-10-22 22:31:10 UTC
Permalink
Sorry Dean I'm concerned about a thread termination bottleneck with
this. A simple microbenchmark that creates 500,000 threads that have to
run and terminate, shows a 15+% slowdown on my Linux box. I tried to
find some kind of real benchmarks that covers thread termination but
couldn't see anything specific.

Can you at least run this through our performance system to see if any
of the regular benchmarks are affected.

Thanks,
David
Post by JC Beyler
Hi Dean,
Post by d***@oracle.com
You're right, I missed that.  I think the right thing to do is call
current_thread_exiting while holding the Threads_lock.
Then we can get rid of the parallel atomic counters.  So, here's one
http://cr.openjdk.java.net/~dlong/8021335/webrev.7/
Okay that is the simple and obvious solution that doesn't require
split counts. So I have to ask Mandy if she recalls why this approach
It has been so long.  I think it's likely an oversight.
Post by JC Beyler
https://bugs.openjdk.java.net/browse/JDK-4530538 ?
Does taking the Threads_lock here cost too much and cause a thread
termination bottleneck?
If the contention on Threads_lock is not high (that seems to me), it
should be okay.   I'm not close to the VM implementation (lot of
changes since then) and I don't have a definitive answer unless I
study the code closely.   You and others have a better judgement on this.
AFAICT the change is okay.
Thanks Mandy.  David, OK to push?
dl
Mandy
d***@oracle.com
2018-10-23 16:46:48 UTC
Permalink
Post by David Holmes
Sorry Dean I'm concerned about a thread termination bottleneck with
this. A simple microbenchmark that creates 500,000 threads that have
to run and terminate, shows a 15+% slowdown on my Linux box. I tried
to find some kind of real benchmarks that covers thread termination
but couldn't see anything specific.
Can you at least run this through our performance system to see if any
of the regular benchmarks are affected.
OK, but even if the regular benchmarks don't show a difference, I'd feel
better if microbenchmarks were not affected.  What if I go back to the
original approach and add locking:

   static jlong get_live_thread_count()        { MutexLocker
mu(Threads_lock); return _live_threads_count->get_value() -
_exiting_threads_count; }
   static jlong get_daemon_thread_count()      { MutexLocker
mu(Threads_lock); return _daemon_threads_count->get_value() -
_exiting_daemon_threads_count; }

along with the other cleanups around is_daemon and is_hidden_thread?

dl
Post by David Holmes
Thanks,
David
Post by JC Beyler
Hi Dean,
Post by d***@oracle.com
You're right, I missed that.  I think the right thing to do is
call current_thread_exiting while holding the Threads_lock.
Then we can get rid of the parallel atomic counters.  So, here's
http://cr.openjdk.java.net/~dlong/8021335/webrev.7/
Okay that is the simple and obvious solution that doesn't require
split counts. So I have to ask Mandy if she recalls why this
approach wasn't taken 15 years ago when the exit counts were added
It has been so long.  I think it's likely an oversight.
Post by JC Beyler
https://bugs.openjdk.java.net/browse/JDK-4530538 ?
Does taking the Threads_lock here cost too much and cause a thread
termination bottleneck?
If the contention on Threads_lock is not high (that seems to me), it
should be okay.   I'm not close to the VM implementation (lot of
changes since then) and I don't have a definitive answer unless I
study the code closely.   You and others have a better judgement on this.
AFAICT the change is okay.
Thanks Mandy.  David, OK to push?
dl
Mandy
d***@oracle.com
2018-10-23 18:05:00 UTC
Permalink
Post by d***@oracle.com
Post by David Holmes
Sorry Dean I'm concerned about a thread termination bottleneck with
this. A simple microbenchmark that creates 500,000 threads that have
to run and terminate, shows a 15+% slowdown on my Linux box. I tried
to find some kind of real benchmarks that covers thread termination
but couldn't see anything specific.
Can you at least run this through our performance system to see if
any of the regular benchmarks are affected.
OK, but even if the regular benchmarks don't show a difference, I'd
feel better if microbenchmarks were not affected.  What if I go back
   static jlong get_live_thread_count()        { MutexLocker
mu(Threads_lock); return _live_threads_count->get_value() -
_exiting_threads_count; }
   static jlong get_daemon_thread_count()      { MutexLocker
mu(Threads_lock); return _daemon_threads_count->get_value() -
_exiting_daemon_threads_count; }
along with the other cleanups around is_daemon and is_hidden_thread?
Some micro-benchmarks like SecureRandomBench show a regression with
webrev.7.  I could go back to webrev.5 and then we shouldn't need any
locking in the get_*() functions.

dl
Post by d***@oracle.com
dl
Post by David Holmes
Thanks,
David
Post by JC Beyler
Hi Dean,
Post by d***@oracle.com
You're right, I missed that.  I think the right thing to do is
call current_thread_exiting while holding the Threads_lock.
Then we can get rid of the parallel atomic counters. So, here's
http://cr.openjdk.java.net/~dlong/8021335/webrev.7/
Okay that is the simple and obvious solution that doesn't require
split counts. So I have to ask Mandy if she recalls why this
approach wasn't taken 15 years ago when the exit counts were added
It has been so long.  I think it's likely an oversight.
Post by JC Beyler
https://bugs.openjdk.java.net/browse/JDK-4530538 ?
Does taking the Threads_lock here cost too much and cause a thread
termination bottleneck?
If the contention on Threads_lock is not high (that seems to me),
it should be okay.   I'm not close to the VM implementation (lot of
changes since then) and I don't have a definitive answer unless I
study the code closely.   You and others have a better judgement on this.
AFAICT the change is okay.
Thanks Mandy.  David, OK to push?
dl
Mandy
David Holmes
2018-10-23 21:51:34 UTC
Permalink
Hi Dean,
Post by d***@oracle.com
Post by d***@oracle.com
Post by David Holmes
Sorry Dean I'm concerned about a thread termination bottleneck with
this. A simple microbenchmark that creates 500,000 threads that have
to run and terminate, shows a 15+% slowdown on my Linux box. I tried
to find some kind of real benchmarks that covers thread termination
but couldn't see anything specific.
Can you at least run this through our performance system to see if
any of the regular benchmarks are affected.
OK, but even if the regular benchmarks don't show a difference, I'd
feel better if microbenchmarks were not affected.  What if I go back
   static jlong get_live_thread_count()        { MutexLocker
mu(Threads_lock); return _live_threads_count->get_value() -
_exiting_threads_count; }
   static jlong get_daemon_thread_count()      { MutexLocker
mu(Threads_lock); return _daemon_threads_count->get_value() -
_exiting_daemon_threads_count; }
along with the other cleanups around is_daemon and is_hidden_thread?
Some micro-benchmarks like SecureRandomBench show a regression with
webrev.7.  I could go back to webrev.5 and then we shouldn't need any
locking in the get_*() functions.
I don't see version 5 discussed but I took a look and it seems okay. My
only query with that version is that it appears the actual perfCounters
no longer get read by anything - is that the case?

Thanks,
David
Post by d***@oracle.com
dl
Post by d***@oracle.com
dl
Post by David Holmes
Thanks,
David
Post by JC Beyler
Hi Dean,
Post by d***@oracle.com
You're right, I missed that.  I think the right thing to do is
call current_thread_exiting while holding the Threads_lock.
Then we can get rid of the parallel atomic counters. So, here's
http://cr.openjdk.java.net/~dlong/8021335/webrev.7/
Okay that is the simple and obvious solution that doesn't require
split counts. So I have to ask Mandy if she recalls why this
approach wasn't taken 15 years ago when the exit counts were added
It has been so long.  I think it's likely an oversight.
Post by JC Beyler
https://bugs.openjdk.java.net/browse/JDK-4530538 ?
Does taking the Threads_lock here cost too much and cause a thread
termination bottleneck?
If the contention on Threads_lock is not high (that seems to me),
it should be okay.   I'm not close to the VM implementation (lot of
changes since then) and I don't have a definitive answer unless I
study the code closely.   You and others have a better judgement on this.
AFAICT the change is okay.
Thanks Mandy.  David, OK to push?
dl
Mandy
d***@oracle.com
2018-10-23 22:30:35 UTC
Permalink
Post by JC Beyler
Hi Dean,
Post by d***@oracle.com
Post by d***@oracle.com
Post by David Holmes
Sorry Dean I'm concerned about a thread termination bottleneck with
this. A simple microbenchmark that creates 500,000 threads that
have to run and terminate, shows a 15+% slowdown on my Linux box. I
tried to find some kind of real benchmarks that covers thread
termination but couldn't see anything specific.
Can you at least run this through our performance system to see if
any of the regular benchmarks are affected.
OK, but even if the regular benchmarks don't show a difference, I'd
feel better if microbenchmarks were not affected.  What if I go back
   static jlong get_live_thread_count()        { MutexLocker
mu(Threads_lock); return _live_threads_count->get_value() -
_exiting_threads_count; }
   static jlong get_daemon_thread_count()      { MutexLocker
mu(Threads_lock); return _daemon_threads_count->get_value() -
_exiting_daemon_threads_count; }
along with the other cleanups around is_daemon and is_hidden_thread?
Some micro-benchmarks like SecureRandomBench show a regression with
webrev.7.  I could go back to webrev.5 and then we shouldn't need any
locking in the get_*() functions.
I don't see version 5 discussed but I took a look and it seems okay.
Mandy had questions about the asserts in .5, and it seemed like we could
just set the perf counters to the same value as the atomic counters,
which resulted in .6.  I think the only problem with .6 is that I set
the perf counters in decrement_thread_counts without the lock.  If I
"sync" the perf counters to the atomic counters only in add_thread and
remove_thread, with the lock, then it's about the same as .5, but
without the asserts and parallel inc/dec.  If anyone likes the sound of
that, I can send out a new webrev.  Or we can go with webrev.5.
Post by JC Beyler
My only query with that version is that it appears the actual
perfCounters no longer get read by anything - is that the case?
There does seem to be code that references them, through their name
string.  That's a difference interface that I'm not familiar with, so I
didn't want to break it.

dl
Post by JC Beyler
Thanks,
David
Post by d***@oracle.com
dl
Post by d***@oracle.com
dl
Post by David Holmes
Thanks,
David
Post by JC Beyler
Hi Dean,
Post by d***@oracle.com
You're right, I missed that.  I think the right thing to do is
call current_thread_exiting while holding the Threads_lock.
Then we can get rid of the parallel atomic counters. So, here's
http://cr.openjdk.java.net/~dlong/8021335/webrev.7/
Okay that is the simple and obvious solution that doesn't
require split counts. So I have to ask Mandy if she recalls why
this approach wasn't taken 15 years ago when the exit counts
It has been so long.  I think it's likely an oversight.
Post by JC Beyler
https://bugs.openjdk.java.net/browse/JDK-4530538 ?
Does taking the Threads_lock here cost too much and cause a
thread termination bottleneck?
If the contention on Threads_lock is not high (that seems to me),
it should be okay.   I'm not close to the VM implementation (lot
of changes since then) and I don't have a definitive answer
unless I study the code closely.   You and others have a better
judgement on this.
AFAICT the change is okay.
Thanks Mandy.  David, OK to push?
dl
Mandy
Mandy Chung
2018-10-24 03:22:15 UTC
Permalink
Post by d***@oracle.com
Post by David Holmes
Post by d***@oracle.com
Post by d***@oracle.com
OK, but even if the regular benchmarks don't show a difference, I'd
feel better if microbenchmarks were not affected.  What if I go
   static jlong get_live_thread_count()        { MutexLocker
mu(Threads_lock); return _live_threads_count->get_value() -
_exiting_threads_count; }
   static jlong get_daemon_thread_count()      { MutexLocker
mu(Threads_lock); return _daemon_threads_count->get_value() -
_exiting_daemon_threads_count; }
along with the other cleanups around is_daemon and is_hidden_thread?
Some micro-benchmarks like SecureRandomBench show a regression with
webrev.7.  I could go back to webrev.5 and then we shouldn't need
any locking in the get_*() functions.
I don't see version 5 discussed but I took a look and it seems okay.
Mandy had questions about the asserts in .5, and it seemed like we
could just set the perf counters to the same value as the atomic
counters, which resulted in .6.  I think the only problem with .6 is
that I set the perf counters in decrement_thread_counts without the
lock.  If I "sync" the perf counters to the atomic counters only in
add_thread and remove_thread, with the lock, then it's about the same
as .5, but without the asserts and parallel inc/dec.  If anyone likes
the sound of that, I can send out a new webrev.
ThreadService::current_thread_exiting will decrement the atomic counters.
add_thread will increase atomic counters and sync the perf counters with
the atomic counters.
remove_thread will decrement the atomic counters if thread is not
exiting and
update the perf counters to the atomic counters.

I'm good with that.
Post by d***@oracle.com
Or we can go with webrev.5.
Post by David Holmes
My only query with that version is that it appears the actual
perfCounters no longer get read by anything - is that the case?
There does seem to be code that references them, through their name
string.  That's a difference interface that I'm not familiar with, so
I didn't want to break it.
The perf counters are exposed via jstat tool.  I don't know who relies
on jstat to monitor thread counts.   I suggest to check with the
performance team if they depend on these counters.   With JFR, I think
we should re-examine whether these perf counters should stay.

Mandy
David Holmes
2018-10-24 05:34:03 UTC
Permalink
Post by d***@oracle.com
Post by JC Beyler
Hi Dean,
Post by d***@oracle.com
Post by d***@oracle.com
Post by David Holmes
Sorry Dean I'm concerned about a thread termination bottleneck with
this. A simple microbenchmark that creates 500,000 threads that
have to run and terminate, shows a 15+% slowdown on my Linux box. I
tried to find some kind of real benchmarks that covers thread
termination but couldn't see anything specific.
Can you at least run this through our performance system to see if
any of the regular benchmarks are affected.
OK, but even if the regular benchmarks don't show a difference, I'd
feel better if microbenchmarks were not affected.  What if I go back
   static jlong get_live_thread_count()        { MutexLocker
mu(Threads_lock); return _live_threads_count->get_value() -
_exiting_threads_count; }
   static jlong get_daemon_thread_count()      { MutexLocker
mu(Threads_lock); return _daemon_threads_count->get_value() -
_exiting_daemon_threads_count; }
along with the other cleanups around is_daemon and is_hidden_thread?
Some micro-benchmarks like SecureRandomBench show a regression with
webrev.7.  I could go back to webrev.5 and then we shouldn't need any
locking in the get_*() functions.
I don't see version 5 discussed but I took a look and it seems okay.
Mandy had questions about the asserts in .5, and it seemed like we could
just set the perf counters to the same value as the atomic counters,
which resulted in .6.  I think the only problem with .6 is that I set
the perf counters in decrement_thread_counts without the lock.  If I
"sync" the perf counters to the atomic counters only in add_thread and
remove_thread, with the lock, then it's about the same as .5, but
without the asserts and parallel inc/dec.  If anyone likes the sound of
that, I can send out a new webrev.  Or we can go with webrev.5.
I'm not sure what the concern was with the asserts - if they mis-fire
we'll know soon enough. So I'm okay with .5
Post by d***@oracle.com
Post by JC Beyler
My only query with that version is that it appears the actual
perfCounters no longer get read by anything - is that the case?
There does seem to be code that references them, through their name
string.  That's a difference interface that I'm not familiar with, so I
didn't want to break it.
Right - they can be accessed directly through other means. I was
concerned that the perfCounter was out of sync with
get_live_thread-count() but that's already the case so not an issue.

If all tests and benchmarks are happy I say go with version .5

Thanks,
David
Post by d***@oracle.com
dl
Post by JC Beyler
Thanks,
David
Post by d***@oracle.com
dl
Post by d***@oracle.com
dl
Post by David Holmes
Thanks,
David
Post by JC Beyler
Hi Dean,
Post by d***@oracle.com
You're right, I missed that.  I think the right thing to do is
call current_thread_exiting while holding the Threads_lock.
Then we can get rid of the parallel atomic counters. So, here's
http://cr.openjdk.java.net/~dlong/8021335/webrev.7/
Okay that is the simple and obvious solution that doesn't
require split counts. So I have to ask Mandy if she recalls why
this approach wasn't taken 15 years ago when the exit counts
It has been so long.  I think it's likely an oversight.
Post by JC Beyler
https://bugs.openjdk.java.net/browse/JDK-4530538 ?
Does taking the Threads_lock here cost too much and cause a
thread termination bottleneck?
If the contention on Threads_lock is not high (that seems to me),
it should be okay.   I'm not close to the VM implementation (lot
of changes since then) and I don't have a definitive answer
unless I study the code closely.   You and others have a better
judgement on this.
AFAICT the change is okay.
Thanks Mandy.  David, OK to push?
dl
Mandy
Mandy Chung
2018-10-24 06:04:30 UTC
Permalink
Post by David Holmes
Post by d***@oracle.com
Post by JC Beyler
Hi Dean,
Post by d***@oracle.com
Post by d***@oracle.com
Post by David Holmes
Sorry Dean I'm concerned about a thread termination bottleneck
with this. A simple microbenchmark that creates 500,000 threads
that have to run and terminate, shows a 15+% slowdown on my Linux
box. I tried to find some kind of real benchmarks that covers
thread termination but couldn't see anything specific.
Can you at least run this through our performance system to see
if any of the regular benchmarks are affected.
OK, but even if the regular benchmarks don't show a difference,
I'd feel better if microbenchmarks were not affected.  What if I
   static jlong get_live_thread_count()        { MutexLocker
mu(Threads_lock); return _live_threads_count->get_value() -
_exiting_threads_count; }
   static jlong get_daemon_thread_count()      { MutexLocker
mu(Threads_lock); return _daemon_threads_count->get_value() -
_exiting_daemon_threads_count; }
along with the other cleanups around is_daemon and is_hidden_thread?
Some micro-benchmarks like SecureRandomBench show a regression with
webrev.7.  I could go back to webrev.5 and then we shouldn't need
any locking in the get_*() functions.
I don't see version 5 discussed but I took a look and it seems okay.
Mandy had questions about the asserts in .5, and it seemed like we
could just set the perf counters to the same value as the atomic
counters, which resulted in .6.  I think the only problem with .6 is
that I set the perf counters in decrement_thread_counts without the
lock.  If I "sync" the perf counters to the atomic counters only in
add_thread and remove_thread, with the lock, then it's about the same
as .5, but without the asserts and parallel inc/dec.  If anyone likes
the sound of that, I can send out a new webrev.  Or we can go with
webrev.5.
I'm not sure what the concern was with the asserts - if they mis-fire
we'll know soon enough. So I'm okay with .5
Post by d***@oracle.com
Post by JC Beyler
My only query with that version is that it appears the actual
perfCounters no longer get read by anything - is that the case?
There does seem to be code that references them, through their name
string.  That's a difference interface that I'm not familiar with, so
I didn't want to break it.
Right - they can be accessed directly through other means. I was
concerned that the perfCounter was out of sync with
get_live_thread-count() but that's already the case so not an issue.
If all tests and benchmarks are happy I say go with version .5
I have no objection to version .5 if most people prefer that.  My
comment was that I don't think the asserts are necessary.

Mandy
d***@oracle.com
2018-10-24 21:18:00 UTC
Permalink
Post by Mandy Chung
Post by David Holmes
Post by d***@oracle.com
Post by JC Beyler
Hi Dean,
Post by d***@oracle.com
Post by d***@oracle.com
Post by David Holmes
Sorry Dean I'm concerned about a thread termination bottleneck
with this. A simple microbenchmark that creates 500,000 threads
that have to run and terminate, shows a 15+% slowdown on my
Linux box. I tried to find some kind of real benchmarks that
covers thread termination but couldn't see anything specific.
Can you at least run this through our performance system to see
if any of the regular benchmarks are affected.
OK, but even if the regular benchmarks don't show a difference,
I'd feel better if microbenchmarks were not affected.  What if I
   static jlong get_live_thread_count()        { MutexLocker
mu(Threads_lock); return _live_threads_count->get_value() -
_exiting_threads_count; }
   static jlong get_daemon_thread_count()      { MutexLocker
mu(Threads_lock); return _daemon_threads_count->get_value() -
_exiting_daemon_threads_count; }
along with the other cleanups around is_daemon and is_hidden_thread?
Some micro-benchmarks like SecureRandomBench show a regression
with webrev.7.  I could go back to webrev.5 and then we shouldn't
need any locking in the get_*() functions.
I don't see version 5 discussed but I took a look and it seems okay.
Mandy had questions about the asserts in .5, and it seemed like we
could just set the perf counters to the same value as the atomic
counters, which resulted in .6.  I think the only problem with .6 is
that I set the perf counters in decrement_thread_counts without the
lock.  If I "sync" the perf counters to the atomic counters only in
add_thread and remove_thread, with the lock, then it's about the
same as .5, but without the asserts and parallel inc/dec.  If anyone
likes the sound of that, I can send out a new webrev.  Or we can go
with webrev.5.
I'm not sure what the concern was with the asserts - if they mis-fire
we'll know soon enough. So I'm okay with .5
Post by d***@oracle.com
Post by JC Beyler
My only query with that version is that it appears the actual
perfCounters no longer get read by anything - is that the case?
There does seem to be code that references them, through their name
string.  That's a difference interface that I'm not familiar with,
so I didn't want to break it.
Right - they can be accessed directly through other means. I was
concerned that the perfCounter was out of sync with
get_live_thread-count() but that's already the case so not an issue.
If all tests and benchmarks are happy I say go with version .5
I have no objection to version .5 if most people prefer that.  My
comment was that I don't think the asserts are necessary.
OK, I'll rerun some performance benchmarks and push .5 if the results
look OK.  David, can you send me your micro-benchmark?
Thanks for the reviews!

dl
Post by Mandy Chung
Mandy
d***@oracle.com
2018-10-17 21:10:48 UTC
Permalink
Post by JC Beyler
Hi Dean,
Thanks for tackling this.
I'm still struggling to fully grasp why we need both the PerfCounters
and the regular counters. I get that we have to decrement the live
counts before ensure_join() has allowed Thread.join() to return, to
ensure that if we then check the number of threads it has dropped by
one. But I don't understand why that means we need to manage the
thread count in two parts. Particularly as now you don't use the
PerfCounter to return the live count, so it makes me wonder what role
the PerfCounter is playing as it is temporarily inconsistent with the
reported live count? Is it simply that we can't atomically decrement
the PerfCounters, and we can't require the Threads_lock when
decrementing?
Good questions.  I didn't know the history, so I tried not to change too
much.  The PerfCounters appear to be there to support StatSampler.  I
think it's OK for them to be a little out of sync. If we wanted to make
them more in sync with the atomic counters, I think we could do any of
the following:
- Grab Threads_lock before SR_lock() where we call
current_thread_exiting, and update perf counts at that time
- instead of decrementing in remove_thread, do this in
decrement_thread_counts
  _live_threads_count->set_value(_atomic_threads_count);
  _daemon_threads_count->set_value(_atomic_daemon_threads_count);
( I see that Mandy has suggested the same thing.)
- DO update the perf counts atomically
However, I consider the PerfCounters inconsistency a separate issue from
this bug.
Post by JC Beyler
On the code itself one thing I find irksome is that we already have a
daemon flag in remove_thread but decrement_thread_counts doesn't get
passed that and so always re-examines the thread to see if it is a
daemon. I know only one of the code paths has the daemon flag, so
perhaps we should hoist the daemon check up into JavaThread::exit and
then pass a daemon parameter to decrement_thread_counts.
OK, I've fixed this.
Post by JC Beyler
There's also some ambiguity (existing) in relation to the existence of
the jt->threadObj() and its affect on whether the thread is considered
a daemon or not. Your check has to mirror the existing checks - the
threadObj may be NULL at removal if it was a failed attach, but the
addition considers a JNI-attached thread to be non-daemon. Overall
this seems buggy to me but as long as your check follows the same
rules that is okay. In that regard JavaThread::exit should never
encounter a NULL threadObj().
I refactored is_daemon checks into a single helper function.
Post by JC Beyler
Minor nit: I suggest moving the two occurrences of the comment
// Do not count VM internal or JVMTI agent threads
inside is_hidden_thread so that we don't have to remember to update
the comments if the definition changes.
OK.  New webrev:

http://cr.openjdk.java.net/~dlong/8021335/webrev.5/

dl
Post by JC Beyler
Thanks,
David
-----
Post by d***@oracle.com
http://cr.openjdk.java.net/~dlong/8021335/webrev.4/
dl
Post by d***@oracle.com
Post by JC Beyler
Hi Dean,
"are atomically" is in the comment but should no longer be there I
// These 2 counters are like the above thread counts, but are
Fixed!
Post by JC Beyler
Any way we could move this test into a helper method and both
add_thread/current_thread_exiting could use the same "ignore"
thread code so that we ensure the two never get out of sync?
+  if (jt->is_hidden_from_external_view() ||
+      jt->is_jvmti_agent_thread()) {
+    return;
+  }
Good idea.  Fixed.
Post by JC Beyler
(Also by curiosity, add_thread has an assert on the Threads_lock
but not thread_exiting?)
Right, I followed the existing pattern where
current_thread_exiting() only uses the atomic counters, so it
doesn't need Threads_lock.
dl
Post by JC Beyler
Thanks,
Jc
    https://bugs.openjdk.java.net/browse/JDK-8021335
    http://cr.openjdk.java.net/~dlong/8021335/webrev.3/
<http://cr.openjdk.java.net/%7Edlong/8021335/webrev.3/>
    This change attempts to fix an old and intermittent problem with
    ThreadMXBean thread counts.
    1. Make sure hidden threads don't affect counts (even when
exiting)
    2. Fix race reading counts using atomic counters
    3. Remove some workarounds in test (because they hide bugs)
    dl
--
Thanks,
Jc
Loading...