Discussion:
RFR (S): 8211909: JDWP Transport Listener: dt_socket thread crash
(too old to reply)
David Holmes
2018-10-15 05:12:31 UTC
Permalink
Bug: https://bugs.openjdk.java.net/browse/JDK-8211909
webrev: http://cr.openjdk.java.net/~dholmes/8211909/webrev/

The crash occurs when trying to access a thread that was returned as
part of JVM TI GetThreadGroupChildren. The problem is that the JVM TI
code tries to use the Threads_lock to ensure atomic access to the
ThreadGroup's threads and child groups:

{ // Cannot allow thread or group counts to change.
MutexLocker mu(Threads_lock);

but the Threads_lock does not control concurrency in the Java code that
can cause threads to be added and removed, so we do not get a stable
snapshot of the thread array and its length, and contents. To get a
stable snapshot we have to use the same synchronization mechanism as
used by the Java code: lock the monitor of the ThreadGroup instance.

Two other pointless acquisitions of the Threads_lock, in GetThreadInfo
and GetThreadGroupInfo, were also removed. These functions could report
an inconsistent snapshot of the Thread or ThreadGroup state, if the
mutable state is mutated concurrently. Again we could acquire the
object's monitor to prevent this but it is unclear that there is any
real benefit in doing so - after all the thread/group information could
change immediately after it has been read anyway. So here I just delete
the Threads_lock usage.

Testing: mach5 tier 1-3
JVM TI tests (serviceability/jvmti vmTestbase/nsk/jvmti/)

A regression test may be possible if we call GetThreadGroupChildren in a
loop, whilst threads add and remove themselves from the group
concurrently. But it is awkward to write.

Thanks,
David

Thanks,
David
Daniel D. Daugherty
2018-10-15 15:05:19 UTC
Permalink
Post by David Holmes
Bug: https://bugs.openjdk.java.net/browse/JDK-8211909
webrev: http://cr.openjdk.java.net/~dholmes/8211909/webrev/
src/hotspot/share/prims/jvmtiEnv.cpp
    No comments.

Thumbs up on the code changes.
Post by David Holmes
The crash occurs when trying to access a thread that was returned as
part of JVM TI GetThreadGroupChildren. The problem is that the JVM TI
code tries to use the Threads_lock to ensure atomic access to the
    { // Cannot allow thread or group counts to change.
      MutexLocker mu(Threads_lock);
but the Threads_lock does not control concurrency in the Java code
that can cause threads to be added and removed, so we do not get a
stable snapshot of the thread array and its length, and contents. To
get a stable snapshot we have to use the same synchronization
mechanism as used by the Java code: lock the monitor of the
ThreadGroup instance.
Agreed that the wrong locking mechanism was used.

Side note: The '// Cannot allow thread or group counts to change.'
comment was added sometime after we switched from TeamWare to
Mercurial.

Looks like all three uses of Threads_lock that you removed
date back to the original JVM/TI delta:

$ sgv src/share/vm/prims/jvmtiEnv.cpp | grep '{ MutexLocker
mu(Threads_lock);'
1.168
1.1 { MutexLocker mu(Threads_lock);
1.1 { MutexLocker mu(Threads_lock);
1.1 { MutexLocker mu(Threads_lock);
3358 lines
No id keywords (cm7)

$ sp -r1.1 src/share/vm/prims/jvmtiEnv.cpp
src/share/vm/prims/SCCS/s.jvmtiEnv.cpp:

D 1.1 03/01/28 20:52:06 rfield 1 0 03130/00000/00000
MRs:
COMMENTS:
date and time created 03/01/28 20:52:06 by rfield
Post by David Holmes
Two other pointless acquisitions of the Threads_lock, in GetThreadInfo
and GetThreadGroupInfo, were also removed. These functions could
report an inconsistent snapshot of the Thread or ThreadGroup state, if
the mutable state is mutated concurrently. Again we could acquire the
object's monitor to prevent this but it is unclear that there is any
real benefit in doing so - after all the thread/group information
could change immediately after it has been read anyway. So here I just
delete the Threads_lock usage.
Also agreed. I ran into all three of these in the Thread-SMR project
and I couldn't convince myself to remove the use of Threads_lock for
these two. I missed that the wrong lock was used for JVM/TI
GetThreadGroupChildren().

Very nice catch!

Dan
Post by David Holmes
Testing: mach5 tier 1-3
         JVM TI tests (serviceability/jvmti vmTestbase/nsk/jvmti/)
A regression test may be possible if we call GetThreadGroupChildren in
a loop, whilst threads add and remove themselves from the group
concurrently. But it is awkward to write.
Thanks,
David
Thanks,
David
David Holmes
2018-10-15 19:46:40 UTC
Permalink
Thanks for the review Dan!

David
Post by Daniel D. Daugherty
Post by David Holmes
Bug: https://bugs.openjdk.java.net/browse/JDK-8211909
webrev: http://cr.openjdk.java.net/~dholmes/8211909/webrev/
src/hotspot/share/prims/jvmtiEnv.cpp
    No comments.
Thumbs up on the code changes.
Post by David Holmes
The crash occurs when trying to access a thread that was returned as
part of JVM TI GetThreadGroupChildren. The problem is that the JVM TI
code tries to use the Threads_lock to ensure atomic access to the
    { // Cannot allow thread or group counts to change.
      MutexLocker mu(Threads_lock);
but the Threads_lock does not control concurrency in the Java code
that can cause threads to be added and removed, so we do not get a
stable snapshot of the thread array and its length, and contents. To
get a stable snapshot we have to use the same synchronization
mechanism as used by the Java code: lock the monitor of the
ThreadGroup instance.
Agreed that the wrong locking mechanism was used.
Side note: The '// Cannot allow thread or group counts to change.'
comment was added sometime after we switched from TeamWare to
Mercurial.
Looks like all three uses of Threads_lock that you removed
$ sgv src/share/vm/prims/jvmtiEnv.cpp | grep '{ MutexLocker
mu(Threads_lock);'
1.168
1.1 { MutexLocker mu(Threads_lock);
1.1 { MutexLocker mu(Threads_lock);
1.1 { MutexLocker mu(Threads_lock);
3358 lines
No id keywords (cm7)
$ sp -r1.1 src/share/vm/prims/jvmtiEnv.cpp
D 1.1 03/01/28 20:52:06 rfield 1 0 03130/00000/00000
date and time created 03/01/28 20:52:06 by rfield
Post by David Holmes
Two other pointless acquisitions of the Threads_lock, in GetThreadInfo
and GetThreadGroupInfo, were also removed. These functions could
report an inconsistent snapshot of the Thread or ThreadGroup state, if
the mutable state is mutated concurrently. Again we could acquire the
object's monitor to prevent this but it is unclear that there is any
real benefit in doing so - after all the thread/group information
could change immediately after it has been read anyway. So here I just
delete the Threads_lock usage.
Also agreed. I ran into all three of these in the Thread-SMR project
and I couldn't convince myself to remove the use of Threads_lock for
these two. I missed that the wrong lock was used for JVM/TI
GetThreadGroupChildren().
Very nice catch!
Dan
Post by David Holmes
Testing: mach5 tier 1-3
         JVM TI tests (serviceability/jvmti vmTestbase/nsk/jvmti/)
A regression test may be possible if we call GetThreadGroupChildren in
a loop, whilst threads add and remove themselves from the group
concurrently. But it is awkward to write.
Thanks,
David
Thanks,
David
Dmitry Samersoff
2018-10-16 12:46:02 UTC
Permalink
David,

The fix looks good to me.

PS: This code has lots of formatting nits like missed spaces.
It's clearly out of scope of this fix, but it might be worth to
launch a second webrev.

-Dmitry
Post by David Holmes
Bug: https://bugs.openjdk.java.net/browse/JDK-8211909
webrev: http://cr.openjdk.java.net/~dholmes/8211909/webrev/
The crash occurs when trying to access a thread that was returned as
part of JVM TI GetThreadGroupChildren. The problem is that the JVM TI
code tries to use the Threads_lock to ensure atomic access to the
    { // Cannot allow thread or group counts to change.
      MutexLocker mu(Threads_lock);
but the Threads_lock does not control concurrency in the Java code that
can cause threads to be added and removed, so we do not get a stable
snapshot of the thread array and its length, and contents. To get a
stable snapshot we have to use the same synchronization mechanism as
used by the Java code: lock the monitor of the ThreadGroup instance.
Two other pointless acquisitions of the Threads_lock, in GetThreadInfo
and GetThreadGroupInfo, were also removed. These functions could report
an inconsistent snapshot of the Thread or ThreadGroup state, if the
mutable state is mutated concurrently. Again we could acquire the
object's monitor to prevent this but it is unclear that there is any
real benefit in doing so - after all the thread/group information could
change immediately after it has been read anyway. So here I just delete
the Threads_lock usage.
Testing: mach5 tier 1-3
         JVM TI tests (serviceability/jvmti vmTestbase/nsk/jvmti/)
A regression test may be possible if we call GetThreadGroupChildren in a
loop, whilst threads add and remove themselves from the group
concurrently. But it is awkward to write.
Thanks,
David
Thanks,
David
--
Dmitry Samersoff
http://devnull.samersoff.net
* There will come soft rains ...
David Holmes
2018-10-16 22:50:25 UTC
Permalink
Hi Dmitry,
Post by Dmitry Samersoff
David,
The fix looks good to me.
Thanks for taking a look.
Post by Dmitry Samersoff
PS: This code has lots of formatting nits like missed spaces.
It's clearly out of scope of this fix, but it might be worth to
launch a second webrev.
out of scope but launch a second webrev? Sorry I'm a little confused.
But I only see a few missed space around for-loops:

929 for (int i=0; i < nthreads; i++) {
1463 for (int i=0, j=0; i<nthreads; i++) {
1495 for (int i=0; i<ngroups; i++) {
3250 for (intptr_t i=0; i <= recursion; i++) {
3680 for (int j=0; j<readable_count; j++) {

I can fix those before pushing.

Thanks,
David
Post by Dmitry Samersoff
-Dmitry
Post by David Holmes
Bug: https://bugs.openjdk.java.net/browse/JDK-8211909
webrev: http://cr.openjdk.java.net/~dholmes/8211909/webrev/
The crash occurs when trying to access a thread that was returned as
part of JVM TI GetThreadGroupChildren. The problem is that the JVM TI
code tries to use the Threads_lock to ensure atomic access to the
    { // Cannot allow thread or group counts to change.
      MutexLocker mu(Threads_lock);
but the Threads_lock does not control concurrency in the Java code that
can cause threads to be added and removed, so we do not get a stable
snapshot of the thread array and its length, and contents. To get a
stable snapshot we have to use the same synchronization mechanism as
used by the Java code: lock the monitor of the ThreadGroup instance.
Two other pointless acquisitions of the Threads_lock, in GetThreadInfo
and GetThreadGroupInfo, were also removed. These functions could report
an inconsistent snapshot of the Thread or ThreadGroup state, if the
mutable state is mutated concurrently. Again we could acquire the
object's monitor to prevent this but it is unclear that there is any
real benefit in doing so - after all the thread/group information could
change immediately after it has been read anyway. So here I just delete
the Threads_lock usage.
Testing: mach5 tier 1-3
         JVM TI tests (serviceability/jvmti vmTestbase/nsk/jvmti/)
A regression test may be possible if we call GetThreadGroupChildren in a
loop, whilst threads add and remove themselves from the group
concurrently. But it is awkward to write.
Thanks,
David
Thanks,
David
s***@oracle.com
2018-10-17 23:00:23 UTC
Permalink
Hi David,

Sorry for being late but just wanted to tell you that it looks good to me.
Thank you for catching and taking care about it!

Thanks,
Serguei
Post by David Holmes
Bug: https://bugs.openjdk.java.net/browse/JDK-8211909
webrev: http://cr.openjdk.java.net/~dholmes/8211909/webrev/
The crash occurs when trying to access a thread that was returned as
part of JVM TI GetThreadGroupChildren. The problem is that the JVM TI
code tries to use the Threads_lock to ensure atomic access to the
    { // Cannot allow thread or group counts to change.
      MutexLocker mu(Threads_lock);
but the Threads_lock does not control concurrency in the Java code
that can cause threads to be added and removed, so we do not get a
stable snapshot of the thread array and its length, and contents. To
get a stable snapshot we have to use the same synchronization
mechanism as used by the Java code: lock the monitor of the
ThreadGroup instance.
Two other pointless acquisitions of the Threads_lock, in GetThreadInfo
and GetThreadGroupInfo, were also removed. These functions could
report an inconsistent snapshot of the Thread or ThreadGroup state, if
the mutable state is mutated concurrently. Again we could acquire the
object's monitor to prevent this but it is unclear that there is any
real benefit in doing so - after all the thread/group information
could change immediately after it has been read anyway. So here I just
delete the Threads_lock usage.
Testing: mach5 tier 1-3
         JVM TI tests (serviceability/jvmti vmTestbase/nsk/jvmti/)
A regression test may be possible if we call GetThreadGroupChildren in
a loop, whilst threads add and remove themselves from the group
concurrently. But it is awkward to write.
Thanks,
David
Thanks,
David
David Holmes
2018-10-18 01:34:32 UTC
Permalink
Thanks Serguei!

David
Post by s***@oracle.com
Hi David,
Sorry for being late but just wanted to tell you that it looks good to me.
Thank you for catching and taking care about it!
Thanks,
Serguei
Post by David Holmes
Bug: https://bugs.openjdk.java.net/browse/JDK-8211909
webrev: http://cr.openjdk.java.net/~dholmes/8211909/webrev/
The crash occurs when trying to access a thread that was returned as
part of JVM TI GetThreadGroupChildren. The problem is that the JVM TI
code tries to use the Threads_lock to ensure atomic access to the
    { // Cannot allow thread or group counts to change.
      MutexLocker mu(Threads_lock);
but the Threads_lock does not control concurrency in the Java code
that can cause threads to be added and removed, so we do not get a
stable snapshot of the thread array and its length, and contents. To
get a stable snapshot we have to use the same synchronization
mechanism as used by the Java code: lock the monitor of the
ThreadGroup instance.
Two other pointless acquisitions of the Threads_lock, in GetThreadInfo
and GetThreadGroupInfo, were also removed. These functions could
report an inconsistent snapshot of the Thread or ThreadGroup state, if
the mutable state is mutated concurrently. Again we could acquire the
object's monitor to prevent this but it is unclear that there is any
real benefit in doing so - after all the thread/group information
could change immediately after it has been read anyway. So here I just
delete the Threads_lock usage.
Testing: mach5 tier 1-3
         JVM TI tests (serviceability/jvmti vmTestbase/nsk/jvmti/)
A regression test may be possible if we call GetThreadGroupChildren in
a loop, whilst threads add and remove themselves from the group
concurrently. But it is awkward to write.
Thanks,
David
Thanks,
David
Loading...