Discussion:
RFR: JDK-8210337: runtime/NMT/VirtualAllocTestType.java failed on RuntimeException missing from stdout/stderr
(too old to reply)
Gary Adams
2018-10-02 15:16:08 UTC
Permalink
Solaris debug builds are failing tests that use the attach interface.
An IOException is reported when the java_pid file is not opened.

It appears that the attempt to attach is taking place too quickly.
This workaround will allow the open operation to be retried
after a short pause.

Webrev: http://cr.openjdk.java.net/~gadams/8210337/webrev/
Issue: https://bugs.openjdk.java.net/browse/JDK-8210337

Testing is in progress.
Chris Plummer
2018-10-02 19:12:06 UTC
Permalink
Without the fix was this issue easy enough to reproduce that you can be
sure this is resolving it?

Chris
Post by Gary Adams
Solaris debug builds are failing tests that use the attach interface.
An IOException is reported when the java_pid file is not opened.
It appears that the attempt to attach is taking place too quickly.
This workaround will allow the open operation to be retried
after a short pause.
  Webrev: http://cr.openjdk.java.net/~gadams/8210337/webrev/
  Issue: https://bugs.openjdk.java.net/browse/JDK-8210337
Testing is in progress.
g***@oracle.com
2018-10-02 19:27:59 UTC
Permalink
The problem reproduced pretty quickly.
I added a call to checkPermission and revealed the
"file not found" from the stat call when the IOException
was detected.

There has been some flakiness from the Solaris test machines today,
so I'll continue with the testing a bit longer.
Post by Chris Plummer
Without the fix was this issue easy enough to reproduce that you can
be sure this is resolving it?
Chris
Post by Gary Adams
Solaris debug builds are failing tests that use the attach interface.
An IOException is reported when the java_pid file is not opened.
It appears that the attempt to attach is taking place too quickly.
This workaround will allow the open operation to be retried
after a short pause.
  Webrev: http://cr.openjdk.java.net/~gadams/8210337/webrev/
  Issue: https://bugs.openjdk.java.net/browse/JDK-8210337
Testing is in progress.
Chris Plummer
2018-10-02 20:11:59 UTC
Permalink
Can you summarize how the attach handshaking is suppose to work? I'm
just wondering why the attacher would ever be looking for the file
before the attachee has created it. It seems a proper handshake would
prevent this. Maybe there's some sort of visibility issue where the
attachee has indeed created the file, but it is not immediately visible
to the attacher process.

Chris
Post by g***@oracle.com
The problem reproduced pretty quickly.
I added a call to checkPermission and revealed the
"file not found" from the stat call when the IOException
was detected.
There has been some flakiness from the Solaris test machines today,
so I'll continue with the testing a bit longer.
Post by Chris Plummer
Without the fix was this issue easy enough to reproduce that you can
be sure this is resolving it?
Chris
Post by Gary Adams
Solaris debug builds are failing tests that use the attach interface.
An IOException is reported when the java_pid file is not opened.
It appears that the attempt to attach is taking place too quickly.
This workaround will allow the open operation to be retried
after a short pause.
  Webrev: http://cr.openjdk.java.net/~gadams/8210337/webrev/
  Issue: https://bugs.openjdk.java.net/browse/JDK-8210337
Testing is in progress.
g***@oracle.com
2018-10-02 20:44:45 UTC
Permalink
The general attach sequence ...

src/jdk.attach/solaris/classes/sun/tools/attach/VirtualMachineImpl.java

 the attacher creates an attach_pid file in a directory where the
attachee is runnning
 issues a signal to the attacheee

  loops waiting for the java_pid file to be created
  default timeout is 10 seconds

  src/hotspot/os/solaris/attachListener_solaris.cpp

   attachee creates the java_pid file
   listens til the attacher opens the door

...
Not sure when a bare IOException is thrown rather than the
more specific FileNotFoundException.
Post by Chris Plummer
Can you summarize how the attach handshaking is suppose to work? I'm
just wondering why the attacher would ever be looking for the file
before the attachee has created it. It seems a proper handshake would
prevent this. Maybe there's some sort of visibility issue where the
attachee has indeed created the file, but it is not immediately
visible to the attacher process.
Chris
Post by g***@oracle.com
The problem reproduced pretty quickly.
I added a call to checkPermission and revealed the
"file not found" from the stat call when the IOException
was detected.
There has been some flakiness from the Solaris test machines today,
so I'll continue with the testing a bit longer.
Post by Chris Plummer
Without the fix was this issue easy enough to reproduce that you can
be sure this is resolving it?
Chris
Post by Gary Adams
Solaris debug builds are failing tests that use the attach interface.
An IOException is reported when the java_pid file is not opened.
It appears that the attempt to attach is taking place too quickly.
This workaround will allow the open operation to be retried
after a short pause.
  Webrev: http://cr.openjdk.java.net/~gadams/8210337/webrev/
  Issue: https://bugs.openjdk.java.net/browse/JDK-8210337
Testing is in progress.
Chris Plummer
2018-10-02 20:57:43 UTC
Permalink
Post by g***@oracle.com
The general attach sequence ...
src/jdk.attach/solaris/classes/sun/tools/attach/VirtualMachineImpl.java
 the attacher creates an attach_pid file in a directory where the
attachee is runnning
 issues a signal to the attacheee
  loops waiting for the java_pid file to be created
  default timeout is 10 seconds
So getting a FileNotFoundException while in this loop is OK, but
IOException is not.
Post by g***@oracle.com
src/hotspot/os/solaris/attachListener_solaris.cpp
   attachee creates the java_pid file
   listens til the attacher opens the door
I'm don't think this is related, but JDK-8199811 made a fix in
attachListener_solaris.cpp to make it wait up to 10 seconds for
initialization to complete before failing the enqueue.
Post by g***@oracle.com
...
Not sure when a bare IOException is thrown rather than the
more specific FileNotFoundException.
Where is the IOException originating from? I wonder if the issue is that
the file is in the process of being created, but is not fully created
yet. Maybe it is there, but owner/group/permissions have not been set
yet, and this results in an IOException instead of FileNotFoundException.

Chris
Post by g***@oracle.com
Post by Chris Plummer
Can you summarize how the attach handshaking is suppose to work? I'm
just wondering why the attacher would ever be looking for the file
before the attachee has created it. It seems a proper handshake would
prevent this. Maybe there's some sort of visibility issue where the
attachee has indeed created the file, but it is not immediately
visible to the attacher process.
Chris
Post by g***@oracle.com
The problem reproduced pretty quickly.
I added a call to checkPermission and revealed the
"file not found" from the stat call when the IOException
was detected.
There has been some flakiness from the Solaris test machines today,
so I'll continue with the testing a bit longer.
Post by Chris Plummer
Without the fix was this issue easy enough to reproduce that you
can be sure this is resolving it?
Chris
Post by Gary Adams
Solaris debug builds are failing tests that use the attach interface.
An IOException is reported when the java_pid file is not opened.
It appears that the attempt to attach is taking place too quickly.
This workaround will allow the open operation to be retried
after a short pause.
  Webrev: http://cr.openjdk.java.net/~gadams/8210337/webrev/
  Issue: https://bugs.openjdk.java.net/browse/JDK-8210337
Testing is in progress.
David Holmes
2018-10-02 21:38:34 UTC
Permalink
Chris,
Post by Chris Plummer
Post by g***@oracle.com
The general attach sequence ...
src/jdk.attach/solaris/classes/sun/tools/attach/VirtualMachineImpl.java
 the attacher creates an attach_pid file in a directory where the
attachee is runnning
 issues a signal to the attacheee
  loops waiting for the java_pid file to be created
  default timeout is 10 seconds
So getting a FileNotFoundException while in this loop is OK, but
IOException is not.
Post by g***@oracle.com
src/hotspot/os/solaris/attachListener_solaris.cpp
   attachee creates the java_pid file
   listens til the attacher opens the door
I'm don't think this is related, but JDK-8199811 made a fix in
attachListener_solaris.cpp to make it wait up to 10 seconds for
initialization to complete before failing the enqueue.
Post by g***@oracle.com
...
Not sure when a bare IOException is thrown rather than the
more specific FileNotFoundException.
Where is the IOException originating from? I wonder if the issue is that
the file is in the process of being created, but is not fully created
yet. Maybe it is there, but owner/group/permissions have not been set
yet, and this results in an IOException instead of FileNotFoundException.
The exception is shown in the bug report:

[java.io.IOException: Permission denied
at jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native Method)
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.openDoor(VirtualMachineImpl.java:215)
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:71)
at
jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
at
jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:114)
at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:98)

And if you look at the native code the EPERM from open will cause
IOException to be thrown.

./jdk.attach/solaris/native/libattach/VirtualMachineImpl.c

JNIEXPORT jint JNICALL Java_sun_tools_attach_VirtualMachineImpl_open
(JNIEnv *env, jclass cls, jstring path)
{
jboolean isCopy;
const char* p = GetStringPlatformChars(env, path, &isCopy);
if (p == NULL) {
return 0;
} else {
int fd;
int err = 0;

fd = open(p, O_RDWR);
if (fd == -1) {
err = errno;
}

if (isCopy) {
JNU_ReleaseStringPlatformChars(env, path, p);
}

if (fd == -1) {
if (err == ENOENT) {
JNU_ThrowByName(env, "java/io/FileNotFoundException",
NULL);
} else {
char* msg = strdup(strerror(err));
JNU_ThrowIOException(env, msg);
if (msg != NULL) {
free(msg);
}


We should add the path to the exception message.

David
-----
Post by Chris Plummer
Chris
Post by g***@oracle.com
Post by Chris Plummer
Can you summarize how the attach handshaking is suppose to work? I'm
just wondering why the attacher would ever be looking for the file
before the attachee has created it. It seems a proper handshake would
prevent this. Maybe there's some sort of visibility issue where the
attachee has indeed created the file, but it is not immediately
visible to the attacher process.
Chris
Post by g***@oracle.com
The problem reproduced pretty quickly.
I added a call to checkPermission and revealed the
"file not found" from the stat call when the IOException
was detected.
There has been some flakiness from the Solaris test machines today,
so I'll continue with the testing a bit longer.
Post by Chris Plummer
Without the fix was this issue easy enough to reproduce that you
can be sure this is resolving it?
Chris
Post by Gary Adams
Solaris debug builds are failing tests that use the attach interface.
An IOException is reported when the java_pid file is not opened.
It appears that the attempt to attach is taking place too quickly.
This workaround will allow the open operation to be retried
after a short pause.
  Webrev: http://cr.openjdk.java.net/~gadams/8210337/webrev/
  Issue: https://bugs.openjdk.java.net/browse/JDK-8210337
Testing is in progress.
Chris Plummer
2018-10-02 21:54:07 UTC
Permalink
Post by David Holmes
Chris,
Post by Chris Plummer
Post by g***@oracle.com
The general attach sequence ...
src/jdk.attach/solaris/classes/sun/tools/attach/VirtualMachineImpl.java
 the attacher creates an attach_pid file in a directory where the
attachee is runnning
 issues a signal to the attacheee
  loops waiting for the java_pid file to be created
  default timeout is 10 seconds
So getting a FileNotFoundException while in this loop is OK, but
IOException is not.
Post by g***@oracle.com
src/hotspot/os/solaris/attachListener_solaris.cpp
   attachee creates the java_pid file
   listens til the attacher opens the door
I'm don't think this is related, but JDK-8199811 made a fix in
attachListener_solaris.cpp to make it wait up to 10 seconds for
initialization to complete before failing the enqueue.
Post by g***@oracle.com
...
Not sure when a bare IOException is thrown rather than the
more specific FileNotFoundException.
Where is the IOException originating from? I wonder if the issue is
that the file is in the process of being created, but is not fully
created yet. Maybe it is there, but owner/group/permissions have not
been set yet, and this results in an IOException instead of
FileNotFoundException.
 [java.io.IOException: Permission denied
at jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native Method)
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.openDoor(VirtualMachineImpl.java:215)
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:71)
at
jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
at
jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:114)
at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:98)
And if you look at the native code the EPERM from open will cause
IOException to be thrown.
./jdk.attach/solaris/native/libattach/VirtualMachineImpl.c
JNIEXPORT jint JNICALL Java_sun_tools_attach_VirtualMachineImpl_open
  (JNIEnv *env, jclass cls, jstring path)
{
    jboolean isCopy;
    const char* p = GetStringPlatformChars(env, path, &isCopy);
    if (p == NULL) {
        return 0;
    } else {
        int fd;
        int err = 0;
        fd = open(p, O_RDWR);
        if (fd == -1) {
            err = errno;
        }
        if (isCopy) {
            JNU_ReleaseStringPlatformChars(env, path, p);
        }
        if (fd == -1) {
            if (err == ENOENT) {
                JNU_ThrowByName(env, "java/io/FileNotFoundException",
NULL);
            } else {
                char* msg = strdup(strerror(err));
                JNU_ThrowIOException(env, msg);
                if (msg != NULL) {
                    free(msg);
                }
We should add the path to the exception message.
Thanks David. So if EPERM is the error and a retry 100ms later works, I
think that supports my hypothesis that the file is not quite fully
created. So Gary's fix is probably fine. The only other possible fix I
can think of that wouldn't require an explicit delay (or multiple
retries) is probably not worth the complexity. It would require that the
attachee create two files, and the attacher try to open the second file
first. When it either opens or returns EPERM, you know the first file
can safety be opened.

Chris
Post by David Holmes
David
-----
Post by Chris Plummer
Chris
Post by g***@oracle.com
Post by Chris Plummer
Can you summarize how the attach handshaking is suppose to work?
I'm just wondering why the attacher would ever be looking for the
file before the attachee has created it. It seems a proper
handshake would prevent this. Maybe there's some sort of visibility
issue where the attachee has indeed created the file, but it is not
immediately visible to the attacher process.
Chris
Post by g***@oracle.com
The problem reproduced pretty quickly.
I added a call to checkPermission and revealed the
"file not found" from the stat call when the IOException
was detected.
There has been some flakiness from the Solaris test machines today,
so I'll continue with the testing a bit longer.
Post by Chris Plummer
Without the fix was this issue easy enough to reproduce that you
can be sure this is resolving it?
Chris
Post by Gary Adams
Solaris debug builds are failing tests that use the attach interface.
An IOException is reported when the java_pid file is not opened.
It appears that the attempt to attach is taking place too quickly.
This workaround will allow the open operation to be retried
after a short pause.
  Webrev: http://cr.openjdk.java.net/~gadams/8210337/webrev/
  Issue: https://bugs.openjdk.java.net/browse/JDK-8210337
Testing is in progress.
David Holmes
2018-10-02 22:23:15 UTC
Permalink
Minor correction: EPERM -> EACCES for Solaris

Hard to see how to get a transient EACCES when opening a file ... though
as it is really a door I guess there could be additional complexity.

David
Post by Chris Plummer
Post by David Holmes
Chris,
Post by Chris Plummer
Post by g***@oracle.com
The general attach sequence ...
src/jdk.attach/solaris/classes/sun/tools/attach/VirtualMachineImpl.java
 the attacher creates an attach_pid file in a directory where the
attachee is runnning
 issues a signal to the attacheee
  loops waiting for the java_pid file to be created
  default timeout is 10 seconds
So getting a FileNotFoundException while in this loop is OK, but
IOException is not.
Post by g***@oracle.com
src/hotspot/os/solaris/attachListener_solaris.cpp
   attachee creates the java_pid file
   listens til the attacher opens the door
I'm don't think this is related, but JDK-8199811 made a fix in
attachListener_solaris.cpp to make it wait up to 10 seconds for
initialization to complete before failing the enqueue.
Post by g***@oracle.com
...
Not sure when a bare IOException is thrown rather than the
more specific FileNotFoundException.
Where is the IOException originating from? I wonder if the issue is
that the file is in the process of being created, but is not fully
created yet. Maybe it is there, but owner/group/permissions have not
been set yet, and this results in an IOException instead of
FileNotFoundException.
 [java.io.IOException: Permission denied
at jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native Method)
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.openDoor(VirtualMachineImpl.java:215)
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:71)
at
jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
at
jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:114)
at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:98)
And if you look at the native code the EPERM from open will cause
IOException to be thrown.
./jdk.attach/solaris/native/libattach/VirtualMachineImpl.c
JNIEXPORT jint JNICALL Java_sun_tools_attach_VirtualMachineImpl_open
  (JNIEnv *env, jclass cls, jstring path)
{
    jboolean isCopy;
    const char* p = GetStringPlatformChars(env, path, &isCopy);
    if (p == NULL) {
        return 0;
    } else {
        int fd;
        int err = 0;
        fd = open(p, O_RDWR);
        if (fd == -1) {
            err = errno;
        }
        if (isCopy) {
            JNU_ReleaseStringPlatformChars(env, path, p);
        }
        if (fd == -1) {
            if (err == ENOENT) {
                JNU_ThrowByName(env, "java/io/FileNotFoundException",
NULL);
            } else {
                char* msg = strdup(strerror(err));
                JNU_ThrowIOException(env, msg);
                if (msg != NULL) {
                    free(msg);
                }
We should add the path to the exception message.
Thanks David. So if EPERM is the error and a retry 100ms later works, I
think that supports my hypothesis that the file is not quite fully
created. So Gary's fix is probably fine. The only other possible fix I
can think of that wouldn't require an explicit delay (or multiple
retries) is probably not worth the complexity. It would require that the
attachee create two files, and the attacher try to open the second file
first. When it either opens or returns EPERM, you know the first file
can safety be opened.
Chris
Post by David Holmes
David
-----
Post by Chris Plummer
Chris
Post by g***@oracle.com
Post by Chris Plummer
Can you summarize how the attach handshaking is suppose to work?
I'm just wondering why the attacher would ever be looking for the
file before the attachee has created it. It seems a proper
handshake would prevent this. Maybe there's some sort of visibility
issue where the attachee has indeed created the file, but it is not
immediately visible to the attacher process.
Chris
Post by g***@oracle.com
The problem reproduced pretty quickly.
I added a call to checkPermission and revealed the
"file not found" from the stat call when the IOException
was detected.
There has been some flakiness from the Solaris test machines today,
so I'll continue with the testing a bit longer.
Post by Chris Plummer
Without the fix was this issue easy enough to reproduce that you
can be sure this is resolving it?
Chris
Post by Gary Adams
Solaris debug builds are failing tests that use the attach interface.
An IOException is reported when the java_pid file is not opened.
It appears that the attempt to attach is taking place too quickly.
This workaround will allow the open operation to be retried
after a short pause.
  Webrev: http://cr.openjdk.java.net/~gadams/8210337/webrev/
  Issue: https://bugs.openjdk.java.net/browse/JDK-8210337
Testing is in progress.
Gary Adams
2018-10-04 15:54:14 UTC
Permalink
First, let me retract the proposed change,
it is not the right solution to the problem originally
reported.

Second, as a bit of explanation consider the code fragments below.

The high level processing calls openDoor which is willing to retry
the operation as long as the error is flagged specifically
as a FileNotFoundException.

VirtualMachineImpl.java:72
VirtualMachineImpl.c:81

During my testing I had added a check VirtualMachineImpl.java:214
and when an IOException was detected made a call to checkPermissions
to get more detailed information about the IOException. The error
I saw was an ENOENT from the stat call. And not the detailed checks for
specific permissions issues (VirtualMachineImpl.c:143)

VirtualMachineImpl.c:118
VirtualMachineImpl.c:147

What I missed in the original proposed solution was a FileNotFoundException
extends IOException. That means my delay and retry just duplicates the
higher
level retry around the openDoor call.

Third, the original error message logged in the bug report :

java.io.IOException: Permission denied
at jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native Method)

had to have come from

VirtualMachineImpl.c:70
VirtualMachineImpl.c:84

which means the actual open call reported the file does exist
but the permissions do not allow the file to be accessed.
That also means the normal mechanism of removing leftover
java_pid files would not have cleaned up another user's
java_pid files.

=====
src/jdk.attach/solaris/classes/sun/tools/attach/VirtualMachineImpl.java:
...
67 // Opens the door file to the target VM. If the file
is not
68 // found it might mean that the attach mechanism
isn't started in the
69 // target VM so we attempt to start it and retry.
70 try {
71 fd = openDoor(pid);
72 } catch (FileNotFoundException fnf1) {
73 File f = createAttachFile(pid);
74 try {
75 sigquit(pid);
76
77 // give the target VM time to start the
attach mechanism
78 final int delay_step = 100;
79 final long timeout = attachTimeout();
80 long time_spend = 0;
81 long delay = 0;
82 do {
83 // Increase timeout on each attempt to
reduce polling
84 delay += delay_step;
85 try {
86 Thread.sleep(delay);
87 } catch (InterruptedException x) { }
88 try {
89 fd = openDoor(pid);
90 } catch (FileNotFoundException fnf2) {
91 // pass
92 }
93
94 time_spend += delay;
95 if (time_spend > timeout/2 && fd == -1) {
96 // Send QUIT again to give target VM
the last chance to react
97 sigquit(pid);
98 }
99 } while (time_spend <= timeout && fd == -1);
100 if (fd == -1) {
101 throw new AttachNotSupportedException(
102 String.format("Unable to open door
%s: " +
103 "target process %d doesn't respond
within %dms " +
104 "or HotSpot VM not loaded",
socket_path, pid, time_spend));
105 }
...
212 // The door is attached to .java_pid<pid> in the
temporary directory.
213 private int openDoor(int pid) throws IOException {
214 socket_path = tmpdir + "/.java_pid" + pid;
215 fd = open(socket_path);
216
217 // Check that the file owner/permission to avoid
attaching to
218 // bogus process
219 try {
220 checkPermissions(socket_path);
221 } catch (IOException ioe) {
222 close(fd);
223 throw ioe;
224 }
225 return fd;
226 }

=====
src/jdk.attach/solaris/native/libattach/VirtualMachineImpl.c:
...
59 JNIEXPORT jint JNICALL
Java_sun_tools_attach_VirtualMachineImpl_open
60 (JNIEnv *env, jclass cls, jstring path)
61 {
62 jboolean isCopy;
63 const char* p = GetStringPlatformChars(env, path, &isCopy);
64 if (p == NULL) {
65 return 0;
66 } else {
67 int fd;
68 int err = 0;
69
70 fd = open(p, O_RDWR);
71 if (fd == -1) {
72 err = errno;
73 }
74
75 if (isCopy) {
76 JNU_ReleaseStringPlatformChars(env, path, p);
77 }
78
79 if (fd == -1) {
80 if (err == ENOENT) {
81 JNU_ThrowByName(env,
"java/io/FileNotFoundException", NULL);
82 } else {
83 char* msg = strdup(strerror(err));
84 JNU_ThrowIOException(env, msg);
85 if (msg != NULL) {
86 free(msg);
87 }
88 }
89 }
90 return fd;
91 }
92 }
...
99 JNIEXPORT void JNICALL
Java_sun_tools_attach_VirtualMachineImpl_checkPermissions
100 (JNIEnv *env, jclass cls, jstring path)
101 {
102 jboolean isCopy;
103 const char* p = GetStringPlatformChars(env, path, &isCopy);
104 if (p != NULL) {
105 struct stat64 sb;
106 uid_t uid, gid;
107 int res;
108
109 memset(&sb, 0, sizeof(struct stat64));
110
111 /*
112 * Check that the path is owned by the effective
uid/gid of this
113 * process. Also check that group/other access is not
allowed.
114 */
115 uid = geteuid();
116 gid = getegid();
117
118 res = stat64(p, &sb);
119 if (res != 0) {
120 /* save errno */
121 res = errno;
122 }
123
124 if (res == 0) {
125 char msg[100];
126 jboolean isError = JNI_FALSE;
127 if (sb.st_uid != uid && uid != ROOT_UID) {
128 snprintf(msg, sizeof(msg),
129 "file should be owned by the current user
(which is %d) but is owned by %d", uid, sb.st_uid);
130 isError = JNI_TRUE;
131 } else if (sb.st_gid != gid && uid != ROOT_UID) {
132 snprintf(msg, sizeof(msg),
133 "file's group should be the current group
(which is %d) but the group is %d", gid, sb.st_gid);
134 isError = JNI_TRUE;
135 } else if ((sb.st_mode &
(S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH)) != 0) {
136 snprintf(msg, sizeof(msg),
137 "file should only be readable and
writable by the owner but has 0%03o access", sb.st_mode & 0777);
138 isError = JNI_TRUE;
139 }
140 if (isError) {
141 char buf[256];
142 snprintf(buf, sizeof(buf), "well-known file
%s is not secure: %s", p, msg);
143 JNU_ThrowIOException(env, buf);
144 }
145 } else {
146 char* msg = strdup(strerror(res));
147 JNU_ThrowIOException(env, msg);
148 if (msg != NULL) {
149 free(msg);
150 }
151 }
Post by David Holmes
Minor correction: EPERM -> EACCES for Solaris
Hard to see how to get a transient EACCES when opening a file ...
though as it is really a door I guess there could be additional
complexity.
David
Post by Chris Plummer
Post by David Holmes
Chris,
Post by Chris Plummer
Post by g***@oracle.com
The general attach sequence ...
src/jdk.attach/solaris/classes/sun/tools/attach/VirtualMachineImpl.java
the attacher creates an attach_pid file in a directory where the
attachee is runnning
issues a signal to the attacheee
loops waiting for the java_pid file to be created
default timeout is 10 seconds
So getting a FileNotFoundException while in this loop is OK, but
IOException is not.
Post by g***@oracle.com
src/hotspot/os/solaris/attachListener_solaris.cpp
attachee creates the java_pid file
listens til the attacher opens the door
I'm don't think this is related, but JDK-8199811 made a fix in
attachListener_solaris.cpp to make it wait up to 10 seconds for
initialization to complete before failing the enqueue.
Post by g***@oracle.com
...
Not sure when a bare IOException is thrown rather than the
more specific FileNotFoundException.
Where is the IOException originating from? I wonder if the issue is
that the file is in the process of being created, but is not fully
created yet. Maybe it is there, but owner/group/permissions have
not been set yet, and this results in an IOException instead of
FileNotFoundException.
[java.io.IOException: Permission denied
at jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native Method)
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.openDoor(VirtualMachineImpl.java:215)
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:71)
at
jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
at
jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:114)
at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:98)
And if you look at the native code the EPERM from open will cause
IOException to be thrown.
./jdk.attach/solaris/native/libattach/VirtualMachineImpl.c
JNIEXPORT jint JNICALL Java_sun_tools_attach_VirtualMachineImpl_open
(JNIEnv *env, jclass cls, jstring path)
{
jboolean isCopy;
const char* p = GetStringPlatformChars(env, path, &isCopy);
if (p == NULL) {
return 0;
} else {
int fd;
int err = 0;
fd = open(p, O_RDWR);
if (fd == -1) {
err = errno;
}
if (isCopy) {
JNU_ReleaseStringPlatformChars(env, path, p);
}
if (fd == -1) {
if (err == ENOENT) {
JNU_ThrowByName(env,
"java/io/FileNotFoundException", NULL);
} else {
char* msg = strdup(strerror(err));
JNU_ThrowIOException(env, msg);
if (msg != NULL) {
free(msg);
}
We should add the path to the exception message.
Thanks David. So if EPERM is the error and a retry 100ms later works,
I think that supports my hypothesis that the file is not quite fully
created. So Gary's fix is probably fine. The only other possible fix
I can think of that wouldn't require an explicit delay (or multiple
retries) is probably not worth the complexity. It would require that
the attachee create two files, and the attacher try to open the
second file first. When it either opens or returns EPERM, you know
the first file can safety be opened.
Chris
Post by David Holmes
David
-----
Post by Chris Plummer
Chris
Post by g***@oracle.com
Post by Chris Plummer
Can you summarize how the attach handshaking is suppose to work?
I'm just wondering why the attacher would ever be looking for the
file before the attachee has created it. It seems a proper
handshake would prevent this. Maybe there's some sort of
visibility issue where the attachee has indeed created the file,
but it is not immediately visible to the attacher process.
Chris
Post by g***@oracle.com
The problem reproduced pretty quickly.
I added a call to checkPermission and revealed the
"file not found" from the stat call when the IOException
was detected.
There has been some flakiness from the Solaris test machines today,
so I'll continue with the testing a bit longer.
Post by Chris Plummer
Without the fix was this issue easy enough to reproduce that
you can be sure this is resolving it?
Chris
Post by Gary Adams
Solaris debug builds are failing tests that use the attach interface.
An IOException is reported when the java_pid file is not opened.
It appears that the attempt to attach is taking place too quickly.
This workaround will allow the open operation to be retried
after a short pause.
Webrev: http://cr.openjdk.java.net/~gadams/8210337/webrev/
Issue: https://bugs.openjdk.java.net/browse/JDK-8210337
Testing is in progress.
Chris Plummer
2018-10-04 16:30:56 UTC
Permalink
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Didn't the retry after 100ms delay
work? If yes, why would it if the problem is that a java_pid was
not cleaned up?<br>
<br>
Chris<br>
<br>
On 10/4/18 8:54 AM, Gary Adams wrote:<br>
</div>
<blockquote type="cite" cite="mid:***@oracle.com">
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
First, let me retract the proposed change,<br>
it is not the right solution to the problem originally<br>
reported.<br>
<br>
Second, as a bit of explanation consider the code fragments below.<br>
<br>
The high level processing calls openDoor which is willing to retry
<br>
the operation as long as the error is flagged specifically<br>
as a FileNotFoundException.<br>
<br>
  VirtualMachineImpl.java:72<br>
  VirtualMachineImpl.c:81<br>
<br>
During my testing I had added a check VirtualMachineImpl.java:214<br>
and when an IOException was detected made a call to
checkPermissions<br>
to get more detailed information about the IOException. The error
<br>
I saw was an ENOENT from the stat call. And not the detailed
checks for<br>
specific permissions issues (VirtualMachineImpl.c:143)<br>
<br>
    VirtualMachineImpl.c:118<br>
    VirtualMachineImpl.c:147<br>
<br>
What I missed in the original proposed solution was a
FileNotFoundException<br>
extends IOException. That means my delay and retry just duplicates
the higher<br>
level retry around the openDoor call.<br>
<br>
Third, the original error message logged in the bug report :<br>
<br>
<span style="caret-color: rgb(51, 51, 51); color: rgb(51, 51, 51);
font-family: Arial, sans-serif; font-size: 14.000000953674316px;
font-style: normal; font-variant-caps: normal; font-weight:
normal; letter-spacing: normal; orphans: auto; text-align:
start; text-indent: 0px; text-transform: none; white-space:
normal; widows: auto; word-spacing: 0px;
-webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px;
background-color: rgb(255, 255, 255); text-decoration: none;
display: inline !important; float: none;">java.io.IOException:
Permission denied<span class="Apple-converted-space"> </span></span><br
style="caret-color: rgb(51, 51, 51); color: rgb(51, 51, 51);
font-family: Arial, sans-serif; font-size: 14.000000953674316px;
font-style: normal; font-variant-caps: normal; font-weight:
normal; letter-spacing: normal; orphans: auto; text-align:
start; text-indent: 0px; text-transform: none; white-space:
normal; widows: auto; word-spacing: 0px;
-webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px;
text-decoration: none;">
<span style="caret-color: rgb(51, 51, 51); color: rgb(51, 51, 51);
font-family: Arial, sans-serif; font-size: 14.000000953674316px;
font-style: normal; font-variant-caps: normal; font-weight:
normal; letter-spacing: normal; orphans: auto; text-align:
start; text-indent: 0px; text-transform: none; white-space:
normal; widows: auto; word-spacing: 0px;
-webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px;
background-color: rgb(255, 255, 255); text-decoration: none;
display: inline !important; float: none;">at
jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native
Method)<span class="Apple-converted-space"> </span></span><br>
<br>
had to have come from<br>
<br>
  VirtualMachineImpl.c:70<br>
  VirtualMachineImpl.c:84<br>
<br>
which means the actual open call reported the file does exist<br>
but the permissions do not allow the file to be accessed.<br>
That also means the normal mechanism of removing leftover <br>
java_pid files would not have cleaned up another user's<br>
java_pid files.<br>
<br>
=====<br>
src/jdk.attach/solaris/classes/sun/tools/attach/VirtualMachineImpl.java:<br>
...<br>
    67            // Opens the door file to the target VM. If the
file is not<br>
    68            // found it might mean that the attach mechanism
isn't started in the<br>
    69            // target VM so we attempt to start it and
retry.<br>
    70            try {<br>
    71                fd = openDoor(pid);<br>
    72            } catch (FileNotFoundException fnf1) {<br>
    73                File f = createAttachFile(pid);<br>
    74                try {<br>
    75                    sigquit(pid);<br>
    76    <br>
    77                    // give the target VM time to start the
attach mechanism<br>
    78                    final int delay_step = 100;<br>
    79                    final long timeout = attachTimeout();<br>
    80                    long time_spend = 0;<br>
    81                    long delay = 0;<br>
    82                    do {<br>
    83                        // Increase timeout on each attempt
to reduce polling<br>
    84                        delay += delay_step;<br>
    85                        try {<br>
    86                            Thread.sleep(delay);<br>
    87                        } catch (InterruptedException x) { }<br>
    88                        try {<br>
    89                            fd = openDoor(pid);<br>
    90                        } catch (FileNotFoundException fnf2)
{<br>
    91                            // pass<br>
    92                        }<br>
    93    <br>
    94                        time_spend += delay;<br>
    95                        if (time_spend &gt; timeout/2
&amp;&amp; fd == -1) {<br>
    96                            // Send QUIT again to give
target VM the last chance to react<br>
    97                            sigquit(pid);<br>
    98                        }<br>
    99                    } while (time_spend &lt;= timeout
&amp;&amp; fd == -1);<br>
   100                    if (fd  == -1) {<br>
   101                        throw new
AttachNotSupportedException(<br>
   102                            String.format("Unable to open
door %s: " +<br>
   103                              "target process %d doesn't
respond within %dms " +<br>
   104                              "or HotSpot VM not loaded",
socket_path, pid, time_spend));<br>
   105                    }<br>
...<br>
   212        // The door is attached to .java_pid&lt;pid&gt; in
the temporary directory.<br>
   213        private int openDoor(int pid) throws IOException {<br>
   214            socket_path = tmpdir + "/.java_pid" + pid;<br>
   215            fd = open(socket_path);<br>
   216    <br>
   217            // Check that the file owner/permission to avoid
attaching to<br>
   218            // bogus process<br>
   219            try {<br>
   220                checkPermissions(socket_path);<br>
   221            } catch (IOException ioe) {<br>
   222                close(fd);<br>
   223                throw ioe;<br>
   224            }<br>
   225            return fd;<br>
   226        }<br>
<br>
=====<br>
src/jdk.attach/solaris/native/libattach/VirtualMachineImpl.c:<br>
...<br>
    59    JNIEXPORT jint JNICALL
Java_sun_tools_attach_VirtualMachineImpl_open<br>
    60      (JNIEnv *env, jclass cls, jstring path)<br>
    61    {<br>
    62        jboolean isCopy;<br>
    63        const char* p = GetStringPlatformChars(env, path,
&amp;isCopy);<br>
    64        if (p == NULL) {<br>
    65            return 0;<br>
    66        } else {<br>
    67            int fd;<br>
    68            int err = 0;<br>
    69    <br>
    70            fd = open(p, O_RDWR);<br>
    71            if (fd == -1) {<br>
    72                err = errno;<br>
    73            }<br>
    74    <br>
    75            if (isCopy) {<br>
    76                JNU_ReleaseStringPlatformChars(env, path,
p);<br>
    77            }<br>
    78    <br>
    79            if (fd == -1) {<br>
    80                if (err == ENOENT) {<br>
    81                    JNU_ThrowByName(env,
"java/io/FileNotFoundException", NULL);<br>
    82                } else {<br>
    83                    char* msg = strdup(strerror(err));<br>
    84                    JNU_ThrowIOException(env, msg);<br>
    85                    if (msg != NULL) {<br>
    86                        free(msg);<br>
    87                    }<br>
    88                }<br>
    89            }<br>
    90            return fd;<br>
    91        }<br>
    92    }<br>
...<br>
    99    JNIEXPORT void JNICALL
Java_sun_tools_attach_VirtualMachineImpl_checkPermissions<br>
   100      (JNIEnv *env, jclass cls, jstring path)<br>
   101    {<br>
   102        jboolean isCopy;<br>
   103        const char* p = GetStringPlatformChars(env, path,
&amp;isCopy);<br>
   104        if (p != NULL) {<br>
   105            struct stat64 sb;<br>
   106            uid_t uid, gid;<br>
   107            int res;<br>
   108    <br>
   109            memset(&amp;sb, 0, sizeof(struct stat64));<br>
   110    <br>
   111            /*<br>
   112             * Check that the path is owned by the effective
uid/gid of this<br>
   113             * process. Also check that group/other access
is not allowed.<br>
   114             */<br>
   115            uid = geteuid();<br>
   116            gid = getegid();<br>
   117    <br>
   118            res = stat64(p, &amp;sb);<br>
   119            if (res != 0) {<br>
   120                /* save errno */<br>
   121                res = errno;<br>
   122            }<br>
   123    <br>
   124            if (res == 0) {<br>
   125                char msg[100];<br>
   126                jboolean isError = JNI_FALSE;<br>
   127                if (sb.st_uid != uid &amp;&amp; uid !=
ROOT_UID) {<br>
   128                    snprintf(msg, sizeof(msg),<br>
   129                        "file should be owned by the current
user (which is %d) but is owned by %d", uid, sb.st_uid);<br>
   130                    isError = JNI_TRUE;<br>
   131                } else if (sb.st_gid != gid &amp;&amp; uid
!= ROOT_UID) {<br>
   132                    snprintf(msg, sizeof(msg),<br>
   133                        "file's group should be the current
group (which is %d) but the group is %d", gid, sb.st_gid);<br>
   134                    isError = JNI_TRUE;<br>
   135                } else if ((sb.st_mode &amp;
(S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH)) != 0) {<br>
   136                    snprintf(msg, sizeof(msg),<br>
   137                        "file should only be readable and
writable by the owner but has 0%03o access", sb.st_mode &amp;
0777);<br>
   138                    isError = JNI_TRUE;<br>
   139                }<br>
   140                if (isError) {<br>
   141                    char buf[256];<br>
   142                    snprintf(buf, sizeof(buf), "well-known
file %s is not secure: %s", p, msg);<br>
   143                    JNU_ThrowIOException(env, buf);<br>
   144                }<br>
   145            } else {<br>
   146                char* msg = strdup(strerror(res));<br>
   147                JNU_ThrowIOException(env, msg);<br>
   148                if (msg != NULL) {<br>
   149                    free(msg);<br>
   150                }<br>
   151            }<br>
<br>
On 10/2/18, 6:23 PM, David Holmes wrote:
<blockquote
cite="mid:e1adb28a-031f-5c28-74a5-***@oracle.com"
type="cite">Minor correction: EPERM -&gt; EACCES for Solaris <br>
<br>
Hard to see how to get a transient EACCES when opening a file
... though as it is really a door I guess there could be
additional complexity. <br>
<br>
David <br>
<br>
On 3/10/2018 7:54 AM, Chris Plummer wrote: <br>
<blockquote type="cite">On 10/2/18 2:38 PM, David Holmes wrote:
<br>
<blockquote type="cite">Chris, <br>
<br>
On 3/10/2018 6:57 AM, Chris Plummer wrote: <br>
<blockquote type="cite"> <br>
<br>
On 10/2/18 1:44 PM, <a class="moz-txt-link-abbreviated"
href="mailto:***@oracle.com"
moz-do-not-send="true">***@oracle.com</a> wrote:
<br>
<blockquote type="cite">The general attach sequence ... <br>
<br>
src/jdk.attach/solaris/classes/sun/tools/attach/VirtualMachineImpl.java
<br>
<br>
 the attacher creates an attach_pid file in a directory
where the attachee is runnning <br>
 issues a signal to the attacheee <br>
<br>
  loops waiting for the java_pid file to be created <br>
  default timeout is 10 seconds <br>
<br>
</blockquote>
So getting a FileNotFoundException while in this loop is
OK, but IOException is not. <br>
<br>
<blockquote type="cite">src/hotspot/os/solaris/attachListener_solaris.cpp
<br>
<br>
   attachee creates the java_pid file <br>
   listens til the attacher opens the door <br>
<br>
</blockquote>
I'm don't think this is related, but JDK-8199811 made a
fix in attachListener_solaris.cpp to make it wait up to 10
seconds for initialization to complete before failing the
enqueue. <br>
<br>
<blockquote type="cite">... <br>
Not sure when a bare IOException is thrown rather than
the <br>
more specific FileNotFoundException. <br>
</blockquote>
Where is the IOException originating from? I wonder if the
issue is that the file is in the process of being created,
but is not fully created yet. Maybe it is there, but
owner/group/permissions have not been set yet, and this
results in an IOException instead of
FileNotFoundException. <br>
</blockquote>
<br>
The exception is shown in the bug report: <br>
<br>
 [java.io.IOException: Permission denied <br>
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native
Method) <br>
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.openDoor(VirtualMachineImpl.java:215)
<br>
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.&lt;init&gt;(VirtualMachineImpl.java:71)
<br>
at
jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
<br>
at
jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
<br>
at
jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:114)
<br>
at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:98) <br>
<br>
And if you look at the native code the EPERM from open will
cause IOException to be thrown. <br>
<br>
./jdk.attach/solaris/native/libattach/VirtualMachineImpl.c <br>
<br>
JNIEXPORT jint JNICALL
Java_sun_tools_attach_VirtualMachineImpl_open <br>
  (JNIEnv *env, jclass cls, jstring path) <br>
{ <br>
    jboolean isCopy; <br>
    const char* p = GetStringPlatformChars(env, path,
&amp;isCopy); <br>
    if (p == NULL) { <br>
        return 0; <br>
    } else { <br>
        int fd; <br>
        int err = 0; <br>
<br>
        fd = open(p, O_RDWR); <br>
        if (fd == -1) { <br>
            err = errno; <br>
        } <br>
<br>
        if (isCopy) { <br>
            JNU_ReleaseStringPlatformChars(env, path, p); <br>
        } <br>
<br>
        if (fd == -1) { <br>
            if (err == ENOENT) { <br>
                JNU_ThrowByName(env,
"java/io/FileNotFoundException", NULL); <br>
            } else { <br>
                char* msg = strdup(strerror(err)); <br>
                JNU_ThrowIOException(env, msg); <br>
                if (msg != NULL) { <br>
                    free(msg); <br>
                } <br>
<br>
<br>
We should add the path to the exception message. <br>
<br>
</blockquote>
Thanks David. So if EPERM is the error and a retry 100ms later
works, I think that supports my hypothesis that the file is
not quite fully created. So Gary's fix is probably fine. The
only other possible fix I can think of that wouldn't require
an explicit delay (or multiple retries) is probably not worth
the complexity. It would require that the attachee create two
files, and the attacher try to open the second file first.
When it either opens or returns EPERM, you know the first file
can safety be opened. <br>
<br>
Chris <br>
<blockquote type="cite">David <br>
----- <br>
<br>
<blockquote type="cite">Chris <br>
<blockquote type="cite"> <br>
<br>
<br>
On 10/2/18 4:11 PM, Chris Plummer wrote: <br>
<blockquote type="cite">Can you summarize how the attach
handshaking is suppose to work? I'm just wondering why
the attacher would ever be looking for the file before
the attachee has created it. It seems a proper
handshake would prevent this. Maybe there's some sort
of visibility issue where the attachee has indeed
created the file, but it is not immediately visible to
the attacher process. <br>
<br>
Chris <br>
<br>
On 10/2/18 12:27 PM, <a
class="moz-txt-link-abbreviated"
href="mailto:***@oracle.com"
moz-do-not-send="true">***@oracle.com</a>
wrote: <br>
<blockquote type="cite">The problem reproduced pretty
quickly. <br>
I added a call to checkPermission and revealed the <br>
"file not found" from the stat call when the
IOException <br>
was detected. <br>
<br>
There has been some flakiness from the Solaris test
machines today, <br>
so I'll continue with the testing a bit longer. <br>
<br>
On 10/2/18 3:12 PM, Chris Plummer wrote: <br>
<blockquote type="cite">Without the fix was this
issue easy enough to reproduce that you can be
sure this is resolving it? <br>
<br>
Chris <br>
<br>
On 10/2/18 8:16 AM, Gary Adams wrote: <br>
<blockquote type="cite">Solaris debug builds are
failing tests that use the attach interface. <br>
An IOException is reported when the java_pid
file is not opened. <br>
<br>
It appears that the attempt to attach is taking
place too quickly. <br>
This workaround will allow the open operation to
be retried <br>
after a short pause. <br>
<br>
  Webrev: <a class="moz-txt-link-freetext"
href="http://cr.openjdk.java.net/%7Egadams/8210337/webrev/"
moz-do-not-send="true">http://cr.openjdk.java.net/~gadams/8210337/webrev/</a>
<br>
  Issue: <a class="moz-txt-link-freetext"
href="https://bugs.openjdk.java.net/browse/JDK-8210337"
moz-do-not-send="true">https://bugs.openjdk.java.net/browse/JDK-8210337</a>
<br>
<br>
Testing is in progress. <br>
</blockquote>
<br>
<br>
<br>
</blockquote>
<br>
</blockquote>
<br>
<br>
</blockquote>
<br>
</blockquote>
<br>
<br>
</blockquote>
</blockquote>
<br>
<br>
</blockquote>
</blockquote>
<br>
</blockquote>
<p><br>
</p>
</body>
</html>
Gary Adams
2018-10-04 16:48:07 UTC
Permalink
My delay and retry just duplicated the openDoor retry.
The normal processing of FileNotFoundException(ENOENT) is to retry
several times until the file is available.

But the original problem reported is a "Permission denied" (EACCESS or
EPERM).
Delay and retry will not resolve a permissions error.
Didn't the retry after 100ms delay work? If yes, why would it if the
problem is that a java_pid was not cleaned up?
Chris
Post by Gary Adams
First, let me retract the proposed change,
it is not the right solution to the problem originally
reported.
Second, as a bit of explanation consider the code fragments below.
The high level processing calls openDoor which is willing to retry
the operation as long as the error is flagged specifically
as a FileNotFoundException.
VirtualMachineImpl.java:72
VirtualMachineImpl.c:81
During my testing I had added a check VirtualMachineImpl.java:214
and when an IOException was detected made a call to checkPermissions
to get more detailed information about the IOException. The error
I saw was an ENOENT from the stat call. And not the detailed checks for
specific permissions issues (VirtualMachineImpl.c:143)
VirtualMachineImpl.c:118
VirtualMachineImpl.c:147
What I missed in the original proposed solution was a
FileNotFoundException
extends IOException. That means my delay and retry just duplicates
the higher
level retry around the openDoor call.
java.io.IOException: Permission denied
at jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native Method)
had to have come from
VirtualMachineImpl.c:70
VirtualMachineImpl.c:84
which means the actual open call reported the file does exist
but the permissions do not allow the file to be accessed.
That also means the normal mechanism of removing leftover
java_pid files would not have cleaned up another user's
java_pid files.
=====
...
67 // Opens the door file to the target VM. If the
file is not
68 // found it might mean that the attach mechanism
isn't started in the
69 // target VM so we attempt to start it and retry.
70 try {
71 fd = openDoor(pid);
72 } catch (FileNotFoundException fnf1) {
73 File f = createAttachFile(pid);
74 try {
75 sigquit(pid);
76
77 // give the target VM time to start the
attach mechanism
78 final int delay_step = 100;
79 final long timeout = attachTimeout();
80 long time_spend = 0;
81 long delay = 0;
82 do {
83 // Increase timeout on each attempt to
reduce polling
84 delay += delay_step;
85 try {
86 Thread.sleep(delay);
87 } catch (InterruptedException x) { }
88 try {
89 fd = openDoor(pid);
90 } catch (FileNotFoundException fnf2) {
91 // pass
92 }
93
94 time_spend += delay;
95 if (time_spend > timeout/2 && fd == -1) {
96 // Send QUIT again to give target
VM the last chance to react
97 sigquit(pid);
98 }
99 } while (time_spend <= timeout && fd == -1);
100 if (fd == -1) {
101 throw new AttachNotSupportedException(
102 String.format("Unable to open door
%s: " +
103 "target process %d doesn't
respond within %dms " +
104 "or HotSpot VM not loaded",
socket_path, pid, time_spend));
105 }
...
212 // The door is attached to .java_pid<pid> in the
temporary directory.
213 private int openDoor(int pid) throws IOException {
214 socket_path = tmpdir + "/.java_pid" + pid;
215 fd = open(socket_path);
216
217 // Check that the file owner/permission to avoid
attaching to
218 // bogus process
219 try {
220 checkPermissions(socket_path);
221 } catch (IOException ioe) {
222 close(fd);
223 throw ioe;
224 }
225 return fd;
226 }
=====
...
59 JNIEXPORT jint JNICALL
Java_sun_tools_attach_VirtualMachineImpl_open
60 (JNIEnv *env, jclass cls, jstring path)
61 {
62 jboolean isCopy;
63 const char* p = GetStringPlatformChars(env, path, &isCopy);
64 if (p == NULL) {
65 return 0;
66 } else {
67 int fd;
68 int err = 0;
69
70 fd = open(p, O_RDWR);
71 if (fd == -1) {
72 err = errno;
73 }
74
75 if (isCopy) {
76 JNU_ReleaseStringPlatformChars(env, path, p);
77 }
78
79 if (fd == -1) {
80 if (err == ENOENT) {
81 JNU_ThrowByName(env,
"java/io/FileNotFoundException", NULL);
82 } else {
83 char* msg = strdup(strerror(err));
84 JNU_ThrowIOException(env, msg);
85 if (msg != NULL) {
86 free(msg);
87 }
88 }
89 }
90 return fd;
91 }
92 }
...
99 JNIEXPORT void JNICALL
Java_sun_tools_attach_VirtualMachineImpl_checkPermissions
100 (JNIEnv *env, jclass cls, jstring path)
101 {
102 jboolean isCopy;
103 const char* p = GetStringPlatformChars(env, path, &isCopy);
104 if (p != NULL) {
105 struct stat64 sb;
106 uid_t uid, gid;
107 int res;
108
109 memset(&sb, 0, sizeof(struct stat64));
110
111 /*
112 * Check that the path is owned by the effective
uid/gid of this
113 * process. Also check that group/other access is
not allowed.
114 */
115 uid = geteuid();
116 gid = getegid();
117
118 res = stat64(p, &sb);
119 if (res != 0) {
120 /* save errno */
121 res = errno;
122 }
123
124 if (res == 0) {
125 char msg[100];
126 jboolean isError = JNI_FALSE;
127 if (sb.st_uid != uid && uid != ROOT_UID) {
128 snprintf(msg, sizeof(msg),
129 "file should be owned by the current
user (which is %d) but is owned by %d", uid, sb.st_uid);
130 isError = JNI_TRUE;
131 } else if (sb.st_gid != gid && uid != ROOT_UID) {
132 snprintf(msg, sizeof(msg),
133 "file's group should be the current
group (which is %d) but the group is %d", gid, sb.st_gid);
134 isError = JNI_TRUE;
135 } else if ((sb.st_mode &
(S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH)) != 0) {
136 snprintf(msg, sizeof(msg),
137 "file should only be readable and
writable by the owner but has 0%03o access", sb.st_mode & 0777);
138 isError = JNI_TRUE;
139 }
140 if (isError) {
141 char buf[256];
142 snprintf(buf, sizeof(buf), "well-known file
%s is not secure: %s", p, msg);
143 JNU_ThrowIOException(env, buf);
144 }
145 } else {
146 char* msg = strdup(strerror(res));
147 JNU_ThrowIOException(env, msg);
148 if (msg != NULL) {
149 free(msg);
150 }
151 }
Post by David Holmes
Minor correction: EPERM -> EACCES for Solaris
Hard to see how to get a transient EACCES when opening a file ...
though as it is really a door I guess there could be additional
complexity.
David
Post by Chris Plummer
Post by David Holmes
Chris,
Post by Chris Plummer
Post by g***@oracle.com
The general attach sequence ...
src/jdk.attach/solaris/classes/sun/tools/attach/VirtualMachineImpl.java
the attacher creates an attach_pid file in a directory where
the attachee is runnning
issues a signal to the attacheee
loops waiting for the java_pid file to be created
default timeout is 10 seconds
So getting a FileNotFoundException while in this loop is OK, but
IOException is not.
Post by g***@oracle.com
src/hotspot/os/solaris/attachListener_solaris.cpp
attachee creates the java_pid file
listens til the attacher opens the door
I'm don't think this is related, but JDK-8199811 made a fix in
attachListener_solaris.cpp to make it wait up to 10 seconds for
initialization to complete before failing the enqueue.
Post by g***@oracle.com
...
Not sure when a bare IOException is thrown rather than the
more specific FileNotFoundException.
Where is the IOException originating from? I wonder if the issue
is that the file is in the process of being created, but is not
fully created yet. Maybe it is there, but owner/group/permissions
have not been set yet, and this results in an IOException instead
of FileNotFoundException.
[java.io.IOException: Permission denied
at jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native Method)
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.openDoor(VirtualMachineImpl.java:215)
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:71)
at
jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
at
jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:114)
at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:98)
And if you look at the native code the EPERM from open will cause
IOException to be thrown.
./jdk.attach/solaris/native/libattach/VirtualMachineImpl.c
JNIEXPORT jint JNICALL Java_sun_tools_attach_VirtualMachineImpl_open
(JNIEnv *env, jclass cls, jstring path)
{
jboolean isCopy;
const char* p = GetStringPlatformChars(env, path, &isCopy);
if (p == NULL) {
return 0;
} else {
int fd;
int err = 0;
fd = open(p, O_RDWR);
if (fd == -1) {
err = errno;
}
if (isCopy) {
JNU_ReleaseStringPlatformChars(env, path, p);
}
if (fd == -1) {
if (err == ENOENT) {
JNU_ThrowByName(env,
"java/io/FileNotFoundException", NULL);
} else {
char* msg = strdup(strerror(err));
JNU_ThrowIOException(env, msg);
if (msg != NULL) {
free(msg);
}
We should add the path to the exception message.
Thanks David. So if EPERM is the error and a retry 100ms later
works, I think that supports my hypothesis that the file is not
quite fully created. So Gary's fix is probably fine. The only other
possible fix I can think of that wouldn't require an explicit delay
(or multiple retries) is probably not worth the complexity. It
would require that the attachee create two files, and the attacher
try to open the second file first. When it either opens or returns
EPERM, you know the first file can safety be opened.
Chris
Post by David Holmes
David
-----
Post by Chris Plummer
Chris
Post by g***@oracle.com
Post by Chris Plummer
Can you summarize how the attach handshaking is suppose to
work? I'm just wondering why the attacher would ever be looking
for the file before the attachee has created it. It seems a
proper handshake would prevent this. Maybe there's some sort of
visibility issue where the attachee has indeed created the
file, but it is not immediately visible to the attacher process.
Chris
Post by g***@oracle.com
The problem reproduced pretty quickly.
I added a call to checkPermission and revealed the
"file not found" from the stat call when the IOException
was detected.
There has been some flakiness from the Solaris test machines today,
so I'll continue with the testing a bit longer.
Post by Chris Plummer
Without the fix was this issue easy enough to reproduce that
you can be sure this is resolving it?
Chris
Post by Gary Adams
Solaris debug builds are failing tests that use the attach interface.
An IOException is reported when the java_pid file is not opened.
It appears that the attempt to attach is taking place too quickly.
This workaround will allow the open operation to be retried
after a short pause.
Webrev: http://cr.openjdk.java.net/~gadams/8210337/webrev/
Issue: https://bugs.openjdk.java.net/browse/JDK-8210337
Testing is in progress.
Chris Plummer
2018-10-04 17:25:14 UTC
Permalink
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">But I also thought you said the delay
and retry fixed the problem. How could fix the problem if it is
just duplicating something that is already in place?<br>
<br>
Chris<br>
<br>
On 10/4/18 9:48 AM, Gary Adams wrote:<br>
</div>
<blockquote type="cite" cite="mid:***@oracle.com">
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
My delay and retry just duplicated the openDoor retry.<br>
The normal processing of FileNotFoundException(ENOENT) is to retry<br>
several times until the file is available.<br>
<br>
But the original problem reported is a "Permission denied"
(EACCESS or EPERM).<br>
Delay and retry will not resolve a permissions error.<br>
<br>
On 10/4/18, 12:30 PM, Chris Plummer wrote:
<blockquote
cite="mid:7115147a-2aae-d168-2db5-***@oracle.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=utf-8">
<div class="moz-cite-prefix">Didn't the retry after 100ms delay
work? If yes, why would it if the problem is that a java_pid
was not cleaned up?<br>
<br>
Chris<br>
<br>
On 10/4/18 8:54 AM, Gary Adams wrote:<br>
</div>
<blockquote type="cite" cite="mid:***@oracle.com">
<meta content="text/html; charset=utf-8"
http-equiv="Content-Type">
First, let me retract the proposed change,<br>
it is not the right solution to the problem originally<br>
reported.<br>
<br>
Second, as a bit of explanation consider the code fragments
below.<br>
<br>
The high level processing calls openDoor which is willing to
retry <br>
the operation as long as the error is flagged specifically<br>
as a FileNotFoundException.<br>
<br>
  VirtualMachineImpl.java:72<br>
  VirtualMachineImpl.c:81<br>
<br>
During my testing I had added a check
VirtualMachineImpl.java:214<br>
and when an IOException was detected made a call to
checkPermissions<br>
to get more detailed information about the IOException. The
error <br>
I saw was an ENOENT from the stat call. And not the detailed
checks for<br>
specific permissions issues (VirtualMachineImpl.c:143)<br>
<br>
    VirtualMachineImpl.c:118<br>
    VirtualMachineImpl.c:147<br>
<br>
What I missed in the original proposed solution was a
FileNotFoundException<br>
extends IOException. That means my delay and retry just
duplicates the higher<br>
level retry around the openDoor call.<br>
<br>
Third, the original error message logged in the bug report :<br>
<br>
<span style="caret-color: rgb(51, 51, 51); color: rgb(51, 51,
51); font-family: Arial, sans-serif; font-size:
14.000000953674316px; font-style: normal; font-variant-caps:
normal; font-weight: normal; letter-spacing: normal;
orphans: auto; text-align: start; text-indent: 0px;
text-transform: none; white-space: normal; widows: auto;
word-spacing: 0px; -webkit-text-size-adjust: auto;
-webkit-text-stroke-width: 0px; background-color: rgb(255,
255, 255); text-decoration: none; display: inline
!important; float: none;">java.io.IOException: Permission
denied<span class="Apple-converted-space"> </span></span><br
style="caret-color: rgb(51, 51, 51); color: rgb(51, 51, 51);
font-family: Arial, sans-serif; font-size:
14.000000953674316px; font-style: normal; font-variant-caps:
normal; font-weight: normal; letter-spacing: normal;
orphans: auto; text-align: start; text-indent: 0px;
text-transform: none; white-space: normal; widows: auto;
word-spacing: 0px; -webkit-text-size-adjust: auto;
-webkit-text-stroke-width: 0px; text-decoration: none;">
<span style="caret-color: rgb(51, 51, 51); color: rgb(51, 51,
51); font-family: Arial, sans-serif; font-size:
14.000000953674316px; font-style: normal; font-variant-caps:
normal; font-weight: normal; letter-spacing: normal;
orphans: auto; text-align: start; text-indent: 0px;
text-transform: none; white-space: normal; widows: auto;
word-spacing: 0px; -webkit-text-size-adjust: auto;
-webkit-text-stroke-width: 0px; background-color: rgb(255,
255, 255); text-decoration: none; display: inline
!important; float: none;">at
jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native
Method)<span class="Apple-converted-space"> </span></span><br>
<br>
had to have come from<br>
<br>
  VirtualMachineImpl.c:70<br>
  VirtualMachineImpl.c:84<br>
<br>
which means the actual open call reported the file does exist<br>
but the permissions do not allow the file to be accessed.<br>
That also means the normal mechanism of removing leftover <br>
java_pid files would not have cleaned up another user's<br>
java_pid files.<br>
<br>
=====<br>
src/jdk.attach/solaris/classes/sun/tools/attach/VirtualMachineImpl.java:<br>
...<br>
    67            // Opens the door file to the target VM. If
the file is not<br>
    68            // found it might mean that the attach
mechanism isn't started in the<br>
    69            // target VM so we attempt to start it and
retry.<br>
    70            try {<br>
    71                fd = openDoor(pid);<br>
    72            } catch (FileNotFoundException fnf1) {<br>
    73                File f = createAttachFile(pid);<br>
    74                try {<br>
    75                    sigquit(pid);<br>
    76    <br>
    77                    // give the target VM time to start
the attach mechanism<br>
    78                    final int delay_step = 100;<br>
    79                    final long timeout =
attachTimeout();<br>
    80                    long time_spend = 0;<br>
    81                    long delay = 0;<br>
    82                    do {<br>
    83                        // Increase timeout on each
attempt to reduce polling<br>
    84                        delay += delay_step;<br>
    85                        try {<br>
    86                            Thread.sleep(delay);<br>
    87                        } catch (InterruptedException x)
{ }<br>
    88                        try {<br>
    89                            fd = openDoor(pid);<br>
    90                        } catch (FileNotFoundException
fnf2) {<br>
    91                            // pass<br>
    92                        }<br>
    93    <br>
    94                        time_spend += delay;<br>
    95                        if (time_spend &gt; timeout/2
&amp;&amp; fd == -1) {<br>
    96                            // Send QUIT again to give
target VM the last chance to react<br>
    97                            sigquit(pid);<br>
    98                        }<br>
    99                    } while (time_spend &lt;= timeout
&amp;&amp; fd == -1);<br>
   100                    if (fd  == -1) {<br>
   101                        throw new
AttachNotSupportedException(<br>
   102                            String.format("Unable to
open door %s: " +<br>
   103                              "target process %d doesn't
respond within %dms " +<br>
   104                              "or HotSpot VM not
loaded", socket_path, pid, time_spend));<br>
   105                    }<br>
...<br>
   212        // The door is attached to .java_pid&lt;pid&gt;
in the temporary directory.<br>
   213        private int openDoor(int pid) throws IOException
{<br>
   214            socket_path = tmpdir + "/.java_pid" + pid;<br>
   215            fd = open(socket_path);<br>
   216    <br>
   217            // Check that the file owner/permission to
avoid attaching to<br>
   218            // bogus process<br>
   219            try {<br>
   220                checkPermissions(socket_path);<br>
   221            } catch (IOException ioe) {<br>
   222                close(fd);<br>
   223                throw ioe;<br>
   224            }<br>
   225            return fd;<br>
   226        }<br>
<br>
=====<br>
src/jdk.attach/solaris/native/libattach/VirtualMachineImpl.c:<br>
...<br>
    59    JNIEXPORT jint JNICALL
Java_sun_tools_attach_VirtualMachineImpl_open<br>
    60      (JNIEnv *env, jclass cls, jstring path)<br>
    61    {<br>
    62        jboolean isCopy;<br>
    63        const char* p = GetStringPlatformChars(env,
path, &amp;isCopy);<br>
    64        if (p == NULL) {<br>
    65            return 0;<br>
    66        } else {<br>
    67            int fd;<br>
    68            int err = 0;<br>
    69    <br>
    70            fd = open(p, O_RDWR);<br>
    71            if (fd == -1) {<br>
    72                err = errno;<br>
    73            }<br>
    74    <br>
    75            if (isCopy) {<br>
    76                JNU_ReleaseStringPlatformChars(env,
path, p);<br>
    77            }<br>
    78    <br>
    79            if (fd == -1) {<br>
    80                if (err == ENOENT) {<br>
    81                    JNU_ThrowByName(env,
"java/io/FileNotFoundException", NULL);<br>
    82                } else {<br>
    83                    char* msg = strdup(strerror(err));<br>
    84                    JNU_ThrowIOException(env, msg);<br>
    85                    if (msg != NULL) {<br>
    86                        free(msg);<br>
    87                    }<br>
    88                }<br>
    89            }<br>
    90            return fd;<br>
    91        }<br>
    92    }<br>
...<br>
    99    JNIEXPORT void JNICALL
Java_sun_tools_attach_VirtualMachineImpl_checkPermissions<br>
   100      (JNIEnv *env, jclass cls, jstring path)<br>
   101    {<br>
   102        jboolean isCopy;<br>
   103        const char* p = GetStringPlatformChars(env,
path, &amp;isCopy);<br>
   104        if (p != NULL) {<br>
   105            struct stat64 sb;<br>
   106            uid_t uid, gid;<br>
   107            int res;<br>
   108    <br>
   109            memset(&amp;sb, 0, sizeof(struct stat64));<br>
   110    <br>
   111            /*<br>
   112             * Check that the path is owned by the
effective uid/gid of this<br>
   113             * process. Also check that group/other
access is not allowed.<br>
   114             */<br>
   115            uid = geteuid();<br>
   116            gid = getegid();<br>
   117    <br>
   118            res = stat64(p, &amp;sb);<br>
   119            if (res != 0) {<br>
   120                /* save errno */<br>
   121                res = errno;<br>
   122            }<br>
   123    <br>
   124            if (res == 0) {<br>
   125                char msg[100];<br>
   126                jboolean isError = JNI_FALSE;<br>
   127                if (sb.st_uid != uid &amp;&amp; uid !=
ROOT_UID) {<br>
   128                    snprintf(msg, sizeof(msg),<br>
   129                        "file should be owned by the
current user (which is %d) but is owned by %d", uid,
sb.st_uid);<br>
   130                    isError = JNI_TRUE;<br>
   131                } else if (sb.st_gid != gid &amp;&amp;
uid != ROOT_UID) {<br>
   132                    snprintf(msg, sizeof(msg),<br>
   133                        "file's group should be the
current group (which is %d) but the group is %d", gid,
sb.st_gid);<br>
   134                    isError = JNI_TRUE;<br>
   135                } else if ((sb.st_mode &amp;
(S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH)) != 0) {<br>
   136                    snprintf(msg, sizeof(msg),<br>
   137                        "file should only be readable
and writable by the owner but has 0%03o access", sb.st_mode
&amp; 0777);<br>
   138                    isError = JNI_TRUE;<br>
   139                }<br>
   140                if (isError) {<br>
   141                    char buf[256];<br>
   142                    snprintf(buf, sizeof(buf),
"well-known file %s is not secure: %s", p, msg);<br>
   143                    JNU_ThrowIOException(env, buf);<br>
   144                }<br>
   145            } else {<br>
   146                char* msg = strdup(strerror(res));<br>
   147                JNU_ThrowIOException(env, msg);<br>
   148                if (msg != NULL) {<br>
   149                    free(msg);<br>
   150                }<br>
   151            }<br>
<br>
On 10/2/18, 6:23 PM, David Holmes wrote:
<blockquote
cite="mid:e1adb28a-031f-5c28-74a5-***@oracle.com"
type="cite">Minor correction: EPERM -&gt; EACCES for Solaris
<br>
<br>
Hard to see how to get a transient EACCES when opening a
file ... though as it is really a door I guess there could
be additional complexity. <br>
<br>
David <br>
<br>
On 3/10/2018 7:54 AM, Chris Plummer wrote: <br>
<blockquote type="cite">On 10/2/18 2:38 PM, David Holmes
wrote: <br>
<blockquote type="cite">Chris, <br>
<br>
On 3/10/2018 6:57 AM, Chris Plummer wrote: <br>
<blockquote type="cite"> <br>
<br>
On 10/2/18 1:44 PM, <a
class="moz-txt-link-abbreviated"
href="mailto:***@oracle.com"
moz-do-not-send="true">***@oracle.com</a>
wrote: <br>
<blockquote type="cite">The general attach sequence
... <br>
<br>
src/jdk.attach/solaris/classes/sun/tools/attach/VirtualMachineImpl.java
<br>
<br>
 the attacher creates an attach_pid file in a
directory where the attachee is runnning <br>
 issues a signal to the attacheee <br>
<br>
  loops waiting for the java_pid file to be created
<br>
  default timeout is 10 seconds <br>
<br>
</blockquote>
So getting a FileNotFoundException while in this loop
is OK, but IOException is not. <br>
<br>
<blockquote type="cite">src/hotspot/os/solaris/attachListener_solaris.cpp
<br>
<br>
   attachee creates the java_pid file <br>
   listens til the attacher opens the door <br>
<br>
</blockquote>
I'm don't think this is related, but JDK-8199811 made
a fix in attachListener_solaris.cpp to make it wait up
to 10 seconds for initialization to complete before
failing the enqueue. <br>
<br>
<blockquote type="cite">... <br>
Not sure when a bare IOException is thrown rather
than the <br>
more specific FileNotFoundException. <br>
</blockquote>
Where is the IOException originating from? I wonder if
the issue is that the file is in the process of being
created, but is not fully created yet. Maybe it is
there, but owner/group/permissions have not been set
yet, and this results in an IOException instead of
FileNotFoundException. <br>
</blockquote>
<br>
The exception is shown in the bug report: <br>
<br>
 [java.io.IOException: Permission denied <br>
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native
Method) <br>
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.openDoor(VirtualMachineImpl.java:215)
<br>
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.&lt;init&gt;(VirtualMachineImpl.java:71)
<br>
at
jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
<br>
at
jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
<br>
at
jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:114)
<br>
at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:98) <br>
<br>
And if you look at the native code the EPERM from open
will cause IOException to be thrown. <br>
<br>
./jdk.attach/solaris/native/libattach/VirtualMachineImpl.c <br>
<br>
JNIEXPORT jint JNICALL
Java_sun_tools_attach_VirtualMachineImpl_open <br>
  (JNIEnv *env, jclass cls, jstring path) <br>
{ <br>
    jboolean isCopy; <br>
    const char* p = GetStringPlatformChars(env, path,
&amp;isCopy); <br>
    if (p == NULL) { <br>
        return 0; <br>
    } else { <br>
        int fd; <br>
        int err = 0; <br>
<br>
        fd = open(p, O_RDWR); <br>
        if (fd == -1) { <br>
            err = errno; <br>
        } <br>
<br>
        if (isCopy) { <br>
            JNU_ReleaseStringPlatformChars(env, path,
p); <br>
        } <br>
<br>
        if (fd == -1) { <br>
            if (err == ENOENT) { <br>
                JNU_ThrowByName(env,
"java/io/FileNotFoundException", NULL); <br>
            } else { <br>
                char* msg = strdup(strerror(err)); <br>
                JNU_ThrowIOException(env, msg); <br>
                if (msg != NULL) { <br>
                    free(msg); <br>
                } <br>
<br>
<br>
We should add the path to the exception message. <br>
<br>
</blockquote>
Thanks David. So if EPERM is the error and a retry 100ms
later works, I think that supports my hypothesis that the
file is not quite fully created. So Gary's fix is probably
fine. The only other possible fix I can think of that
wouldn't require an explicit delay (or multiple retries)
is probably not worth the complexity. It would require
that the attachee create two files, and the attacher try
to open the second file first. When it either opens or
returns EPERM, you know the first file can safety be
opened. <br>
<br>
Chris <br>
<blockquote type="cite">David <br>
----- <br>
<br>
<blockquote type="cite">Chris <br>
<blockquote type="cite"> <br>
<br>
<br>
On 10/2/18 4:11 PM, Chris Plummer wrote: <br>
<blockquote type="cite">Can you summarize how the
attach handshaking is suppose to work? I'm just
wondering why the attacher would ever be looking
for the file before the attachee has created it.
It seems a proper handshake would prevent this.
Maybe there's some sort of visibility issue where
the attachee has indeed created the file, but it
is not immediately visible to the attacher
process. <br>
<br>
Chris <br>
<br>
On 10/2/18 12:27 PM, <a
class="moz-txt-link-abbreviated"
href="mailto:***@oracle.com"
moz-do-not-send="true">***@oracle.com</a>
wrote: <br>
<blockquote type="cite">The problem reproduced
pretty quickly. <br>
I added a call to checkPermission and revealed
the <br>
"file not found" from the stat call when the
IOException <br>
was detected. <br>
<br>
There has been some flakiness from the Solaris
test machines today, <br>
so I'll continue with the testing a bit longer.
<br>
<br>
On 10/2/18 3:12 PM, Chris Plummer wrote: <br>
<blockquote type="cite">Without the fix was this
issue easy enough to reproduce that you can be
sure this is resolving it? <br>
<br>
Chris <br>
<br>
On 10/2/18 8:16 AM, Gary Adams wrote: <br>
<blockquote type="cite">Solaris debug builds
are failing tests that use the attach
interface. <br>
An IOException is reported when the java_pid
file is not opened. <br>
<br>
It appears that the attempt to attach is
taking place too quickly. <br>
This workaround will allow the open
operation to be retried <br>
after a short pause. <br>
<br>
  Webrev: <a class="moz-txt-link-freetext"
href="http://cr.openjdk.java.net/%7Egadams/8210337/webrev/"
moz-do-not-send="true">http://cr.openjdk.java.net/~gadams/8210337/webrev/</a>
<br>
  Issue: <a class="moz-txt-link-freetext"
href="https://bugs.openjdk.java.net/browse/JDK-8210337"
moz-do-not-send="true">https://bugs.openjdk.java.net/browse/JDK-8210337</a>
<br>
<br>
Testing is in progress. <br>
</blockquote>
<br>
<br>
<br>
</blockquote>
<br>
</blockquote>
<br>
<br>
</blockquote>
<br>
</blockquote>
<br>
<br>
</blockquote>
</blockquote>
<br>
<br>
</blockquote>
</blockquote>
<br>
</blockquote>
<p><br>
</p>
</blockquote>
<br>
</blockquote>
<p><br>
</p>
</body>
</html>
Gary Adams
2018-10-04 17:49:35 UTC
Permalink
My delay and retry did not fix the problem with permission denied.

When I was diagnosing the problem I instrumented the code
to catch an IOException and call checkPermission to get
more detail about the IOException. The error reported
from calling checkPermission was ENOENT (stat).

The code change I then proposed was catch the IOException,
delay, and retry the open. That fixed the problem of
ENOENT, but had nothing to do with "permission denied".
But I also thought you said the delay and retry fixed the problem. How
could fix the problem if it is just duplicating something that is
already in place?
Chris
Post by Gary Adams
My delay and retry just duplicated the openDoor retry.
The normal processing of FileNotFoundException(ENOENT) is to retry
several times until the file is available.
But the original problem reported is a "Permission denied" (EACCESS
or EPERM).
Delay and retry will not resolve a permissions error.
Didn't the retry after 100ms delay work? If yes, why would it if the
problem is that a java_pid was not cleaned up?
Chris
Post by Gary Adams
First, let me retract the proposed change,
it is not the right solution to the problem originally
reported.
Second, as a bit of explanation consider the code fragments below.
The high level processing calls openDoor which is willing to retry
the operation as long as the error is flagged specifically
as a FileNotFoundException.
VirtualMachineImpl.java:72
VirtualMachineImpl.c:81
During my testing I had added a check VirtualMachineImpl.java:214
and when an IOException was detected made a call to checkPermissions
to get more detailed information about the IOException. The error
I saw was an ENOENT from the stat call. And not the detailed checks for
specific permissions issues (VirtualMachineImpl.c:143)
VirtualMachineImpl.c:118
VirtualMachineImpl.c:147
What I missed in the original proposed solution was a
FileNotFoundException
extends IOException. That means my delay and retry just duplicates
the higher
level retry around the openDoor call.
java.io.IOException: Permission denied
at jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native Method)
had to have come from
VirtualMachineImpl.c:70
VirtualMachineImpl.c:84
which means the actual open call reported the file does exist
but the permissions do not allow the file to be accessed.
That also means the normal mechanism of removing leftover
java_pid files would not have cleaned up another user's
java_pid files.
=====
...
67 // Opens the door file to the target VM. If the
file is not
68 // found it might mean that the attach mechanism
isn't started in the
69 // target VM so we attempt to start it and retry.
70 try {
71 fd = openDoor(pid);
72 } catch (FileNotFoundException fnf1) {
73 File f = createAttachFile(pid);
74 try {
75 sigquit(pid);
76
77 // give the target VM time to start the
attach mechanism
78 final int delay_step = 100;
79 final long timeout = attachTimeout();
80 long time_spend = 0;
81 long delay = 0;
82 do {
83 // Increase timeout on each attempt
to reduce polling
84 delay += delay_step;
85 try {
86 Thread.sleep(delay);
87 } catch (InterruptedException x) { }
88 try {
89 fd = openDoor(pid);
90 } catch (FileNotFoundException fnf2) {
91 // pass
92 }
93
94 time_spend += delay;
95 if (time_spend > timeout/2 && fd == -1) {
96 // Send QUIT again to give target
VM the last chance to react
97 sigquit(pid);
98 }
99 } while (time_spend <= timeout && fd == -1);
100 if (fd == -1) {
101 throw new AttachNotSupportedException(
102 String.format("Unable to open
door %s: " +
103 "target process %d doesn't
respond within %dms " +
104 "or HotSpot VM not loaded",
socket_path, pid, time_spend));
105 }
...
212 // The door is attached to .java_pid<pid> in the
temporary directory.
213 private int openDoor(int pid) throws IOException {
214 socket_path = tmpdir + "/.java_pid" + pid;
215 fd = open(socket_path);
216
217 // Check that the file owner/permission to avoid
attaching to
218 // bogus process
219 try {
220 checkPermissions(socket_path);
221 } catch (IOException ioe) {
222 close(fd);
223 throw ioe;
224 }
225 return fd;
226 }
=====
...
59 JNIEXPORT jint JNICALL
Java_sun_tools_attach_VirtualMachineImpl_open
60 (JNIEnv *env, jclass cls, jstring path)
61 {
62 jboolean isCopy;
63 const char* p = GetStringPlatformChars(env, path, &isCopy);
64 if (p == NULL) {
65 return 0;
66 } else {
67 int fd;
68 int err = 0;
69
70 fd = open(p, O_RDWR);
71 if (fd == -1) {
72 err = errno;
73 }
74
75 if (isCopy) {
76 JNU_ReleaseStringPlatformChars(env, path, p);
77 }
78
79 if (fd == -1) {
80 if (err == ENOENT) {
81 JNU_ThrowByName(env,
"java/io/FileNotFoundException", NULL);
82 } else {
83 char* msg = strdup(strerror(err));
84 JNU_ThrowIOException(env, msg);
85 if (msg != NULL) {
86 free(msg);
87 }
88 }
89 }
90 return fd;
91 }
92 }
...
99 JNIEXPORT void JNICALL
Java_sun_tools_attach_VirtualMachineImpl_checkPermissions
100 (JNIEnv *env, jclass cls, jstring path)
101 {
102 jboolean isCopy;
103 const char* p = GetStringPlatformChars(env, path, &isCopy);
104 if (p != NULL) {
105 struct stat64 sb;
106 uid_t uid, gid;
107 int res;
108
109 memset(&sb, 0, sizeof(struct stat64));
110
111 /*
112 * Check that the path is owned by the effective
uid/gid of this
113 * process. Also check that group/other access is
not allowed.
114 */
115 uid = geteuid();
116 gid = getegid();
117
118 res = stat64(p, &sb);
119 if (res != 0) {
120 /* save errno */
121 res = errno;
122 }
123
124 if (res == 0) {
125 char msg[100];
126 jboolean isError = JNI_FALSE;
127 if (sb.st_uid != uid && uid != ROOT_UID) {
128 snprintf(msg, sizeof(msg),
129 "file should be owned by the current
user (which is %d) but is owned by %d", uid, sb.st_uid);
130 isError = JNI_TRUE;
131 } else if (sb.st_gid != gid && uid != ROOT_UID) {
132 snprintf(msg, sizeof(msg),
133 "file's group should be the current
group (which is %d) but the group is %d", gid, sb.st_gid);
134 isError = JNI_TRUE;
135 } else if ((sb.st_mode &
(S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH)) != 0) {
136 snprintf(msg, sizeof(msg),
137 "file should only be readable and
writable by the owner but has 0%03o access", sb.st_mode & 0777);
138 isError = JNI_TRUE;
139 }
140 if (isError) {
141 char buf[256];
142 snprintf(buf, sizeof(buf), "well-known
file %s is not secure: %s", p, msg);
143 JNU_ThrowIOException(env, buf);
144 }
145 } else {
146 char* msg = strdup(strerror(res));
147 JNU_ThrowIOException(env, msg);
148 if (msg != NULL) {
149 free(msg);
150 }
151 }
Post by David Holmes
Minor correction: EPERM -> EACCES for Solaris
Hard to see how to get a transient EACCES when opening a file ...
though as it is really a door I guess there could be additional
complexity.
David
Post by Chris Plummer
Post by David Holmes
Chris,
Post by Chris Plummer
Post by g***@oracle.com
The general attach sequence ...
src/jdk.attach/solaris/classes/sun/tools/attach/VirtualMachineImpl.java
the attacher creates an attach_pid file in a directory where
the attachee is runnning
issues a signal to the attacheee
loops waiting for the java_pid file to be created
default timeout is 10 seconds
So getting a FileNotFoundException while in this loop is OK,
but IOException is not.
Post by g***@oracle.com
src/hotspot/os/solaris/attachListener_solaris.cpp
attachee creates the java_pid file
listens til the attacher opens the door
I'm don't think this is related, but JDK-8199811 made a fix in
attachListener_solaris.cpp to make it wait up to 10 seconds for
initialization to complete before failing the enqueue.
Post by g***@oracle.com
...
Not sure when a bare IOException is thrown rather than the
more specific FileNotFoundException.
Where is the IOException originating from? I wonder if the
issue is that the file is in the process of being created, but
is not fully created yet. Maybe it is there, but
owner/group/permissions have not been set yet, and this results
in an IOException instead of FileNotFoundException.
[java.io.IOException: Permission denied
at jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native Method)
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.openDoor(VirtualMachineImpl.java:215)
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:71)
at
jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
at
jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:114)
at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:98)
And if you look at the native code the EPERM from open will
cause IOException to be thrown.
./jdk.attach/solaris/native/libattach/VirtualMachineImpl.c
JNIEXPORT jint JNICALL
Java_sun_tools_attach_VirtualMachineImpl_open
(JNIEnv *env, jclass cls, jstring path)
{
jboolean isCopy;
const char* p = GetStringPlatformChars(env, path, &isCopy);
if (p == NULL) {
return 0;
} else {
int fd;
int err = 0;
fd = open(p, O_RDWR);
if (fd == -1) {
err = errno;
}
if (isCopy) {
JNU_ReleaseStringPlatformChars(env, path, p);
}
if (fd == -1) {
if (err == ENOENT) {
JNU_ThrowByName(env,
"java/io/FileNotFoundException", NULL);
} else {
char* msg = strdup(strerror(err));
JNU_ThrowIOException(env, msg);
if (msg != NULL) {
free(msg);
}
We should add the path to the exception message.
Thanks David. So if EPERM is the error and a retry 100ms later
works, I think that supports my hypothesis that the file is not
quite fully created. So Gary's fix is probably fine. The only
other possible fix I can think of that wouldn't require an
explicit delay (or multiple retries) is probably not worth the
complexity. It would require that the attachee create two files,
and the attacher try to open the second file first. When it
either opens or returns EPERM, you know the first file can safety
be opened.
Chris
Post by David Holmes
David
-----
Post by Chris Plummer
Chris
Post by g***@oracle.com
Post by Chris Plummer
Can you summarize how the attach handshaking is suppose to
work? I'm just wondering why the attacher would ever be
looking for the file before the attachee has created it. It
seems a proper handshake would prevent this. Maybe there's
some sort of visibility issue where the attachee has indeed
created the file, but it is not immediately visible to the
attacher process.
Chris
Post by g***@oracle.com
The problem reproduced pretty quickly.
I added a call to checkPermission and revealed the
"file not found" from the stat call when the IOException
was detected.
There has been some flakiness from the Solaris test machines today,
so I'll continue with the testing a bit longer.
Post by Chris Plummer
Without the fix was this issue easy enough to reproduce
that you can be sure this is resolving it?
Chris
Post by Gary Adams
Solaris debug builds are failing tests that use the attach
interface.
An IOException is reported when the java_pid file is not opened.
It appears that the attempt to attach is taking place too quickly.
This workaround will allow the open operation to be retried
after a short pause.
Webrev: http://cr.openjdk.java.net/~gadams/8210337/webrev/
Issue: https://bugs.openjdk.java.net/browse/JDK-8210337
Testing is in progress.
Gary Adams
2018-11-07 19:20:12 UTC
Permalink
If there are no further suggestions on JDK-8210337,
I plan to close it out as cannot reproduce.

Similar bugs had been filed for the "Permission denied" error
from the openDoor request failure and each was attributed
to an infrastructure issue. e.g. another user with the same
pid left a temporary file that is blocking the current test
from attaching correctly.
Post by Gary Adams
My delay and retry did not fix the problem with permission denied.
When I was diagnosing the problem I instrumented the code
to catch an IOException and call checkPermission to get
more detail about the IOException. The error reported
from calling checkPermission was ENOENT (stat).
The code change I then proposed was catch the IOException,
delay, and retry the open. That fixed the problem of
ENOENT, but had nothing to do with "permission denied".
But I also thought you said the delay and retry fixed the problem.
How could fix the problem if it is just duplicating something that is
already in place?
Chris
Post by Gary Adams
My delay and retry just duplicated the openDoor retry.
The normal processing of FileNotFoundException(ENOENT) is to retry
several times until the file is available.
But the original problem reported is a "Permission denied" (EACCESS
or EPERM).
Delay and retry will not resolve a permissions error.
Didn't the retry after 100ms delay work? If yes, why would it if
the problem is that a java_pid was not cleaned up?
Chris
Post by Gary Adams
First, let me retract the proposed change,
it is not the right solution to the problem originally
reported.
Second, as a bit of explanation consider the code fragments below.
The high level processing calls openDoor which is willing to retry
the operation as long as the error is flagged specifically
as a FileNotFoundException.
VirtualMachineImpl.java:72
VirtualMachineImpl.c:81
During my testing I had added a check VirtualMachineImpl.java:214
and when an IOException was detected made a call to checkPermissions
to get more detailed information about the IOException. The error
I saw was an ENOENT from the stat call. And not the detailed checks for
specific permissions issues (VirtualMachineImpl.c:143)
VirtualMachineImpl.c:118
VirtualMachineImpl.c:147
What I missed in the original proposed solution was a
FileNotFoundException
extends IOException. That means my delay and retry just duplicates
the higher
level retry around the openDoor call.
java.io.IOException: Permission denied
at jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native Method)
had to have come from
VirtualMachineImpl.c:70
VirtualMachineImpl.c:84
which means the actual open call reported the file does exist
but the permissions do not allow the file to be accessed.
That also means the normal mechanism of removing leftover
java_pid files would not have cleaned up another user's
java_pid files.
=====
...
67 // Opens the door file to the target VM. If the
file is not
68 // found it might mean that the attach mechanism
isn't started in the
69 // target VM so we attempt to start it and retry.
70 try {
71 fd = openDoor(pid);
72 } catch (FileNotFoundException fnf1) {
73 File f = createAttachFile(pid);
74 try {
75 sigquit(pid);
76
77 // give the target VM time to start the
attach mechanism
78 final int delay_step = 100;
79 final long timeout = attachTimeout();
80 long time_spend = 0;
81 long delay = 0;
82 do {
83 // Increase timeout on each attempt
to reduce polling
84 delay += delay_step;
85 try {
86 Thread.sleep(delay);
87 } catch (InterruptedException x) { }
88 try {
89 fd = openDoor(pid);
90 } catch (FileNotFoundException fnf2) {
91 // pass
92 }
93
94 time_spend += delay;
95 if (time_spend > timeout/2 && fd == -1) {
96 // Send QUIT again to give
target VM the last chance to react
97 sigquit(pid);
98 }
99 } while (time_spend <= timeout && fd == -1);
100 if (fd == -1) {
101 throw new AttachNotSupportedException(
102 String.format("Unable to open
door %s: " +
103 "target process %d doesn't
respond within %dms " +
104 "or HotSpot VM not loaded",
socket_path, pid, time_spend));
105 }
...
212 // The door is attached to .java_pid<pid> in the
temporary directory.
213 private int openDoor(int pid) throws IOException {
214 socket_path = tmpdir + "/.java_pid" + pid;
215 fd = open(socket_path);
216
217 // Check that the file owner/permission to avoid
attaching to
218 // bogus process
219 try {
220 checkPermissions(socket_path);
221 } catch (IOException ioe) {
222 close(fd);
223 throw ioe;
224 }
225 return fd;
226 }
=====
...
59 JNIEXPORT jint JNICALL
Java_sun_tools_attach_VirtualMachineImpl_open
60 (JNIEnv *env, jclass cls, jstring path)
61 {
62 jboolean isCopy;
63 const char* p = GetStringPlatformChars(env, path, &isCopy);
64 if (p == NULL) {
65 return 0;
66 } else {
67 int fd;
68 int err = 0;
69
70 fd = open(p, O_RDWR);
71 if (fd == -1) {
72 err = errno;
73 }
74
75 if (isCopy) {
76 JNU_ReleaseStringPlatformChars(env, path, p);
77 }
78
79 if (fd == -1) {
80 if (err == ENOENT) {
81 JNU_ThrowByName(env,
"java/io/FileNotFoundException", NULL);
82 } else {
83 char* msg = strdup(strerror(err));
84 JNU_ThrowIOException(env, msg);
85 if (msg != NULL) {
86 free(msg);
87 }
88 }
89 }
90 return fd;
91 }
92 }
...
99 JNIEXPORT void JNICALL
Java_sun_tools_attach_VirtualMachineImpl_checkPermissions
100 (JNIEnv *env, jclass cls, jstring path)
101 {
102 jboolean isCopy;
103 const char* p = GetStringPlatformChars(env, path, &isCopy);
104 if (p != NULL) {
105 struct stat64 sb;
106 uid_t uid, gid;
107 int res;
108
109 memset(&sb, 0, sizeof(struct stat64));
110
111 /*
112 * Check that the path is owned by the effective
uid/gid of this
113 * process. Also check that group/other access
is not allowed.
114 */
115 uid = geteuid();
116 gid = getegid();
117
118 res = stat64(p, &sb);
119 if (res != 0) {
120 /* save errno */
121 res = errno;
122 }
123
124 if (res == 0) {
125 char msg[100];
126 jboolean isError = JNI_FALSE;
127 if (sb.st_uid != uid && uid != ROOT_UID) {
128 snprintf(msg, sizeof(msg),
129 "file should be owned by the current
user (which is %d) but is owned by %d", uid, sb.st_uid);
130 isError = JNI_TRUE;
131 } else if (sb.st_gid != gid && uid != ROOT_UID) {
132 snprintf(msg, sizeof(msg),
133 "file's group should be the current
group (which is %d) but the group is %d", gid, sb.st_gid);
134 isError = JNI_TRUE;
135 } else if ((sb.st_mode &
(S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH)) != 0) {
136 snprintf(msg, sizeof(msg),
137 "file should only be readable and
writable by the owner but has 0%03o access", sb.st_mode & 0777);
138 isError = JNI_TRUE;
139 }
140 if (isError) {
141 char buf[256];
142 snprintf(buf, sizeof(buf), "well-known
file %s is not secure: %s", p, msg);
143 JNU_ThrowIOException(env, buf);
144 }
145 } else {
146 char* msg = strdup(strerror(res));
147 JNU_ThrowIOException(env, msg);
148 if (msg != NULL) {
149 free(msg);
150 }
151 }
Post by David Holmes
Minor correction: EPERM -> EACCES for Solaris
Hard to see how to get a transient EACCES when opening a file ...
though as it is really a door I guess there could be additional
complexity.
David
Post by Chris Plummer
Post by David Holmes
Chris,
Post by Chris Plummer
Post by g***@oracle.com
The general attach sequence ...
src/jdk.attach/solaris/classes/sun/tools/attach/VirtualMachineImpl.java
the attacher creates an attach_pid file in a directory where
the attachee is runnning
issues a signal to the attacheee
loops waiting for the java_pid file to be created
default timeout is 10 seconds
So getting a FileNotFoundException while in this loop is OK,
but IOException is not.
Post by g***@oracle.com
src/hotspot/os/solaris/attachListener_solaris.cpp
attachee creates the java_pid file
listens til the attacher opens the door
I'm don't think this is related, but JDK-8199811 made a fix in
attachListener_solaris.cpp to make it wait up to 10 seconds
for initialization to complete before failing the enqueue.
Post by g***@oracle.com
...
Not sure when a bare IOException is thrown rather than the
more specific FileNotFoundException.
Where is the IOException originating from? I wonder if the
issue is that the file is in the process of being created, but
is not fully created yet. Maybe it is there, but
owner/group/permissions have not been set yet, and this
results in an IOException instead of FileNotFoundException.
[java.io.IOException: Permission denied
at jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native Method)
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.openDoor(VirtualMachineImpl.java:215)
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:71)
at
jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
at
jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
at
jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:114)
at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:98)
And if you look at the native code the EPERM from open will
cause IOException to be thrown.
./jdk.attach/solaris/native/libattach/VirtualMachineImpl.c
JNIEXPORT jint JNICALL
Java_sun_tools_attach_VirtualMachineImpl_open
(JNIEnv *env, jclass cls, jstring path)
{
jboolean isCopy;
const char* p = GetStringPlatformChars(env, path, &isCopy);
if (p == NULL) {
return 0;
} else {
int fd;
int err = 0;
fd = open(p, O_RDWR);
if (fd == -1) {
err = errno;
}
if (isCopy) {
JNU_ReleaseStringPlatformChars(env, path, p);
}
if (fd == -1) {
if (err == ENOENT) {
JNU_ThrowByName(env,
"java/io/FileNotFoundException", NULL);
} else {
char* msg = strdup(strerror(err));
JNU_ThrowIOException(env, msg);
if (msg != NULL) {
free(msg);
}
We should add the path to the exception message.
Thanks David. So if EPERM is the error and a retry 100ms later
works, I think that supports my hypothesis that the file is not
quite fully created. So Gary's fix is probably fine. The only
other possible fix I can think of that wouldn't require an
explicit delay (or multiple retries) is probably not worth the
complexity. It would require that the attachee create two files,
and the attacher try to open the second file first. When it
either opens or returns EPERM, you know the first file can
safety be opened.
Chris
Post by David Holmes
David
-----
Post by Chris Plummer
Chris
Post by g***@oracle.com
Post by Chris Plummer
Can you summarize how the attach handshaking is suppose to
work? I'm just wondering why the attacher would ever be
looking for the file before the attachee has created it. It
seems a proper handshake would prevent this. Maybe there's
some sort of visibility issue where the attachee has indeed
created the file, but it is not immediately visible to the
attacher process.
Chris
Post by g***@oracle.com
The problem reproduced pretty quickly.
I added a call to checkPermission and revealed the
"file not found" from the stat call when the IOException
was detected.
There has been some flakiness from the Solaris test
machines today,
so I'll continue with the testing a bit longer.
Post by Chris Plummer
Without the fix was this issue easy enough to reproduce
that you can be sure this is resolving it?
Chris
Post by Gary Adams
Solaris debug builds are failing tests that use the
attach interface.
An IOException is reported when the java_pid file is not opened.
It appears that the attempt to attach is taking place too
quickly.
This workaround will allow the open operation to be retried
after a short pause.
Webrev: http://cr.openjdk.java.net/~gadams/8210337/webrev/
Issue: https://bugs.openjdk.java.net/browse/JDK-8210337
Testing is in progress.
Chris Plummer
2018-11-07 20:09:58 UTC
Permalink
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Are you saying that the PID was
recycled, so the issue is a test run from long ago leaving behind
the file? If so, I thought there was something in the attach code
that cleaned up these PID files that were left behind. In general
I would not count this as an infra issue. PID files left behind
are the fault of the JVM. Maybe there is something wrong with the
cleanup code on solaris, or maybe we don't clean up on any
platform, but cycle through PIDs a lot faster on solaris.<br>
<br>
Chris<br>
<br>
On 11/7/18 11:20 AM, Gary Adams wrote:<br>
</div>
<blockquote type="cite" cite="mid:***@oracle.com">
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
If there are no further suggestions on JDK-8210337,<br>
I plan to close it out as cannot reproduce.<br>
<br>
Similar bugs had been filed for the "Permission denied" error<br>
from the openDoor request failure and each was attributed <br>
to an infrastructure issue. e.g. another user with the same <br>
pid left a temporary file that is blocking the current test <br>
from attaching correctly.<br>
<br>
On 10/4/18, 1:49 PM, Gary Adams wrote:
<blockquote cite="mid:***@oracle.com" type="cite">
<meta content="text/html; charset=utf-8"
http-equiv="Content-Type">
My delay and retry did not fix the problem with permission
denied.<br>
<br>
When I was diagnosing the problem I instrumented the code <br>
to catch an IOException and call checkPermission to get <br>
more detail about the IOException. The error reported<br>
from calling checkPermission was ENOENT (stat).<br>
<br>
The code change I then proposed was catch the IOException,<br>
delay, and retry the open. That fixed the problem of <br>
ENOENT, but had nothing to do with "permission denied".<br>
<br>
On 10/4/18, 1:25 PM, Chris Plummer wrote:
<blockquote
cite="mid:339eed9a-21e8-2a19-7a38-***@oracle.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=utf-8">
<div class="moz-cite-prefix">But I also thought you said the
delay and retry fixed the problem. How could fix the problem
if it is just duplicating something that is already in
place?<br>
<br>
Chris<br>
<br>
On 10/4/18 9:48 AM, Gary Adams wrote:<br>
</div>
<blockquote type="cite" cite="mid:***@oracle.com">
<meta content="text/html; charset=utf-8"
http-equiv="Content-Type">
My delay and retry just duplicated the openDoor retry.<br>
The normal processing of FileNotFoundException(ENOENT) is to
retry<br>
several times until the file is available.<br>
<br>
But the original problem reported is a "Permission denied"
(EACCESS or EPERM).<br>
Delay and retry will not resolve a permissions error.<br>
<br>
On 10/4/18, 12:30 PM, Chris Plummer wrote:
<blockquote
cite="mid:7115147a-2aae-d168-2db5-***@oracle.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=utf-8">
<div class="moz-cite-prefix">Didn't the retry after 100ms
delay work? If yes, why would it if the problem is that
a java_pid was not cleaned up?<br>
<br>
Chris<br>
<br>
On 10/4/18 8:54 AM, Gary Adams wrote:<br>
</div>
<blockquote type="cite"
cite="mid:***@oracle.com">
<meta content="text/html; charset=utf-8"
http-equiv="Content-Type">
First, let me retract the proposed change,<br>
it is not the right solution to the problem originally<br>
reported.<br>
<br>
Second, as a bit of explanation consider the code
fragments below.<br>
<br>
The high level processing calls openDoor which is
willing to retry <br>
the operation as long as the error is flagged
specifically<br>
as a FileNotFoundException.<br>
<br>
  VirtualMachineImpl.java:72<br>
  VirtualMachineImpl.c:81<br>
<br>
During my testing I had added a check
VirtualMachineImpl.java:214<br>
and when an IOException was detected made a call to
checkPermissions<br>
to get more detailed information about the IOException.
The error <br>
I saw was an ENOENT from the stat call. And not the
detailed checks for<br>
specific permissions issues (VirtualMachineImpl.c:143)<br>
<br>
    VirtualMachineImpl.c:118<br>
    VirtualMachineImpl.c:147<br>
<br>
What I missed in the original proposed solution was a
FileNotFoundException<br>
extends IOException. That means my delay and retry just
duplicates the higher<br>
level retry around the openDoor call.<br>
<br>
Third, the original error message logged in the bug
report :<br>
<br>
<span style="caret-color: rgb(51, 51, 51); color:
rgb(51, 51, 51); font-family: Arial, sans-serif;
font-size: 14.000000953674316px; font-style: normal;
font-variant-caps: normal; font-weight: normal;
letter-spacing: normal; orphans: auto; text-align:
start; text-indent: 0px; text-transform: none;
white-space: normal; widows: auto; word-spacing: 0px;
-webkit-text-size-adjust: auto;
-webkit-text-stroke-width: 0px; background-color:
rgb(255, 255, 255); text-decoration: none; display:
inline !important; float: none;">java.io.IOException:
Permission denied<span class="Apple-converted-space"> </span></span><br
style="caret-color: rgb(51, 51, 51); color: rgb(51,
51, 51); font-family: Arial, sans-serif; font-size:
14.000000953674316px; font-style: normal;
font-variant-caps: normal; font-weight: normal;
letter-spacing: normal; orphans: auto; text-align:
start; text-indent: 0px; text-transform: none;
white-space: normal; widows: auto; word-spacing: 0px;
-webkit-text-size-adjust: auto;
-webkit-text-stroke-width: 0px; text-decoration:
none;">
<span style="caret-color: rgb(51, 51, 51); color:
rgb(51, 51, 51); font-family: Arial, sans-serif;
font-size: 14.000000953674316px; font-style: normal;
font-variant-caps: normal; font-weight: normal;
letter-spacing: normal; orphans: auto; text-align:
start; text-indent: 0px; text-transform: none;
white-space: normal; widows: auto; word-spacing: 0px;
-webkit-text-size-adjust: auto;
-webkit-text-stroke-width: 0px; background-color:
rgb(255, 255, 255); text-decoration: none; display:
inline !important; float: none;">at
jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native
Method)<span class="Apple-converted-space"> </span></span><br>
<br>
had to have come from<br>
<br>
  VirtualMachineImpl.c:70<br>
  VirtualMachineImpl.c:84<br>
<br>
which means the actual open call reported the file does
exist<br>
but the permissions do not allow the file to be
accessed.<br>
That also means the normal mechanism of removing
leftover <br>
java_pid files would not have cleaned up another user's<br>
java_pid files.<br>
<br>
=====<br>
src/jdk.attach/solaris/classes/sun/tools/attach/VirtualMachineImpl.java:<br>
...<br>
    67            // Opens the door file to the target
VM. If the file is not<br>
    68            // found it might mean that the attach
mechanism isn't started in the<br>
    69            // target VM so we attempt to start it
and retry.<br>
    70            try {<br>
    71                fd = openDoor(pid);<br>
    72            } catch (FileNotFoundException fnf1) {<br>
    73                File f = createAttachFile(pid);<br>
    74                try {<br>
    75                    sigquit(pid);<br>
    76    <br>
    77                    // give the target VM time to
start the attach mechanism<br>
    78                    final int delay_step = 100;<br>
    79                    final long timeout =
attachTimeout();<br>
    80                    long time_spend = 0;<br>
    81                    long delay = 0;<br>
    82                    do {<br>
    83                        // Increase timeout on
each attempt to reduce polling<br>
    84                        delay += delay_step;<br>
    85                        try {<br>
    86                            Thread.sleep(delay);<br>
    87                        } catch
(InterruptedException x) { }<br>
    88                        try {<br>
    89                            fd = openDoor(pid);<br>
    90                        } catch
(FileNotFoundException fnf2) {<br>
    91                            // pass<br>
    92                        }<br>
    93    <br>
    94                        time_spend += delay;<br>
    95                        if (time_spend &gt;
timeout/2 &amp;&amp; fd == -1) {<br>
    96                            // Send QUIT again to
give target VM the last chance to react<br>
    97                            sigquit(pid);<br>
    98                        }<br>
    99                    } while (time_spend &lt;=
timeout &amp;&amp; fd == -1);<br>
   100                    if (fd  == -1) {<br>
   101                        throw new
AttachNotSupportedException(<br>
   102                            String.format("Unable
to open door %s: " +<br>
   103                              "target process %d
doesn't respond within %dms " +<br>
   104                              "or HotSpot VM not
loaded", socket_path, pid, time_spend));<br>
   105                    }<br>
...<br>
   212        // The door is attached to
.java_pid&lt;pid&gt; in the temporary directory.<br>
   213        private int openDoor(int pid) throws
IOException {<br>
   214            socket_path = tmpdir + "/.java_pid" +
pid;<br>
   215            fd = open(socket_path);<br>
   216    <br>
   217            // Check that the file
owner/permission to avoid attaching to<br>
   218            // bogus process<br>
   219            try {<br>
   220                checkPermissions(socket_path);<br>
   221            } catch (IOException ioe) {<br>
   222                close(fd);<br>
   223                throw ioe;<br>
   224            }<br>
   225            return fd;<br>
   226        }<br>
<br>
=====<br>
src/jdk.attach/solaris/native/libattach/VirtualMachineImpl.c:<br>
...<br>
    59    JNIEXPORT jint JNICALL
Java_sun_tools_attach_VirtualMachineImpl_open<br>
    60      (JNIEnv *env, jclass cls, jstring path)<br>
    61    {<br>
    62        jboolean isCopy;<br>
    63        const char* p =
GetStringPlatformChars(env, path, &amp;isCopy);<br>
    64        if (p == NULL) {<br>
    65            return 0;<br>
    66        } else {<br>
    67            int fd;<br>
    68            int err = 0;<br>
    69    <br>
    70            fd = open(p, O_RDWR);<br>
    71            if (fd == -1) {<br>
    72                err = errno;<br>
    73            }<br>
    74    <br>
    75            if (isCopy) {<br>
    76               
JNU_ReleaseStringPlatformChars(env, path, p);<br>
    77            }<br>
    78    <br>
    79            if (fd == -1) {<br>
    80                if (err == ENOENT) {<br>
    81                    JNU_ThrowByName(env,
"java/io/FileNotFoundException", NULL);<br>
    82                } else {<br>
    83                    char* msg =
strdup(strerror(err));<br>
    84                    JNU_ThrowIOException(env,
msg);<br>
    85                    if (msg != NULL) {<br>
    86                        free(msg);<br>
    87                    }<br>
    88                }<br>
    89            }<br>
    90            return fd;<br>
    91        }<br>
    92    }<br>
...<br>
    99    JNIEXPORT void JNICALL
Java_sun_tools_attach_VirtualMachineImpl_checkPermissions<br>
   100      (JNIEnv *env, jclass cls, jstring path)<br>
   101    {<br>
   102        jboolean isCopy;<br>
   103        const char* p =
GetStringPlatformChars(env, path, &amp;isCopy);<br>
   104        if (p != NULL) {<br>
   105            struct stat64 sb;<br>
   106            uid_t uid, gid;<br>
   107            int res;<br>
   108    <br>
   109            memset(&amp;sb, 0, sizeof(struct
stat64));<br>
   110    <br>
   111            /*<br>
   112             * Check that the path is owned by the
effective uid/gid of this<br>
   113             * process. Also check that
group/other access is not allowed.<br>
   114             */<br>
   115            uid = geteuid();<br>
   116            gid = getegid();<br>
   117    <br>
   118            res = stat64(p, &amp;sb);<br>
   119            if (res != 0) {<br>
   120                /* save errno */<br>
   121                res = errno;<br>
   122            }<br>
   123    <br>
   124            if (res == 0) {<br>
   125                char msg[100];<br>
   126                jboolean isError = JNI_FALSE;<br>
   127                if (sb.st_uid != uid &amp;&amp;
uid != ROOT_UID) {<br>
   128                    snprintf(msg, sizeof(msg),<br>
   129                        "file should be owned by
the current user (which is %d) but is owned by %d", uid,
sb.st_uid);<br>
   130                    isError = JNI_TRUE;<br>
   131                } else if (sb.st_gid != gid
&amp;&amp; uid != ROOT_UID) {<br>
   132                    snprintf(msg, sizeof(msg),<br>
   133                        "file's group should be
the current group (which is %d) but the group is %d",
gid, sb.st_gid);<br>
   134                    isError = JNI_TRUE;<br>
   135                } else if ((sb.st_mode &amp;
(S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH)) != 0) {<br>
   136                    snprintf(msg, sizeof(msg),<br>
   137                        "file should only be
readable and writable by the owner but has 0%03o
access", sb.st_mode &amp; 0777);<br>
   138                    isError = JNI_TRUE;<br>
   139                }<br>
   140                if (isError) {<br>
   141                    char buf[256];<br>
   142                    snprintf(buf, sizeof(buf),
"well-known file %s is not secure: %s", p, msg);<br>
   143                    JNU_ThrowIOException(env,
buf);<br>
   144                }<br>
   145            } else {<br>
   146                char* msg = strdup(strerror(res));<br>
   147                JNU_ThrowIOException(env, msg);<br>
   148                if (msg != NULL) {<br>
   149                    free(msg);<br>
   150                }<br>
   151            }<br>
<br>
On 10/2/18, 6:23 PM, David Holmes wrote:
<blockquote
cite="mid:e1adb28a-031f-5c28-74a5-***@oracle.com"
type="cite">Minor correction: EPERM -&gt; EACCES for
Solaris <br>
<br>
Hard to see how to get a transient EACCES when opening
a file ... though as it is really a door I guess there
could be additional complexity. <br>
<br>
David <br>
<br>
On 3/10/2018 7:54 AM, Chris Plummer wrote: <br>
<blockquote type="cite">On 10/2/18 2:38 PM, David
Holmes wrote: <br>
<blockquote type="cite">Chris, <br>
<br>
On 3/10/2018 6:57 AM, Chris Plummer wrote: <br>
<blockquote type="cite"> <br>
<br>
On 10/2/18 1:44 PM, <a
class="moz-txt-link-abbreviated"
href="mailto:***@oracle.com"
moz-do-not-send="true">***@oracle.com</a>
wrote: <br>
<blockquote type="cite">The general attach
sequence ... <br>
<br>
src/jdk.attach/solaris/classes/sun/tools/attach/VirtualMachineImpl.java
<br>
<br>
 the attacher creates an attach_pid file in a
directory where the attachee is runnning <br>
 issues a signal to the attacheee <br>
<br>
  loops waiting for the java_pid file to be
created <br>
  default timeout is 10 seconds <br>
<br>
</blockquote>
So getting a FileNotFoundException while in this
loop is OK, but IOException is not. <br>
<br>
<blockquote type="cite">src/hotspot/os/solaris/attachListener_solaris.cpp
<br>
<br>
   attachee creates the java_pid file <br>
   listens til the attacher opens the door <br>
<br>
</blockquote>
I'm don't think this is related, but JDK-8199811
made a fix in attachListener_solaris.cpp to make
it wait up to 10 seconds for initialization to
complete before failing the enqueue. <br>
<br>
<blockquote type="cite">... <br>
Not sure when a bare IOException is thrown
rather than the <br>
more specific FileNotFoundException. <br>
</blockquote>
Where is the IOException originating from? I
wonder if the issue is that the file is in the
process of being created, but is not fully
created yet. Maybe it is there, but
owner/group/permissions have not been set yet,
and this results in an IOException instead of
FileNotFoundException. <br>
</blockquote>
<br>
The exception is shown in the bug report: <br>
<br>
 [java.io.IOException: Permission denied <br>
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native
Method) <br>
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.openDoor(VirtualMachineImpl.java:215)
<br>
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.&lt;init&gt;(VirtualMachineImpl.java:71)
<br>
at
jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
<br>
at
jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
<br>
at
jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:114)
<br>
at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:98)
<br>
<br>
And if you look at the native code the EPERM from
open will cause IOException to be thrown. <br>
<br>
./jdk.attach/solaris/native/libattach/VirtualMachineImpl.c <br>
<br>
JNIEXPORT jint JNICALL
Java_sun_tools_attach_VirtualMachineImpl_open <br>
  (JNIEnv *env, jclass cls, jstring path) <br>
{ <br>
    jboolean isCopy; <br>
    const char* p = GetStringPlatformChars(env,
path, &amp;isCopy); <br>
    if (p == NULL) { <br>
        return 0; <br>
    } else { <br>
        int fd; <br>
        int err = 0; <br>
<br>
        fd = open(p, O_RDWR); <br>
        if (fd == -1) { <br>
            err = errno; <br>
        } <br>
<br>
        if (isCopy) { <br>
            JNU_ReleaseStringPlatformChars(env,
path, p); <br>
        } <br>
<br>
        if (fd == -1) { <br>
            if (err == ENOENT) { <br>
                JNU_ThrowByName(env,
"java/io/FileNotFoundException", NULL); <br>
            } else { <br>
                char* msg = strdup(strerror(err));
<br>
                JNU_ThrowIOException(env, msg); <br>
                if (msg != NULL) { <br>
                    free(msg); <br>
                } <br>
<br>
<br>
We should add the path to the exception message. <br>
<br>
</blockquote>
Thanks David. So if EPERM is the error and a retry
100ms later works, I think that supports my
hypothesis that the file is not quite fully created.
So Gary's fix is probably fine. The only other
possible fix I can think of that wouldn't require an
explicit delay (or multiple retries) is probably not
worth the complexity. It would require that the
attachee create two files, and the attacher try to
open the second file first. When it either opens or
returns EPERM, you know the first file can safety be
opened. <br>
<br>
Chris <br>
<blockquote type="cite">David <br>
----- <br>
<br>
<blockquote type="cite">Chris <br>
<blockquote type="cite"> <br>
<br>
<br>
On 10/2/18 4:11 PM, Chris Plummer wrote: <br>
<blockquote type="cite">Can you summarize how
the attach handshaking is suppose to work?
I'm just wondering why the attacher would
ever be looking for the file before the
attachee has created it. It seems a proper
handshake would prevent this. Maybe there's
some sort of visibility issue where the
attachee has indeed created the file, but it
is not immediately visible to the attacher
process. <br>
<br>
Chris <br>
<br>
On 10/2/18 12:27 PM, <a
class="moz-txt-link-abbreviated"
href="mailto:***@oracle.com"
moz-do-not-send="true">***@oracle.com</a>
wrote: <br>
<blockquote type="cite">The problem
reproduced pretty quickly. <br>
I added a call to checkPermission and
revealed the <br>
"file not found" from the stat call when
the IOException <br>
was detected. <br>
<br>
There has been some flakiness from the
Solaris test machines today, <br>
so I'll continue with the testing a bit
longer. <br>
<br>
On 10/2/18 3:12 PM, Chris Plummer wrote: <br>
<blockquote type="cite">Without the fix
was this issue easy enough to reproduce
that you can be sure this is resolving
it? <br>
<br>
Chris <br>
<br>
On 10/2/18 8:16 AM, Gary Adams wrote: <br>
<blockquote type="cite">Solaris debug
builds are failing tests that use the
attach interface. <br>
An IOException is reported when the
java_pid file is not opened. <br>
<br>
It appears that the attempt to attach
is taking place too quickly. <br>
This workaround will allow the open
operation to be retried <br>
after a short pause. <br>
<br>
  Webrev: <a
class="moz-txt-link-freetext"
href="http://cr.openjdk.java.net/%7Egadams/8210337/webrev/"
moz-do-not-send="true">http://cr.openjdk.java.net/~gadams/8210337/webrev/</a>
<br>
  Issue: <a
class="moz-txt-link-freetext"
href="https://bugs.openjdk.java.net/browse/JDK-8210337"
moz-do-not-send="true">https://bugs.openjdk.java.net/browse/JDK-8210337</a>
<br>
<br>
Testing is in progress. <br>
</blockquote>
<br>
<br>
<br>
</blockquote>
<br>
</blockquote>
<br>
<br>
</blockquote>
<br>
</blockquote>
<br>
<br>
</blockquote>
</blockquote>
<br>
<br>
</blockquote>
</blockquote>
<br>
</blockquote>
<p><br>
</p>
</blockquote>
<br>
</blockquote>
<p><br>
</p>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<p><br>
</p>
</body>
</html>
g***@oracle.com
2018-11-08 03:43:52 UTC
Permalink
There is recovery code that cleans up old attach files,
but a permission denied error would prevent the clean
up operation from taking place. We're looking at a case of
a file owned by another user as well as a pid recycling
and failed clean up situation.
Are you saying that the PID was recycled, so the issue is a test run
from long ago leaving behind the file? If so, I thought there was
something in the attach code that cleaned up these PID files that were
left behind. In general I would not count this as an infra issue. PID
files left behind are the fault of the JVM. Maybe there is something
wrong with the cleanup code on solaris, or maybe we don't clean up on
any platform, but cycle through PIDs a lot faster on solaris.
Chris
Post by Gary Adams
If there are no further suggestions on JDK-8210337,
I plan to close it out as cannot reproduce.
Similar bugs had been filed for the "Permission denied" error
from the openDoor request failure and each was attributed
to an infrastructure issue. e.g. another user with the same
pid left a temporary file that is blocking the current test
from attaching correctly.
Post by Gary Adams
My delay and retry did not fix the problem with permission denied.
When I was diagnosing the problem I instrumented the code
to catch an IOException and call checkPermission to get
more detail about the IOException. The error reported
from calling checkPermission was ENOENT (stat).
The code change I then proposed was catch the IOException,
delay, and retry the open. That fixed the problem of
ENOENT, but had nothing to do with "permission denied".
But I also thought you said the delay and retry fixed the problem.
How could fix the problem if it is just duplicating something that
is already in place?
Chris
Post by Gary Adams
My delay and retry just duplicated the openDoor retry.
The normal processing of FileNotFoundException(ENOENT) is to retry
several times until the file is available.
But the original problem reported is a "Permission denied"
(EACCESS or EPERM).
Delay and retry will not resolve a permissions error.
Didn't the retry after 100ms delay work? If yes, why would it if
the problem is that a java_pid was not cleaned up?
Chris
Post by Gary Adams
First, let me retract the proposed change,
it is not the right solution to the problem originally
reported.
Second, as a bit of explanation consider the code fragments below.
The high level processing calls openDoor which is willing to retry
the operation as long as the error is flagged specifically
as a FileNotFoundException.
  VirtualMachineImpl.java:72
  VirtualMachineImpl.c:81
During my testing I had added a check VirtualMachineImpl.java:214
and when an IOException was detected made a call to checkPermissions
to get more detailed information about the IOException. The error
I saw was an ENOENT from the stat call. And not the detailed checks for
specific permissions issues (VirtualMachineImpl.c:143)
    VirtualMachineImpl.c:118
    VirtualMachineImpl.c:147
What I missed in the original proposed solution was a
FileNotFoundException
extends IOException. That means my delay and retry just
duplicates the higher
level retry around the openDoor call.
java.io.IOException: Permission denied
at jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native Method)
had to have come from
  VirtualMachineImpl.c:70
  VirtualMachineImpl.c:84
which means the actual open call reported the file does exist
but the permissions do not allow the file to be accessed.
That also means the normal mechanism of removing leftover
java_pid files would not have cleaned up another user's
java_pid files.
=====
...
    67            // Opens the door file to the target VM. If
the file is not
    68            // found it might mean that the attach
mechanism isn't started in the
    69            // target VM so we attempt to start it and retry.
    70            try {
    71                fd = openDoor(pid);
    72            } catch (FileNotFoundException fnf1) {
    73                File f = createAttachFile(pid);
    74                try {
    75                    sigquit(pid);
    76
    77                    // give the target VM time to start
the attach mechanism
    78                    final int delay_step = 100;
    79                    final long timeout = attachTimeout();
    80                    long time_spend = 0;
    81                    long delay = 0;
    82                    do {
    83                        // Increase timeout on each
attempt to reduce polling
    84                        delay += delay_step;
    85                        try {
    86                            Thread.sleep(delay);
    87                        } catch (InterruptedException x) { }
    88                        try {
    89                            fd = openDoor(pid);
    90                        } catch (FileNotFoundException fnf2) {
    91                            // pass
    92                        }
    93
    94                        time_spend += delay;
    95                        if (time_spend > timeout/2 && fd
== -1) {
    96                            // Send QUIT again to give
target VM the last chance to react
    97                            sigquit(pid);
    98                        }
    99                    } while (time_spend <= timeout && fd
== -1);
   100                    if (fd  == -1) {
   101                        throw new AttachNotSupportedException(
   102 String.format("Unable to open door %s: " +
   103                              "target process %d doesn't
respond within %dms " +
   104                              "or HotSpot VM not loaded",
socket_path, pid, time_spend));
   105                    }
...
   212        // The door is attached to .java_pid<pid> in the
temporary directory.
   213        private int openDoor(int pid) throws IOException {
   214            socket_path = tmpdir + "/.java_pid" + pid;
   215            fd = open(socket_path);
   216
   217            // Check that the file owner/permission to
avoid attaching to
   218            // bogus process
   219            try {
   220                checkPermissions(socket_path);
   221            } catch (IOException ioe) {
   222                close(fd);
   223                throw ioe;
   224            }
   225            return fd;
   226        }
=====
...
    59    JNIEXPORT jint JNICALL
Java_sun_tools_attach_VirtualMachineImpl_open
    60      (JNIEnv *env, jclass cls, jstring path)
    61    {
    62        jboolean isCopy;
    63        const char* p = GetStringPlatformChars(env, path,
&isCopy);
    64        if (p == NULL) {
    65            return 0;
    66        } else {
    67            int fd;
    68            int err = 0;
    69
    70            fd = open(p, O_RDWR);
    71            if (fd == -1) {
    72                err = errno;
    73            }
    74
    75            if (isCopy) {
    76 JNU_ReleaseStringPlatformChars(env, path, p);
    77            }
    78
    79            if (fd == -1) {
    80                if (err == ENOENT) {
    81                    JNU_ThrowByName(env,
"java/io/FileNotFoundException", NULL);
    82                } else {
    83                    char* msg = strdup(strerror(err));
    84                    JNU_ThrowIOException(env, msg);
    85                    if (msg != NULL) {
    86                        free(msg);
    87                    }
    88                }
    89            }
    90            return fd;
    91        }
    92    }
...
    99    JNIEXPORT void JNICALL
Java_sun_tools_attach_VirtualMachineImpl_checkPermissions
   100      (JNIEnv *env, jclass cls, jstring path)
   101    {
   102        jboolean isCopy;
   103        const char* p = GetStringPlatformChars(env, path,
&isCopy);
   104        if (p != NULL) {
   105            struct stat64 sb;
   106            uid_t uid, gid;
   107            int res;
   108
   109            memset(&sb, 0, sizeof(struct stat64));
   110
   111            /*
   112             * Check that the path is owned by the
effective uid/gid of this
   113             * process. Also check that group/other access
is not allowed.
   114             */
   115            uid = geteuid();
   116            gid = getegid();
   117
   118            res = stat64(p, &sb);
   119            if (res != 0) {
   120                /* save errno */
   121                res = errno;
   122            }
   123
   124            if (res == 0) {
   125                char msg[100];
   126                jboolean isError = JNI_FALSE;
   127                if (sb.st_uid != uid && uid != ROOT_UID) {
   128                    snprintf(msg, sizeof(msg),
   129                        "file should be owned by the
current user (which is %d) but is owned by %d", uid, sb.st_uid);
   130                    isError = JNI_TRUE;
   131                } else if (sb.st_gid != gid && uid !=
ROOT_UID) {
   132                    snprintf(msg, sizeof(msg),
   133                        "file's group should be the
current group (which is %d) but the group is %d", gid, sb.st_gid);
   134                    isError = JNI_TRUE;
   135                } else if ((sb.st_mode &
(S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH)) != 0) {
   136                    snprintf(msg, sizeof(msg),
   137                        "file should only be readable and
writable by the owner but has 0%03o access", sb.st_mode & 0777);
   138                    isError = JNI_TRUE;
   139                }
   140                if (isError) {
   141                    char buf[256];
   142                    snprintf(buf, sizeof(buf), "well-known
file %s is not secure: %s", p, msg);
   143                    JNU_ThrowIOException(env, buf);
   144                }
   145            } else {
   146                char* msg = strdup(strerror(res));
   147                JNU_ThrowIOException(env, msg);
   148                if (msg != NULL) {
   149                    free(msg);
   150                }
   151            }
Post by David Holmes
Minor correction: EPERM -> EACCES for Solaris
Hard to see how to get a transient EACCES when opening a file
... though as it is really a door I guess there could be
additional complexity.
David
Post by Chris Plummer
Post by David Holmes
Chris,
Post by Chris Plummer
Post by g***@oracle.com
The general attach sequence ...
src/jdk.attach/solaris/classes/sun/tools/attach/VirtualMachineImpl.java
 the attacher creates an attach_pid file in a directory
where the attachee is runnning
 issues a signal to the attacheee
  loops waiting for the java_pid file to be created
  default timeout is 10 seconds
So getting a FileNotFoundException while in this loop is OK,
but IOException is not.
Post by g***@oracle.com
src/hotspot/os/solaris/attachListener_solaris.cpp
   attachee creates the java_pid file
   listens til the attacher opens the door
I'm don't think this is related, but JDK-8199811 made a fix
in attachListener_solaris.cpp to make it wait up to 10
seconds for initialization to complete before failing the
enqueue.
Post by g***@oracle.com
...
Not sure when a bare IOException is thrown rather than the
more specific FileNotFoundException.
Where is the IOException originating from? I wonder if the
issue is that the file is in the process of being created,
but is not fully created yet. Maybe it is there, but
owner/group/permissions have not been set yet, and this
results in an IOException instead of FileNotFoundException.
 [java.io.IOException: Permission denied
at jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native Method)
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.openDoor(VirtualMachineImpl.java:215)
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:71)
at
jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
at
jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
at
jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:114)
at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:98)
And if you look at the native code the EPERM from open will
cause IOException to be thrown.
./jdk.attach/solaris/native/libattach/VirtualMachineImpl.c
JNIEXPORT jint JNICALL
Java_sun_tools_attach_VirtualMachineImpl_open
  (JNIEnv *env, jclass cls, jstring path)
{
    jboolean isCopy;
    const char* p = GetStringPlatformChars(env, path, &isCopy);
    if (p == NULL) {
        return 0;
    } else {
        int fd;
        int err = 0;
        fd = open(p, O_RDWR);
        if (fd == -1) {
            err = errno;
        }
        if (isCopy) {
            JNU_ReleaseStringPlatformChars(env, path, p);
        }
        if (fd == -1) {
            if (err == ENOENT) {
                JNU_ThrowByName(env,
"java/io/FileNotFoundException", NULL);
            } else {
                char* msg = strdup(strerror(err));
                JNU_ThrowIOException(env, msg);
                if (msg != NULL) {
                    free(msg);
                }
We should add the path to the exception message.
Thanks David. So if EPERM is the error and a retry 100ms later
works, I think that supports my hypothesis that the file is
not quite fully created. So Gary's fix is probably fine. The
only other possible fix I can think of that wouldn't require
an explicit delay (or multiple retries) is probably not worth
the complexity. It would require that the attachee create two
files, and the attacher try to open the second file first.
When it either opens or returns EPERM, you know the first file
can safety be opened.
Chris
Post by David Holmes
David
-----
Post by Chris Plummer
Chris
Post by g***@oracle.com
Post by Chris Plummer
Can you summarize how the attach handshaking is suppose to
work? I'm just wondering why the attacher would ever be
looking for the file before the attachee has created it.
It seems a proper handshake would prevent this. Maybe
there's some sort of visibility issue where the attachee
has indeed created the file, but it is not immediately
visible to the attacher process.
Chris
Post by g***@oracle.com
The problem reproduced pretty quickly.
I added a call to checkPermission and revealed the
"file not found" from the stat call when the IOException
was detected.
There has been some flakiness from the Solaris test
machines today,
so I'll continue with the testing a bit longer.
Post by Chris Plummer
Without the fix was this issue easy enough to reproduce
that you can be sure this is resolving it?
Chris
Post by Gary Adams
Solaris debug builds are failing tests that use the
attach interface.
An IOException is reported when the java_pid file is
not opened.
It appears that the attempt to attach is taking place
too quickly.
This workaround will allow the open operation to be retried
after a short pause.
http://cr.openjdk.java.net/~gadams/8210337/webrev/
  Issue: https://bugs.openjdk.java.net/browse/JDK-8210337
Testing is in progress.
Chris Plummer
2018-11-08 17:42:06 UTC
Permalink
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Ok. Any idea what might have led to
that PID to be created by another user? Isn't all our testing done
using the same userid? Also, why is this only a solaris problem?<br>
<br>
Chris<br>
<br>
On 11/7/18 7:43 PM, <a class="moz-txt-link-abbreviated" href="mailto:***@oracle.com">***@oracle.com</a> wrote:<br>
</div>
<blockquote type="cite"
cite="mid:cc1cdb3c-6730-363d-c9b0-***@oracle.com">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<div class="moz-cite-prefix">There is recovery code that cleans up
old attach files,<br>
but a permission denied error would prevent the clean <br>
up operation from taking place. We're looking at a case of <br>
a file owned by another user as well as a pid recycling<br>
and failed clean up situation.<br>
<br>
On 11/7/18 3:09 PM, Chris Plummer wrote:<br>
</div>
<blockquote type="cite"
cite="mid:4822c63c-3078-94f4-a754-***@oracle.com">
<meta http-equiv="Content-Type" content="text/html;
charset=utf-8">
<div class="moz-cite-prefix">Are you saying that the PID was
recycled, so the issue is a test run from long ago leaving
behind the file? If so, I thought there was something in the
attach code that cleaned up these PID files that were left
behind. In general I would not count this as an infra issue.
PID files left behind are the fault of the JVM. Maybe there is
something wrong with the cleanup code on solaris, or maybe we
don't clean up on any platform, but cycle through PIDs a lot
faster on solaris.<br>
<br>
Chris<br>
<br>
On 11/7/18 11:20 AM, Gary Adams wrote:<br>
</div>
<blockquote type="cite" cite="mid:***@oracle.com">
<meta content="text/html; charset=utf-8"
http-equiv="Content-Type">
If there are no further suggestions on JDK-8210337,<br>
I plan to close it out as cannot reproduce.<br>
<br>
Similar bugs had been filed for the "Permission denied" error<br>
from the openDoor request failure and each was attributed <br>
to an infrastructure issue. e.g. another user with the same <br>
pid left a temporary file that is blocking the current test <br>
from attaching correctly.<br>
<br>
On 10/4/18, 1:49 PM, Gary Adams wrote:
<blockquote cite="mid:***@oracle.com" type="cite">
<meta content="text/html; charset=utf-8"
http-equiv="Content-Type">
My delay and retry did not fix the problem with permission
denied.<br>
<br>
When I was diagnosing the problem I instrumented the code <br>
to catch an IOException and call checkPermission to get <br>
more detail about the IOException. The error reported<br>
from calling checkPermission was ENOENT (stat).<br>
<br>
The code change I then proposed was catch the IOException,<br>
delay, and retry the open. That fixed the problem of <br>
ENOENT, but had nothing to do with "permission denied".<br>
<br>
On 10/4/18, 1:25 PM, Chris Plummer wrote:
<blockquote
cite="mid:339eed9a-21e8-2a19-7a38-***@oracle.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=utf-8">
<div class="moz-cite-prefix">But I also thought you said
the delay and retry fixed the problem. How could fix the
problem if it is just duplicating something that is
already in place?<br>
<br>
Chris<br>
<br>
On 10/4/18 9:48 AM, Gary Adams wrote:<br>
</div>
<blockquote type="cite"
cite="mid:***@oracle.com">
<meta content="text/html; charset=utf-8"
http-equiv="Content-Type">
My delay and retry just duplicated the openDoor retry.<br>
The normal processing of FileNotFoundException(ENOENT)
is to retry<br>
several times until the file is available.<br>
<br>
But the original problem reported is a "Permission
denied" (EACCESS or EPERM).<br>
Delay and retry will not resolve a permissions error.<br>
<br>
On 10/4/18, 12:30 PM, Chris Plummer wrote:
<blockquote
cite="mid:7115147a-2aae-d168-2db5-***@oracle.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=utf-8">
<div class="moz-cite-prefix">Didn't the retry after
100ms delay work? If yes, why would it if the
problem is that a java_pid was not cleaned up?<br>
<br>
Chris<br>
<br>
On 10/4/18 8:54 AM, Gary Adams wrote:<br>
</div>
<blockquote type="cite"
cite="mid:***@oracle.com">
<meta content="text/html; charset=utf-8"
http-equiv="Content-Type">
First, let me retract the proposed change,<br>
it is not the right solution to the problem
originally<br>
reported.<br>
<br>
Second, as a bit of explanation consider the code
fragments below.<br>
<br>
The high level processing calls openDoor which is
willing to retry <br>
the operation as long as the error is flagged
specifically<br>
as a FileNotFoundException.<br>
<br>
  VirtualMachineImpl.java:72<br>
  VirtualMachineImpl.c:81<br>
<br>
During my testing I had added a check
VirtualMachineImpl.java:214<br>
and when an IOException was detected made a call to
checkPermissions<br>
to get more detailed information about the
IOException. The error <br>
I saw was an ENOENT from the stat call. And not the
detailed checks for<br>
specific permissions issues
(VirtualMachineImpl.c:143)<br>
<br>
    VirtualMachineImpl.c:118<br>
    VirtualMachineImpl.c:147<br>
<br>
What I missed in the original proposed solution was
a FileNotFoundException<br>
extends IOException. That means my delay and retry
just duplicates the higher<br>
level retry around the openDoor call.<br>
<br>
Third, the original error message logged in the bug
report :<br>
<br>
<span style="caret-color: rgb(51, 51, 51); color:
rgb(51, 51, 51); font-family: Arial, sans-serif;
font-size: 14.000000953674316px; font-style:
normal; font-variant-caps: normal; font-weight:
normal; letter-spacing: normal; orphans: auto;
text-align: start; text-indent: 0px;
text-transform: none; white-space: normal; widows:
auto; word-spacing: 0px; -webkit-text-size-adjust:
auto; -webkit-text-stroke-width: 0px;
background-color: rgb(255, 255, 255);
text-decoration: none; display: inline !important;
float: none;">java.io.IOException: Permission
denied<span class="Apple-converted-space"> </span></span><br
style="caret-color: rgb(51, 51, 51); color:
rgb(51, 51, 51); font-family: Arial, sans-serif;
font-size: 14.000000953674316px; font-style:
normal; font-variant-caps: normal; font-weight:
normal; letter-spacing: normal; orphans: auto;
text-align: start; text-indent: 0px;
text-transform: none; white-space: normal; widows:
auto; word-spacing: 0px; -webkit-text-size-adjust:
auto; -webkit-text-stroke-width: 0px;
text-decoration: none;">
<span style="caret-color: rgb(51, 51, 51); color:
rgb(51, 51, 51); font-family: Arial, sans-serif;
font-size: 14.000000953674316px; font-style:
normal; font-variant-caps: normal; font-weight:
normal; letter-spacing: normal; orphans: auto;
text-align: start; text-indent: 0px;
text-transform: none; white-space: normal; widows:
auto; word-spacing: 0px; -webkit-text-size-adjust:
auto; -webkit-text-stroke-width: 0px;
background-color: rgb(255, 255, 255);
text-decoration: none; display: inline !important;
float: none;">at
jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native
Method)<span class="Apple-converted-space"> </span></span><br>
<br>
had to have come from<br>
<br>
  VirtualMachineImpl.c:70<br>
  VirtualMachineImpl.c:84<br>
<br>
which means the actual open call reported the file
does exist<br>
but the permissions do not allow the file to be
accessed.<br>
That also means the normal mechanism of removing
leftover <br>
java_pid files would not have cleaned up another
user's<br>
java_pid files.<br>
<br>
=====<br>
src/jdk.attach/solaris/classes/sun/tools/attach/VirtualMachineImpl.java:<br>
...<br>
    67            // Opens the door file to the
target VM. If the file is not<br>
    68            // found it might mean that the
attach mechanism isn't started in the<br>
    69            // target VM so we attempt to
start it and retry.<br>
    70            try {<br>
    71                fd = openDoor(pid);<br>
    72            } catch (FileNotFoundException
fnf1) {<br>
    73                File f =
createAttachFile(pid);<br>
    74                try {<br>
    75                    sigquit(pid);<br>
    76    <br>
    77                    // give the target VM time
to start the attach mechanism<br>
    78                    final int delay_step =
100;<br>
    79                    final long timeout =
attachTimeout();<br>
    80                    long time_spend = 0;<br>
    81                    long delay = 0;<br>
    82                    do {<br>
    83                        // Increase timeout on
each attempt to reduce polling<br>
    84                        delay += delay_step;<br>
    85                        try {<br>
    86                           
Thread.sleep(delay);<br>
    87                        } catch
(InterruptedException x) { }<br>
    88                        try {<br>
    89                            fd =
openDoor(pid);<br>
    90                        } catch
(FileNotFoundException fnf2) {<br>
    91                            // pass<br>
    92                        }<br>
    93    <br>
    94                        time_spend += delay;<br>
    95                        if (time_spend &gt;
timeout/2 &amp;&amp; fd == -1) {<br>
    96                            // Send QUIT again
to give target VM the last chance to react<br>
    97                            sigquit(pid);<br>
    98                        }<br>
    99                    } while (time_spend &lt;=
timeout &amp;&amp; fd == -1);<br>
   100                    if (fd  == -1) {<br>
   101                        throw new
AttachNotSupportedException(<br>
   102                           
String.format("Unable to open door %s: " +<br>
   103                              "target process
%d doesn't respond within %dms " +<br>
   104                              "or HotSpot VM
not loaded", socket_path, pid, time_spend));<br>
   105                    }<br>
...<br>
   212        // The door is attached to
.java_pid&lt;pid&gt; in the temporary directory.<br>
   213        private int openDoor(int pid) throws
IOException {<br>
   214            socket_path = tmpdir +
"/.java_pid" + pid;<br>
   215            fd = open(socket_path);<br>
   216    <br>
   217            // Check that the file
owner/permission to avoid attaching to<br>
   218            // bogus process<br>
   219            try {<br>
   220                checkPermissions(socket_path);<br>
   221            } catch (IOException ioe) {<br>
   222                close(fd);<br>
   223                throw ioe;<br>
   224            }<br>
   225            return fd;<br>
   226        }<br>
<br>
=====<br>
src/jdk.attach/solaris/native/libattach/VirtualMachineImpl.c:<br>
...<br>
    59    JNIEXPORT jint JNICALL
Java_sun_tools_attach_VirtualMachineImpl_open<br>
    60      (JNIEnv *env, jclass cls, jstring path)<br>
    61    {<br>
    62        jboolean isCopy;<br>
    63        const char* p =
GetStringPlatformChars(env, path, &amp;isCopy);<br>
    64        if (p == NULL) {<br>
    65            return 0;<br>
    66        } else {<br>
    67            int fd;<br>
    68            int err = 0;<br>
    69    <br>
    70            fd = open(p, O_RDWR);<br>
    71            if (fd == -1) {<br>
    72                err = errno;<br>
    73            }<br>
    74    <br>
    75            if (isCopy) {<br>
    76               
JNU_ReleaseStringPlatformChars(env, path, p);<br>
    77            }<br>
    78    <br>
    79            if (fd == -1) {<br>
    80                if (err == ENOENT) {<br>
    81                    JNU_ThrowByName(env,
"java/io/FileNotFoundException", NULL);<br>
    82                } else {<br>
    83                    char* msg =
strdup(strerror(err));<br>
    84                    JNU_ThrowIOException(env,
msg);<br>
    85                    if (msg != NULL) {<br>
    86                        free(msg);<br>
    87                    }<br>
    88                }<br>
    89            }<br>
    90            return fd;<br>
    91        }<br>
    92    }<br>
...<br>
    99    JNIEXPORT void JNICALL
Java_sun_tools_attach_VirtualMachineImpl_checkPermissions<br>
   100      (JNIEnv *env, jclass cls, jstring path)<br>
   101    {<br>
   102        jboolean isCopy;<br>
   103        const char* p =
GetStringPlatformChars(env, path, &amp;isCopy);<br>
   104        if (p != NULL) {<br>
   105            struct stat64 sb;<br>
   106            uid_t uid, gid;<br>
   107            int res;<br>
   108    <br>
   109            memset(&amp;sb, 0, sizeof(struct
stat64));<br>
   110    <br>
   111            /*<br>
   112             * Check that the path is owned by
the effective uid/gid of this<br>
   113             * process. Also check that
group/other access is not allowed.<br>
   114             */<br>
   115            uid = geteuid();<br>
   116            gid = getegid();<br>
   117    <br>
   118            res = stat64(p, &amp;sb);<br>
   119            if (res != 0) {<br>
   120                /* save errno */<br>
   121                res = errno;<br>
   122            }<br>
   123    <br>
   124            if (res == 0) {<br>
   125                char msg[100];<br>
   126                jboolean isError = JNI_FALSE;<br>
   127                if (sb.st_uid != uid
&amp;&amp; uid != ROOT_UID) {<br>
   128                    snprintf(msg, sizeof(msg),<br>
   129                        "file should be owned
by the current user (which is %d) but is owned by
%d", uid, sb.st_uid);<br>
   130                    isError = JNI_TRUE;<br>
   131                } else if (sb.st_gid != gid
&amp;&amp; uid != ROOT_UID) {<br>
   132                    snprintf(msg, sizeof(msg),<br>
   133                        "file's group should
be the current group (which is %d) but the group is
%d", gid, sb.st_gid);<br>
   134                    isError = JNI_TRUE;<br>
   135                } else if ((sb.st_mode &amp;
(S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH)) != 0) {<br>
   136                    snprintf(msg, sizeof(msg),<br>
   137                        "file should only be
readable and writable by the owner but has 0%03o
access", sb.st_mode &amp; 0777);<br>
   138                    isError = JNI_TRUE;<br>
   139                }<br>
   140                if (isError) {<br>
   141                    char buf[256];<br>
   142                    snprintf(buf, sizeof(buf),
"well-known file %s is not secure: %s", p, msg);<br>
   143                    JNU_ThrowIOException(env,
buf);<br>
   144                }<br>
   145            } else {<br>
   146                char* msg =
strdup(strerror(res));<br>
   147                JNU_ThrowIOException(env,
msg);<br>
   148                if (msg != NULL) {<br>
   149                    free(msg);<br>
   150                }<br>
   151            }<br>
<br>
On 10/2/18, 6:23 PM, David Holmes wrote:
<blockquote
cite="mid:e1adb28a-031f-5c28-74a5-***@oracle.com"
type="cite">Minor correction: EPERM -&gt; EACCES
for Solaris <br>
<br>
Hard to see how to get a transient EACCES when
opening a file ... though as it is really a door I
guess there could be additional complexity. <br>
<br>
David <br>
<br>
On 3/10/2018 7:54 AM, Chris Plummer wrote: <br>
<blockquote type="cite">On 10/2/18 2:38 PM, David
Holmes wrote: <br>
<blockquote type="cite">Chris, <br>
<br>
On 3/10/2018 6:57 AM, Chris Plummer wrote: <br>
<blockquote type="cite"> <br>
<br>
On 10/2/18 1:44 PM, <a
class="moz-txt-link-abbreviated"
href="mailto:***@oracle.com"
moz-do-not-send="true">***@oracle.com</a>
wrote: <br>
<blockquote type="cite">The general attach
sequence ... <br>
<br>
src/jdk.attach/solaris/classes/sun/tools/attach/VirtualMachineImpl.java
<br>
<br>
 the attacher creates an attach_pid file
in a directory where the attachee is
runnning <br>
 issues a signal to the attacheee <br>
<br>
  loops waiting for the java_pid file to
be created <br>
  default timeout is 10 seconds <br>
<br>
</blockquote>
So getting a FileNotFoundException while in
this loop is OK, but IOException is not. <br>
<br>
<blockquote type="cite">src/hotspot/os/solaris/attachListener_solaris.cpp
<br>
<br>
   attachee creates the java_pid file <br>
   listens til the attacher opens the door
<br>
<br>
</blockquote>
I'm don't think this is related, but
JDK-8199811 made a fix in
attachListener_solaris.cpp to make it wait
up to 10 seconds for initialization to
complete before failing the enqueue. <br>
<br>
<blockquote type="cite">... <br>
Not sure when a bare IOException is thrown
rather than the <br>
more specific FileNotFoundException. <br>
</blockquote>
Where is the IOException originating from? I
wonder if the issue is that the file is in
the process of being created, but is not
fully created yet. Maybe it is there, but
owner/group/permissions have not been set
yet, and this results in an IOException
instead of FileNotFoundException. <br>
</blockquote>
<br>
The exception is shown in the bug report: <br>
<br>
 [java.io.IOException: Permission denied <br>
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native
Method) <br>
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.openDoor(VirtualMachineImpl.java:215)
<br>
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.&lt;init&gt;(VirtualMachineImpl.java:71)
<br>
at
jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
<br>
at
jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
<br>
at
jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:114)
<br>
at
jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:98)
<br>
<br>
And if you look at the native code the EPERM
from open will cause IOException to be thrown.
<br>
<br>
./jdk.attach/solaris/native/libattach/VirtualMachineImpl.c <br>
<br>
JNIEXPORT jint JNICALL
Java_sun_tools_attach_VirtualMachineImpl_open
<br>
  (JNIEnv *env, jclass cls, jstring path) <br>
{ <br>
    jboolean isCopy; <br>
    const char* p =
GetStringPlatformChars(env, path,
&amp;isCopy); <br>
    if (p == NULL) { <br>
        return 0; <br>
    } else { <br>
        int fd; <br>
        int err = 0; <br>
<br>
        fd = open(p, O_RDWR); <br>
        if (fd == -1) { <br>
            err = errno; <br>
        } <br>
<br>
        if (isCopy) { <br>
           
JNU_ReleaseStringPlatformChars(env, path, p);
<br>
        } <br>
<br>
        if (fd == -1) { <br>
            if (err == ENOENT) { <br>
                JNU_ThrowByName(env,
"java/io/FileNotFoundException", NULL); <br>
            } else { <br>
                char* msg =
strdup(strerror(err)); <br>
                JNU_ThrowIOException(env,
msg); <br>
                if (msg != NULL) { <br>
                    free(msg); <br>
                } <br>
<br>
<br>
We should add the path to the exception
message. <br>
<br>
</blockquote>
Thanks David. So if EPERM is the error and a
retry 100ms later works, I think that supports
my hypothesis that the file is not quite fully
created. So Gary's fix is probably fine. The
only other possible fix I can think of that
wouldn't require an explicit delay (or multiple
retries) is probably not worth the complexity.
It would require that the attachee create two
files, and the attacher try to open the second
file first. When it either opens or returns
EPERM, you know the first file can safety be
opened. <br>
<br>
Chris <br>
<blockquote type="cite">David <br>
----- <br>
<br>
<blockquote type="cite">Chris <br>
<blockquote type="cite"> <br>
<br>
<br>
On 10/2/18 4:11 PM, Chris Plummer wrote: <br>
<blockquote type="cite">Can you summarize
how the attach handshaking is suppose to
work? I'm just wondering why the
attacher would ever be looking for the
file before the attachee has created it.
It seems a proper handshake would
prevent this. Maybe there's some sort of
visibility issue where the attachee has
indeed created the file, but it is not
immediately visible to the attacher
process. <br>
<br>
Chris <br>
<br>
On 10/2/18 12:27 PM, <a
class="moz-txt-link-abbreviated"
href="mailto:***@oracle.com"
moz-do-not-send="true">***@oracle.com</a>
wrote: <br>
<blockquote type="cite">The problem
reproduced pretty quickly. <br>
I added a call to checkPermission and
revealed the <br>
"file not found" from the stat call
when the IOException <br>
was detected. <br>
<br>
There has been some flakiness from the
Solaris test machines today, <br>
so I'll continue with the testing a
bit longer. <br>
<br>
On 10/2/18 3:12 PM, Chris Plummer
wrote: <br>
<blockquote type="cite">Without the
fix was this issue easy enough to
reproduce that you can be sure this
is resolving it? <br>
<br>
Chris <br>
<br>
On 10/2/18 8:16 AM, Gary Adams
wrote: <br>
<blockquote type="cite">Solaris
debug builds are failing tests
that use the attach interface. <br>
An IOException is reported when
the java_pid file is not opened. <br>
<br>
It appears that the attempt to
attach is taking place too
quickly. <br>
This workaround will allow the
open operation to be retried <br>
after a short pause. <br>
<br>
  Webrev: <a
class="moz-txt-link-freetext"
href="http://cr.openjdk.java.net/%7Egadams/8210337/webrev/"
moz-do-not-send="true">http://cr.openjdk.java.net/~gadams/8210337/webrev/</a>
<br>
  Issue: <a
class="moz-txt-link-freetext"
href="https://bugs.openjdk.java.net/browse/JDK-8210337"
moz-do-not-send="true">https://bugs.openjdk.java.net/browse/JDK-8210337</a>
<br>
<br>
Testing is in progress. <br>
</blockquote>
<br>
<br>
<br>
</blockquote>
<br>
</blockquote>
<br>
<br>
</blockquote>
<br>
</blockquote>
<br>
<br>
</blockquote>
</blockquote>
<br>
<br>
</blockquote>
</blockquote>
<br>
</blockquote>
<p><br>
</p>
</blockquote>
<br>
</blockquote>
<p><br>
</p>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<p><br>
</p>
</blockquote>
<p><br>
</p>
</blockquote>
<p><br>
</p>
</body>
</html>
Gary Adams
2018-11-08 18:14:36 UTC
Permalink
Different ipc mechanisms are used on each target platform
solaris - doors
linux - unix domain sockets
windows - named pipes

Solaris machines have a tendency to run for years without a need to reboot.

I have not examined the actual test machine where the original problem
was reported, but I was informed once about hundreds of leftover temp files
on another of the test machines.

The files could have accumulated over time from jprt or mach5 infrastructure
transitions. The collision on pid reuse alone would make this an extremely
rare event.

If the "Permission denied" issue appears again, we should get a dump of
the temp directory from the failed machine.
Ok. Any idea what might have led to that PID to be created by another
user? Isn't all our testing done using the same userid? Also, why is
this only a solaris problem?
Chris
Post by g***@oracle.com
There is recovery code that cleans up old attach files,
but a permission denied error would prevent the clean
up operation from taking place. We're looking at a case of
a file owned by another user as well as a pid recycling
and failed clean up situation.
Are you saying that the PID was recycled, so the issue is a test run
from long ago leaving behind the file? If so, I thought there was
something in the attach code that cleaned up these PID files that
were left behind. In general I would not count this as an infra
issue. PID files left behind are the fault of the JVM. Maybe there
is something wrong with the cleanup code on solaris, or maybe we
don't clean up on any platform, but cycle through PIDs a lot faster
on solaris.
Chris
Post by Gary Adams
If there are no further suggestions on JDK-8210337,
I plan to close it out as cannot reproduce.
Similar bugs had been filed for the "Permission denied" error
from the openDoor request failure and each was attributed
to an infrastructure issue. e.g. another user with the same
pid left a temporary file that is blocking the current test
from attaching correctly.
Post by Gary Adams
My delay and retry did not fix the problem with permission denied.
When I was diagnosing the problem I instrumented the code
to catch an IOException and call checkPermission to get
more detail about the IOException. The error reported
from calling checkPermission was ENOENT (stat).
The code change I then proposed was catch the IOException,
delay, and retry the open. That fixed the problem of
ENOENT, but had nothing to do with "permission denied".
But I also thought you said the delay and retry fixed the
problem. How could fix the problem if it is just duplicating
something that is already in place?
Chris
Post by Gary Adams
My delay and retry just duplicated the openDoor retry.
The normal processing of FileNotFoundException(ENOENT) is to retry
several times until the file is available.
But the original problem reported is a "Permission denied"
(EACCESS or EPERM).
Delay and retry will not resolve a permissions error.
Didn't the retry after 100ms delay work? If yes, why would it
if the problem is that a java_pid was not cleaned up?
Chris
Post by Gary Adams
First, let me retract the proposed change,
it is not the right solution to the problem originally
reported.
Second, as a bit of explanation consider the code fragments below.
The high level processing calls openDoor which is willing to retry
the operation as long as the error is flagged specifically
as a FileNotFoundException.
VirtualMachineImpl.java:72
VirtualMachineImpl.c:81
During my testing I had added a check VirtualMachineImpl.java:214
and when an IOException was detected made a call to
checkPermissions
to get more detailed information about the IOException. The error
I saw was an ENOENT from the stat call. And not the detailed checks for
specific permissions issues (VirtualMachineImpl.c:143)
VirtualMachineImpl.c:118
VirtualMachineImpl.c:147
What I missed in the original proposed solution was a
FileNotFoundException
extends IOException. That means my delay and retry just
duplicates the higher
level retry around the openDoor call.
java.io.IOException: Permission denied
at jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native Method)
had to have come from
VirtualMachineImpl.c:70
VirtualMachineImpl.c:84
which means the actual open call reported the file does exist
but the permissions do not allow the file to be accessed.
That also means the normal mechanism of removing leftover
java_pid files would not have cleaned up another user's
java_pid files.
=====
...
67 // Opens the door file to the target VM. If
the file is not
68 // found it might mean that the attach
mechanism isn't started in the
69 // target VM so we attempt to start it and retry.
70 try {
71 fd = openDoor(pid);
72 } catch (FileNotFoundException fnf1) {
73 File f = createAttachFile(pid);
74 try {
75 sigquit(pid);
76
77 // give the target VM time to start
the attach mechanism
78 final int delay_step = 100;
79 final long timeout = attachTimeout();
80 long time_spend = 0;
81 long delay = 0;
82 do {
83 // Increase timeout on each
attempt to reduce polling
84 delay += delay_step;
85 try {
86 Thread.sleep(delay);
87 } catch (InterruptedException x) { }
88 try {
89 fd = openDoor(pid);
90 } catch (FileNotFoundException fnf2) {
91 // pass
92 }
93
94 time_spend += delay;
95 if (time_spend > timeout/2 && fd == -1) {
96 // Send QUIT again to give
target VM the last chance to react
97 sigquit(pid);
98 }
99 } while (time_spend <= timeout && fd == -1);
100 if (fd == -1) {
101 throw new
AttachNotSupportedException(
102 String.format("Unable to
open door %s: " +
103 "target process %d doesn't
respond within %dms " +
104 "or HotSpot VM not
loaded", socket_path, pid, time_spend));
105 }
...
212 // The door is attached to .java_pid<pid> in the
temporary directory.
213 private int openDoor(int pid) throws IOException {
214 socket_path = tmpdir + "/.java_pid" + pid;
215 fd = open(socket_path);
216
217 // Check that the file owner/permission to
avoid attaching to
218 // bogus process
219 try {
220 checkPermissions(socket_path);
221 } catch (IOException ioe) {
222 close(fd);
223 throw ioe;
224 }
225 return fd;
226 }
=====
...
59 JNIEXPORT jint JNICALL
Java_sun_tools_attach_VirtualMachineImpl_open
60 (JNIEnv *env, jclass cls, jstring path)
61 {
62 jboolean isCopy;
63 const char* p = GetStringPlatformChars(env, path, &isCopy);
64 if (p == NULL) {
65 return 0;
66 } else {
67 int fd;
68 int err = 0;
69
70 fd = open(p, O_RDWR);
71 if (fd == -1) {
72 err = errno;
73 }
74
75 if (isCopy) {
76 JNU_ReleaseStringPlatformChars(env, path, p);
77 }
78
79 if (fd == -1) {
80 if (err == ENOENT) {
81 JNU_ThrowByName(env,
"java/io/FileNotFoundException", NULL);
82 } else {
83 char* msg = strdup(strerror(err));
84 JNU_ThrowIOException(env, msg);
85 if (msg != NULL) {
86 free(msg);
87 }
88 }
89 }
90 return fd;
91 }
92 }
...
99 JNIEXPORT void JNICALL
Java_sun_tools_attach_VirtualMachineImpl_checkPermissions
100 (JNIEnv *env, jclass cls, jstring path)
101 {
102 jboolean isCopy;
103 const char* p = GetStringPlatformChars(env, path, &isCopy);
104 if (p != NULL) {
105 struct stat64 sb;
106 uid_t uid, gid;
107 int res;
108
109 memset(&sb, 0, sizeof(struct stat64));
110
111 /*
112 * Check that the path is owned by the
effective uid/gid of this
113 * process. Also check that group/other
access is not allowed.
114 */
115 uid = geteuid();
116 gid = getegid();
117
118 res = stat64(p, &sb);
119 if (res != 0) {
120 /* save errno */
121 res = errno;
122 }
123
124 if (res == 0) {
125 char msg[100];
126 jboolean isError = JNI_FALSE;
127 if (sb.st_uid != uid && uid != ROOT_UID) {
128 snprintf(msg, sizeof(msg),
129 "file should be owned by the
current user (which is %d) but is owned by %d", uid, sb.st_uid);
130 isError = JNI_TRUE;
131 } else if (sb.st_gid != gid && uid != ROOT_UID) {
132 snprintf(msg, sizeof(msg),
133 "file's group should be the
current group (which is %d) but the group is %d", gid, sb.st_gid);
134 isError = JNI_TRUE;
135 } else if ((sb.st_mode &
(S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH)) != 0) {
136 snprintf(msg, sizeof(msg),
137 "file should only be readable
and writable by the owner but has 0%03o access", sb.st_mode &
0777);
138 isError = JNI_TRUE;
139 }
140 if (isError) {
141 char buf[256];
142 snprintf(buf, sizeof(buf),
"well-known file %s is not secure: %s", p, msg);
143 JNU_ThrowIOException(env, buf);
144 }
145 } else {
146 char* msg = strdup(strerror(res));
147 JNU_ThrowIOException(env, msg);
148 if (msg != NULL) {
149 free(msg);
150 }
151 }
Post by David Holmes
Minor correction: EPERM -> EACCES for Solaris
Hard to see how to get a transient EACCES when opening a file
... though as it is really a door I guess there could be
additional complexity.
David
Post by Chris Plummer
Post by David Holmes
Chris,
Post by Chris Plummer
Post by g***@oracle.com
The general attach sequence ...
src/jdk.attach/solaris/classes/sun/tools/attach/VirtualMachineImpl.java
the attacher creates an attach_pid file in a directory
where the attachee is runnning
issues a signal to the attacheee
loops waiting for the java_pid file to be created
default timeout is 10 seconds
So getting a FileNotFoundException while in this loop is
OK, but IOException is not.
Post by g***@oracle.com
src/hotspot/os/solaris/attachListener_solaris.cpp
attachee creates the java_pid file
listens til the attacher opens the door
I'm don't think this is related, but JDK-8199811 made a
fix in attachListener_solaris.cpp to make it wait up to 10
seconds for initialization to complete before failing the
enqueue.
Post by g***@oracle.com
...
Not sure when a bare IOException is thrown rather than the
more specific FileNotFoundException.
Where is the IOException originating from? I wonder if the
issue is that the file is in the process of being created,
but is not fully created yet. Maybe it is there, but
owner/group/permissions have not been set yet, and this
results in an IOException instead of FileNotFoundException.
[java.io.IOException: Permission denied
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native
Method)
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.openDoor(VirtualMachineImpl.java:215)
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:71)
at
jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
at
jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
at
jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:114)
at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:98)
And if you look at the native code the EPERM from open will
cause IOException to be thrown.
./jdk.attach/solaris/native/libattach/VirtualMachineImpl.c
JNIEXPORT jint JNICALL
Java_sun_tools_attach_VirtualMachineImpl_open
(JNIEnv *env, jclass cls, jstring path)
{
jboolean isCopy;
const char* p = GetStringPlatformChars(env, path, &isCopy);
if (p == NULL) {
return 0;
} else {
int fd;
int err = 0;
fd = open(p, O_RDWR);
if (fd == -1) {
err = errno;
}
if (isCopy) {
JNU_ReleaseStringPlatformChars(env, path, p);
}
if (fd == -1) {
if (err == ENOENT) {
JNU_ThrowByName(env,
"java/io/FileNotFoundException", NULL);
} else {
char* msg = strdup(strerror(err));
JNU_ThrowIOException(env, msg);
if (msg != NULL) {
free(msg);
}
We should add the path to the exception message.
Thanks David. So if EPERM is the error and a retry 100ms
later works, I think that supports my hypothesis that the
file is not quite fully created. So Gary's fix is probably
fine. The only other possible fix I can think of that
wouldn't require an explicit delay (or multiple retries) is
probably not worth the complexity. It would require that the
attachee create two files, and the attacher try to open the
second file first. When it either opens or returns EPERM,
you know the first file can safety be opened.
Chris
Post by David Holmes
David
-----
Post by Chris Plummer
Chris
Post by g***@oracle.com
Post by Chris Plummer
Can you summarize how the attach handshaking is suppose
to work? I'm just wondering why the attacher would ever
be looking for the file before the attachee has created
it. It seems a proper handshake would prevent this.
Maybe there's some sort of visibility issue where the
attachee has indeed created the file, but it is not
immediately visible to the attacher process.
Chris
Post by g***@oracle.com
The problem reproduced pretty quickly.
I added a call to checkPermission and revealed the
"file not found" from the stat call when the IOException
was detected.
There has been some flakiness from the Solaris test
machines today,
so I'll continue with the testing a bit longer.
Post by Chris Plummer
Without the fix was this issue easy enough to
reproduce that you can be sure this is resolving it?
Chris
Post by Gary Adams
Solaris debug builds are failing tests that use the
attach interface.
An IOException is reported when the java_pid file is
not opened.
It appears that the attempt to attach is taking place
too quickly.
This workaround will allow the open operation to be
retried
after a short pause.
http://cr.openjdk.java.net/~gadams/8210337/webrev/
https://bugs.openjdk.java.net/browse/JDK-8210337
Testing is in progress.
Chris Plummer
2018-11-08 19:28:48 UTC
Permalink
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Ok. You can close it as CNR then.<br>
<br>
thanks,<br>
<br>
Chris<br>
<br>
On 11/8/18 10:14 AM, Gary Adams wrote:<br>
</div>
<blockquote type="cite" cite="mid:***@oracle.com">
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
Different ipc mechanisms are used on each target platform<br>
  solaris - doors<br>
  linux - unix domain sockets<br>
  windows - named pipes<br>
<br>
Solaris machines have a tendency to run for years without a need
to reboot.<br>
<br>
I have not examined the actual test machine where the original
problem<br>
was reported, but I was informed once about hundreds of leftover
temp files <br>
on another of the test machines.<br>
<br>
The files could have accumulated over time from jprt or mach5
infrastructure<br>
transitions. The collision on pid reuse alone would make this an
extremely <br>
rare event.<br>
<br>
If the "Permission denied" issue appears again, we should get a
dump of<br>
the temp directory from the failed machine.<br>
<br>
On 11/8/18, 12:42 PM, Chris Plummer wrote:
<blockquote
cite="mid:27411ef2-08e7-752f-cdbd-***@oracle.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=utf-8">
<div class="moz-cite-prefix">Ok. Any idea what might have led to
that PID to be created by another user? Isn't all our testing
done using the same userid? Also, why is this only a solaris
problem?<br>
<br>
Chris<br>
<br>
On 11/7/18 7:43 PM, <a moz-do-not-send="true"
class="moz-txt-link-abbreviated"
href="mailto:***@oracle.com">***@oracle.com</a>
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:cc1cdb3c-6730-363d-c9b0-***@oracle.com">
<meta http-equiv="Content-Type" content="text/html;
charset=utf-8">
<div class="moz-cite-prefix">There is recovery code that
cleans up old attach files,<br>
but a permission denied error would prevent the clean <br>
up operation from taking place. We're looking at a case of <br>
a file owned by another user as well as a pid recycling<br>
and failed clean up situation.<br>
<br>
On 11/7/18 3:09 PM, Chris Plummer wrote:<br>
</div>
<blockquote type="cite"
cite="mid:4822c63c-3078-94f4-a754-***@oracle.com">
<meta http-equiv="Content-Type" content="text/html;
charset=utf-8">
<div class="moz-cite-prefix">Are you saying that the PID was
recycled, so the issue is a test run from long ago leaving
behind the file? If so, I thought there was something in
the attach code that cleaned up these PID files that were
left behind. In general I would not count this as an infra
issue. PID files left behind are the fault of the JVM.
Maybe there is something wrong with the cleanup code on
solaris, or maybe we don't clean up on any platform, but
cycle through PIDs a lot faster on solaris.<br>
<br>
Chris<br>
<br>
On 11/7/18 11:20 AM, Gary Adams wrote:<br>
</div>
<blockquote type="cite"
cite="mid:***@oracle.com">
<meta content="text/html; charset=utf-8"
http-equiv="Content-Type">
If there are no further suggestions on JDK-8210337,<br>
I plan to close it out as cannot reproduce.<br>
<br>
Similar bugs had been filed for the "Permission denied"
error<br>
from the openDoor request failure and each was attributed
<br>
to an infrastructure issue. e.g. another user with the
same <br>
pid left a temporary file that is blocking the current
test <br>
from attaching correctly.<br>
<br>
On 10/4/18, 1:49 PM, Gary Adams wrote:
<blockquote cite="mid:***@oracle.com"
type="cite">
<meta content="text/html; charset=utf-8"
http-equiv="Content-Type">
My delay and retry did not fix the problem with
permission denied.<br>
<br>
When I was diagnosing the problem I instrumented the
code <br>
to catch an IOException and call checkPermission to get
<br>
more detail about the IOException. The error reported<br>
from calling checkPermission was ENOENT (stat).<br>
<br>
The code change I then proposed was catch the
IOException,<br>
delay, and retry the open. That fixed the problem of <br>
ENOENT, but had nothing to do with "permission denied".<br>
<br>
On 10/4/18, 1:25 PM, Chris Plummer wrote:
<blockquote
cite="mid:339eed9a-21e8-2a19-7a38-***@oracle.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=utf-8">
<div class="moz-cite-prefix">But I also thought you
said the delay and retry fixed the problem. How
could fix the problem if it is just duplicating
something that is already in place?<br>
<br>
Chris<br>
<br>
On 10/4/18 9:48 AM, Gary Adams wrote:<br>
</div>
<blockquote type="cite"
cite="mid:***@oracle.com">
<meta content="text/html; charset=utf-8"
http-equiv="Content-Type">
My delay and retry just duplicated the openDoor
retry.<br>
The normal processing of
FileNotFoundException(ENOENT) is to retry<br>
several times until the file is available.<br>
<br>
But the original problem reported is a "Permission
denied" (EACCESS or EPERM).<br>
Delay and retry will not resolve a permissions
error.<br>
<br>
On 10/4/18, 12:30 PM, Chris Plummer wrote:
<blockquote
cite="mid:7115147a-2aae-d168-2db5-***@oracle.com"
type="cite">
<meta http-equiv="Content-Type"
content="text/html; charset=utf-8">
<div class="moz-cite-prefix">Didn't the retry
after 100ms delay work? If yes, why would it if
the problem is that a java_pid was not cleaned
up?<br>
<br>
Chris<br>
<br>
On 10/4/18 8:54 AM, Gary Adams wrote:<br>
</div>
<blockquote type="cite"
cite="mid:***@oracle.com">
<meta content="text/html; charset=utf-8"
http-equiv="Content-Type">
First, let me retract the proposed change,<br>
it is not the right solution to the problem
originally<br>
reported.<br>
<br>
Second, as a bit of explanation consider the
code fragments below.<br>
<br>
The high level processing calls openDoor which
is willing to retry <br>
the operation as long as the error is flagged
specifically<br>
as a FileNotFoundException.<br>
<br>
  VirtualMachineImpl.java:72<br>
  VirtualMachineImpl.c:81<br>
<br>
During my testing I had added a check
VirtualMachineImpl.java:214<br>
and when an IOException was detected made a call
to checkPermissions<br>
to get more detailed information about the
IOException. The error <br>
I saw was an ENOENT from the stat call. And not
the detailed checks for<br>
specific permissions issues
(VirtualMachineImpl.c:143)<br>
<br>
    VirtualMachineImpl.c:118<br>
    VirtualMachineImpl.c:147<br>
<br>
What I missed in the original proposed solution
was a FileNotFoundException<br>
extends IOException. That means my delay and
retry just duplicates the higher<br>
level retry around the openDoor call.<br>
<br>
Third, the original error message logged in the
bug report :<br>
<br>
<span style="caret-color: rgb(51, 51, 51);
color: rgb(51, 51, 51); font-family: Arial,
sans-serif; font-size: 14.000000953674316px;
font-style: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal;
orphans: auto; text-align: start; text-indent:
0px; text-transform: none; white-space:
normal; widows: auto; word-spacing: 0px;
-webkit-text-size-adjust: auto;
-webkit-text-stroke-width: 0px;
background-color: rgb(255, 255, 255);
text-decoration: none; display: inline
!important; float: none;">java.io.IOException:
Permission denied<span
class="Apple-converted-space"> </span></span><br
style="caret-color: rgb(51, 51, 51); color:
rgb(51, 51, 51); font-family: Arial,
sans-serif; font-size: 14.000000953674316px;
font-style: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal;
orphans: auto; text-align: start; text-indent:
0px; text-transform: none; white-space:
normal; widows: auto; word-spacing: 0px;
-webkit-text-size-adjust: auto;
-webkit-text-stroke-width: 0px;
text-decoration: none;">
<span style="caret-color: rgb(51, 51, 51);
color: rgb(51, 51, 51); font-family: Arial,
sans-serif; font-size: 14.000000953674316px;
font-style: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal;
orphans: auto; text-align: start; text-indent:
0px; text-transform: none; white-space:
normal; widows: auto; word-spacing: 0px;
-webkit-text-size-adjust: auto;
-webkit-text-stroke-width: 0px;
background-color: rgb(255, 255, 255);
text-decoration: none; display: inline
!important; float: none;">at
jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native
Method)<span class="Apple-converted-space"> </span></span><br>
<br>
had to have come from<br>
<br>
  VirtualMachineImpl.c:70<br>
  VirtualMachineImpl.c:84<br>
<br>
which means the actual open call reported the
file does exist<br>
but the permissions do not allow the file to be
accessed.<br>
That also means the normal mechanism of removing
leftover <br>
java_pid files would not have cleaned up another
user's<br>
java_pid files.<br>
<br>
=====<br>
src/jdk.attach/solaris/classes/sun/tools/attach/VirtualMachineImpl.java:<br>
...<br>
    67            // Opens the door file to the
target VM. If the file is not<br>
    68            // found it might mean that
the attach mechanism isn't started in the<br>
    69            // target VM so we attempt to
start it and retry.<br>
    70            try {<br>
    71                fd = openDoor(pid);<br>
    72            } catch (FileNotFoundException
fnf1) {<br>
    73                File f =
createAttachFile(pid);<br>
    74                try {<br>
    75                    sigquit(pid);<br>
    76    <br>
    77                    // give the target VM
time to start the attach mechanism<br>
    78                    final int delay_step =
100;<br>
    79                    final long timeout =
attachTimeout();<br>
    80                    long time_spend = 0;<br>
    81                    long delay = 0;<br>
    82                    do {<br>
    83                        // Increase
timeout on each attempt to reduce polling<br>
    84                        delay +=
delay_step;<br>
    85                        try {<br>
    86                           
Thread.sleep(delay);<br>
    87                        } catch
(InterruptedException x) { }<br>
    88                        try {<br>
    89                            fd =
openDoor(pid);<br>
    90                        } catch
(FileNotFoundException fnf2) {<br>
    91                            // pass<br>
    92                        }<br>
    93    <br>
    94                        time_spend +=
delay;<br>
    95                        if (time_spend
&gt; timeout/2 &amp;&amp; fd == -1) {<br>
    96                            // Send QUIT
again to give target VM the last chance to react<br>
    97                            sigquit(pid);<br>
    98                        }<br>
    99                    } while (time_spend
&lt;= timeout &amp;&amp; fd == -1);<br>
   100                    if (fd  == -1) {<br>
   101                        throw new
AttachNotSupportedException(<br>
   102                           
String.format("Unable to open door %s: " +<br>
   103                              "target
process %d doesn't respond within %dms " +<br>
   104                              "or HotSpot
VM not loaded", socket_path, pid, time_spend));<br>
   105                    }<br>
...<br>
   212        // The door is attached to
.java_pid&lt;pid&gt; in the temporary directory.<br>
   213        private int openDoor(int pid)
throws IOException {<br>
   214            socket_path = tmpdir +
"/.java_pid" + pid;<br>
   215            fd = open(socket_path);<br>
   216    <br>
   217            // Check that the file
owner/permission to avoid attaching to<br>
   218            // bogus process<br>
   219            try {<br>
   220               
checkPermissions(socket_path);<br>
   221            } catch (IOException ioe) {<br>
   222                close(fd);<br>
   223                throw ioe;<br>
   224            }<br>
   225            return fd;<br>
   226        }<br>
<br>
=====<br>
src/jdk.attach/solaris/native/libattach/VirtualMachineImpl.c:<br>
...<br>
    59    JNIEXPORT jint JNICALL
Java_sun_tools_attach_VirtualMachineImpl_open<br>
    60      (JNIEnv *env, jclass cls, jstring
path)<br>
    61    {<br>
    62        jboolean isCopy;<br>
    63        const char* p =
GetStringPlatformChars(env, path, &amp;isCopy);<br>
    64        if (p == NULL) {<br>
    65            return 0;<br>
    66        } else {<br>
    67            int fd;<br>
    68            int err = 0;<br>
    69    <br>
    70            fd = open(p, O_RDWR);<br>
    71            if (fd == -1) {<br>
    72                err = errno;<br>
    73            }<br>
    74    <br>
    75            if (isCopy) {<br>
    76               
JNU_ReleaseStringPlatformChars(env, path, p);<br>
    77            }<br>
    78    <br>
    79            if (fd == -1) {<br>
    80                if (err == ENOENT) {<br>
    81                    JNU_ThrowByName(env,
"java/io/FileNotFoundException", NULL);<br>
    82                } else {<br>
    83                    char* msg =
strdup(strerror(err));<br>
    84                   
JNU_ThrowIOException(env, msg);<br>
    85                    if (msg != NULL) {<br>
    86                        free(msg);<br>
    87                    }<br>
    88                }<br>
    89            }<br>
    90            return fd;<br>
    91        }<br>
    92    }<br>
...<br>
    99    JNIEXPORT void JNICALL
Java_sun_tools_attach_VirtualMachineImpl_checkPermissions<br>
   100      (JNIEnv *env, jclass cls, jstring
path)<br>
   101    {<br>
   102        jboolean isCopy;<br>
   103        const char* p =
GetStringPlatformChars(env, path, &amp;isCopy);<br>
   104        if (p != NULL) {<br>
   105            struct stat64 sb;<br>
   106            uid_t uid, gid;<br>
   107            int res;<br>
   108    <br>
   109            memset(&amp;sb, 0,
sizeof(struct stat64));<br>
   110    <br>
   111            /*<br>
   112             * Check that the path is
owned by the effective uid/gid of this<br>
   113             * process. Also check that
group/other access is not allowed.<br>
   114             */<br>
   115            uid = geteuid();<br>
   116            gid = getegid();<br>
   117    <br>
   118            res = stat64(p, &amp;sb);<br>
   119            if (res != 0) {<br>
   120                /* save errno */<br>
   121                res = errno;<br>
   122            }<br>
   123    <br>
   124            if (res == 0) {<br>
   125                char msg[100];<br>
   126                jboolean isError =
JNI_FALSE;<br>
   127                if (sb.st_uid != uid
&amp;&amp; uid != ROOT_UID) {<br>
   128                    snprintf(msg,
sizeof(msg),<br>
   129                        "file should be
owned by the current user (which is %d) but is
owned by %d", uid, sb.st_uid);<br>
   130                    isError = JNI_TRUE;<br>
   131                } else if (sb.st_gid !=
gid &amp;&amp; uid != ROOT_UID) {<br>
   132                    snprintf(msg,
sizeof(msg),<br>
   133                        "file's group
should be the current group (which is %d) but
the group is %d", gid, sb.st_gid);<br>
   134                    isError = JNI_TRUE;<br>
   135                } else if ((sb.st_mode
&amp; (S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH)) != 0) {<br>
   136                    snprintf(msg,
sizeof(msg),<br>
   137                        "file should only
be readable and writable by the owner but has
0%03o access", sb.st_mode &amp; 0777);<br>
   138                    isError = JNI_TRUE;<br>
   139                }<br>
   140                if (isError) {<br>
   141                    char buf[256];<br>
   142                    snprintf(buf,
sizeof(buf), "well-known file %s is not secure:
%s", p, msg);<br>
   143                   
JNU_ThrowIOException(env, buf);<br>
   144                }<br>
   145            } else {<br>
   146                char* msg =
strdup(strerror(res));<br>
   147                JNU_ThrowIOException(env,
msg);<br>
   148                if (msg != NULL) {<br>
   149                    free(msg);<br>
   150                }<br>
   151            }<br>
<br>
On 10/2/18, 6:23 PM, David Holmes wrote:
<blockquote
cite="mid:e1adb28a-031f-5c28-74a5-***@oracle.com"
type="cite">Minor correction: EPERM -&gt;
EACCES for Solaris <br>
<br>
Hard to see how to get a transient EACCES when
opening a file ... though as it is really a
door I guess there could be additional
complexity. <br>
<br>
David <br>
<br>
On 3/10/2018 7:54 AM, Chris Plummer wrote: <br>
<blockquote type="cite">On 10/2/18 2:38 PM,
David Holmes wrote: <br>
<blockquote type="cite">Chris, <br>
<br>
On 3/10/2018 6:57 AM, Chris Plummer wrote:
<br>
<blockquote type="cite"> <br>
<br>
On 10/2/18 1:44 PM, <a
class="moz-txt-link-abbreviated"
href="mailto:***@oracle.com"
moz-do-not-send="true">***@oracle.com</a>
wrote: <br>
<blockquote type="cite">The general
attach sequence ... <br>
<br>
src/jdk.attach/solaris/classes/sun/tools/attach/VirtualMachineImpl.java
<br>
<br>
 the attacher creates an attach_pid
file in a directory where the attachee
is runnning <br>
 issues a signal to the attacheee <br>
<br>
  loops waiting for the java_pid file
to be created <br>
  default timeout is 10 seconds <br>
<br>
</blockquote>
So getting a FileNotFoundException while
in this loop is OK, but IOException is
not. <br>
<br>
<blockquote type="cite">src/hotspot/os/solaris/attachListener_solaris.cpp
<br>
<br>
   attachee creates the java_pid file
<br>
   listens til the attacher opens the
door <br>
<br>
</blockquote>
I'm don't think this is related, but
JDK-8199811 made a fix in
attachListener_solaris.cpp to make it
wait up to 10 seconds for initialization
to complete before failing the enqueue.
<br>
<br>
<blockquote type="cite">... <br>
Not sure when a bare IOException is
thrown rather than the <br>
more specific FileNotFoundException. <br>
</blockquote>
Where is the IOException originating
from? I wonder if the issue is that the
file is in the process of being created,
but is not fully created yet. Maybe it
is there, but owner/group/permissions
have not been set yet, and this results
in an IOException instead of
FileNotFoundException. <br>
</blockquote>
<br>
The exception is shown in the bug report:
<br>
<br>
 [java.io.IOException: Permission denied <br>
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native
Method) <br>
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.openDoor(VirtualMachineImpl.java:215)
<br>
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.&lt;init&gt;(VirtualMachineImpl.java:71)
<br>
at
jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
<br>
at
jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
<br>
at
jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:114)
<br>
at
jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:98)
<br>
<br>
And if you look at the native code the
EPERM from open will cause IOException to
be thrown. <br>
<br>
./jdk.attach/solaris/native/libattach/VirtualMachineImpl.c <br>
<br>
JNIEXPORT jint JNICALL
Java_sun_tools_attach_VirtualMachineImpl_open
<br>
  (JNIEnv *env, jclass cls, jstring path)
<br>
{ <br>
    jboolean isCopy; <br>
    const char* p =
GetStringPlatformChars(env, path,
&amp;isCopy); <br>
    if (p == NULL) { <br>
        return 0; <br>
    } else { <br>
        int fd; <br>
        int err = 0; <br>
<br>
        fd = open(p, O_RDWR); <br>
        if (fd == -1) { <br>
            err = errno; <br>
        } <br>
<br>
        if (isCopy) { <br>
           
JNU_ReleaseStringPlatformChars(env, path,
p); <br>
        } <br>
<br>
        if (fd == -1) { <br>
            if (err == ENOENT) { <br>
                JNU_ThrowByName(env,
"java/io/FileNotFoundException", NULL); <br>
            } else { <br>
                char* msg =
strdup(strerror(err)); <br>
                JNU_ThrowIOException(env,
msg); <br>
                if (msg != NULL) { <br>
                    free(msg); <br>
                } <br>
<br>
<br>
We should add the path to the exception
message. <br>
<br>
</blockquote>
Thanks David. So if EPERM is the error and a
retry 100ms later works, I think that
supports my hypothesis that the file is not
quite fully created. So Gary's fix is
probably fine. The only other possible fix I
can think of that wouldn't require an
explicit delay (or multiple retries) is
probably not worth the complexity. It would
require that the attachee create two files,
and the attacher try to open the second file
first. When it either opens or returns
EPERM, you know the first file can safety be
opened. <br>
<br>
Chris <br>
<blockquote type="cite">David <br>
----- <br>
<br>
<blockquote type="cite">Chris <br>
<blockquote type="cite"> <br>
<br>
<br>
On 10/2/18 4:11 PM, Chris Plummer
wrote: <br>
<blockquote type="cite">Can you
summarize how the attach handshaking
is suppose to work? I'm just
wondering why the attacher would
ever be looking for the file before
the attachee has created it. It
seems a proper handshake would
prevent this. Maybe there's some
sort of visibility issue where the
attachee has indeed created the
file, but it is not immediately
visible to the attacher process. <br>
<br>
Chris <br>
<br>
On 10/2/18 12:27 PM, <a
class="moz-txt-link-abbreviated"
href="mailto:***@oracle.com"
moz-do-not-send="true">***@oracle.com</a>
wrote: <br>
<blockquote type="cite">The problem
reproduced pretty quickly. <br>
I added a call to checkPermission
and revealed the <br>
"file not found" from the stat
call when the IOException <br>
was detected. <br>
<br>
There has been some flakiness from
the Solaris test machines today, <br>
so I'll continue with the testing
a bit longer. <br>
<br>
On 10/2/18 3:12 PM, Chris Plummer
wrote: <br>
<blockquote type="cite">Without
the fix was this issue easy
enough to reproduce that you can
be sure this is resolving it? <br>
<br>
Chris <br>
<br>
On 10/2/18 8:16 AM, Gary Adams
wrote: <br>
<blockquote type="cite">Solaris
debug builds are failing tests
that use the attach interface.
<br>
An IOException is reported
when the java_pid file is not
opened. <br>
<br>
It appears that the attempt to
attach is taking place too
quickly. <br>
This workaround will allow the
open operation to be retried <br>
after a short pause. <br>
<br>
  Webrev: <a
class="moz-txt-link-freetext"
href="http://cr.openjdk.java.net/%7Egadams/8210337/webrev/"
moz-do-not-send="true">http://cr.openjdk.java.net/~gadams/8210337/webrev/</a>
<br>
  Issue: <a
class="moz-txt-link-freetext"
href="https://bugs.openjdk.java.net/browse/JDK-8210337"
moz-do-not-send="true">https://bugs.openjdk.java.net/browse/JDK-8210337</a>
<br>
<br>
Testing is in progress. <br>
</blockquote>
<br>
<br>
<br>
</blockquote>
<br>
</blockquote>
<br>
<br>
</blockquote>
<br>
</blockquote>
<br>
<br>
</blockquote>
</blockquote>
<br>
<br>
</blockquote>
</blockquote>
<br>
</blockquote>
<p><br>
</p>
</blockquote>
<br>
</blockquote>
<p><br>
</p>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<p><br>
</p>
</blockquote>
<p><br>
</p>
</blockquote>
<p><br>
</p>
</blockquote>
<br>
</blockquote>
<p><br>
</p>
</body>
</html>

Loading...