<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Ok. You can close it as CNR then.<br>
<br>
thanks,<br>
<br>
Chris<br>
<br>
On 11/8/18 10:14 AM, Gary Adams wrote:<br>
</div>
<blockquote type="cite" cite="mid:***@oracle.com">
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
Different ipc mechanisms are used on each target platform<br>
solaris - doors<br>
linux - unix domain sockets<br>
windows - named pipes<br>
<br>
Solaris machines have a tendency to run for years without a need
to reboot.<br>
<br>
I have not examined the actual test machine where the original
problem<br>
was reported, but I was informed once about hundreds of leftover
temp files <br>
on another of the test machines.<br>
<br>
The files could have accumulated over time from jprt or mach5
infrastructure<br>
transitions. The collision on pid reuse alone would make this an
extremely <br>
rare event.<br>
<br>
If the "Permission denied" issue appears again, we should get a
dump of<br>
the temp directory from the failed machine.<br>
<br>
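A minimal Java sketch of what such a dump could collect, not an existing tool:<br>
it lists the leftover .java_pid files in a directory together with their owners.<br>
The class name LeftoverAttachFiles and the default of java.io.tmpdir are<br>
assumptions made for illustration.<br>
<pre>
```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class LeftoverAttachFiles {
    /** Returns the entries in dir whose names start with ".java_pid". */
    static File[] leftovers(File dir) {
        File[] found = dir.listFiles((d, name) -> name.startsWith(".java_pid"));
        // listFiles returns null when dir is not a readable directory
        return found == null ? new File[0] : found;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the attach files live in java.io.tmpdir unless a path is given.
        File dir = new File(args.length == 0 ? System.getProperty("java.io.tmpdir") : args[0]);
        for (File f : leftovers(dir)) {
            // Print the owner next to the name so files from another user stand out.
            System.out.println(Files.getOwner(f.toPath()).getName() + "  " + f.getName());
        }
    }
}
```
</pre>
Running it against the temp directory of the failed machine would show whether<br>
any .java_pid file belongs to a user other than the one running the tests.<br>
<br>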
On 11/8/18, 12:42 PM, Chris Plummer wrote:
<blockquote
cite="mid:27411ef2-08e7-752f-cdbd-***@oracle.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=utf-8">
<div class="moz-cite-prefix">Ok. Any idea what might have led to
the file for that PID being created by another user? Isn't all our
testing done using the same userid? Also, why is this only a Solaris
problem?<br>
<br>
Chris<br>
<br>
On 11/7/18 7:43 PM, <a moz-do-not-send="true"
class="moz-txt-link-abbreviated"
href="mailto:***@oracle.com">***@oracle.com</a>
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:cc1cdb3c-6730-363d-c9b0-***@oracle.com">
<meta http-equiv="Content-Type" content="text/html;
charset=utf-8">
<div class="moz-cite-prefix">There is recovery code that
cleans up old attach files,<br>
but a permission denied error would prevent the clean <br>
up operation from taking place. We're looking at a combination of <br>
a file owned by another user, pid recycling,<br>
and a failed cleanup.<br>
<br>
On 11/7/18 3:09 PM, Chris Plummer wrote:<br>
</div>
<blockquote type="cite"
cite="mid:4822c63c-3078-94f4-a754-***@oracle.com">
<meta http-equiv="Content-Type" content="text/html;
charset=utf-8">
<div class="moz-cite-prefix">Are you saying that the PID was
recycled, so the issue is a test run from long ago leaving
behind the file? If so, I thought there was something in
the attach code that cleaned up these PID files that were
left behind. In general I would not count this as an infra
issue. PID files left behind are the fault of the JVM.
Maybe there is something wrong with the cleanup code on
solaris, or maybe we don't clean up on any platform, but
cycle through PIDs a lot faster on solaris.<br>
<br>
Chris<br>
<br>
On 11/7/18 11:20 AM, Gary Adams wrote:<br>
</div>
<blockquote type="cite"
cite="mid:***@oracle.com">
<meta content="text/html; charset=utf-8"
http-equiv="Content-Type">
If there are no further suggestions on JDK-8210337,<br>
I plan to close it out as cannot reproduce.<br>
<br>
Similar bugs had been filed for the "Permission denied"
error<br>
from the openDoor request failure and each was attributed
<br>
to an infrastructure issue. e.g. another user with the
same <br>
pid left a temporary file that is blocking the current
test <br>
from attaching correctly.<br>
<br>
On 10/4/18, 1:49 PM, Gary Adams wrote:
<blockquote cite="mid:***@oracle.com"
type="cite">
<meta content="text/html; charset=utf-8"
http-equiv="Content-Type">
My delay and retry did not fix the problem with
permission denied.<br>
<br>
When I was diagnosing the problem I instrumented the
code <br>
to catch an IOException and call checkPermissions to get
<br>
more detail about the IOException. The error reported<br>
from calling checkPermissions was ENOENT (stat).<br>
<br>
The code change I then proposed was to catch the
IOException,<br>
delay, and retry the open. That fixed the problem of <br>
ENOENT, but had nothing to do with "permission denied".<br>
<br>
On 10/4/18, 1:25 PM, Chris Plummer wrote:
<blockquote
cite="mid:339eed9a-21e8-2a19-7a38-***@oracle.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=utf-8">
<div class="moz-cite-prefix">But I also thought you
said the delay and retry fixed the problem. How
could it fix the problem if it is just duplicating
something that is already in place?<br>
<br>
Chris<br>
<br>
On 10/4/18 9:48 AM, Gary Adams wrote:<br>
</div>
<blockquote type="cite"
cite="mid:***@oracle.com">
<meta content="text/html; charset=utf-8"
http-equiv="Content-Type">
My delay and retry just duplicated the openDoor
retry.<br>
The normal processing of
FileNotFoundException(ENOENT) is to retry<br>
several times until the file is available.<br>
<br>
But the original problem reported is a "Permission
denied" (EACCES or EPERM).<br>
Delay and retry will not resolve a permissions
error.<br>
<br>
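A minimal Java sketch of that distinction, not the JDK code: a missing file is<br>
retried after a delay, while any other IOException is rethrown on the first attempt.<br>
RetrySketch, Opener and openWithRetry are hypothetical names; Opener stands in<br>
for the native open call.<br>
<pre>
```java
import java.io.FileNotFoundException;
import java.io.IOException;

public class RetrySketch {
    /** Hypothetical stand-in for the native open(path) call. */
    interface Opener {
        int open() throws IOException;
    }

    /**
     * Retries only while the file is missing. Any other IOException,
     * for example "Permission denied", propagates immediately.
     */
    static int openWithRetry(Opener opener, int maxAttempts, long delayMs)
            throws IOException, InterruptedException {
        FileNotFoundException last = null;
        int attempts = Math.max(1, maxAttempts);   // always try at least once
        for (int i = 0; i != attempts; i++) {
            try {
                return opener.open();
            } catch (FileNotFoundException fnf) {
                last = fnf;                        // the file may still appear
                Thread.sleep(delayMs);
            }
        }
        throw last;                                // still missing after all attempts
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        int fd = openWithRetry(() -> {
            calls[0]++;
            if (calls[0] != 3) throw new FileNotFoundException();
            return 7;
        }, 5, 10);
        System.out.println("opened fd " + fd + " after " + calls[0] + " calls");

        try {
            openWithRetry(() -> { throw new IOException("Permission denied"); }, 5, 10);
        } catch (IOException e) {
            System.out.println("not retried: " + e.getMessage());
        }
    }
}
```
</pre>
<br>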
On 10/4/18, 12:30 PM, Chris Plummer wrote:
<blockquote
cite="mid:7115147a-2aae-d168-2db5-***@oracle.com"
type="cite">
<meta http-equiv="Content-Type"
content="text/html; charset=utf-8">
<div class="moz-cite-prefix">Didn't the retry
after 100ms delay work? If yes, why would it if
the problem is that a java_pid was not cleaned
up?<br>
<br>
Chris<br>
<br>
On 10/4/18 8:54 AM, Gary Adams wrote:<br>
</div>
<blockquote type="cite"
cite="mid:***@oracle.com">
<meta content="text/html; charset=utf-8"
http-equiv="Content-Type">
First, let me retract the proposed change,<br>
it is not the right solution to the problem
originally<br>
reported.<br>
<br>
Second, as a bit of explanation consider the
code fragments below.<br>
<br>
The high level processing calls openDoor which
is willing to retry <br>
the operation as long as the error is flagged
specifically<br>
as a FileNotFoundException.<br>
<br>
VirtualMachineImpl.java:72<br>
VirtualMachineImpl.c:81<br>
<br>
During my testing I had added a check at
VirtualMachineImpl.java:214<br>
and when an IOException was detected made a call
to checkPermissions<br>
to get more detailed information about the
IOException. The error <br>
I saw was an ENOENT from the stat call, not one of
the detailed checks for<br>
specific permissions issues
(VirtualMachineImpl.c:143).<br>
<br>
VirtualMachineImpl.c:118<br>
VirtualMachineImpl.c:147<br>
<br>
What I missed in the original proposed solution
was that FileNotFoundException<br>
extends IOException. That means my delay and
retry just duplicates the higher<br>
level retry around the openDoor call.<br>
<br>
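A minimal Java sketch of why this matters, with hypothetical class and method names:<br>
a handler for IOException also receives a FileNotFoundException, so the two cases<br>
can only be told apart when the subclass is caught first.<br>
<pre>
```java
import java.io.FileNotFoundException;
import java.io.IOException;

public class CatchOrderSketch {
    /** A broad handler: catches every IOException, including the subclass. */
    static String broadCatch(IOException e) {
        try {
            throw e;
        } catch (IOException ioe) {
            return "IOException handler";
        }
    }

    /** Subclass handler first, so a missing file is handled separately. */
    static String specificFirst(IOException e) {
        try {
            throw e;
        } catch (FileNotFoundException fnf) {
            return "FileNotFoundException handler";
        } catch (IOException ioe) {
            return "IOException handler";
        }
    }

    public static void main(String[] args) {
        IOException missing = new FileNotFoundException();
        IOException denied = new IOException("Permission denied");
        System.out.println(broadCatch(missing));
        System.out.println(specificFirst(missing));
        System.out.println(specificFirst(denied));
    }
}
```
</pre>
<br>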
Third, the original error message logged in the
bug report :<br>
<br>
java.io.IOException: Permission denied<br>
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native
Method)<br>
<br>
had to have come from<br>
<br>
VirtualMachineImpl.c:70<br>
VirtualMachineImpl.c:84<br>
<br>
which means the actual open call reported the
file does exist<br>
but the permissions do not allow the file to be
accessed.<br>
That also means the normal mechanism of removing
leftover <br>
java_pid files would not have cleaned up another
user's<br>
java_pid files.<br>
<br>
=====<br>
src/jdk.attach/solaris/classes/sun/tools/attach/VirtualMachineImpl.java:<br>
...<br>
67 // Opens the door file to the
target VM. If the file is not<br>
68 // found it might mean that
the attach mechanism isn't started in the<br>
69 // target VM so we attempt to
start it and retry.<br>
70 try {<br>
71 fd = openDoor(pid);<br>
72 } catch (FileNotFoundException
fnf1) {<br>
73 File f =
createAttachFile(pid);<br>
74 try {<br>
75 sigquit(pid);<br>
76 <br>
77 // give the target VM
time to start the attach mechanism<br>
78 final int delay_step =
100;<br>
79 final long timeout =
attachTimeout();<br>
80 long time_spend = 0;<br>
81 long delay = 0;<br>
82 do {<br>
83 // Increase
timeout on each attempt to reduce polling<br>
84 delay +=
delay_step;<br>
85 try {<br>
86
Thread.sleep(delay);<br>
87 } catch
(InterruptedException x) { }<br>
88 try {<br>
89 fd =
openDoor(pid);<br>
90 } catch
(FileNotFoundException fnf2) {<br>
91 // pass<br>
92 }<br>
93 <br>
94 time_spend +=
delay;<br>
95 if (time_spend
> timeout/2 && fd == -1) {<br>
96 // Send QUIT
again to give target VM the last chance to react<br>
97 sigquit(pid);<br>
98 }<br>
99 } while (time_spend
<= timeout && fd == -1);<br>
100 if (fd == -1) {<br>
101 throw new
AttachNotSupportedException(<br>
102
String.format("Unable to open door %s: " +<br>
103 "target
process %d doesn't respond within %dms " +<br>
104 "or HotSpot
VM not loaded", socket_path, pid, time_spend));<br>
105 }<br>
...<br>
212 // The door is attached to
.java_pid<pid> in the temporary directory.<br>
213 private int openDoor(int pid)
throws IOException {<br>
214 socket_path = tmpdir +
"/.java_pid" + pid;<br>
215 fd = open(socket_path);<br>
216 <br>
217 // Check that the file
owner/permission to avoid attaching to<br>
218 // bogus process<br>
219 try {<br>
220
checkPermissions(socket_path);<br>
221 } catch (IOException ioe) {<br>
222 close(fd);<br>
223 throw ioe;<br>
224 }<br>
225 return fd;<br>
226 }<br>
<br>
=====<br>
src/jdk.attach/solaris/native/libattach/VirtualMachineImpl.c:<br>
...<br>
59 JNIEXPORT jint JNICALL
Java_sun_tools_attach_VirtualMachineImpl_open<br>
60 (JNIEnv *env, jclass cls, jstring
path)<br>
61 {<br>
62 jboolean isCopy;<br>
63 const char* p =
GetStringPlatformChars(env, path, &isCopy);<br>
64 if (p == NULL) {<br>
65 return 0;<br>
66 } else {<br>
67 int fd;<br>
68 int err = 0;<br>
69 <br>
70 fd = open(p, O_RDWR);<br>
71 if (fd == -1) {<br>
72 err = errno;<br>
73 }<br>
74 <br>
75 if (isCopy) {<br>
76
JNU_ReleaseStringPlatformChars(env, path, p);<br>
77 }<br>
78 <br>
79 if (fd == -1) {<br>
80 if (err == ENOENT) {<br>
81 JNU_ThrowByName(env,
"java/io/FileNotFoundException", NULL);<br>
82 } else {<br>
83 char* msg =
strdup(strerror(err));<br>
84
JNU_ThrowIOException(env, msg);<br>
85 if (msg != NULL) {<br>
86 free(msg);<br>
87 }<br>
88 }<br>
89 }<br>
90 return fd;<br>
91 }<br>
92 }<br>
...<br>
99 JNIEXPORT void JNICALL
Java_sun_tools_attach_VirtualMachineImpl_checkPermissions<br>
100 (JNIEnv *env, jclass cls, jstring
path)<br>
101 {<br>
102 jboolean isCopy;<br>
103 const char* p =
GetStringPlatformChars(env, path, &isCopy);<br>
104 if (p != NULL) {<br>
105 struct stat64 sb;<br>
106 uid_t uid, gid;<br>
107 int res;<br>
108 <br>
109 memset(&sb, 0,
sizeof(struct stat64));<br>
110 <br>
111 /*<br>
112 * Check that the path is
owned by the effective uid/gid of this<br>
113 * process. Also check that
group/other access is not allowed.<br>
114 */<br>
115 uid = geteuid();<br>
116 gid = getegid();<br>
117 <br>
118 res = stat64(p, &sb);<br>
119 if (res != 0) {<br>
120 /* save errno */<br>
121 res = errno;<br>
122 }<br>
123 <br>
124 if (res == 0) {<br>
125 char msg[100];<br>
126 jboolean isError =
JNI_FALSE;<br>
127 if (sb.st_uid != uid
&& uid != ROOT_UID) {<br>
128 snprintf(msg,
sizeof(msg),<br>
129 "file should be
owned by the current user (which is %d) but is
owned by %d", uid, sb.st_uid);<br>
130 isError = JNI_TRUE;<br>
131 } else if (sb.st_gid !=
gid && uid != ROOT_UID) {<br>
132 snprintf(msg,
sizeof(msg),<br>
133 "file's group
should be the current group (which is %d) but
the group is %d", gid, sb.st_gid);<br>
134 isError = JNI_TRUE;<br>
135 } else if ((sb.st_mode
& (S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH)) != 0) {<br>
136 snprintf(msg,
sizeof(msg),<br>
137 "file should only
be readable and writable by the owner but has
0%03o access", sb.st_mode & 0777);<br>
138 isError = JNI_TRUE;<br>
139 }<br>
140 if (isError) {<br>
141 char buf[256];<br>
142 snprintf(buf,
sizeof(buf), "well-known file %s is not secure:
%s", p, msg);<br>
143
JNU_ThrowIOException(env, buf);<br>
144 }<br>
145 } else {<br>
146 char* msg =
strdup(strerror(res));<br>
147 JNU_ThrowIOException(env,
msg);<br>
148 if (msg != NULL) {<br>
149 free(msg);<br>
150 }<br>
151 }<br>
<br>
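For comparison, a minimal Java sketch of the mode-bit part of checkPermissions above,<br>
not the JDK implementation. It covers only the group/other read and write bits and<br>
leaves out the uid and gid comparison; OwnerOnlyCheck and isOwnerOnly are hypothetical<br>
names, and a POSIX file system is assumed.<br>
<pre>
```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;

public class OwnerOnlyCheck {
    /** Returns true when neither group nor others can read or write the file. */
    static boolean isOwnerOnly(Path p) throws IOException {
        // Same bits as the S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH mask in the C code.
        PosixFilePermission[] forbidden = {
            PosixFilePermission.GROUP_READ, PosixFilePermission.GROUP_WRITE,
            PosixFilePermission.OTHERS_READ, PosixFilePermission.OTHERS_WRITE
        };
        for (PosixFilePermission perm : forbidden) {
            if (Files.getPosixFilePermissions(p).contains(perm)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: OwnerOnlyCheck PATH");
            return;
        }
        System.out.println(isOwnerOnly(Paths.get(args[0])) ? "owner only" : "group/other access");
    }
}
```
</pre>
<br>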
On 10/2/18, 6:23 PM, David Holmes wrote:
<blockquote
cite="mid:e1adb28a-031f-5c28-74a5-***@oracle.com"
type="cite">Minor correction: EPERM ->
EACCES for Solaris <br>
<br>
Hard to see how to get a transient EACCES when
opening a file ... though as it is really a
door I guess there could be additional
complexity. <br>
<br>
David <br>
<br>
On 3/10/2018 7:54 AM, Chris Plummer wrote: <br>
<blockquote type="cite">On 10/2/18 2:38 PM,
David Holmes wrote: <br>
<blockquote type="cite">Chris, <br>
<br>
On 3/10/2018 6:57 AM, Chris Plummer wrote:
<br>
<blockquote type="cite"> <br>
<br>
On 10/2/18 1:44 PM, <a
class="moz-txt-link-abbreviated"
href="mailto:***@oracle.com"
moz-do-not-send="true">***@oracle.com</a>
wrote: <br>
<blockquote type="cite">The general
attach sequence ... <br>
<br>
src/jdk.attach/solaris/classes/sun/tools/attach/VirtualMachineImpl.java
<br>
<br>
the attacher creates an attach_pid
file in a directory where the attachee
is running <br>
issues a signal to the attachee <br>
<br>
loops waiting for the java_pid file
to be created <br>
default timeout is 10 seconds <br>
<br>
</blockquote>
So getting a FileNotFoundException while
in this loop is OK, but IOException is
not. <br>
<br>
<blockquote type="cite">src/hotspot/os/solaris/attachListener_solaris.cpp
<br>
<br>
attachee creates the java_pid file
<br>
listens until the attacher opens the
door <br>
<br>
</blockquote>
I don't think this is related, but
JDK-8199811 made a fix in
attachListener_solaris.cpp to make it
wait up to 10 seconds for initialization
to complete before failing the enqueue.
<br>
<br>
<blockquote type="cite">... <br>
Not sure when a bare IOException is
thrown rather than the <br>
more specific FileNotFoundException. <br>
</blockquote>
Where is the IOException originating
from? I wonder if the issue is that the
file is in the process of being created,
but is not fully created yet. Maybe it
is there, but owner/group/permissions
have not been set yet, and this results
in an IOException instead of
FileNotFoundException. <br>
</blockquote>
<br>
The exception is shown in the bug report:
<br>
<br>
[java.io.IOException: Permission denied <br>
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native
Method) <br>
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.openDoor(VirtualMachineImpl.java:215)
<br>
at
jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:71)
<br>
at
jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
<br>
at
jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
<br>
at
jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:114)
<br>
at
jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:98)
<br>
<br>
And if you look at the native code the
EPERM from open will cause IOException to
be thrown. <br>
<br>
./jdk.attach/solaris/native/libattach/VirtualMachineImpl.c <br>
<br>
JNIEXPORT jint JNICALL
Java_sun_tools_attach_VirtualMachineImpl_open
<br>
(JNIEnv *env, jclass cls, jstring path)
<br>
{ <br>
jboolean isCopy; <br>
const char* p =
GetStringPlatformChars(env, path,
&isCopy); <br>
if (p == NULL) { <br>
return 0; <br>
} else { <br>
int fd; <br>
int err = 0; <br>
<br>
fd = open(p, O_RDWR); <br>
if (fd == -1) { <br>
err = errno; <br>
} <br>
<br>
if (isCopy) { <br>
JNU_ReleaseStringPlatformChars(env, path,
p); <br>
} <br>
<br>
if (fd == -1) { <br>
if (err == ENOENT) { <br>
JNU_ThrowByName(env,
"java/io/FileNotFoundException", NULL); <br>
} else { <br>
char* msg =
strdup(strerror(err)); <br>
JNU_ThrowIOException(env,
msg); <br>
if (msg != NULL) { <br>
free(msg); <br>
} <br>
<br>
<br>
We should add the path to the exception
message. <br>
<br>
</blockquote>
Thanks David. So if EPERM is the error and a
retry 100ms later works, I think that
supports my hypothesis that the file is not
quite fully created. So Gary's fix is
probably fine. The only other possible fix I
can think of that wouldn't require an
explicit delay (or multiple retries) is
probably not worth the complexity. It would
require that the attachee create two files,
and the attacher try to open the second file
first. When it either opens or returns
EPERM, you know the first file can safely be
opened. <br>
<br>
Chris <br>
<blockquote type="cite">David <br>
----- <br>
<br>
<blockquote type="cite">Chris <br>
<blockquote type="cite"> <br>
<br>
<br>
On 10/2/18 4:11 PM, Chris Plummer
wrote: <br>
<blockquote type="cite">Can you
summarize how the attach handshaking
is supposed to work? I'm just
wondering why the attacher would
ever be looking for the file before
the attachee has created it. It
seems a proper handshake would
prevent this. Maybe there's some
sort of visibility issue where the
attachee has indeed created the
file, but it is not immediately
visible to the attacher process. <br>
<br>
Chris <br>
<br>
On 10/2/18 12:27 PM, <a
class="moz-txt-link-abbreviated"
href="mailto:***@oracle.com"
moz-do-not-send="true">***@oracle.com</a>
wrote: <br>
<blockquote type="cite">The problem
reproduced pretty quickly. <br>
I added a call to checkPermissions,
which revealed the <br>
"file not found" from the stat
call when the IOException <br>
was detected. <br>
<br>
There has been some flakiness from
the Solaris test machines today, <br>
so I'll continue with the testing
a bit longer. <br>
<br>
On 10/2/18 3:12 PM, Chris Plummer
wrote: <br>
<blockquote type="cite">Without
the fix was this issue easy
enough to reproduce that you can
be sure this is resolving it? <br>
<br>
Chris <br>
<br>
On 10/2/18 8:16 AM, Gary Adams
wrote: <br>
<blockquote type="cite">Solaris
debug builds are failing tests
that use the attach interface.
<br>
An IOException is reported
when the java_pid file is not
opened. <br>
<br>
It appears that the attempt to
attach is taking place too
quickly. <br>
This workaround will allow the
open operation to be retried <br>
after a short pause. <br>
<br>
Webrev: <a
class="moz-txt-link-freetext"
href="http://cr.openjdk.java.net/%7Egadams/8210337/webrev/"
moz-do-not-send="true">http://cr.openjdk.java.net/~gadams/8210337/webrev/</a>
<br>
Issue: <a
class="moz-txt-link-freetext"
href="https://bugs.openjdk.java.net/browse/JDK-8210337"
moz-do-not-send="true">https://bugs.openjdk.java.net/browse/JDK-8210337</a>
<br>
<br>
Testing is in progress. <br>
</blockquote>
<br>
<br>
<br>
</blockquote>
<br>
</blockquote>
<br>
<br>
</blockquote>
<br>
</blockquote>
<br>
<br>
</blockquote>
</blockquote>
<br>
<br>
</blockquote>
</blockquote>
<br>
</blockquote>
<p><br>
</p>
</blockquote>
<br>
</blockquote>
<p><br>
</p>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<p><br>
</p>
</blockquote>
<p><br>
</p>
</blockquote>
<p><br>
</p>
</blockquote>
<br>
</blockquote>
<p><br>
</p>
</body>
</html>