Proposal: Always-on Statistical History

Discussion:

Thomas Stüfe

2018-11-14 14:57:47 UTC

Hi all,

We have a feature in our port which we would like to contribute,
and I would like to gauge opinions.

First off, I am not sure which list is correct. This is more of a
serviceability issue, but implementation-wise it fits hs-runtime
better. I'll start with serviceability, but feel free to crosspost if
needed.

Second, I am aware that this may require a JEP. If necessary and the
feedback is positive, I will draft one.

----

In our port we have something called "Statistics History". Basically
this is a rolling history, spanning up to 10 days, of a number of key
values. Key values range from JVM specifics like heap size, metaspace
size, and number of threads to platform specifics like memory
footprint, CPU load, and I/O and swapping activity.

A periodic task collects those values at intervals of 15 seconds by
default. They are then fed into a FIFO that spans 10 days. To save
memory, that FIFO is downsampled in two steps, so we have the last n
hours in high resolution and the last n days in low resolution (of
course, all these parameters are configurable).
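
To make the FIFO idea above more concrete, here is a minimal sketch in Java. It is not taken from the patch: the class name `RollingHistory`, its capacities, and the `keepEvery` factor are all hypothetical, and it shows only one downsampling step between two tiers (chaining a further tier in the same way would give a second step).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical two-tier rolling history; an illustration, not the patch's code. */
class RollingHistory {
    /** One sampled value with its timestamp in seconds. */
    static final class Sample {
        final long timestampSeconds;
        final long value;

        Sample(long timestampSeconds, long value) {
            this.timestampSeconds = timestampSeconds;
            this.value = value;
        }
    }

    private final Deque<Sample> highRes = new ArrayDeque<>(); // newest samples, full rate
    private final Deque<Sample> lowRes = new ArrayDeque<>();  // older samples, thinned out
    private final int highResCapacity;
    private final int lowResCapacity;
    private final int keepEvery; // of the samples aging out of highRes, keep one in keepEvery
    private long agedOut = 0;

    RollingHistory(int highResCapacity, int lowResCapacity, int keepEvery) {
        if (highResCapacity < 1 || lowResCapacity < 1 || keepEvery < 1) {
            throw new IllegalArgumentException("capacities and keepEvery must be >= 1");
        }
        this.highResCapacity = highResCapacity;
        this.lowResCapacity = lowResCapacity;
        this.keepEvery = keepEvery;
    }

    /** Adds the newest sample; the oldest ones are thinned out or dropped. */
    void add(long timestampSeconds, long value) {
        highRes.addLast(new Sample(timestampSeconds, value));
        if (highRes.size() > highResCapacity) {
            Sample aged = highRes.removeFirst();
            // Downsampling step: only every keepEvery-th aged sample survives.
            if (agedOut++ % keepEvery == 0) {
                lowRes.addLast(aged);
                if (lowRes.size() > lowResCapacity) {
                    lowRes.removeFirst(); // falls off the end of the history
                }
            }
        }
    }

    /** Returns all retained samples, oldest first. */
    List<Sample> snapshot() {
        List<Sample> all = new ArrayList<>(lowRes);
        all.addAll(highRes);
        return all;
    }

    public static void main(String[] args) {
        RollingHistory history = new RollingHistory(4, 3, 2);
        for (int i = 0; i < 10; i++) {
            history.add(i * 15L, i); // one sample every 15 seconds
        }
        for (Sample s : history.snapshot()) {
            System.out.println(s.timestampSeconds + "s -> " + s.value);
        }
    }
}
```

As a purely illustrative sizing: with 15-second samples, a high-resolution capacity of 240 would cover one hour, and `keepEvery = 20` would space the low-resolution samples 5 minutes apart. These numbers are examples, not the defaults of the port.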

The history report can be triggered via jcmd and could also be
printed in the hs.err file (open for debate).

---

Here are some examples of what the whole thing looks like:

http://cr.openjdk.java.net/~stuefe/webrevs/stathist/examples/stathist-volker.txt

http://cr.openjdk.java.net/~stuefe/webrevs/stathist/examples/stathist-s390x.txt

---

This feature has been really popular with our support folks over the
years. Whether the VM is starved for resources by the OS or we have a
slowly or quickly developing leak, these values are a quick and easy
way to get a first look at a situation before we start more expensive
analysis.

The explicit design goal of this history was to be very cheap: cheap
enough to be *always on* and then forgotten about. It is, in our port,
enabled by default. That way, if a problem occurs at a customer site,
we immediately see developments spanning the last 10 days, without
having to reproduce the issue.

It is also robust enough to be usable during error reporting without
endangering the error reporting process or falsifying the picture.

I am aware that this crosses over into JFR territory. But this feature
does not attempt to replace JFR; it is intended instead as a cheap,
always-on, first-stop historical overview.

--

I have a patch which can be applied on top of jdk12:

http://cr.openjdk.java.net/~stuefe/webrevs/stathist/stathist.patch

It works, passes our nightlies, and shows no regressions in the DaCapo
benchmarks.

Please tell me what you think. Given enough interest, I will attempt
to contribute (drafting a JEP if necessary).

Thanks and Kind Regards,

Thomas

Simon Roberts

2018-11-14 18:29:20 UTC

I would say this could be pretty useful. It's almost like a
platform-independent, process-specific vmstat, with JVM extras. Given the
existence of jps, this seems to fit in that ecosystem well. I find myself
having to work with Windows just rarely enough that I'd have to look up how
to get this info on that host every time.
$0.02

--
Simon Roberts
(303) 249 3613

Thomas Stüfe

2018-11-14 21:32:54 UTC

Hi Simon,

Thank you. Yes, I combined vmstat/pidstat-like features with
internal JVM statistics. Note that part of that table is
platform-specific, so it looks slightly different on BSD/Windows/Solaris etc.
The JVM values are always the same.

Best Regards, Thomas

Kirk Pepperdine

2018-11-15 01:20:35 UTC

Hi,

I agree, this could be very useful...

- Kirk

Bernd Eckenfels

2018-11-14 21:05:53 UTC

Looks good, Thomas. What would be the typical memory usage with the default settings? Does the downsampling support min/max-style rollups?

--
http://bernd.eckenfels.net

Thomas Stüfe

2018-11-14 21:27:47 UTC

Hi Bernd,

On Wed, Nov 14, 2018 at 10:07 PM Bernd Eckenfels <***@zusammenkunft.net> wrote:
>
> Looks good Thomas,

thanks!

> what would be the typical memory usage with the Default Settings?

~80 KB. It's very small.

> Does the downsampling support min/max style rollups?

Not sure what you mean. Do you mean does it preserve peaks? Not yet,
such a feature would have to be added.

Right now, downsampling is very primitive for performance reasons. For
snapshot values like heap size, we just throw away the samples, so
you lose temporary peaks. Counter-like values-over-time (e.g.
number of pages swapped in) then simply refer to a larger time
span.
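
The effect described above can be illustrated with a small Java sketch. It rests on an assumption that is not confirmed in this thread: that counter-like values are stored as cumulative readings and reported as differences between neighbouring samples. The class `DownsampleDemo`, its helper methods, and the sample data are all made up for illustration.

```java
/** Hypothetical illustration of sample-dropping downsampling; not the patch's code. */
class DownsampleDemo {
    /** Keeps every factor-th element, starting with the first; the rest are thrown away. */
    static long[] decimate(long[] samples, int factor) {
        long[] out = new long[(samples.length + factor - 1) / factor];
        for (int i = 0; i < out.length; i++) {
            out[i] = samples[i * factor];
        }
        return out;
    }

    /** Differences between consecutive cumulative counter readings. */
    static long[] deltas(long[] cumulative) {
        long[] out = new long[Math.max(0, cumulative.length - 1)];
        for (int i = 0; i < out.length; i++) {
            out[i] = cumulative[i + 1] - cumulative[i];
        }
        return out;
    }

    /** Largest element, or Long.MIN_VALUE for an empty array. */
    static long max(long[] values) {
        long m = Long.MIN_VALUE;
        for (long v : values) {
            m = Math.max(m, v);
        }
        return m;
    }

    public static void main(String[] args) {
        // Snapshot value (assumed unit: MB) with a short-lived peak in the third sample.
        long[] heapMb = {100, 105, 900, 110, 115, 120, 118};
        // Cumulative counter, e.g. pages swapped in since start (assumed representation).
        long[] pagesSwappedIn = {0, 10, 30, 35, 35, 50, 80};

        System.out.println("heap peak before: " + max(heapMb));
        System.out.println("heap peak after:  " + max(decimate(heapMb, 3)));

        // After decimation each delta simply covers three original intervals.
        for (long d : deltas(decimate(pagesSwappedIn, 3))) {
            System.out.println("pages swapped in over the longer span: " + d);
        }
    }
}
```

In this made-up data, the 900 MB peak disappears once only every third sample is kept, while the counter deltas (35 and 45) still add up to the same total of 80. Preserving peaks, as asked about above, would require storing something like a per-interval maximum, which this sketch does not attempt.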

Best Regards, Thomas

Roger Riggs

2018-11-15 16:40:17 UTC

Hi,

This looks like it has significant overlap with JFR.
I don't think we want to start building in multiple mechanisms to keep
tabs on a running VM.

$.02, Roger

Simon Roberts

2018-11-15 17:07:52 UTC

I don't begin to claim to know the politics, legalities, boundaries of JFR
license conditions, and so forth, but:

"Java Flight Recorder requires a commercial license for use in production."

Whereas this, as I understand it, is the *open* JDK list. So I, for one, would
feel hard done by if your view prevailed and only the paying clients got
access to a valuable feature.

--
Simon Roberts
(303) 249 3613

Marcus Hirt

2018-11-15 17:12:07 UTC

JDK Flight Recorder is free, open sourced and part of OpenJDK 11+.

Kind regards,
Marcus

Mario Torre

2018-11-15 18:55:54 UTC

I agree with the others, and Flight Recorder is actually open sourced, so the restrictions you mentioned don't apply anymore since Java 11.

That said, I want to study the proposal more; there may be something worth exploring that could be integrated into the current infrastructure.

Cheers,
Mario

Mario Torre
Associate Manager, Software Engineering
Red Hat GmbH
9704 A60C B4BE A8B8 0F30 9205 5D7E 4952 3F65 7898

Thomas Stüfe

2018-11-21 08:07:32 UTC

Hi all,

(I am combining my replies here, since most of your feedback was similar.)

Thank you all for the feedback!

If I understand you correctly, the consensus is that you do not wish
to introduce another data collection backend besides JFR. And that,
should JFR lack features we miss, we should rather improve JFR than
add a second solution.

That makes sense. I understand this, and yes, I agree. So I withdraw
my proposal.

--

But I would still like to be able to do what I can do with my patch.
Since I am withdrawing it, I am curious how a feature like that would
be implemented in terms of JFR, or whether JFR can already do the same
things out of the box. What I am looking for is:

1 Monitoring key values as described in the proposal - see the linked
example printouts - covering long time periods, with at least the
short-term period in high resolution (which means some sort of
automatic downsampling to keep memory cost low)

2 being able to leave that statistic always on. This part is really
important: our statistical history being on-by-default made a big
difference to our support, saving a lot of back-and-forth between
customers and us, and thus a lot of time and headache. In order to be
always-on, a replacement implementation should be really cheap and
robust. Note that we found our statistical history especially useful
in low-memory/cpu situations (e.g. in containers) - but there,
pressure to switch every non-essential feature off to shave off a bit
of memory cost is high. Being cheap really helps with these arguments.

3 very often we used our statistic during post-mortem analysis. It
gets dumped as part of the hs-err file in case of an error. For this
to work, the monitoring must be robust and should have as few
dependencies on the VM as possible, to avoid circular errors in
error handling. It should also avoid dynamic memory allocation and
allocate its memory upfront, to harden it in the face of native OOMs.

Note that (3) is a bit of a stretch goal. Solutions which are not that
robust during error reporting can still be useful in many cases, but
now and then you will hit "Error during error reporting" instead of
getting the historical data.
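
To make points (1) and (3) concrete, here is a minimal sketch in Java of
a history buffer that allocates all of its memory upfront and keeps a
high-resolution tier plus a downsampled tier. It is an illustration
only, not the code from the patch: the class name TieredHistory, its
methods and the single downsampling step are assumptions, and the
downsampling simply keeps every ratio-th sample, so temporary peaks
between kept samples are lost.

```java
/**
 * Hypothetical sketch: a fixed-size, two-tier sample history.
 * All memory is allocated in the constructor; add() never allocates.
 */
public class TieredHistory {
    private final long[] shortTerm; // ring buffer, every sample
    private final long[] longTerm;  // ring buffer, every ratio-th sample
    private final int ratio;        // downsampling factor between the tiers
    private long total;             // number of samples added so far

    public TieredHistory(int shortCapacity, int longCapacity, int ratio) {
        if (shortCapacity <= 0 || longCapacity <= 0 || ratio <= 0) {
            throw new IllegalArgumentException("capacities and ratio must be positive");
        }
        this.shortTerm = new long[shortCapacity];
        this.longTerm = new long[longCapacity];
        this.ratio = ratio;
    }

    /** Stores one sample; the oldest sample of a full tier is overwritten. */
    public void add(long value) {
        shortTerm[(int) (total % shortTerm.length)] = value;
        total++;
        if (total % ratio == 0) {
            long n = total / ratio - 1; // index of this sample in the low-res tier
            longTerm[(int) (n % longTerm.length)] = value;
        }
    }

    public int shortSize() {
        return (int) Math.min(total, shortTerm.length);
    }

    public int longSize() {
        return (int) Math.min(total / ratio, longTerm.length);
    }

    /** Returns a high-resolution sample; age 0 is the newest one. */
    public long shortAt(int age) {
        if (age < 0 || age >= shortSize()) {
            throw new IndexOutOfBoundsException("age " + age);
        }
        return shortTerm[(int) ((total - 1 - age) % shortTerm.length)];
    }

    /** Returns a low-resolution sample; age 0 is the newest one. */
    public long longAt(int age) {
        if (age < 0 || age >= longSize()) {
            throw new IndexOutOfBoundsException("age " + age);
        }
        long count = total / ratio;
        return longTerm[(int) ((count - 1 - age) % longTerm.length)];
    }

    public static void main(String[] args) {
        // Example sizing (assumed values): 1 hour at 15 s resolution,
        // 10 days at 15 min resolution (every 60th sample).
        TieredHistory heapUsed = new TieredHistory(240, 960, 60);
        for (long v = 1; v <= 1000; v++) {
            heapUsed.add(v);
        }
        System.out.println("newest high-res sample: " + heapUsed.shortAt(0));
        System.out.println("newest low-res sample:  " + heapUsed.longAt(0));
    }
}
```

With the sizes assumed in main, one tracked value costs 240 + 960 longs,
i.e. 9600 bytes, regardless of how long the VM runs. Keeping all 10 days
at 15 second resolution would instead need 57600 samples per value.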

But Erik indicated that JFR is routinely used in post mortem analysis
at Oracle. So maybe all my points are already fulfilled by the
existing implementation? If not, would it be possible to adapt JFR to
make such a statistic possible? I'm willing to help if JFR is the way
to go.

--

Note that I still think that there is some value in my proposed patch:
for older releases.

There, JFR/JMC does not exist, so this history feature could be really
useful. I was actually hoping to downport this feature to older
releases once it hit JDK12 mainline. But since we have now decided
not to upstream it, that door is closed.

So, in order to preserve this possibility at least to other downstream
OpenJDK maintainers, I put these patches (based on 8u/11u) up:
https://github.com/tstuefe/ojdk-stathist-patch . Maybe they are still
useful to someone.

Thank you all, and Best Regards,

Thomas

Mario Torre

2018-11-21 08:18:22 UTC

I didn't have time to read the patch fully, but from a quick look it
seems you can create JFR events to log the same information; JFR will
take care of all the infrastructure.

So, for example, every place where you call set_value_in_record would
instead commit a JFR event. Again, I only skimmed through the patch
very quickly, but it looks like JFR is also more generic: you only
need to define the event and commit it; the framework does not need
to know the events in advance.
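
As a rough illustration of "define the event and commit it", here is a
minimal sketch using the public jdk.jfr API available since JDK 11. The
event name demo.StatSample, its two fields and the 15 second default
period are assumptions made up for this example; they are not taken
from the patch and are not events that JFR ships with.

```java
import jdk.jfr.Category;
import jdk.jfr.Event;
import jdk.jfr.FlightRecorder;
import jdk.jfr.Label;
import jdk.jfr.Name;
import jdk.jfr.Period;

public class StatSampleDemo {

    // Hypothetical custom event carrying two sampled values.
    @Name("demo.StatSample")
    @Label("Statistics Sample")
    @Category("Demo")
    @Period("15 s") // default period; a recording's settings can override it
    static class StatSample extends Event {
        @Label("Used Heap")
        long usedHeap;

        @Label("Thread Count")
        int threadCount;
    }

    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Fills in one event and commits it. If no recording has the event
    // enabled, commit() does nothing.
    static void emitSample() {
        StatSample e = new StatSample();
        e.usedHeap = usedHeapBytes();
        e.threadCount = Thread.activeCount(); // estimate for the current thread group
        e.commit();
    }

    public static void main(String[] args) {
        // JFR invokes the hook periodically while a recording with this
        // event enabled is running.
        Runnable hook = StatSampleDemo::emitSample;
        FlightRecorder.addPeriodicEvent(StatSample.class, hook);

        emitSample(); // events can also be committed directly

        FlightRecorder.removePeriodicEvent(hook);
    }
}
```

Events are only stored while a recording is active, for example when
the VM is started with -XX:StartFlightRecording. Note that this sketch
does not address the downsampling or the error-reporting robustness
discussed earlier in the thread.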

I do believe that JFR already has events for some of the stuff you are
logging, btw.

Cheers,
Mario

--
Mario Torre
Associate Manager, Software Engineering
Red Hat GmbH <https://www.redhat.com>
9704 A60C B4BE A8B8 0F30 9205 5D7E 4952 3F65 7898

Thomas Stüfe

2018-11-21 13:17:09 UTC

Thank you Mario. I will take a closer look at JFR.
On Wed, Nov 21, 2018 at 9:19 AM Mario Torre <***@redhat.com> wrote:
>
> I didn't have time to read the patch fully, but just as a quick look,
> it seems like you can create JFR events to log the same information,
> JFR will take care of all the infrastructure.
>
> So for example, every time you need to call set_value_in_record this
> is would be instead a JFR. Again, I only did very quickly skim though
> the patch, but looks like JFR is also more generic, you only need to
> define the event and commit it, the framework does not need to know
> the events in advance.
>
> I do believe that JFR already has events for some of the stuff you are
> logging, btw.
>
> Cheers,
> Mario
> On Wed, Nov 21, 2018 at 9:07 AM Thomas Stüfe <***@gmail.com> wrote:
> >
> > Hi all,
> >
> > (I combine my replies here, since most of your feedback was similar).
> >
> > Thank you all for the feedback!
> >
> > If I understand you correctly, the consensus is that you do not wish
> > to introduce another data collection backend beside JFR. And that,
> > should JFR lack features we miss, we rather improve JFR instead of
> > adding a second solution.
> >
> > That makes sense. I understand this, and yes, I agree. So I withdraw
> > my proposal.
> >
> > --
> >
> > But I still would like to be able to do what I can do with my patch.
> > Since I withdraw it, I am curious how a feature like that would be
> > implemented in terms of JFR. Or whether JFR can do the same things out
> > of the box already. What I am looking for is:
> >
> > 1 Monitoring key values as described in the proposal - see the linked
> > example printouts - covering long time periods, at least short term
> > periods in high resolution (which means some sort of automatic
> > downsampling to keep memory cost low)
> >
> > 2 being able to leave that statistic always on. This part is really
> > important: our statistical history being on-by-default made a big
> > difference to our support, saving a lot of back-and-forth between
> > customers and us, and thus a lot of time and headache. In order to be
> > always-on, a replacement implementation should be really cheap and
> > robust. Note that we found our statistical history especially useful
> > in low-memory/cpu situations (e.g. in containers) - but there,
> > pressure to switch every non-essential feature off to shave off a bit
> > of memory cost is high. Being cheap really helps with these arguments.
> >
> > 3 very often we used our statistic during post-mortem analysis. It
> > gets dumped as part of the hs-err file in case of an error. For this
> > to work, the monitoring must be robust and should have as few
> > dependencies on the VM as possible, to avoid circular errors in
> > error handling. It should also avoid dynamic memory allocation and
> > allocate its memory upfront, to harden it against native OOMs.
> >
> > Note that (3) is a bit of a stretch goal. Solutions which are not that
> > robust during error reporting can still be useful in many cases, but
> > now and then you will hit "Error during error reporting" instead of
> > getting the historical data.
> >
> > But Erik indicated that JFR is routinely used in post mortem analysis
> > at Oracle. So maybe all my points are already fulfilled by the
> > existing implementation? If not, would it be possible to adapt JFR to
> > make such a statistic possible? I'm willing to help if JFR is the way
> > to go.
> >
> > --
> >
> > Note that I still think that there is some value in my proposed patch:
> > for older releases.
> >
> > There, JFR/JMC does not exist, so this history feature could be really
> > useful. I was actually hoping to downport this feature to older
> > releases once it hit JDK 12 mainline. But since we have now decided
> > not to upstream it, that door is barred.
> >
> > So, in order to preserve this possibility at least to other downstream
> > OpenJDK maintainers, I put these patches (based on 8u/11u) up:
> > https://github.com/tstuefe/ojdk-stathist-patch . Maybe they are still
> > useful to someone.
> >
> > Thank you all, and Best Regards,
> >
> > Thomas
> > On Thu, Nov 15, 2018 at 7:56 PM Mario Torre <***@redhat.com> wrote:
> > >
> > > I agree with the others, and Flight Recorder is actually open sourced so the restrictions you mentioned don’t apply anymore since Java 11.
> > >
> > > That said, I want to study the proposal more, there may be something worth exploring that may be integrated in the current infrastructure.
> > >
> > > Cheers,
> > > Mario
> > >
> > > —
> > > Mario Torre
> > > Associate Manager, Software Engineering
> > > Red Hat GmbH
> > > 9704 A60C B4BE A8B8 0F30 9205 5D7E 4952 3F65 7898
> > >
> > > ________________________________
> > > From: serviceability-dev <serviceability-dev-***@openjdk.java.net> on behalf of Simon Roberts <***@dancingcloudservices.com>
> > > Sent: Thursday, November 15, 2018 18:10
> > > To: ***@oracle.com
> > > Cc: serviceability-***@openjdk.java.net
> > > Subject: Re: Proposal: Always-on Statistical History
> > >
> > > I don't begin to claim to know the politics, legalities, boundaries of JFR license conditions, and so forth, but:
> > >
> > > "Java Flight Recorder requires a commercial license for use in production."
> > >
> > > Whereas, this as I understand is the *open* jdk list. So, I for one would feel hard done by if your view prevailed and only the paying clients got access to a valuable feature.
> > >
> > >
> > > On Thu, Nov 15, 2018 at 9:40 AM Roger Riggs <***@oracle.com> wrote:
> > >>
> > >> Hi,
> > >>
> > >> This looks like it has significant overlap with JFR.
> > >> I don't think we want to start building in multiple mechanisms to keep
> > >> tabs on a running VM.
> > >>
> > >> $.02, Roger
> > >>
> > >>
> > >> On 11/14/2018 04:27 PM, Thomas Stüfe wrote:
> > >> > Hi Bernd,
> > >> >
> > >> > On Wed, Nov 14, 2018 at 10:07 PM Bernd Eckenfels <***@zusammenkunft.net> wrote:
> > >> >> Looks good Thomas,
> > >> > thanks!
> > >> >
> > >> >> what would be the typical memory usage with the Default Settings?
> > >> > ~ 80 KB. It's very small.
> > >> >
> > >> >> Does the downsampling support min/max style rollups?
> > >> > Not sure what you mean. Do you mean does it preserve peaks? Not yet,
> > >> > such a feature would have to be added.
> > >> >
> > >> > Right now, downsampling is very primitive for performance reasons. For
> > >> > snapshot values like heap size etc we just throw away the samples, so
> > >> > you lose temporary peaks. Counter-like values-over-time (e.g.
> > >> > number of pages swapped in etc) then simply refer to a larger time
> > >> > span.
> > >> >
> > >> > Best Regards, Thomas
> > >> >
> > >
> > >
> > > --
> > > Simon Roberts
> > > (303) 249 3613
> > >
>
>
>
> --
> Mario Torre
> Associate Manager, Software Engineering
> Red Hat GmbH <https://www.redhat.com>
> 9704 A60C B4BE A8B8 0F30 9205 5D7E 4952 3F65 7898

Kirk Pepperdine

2018-11-15 21:37:40 UTC

This was true up until Oracle open sourced it (JDK 11).

If JFR is the framework around which we decide to get these types of metrics from the JVM (in addition to JMX), then I think that we (the community) should continue to build out JFR adding in those metrics that are not already captured.

Kind regards,
Kirk Pepperdine
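
As a rough illustration of what adding such a metric on top of JFR could look like from Java code (as opposed to inside HotSpot, where the built-in events live), the sketch below defines a custom periodic event with the `jdk.jfr` API that is available since JDK 11. The event name `example.StatSample`, its two fields, the helper class `StatSampleHook`, and the 15 second period are assumptions chosen to mirror the proposal; they are not existing JFR events.

```java
import java.lang.management.ManagementFactory;
import jdk.jfr.Event;
import jdk.jfr.FlightRecorder;
import jdk.jfr.Label;
import jdk.jfr.Name;
import jdk.jfr.Period;

// Hypothetical event: one sample of a few key values, emitted periodically.
@Name("example.StatSample")
@Label("Statistics Sample")
@Period("15 s") // default period, mirrors the 15 second interval of the proposal
class StatSample extends Event {
    @Label("Heap Used (bytes)")
    long heapUsed;

    @Label("Live Threads")
    int liveThreads;
}

// Hypothetical helper that fills in and registers the event.
final class StatSampleHook {
    // Collects the current values into a new, not yet committed event.
    static StatSample sample() {
        Runtime rt = Runtime.getRuntime();
        StatSample e = new StatSample();
        e.heapUsed = rt.totalMemory() - rt.freeMemory();
        e.liveThreads = ManagementFactory.getThreadMXBean().getThreadCount();
        return e;
    }

    // Asks JFR to invoke the hook at the event's configured period.
    static void register() {
        FlightRecorder.addPeriodicEvent(StatSample.class, () -> sample().commit());
    }
}
```

Once `StatSampleHook.register()` has been called, a recording started with `-XX:StartFlightRecording` or `jcmd <pid> JFR.start` should include the samples. Whether this is cheap and robust enough to be left always on is exactly the open question in this thread.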

> On Nov 15, 2018, at 9:07 AM, Simon Roberts <***@dancingcloudservices.com> wrote:
>
> I don't begin to claim to know the politics, legalities, boundaries of JFR license conditions, and so forth, but:
>
> "Java Flight Recorder requires a commercial license for use in production."
>
> Whereas, this as I understand is the *open* jdk list. So, I for one would feel hard done by if your view prevailed and only the paying clients got access to a valuable feature.
>
>
> On Thu, Nov 15, 2018 at 9:40 AM Roger Riggs <***@oracle.com> wrote:
> Hi,
>
> This looks like it has significant overlap with JFR.
> I don't think we want to start building in multiple mechanisms to keep
> tabs on a running VM.
>
> $.02, Roger
>
>
> On 11/14/2018 04:27 PM, Thomas Stüfe wrote:
> > Hi Bernd,
> >
> > On Wed, Nov 14, 2018 at 10:07 PM Bernd Eckenfels <***@zusammenkunft.net> wrote:
> >> Looks good Thomas,
> > thanks!
> >
> >> what would be the typical memory usage with the Default Settings?
> > ~ 80 KB. It's very small.
> >
> >> Does the downsampling support min/max style rollups?
> > Not sure what you mean. Do you mean does it preserve peaks? Not yet,
> > such a feature would have to be added.
> >
> > Right now, downsampling is very primitive for performance reasons. For
> > snapshot values like heap size etc we just throw away the samples, so
> > you lose temporary peaks. Counter-like values-over-time (e.g.
> > number of pages swapped in etc) then simply refer to a larger time
> > span.
> >
> > Best Regards, Thomas
> >
>
>
> --
> Simon Roberts
> (303) 249 3613
>

Marcus Hirt

2018-11-15 17:10:17 UTC

Hi all,

I'm with Roger on this one. This is an aggregation mechanism. If we want such
an aggregation mechanism, we should probably build one into one of the already
available serviceability technologies (JMX and/or JFR). If we feel the need to
introduce a generic one that can source data from multiple serviceability
technologies (even though that smells a bit of user application/agent code),
then it should integrate well with already available serviceability
technologies (perhaps sourcing the upcoming streaming JFR and/or JMX), come
with an API to interact with it, and be general, configurable and extensible.
Either way, I think this requires more thought.

Another $.02.

Kind regards,
Marcus
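
For reference, the kind of aggregation under discussion, a preallocated rolling history with a primitive downsampling step, can be sketched in a few lines of Java. This is not the code from the proposed patch (which lives in HotSpot); the class names `SampleRing` and `TwoTierHistory`, the single downsampling step (the proposal describes two), and all parameters are assumptions for illustration. Snapshot values keep every `ratio`-th sample and drop the rest, so temporary peaks are lost; counter values are summed, so a long-term entry refers to the larger time span.

```java
/** Fixed-capacity ring buffer; all memory is allocated once, in the constructor. */
final class SampleRing {
    private final long[] values;
    private int next; // slot that the next add() overwrites
    private int size; // number of valid samples, never more than values.length

    SampleRing(int capacity) {
        values = new long[capacity];
    }

    void add(long v) {
        values[next] = v;
        next = (next + 1) % values.length;
        if (size < values.length) {
            size++;
        }
    }

    int size() {
        return size;
    }

    /** Returns the i-th newest sample, where 0 is the most recent one. */
    long newest(int i) {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException("no sample at " + i);
        }
        return values[Math.floorMod(next - 1 - i, values.length)];
    }
}

/** Short-term ring in full resolution plus a long-term ring fed every ratio-th sample. */
final class TwoTierHistory {
    final SampleRing shortTerm;
    final SampleRing longTerm;
    private final int ratio;       // short-term samples per long-term sample
    private final boolean counter; // true: sum deltas, false: keep last snapshot
    private long pendingSum;       // counter deltas not yet written to longTerm
    private int pendingCount;      // samples seen since the last long-term write

    TwoTierHistory(int shortCapacity, int longCapacity, int ratio, boolean counter) {
        this.shortTerm = new SampleRing(shortCapacity);
        this.longTerm = new SampleRing(longCapacity);
        this.ratio = ratio;
        this.counter = counter;
    }

    void record(long v) {
        shortTerm.add(v);
        pendingSum += v;
        if (++pendingCount == ratio) {
            // Snapshots: only this sample survives. Counters: the sum survives.
            longTerm.add(counter ? pendingSum : v);
            pendingSum = 0;
            pendingCount = 0;
        }
    }
}
```

Preserving peaks, as asked about earlier in the thread, would mean tracking a running minimum and maximum alongside `pendingSum` and storing them with each long-term entry, at the cost of more memory per sample.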

On 2018-11-15, 17:42, "serviceability-dev on behalf of Roger Riggs" <serviceability-dev-***@openjdk.java.net on behalf of ***@oracle.com> wrote:

Hi,

This looks like it has significant overlap with JFR.
I don't think we want to start building in multiple mechanisms to keep
tabs on a running VM.

$.02, Roger

On 11/14/2018 04:27 PM, Thomas Stüfe wrote:
> Hi Bernd,
>
> On Wed, Nov 14, 2018 at 10:07 PM Bernd Eckenfels <***@zusammenkunft.net> wrote:
>> Looks good Thomas,
> thanks!
>
>> what would be the typical memory usage with the Default Settings?
> ~ 80 KB. It's very small.
>
>> Does the downsampling support min/max style rollups?
> Not sure what you mean. Do you mean does it preserve peaks? Not yet,
> such a feature would have to be added.
>
> Right now, downsampling is very primitive for performance reasons. For
> snapshot values like heap size etc we just throw away the samples, so
> you lose temporary peaks. Counter-like values-over-time (e.g.
> number of pages swapped in etc) then simply refer to a larger time
> span.
>
> Best Regards, Thomas
>