Thank you Mario. I will take a closer look at JFR.
On Wed, Nov 21, 2018 at 9:19 AM Mario Torre <***@redhat.com> wrote:
>
> I didn't have time to read the patch fully, but just as a quick look,
> it seems like you can create JFR events to log the same information,
> JFR will take care of all the infrastructure.
>
> So for example, every time you need to call set_value_in_record, this
> would instead be a JFR event. Again, I only skimmed through the patch
> very quickly, but it looks like JFR is also more generic: you only need
> to define the event and commit it, and the framework does not need to
> know the events in advance.
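Here is a rough sketch of what "define the event and commit it" looks like with the `jdk.jfr` API, which is open source since JDK 11. The event name `demo.KeyValueSample`, its two fields, and the helper methods are hypothetical illustrations. They are not part of the proposed patch or of JFR's built-in event set. The temporary recording only exists to show that committed events can be read back.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

import jdk.jfr.Category;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

public class KeyValueSampleDemo {

    // Hypothetical event type: one sample of a few key values.
    @Name("demo.KeyValueSample")
    @Label("Key Value Sample")
    @Category("Demo")
    static class KeyValueSample extends Event {
        @Label("Heap Used")
        long heapUsed;

        @Label("Active Threads")
        int activeThreads;
    }

    // Take one sample; commit() only writes it if the event is enabled in a running recording.
    static void sampleOnce() {
        KeyValueSample event = new KeyValueSample();
        Runtime rt = Runtime.getRuntime();
        event.heapUsed = rt.totalMemory() - rt.freeMemory();
        event.activeThreads = Thread.activeCount();
        event.commit();
    }

    // Record n samples into a temporary .jfr file and read them back.
    static List<RecordedEvent> recordSamples(int n) {
        try {
            Path file = Files.createTempFile("keyvalues", ".jfr");
            try (Recording recording = new Recording()) {
                recording.enable(KeyValueSample.class);
                recording.start();
                for (int i = 0; i < n; i++) {
                    sampleOnce();
                }
                recording.stop();
                recording.dump(file);
            }
            List<RecordedEvent> events = RecordingFile.readAllEvents(file);
            Files.deleteIfExists(file);
            return events.stream()
                    .filter(ev -> ev.getEventType().getName().equals("demo.KeyValueSample"))
                    .collect(Collectors.toList());
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        for (RecordedEvent ev : recordSamples(3)) {
            System.out.println("heapUsed=" + ev.getLong("heapUsed")
                    + " activeThreads=" + ev.getInt("activeThreads"));
        }
    }
}
```

For an always-on setup, the sampling call would more naturally be registered with `FlightRecorder.addPeriodicEvent`, together with a `@Period` annotation on the event class, so that JFR drives it at a fixed interval. The explicit loop in `recordSamples` just keeps the example deterministic.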
>
> I do believe that JFR already has events for some of the stuff you are
> logging, btw.
>
> Cheers,
> Mario
> On Wed, Nov 21, 2018 at 9:07 AM Thomas Stüfe <***@gmail.com> wrote:
> >
> > Hi all,
> >
> > (I combine my replies here, since most of your feedback was similar).
> >
> > Thank you all for the feedback!
> >
> > If I understand you correctly, the consensus is that you do not wish
> > to introduce another data collection backend beside JFR, and that,
> > should JFR lack features we miss, we would rather improve JFR than
> > add a second solution.
> >
> > That makes sense. I understand this, and yes, I agree. So I withdraw
> > my proposal.
> >
> > --
> >
> > But I still would like to be able to do what I can do with my patch.
> > Since I withdraw it, I am curious how a feature like that would be
> > implemented in terms of JFR. Or whether JFR can do the same things out
> > of the box already. What I am looking for is:
> >
> > 1 Monitoring key values as described in the proposal (see the linked
> > example printouts), covering long time periods, with at least the
> > short-term periods in high resolution (which means some sort of
> > automatic downsampling to keep memory cost low)
> >
> > 2 Being able to leave that statistic always on. This part is really
> > important: our statistical history being on-by-default made a big
> > difference to our support, saving a lot of back-and-forth between
> > customers and us, and thus a lot of time and headache. In order to be
> > always-on, a replacement implementation should be really cheap and
> > robust. Note that we found our statistical history especially useful
> > in low-memory/cpu situations (e.g. in containers) - but there,
> > pressure to switch every non-essential feature off to shave off a bit
> > of memory cost is high. Being cheap really helps with these arguments.
> >
> > 3 Very often we used our statistic during post-mortem analysis. It
> > gets dumped as part of the hs-err file in case of an error. For this
> > to work, the monitoring must be robust and should have as few
> > dependencies on the VM as possible, to avoid circular errors in
> > error handling. It should also avoid dynamic memory allocation and
> > allocate its memory upfront, to harden it against native OOMs.
> >
> > Note that (3) is a bit of a stretch goal. Solutions which are not that
> > robust during error reporting can still be useful in many cases, but
> > now and then you will hit "Error during error reporting" instead of
> > getting the historical data.
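For points 1 and 2, the closest thing JFR offers out of the box is probably a long-running, disk-backed recording with an age and size limit. The sketch below configures one through the `jdk.jfr.Recording` API. The recording name, the 100 MB cap, and the particular built-in events chosen (`jdk.CPULoad`, `jdk.ThreadStatistics`, `jdk.GCHeapSummary`) are illustrative assumptions, not settings proposed anywhere in this thread.

```java
import java.time.Duration;

import jdk.jfr.Recording;

public class AlwaysOnRecordingDemo {

    // Build (but do not start) a long-lived, disk-backed recording.
    static Recording configure() {
        Recording recording = new Recording();
        recording.setName("always-on-history");      // hypothetical name
        recording.setToDisk(true);                    // keep data in the disk repository
        recording.setMaxAge(Duration.ofDays(10));     // discard data older than the 10-day window
        recording.setMaxSize(100L * 1024 * 1024);     // hypothetical 100 MB cap on retained data
        // Built-in JFR events that cover some of the same ground as the proposal.
        recording.enable("jdk.CPULoad").withPeriod(Duration.ofSeconds(15));
        recording.enable("jdk.ThreadStatistics").withPeriod(Duration.ofSeconds(15));
        recording.enable("jdk.GCHeapSummary");
        return recording;
    }

    public static void main(String[] args) {
        Recording recording = configure();
        recording.start();
        System.out.println("Started " + recording.getName() + ", max age " + recording.getMaxAge());
        // ... the application runs; the recording keeps a rolling window until it is closed.
        recording.close();
    }
}
```

Unlike the proposed statistical history, this configuration ages data out by time and size rather than downsampling it, so on its own it does not give the high-/low-resolution split asked for in point 1, nor does it address the hs-err robustness of point 3.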
> >
> > But Erik indicated that JFR is routinely used in post mortem analysis
> > at Oracle. So maybe all my points are already fulfilled by the
> > existing implementation? If not, would it be possible to adapt JFR to
> > make such a statistic possible? I'm willing to help if JFR is the way
> > to go.
> >
> > --
> >
> > Note that I still think that there is some value in my proposed patch:
> > for older releases.
> >
> > There, JFR/JMC does not exist, so this history feature could be really
> > useful. I was actually hoping to downport this feature to older
> > releases once it had hit JDK 12 mainline. But since we have now decided
> > not to upstream it, that door is closed.
> >
> > So, in order to preserve this possibility at least for other downstream
> > OpenJDK maintainers, I have put these patches (based on 8u/11u) up:
> > https://github.com/tstuefe/ojdk-stathist-patch . Maybe they are still
> > useful to someone.
> >
> > Thank you all, and Best Regards,
> >
> > Thomas
> > On Thu, Nov 15, 2018 at 7:56 PM Mario Torre <***@redhat.com> wrote:
> > >
> > > I agree with the others, and Flight Recorder is actually open sourced so the restrictions you mentioned don’t apply anymore since Java 11.
> > >
> > > That said, I want to study the proposal more, there may be something worth exploring that may be integrated in the current infrastructure.
> > >
> > > Cheers,
> > > Mario
> > >
> > > —
> > > Mario Torre
> > > Associate Manager, Software Engineering
> > > Red Hat GmbH
> > > 9704 A60C B4BE A8B8 0F30 9205 5D7E 4952 3F65 7898
> > >
> > > ________________________________
> > > From: serviceability-dev <serviceability-dev-***@openjdk.java.net> on behalf of Simon Roberts <***@dancingcloudservices.com>
> > > Sent: Thursday, November 15, 2018 18:10
> > > To: ***@oracle.com
> > > Cc: serviceability-***@openjdk.java.net
> > > Subject: Re: Proposal: Always-on Statistical History
> > >
> > > I don't begin to claim to know the politics, legalities, boundaries of JFR license conditions, and so forth, but:
> > >
> > > "Java Flight Recorder requires a commercial license for use in production."
> > >
> > > Whereas this, as I understand it, is the *open* JDK list. So I, for one, would feel hard done by if your view prevailed and only the paying clients got access to a valuable feature.
> > >
> > >
> > > On Thu, Nov 15, 2018 at 9:40 AM Roger Riggs <***@oracle.com> wrote:
> > >>
> > >> Hi,
> > >>
> > >> This looks like it has significant overlap with JFR.
> > >> I don't think we want to start building in multiple mechanisms to keep
> > >> tabs on a running VM.
> > >>
> > >> $.02, Roger
> > >>
> > >>
> > >> On 11/14/2018 04:27 PM, Thomas Stüfe wrote:
> > >> > Hi Bernd,
> > >> >
> > >> > On Wed, Nov 14, 2018 at 10:07 PM Bernd Eckenfels <***@zusammenkunft.net> wrote:
> > >> >> Looks good Thomas,
> > >> > thanks!
> > >> >
> > >> >> what would be the typical memory usage with the Default Settings?
> > >> > ~80 KB. It's very small.
> > >> >
> > >> >> Does the downsampling support min/max style rollups?
> > >> > Not sure what you mean. Do you mean does it preserve peaks? Not yet,
> > >> > such a feature would have to be added.
> > >> >
> > >> > Right now, downsampling is very primitive for performance reasons. For
> > >> > snapshot values like heap size etc. we just throw away the samples, so
> > >> > you lose temporary peaks. For counter-like values over time (e.g.
> > >> > number of pages swapped in etc.), the values then just refer to a
> > >> > larger time span.
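To make the difference concrete, here is a small sketch of the primitive downsampling described above (keep every n-th sample, drop the rest), plus a max-per-group rollup of the kind Bernd asks about. It assumes counters are stored as cumulative readings; the class and method names are hypothetical, the rollup is not part of the patch, and the sample arrays are toy values.

```java
import java.util.Arrays;

public class DownsampleDemo {

    // Primitive downsampling: keep the first sample of every group of `factor`, drop the rest.
    static long[] keepEveryNth(long[] samples, int factor) {
        long[] out = new long[(samples.length + factor - 1) / factor];
        for (int i = 0; i < out.length; i++) {
            out[i] = samples[i * factor];
        }
        return out;
    }

    // Differences between consecutive readings of a cumulative counter.
    static long[] deltas(long[] readings) {
        long[] out = new long[Math.max(0, readings.length - 1)];
        for (int i = 1; i < readings.length; i++) {
            out[i - 1] = readings[i] - readings[i - 1];
        }
        return out;
    }

    // Hypothetical max rollup (not in the patch): keep the peak of each full group.
    static long[] maxPerGroup(long[] samples, int factor) {
        long[] out = new long[samples.length / factor];
        for (int i = 0; i < out.length; i++) {
            long max = Long.MIN_VALUE;
            for (int j = i * factor; j < (i + 1) * factor; j++) {
                max = Math.max(max, samples[j]);
            }
            out[i] = max;
        }
        return out;
    }

    public static void main(String[] args) {
        long[] heapMb = {100, 900, 100, 100, 100, 100};   // snapshot value with a short peak
        long[] swappedIn = {0, 5, 12, 12, 30, 31, 40};    // cumulative counter readings

        System.out.println(Arrays.toString(keepEveryNth(heapMb, 3)));             // [100, 100]: peak lost
        System.out.println(Arrays.toString(maxPerGroup(heapMb, 3)));              // [900, 100]: peak kept
        System.out.println(Arrays.toString(deltas(keepEveryNth(swappedIn, 3)))); // [12, 28]: same total, coarser spans
    }
}
```

Dropping samples is harmless for cumulative counters, because the difference between two kept readings still accounts for everything in between. For snapshot values it silently loses short peaks. A min/max rollup avoids that at the cost of storing two numbers per slot instead of one.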
> > >> >
> > >> > Best Regards, Thomas
> > >> >
> > >> >>
> > >> >>
> > >> >> --
> > >> >> http://bernd.eckenfels.net
> > >> >>
> > >> >> From: Thomas Stüfe
> > >> >> Sent: Wednesday, November 14, 2018 16:29
> > >> >> To: serviceability-***@openjdk.java.net serviceability-***@openjdk.java.net
> > >> >> Subject: Proposal: Always-on Statistical History
> > >> >>
> > >> >> Hi all,
> > >> >>
> > >> >> We have a feature in our port which we would like to contribute,
> > >> >> and I would like to gauge opinions.
> > >> >>
> > >> >> First off, I am not sure which list is correct. This is more of a
> > >> >> serviceability issue, but implementation-wise it fits hs-runtime
> > >> >> better. I'll start with serviceability, but feel free to crosspost
> > >> >> if needed.
> > >> >>
> > >> >> Second, I am aware that this may require a JEP. If necessary and
> > >> >> the feedback is positive, I will draft one.
> > >> >>
> > >> >> ----
> > >> >>
> > >> >> In our port we have something called "Statistics History". Basically,
> > >> >> this is a rolling history, spanning up to 10 days, of a number of key
> > >> >> values. Key values range from JVM specifics like heap size, metaspace
> > >> >> size, number of threads etc., to platform specifics like memory
> > >> >> footprint, CPU load, I/O and swapping activity etc.
> > >> >>
> > >> >> A periodic task collects those values, by default in 15-second
> > >> >> intervals. They are then fed into a FIFO which spans 10 days. To save
> > >> >> memory, that FIFO is downsampled in two steps, so we have the last n
> > >> >> hours in high resolution and the last n days in low resolution (of
> > >> >> course, all these parameters are configurable).
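The FIFO described above can be pictured as a cascade of fixed-size ring buffers, one per resolution, all allocated up front so that adding a sample never allocates. For scale, keeping 10 days at full 15-second resolution would take 10 × 86,400 / 15 = 57,600 samples per value. The sketch below assumes each downsampling step simply promotes every k-th sample to the next, coarser tier. The class name, tier sizes, and factors are hypothetical, and the actual patch may organize this differently.

```java
import java.util.Arrays;

// Hypothetical multi-resolution history: tier 0 holds every sample, and each
// further tier receives every factors[t]-th sample written to tier t.
public class TieredHistory {
    private final long[][] rings;   // one preallocated ring buffer per tier
    private final int[] next;       // next write index per tier
    private final int[] size;       // number of valid entries per tier
    private final long[] received;  // samples written to each tier so far
    private final int[] factors;    // factors[t]: ratio between tier t and tier t + 1

    public TieredHistory(int[] capacities, int[] factors) {
        if (factors.length != capacities.length - 1) {
            throw new IllegalArgumentException("need one factor per downsampling step");
        }
        this.rings = new long[capacities.length][];
        for (int t = 0; t < capacities.length; t++) {
            rings[t] = new long[capacities[t]];
        }
        this.next = new int[capacities.length];
        this.size = new int[capacities.length];
        this.received = new long[capacities.length];
        this.factors = factors.clone();
    }

    // Called by the periodic sampler; only touches preallocated arrays.
    public void add(long value) {
        for (int t = 0; t < rings.length; t++) {
            long[] ring = rings[t];
            ring[next[t]] = value;
            next[t] = (next[t] + 1) % ring.length;
            if (size[t] < ring.length) {
                size[t]++;
            }
            received[t]++;
            // Promote to the next tier only on every factors[t]-th sample.
            if (t == rings.length - 1 || received[t] % factors[t] != 0) {
                return;
            }
        }
    }

    // Copy of one tier, oldest first (allocates; meant for the report path only).
    public long[] snapshot(int tier) {
        long[] ring = rings[tier];
        int n = size[tier];
        int start = (next[tier] - n + ring.length) % ring.length;
        long[] out = new long[n];
        for (int i = 0; i < n; i++) {
            out[i] = ring[(start + i) % ring.length];
        }
        return out;
    }

    public static void main(String[] args) {
        // Three tiers (two downsampling steps) of 4, 3 and 2 slots, with factors 2 and 3.
        TieredHistory history = new TieredHistory(new int[] {4, 3, 2}, new int[] {2, 3});
        for (long v = 1; v <= 10; v++) {
            history.add(v);
        }
        for (int t = 0; t < 3; t++) {
            System.out.println("tier " + t + ": " + Arrays.toString(history.snapshot(t)));
        }
    }
}
```

Keeping the hot path allocation-free mirrors the robustness goal stated later in the thread (allocate upfront, harden against native OOMs); only the reporting side copies data out.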
> > >> >>
> > >> >> The history report can be triggered via jcmd, and could also get
> > >> >> printed in the hs-err file (open for debate).
> > >> >>
> > >> >> ---
> > >> >>
> > >> >> Here are some examples of what the whole thing looks like:
> > >> >>
> > >> >> http://cr.openjdk.java.net/~stuefe/webrevs/stathist/examples/stathist-volker.txt
> > >> >>
> > >> >> http://cr.openjdk.java.net/~stuefe/webrevs/stathist/examples/stathist-s390x.txt
> > >> >>
> > >> >> ---
> > >> >>
> > >> >> This feature has been really popular with our support folks over the
> > >> >> years. Whether the VM is starved for resources by the OS, or we have
> > >> >> a slowly or quickly developing leak, etc.: these values are a quick
> > >> >> and easy way to get a first stab at a situation before we start more
> > >> >> expensive analysis.
> > >> >>
> > >> >> The explicit design goal of this history was to be very cheap: cheap
> > >> >> enough to be *always on* and be forgotten. It is, in our port,
> > >> >> enabled by default. That way, if a problem occurs at a customer site,
> > >> >> we immediately see developments spanning the last 10 days, without
> > >> >> having to reproduce the issue.
> > >> >>
> > >> >> It is also robust enough to be usable during error reporting without
> > >> >> endangering the error reporting process or falsifying the picture.
> > >> >>
> > >> >> I am aware that this crosses over into JFR territory. But this feature
> > >> >> does not attempt to replace JFR; instead, it is intended as a cheap,
> > >> >> always-on, first-stop historical overview.
> > >> >>
> > >> >> --
> > >> >>
> > >> >> I have a patch which can be applied atop jdk12:
> > >> >>
> > >> >> http://cr.openjdk.java.net/~stuefe/webrevs/stathist/stathist.patch
> > >> >>
> > >> >> It works, passes our nightlies, and shows no regressions in the DaCapo
> > >> >> benchmarks.
> > >> >>
> > >> >> Please tell me what you think. Given enough interest, I will attempt
> > >> >> to contribute it (drafting a JEP if necessary).
> > >> >>
> > >> >> Thanks and Kind Regards,
> > >> >>
> > >> >> Thomas
> > >> >>
> > >>
> > >
> > >
> > > --
> > > Simon Roberts
> > > (303) 249 3613
> > >
>