This repository contains a set of obfuscated sample datasets and example analysis notebooks for quickly getting started with exploring the operations data of the Operate First JupyterHub deployment. We are also updating the notebooks to make it easy for users to pull additional data themselves.
The data currently provided contains metrics, logs, and event records from the JupyterHub application on the Operate First cluster at the Massachusetts Open Cloud (MOC). This is an open cloud environment with reproducibility built in, operated by a community whose goal is to run software in a production-grade environment.
The metrics dataset consists of five separate time series, each stored in its own CSV file. Each metric contains around one day's worth of data, sampled every few minutes.
The logs dataset includes roughly one day of infrastructure logs stored as a single JSON file.
The events dataset was collected by a cluster admin and contains cluster-wide events for the JupyterHub application over one day.
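
As a starting point outside the notebooks, the files described above could be loaded with the Python standard library alone. The following is a minimal sketch, not the method used in the notebooks: the helper names `load_metric_csv` and `load_json_records` are hypothetical, the paths in the comments are placeholders, and it assumes that each metric CSV has a header row and that the logs file is a single JSON document rather than one JSON object per line.

```python
import csv
import json
from pathlib import Path


def load_metric_csv(path):
    """Read one metric CSV into a list of dicts keyed by the header row.

    Assumption: the file has a header row. Values are returned as strings.
    """
    with Path(path).open(newline="") as f:
        return list(csv.DictReader(f))


def load_json_records(path):
    """Read a JSON file and always return a list of records.

    Assumption: the file is one JSON document. A top-level object is
    wrapped in a list so callers can iterate either way.
    """
    with Path(path).open() as f:
        data = json.load(f)
    return data if isinstance(data, list) else [data]


# Hypothetical usage; replace the placeholder paths with real files
# from this repository:
#   rows = load_metric_csv("path/to/one_metric.csv")
#   logs = load_json_records("path/to/logs.json")
#   print(len(rows), "samples;", len(logs), "log records")
```

Because the column names and JSON layout are not documented here, inspect the first row or record (for example `rows[0].keys()`) before writing any analysis against specific fields.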