Are we retaining and amassing/accumulating all Telemetry json files from past runs? #265
Yes, it appears that we're not wiping out the workspace before every build. I can't locate our job DSL job in Jenkins, which we run to (re)generate the relevant jobs. We should just need to update it, but as I said, I can't locate it.
If it's been deleted, that would be unfortunate. I think it was named wpt.jobdsl or similar. Perhaps we can look on the file system and recover it?
Another option would be to delete any telemetry JSON files before we run the tests. Without the job DSL job, though, we won't be able to tweak the top 50 and regenerate the jobs.
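A minimal sketch of that pre-run cleanup, assuming the telemetry files match the `wpt-telemetry*.json` pattern seen in the workspace and sit at the top level of the workspace directory. The function name and its `workspace` argument are hypothetical, not part of the existing jobs:

```python
from pathlib import Path


def delete_stale_telemetry(workspace, pattern="wpt-telemetry*.json"):
    """Remove telemetry JSON files left over from earlier runs.

    Hypothetical helper: only files matching `pattern` directly inside
    `workspace` are deleted; everything else (e.g. the Dockerfile) is kept.
    Returns the sorted names of the deleted files so the build log can
    record what was cleaned up.
    """
    removed = []
    for path in Path(workspace).glob(pattern):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

It would be called as the first build step, before the tests run, e.g. `delete_stale_telemetry(os.environ["WORKSPACE"])` using the workspace path Jenkins exposes in the `WORKSPACE` environment variable. Deleting only the matching files, rather than the whole workspace, avoids losing other checked-in files.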
@davehunt thanks; I don't think I'd have deleted that job, but perhaps I (stupidly) did, or perhaps a forced restart of the server restored the jobs from an earlier backup? I dunno. I looked tonight but will need to look again tomorrow, perhaps with @oremj's help.
I've restored this using the Job Config History Plugin (!):
Sigh. Thanks for the help so far; I'll need further help sorting this out. Looks like calling https://qa-preprod-master.fxtest.jenkins.stage.mozaws.net/job/wpt/job/twitch/872/artifact/ I'm attaching a post-merge screenshot of Twitch's latest run, with a bazillion wpt-telemetry*.json files lying around in its workspace :-( The likely issue: we don't appear to be cleaning up past files. That might be because Firefox crashes, so the wpt build fails at a step that has no cleanup, or whose cleanup is of the wrong type or level. (So, look into failed(), success(), always(), Pipeline-stage robustness, etc., as well as selective cleanup before SCM checkout.)
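To illustrate the "always clean up, even when the run fails" idea (the intent of a Pipeline `post { always { ... } }` block), here is a minimal Python sketch using `try`/`finally`. The `run_tests` callable, the function name, and the `wpt-telemetry*.json` pattern are assumptions for illustration, not the jobs' actual code:

```python
from pathlib import Path


def run_with_cleanup(run_tests, workspace, pattern="wpt-telemetry*.json"):
    """Run the test step, then delete telemetry files even if it raises.

    Hypothetical wrapper: the `finally` clause executes whether
    `run_tests` returns normally or raises (e.g. after a browser crash
    surfaces as an exception), so no telemetry files are left behind.
    """
    try:
        return run_tests()
    finally:
        for path in Path(workspace).glob(pattern):
            if path.is_file():
                path.unlink()
```

One caveat: a process that is killed outright never reaches its `finally` clause, which is an argument for also deleting stale files at the start of each run.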
It looks like there's a
We're using Jenkins 2.73.3.1, which was released on 2017-11-08 (!).
@stephendonner originally I thought we could modify the wpt-job-dsl job to wipe out the workspace and force a clone, instead of using the
@davehunt yup; for starters, one of my runs (which deleted the workspace at the beginning, before SCM checkout) ended up missing the Dockerfile from that workspace, as you observed. I've had #206 open, and it seems like "soon" is a good time to pay proper attention to the status codes the API returns. In the case of these crashes (Vimeo, Twitch, Yandex), we should, at the least:
Note to self: we should consult both https://sites.google.com/a/webpagetest.org/docs/advanced-features/webpagetest-restful-apis and Marcel's NodeJS wrapper to see what fits our needs: https://github.com/marcelduran/webpagetest-api.
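As a starting point for handling those status codes, here is a minimal sketch that classifies a WebPageTest test-status response. It assumes the parsed JSON body carries an integer `statusCode` field where 1xx means the test is still queued or running, 200 means it completed, and 4xx (or anything unexpected) means failure; that mapping should be checked against the RESTful API docs linked above before relying on it. The function name is hypothetical:

```python
def classify_wpt_status(response):
    """Map a parsed WebPageTest status response to a build decision.

    Assumed convention: 1xx = still in progress, 200 = complete,
    anything else (including a missing or non-integer code) = failed.
    Returns one of "pending", "complete", or "failed".
    """
    code = response.get("statusCode")
    if not isinstance(code, int):
        return "failed"
    if 100 <= code < 200:
        return "pending"
    if code == 200:
        return "complete"
    return "failed"
```

Treating missing or unknown codes as failures means an unexpected response stops the build and gets reported, instead of being silently treated as success.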
Looking at https://qa-preprod-master.fxtest.jenkins.stage.mozaws.net/job/wpt/job/vimeo/601/console, are we really keeping all these files? (Found while investigating the data loss over in #264.)
If we are indeed keeping these, I suspect we might be a big cause of some of the qa-preprod out-of-disk-space issues (which I've usually blamed on Docker).
/cc @davehunt
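One way to check whether these files really contribute to the disk-space problems is to count them and total their size. A minimal sketch, assuming the files match `wpt-telemetry*.json` and that `root` is a directory containing the job workspaces on qa-preprod (its exact location is an assumption, as is the function name):

```python
from pathlib import Path


def telemetry_usage(root, pattern="wpt-telemetry*.json"):
    """Return (file_count, total_bytes) for telemetry files under `root`.

    Hypothetical helper: searches recursively, so it can be pointed at a
    parent directory holding every job's workspace. Files not matching
    `pattern` are ignored.
    """
    count = 0
    total = 0
    for path in Path(root).rglob(pattern):
        if path.is_file():
            count += 1
            total += path.stat().st_size
    return count, total
```

Running it before and after a cleanup would show how much space the accumulated files were actually taking.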