Python 3 base dockerfile 1.3.0
themarcelor released this 12 Aug 20:02
uc-cdis/cloud-automation
New Features
- RDS cluster autoscaling can now be enabled in terraform by just setting the
variables to their desired value (#1362)
- Added default encryption to rds databases. Also changed default size to
t2.small because encryption is not available for t2.micro instances. (#1346)
- make mariner available for dev and qa testing (#1352)
- Move data replicate jobs from
https://github.com/uc-cdis/dcf-datareplicate/jobs to cloud-automation
(#1335)
- Ignore changes if the data-upload-bucket has cors_rules (#1331)
- Added option to assume role and refactored code (#1327)
- Add auspice service's .yaml file (#1319)
- For CSOC attached commons, logs will now be sent to logDNA (#1324)
- Added another run option that lets Jenkins get an output file with
information about the run (#1317)
- Added cookbook to manage adminvm (#1288)
- Added wildcard *.chef.io to squid whitelist (#1295)
- Added bucket replication job that uses aws batch (#1294)
- Whitelist *.census.org (#1280)
- Added netpolicy rule for sowerjobs to reach revproxy and utilize internal
routing (#1266)
- aws batch job for bucket manifest generating tool (#1219)
- COVID19 ETL jobs: add "S3_BUCKET" optional configuration variable + handle
underscores in job names (#1252)
- .adfs.federation.va.gov whitelisted (#1248)
- cognito integration for SAML authentication. (#1247)
- added mran.microsoft.com to the whitelist (#1241)
- Selenium Hub (#1232)
- kube-setup-seleniumhub script is TBD. (#1232)
- Azure terraform modules. (#1226)
- Added job (#1217)
- Added option to replicate from different source account than adminvm (#1217)
- new kube-setup-sower-jobs command that sets up S3 bucket, service account,
and fine-grained IAM controls for sower jobs (#1224)
- Added uwsgi timeout optional param to extend read-timeout for fence (#1120)
- AdminVM module off utility VM (utility_admin) (#1208)
- Remove old & unused jobs for covid19 etl (#1207)
- Improve running new jobs for covid19 etl: now they will have unique names
(#1207)
- gen3 util for creating aws lambda function (#1189)
- gen3 awslambda create funcname description role_arn (#1189)
- New Ansible playbook that adds a cronjob for the commons user to check
terraform resources daily and alert if there are changes outside the
template. It also alerts if there are uncommitted changes in the local
cloud-automation repo. (#1194)
- Created bucket replicate script (#1186)
- You can now choose the version you want the ElasticSearch cluster to be
deployed on. (#1183)
- Notebook ETL job (#1178)
- Doc update (#1181)
- Remove PR template, cloud-automation will use the organization one (#1179)
- Migrated non-sensitive, externally helpful docs from cdis-wiki (#1154)
- Added www.dph.illinois.gov to Squid whitelist (#1166)
- Add new kubernetes job, the data-ingestion-job, which is specific to
DataSTAGE. (#1012)
- ETL job for Illinois Department of Public Health data (#1162)
- Ability to deploy k8s workers on a /22 subnet, allowing more workers and
pods in the cluster. (#1152)
- Add COVID-19 ETL job (#1150)
- Added keys for new bdcat cluster to squid (#1140)
- get hostname to indexd for DRS field self_uri (#1133)
- Added script to update ebs volumes (#1130)
- Run WTS DB migration during "kube-setup-wts" (#1128)
- Add empty "external_oidc" field to WTS configuration file (#1128)
- gen3 squid info to get information about the HA-proxy instances (#1137)
- gen3 workers-cycle to cycle a node or all nodes (#1126)
- Switch proxy: lets the standby instance become the active one; if the
cluster has more than two instances, a single one (different from the
current instance) is picked as active. (#1125)
- RDS module now creates an Option Group by default that you assign to the
instance for backing up against s3 (#1119)
- gen3 secrets rotate postgres indexd|sheepdog|fence (#1114)
- kube-dev-namespace sets up new db users for indexd, sheepdog, and fence
db's (#1114)
- added fence ssh keys from internalanvil to squid (#1115)
- Setup sower job for indexd_utils (#1066)
- AWS inspec implementation for the security team. (#1112)
- added qa-dcf key to squid (#1109)
- metadata service automation (#1087)
- Remediate CIS issues with Amazon Linux workers (#1094)
- Single squid instance type is a variable. (#1092)
- HA squid (#1046)
- add OWASP rules to default modsecurity configuration (#1082)
- ability to run gen3 commands remotely using adminVMs as proxy (#1072)
- EX: (#1072)
- ssh cdistest.csoc -C "~/cloud-automation/files/script/remote-gen3.sh kube-setup-revproxy" (#1072)
- ansible a-hosts -m shell -a "cloud-automation/files/script/remote-gen3.sh kube-setup-revproxy" (#1072)
- implement gen3 cmd for creating gs bucket for data refresh (#1060)
- Networkpolicy fixes from VA: Kubernetes YAML syntax fix (#1049)
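To make the /22 subnet item above (#1152) concrete, the sketch below compares address counts using Python's standard ipaddress module. The 10.0.0.0 prefix and the /24 baseline are assumptions chosen only for comparison; these notes do not state the previous subnet size.

```python
import ipaddress


def address_count(cidr: str) -> int:
    """Return the total number of addresses in a CIDR block."""
    return ipaddress.ip_network(cidr).num_addresses


# Hypothetical prefixes, used only to compare block sizes.
small = address_count("10.0.0.0/24")  # assumed baseline, not from the notes
large = address_count("10.0.0.0/22")  # the size mentioned in #1152

print(small, large, large // small)
```

The usable count in a real VPC subnet is slightly lower, since the cloud provider reserves a few addresses in each subnet.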
Dependency Updates
Deployment Changes
- (#1362)
- (#1354)
- (#1353)
- (#1352)
- (#1350)
- (#1351)
- (#1345)
- (#1347)
- (#1337)
- (#1341)
- (#1334)
- (#1331)
- (#1332)
- (#1330)
- (#1327)
- (#1328)
- (#1325)
- (#1324)
- (#1318)
- (#1317)
- (#1316)
- (#1298)
- (#1309)
- (#1297)
- (#1288)
- (#1293)
- (#1292)
- (#1287)
- (#1280)
- (#1277)
- (#1274)
- (#1267)
- (#1269)
- (#1266)
- (#1248)
- (#1247)
- (#1243)
- (#1245)
- (#1241)
- (#1239)
- (#1240)
- (#1238)
- The selenium side car defined in the jenkins deployment YAML should,
eventually, be decommissioned. (#1232)
- (#1236)
- (#1228)
- (#1235)
- (#1234)
- (#1233)
- (#1231)
- (#1230)
- (#1229)
- (#1226)
- (#1217)
- (#1227)
- (#1225)
- (#1224)
- (#1221)
- (#1222)
- (#1215)
- (#1211)
- (#1208)
- (#1209)
- (#1207)
- To improve logging, $JOB_NAME can be added to log filename (#1207)
- 0 0 * * * (if [ -f $HOME/cloud-automation/files/scripts/covid19-etl-job.sh ]; then JOB_NAME=<JOB_NAME> bash $HOME/cloud-automation/files/scripts/covid19-etl-job.sh; else echo "no covid19-etl-job.sh"; fi) > $HOME/covid19-etl-$JOB_NAME-job.log 2>&1 (#1207)
- (#1204)
- (#1189)
- (#1197)
- (#1195)
- (#1194)
- (#1186)
- (#1191)
- (#1184)
- (#1187)
- (#1183)
- (#1181)
- (#1180)
- All ETLs now have the same image and entrypoint and depend on the
environment variable (#1168)
- Require changing the manifest version section from: (#1163)
- "covid19-etl": "quay.io/cdis/covid19-etl:1.0.2", (#1163)
- to: (#1163)
- "covid19-jhu-etl": "quay.io/cdis/covid19-etl:1.0.2", (#1163)
- For the ETL to run, a new version entry is required: (#1150)
- "covid19-etl": "quay.io/cdis/covid19-etl:latest" (#1150)
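The manifest change described in #1163 amounts to renaming one key. A minimal sketch follows, assuming the manifest is JSON with a top-level "versions" object (an assumption; the notes only say "manifest version section"). The helper name rename_version_key is hypothetical and is not part of cloud-automation.

```python
import json


def rename_version_key(manifest: dict, old: str, new: str) -> dict:
    """Return a copy of the manifest with one key in "versions" renamed.

    Assumes a top-level "versions" mapping; that layout is an assumption.
    """
    versions = dict(manifest.get("versions", {}))
    if old in versions:
        versions[new] = versions.pop(old)
    return {**manifest, "versions": versions}


# Example data taken from the entries quoted in #1163.
before = {"versions": {"covid19-etl": "quay.io/cdis/covid19-etl:1.0.2"}}
after = rename_version_key(before, "covid19-etl", "covid19-jhu-etl")
print(json.dumps(after, indent=2))
```

The function copies the mapping instead of editing it in place, so the original manifest stays available for comparison.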
Breaking Changes
- The codecept.conf.js script in gen3-qa needs to be adjusted
accordingly. (#1232)
Improvements
- enable sa-iam-role stuff by default in terraform eks (#1360)
- bump k8s deployments to apps/v1 apiVersion (#1360)
- basic workspace-parent account landing page, and couple hooks to bypass
portal when portal_app set to GEN3-WORKSPACE-PARENT (#1360)
- gen3 logs history byuser command to get list of top 100 users over some
time range (#1360)
- revproxy tweak handling of Strict-Transport-Security header - there's a
covid19 security scan jira someplace (#1360)
- apply cloud-automation filter for image tag for mariner deployment (#1354)
- add mariner to revproxy (#1352)
- update mariner deployment with awsusercreds secret and jwks endpoint for
fence (#1352)
- delete old unused mariner config json (#1352)
- Fluentd 1.10.2 support for logging. (#1337)
- log streams are now named after the pod they are running on. (#1337)
- kubernetes workers logs (auth, message, docker) are now sent over to
CloudWatchLogGroups (#1337)
- kube-system namespace logs are now skipped, not sent to CloudWatchLogGroups
(#1337)
- Added descriptors to terraform files to allow infrastructure to be easily
identified/cleaned up. Also added clean all to cleanup all tagged
infrastructure. (#1341)
- Setup ssjdispatcher and its jobs with IAM-linked service accounts. (#1339)
- Auto-create ssjdispatcher upload bucket, sns, and sqs. (#1339)
- Simplify g3kubectl to work with new aws-iam-authenticator - things like
watch kubectl get pods or echo bla | xargs kubectl will work now (#1339)
- add visitdata.org to squid whitelist to allow fetching additional mobility
data for covid-19 bayes model (#1334)
- add gstatic.com to squid_wildcard_whitelist to allow fetching google
mobility data (#1332)
- increase resources allocated to fenceshib deployment (#1328)
- switch from old heptio-authenticator-aws to recent aws-iam-authenticator
(#1325)
- configure aws ntp on ec2 admin vm's:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html (#1322)
- documentation update (#1318)
- ansible playbook update (#1318)
- Only remove the terraform dir if gen3_terraform destroy succeeds (#1320)
- Better job status check (#1320)
- typo correction (#1316)
- new gen3 es garbage helper (#1310)
- new gen3 job cron helper (#1310)
- fix for gen3 jupyter idle (#1310)
- add backoffLimit to some batch jobs (#1310)
- increase memory allocation for nb-etl job so it doesn't get killed OOM by
k8s (#1298)
- run bayes model for IL-by-county twice as fast (#1293)
- gen3 roll all --fast (#1291)
- gen3 roll all calls out to gen3 dashboard gitops-sync (#1291)
- gen3 awsrole extended to support sa-linked roles (#1291)
- adminvm terraform to use ubuntu18 ami (#1291)
- gen3 api hostname, environment, namespace, and safe-name helpers (#1291)
- gitops-sync jobs extended with sa linked to an iam role - gen3 dashboard
gitops-sync should work (#1291)
- add sts vpc endpoint to eks terraform (#1291)
- Runs notebook etl in the morning (10am Chicago time, server time is in
UTC), instead of running it every 3 hours (#1292)
- Increase Guppy wait time for ES proxy to 250s (50 cycles) (#1290)
- schedule nb-etl pods to jupyter nodes so as to not compete with gen3
services for cluster resources (#1287)
- COVID19 "nb-etl": do not overwrite "command" and "args" from Dockerfile
(#1284)
- Create temp bucket for storing the output manifest (#1273)
- Support providing authz for merging (#1273)
- add gen3 shutdown namespace helper and cron jobs (#1272)
- add gen3 api hostname helper (#1272)
- patch kube-setup-aws-es-proxy to not rely on $vpc_name (#1272)
- patch gen3 ebs snapshot to copy volume tags to snapshot (#1272)
- add kube-setup-metadata to gen3 roll all (#1270)
- Add some helpers: (#1269)
- gen3 api sower-run commandFile.json $apiKey (#1269)
- gen3 api sower-template pfb (#1269)
- gen3 prometheus query $query $apiKey (#1269)
- gen3 prometheus list $apiKey (#1269)
- gen3 prometheus curl $urlBase $apiKey (#1269)
- introduce gen3 ecr helpers for interacting with ECR docker image
repositories. (#1265)
- Add rstudio.com to whitelist (#1264)
- Added .dph.illinois.gov to Squid whitelist, removed www.dph.illinois.gov
(#1259)
- COVID19 ETL: add ETL name to slack notification (#1256)
- klock handles strange lock owners better (#1255)
- setup batch jobs to reap idle hatchery app pods (12 hours of no traffic)
(#1254)
- Update IAM SA doc: SAs can be created using a policy ARN, not a policy name
(#1252)
- awshelper image updates - ubuntu18, aws-cli2 (#1246)
- deploy awshelper in Jenkins (#1246)
- ambassador expose metrics to prometheus (#1246)
- jupyter idle helper (#1246)
- logs in CWL now use the date as prefix (#1245)
- remove set -i from batch jobs - deprecated in bash 4.4 (#1244)
- update awshelper to package cloud-automation/ code package (#1244)
- introduce GEN3_AWSHELPER_IMAGE variable to simplify testing (#1244)
- trim bucket name down to 50 chars so there's room for prefixes down the
line (#1239)
- Support for the latest version of fluentd. (#1238)
- New fluentd configuration would also push worker nodes messages and secure
logs. (#1238)
- Tagging the logs in a simpler way for better visibility and manipulation
through cloudwatch. (#1238)
- Moving fluentd to its own namespace to clear out the kube-system namespace
(#1238)
- This should address the flakiness around Selenium/Webdriver operations:
(#1232)
- To get rid of errors such as (#1232)
- chrome not reachable (Session info: headless chrome=70.0.3538.77) (Driver
info: chromedriver=2.43.600233 (#1232)
- gen3 api helper works with api keys (#1236)
- kube-setup-sftp saves crypto keys to secrets folder (#1236)
- kube-setup-hatchery harvests configmaps (#1236)
- refactored code (#1217)
- update to new lts ubuntu version in awshelper dockerfile and other cleanup
(#1224)
- Run the squid auto module on its own if required at any point in time.
(#1221)
- Qualys VM module updated and simplified. (#1222)
- Hardcoded regions removed. (#1222)
- VPN server can now be deployed on ubuntu18. (#1215)
- Removed hardcoded references to planx. (#1215)
- Module can now be deployed on any account besides CSOC. (#1215)
- propagate exit code in usersync and etl jobs (#1205)
- roll-all non-zero exit code if cluster not healthy at end (#1198)
- reset non-zero exit code if roll-all fails (#1198)
- jenkins cronjob cleanup old builds (#1198)
- gen3 s3 retry tfplan/tfapply if necessary (#1198)
- add squid wildcard test (#1198)
- klock truncate owner to 45 characters (#1198)
- slack message for covid19-etl-*-job (#1199)
- documentation (#1196)
- Ansible hosts updated (#1197)
- Ansible hosts files, added new hosts, and switched to hostname instead of
IPs for a bunch of them. (#1194)
- Lambda function documentation. (#1191)
- install terraform12 (#1184)
- add gen3 es create index-name mapping.json (#1184)
- add gen3 secrets gcp ... for service key rotation (#1184)
- add gcp.md with instructions for GCP integration (#1184)
- update npm dependencies (npm audit fix) (#1184)
- patch gen3 s3 ... and gen3 aws* to not call gen3 trash - leave
terraform workspace in place (#1184)
- remove deprecated flag in gen3 devterm (#1184)
- Commons documentation updated. A few typos fixed. (#1187)
- Documentation (#1183)
- support gen3 workon profile whatever__data_bucket_queue (#1177)
- kubectl node drain commands now have the --force flag for draining and
deleting "Pods not managed by ReplicationController, ReplicaSet, Job,
DaemonSet or StatefulSet" like hatchery pods. (#1175)
- gen3 bootstrap subcommand (#1173)
- introduce gen3 logs cloudwatch ... subcommands (#1160)
- Restructure the ETL for COVID-19 Data Commons (#1168)
- by using aws autoscaling terminate-instance-in-auto-scaling-group instead
of aws ec2 stop-instances we are letting the ASG know that we want to keep
the desired state, so it tries to spin up a new instance right away. (#1169)
- aws-es-proxy new release ready (#1167)
- Improve the naming for jobs for COVID-19 Data Commons (#1163)
- Update the default Gen3 logo to new version (#1161)
- Network documentation. (#1158)
- Fix deployment step for COVID-19 adminVM cronjob (#1156)
- Commons module documentation. (#1152)
- Change comment (#1153)
- enable encryption in all storage classes (#1149)
- disable TLSv1 in old docker images (#1148)
- service_releases cookie http-only, secure (#1148)
- Documentation (#1143)
- expand hatchery config for dockstore apps (#1142)
- allow workspaces to communicate directly with manifest service and revproxy
(#1142)
- first cut at path-list to manifest dashboard app - ex:
https://reuben.planx-pla.net/dashboard/Public/paths-to-manifest/index.html
(#1135)
- first cut at links-page dashboard app - ex:
https://reuben.planx-pla.net/dashboard/Public/marcello20191126/data/index.html
(#1135)
- roll dashboard in roll-all (#1131)
- delete all pods in reset (#1131)
- revproxy modsec WAF rules pruned - can pass basic QA with enforcement
enabled, but enforcement not yet enabled (#1123)
- revproxy /_status endpoint and http liveness probe (#1123)
- revproxy nginx client_buffer_size to 16k, error log to warn level, set
fallback values on some log variables (#1123)
- some basic waf documentation (#1123)
- files/scripts/braincommons - for generating various dream challenge access
reports (#1118)
- allow access to port 3128 from the peering connection (#1113)
- update k8s scheduling resource requests and limits (#1099)
- Removed a few hardcoded tags (#1098)
- kube-setup-fence runs migrate job, so we can disable db migration in
fence-config-public.yaml in all our commons if we want to ... (#1097)
- disable migration in presigned-url deployment (#1097)
- The instance product of this module has now deploy_before_destroy
enabled. (#1092)
- gen3OnK8s.md docs - more details on setting up gen3 on a k8s cluster (#1089)
- kube-dev-namespace - setup secrets in Gen3Secrets/ (#1089)
- Update Jenkins to 2.219 with latest changes and security updates (#1083)
- Addressing PXP-5001 (#1074)
- kube-setup-autoscaler does not have version fully hardcoded. Versions can
now be passed along with script. (#1069)
- If no version is provided, it'll default to the ones preloaded. (#1069)
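The Guppy wait-time item above (#1290) implies 250 s / 50 cycles = 5 s per cycle. A polling loop of that shape might look like the sketch below. The names wait_for and is_ready are hypothetical, and this is not the actual Guppy or cloud-automation code.

```python
import time
from typing import Callable


def wait_for(is_ready: Callable[[], bool], cycles: int = 50,
             interval: float = 5.0,
             sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll is_ready up to `cycles` times, sleeping `interval` after each miss.

    Returns True as soon as is_ready() succeeds, False once all cycles fail.
    With the defaults, the worst case waits 50 * 5 = 250 seconds.
    """
    for _ in range(cycles):
        if is_ready():
            return True
        sleep(interval)
    return False


# Demo with a fake readiness check and a recording sleep, so it runs instantly.
calls = {"n": 0}


def ready_on_third_try() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


waited = []
print(wait_for(ready_on_third_try, sleep=waited.append), waited)
```

Passing the sleep function as a parameter keeps the loop testable without real delays.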
Bug Fixes
- Added wildcard .southsideweekly.com to allow grabbing the CHI-NBHD
dataset for PRC (#1347)
- Fixed terraform and updated bucket-manifest script (#1343)
- Use the right directory (#1340)
- Fix the container image names (#1336)
- fix the job definition name (#1326)
- Removed verify bucket access job from kube-setup-google because the job is
buggy and removes access it shouldn't (#1309)
- Replacing kubectl with g3kubectl (#1289)
- up resource limits on nb-etl job to prevent getting killed by k8s (#1287)
- fix handling of squid IP lookup for external network policy on admin vm
(#1278)
- use empty string instead of null (#1276)
- Added the anvil to squid whitelist (#1274)
- Truncates iam role, due to aws role character limit of 64 (#1267)
- Fix typos (#1260)
- use correct secret name for job config (#1243)
- Combine type logs wouldn't make it to CWL because of a typo. This has been
corrected. (#1245)
- policy has correct resources based on new bucket naming convention (#1239)
- When fluentd was set to use the latest version, the configmap created off
the configuration file would have the wrong name which ultimately resulted
in crashed pods. (#1240)
- The file is now created temporarily with the right name prior to applying
it, to avoid this issue. (#1240)
- Fix for issue with policy and role names longer than 64 characters when
running the s3-bucket module. (#1228)
- Lambda function for logging parsing would fail if a variable has no value.
To fix this, we have set a default value to avoid having to set that up
manually on every environment. (#1235)
- don't error when aws role names are above 64 char limit (slice to limit)
(#1234)
- won't error for envs with long host names (b/c of s3 bucket name char
limit) (#1233)
- Revert awshelper image to unbreak jobs (#1227)
- handle errors better in iam-serviceaccounts (#1224)
- Nat gateway data resource look up sometimes fails if it finds another
resource with the same search criteria. For example you made a mistake
applying a plan and have to rebuild again, the gateway might get deleted
but not removed. Adding an additional filter solves this problem. (#1209)
- presigned-url deployment works with fence 2.7 (#1205)
- Report tool would fail if kubeconfig file is located at a different folder
than ${HOME}/${vpc_name}/kubeconfig. It is now loaded from the location
returned by gen3. (#1204)
- monitor-pod would sometimes hang when launching the monitor pod. (#1195)
- The lambda function module lacked additional outputs. It'll now output
whichever option you choose, with vpc and without vpc. If any is not
deployed it'll simply not output it. (#1191)
- You can now tell the ES module to deploy or not a linked role. This is
problematic when there is more than one commons using ES in a single
account. (#1183)
- The rate limit config had been accidentally disabled when the command
override was introduced to this deployment (which suppresses the parent
image's entrypoint.sh execution). Adding an extra call to this script to
make sure the NGINX_RATE_LIMIT check is executed. (#1182)
- Using gen3 iam-serviceaccount results in null in service-account
annotation. This is a fix for the issue. (#1180)
- gen3 iam-serviceaccount produces wrong link (contains https part) in
Federated part of trust policy. Fixed now. (#1180)
- Name of the environmental variable (#1171)
- Docker image name (#1171)
- Fix the name of COVID-19 job (#1170)
- When creating databases, the script uses the current user name. When the
user has hyphens in the name, like iam-anuser, postgres doesn't like it and
the script bails. (#1164)
- This patch replaces hyphens with underscores before creating databases
when running db_restore. (#1164)
- google cron jobs check webhook validity (#1155)
- When listing hosted zones, if one had no comments, the scripts would fail
without completing any change on the hosted zone for cloud-proxy. (#1151)
- handling of external requests to port 80 by the reverse proxy and network
policy (#1148)
- eks workers init script had a duplicated kubectl apply -f on the same
line, breaking the application of the file that deploys calico. (#1147)
- fix regression creating peregrine postgres user in new commons (#1138)
- vpc module tag for the actual vpc changed a few ago, causing major issues
when deploying new commons, bringing it back to how it was before. (#1132)
- Hardcoded commons name removed (#1129)
- Switched the domain used to check for internet access from the VPC where
commons' kubernetes workers live. The previous default one returned 302
(redirection) codes forcing the check to switch the active instances every
minute. www.google.com returns 200 which is acceptable as a working internet
connection. (#1124)
- leftover option turning into invalid when applying (#1122)
- duplicated function name in lambda for check ups against the proxies.
(#1113)
- Modified fence db migrate job to disable/wait for usersync to finish (#1107)
- disable db-migrate job in kube-setup-fence (#1110)
- db restore create restore db as server owner and grant perms to service
user (#1108)
- kube-dev-namespace secrets folder fix (#1105)
- squid auto bootstrap had a reference for a branch. (#1100)
- Fixed an issue which is causing DREAM access report to have empty fields
(#1076)
- Add list of namespaces to Jenkins (#1068)
- Enable Veracode domains through squid proxy (#1065)
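Several fixes above (#1267, #1234, #1228) deal with the 64-character limit on AWS IAM role names by slicing the name to the limit. A minimal sketch of that idea follows. The helper truncate_role_name is hypothetical, not the function used in cloud-automation, and the example name is made up.

```python
IAM_ROLE_NAME_LIMIT = 64  # character limit cited in the notes above


def truncate_role_name(name: str, limit: int = IAM_ROLE_NAME_LIMIT) -> str:
    """Slice a role name so it never exceeds the character limit."""
    return name[:limit]


# Made-up name that is deliberately longer than the limit.
long_name = "bucket_access-" + "very-long-environment-hostname-" * 3
short_name = truncate_role_name(long_name)
print(len(long_name), len(short_name))
```

Plain slicing can make two distinct long names collide after truncation, so names that differ only near the end may need a different scheme.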