Pre-build AWS ami with Packer to minimise EC2 bootstrapping time #260

Open
wants to merge 1 commit into base: master


surf-rbood

Reference Issues/PRS

None

What does this implement/fix?

Added the option to run the automl framework with a custom AMI to minimise EC2 instance bootstrapping time.

Also documented how to build this AWS AMI with Packer, and added the config and scripts required for the AMI build procedure.

Other comments

Things to consider/think about:

  • Instead of a use_packer_ami option in config.py, it could be a command line flag where the custom AMI ID is specified.
  • Decide whether the automl user has to build a custom private AMI in their own AWS account or whether publicly available AWS AMIs are provided for the stable branch. If so, there should be a build pipeline that facilitates this.
  • The steps in the AMI build procedure should be reviewed to see whether they can be improved.

…packer.

This reduces the number of bootstrap steps in cloud-init and saves time.
Also added the instructions and config/scripts on how to build an AWS AMI with packer.
@PGijsbers PGijsbers added the aws AWS support label Mar 15, 2021
@PGijsbers
Collaborator

PGijsbers commented May 11, 2021

I finally looked into this (sorry!). I managed to create the AMI just fine with the provided documentation, and recorded the following runtimes, with and without the Packer AMI, when running python runbenchmark.py constantpredictor test test -f 0 -m aws:

| run | without AMI | with AMI |
|-----|-------------|----------|
| 0   | 269         | 215      |
| 1   | 306         | 215      |
| 2   | 307         | 215      |
| 3   | 337         | 215      |

Though n is small, I think the benefit is clear (edit: I noticed I had the with and without column names swapped, fixed now).
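
To put a rough number on the benefit, here is a minimal sketch that uses only the four runs reported in the table above. The variable names are illustrative, and the unit is whatever the table reports.

```python
from statistics import mean

# Runtimes copied from the table above (unit as reported there).
without_ami = [269, 306, 307, 337]
with_ami = [215, 215, 215, 215]

mean_without = mean(without_ami)
mean_with = mean(with_ami)

# Absolute and relative saving of the Packer AMI over plain cloud-init.
saving = mean_without - mean_with
reduction = saving / mean_without

print(f"mean without AMI: {mean_without}")
print(f"mean with AMI:    {mean_with}")
print(f"mean saving:      {saving} ({reduction:.1%})")
```

With only four runs this is an indication rather than a benchmark, but it matches the impression that the saving is substantial.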

> Instead of a use_packer_ami option in config.py, it could be a command line flag where the custom AMI ID is specified.

If we have an automated builder that provides public AMIs of the latest stable build, I think that should probably be the default. Additionally, perhaps we should allow working off that AMI from a different branch, and have cloud-init pull the correct branch and update the dependencies as required. That should bring most of the benefit of a faster start time to the developer workflow as well. When specific environments are necessary (e.g. testing an update to the Python version), we can still rely on cloud-init, which builds on the plain Ubuntu AMI. But for most development work I don't think version control needs to be that precise.

> Decide whether the automl user has to build a custom private AMI in their own AWS account or whether publicly available AWS AMIs are provided for the stable branch. If so, there should be a build pipeline that facilitates this.

From what I understand, if we have a public AMI we also have to pay for storage costs. For whatever reason, I can't see the private AMI I created while testing this in the EC2 Dashboard (it lists no private AMIs, and searching the public AMIs for the ID gives no result either). S3 also has transfer prices: does that mean we pay when someone uses our image, not just for storage? S3 prices don't seem terribly high, but I find it hard to estimate the running costs. Looking at the prices, I find it hard to imagine them rising above a few dollars a month. Am I missing anything, and/or do you have a price estimate?

> The steps in the AMI build procedure should be reviewed to see whether they can be improved.

I'll have a closer look tomorrow, but at first glance there looks to be nothing substantial left to move from cloud init to the AMI.

Again apologies for not getting to this sooner, but we really do appreciate the effort!

@surf-rbood
Author

I've looked into the costs of an AWS AMI. When you create an AMI, an EBS snapshot is taken and stored for you. There are costs for the initial snapshot ($0.05/GB) and there are indeed monthly storage costs for the EBS snapshot ($0.05/GB per month). From what I understand, you only pay for the storage your snapshot actually consumes. The automl image is very small, so the storage costs are negligible. I tried to find the storage costs in the billing dashboard for the test AMIs I created in the Automl account, but could not find them (S3 costs were only $0.04 in total, and those consisted of LIST/GET/PUT API operations). The only costs I could find were for the creation of EBS snapshots, which came to a few dollars in total over the last 6 months.
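
As a rough illustration of the storage arithmetic, here is a small sketch using the $0.05 per GB per month figure quoted above. The snapshot size and the number of regions are hypothetical values chosen for the example, not measurements, and the sketch assumes the same price applies in every region.

```python
# Price quoted in the comment above (USD per GB per month of EBS snapshot storage).
PRICE_PER_GB_MONTH = 0.05


def monthly_snapshot_cost(snapshot_gb: float, regions: int = 1) -> float:
    """Estimate monthly storage cost for one AMI snapshot kept in `regions` regions.

    ASSUMPTION: the per-GB price is identical in every region.
    """
    return snapshot_gb * PRICE_PER_GB_MONTH * regions


# ASSUMPTION: 8 GB is a hypothetical snapshot size, not a measured value.
example_gb = 8
for n_regions in (1, 3):
    cost = monthly_snapshot_cost(example_gb, n_regions)
    print(f"{example_gb} GB in {n_regions} region(s): ${cost:.2f}/month")
```

Even with a snapshot several times larger than the example, the monthly total would stay in the range of a few dollars.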

Regarding the costs of public AMIs:
"You are not billed when your AMI is used by other AWS accounts to launch instances; only the accounts launching instances using the AMI are billed for the instances they launch." (see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/sharingamis-intro.html)

You can find the AMIs you created in the EC2 dashboard. AMIs are region-bound, so you have to set the AWS console to the correct region (i.e. eu-central-1/Frankfurt). That's where I can see the private automl AMIs. Perhaps your console is set to a different region, which would explain why you can't see them. Public AMIs are also region-bound, so they are only available to others in the regions where you decide to create, store, and publish the images.
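
The same region check can be done outside the console. Below is a minimal sketch using boto3's EC2 describe_images call with Owners=["self"]. The helper names list_own_amis and make_ec2_client are made up for this example, and it assumes boto3 is installed and AWS credentials are configured.

```python
def list_own_amis(ec2_client):
    """Return (image_id, name) pairs for the AMIs owned by the calling account.

    The result only covers the region the client is bound to, because AMIs
    are region-bound.
    """
    response = ec2_client.describe_images(Owners=["self"])
    return [(img["ImageId"], img.get("Name", "")) for img in response["Images"]]


def make_ec2_client(region_name):
    """Create an EC2 client for one region (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so list_own_amis can be used without boto3

    return boto3.client("ec2", region_name=region_name)
```

With credentials in place, list_own_amis(make_ec2_client("eu-central-1")) would return the private AMIs in Frankfurt. An empty list for one region and a non-empty list for another would confirm that the console was simply set to a different region.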

Collaborator

@PGijsbers PGijsbers left a comment

@sebhrusen while we discussed some additional AMIs and restructuring the configuration file, I think we can approve this PR? We can refactor it later, but the PR is a self-contained first step: functional and well documented (at least good enough for me to understand :] ).

@PGijsbers
Collaborator

@surf-rbood thanks for the help! The additional information on costs is also very useful. Based on that I think it's reasonable for us to build the AMIs for common regions.

Collaborator

@sebhrusen sebhrusen left a comment

@surf-rbood thanks for your contribution, @PGijsbers has been using it and it looks very useful.

@PGijsbers I agree that the PR looks mostly good (self-contained + documented as you mentioned).
I'd have 2 requirements to get this merged or to be fixed very soon after the merge:

  • address my 2 comments with a thumbs up.
  • commit the aws_ami/config/ami-automl.pkr.hcl file as "editable" (add to .gitignore, then force add), as it looks like this file needs to be edited by the end user most of the time. Ideally, users should be able to keep the file in their custom ~/.config/automlbenchmark folder, but I don't see this supported right now, so we should allow users to update the repo despite changes made to this local file.

Comment on lines +45 to +47
"BRANCH=stable",
"GITREPO=https://github.com/openml/automlbenchmark",
"PYV=3"
Collaborator

can those be turned into variables?

Comment on lines +119 to +122
use_packer_ami: false # if true, the EC2 instance will be started with the AMI ID of the pre build packer AMI.
# Note, make sure to enter the AMI ID of your packer build image in the packer_ami field (i.e. ec2.regions.[region].packer_ami).
# For more information, see the aws_ami directory.

Collaborator

I'd rather move this config under the aws.ec2 namespace
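
The behaviour described in the config comment above can be sketched as follows. This is not the PR's implementation: the dictionary layout, the "ami" key for the base image, the placement of the flag directly under the aws section, and the placeholder AMI IDs are all assumptions made for illustration.

```python
def select_ami(aws_config: dict, region: str) -> str:
    """Pick the AMI ID to launch in `region`, preferring the Packer AMI if enabled."""
    region_cfg = aws_config["ec2"]["regions"][region]
    if aws_config.get("use_packer_ami", False):
        packer_ami = region_cfg.get("packer_ami")
        if not packer_ami:
            # Fail early instead of silently falling back to the base image.
            raise ValueError(
                f"use_packer_ami is true but no packer_ami is set for {region}"
            )
        return packer_ami
    # ASSUMPTION: the base image is stored under an "ami" key per region.
    return region_cfg["ami"]


# Hypothetical config fragment with placeholder AMI IDs.
example_config = {
    "use_packer_ami": True,
    "ec2": {
        "regions": {
            "eu-central-1": {
                "ami": "ami-base-placeholder",
                "packer_ami": "ami-packer-placeholder",
            }
        }
    },
}

print(select_ami(example_config, "eu-central-1"))
```

If the flag were moved under the aws.ec2 namespace as suggested, only the lookup of use_packer_ami would change.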

variable "source_ami" {
type = string
description = "Ubuntu Server 18.04 LTS (HVM), EBS General Purpose (SSD) VolumeType"
default = "ami-0bdf93799014acdc4"
Collaborator

this defaults to an AMI that is available only in eu-central-1: it would be nice to default automatically to the AMI associated with the selected region in config.yaml.
Until then, I suggest:

Suggested change
- default = "ami-0bdf93799014acdc4"
+ default = "<use one ami defined in config.yaml, namespace aws.ec2.regions>"
