Large Test Case Data Files Solution #910

Open
JamesHabben opened this issue Nov 4, 2024 · 1 comment

@JamesHabben
Collaborator

Background

Our test data files fall into two categories:

  1. Regular files (<100 MB): stay in the git repository
  2. Large files (>100 MB): need an alternative storage solution, since GitHub rejects files larger than 100 MB
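A minimal sketch of how a script (for example, a CI step) might sort test data into these two categories. The function name `classify_test_data` is hypothetical, and treating the cutoff as 100 MiB is an assumption based on GitHub's per-file limit.

```python
from pathlib import Path

# Size threshold from the categories above. GitHub blocks files larger than
# 100 MiB, so anything above this needs the alternative storage (assumption).
LARGE_FILE_LIMIT = 100 * 1024 * 1024  # bytes

def classify_test_data(root):
    """Split every file under `root` into (regular, large) lists of Paths."""
    regular, large = [], []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue  # skip directories
        if path.stat().st_size > LARGE_FILE_LIMIT:
            large.append(path)
        else:
            regular.append(path)
    return regular, large
```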

Example large test case from testdata.webkit.json. The webkitCacheRecords artifact contains 4,137 files and produces a 300 MB zip file, too large for regular git:

{
  "josh_ios15_ffs": {
    "artifacts": {
      "webkitCacheRecords": {
        "file_count": 4137
      }
    }
  }
}

Storage Options for Large Files (>100MB)

1. GitHub LFS

  • Cost: Free tier of 1 GB storage and 1 GB bandwidth per month
  • Pros:
    • Native git integration
    • Automatic with GitHub Actions
    • Version controlled
  • Cons:
    • Limited free tier
    • Additional storage and bandwidth cost $5/month per 50 GB data pack
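To see how fast the free tier runs out, monthly usage can be estimated with the figures quoted above. This is a rough sketch: the helper `lfs_packs_needed` is hypothetical, it assumes every fetch downloads all tracked files and that one data pack adds 50 GB of both storage and bandwidth, and GitHub's current LFS pricing may differ from these numbers.

```python
import math

# Figures as quoted in this issue (assumption: check current GitHub pricing).
FREE_GB = 1.0    # free storage and free bandwidth, each per month
PACK_GB = 50.0   # storage and bandwidth added by one data pack
PACK_USD = 5.0   # monthly price of one data pack

def lfs_packs_needed(file_sizes_gb, downloads_per_month):
    """Return (packs, monthly_cost_usd) for the given LFS usage.

    file_sizes_gb: sizes of the LFS-tracked files, in GB.
    downloads_per_month: fetches per month; developer clones and
    GitHub Actions runs both count. Assumes each fetch pulls every file.
    """
    storage = sum(file_sizes_gb)
    bandwidth = storage * downloads_per_month
    # A pack adds both storage and bandwidth, so the larger shortfall decides.
    shortfall = max(storage, bandwidth) - FREE_GB
    packs = max(0, math.ceil(shortfall / PACK_GB))
    return packs, packs * PACK_USD
```

With one 300 MB file, three fetches a month (0.9 GB) stay within the free 1 GB, but a fourth fetch pushes bandwidth to 1.2 GB and requires a data pack.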

2. Azure Blob Storage

  • Cost: ~$0.25/month for 25GB
  • Pros:
    • Cost-effective
    • Reliable
    • Easy GitHub Actions integration
    • Team maintains control
  • Cons:
    • Not free
    • Requires setup/maintenance
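A minimal sketch of how a test run or GitHub Actions job might fetch a blob, assuming downloads use a read-only SAS (shared access signature) token appended to the blob URL. The helpers `blob_url` and `download_blob` and the account/container names in the usage are hypothetical; the standard library is used to avoid a dependency, though Microsoft's `azure-storage-blob` package is the official SDK alternative.

```python
import shutil
import urllib.parse
import urllib.request

def blob_url(account, container, blob_name, sas_token):
    """Build an Azure Blob URL carrying a (read-only) SAS token."""
    path = urllib.parse.quote(f"{container}/{blob_name}")  # keeps "/" intact
    token = sas_token.lstrip("?")
    return f"https://{account}.blob.core.windows.net/{path}?{token}"

def download_blob(url, dest):
    """Stream the resource at `url` to the local file `dest`."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
```

The SAS token would typically be stored as a repository secret so the workflow can read the data without holding account keys.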

3. Google Drive

  • Cost: Free tier - 15GB
  • Pros:
    • Free storage
    • Familiar interface
  • Cons:
    • API setup required
    • More complex GitHub Actions integration
    • Manual version control

4. GitHub Releases

  • Cost: Free
  • Pros:
    • 2GB file limit
    • No bandwidth limits
    • Native to GitHub
  • Cons:
    • 10GB total limit
    • Manual upload process
    • Less automated
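A minimal sketch of how tests could locate release-hosted data, assuming a public repository where release assets download without authentication. The owner, repo, and tag values in the usage are hypothetical, and the per-file limit mirrors the figure listed above.

```python
# Per-file limit as listed above (2 GB); confirm against GitHub's current docs.
MAX_ASSET_BYTES = 2 * 1024**3

def release_asset_url(owner, repo, tag, asset_name):
    """Public download URL for an asset attached to a GitHub release."""
    return f"https://github.com/{owner}/{repo}/releases/download/{tag}/{asset_name}"

def oversized_assets(asset_sizes):
    """Return names of assets (name -> size in bytes) over the per-file limit."""
    return sorted(name for name, size in asset_sizes.items() if size > MAX_ASSET_BYTES)
```

The manual upload step can be scripted with the GitHub CLI's `gh release upload` command, which would reduce the "less automated" drawback.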

Questions for Discussion

  1. Is cost or automation more important?
  2. Do we anticipate many large test files?
  3. How important is version control for test data?
  4. Who will maintain the storage solution?
  5. Are there other cloud services to consider?
@JamesHabben
Collaborator Author

A related underlying issue can be a subtopic, but I think we will likely all agree on it. Brigs and I spoke about it in the video: we either cut the test data down to stay under the size limit, or we figure out how to host the full test data. Take webkit again as the example. We could reduce the number of files included in the test case, but we would risk excluding different formats and other variations in these files. I can't think of a sensible way to filter the files down intelligently without manual carve-outs.

My thoughts:

  1. GitHub LFS
    I think we will exceed the storage quickly, since a single webkit test case is 300 MB. We will also hit the bandwidth limit very quickly, since each time one of us grabs the file it counts, and each time a GitHub Actions run grabs the file it counts too. If we choose this route, we would likely need to buy at least one data pack almost immediately. Are there other benefits to the project in having a data pack?
  2. Google Drive
    15 GB seems a reasonable allowance, especially with no bandwidth limits on top of it. We could create a new Gmail account for each of the LEAPPs to spread out the data, and Brigs can hold the credentials. Can it generate an API key for all of us to use? Can we make a read-only key?
  3. Azure Blob
    I have an Azure account and currently use blob storage, so I am familiar with it. I know API keys are not an issue to generate; multiple keys and read-only keys work just fine. I have also used GitHub Actions directly with the blob storage API in my 4n6appfinder project repo. We may not need the direct integration, though, depending on how the file-splitting integration goes, since those conditions might be handled in the Python code run by the action.
  4. Additional Thought
    Should we consider splitting a 300 MB zip file into 3-4 segmented 95 MB zip files? That could be a solution, depending on how many of these large test cases we end up facing.
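A minimal sketch of the splitting idea, assuming raw byte segments rather than multi-volume zip archives. The helper names `split_file` and `join_parts` and the `.001`/`.002` part-naming scheme are hypothetical.

```python
from pathlib import Path

# 95 MiB segments keep each part under GitHub's 100 MiB per-file limit.
CHUNK_BYTES = 95 * 1024 * 1024

def split_file(src, chunk_bytes=CHUNK_BYTES):
    """Split `src` into src.001, src.002, ... of at most `chunk_bytes` each."""
    src = Path(src)
    parts = []
    with src.open("rb") as f:
        while True:
            data = f.read(chunk_bytes)
            if not data:
                break
            part = src.with_name(f"{src.name}.{len(parts) + 1:03d}")
            part.write_bytes(data)
            parts.append(part)
    return parts

def join_parts(parts, dest):
    """Concatenate the parts, in zero-padded name order, back into `dest`."""
    with Path(dest).open("wb") as out:
        for part in sorted(Path(p) for p in parts):
            out.write(part.read_bytes())
```

A 300 MB archive becomes four parts: three full 95 MiB segments plus a remainder. The parts are raw byte slices, not standalone zip files, so the test harness would rejoin them before extracting.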
