feat: Jan Integrates Cortex.cpp as Provider #3821

Merged
merged 49 commits into from
Nov 6, 2024

@louis-jan louis-jan commented Oct 17, 2024

Implementation Specs

Migration Path:

  1. App 0.5.8 opens.
  2. The app returns the model list from cache (for users already on 0.5.7) and functions normally.
  3. The app scans JSON models (legacy logic, for fresh installs or older versions) and functions normally.
  4. In the background, the app attempts to import models and merges the result with legacy downloaded models (this covers models that failed to import).
  5. The app combines the models returned by cortex.cpp with the legacy JSON models. cortex.cpp models take priority when IDs collide (this covers models that were imported successfully).
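The merge rule in step 5 can be sketched as follows. This is a minimal illustration, not the code from this PR: the `ModelEntry` shape and the `mergeModels` name are hypothetical, and the real model type carries many more fields.

```typescript
// Hypothetical minimal model shape, for illustration only.
interface ModelEntry {
  id: string;
  name: string;
  source: "cortex" | "legacy-json";
}

// Combine both lists; when two entries share an ID, the cortex.cpp entry wins.
function mergeModels(
  cortexModels: ModelEntry[],
  legacyModels: ModelEntry[]
): ModelEntry[] {
  const byId = new Map<string, ModelEntry>();
  // Insert legacy entries first so that cortex.cpp entries overwrite them.
  for (const model of legacyModels) byId.set(model.id, model);
  for (const model of cortexModels) byId.set(model.id, model);
  return Array.from(byId.values());
}
```

Models that failed to import stay visible through their legacy JSON entry, while successfully imported ones are served from cortex.cpp.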

Changes

Naming convention

  • inference-nitro-extension is renamed to inference-cortex-extension.
  • cortex.cpp binaries have the same name as engine releases.
  • Pre-package everything, including CUDA dependencies (dll, so), so users don't have to install them separately.
  • Support noavx-cuda binaries as a fallback

Simplified

  • Deprecated ModelFile. It's no longer relevant. Providers now define models, so each provider manages how its models run.
  • Removed the CUDA toolkit installation UX; CUDA should be ready once the app is installed.

Downloader

The app either proxies downloads to cortex.cpp or uses its own downloader, depending on whether cortex.cpp supports the model.
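A rough sketch of that routing decision, under stated assumptions: `DownloadRequest`, `pickDownloader`, and the `cortexSupports` predicate are hypothetical names, and the example predicate only mirrors the subtask list below (tensorrt-llm and clip models go through the legacy downloader).

```typescript
type DownloaderKind = "cortex" | "legacy";

// Hypothetical request shape, for illustration only.
interface DownloadRequest {
  modelId: string;
  engine: string;
  isClipModel: boolean;
}

// Route to cortex.cpp when it supports the model, else to the app downloader.
function pickDownloader(
  request: DownloadRequest,
  cortexSupports: (request: DownloadRequest) => boolean
): DownloaderKind {
  return cortexSupports(request) ? "cortex" : "legacy";
}

// Assumed capability check: tensorrt-llm and clip models are not handled
// by the cortex.cpp downloader in this sketch.
const exampleCortexSupports = (request: DownloadRequest): boolean =>
  request.engine !== "tensorrt-llm" && !request.isClipModel;
```

Keeping the capability check as an injected predicate means the routing does not need to change as cortex.cpp gains support for more model types.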

Model Hub

  • The app allows extensions to register, in memory, the models available for download. After download, each model's yaml or json is persisted alongside the model files.
  • The app prioritizes model hub decoration (the previous json metadata) over cortex.cpp metadata (such as name, size, tags).
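The decoration priority could look roughly like this. The `ModelMetadata` shape and `decorateModel` name are hypothetical; only the three fields named above (name, size, tags) are modelled.

```typescript
// Hypothetical metadata shape covering the fields mentioned above.
interface ModelMetadata {
  name?: string;
  size?: number;
  tags?: string[];
}

// Prefer hub decoration; fall back to cortex.cpp metadata field by field.
function decorateModel(
  cortexMeta: ModelMetadata,
  hubMeta: ModelMetadata
): ModelMetadata {
  return {
    name: hubMeta.name ?? cortexMeta.name,
    size: hubMeta.size ?? cortexMeta.size,
    tags: hubMeta.tags ?? cortexMeta.tags,
  };
}
```

The per-field fallback means a hub entry that only sets a display name still inherits size and tags from cortex.cpp.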

Observability

  • cortex-extension should watch the cortex.cpp server from launch, ensuring that the cortex process runs alongside the application.
  • All requests are queued and run once the server comes online, so the UX remains the same. Requests therefore never race the server startup; e.g., a model import or start should not fail because the server was not online in time.
  • The app no longer attempts to kill the cortex process on every model start. Starting a model is just a stop and start of the model, so it does not block other API requests.
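The queueing behaviour described above can be sketched as a small gate that holds requests until the server is known to be up. This is an assumption-laden illustration: `StartupQueue`, `enqueue`, and `markOnline` are hypothetical names, and the health check that would call `markOnline` is left out.

```typescript
type Task<T> = () => Promise<T>;

class StartupQueue {
  private online = false;
  private pending: Array<() => void> = [];

  // Run the task immediately if the server is up; otherwise hold it.
  enqueue<T>(task: Task<T>): Promise<T> {
    if (this.online) return task();
    return new Promise<T>((resolve, reject) => {
      this.pending.push(() => {
        task().then(resolve, reject);
      });
    });
  }

  // Assumed to be called once a health check succeeds.
  // Releases held tasks in arrival order.
  markOnline(): void {
    this.online = true;
    const held = this.pending;
    this.pending = [];
    for (const run of held) run();
  }
}
```

Callers simply `await queue.enqueue(() => someRequest())` and do not need to know whether the server was already running. Note that this sketch releases held tasks together without waiting for each to finish; strict one-at-a-time execution would need an extra chaining step.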

Goals

  • After updating from an older version to this version, models are imported and run normally. Models that are not imported can still run, since the app attempts a preflight before running them.
  • Users can download models either through cortex.cpp (proxied by the app) or through the app downloader, depending on whether cortex.cpp supports the model.

Subtasks

  • Pull latest cortex.cpp and engines to package
  • Keep binaries with the same name as their release name
  • Support noavx-cuda binaries as a fallback
  • Rename nitro-extension to cortex-extension
  • Deprecate ModelFile
  • Pre-package CUDA dependencies
  • Registered models should be persisted in memory.
  • App manages to download models using the cortex.cpp downloader and its legacy downloader (tensorrt-llm and clip models)
  • App gets persisted models from cortex.cpp and scans JSON models itself (legacy)
  • App prioritizes model decoration metadata from Hub over cortex.cpp
  • cortex-extension watches cortex.cpp server on launch
  • Model-extension queues cortex.cpp API requests with health checks so requests do not fail due to server uptime
  • cortex.cpp supports legacy model load parameters (NGL, context-length, cache enable, clip mmproj)
  • App shares CUDA dependencies with extensions (asar.unpackaged) to improve load time and reduce app size when multiple extensions share the same artifacts (tensorrt-llm and llama-cpp)

@janhq/jan @janhq/cortex

#3825

github-actions bot commented Oct 17, 2024

Preview URL: https://5239fa1d.docs-9ba.pages.dev

@louis-jan louis-jan changed the title from "[WIP] feat: model and cortex extensions update - path to new cortex.cpp" to "[WIP] feat: Jan Integrates Cortex.cpp as Provider" Oct 17, 2024
@louis-jan louis-jan marked this pull request as draft October 17, 2024 07:02
@louis-jan

Update scenario

  1. I have an old Jan version with downloaded models.
    Screenshot 2024-10-21 at 12 28 08

  2. After updating to the newer Jan version, the models should be maintained without delay or concern about cortex server corruption.
    Screenshot 2024-10-21 at 12 31 00

  3. I can load models from older versions.
    Screenshot 2024-10-21 at 12 31 05

  4. I can download new models using the new Jan version.
    Screenshot 2024-10-21 at 12 32 18

  5. Switch back to the old Jan version. I still see the old downloaded models (but not models downloaded on the newer version).
    Screenshot 2024-10-21 at 12 33 53

  6. Switch back to the new Jan version, all downloaded models are visible
    Screenshot 2024-10-21 at 12 34 14

@louis-jan louis-jan force-pushed the feat/path-to-cortexcpp branch from 96e3919 to 3156e8a Compare October 21, 2024 09:18
@louis-jan

Rebased dev

@louis-jan

Jan's API Server works with Cortex.cpp. Later, proxy everything.

Screenshot 2024-10-22 at 16 18 48

@louis-jan louis-jan force-pushed the feat/path-to-cortexcpp branch from eebcf07 to 26a3405 Compare October 22, 2024 10:09
@dan-menlo

Nice 👀

Barecheck - Code coverage report

Total: 69.88%
Your code coverage diff: 0.37% ▴

Uncovered files and lines

@louis-jan

louis-jan commented Oct 24, 2024

Jan can handle multimodal model downloads even where cortex.cpp does not support them.

Screenshot 2024-10-24 at 14 18 01

Jan can handle its own downloaded models alongside those from cortex.cpp /models.

LlaVa 7B is downloaded by Jan and works with the legacy model.json; the others are handled by cortex.cpp.
Screenshot 2024-10-24 at 15 16 30

@louis-jan louis-jan force-pushed the feat/path-to-cortexcpp branch from 97f87e8 to d235e88 Compare October 24, 2024 08:23
@louis-jan louis-jan marked this pull request as ready for review October 29, 2024 07:21
@louis-jan louis-jan requested a review from a team October 29, 2024 07:21
@louis-jan louis-jan force-pushed the feat/path-to-cortexcpp branch from 2cb006c to b913af9 Compare November 4, 2024 08:37
@louis-jan

louis-jan commented Nov 5, 2024


@dan-menlo dan-menlo left a comment

prays lgtm

@louis-jan louis-jan merged commit a82c701 into dev Nov 6, 2024
9 checks passed
@louis-jan louis-jan deleted the feat/path-to-cortexcpp branch November 6, 2024 08:45
@github-actions github-actions bot added this to the v0.5.8 milestone Nov 6, 2024
3 participants