langchain openai chainlit chromadb tiktoken PyPDF2 pypdf pandas tabulate
The system uses LangChain's RecursiveCharacterTextSplitter to split the text extracted from PDF files into chunks.
A chat prompt template (LangChain's ChatPromptTemplate) is created to structure the prompt sent to the model, while the chainlit library handles user interactions.
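To illustrate the chunking idea behind the splitting step above, here is a minimal pure-Python sketch of fixed-size chunking with overlap. It is a simplified stand-in, not the RecursiveCharacterTextSplitter algorithm, which also tries to break at separators such as paragraphs and sentences. The function name `split_with_overlap` and the default sizes are assumptions made for this example.

```python
from __future__ import annotations


def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, which tends to help retrieval.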
On chat start, the system initializes a retrieval-based question-answering chain (RetrievalQAWithSourcesChain).
The script handles each user message with chainlit, passing the question to the chain to retrieve an answer based on the provided PDF text.
Each response is built from the answer returned by the question-answering chain: the final answer is displayed together with the sources it cites.
An AsyncLangchainCallbackHandler callback manages the flow of the chat, and the user session stores the metadata, the text, and the initialized chain.
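The response-building step described above can be sketched as follows. This is a hedged illustration, not the script's code. It assumes the chain returns an answer string plus a comma-separated string of source names, and that each chunk was given a metadata entry with a "source" key. The function name `build_response` and the "0-pl" style source ids are hypothetical.

```python
from __future__ import annotations


def build_response(
    answer: str,
    sources: str,
    texts: list[str],
    metadatas: list[dict],
) -> tuple[str, list[tuple[str, str]]]:
    """Append a 'Sources:' line to the answer and collect the cited passages."""
    all_sources = [m["source"] for m in metadatas]  # assumed metadata layout
    found: list[tuple[str, str]] = []

    for raw in sources.split(","):
        name = raw.strip().rstrip(".")  # tolerate spaces and a trailing period
        if name in all_sources:
            found.append((name, texts[all_sources.index(name)]))

    if found:
        answer += "\nSources: " + ", ".join(name for name, _ in found)
    elif sources.strip():
        # The chain named sources, but none matched a known chunk.
        answer += "\nNo sources found"
    return answer, found
```

In a chainlit app, each (name, passage) pair could then be attached to the outgoing message as a text element so the user can inspect the cited chunk.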
Text is extracted from each page of an uploaded PDF file using PyPDF2. OpenAI embeddings are used for document retrieval.
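A minimal sketch of the per-page extraction described above. To keep the example runnable without a PDF file, it accepts any iterable of objects exposing an `extract_text()` method, which is the interface of the page objects in PyPDF2 2.x and later (older releases used different names). The function name `extract_pdf_text` is an assumption for illustration.

```python
from __future__ import annotations

from typing import Iterable, Protocol


class PageLike(Protocol):
    """Anything with an extract_text() method, such as a PyPDF2 page object."""

    def extract_text(self) -> str: ...


def extract_pdf_text(pages: Iterable[PageLike]) -> str:
    """Concatenate the text of every page, one page per line group."""
    parts: list[str] = []
    for page in pages:
        # Scanned or image-only pages may yield no text; treat that as empty.
        parts.append(page.extract_text() or "")
    return "\n".join(parts)

# Assumed usage with PyPDF2 2.x or later:
#   reader = PyPDF2.PdfReader(stream)
#   pdf_text = extract_pdf_text(reader.pages)
```

The resulting string is what would then be split into chunks and embedded for retrieval.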