Agent
M.Sc course, University of Debrecen, Department of Data Science and Visualization, 2025
Colab
This labor demonstrates building a medical chatbot using the MedQuad dataset and the LlamaIndex library. It leverages a large language model (LLM) to understand and answer medical questions based on the information within the dataset.
Theoretical Background:
- MedQuad Dataset: A medical question-answering dataset containing pairs of questions and answers related to various medical topics. It serves as the knowledge base for the chatbot.
- LlamaIndex: A framework designed to facilitate interactions with LLMs. It allows for indexing data, creating chatbots, and managing the context of conversations.
- Large Language Models (LLMs): Powerful models trained on vast amounts of text data, enabling them to understand natural language and generate human-like text. The chatbot utilizes an LLM to interpret user queries and formulate answers.
- Vector Databases: These databases store data as vectors, enabling efficient similarity search. LlamaIndex uses vector databases to index the MedQuad dataset for quick retrieval of relevant information.
Notebook Workflow:
- Installation and Imports: Installs the necessary libraries (llama-index, llama-index-embeddings-huggingface, etc.) and imports required modules for data manipulation, embedding generation, and interaction with LLMs.
- Data Loading: Loads the MedQuad dataset from Hugging Face Datasets using Pandas.
- Vector Database Creation: Creates a vector database using LlamaIndex to store and index the MedQuad data. This allows for efficient retrieval of relevant information when answering user queries.
- Chatbot Initialization: Initializes a chatbot engine that uses the specified LLM and the created vector database. It also sets the system prompt to guide the chatbot’s behavior and personality.
- Chatbot Interaction: Enters a loop that continuously prompts the user for questions and provides answers using the chatbot engine. The responses are streamed to provide a more interactive experience.
- History Reset: Clears the chat history after the interaction.
Key Concepts:
- Embedding Generation: Converts text data into numerical vectors to represent their meaning and semantic relationships.
- Similarity Search: Retrieves information from the vector database that is semantically similar to the user’s query.
- Context Management: Maintains the context of the conversation, allowing the chatbot to understand and respond to follow-up questions.