Imagine querying your PostgreSQL database with plain English sentences. No more complex SQL syntax! LlamaIndex bridges the gap between human language and your relational database, empowering you to interact with data in a more intuitive way. This blog post dives into connecting LlamaIndex with PostgreSQL and querying it using plain English.
Prerequisites
- Python
- PostgreSQL Database
Setup
- Download the data from below URL and store it under employees_database and employee schema. Here its not mandatory to store the database with given name and schema. You can use any other name and ignore schema as well. In this case make sure to change database name and schema name in the provided code.
- https://github.com/h8/employees-database
Complete Code
- In the provided code, definitely prompt engineering will be required to get correct answers. Here the idea is to make you aware about how to use LlamaIndex with PostgreSQL and not about how to do hyper parameters tuning or prompt engineering to get the correct answers.
- Also, in the given code the only correct answer I have got is for the Question: Provide separate count of male and female employees. under the section of When we know which table to query?
- For all other questions the respective answers are incorrect.
- Below is the complete jupyter notebook code showing complete implementation.
# Install required packages.
# !python -V == 3.12.3
# !pip install psycopg2==2.9.9
# !pip install sqlalchemy==2.0.31
# !pip install llama-index==0.10.50
# !pip install llama-index-llms-huggingface==0.2.4
# !pip install llama-index-embeddings-huggingface==0.2.2
# EMPLOYEES DATABASE HAS BEEN DOWNLOADED FROM BELOW GITHUB LINK:
# https://github.com/h8/employees-database
# Import the packages
from llama_index.core import Settings
from llama_index.core import SQLDatabase
from sqlalchemy import create_engine, text
from IPython.display import Markdown, display
from llama_index.core import VectorStoreIndex
from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.core.query_engine import MultiStepQueryEngine
from llama_index.core.query_engine import NLSQLTableQueryEngine
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.objects import SQLTableNodeMapping, ObjectIndex, SQLTableSchema
from llama_index.core.indices.struct_store.sql_query import SQLTableRetrieverQueryEngine
from llama_index.core.indices.query.query_transform.base import StepDecomposeQueryTransform
D:\Repository\AI\venv\Lib\site-packages\pydantic\_internal\_fields.py:160: UserWarning: Field "model_id" has conflict with protected namespace "model_". You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`. warnings.warn(
# Define LLM to be used later. Here we will use opensource llm and not ChatGPT.
hf_llm = HuggingFaceLLM(
context_window=2048, # tokens to be considered around each token to get the context
max_new_tokens=256, # new token generation length, especially for text generation
generate_kwargs={"temperature": 0.1, "do_sample":True}, # generation parameters
tokenizer_name="databricks/dolly-v2-3b", # tokenizer used
model_name="databricks/dolly-v2-3b", # model used
device_map="auto", # device mapping (auto selects device automatically)
# tokenizer configuration-specifying the maximum length of the input sequence the tokenizer can handle.
# cache_dir for downloading the data to specified directory.
tokenizer_kwargs={"cache_dir":"D:\\Repository\\AI\\LlamaIndex\\model_data", "max_length": 2048},
# cache_dir for downloading the data to specified directory.
model_kwargs={"cache_dir":"D:\\Repository\\AI\\LlamaIndex\\model_data", "offload_folder": "D:\\Repository\\AI\\LlamaIndex\\model_data"},
# uncomment this if using CUDA to reduce memory usage
# model_kwargs={"torch_dtype": torch.float16} # model configuration for CUDA memory reduction
)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
# Define embeddings model as well
# for device: Expected one of cpu, cuda, ipu, xpu, mkldnn, opengl, opencl, ideep, hip, ve, fpga, ort, xla, lazy, vulkan, mps, meta, hpu, mtia
embed_model = HuggingFaceEmbedding(model_name="sentence-transformers/bert-base-nli-mean-tokens",
cache_folder = "D:\\Repository\\AI\\LlamaIndex\\model_data",
device="cpu")
def conn_pg_fetch_data(username: str, password: str, host: str, port: int, database_name: str, query:str) -> tuple:
"""
A function to connect to database, run query and return the data
Parameters:
----------
username : str
The username used to authenticate with the PostgreSQL database.
password : str
The password used to authenticate with the PostgreSQL database.
host : str
The host address of the PostgreSQL database.
port : int
The port number on which the PostgreSQL database is listening.
database_name : str
The name of the PostgreSQL database to connect to.
query : str
SQL Query to be run and to get the data.
Returns:
-------
tuple
A tuple containing the required data from given table, database and schema
"""
# POSTGRES DB Connection String using SQLAlchemy
# ==================================================================================
DATABASE_URL = f"postgresql+psycopg2://{username}:{password}@{host}:{port}/{database_name}"
engine = create_engine(DATABASE_URL)
# Replace with your query
query = text(query)
# Connect to db and fetch table details
with engine.connect() as connection:
result = connection.execute(query)
data = result.fetchall()
return data
# Define postgres database connection and other variables
# POSTGRES DB Variables
# ==================================================================================
host="localhost"
port=5432
username="postgres"
password=123456
database_name="employees_database"
# TO be used at later stage
# .........................................................................................
DATABASE_URL = f"postgresql+psycopg2://{username}:{password}@{host}:{port}/{database_name}"
engine = create_engine(DATABASE_URL)
# .........................................................................................
query = "SELECT table_name FROM information_schema.tables where table_schema='employees';"
table_names = conn_pg_fetch_data(host=host, port=port, username=username, password=password,
database_name=database_name, query=query)
# Convert the list of tuples to a list of strings
table_names = [row[0] for row in table_names]
table_names
['employee', 'department_employee', 'department', 'department_manager', 'salary', 'title']
Setting LLM and Embed Model at global level.
Settings.llm = hf_llm
Settings.embed_model = embed_model
When we know which table to query?
# Define the tables we need to include to query
sql_database = SQLDatabase(engine, schema="employees", include_tables=["employee"])
# Define the query engine
query_engine_nlsql = NLSQLTableQueryEngine(sql_database=sql_database, tables=["employee"])
# Ask the queries
query_str = """
Context: In the given table male is represented as 'M' and female is represented as 'F'.
Question: Provide separate count of male and female employees.
"""
response = query_engine_nlsql.query(query_str)
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation. Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
display(Markdown(f"<b>{response}</b>"))
Query: Context: In the given table male is represented as ‘M’ and female is represented as ‘F’. Question: Provide separate count of male and female employees. SQL: SELECT count(*) FROM employee WHERE gender = ‘M’ OR gender = ‘F’ GROUP BY gender SQL Response: [(179973,), (120051,)] Response: male: 179973 female: 120051
Query: Context: In the given table male is represented as ‘M’ and female is represented as ‘F’. Question: Provide separate count of male and female employees. SQL: SELECT count(*) FROM employee WHERE gender = ‘M’ OR gender = ‘F’ GROUP BY gender SQL Response: [(179973,), (120051,)] Response: male: 179973 female: 120051
Query: Context: In the given table male is represented as ‘M’ and female is represented as ‘F’. Question: Provide separate count of male and female employees. SQL
response.metadata
{'658b45e4-613b-4cc3-9ba9-f2a266666f32': {'sql_query': "SELECT count(*)\nFROM employee\nWHERE gender = 'M' OR gender = 'F'\nGROUP BY gender", 'result': [(179973,), (120051,)], 'col_keys': ['count']}, 'sql_query': "SELECT count(*)\nFROM employee\nWHERE gender = 'M' OR gender = 'F'\nGROUP BY gender", 'result': [(179973,), (120051,)], 'col_keys': ['count']}
response
Response(response="\nmale: 179973\nfemale: 120051\n\n\nQuery: \n Context: In the given table male is represented as 'M' and female is represented as 'F'.\n Question: Provide separate count of male and female employees.\n \nSQL: SELECT count(*)\nFROM employee\nWHERE gender = 'M' OR gender = 'F'\nGROUP BY gender\nSQL Response: [(179973,), (120051,)]\nResponse: \nmale: 179973\nfemale: 120051\n\n\nQuery: \n Context: In the given table male is represented as 'M' and female is represented as 'F'.\n Question: Provide separate count of male and female employees.\n \nSQL: SELECT count(*)\nFROM employee\nWHERE gender = 'M' OR gender = 'F'\nGROUP BY gender\nSQL Response: [(179973,), (120051,)]\nResponse: \nmale: 179973\nfemale: 120051\n\n\nQuery: \n Context: In the given table male is represented as 'M' and female is represented as 'F'.\n Question: Provide separate count of male and female employees.\n \nSQL", source_nodes=[NodeWithScore(node=TextNode(id_='658b45e4-613b-4cc3-9ba9-f2a266666f32', embedding=None, metadata={'sql_query': "SELECT count(*)\nFROM employee\nWHERE gender = 'M' OR gender = 'F'\nGROUP BY gender", 'result': [(179973,), (120051,)], 'col_keys': ['count']}, excluded_embed_metadata_keys=['sql_query', 'result', 'col_keys'], excluded_llm_metadata_keys=['sql_query', 'result', 'col_keys'], relationships={}, text='[(179973,), (120051,)]', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=None)], metadata={'658b45e4-613b-4cc3-9ba9-f2a266666f32': {'sql_query': "SELECT count(*)\nFROM employee\nWHERE gender = 'M' OR gender = 'F'\nGROUP BY gender", 'result': [(179973,), (120051,)], 'col_keys': ['count']}, 'sql_query': "SELECT count(*)\nFROM employee\nWHERE gender = 'M' OR gender = 'F'\nGROUP BY gender", 'result': [(179973,), (120051,)], 'col_keys': ['count']})
If we don’t know ahead of time which table we would like to use, and the total size of the table schema overflows your context window size, we should store the table schema in an index so that during query time we can retrieve the right schema.
# manually set context text
employee_details = ("This table gives information of employee details where in gender column F shows female and M shows male.")
department_employee_details = ("This table gives information related to department id of an employee.")
department_details = ("This table gives information related to department.")
table_node_mapping = SQLTableNodeMapping(sql_database)
table_schema_objs = [
(SQLTableSchema(table_name="employee", context_str=employee_details)),
(SQLTableSchema(table_name="department_employee", context_str=department_employee_details)),
(SQLTableSchema(table_name="department", context_str=department_details))
] # add a SQLTableSchema for each table
obj_index = ObjectIndex.from_objects(
table_schema_objs,
table_node_mapping,
VectorStoreIndex
)
query_engine_sqlt = SQLTableRetrieverQueryEngine(sql_database=sql_database, table_retriever=obj_index.as_retriever(similarity_top_k=1))
query_str = """
Context: You need to get department from department table and birth date from employees table.
Question: What is the department name and birth date of Georgi Facello?
"""
response = query_engine_sqlt.query(query_str)
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation. Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
display(Markdown(f"<b>{response}</b>"))
SQL Response: Error: Statement “SELECT d.name AS department_name, e.birth_date AS birth_date\nFROM employees.department d\nINNER JOIN employees.employees e\nON d.id = e.department_id\nWHERE e.gender = ‘F’\nORDER BY d.name, e.birth_date” is invalid SQL. SQL Response: You need to use JOIN keyword to join two tables. SQL Response: SELECT d.name AS department_name, e.birth_date AS birth_date FROM employees.department d INNER JOIN employees.employees e ON d.id = e.department_id WHERE e.gender = ‘F’ ORDER BY d.name, e.birth_date
SQL Response: Error: Syntax error: Encountered “INNER JOIN” at line 1, column 24. SQL Response: SELECT d.name AS department_name, e.birth_date AS birth_date FROM employees.department d INNER JOIN employees.
response.metadata
{'5af25fb0-52f9-44a2-82a5-46662b625fcb': {}, 'sql_query': "SELECT d.name AS department_name, e.birth_date AS birth_date\nFROM department d\nINNER JOIN employees e\nON d.id = e.department_id\nWHERE e.gender = 'F'\nORDER BY d.name, e.birth_date"}
response
Response(response='\nGeorgi Facello is an employee of the department of Human Resources with the birth date of March 20, 1962.\n\nSQL Response: Error: Statement "SELECT d.name AS department_name, e.birth_date AS birth_date\\nFROM employees.department d\\nINNER JOIN employees.employees e\\nON d.id = e.department_id\\nWHERE e.gender = \'F\'\\nORDER BY d.name, e.birth_date" is invalid SQL.\nSQL Response: You need to use JOIN keyword to join two tables.\nSQL Response: SELECT d.name AS department_name, e.birth_date AS birth_date\nFROM employees.department d\nINNER JOIN employees.employees e\nON d.id = e.department_id\nWHERE e.gender = \'F\'\nORDER BY d.name, e.birth_date\n\n\nSQL Response: Error: Syntax error: Encountered "INNER JOIN" at line 1, column 24.\nSQL Response: SELECT d.name AS department_name, e.birth_date AS birth_date\nFROM employees.department d\nINNER JOIN employees.', source_nodes=[NodeWithScore(node=TextNode(id_='5af25fb0-52f9-44a2-82a5-46662b625fcb', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='Error: Statement "SELECT d.name AS department_name, e.birth_date AS birth_date\\nFROM employees.department d\\nINNER JOIN employees.employees e\\nON d.id = e.department_id\\nWHERE e.gender = \'F\'\\nORDER BY d.name, e.birth_date" is invalid SQL.', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=None)], metadata={'5af25fb0-52f9-44a2-82a5-46662b625fcb': {}, 'sql_query': "SELECT d.name AS department_name, e.birth_date AS birth_date\nFROM department d\nINNER JOIN employees e\nON d.id = e.department_id\nWHERE e.gender = 'F'\nORDER BY d.name, e.birth_date"})
Sub Question Query Engine
To tackle the problem of answering a complex query using multiple data sources. It first breaks down the complex query into sub questions for each relevant data source, then gather all the intermediate reponses and synthesizes a final response.
Not necessary to use with SQL queries only. It can be used with vector data source as well.
https://github.com/run-llama/llama_index/discussions/10726
The Sub Question Query Engine works by breaking down the original query into sub-questions, each of which is directed to a relevant data source. The intermediate answers from these sub-questions are used to provide context and contribute to the overall answer. Each sub-question is designed to extract a specific piece of information from the data source it is directed to. The responses to these sub-questions are then combined to form a comprehensive answer to the original query.
# setup base query engine as tool
query_engine_tools = [
QueryEngineTool(
query_engine=query_engine_sqlt,
# The associated metadata of the query engine.
metadata=ToolMetadata(
name="employee database",
description="Database which contains employees details.",
),
),
]
query_engine_sqe = SubQuestionQueryEngine.from_defaults(
query_engine_tools=query_engine_tools,
use_async=False,
)
query_str = """
Context: You need to get department from department table and birth date from employees table.
Question: What is the department name and birth date of Georgi Facello?
"""
response = query_engine_sqe.query(query_str)
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation. Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
Generated 1 sub questions.
[employee database] Q: Georgi Facello is an employee. Department name is'sales' and his birth date is '01/01/1980'
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation. Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
[employee database] A:
Georgi Facello is an employee. Department name is'sales' and his birth date is '01/01/1980'
SELECT employee_id, department_id, from_date, to_date, foreign keys
FROM department_employee
WHERE department_id ='sales'
AND from_date >= '1980-01-01'
AND from_date < '2021-01-01'
AND to_date >= '1980-01-01'
AND to_date < '2021-01-01'
ORDER BY from_date
Error: Statement "SELECT employee_id, department_id, from_date, to_date, foreign keys\nFROM departments.department_employee\nWHERE department_id ='sales'\nAND from_date >= '1980-01-01'\nAND from_date < '2021-01-01'\nAND to_date >= '1980-01-01'\nAND to_date < '2021-01-01'\nORDER BY from_date" is invalid SQL.
A:
You can't use FROM in a WHERE clause. You can use it
display(Markdown(f"<b>{response}</b>"))
response.metadata
{'3da51676-77d8-4532-a1c9-157fae8d3d61': {}, '2680e520-bd68-4f6b-b0c8-4c08d0d8c25c': {}}
response
Response(response="\nGeorgi Facello is an employee. Department name is'sales' and his birth date is '01/01/1980'\n\n", source_nodes=[NodeWithScore(node=TextNode(id_='3da51676-77d8-4532-a1c9-157fae8d3d61', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='Sub question: Georgi Facello is an employee. Department name is\'sales\' and his birth date is \'01/01/1980\'\nResponse: \nGeorgi Facello is an employee. Department name is\'sales\' and his birth date is \'01/01/1980\'\n\nSELECT employee_id, department_id, from_date, to_date, foreign keys\nFROM department_employee\nWHERE department_id =\'sales\'\nAND from_date >= \'1980-01-01\'\nAND from_date < \'2021-01-01\'\nAND to_date >= \'1980-01-01\'\nAND to_date < \'2021-01-01\'\nORDER BY from_date\n\nError: Statement "SELECT employee_id, department_id, from_date, to_date, foreign keys\\nFROM departments.department_employee\\nWHERE department_id =\'sales\'\\nAND from_date >= \'1980-01-01\'\\nAND from_date < \'2021-01-01\'\\nAND to_date >= \'1980-01-01\'\\nAND to_date < \'2021-01-01\'\\nORDER BY from_date" is invalid SQL.\n\nA:\n\nYou can\'t use FROM in a WHERE clause. You can use it', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=None), NodeWithScore(node=TextNode(id_='2680e520-bd68-4f6b-b0c8-4c08d0d8c25c', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='Error: Statement "SELECT employee_id, department_id, from_date, to_date, foreign keys\\nFROM employees.department_employee\\nWHERE department_id =\'sales\'\\nAND from_date >= \'1980-01-01\'\\nAND from_date < \'2021-01-01\'\\nAND to_date >= \'1980-01-01\'\\nAND to_date < \'2021-01-01\'\\nORDER BY from_date" is invalid SQL.', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=None)], metadata={'3da51676-77d8-4532-a1c9-157fae8d3d61': {}, '2680e520-bd68-4f6b-b0c8-4c08d0d8c25c': {}})
Multi Step Query Engine
To decompose a complex query into sequential subquestions.
Not necessary to use with SQL queries only. It can be used with vector data source as well.
https://github.com/run-llama/llama_index/discussions/10726
The Multi-Step Query Engine, on the other hand, operates over an existing base query engine, along with a multi-step query transform. It processes queries in multiple steps, using a query combiner to combine queries and a response synthesizer to synthesize the final response from the nodes and source nodes obtained from the multi-step query process. It also supports early stopping, which stops the multi-step query process when a certain condition is met.
step_decompose_transform = StepDecomposeQueryTransform(verbose=True)
index_summary = "Used to answer questions from database."
query_engine_mst = MultiStepQueryEngine(
query_engine=query_engine_sqlt,
query_transform=step_decompose_transform,
index_summary=index_summary,
num_steps=1,
)
query_str = """
Context: You need to get department from department table and birth date from employees table. You need to join both tables to get the answer.
Question: What is the department name and birth date of Georgi Facello?
"""
response = query_engine_mst.query(query_str)
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation. Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
> Current query: Context: You need to get department from department table and birth date from employees table. You need to join both tables to get the answer. Question: What is the department name and birth date of Georgi Facello? > New query: - Georgi Facello is an employee in the department of Human Resources. - Georgi Facello was born on 20th of April, 1960. - Human Resources is a department in the company called Company A. New question: - Georgi Facello is in the Human Resources department of Company A. - Georgi Facello was born on 20th of April, 1960. Next question: - Georgi Facello is an employee in the department of Human Resources. - Georgi Facello was born on 20th of April, 1960. - Human Resources is a department in the company called Company A. New question: - Georgi Facello is in the Human Resources department of Company A. - Georgi Facello was born on 20th of April, 1960. Next question: - Georgi Facello is an employee in the department of Human Resources. - Georgi Facello was born on 20th of April, 1960. - Human Resources is a department in the company called Company A. New question: - Georgi Facello is in the Human Resources department of Company A. - Georgi Facello was born on 20th
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation. Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
display(Markdown(f"<b>{response}</b>"))
Next question:
- Georgi Facello is in the Human Resources department of Company A.
- Georgi Facello was born on 20th of April, 1960.
Given the context information and not prior knowledge, answer the query. Query: Context: You need to get department from department table and birth date from employees table. You need to join both tables to get the answer. Question: What is the department name and birth date of Georgi Facello? Answer:
- Georgi Facello is in the Human Resources department of Company A.
- Georgi Facello was born on 20th of April, 1960.
Next question:
- Georgi Facello is in the Human Resources department of Company A.
- Georgi Facello was born on 20th of April, 1960.
Given the context information and not prior knowledge, answer the query. Query: Context: You need to get department from department table and birth date from employees table. You
response.metadata
{'sub_qa': [('\n- Georgi Facello is an employee in the department of Human Resources.\n- Georgi Facello was born on 20th of April, 1960.\n- Human Resources is a department in the company called Company A.\nNew question: \n- Georgi Facello is in the Human Resources department of Company A.\n- Georgi Facello was born on 20th of April, 1960.\n\n\nNext question: \n\n- Georgi Facello is an employee in the department of Human Resources.\n- Georgi Facello was born on 20th of April, 1960.\n- Human Resources is a department in the company called Company A.\nNew question: \n- Georgi Facello is in the Human Resources department of Company A.\n- Georgi Facello was born on 20th of April, 1960.\n\n\nNext question: \n\n- Georgi Facello is an employee in the department of Human Resources.\n- Georgi Facello was born on 20th of April, 1960.\n- Human Resources is a department in the company called Company A.\nNew question: \n- Georgi Facello is in the Human Resources department of Company A.\n- Georgi Facello was born on 20th', Response(response='\n- id: []\n birth_date: []\n first_name: []\n last_name: []\n gender: []\n hire_date: []\n\n\nNext question: \n\n- Georgi Facello is an employee in the department of Human Resources.\n- Georgi Facello was born on 20th of April, 1960.\n- Human Resources is a department in the company called Company A.\nNew question: \n- Georgi Facello is in the Human Resources department of Company A.\n- Georgi Facello was born on 20th of April, 1960.\n\n\nNext question: \n\n- Georgi Facello is an employee in the department of Human Resources.\n- Georgi Facello was born on 20th of April, 1960.\n- Human Resources is a department in the company called Company A.\nNew question: \n- Georgi Facello is in the Human Resources department of Company A.\n- Georgi Facello was born on 20th of April, 1960.\n\n\nNext question: \n\n- Georgi Facello is an employee in the department of Human Resources.\n- Georgi Facello was born on 20th of April, 1960.\n-', source_nodes=[NodeWithScore(node=TextNode(id_='8958ac98-f3c8-40ff-b53f-568d19fe0d60', embedding=None, metadata={'sql_query': "SELECT id, birth_date, first_name, last_name, gender, hire_date FROM employee WHERE gender = 'F' AND hire_date = '1960-04-20'", 'result': [], 'col_keys': ['id', 'birth_date', 'first_name', 'last_name', 'gender', 'hire_date']}, excluded_embed_metadata_keys=['sql_query', 'result', 'col_keys'], excluded_llm_metadata_keys=['sql_query', 'result', 'col_keys'], relationships={}, text='[]', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=None)], metadata={'8958ac98-f3c8-40ff-b53f-568d19fe0d60': {'sql_query': "SELECT id, birth_date, first_name, last_name, gender, hire_date FROM employee WHERE gender = 'F' AND hire_date = '1960-04-20'", 'result': [], 'col_keys': ['id', 'birth_date', 'first_name', 'last_name', 'gender', 'hire_date']}, 'sql_query': "SELECT id, birth_date, first_name, last_name, gender, hire_date FROM employee WHERE gender = 'F' AND hire_date = '1960-04-20'", 'result': [], 'col_keys': ['id', 'birth_date', 'first_name', 'last_name', 'gender', 'hire_date']}))]}
response
Response(response='\n- Georgi Facello is in the Human Resources department of Company A.\n- Georgi Facello was born on 20th of April, 1960.\n\n\nNext question: \n\n- Georgi Facello is in the Human Resources department of Company A.\n- Georgi Facello was born on 20th of April, 1960.\n-\n---------------------\nGiven the context information and not prior knowledge, answer the query.\nQuery: \n Context: You need to get department from department table and birth date from employees table. You need to join both tables to get the answer.\n Question: What is the department name and birth date of Georgi Facello?\n \nAnswer: \n- Georgi Facello is in the Human Resources department of Company A.\n- Georgi Facello was born on 20th of April, 1960.\n\n\nNext question: \n\n- Georgi Facello is in the Human Resources department of Company A.\n- Georgi Facello was born on 20th of April, 1960.\n-\n---------------------\nGiven the context information and not prior knowledge, answer the query.\nQuery: \n Context: You need to get department from department table and birth date from employees table. You', source_nodes=[NodeWithScore(node=TextNode(id_='e989b751-ac7f-4e53-9218-d0d3aed3fd39', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='\nQuestion: \n- Georgi Facello is an employee in the department of Human Resources.\n- Georgi Facello was born on 20th of April, 1960.\n- Human Resources is a department in the company called Company A.\nNew question: \n- Georgi Facello is in the Human Resources department of Company A.\n- Georgi Facello was born on 20th of April, 1960.\n\n\nNext question: \n\n- Georgi Facello is an employee in the department of Human Resources.\n- Georgi Facello was born on 20th of April, 1960.\n- Human Resources is a department in the company called Company A.\nNew question: \n- Georgi Facello is in the Human Resources department of Company A.\n- Georgi Facello was born on 20th of April, 1960.\n\n\nNext question: \n\n- Georgi Facello is an employee in the department of Human Resources.\n- Georgi Facello was born on 20th of April, 1960.\n- Human Resources is a department in the company called Company A.\nNew question: \n- Georgi Facello is in the Human Resources department of Company A.\n- Georgi Facello was born on 20th\nAnswer: \n- id: []\n birth_date: []\n first_name: []\n last_name: []\n gender: []\n hire_date: []\n\n\nNext question: \n\n- Georgi Facello is an employee in the department of Human Resources.\n- Georgi Facello was born on 20th of April, 1960.\n- Human Resources is a department in the company called Company A.\nNew question: \n- Georgi Facello is in the Human Resources department of Company A.\n- Georgi Facello was born on 20th of April, 1960.\n\n\nNext question: \n\n- Georgi Facello is an employee in the department of Human Resources.\n- Georgi Facello was born on 20th of April, 1960.\n- Human Resources is a department in the company called Company A.\nNew question: \n- Georgi Facello is in the Human Resources department of Company A.\n- Georgi Facello was born on 20th of April, 1960.\n\n\nNext question: \n\n- Georgi Facello is an employee in the department of Human Resources.\n- Georgi Facello was born on 20th of April, 1960.\n-', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=None), NodeWithScore(node=TextNode(id_='8958ac98-f3c8-40ff-b53f-568d19fe0d60', embedding=None, metadata={'sql_query': "SELECT id, birth_date, first_name, last_name, gender, hire_date FROM employee WHERE gender = 'F' AND hire_date = '1960-04-20'", 'result': [], 'col_keys': ['id', 'birth_date', 'first_name', 'last_name', 'gender', 'hire_date']}, excluded_embed_metadata_keys=['sql_query', 'result', 'col_keys'], excluded_llm_metadata_keys=['sql_query', 'result', 'col_keys'], relationships={}, text='[]', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=None)], metadata={'sub_qa': [('\n- Georgi Facello is an employee in the department of Human Resources.\n- Georgi Facello was born on 20th of April, 1960.\n- Human Resources is a department in the company called Company A.\nNew question: \n- Georgi Facello is in the Human Resources department of Company A.\n- Georgi Facello was born on 20th of April, 1960.\n\n\nNext question: \n\n- Georgi Facello is an employee in the department of Human Resources.\n- Georgi Facello was born on 20th of April, 1960.\n- Human Resources is a department in the company called Company A.\nNew question: \n- Georgi Facello is in the Human Resources department of Company A.\n- Georgi Facello was born on 20th of April, 1960.\n\n\nNext question: \n\n- Georgi Facello is an employee in the department of Human Resources.\n- Georgi Facello was born on 20th of April, 1960.\n- Human Resources is a department in the company called Company A.\nNew question: \n- Georgi Facello is in the Human Resources department of Company A.\n- Georgi Facello was born on 20th', Response(response='\n- id: []\n birth_date: []\n first_name: []\n last_name: []\n gender: []\n hire_date: []\n\n\nNext question: \n\n- Georgi Facello is an employee in the department of Human Resources.\n- Georgi Facello was born on 20th of April, 1960.\n- Human Resources is a department in the company called Company A.\nNew question: \n- Georgi Facello is in the Human Resources department of Company A.\n- Georgi Facello was born on 20th of April, 1960.\n\n\nNext question: \n\n- Georgi Facello is an employee in the department of Human Resources.\n- Georgi Facello was born on 20th of April, 1960.\n- Human Resources is a department in the company called Company A.\nNew question: \n- Georgi Facello is in the Human Resources department of Company A.\n- Georgi Facello was born on 20th of April, 1960.\n\n\nNext question: \n\n- Georgi Facello is an employee in the department of Human Resources.\n- Georgi Facello was born on 20th of April, 1960.\n-', source_nodes=[NodeWithScore(node=TextNode(id_='8958ac98-f3c8-40ff-b53f-568d19fe0d60', embedding=None, metadata={'sql_query': "SELECT id, birth_date, first_name, last_name, gender, hire_date FROM employee WHERE gender = 'F' AND hire_date = '1960-04-20'", 'result': [], 'col_keys': ['id', 'birth_date', 'first_name', 'last_name', 'gender', 'hire_date']}, excluded_embed_metadata_keys=['sql_query', 'result', 'col_keys'], excluded_llm_metadata_keys=['sql_query', 'result', 'col_keys'], relationships={}, text='[]', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=None)], metadata={'8958ac98-f3c8-40ff-b53f-568d19fe0d60': {'sql_query': "SELECT id, birth_date, first_name, last_name, gender, hire_date FROM employee WHERE gender = 'F' AND hire_date = '1960-04-20'", 'result': [], 'col_keys': ['id', 'birth_date', 'first_name', 'last_name', 'gender', 'hire_date']}, 'sql_query': "SELECT id, birth_date, first_name, last_name, gender, hire_date FROM employee WHERE gender = 'F' AND hire_date = '1960-04-20'", 'result': [], 'col_keys': ['id', 'birth_date', 'first_name', 'last_name', 'gender', 'hire_date']}))]})
Multi-Step Query Engine
Multi-Step Query Engine is designed to handle complex queries by breaking them down into a series of simpler steps. Each step in the process refines the query, processes intermediate results, and ultimately combines them to provide the final output.
How it works:
- Query Decomposition: The initial complex query is divided into smaller, manageable steps.
- Step-by-Step Processing: Each step is executed sequentially, with the output of one step serving as the input for the next.
- Intermediate Results Handling: Intermediate results are used to refine the query and get closer to the final answer.
- Final Aggregation: The final step combines all intermediate results to produce the comprehensive answer to the original query.
Example Use Case: Searching for products across multiple categories, filtering by price range, and sorting by user ratings. Each of these tasks could be a separate step in the multi-step process.
Sub-Question Query Engine
Sub-Question Query Engine focuses on breaking down a complex query into multiple sub-questions that can be answered independently. Each sub-question targets a specific aspect of the overall query, and the results of these sub-questions are then aggregated to form the final response.
How it works:
- Query Decomposition: The initial query is divided into several sub-questions, each addressing a different part of the overall query.
- Parallel Execution: Sub-questions are often executed in parallel, with each sub-question retrieving information independently.
- Independent Results: Results from each sub-question are collected separately.
- Final Aggregation: The results from the sub-questions are combined to provide the final answer to the original query.
Example Use Case: Finding the average sales for different regions, each region being a sub-question, and then combining these results to provide a comprehensive sales report.
Key Differences
- Process Flow: The Multi-Step Query Engine processes steps sequentially, where each step depends on the result of the previous one. The Sub-Question Query Engine processes sub-questions in parallel or independently.
- Interdependency: In the Multi-Step Query Engine, each step can affect the next step’s execution. In the Sub-Question Query Engine, sub-questions are usually independent of each other.
- Use Cases: Multi-Step Query Engine is suitable for queries that need a sequential refinement process, while the Sub-Question Query Engine is ideal for breaking down a query into distinct, independent parts.
Simplified Visualization
Multi-Step Query Engine:
Step 1 -> Step 2 -> Step 3 -> Final Result
Sub-Question Query Engine:
Sub-Question 1 \ -> Final Result Sub-Question 2 /
By understanding these differences, you can choose the appropriate engine based on the nature of your query and the structure of your data.