JSONLoader
This notebook provides a quick overview for getting started with the JSON document loader. For detailed documentation of all JSONLoader features and configurations, head to the API reference.
Overview
Integration details
Class | Package | Local | Serializable | JS support |
---|---|---|---|---|
JSONLoader | langchain_community | ✅ | ❌ | ✅ |
Loader features
Source | Document Lazy Loading | Native Async Support |
---|---|---|
JSONLoader | ✅ | ❌ |
Setup
To access the JSON document loader you'll need to install the langchain-community integration package as well as the jq Python package.
Credentials
No credentials are required to use the JSONLoader class.
If you want automated, best-in-class tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below:
import getpass
import os
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"
Installation
Install langchain_community and jq:
%pip install -qU langchain_community jq
Initialization
Now we can instantiate our loader object and load documents:
from langchain_community.document_loaders import JSONLoader

loader = JSONLoader(
    file_path="./example_data/facebook_chat.json",
    # jq expression: take the content field of every entry in messages
    jq_schema=".messages[].content",
    text_content=False,
)
Load
docs = loader.load()
docs[0]
Document(metadata={'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat.json', 'seq_num': 1}, page_content='Bye!')
print(docs[0].metadata)
{'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat.json', 'seq_num': 1}
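To make the jq expression concrete, here is a minimal standard-library sketch of what `.messages[].content` selects. It calls neither jq nor JSONLoader, and the sample payload is hypothetical: only its first message mirrors the output shown above.

```python
import json

# Hypothetical payload; only the first message mirrors the notebook output.
raw = """
{
  "messages": [
    {"sender_name": "User 2", "content": "Bye!"},
    {"sender_name": "User 1", "content": "See you later"}
  ]
}
"""

data = json.loads(raw)

# Plain-Python equivalent of the jq expression `.messages[].content`:
# iterate over every element of `messages` and take its `content` field.
contents = [message["content"] for message in data["messages"]]

# The loader output above numbers extracted items from 1 (`seq_num`).
for seq_num, text in enumerate(contents, start=1):
    print(seq_num, text)
```

Each selected value becomes the page_content of one document, which is why the first document above contains only 'Bye!'.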
Lazy Load
pages = []
for doc in loader.lazy_load():
    pages.append(doc)
    if len(pages) >= 10:
        # do some paged operation, e.g.
        # index.upsert(pages)
        pages = []
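The paging pattern above can also be factored into a small helper. The sketch below is one possible way to do it, not part of JSONLoader: `batched` is a hypothetical helper name, and a plain generator of strings stands in for `loader.lazy_load()`. Unlike the loop above, it also yields the final partial batch.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Stand-in for loader.lazy_load(): any iterator of documents works here.
fake_docs = (f"doc-{i}" for i in range(25))

batch_sizes = [len(batch) for batch in batched(fake_docs, 10)]
print(batch_sizes)  # [10, 10, 5]
```

Because the helper consumes the iterator lazily, at most one batch of documents is held in memory at a time.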
Read from JSON Lines file
To load documents from a JSON Lines file, pass json_lines=True and specify a jq_schema that extracts page_content from a single JSON object.
loader = JSONLoader(
    file_path="./example_data/facebook_chat_messages.jsonl",
    jq_schema=".content",
    text_content=False,
    json_lines=True,
)
docs = loader.load()
print(docs[0])
page_content='Bye!' metadata={'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat_messages.jsonl', 'seq_num': 1}
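As a rough illustration of the JSON Lines case, the sketch below parses one JSON object per line and applies the equivalent of `.content` to each object separately. It uses only the standard library; the sample lines are hypothetical, and skipping blank lines is an assumption of this sketch, not a statement about JSONLoader's behavior.

```python
import io
import json

# Hypothetical JSON Lines content: one complete JSON object per line.
jsonl = io.StringIO(
    '{"sender_name": "User 2", "content": "Bye!"}\n'
    '{"sender_name": "User 1", "content": "See you later"}\n'
)

records = []
for line in jsonl:
    line = line.strip()
    if not line:
        continue  # assumption of this sketch: ignore blank lines
    obj = json.loads(line)
    # Equivalent of jq_schema=".content", applied to this single object.
    records.append({"page_content": obj["content"], "seq_num": len(records) + 1})

print(records[0])
```

The key difference from a regular JSON file is that the schema never sees a surrounding array: it is evaluated once per line.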
Read specific content keys
Another option is to set jq_schema='.' and provide a content_key in order to load only specific content:
loader = JSONLoader(
    file_path="./example_data/facebook_chat_messages.jsonl",
    jq_schema=".",
    content_key="sender_name",
    json_lines=True,
)
docs = loader.load()
print(docs[0])
page_content='User 2' metadata={'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat_messages.jsonl', 'seq_num': 1}
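A minimal plain-Python sketch of the same idea follows: with the identity schema `.` each line is kept as a whole object, and the content key then picks a single field from it. The sample lines are hypothetical, and the code only imitates the selection logic without calling JSONLoader.

```python
import json

# Hypothetical JSON Lines records.
lines = [
    '{"sender_name": "User 2", "content": "Bye!"}',
    '{"sender_name": "User 1", "content": "See you later"}',
]

content_key = "sender_name"

page_contents = []
for line in lines:
    record = json.loads(line)  # jq_schema="." leaves the object unchanged
    if not isinstance(record, dict) or content_key not in record:
        raise KeyError(f"record has no key {content_key!r}")
    page_contents.append(record[content_key])

print(page_contents)  # ['User 2', 'User 1']
```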
JSON file with jq schema content_key
To load documents from a JSON file using a content_key within the jq schema, set is_content_key_jq_parsable=True. Ensure that content_key is compatible with the jq schema and can be parsed as a jq expression.
loader = JSONLoader(
    file_path="./example_data/facebook_chat.json",
    jq_schema=".messages[]",
    content_key=".content",
    is_content_key_jq_parsable=True,
)
docs = loader.load()
print(docs[0])
page_content='Bye!' metadata={'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat.json', 'seq_num': 1}
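To illustrate the two-step selection, the sketch below first iterates over the messages (the role of `.messages[]`) and then resolves a path-style content key against each one. `resolve_dotted_path` is a hypothetical stand-in for jq that handles only plain object keys such as `.content` or `.author.name`, not full jq syntax, and the sample data is invented.

```python
from typing import Any


def resolve_dotted_path(record: Any, path: str) -> Any:
    """Follow a jq-style key path such as ".content" or ".author.name".

    Hypothetical helper: supports plain object keys only, not full jq.
    """
    if not path.startswith("."):
        raise ValueError("path must start with '.'")
    value = record
    for key in path.split(".")[1:]:
        if key == "":
            continue  # the identity path "." has no keys to follow
        value = value[key]
    return value


# Hypothetical data shaped like the chat file used in this notebook.
data = {
    "messages": [
        {"sender_name": "User 2", "content": "Bye!"},
        {"sender_name": "User 1", "content": "See you later"},
    ]
}

# Step 1: `.messages[]` yields each message object.
# Step 2: the content key `.content` is evaluated against each object.
page_contents = [resolve_dotted_path(m, ".content") for m in data["messages"]]
print(page_contents)  # ['Bye!', 'See you later']
```

Expressing the content key as a path is mainly useful when the text sits in a nested field that a plain key name cannot reach.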