JSONLoader
This notebook provides a quick overview for getting started with JSON document loader. For detailed documentation of all JSONLoader features and configurations head to the API reference.
- TODO: Add any other relevant links, like information about underlying API, etc.
Overview
Integration details
| Class | Package | Local | Serializable | JS support | 
|---|---|---|---|---|
| JSONLoader | langchain_community | ✅ | ❌ | ✅ | 
Loader features
| Source | Document Lazy Loading | Native Async Support | 
|---|---|---|
| JSONLoader | ✅ | ❌ | 
Setup
To access JSON document loader you'll need to install the langchain-community integration package as well as the jq python package.
Credentials
No credentials are required to use the JSONLoader class.
If you want to get automated best in-class tracing of your model calls you can also set your LangSmith API key by uncommenting below:
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"
Installation
Install langchain_community and jq:
%pip install -qU langchain_community jq 
Initialization
Now we can instantiate our model object and load documents:
- TODO: Update model instantiation with relevant params.
from langchain_community.document_loaders import JSONLoader
loader = JSONLoader(
    file_path="./example_data/facebook_chat.json",
    jq_schema=".messages[].content",
    text_content=False,
)
Load
docs = loader.load()
docs[0]
Document(metadata={'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat.json', 'seq_num': 1}, page_content='Bye!')
print(docs[0].metadata)
{'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat.json', 'seq_num': 1}
Lazy Load
pages = []
for doc in loader.lazy_load():
    pages.append(doc)
    if len(pages) >= 10:
        # do some paged operation, e.g.
        # index.upsert(pages)
        pages = []
Read from JSON Lines file
If you want to load documents from a JSON Lines file, you pass json_lines=True
and specify jq_schema to extract page_content from a single JSON object.
loader = JSONLoader(
    file_path="./example_data/facebook_chat_messages.jsonl",
    jq_schema=".content",
    text_content=False,
    json_lines=True,
)
docs = loader.load()
print(docs[0])
page_content='Bye!' metadata={'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat_messages.jsonl', 'seq_num': 1}
Read specific content keys
Another option is to set jq_schema='.' and provide a content_key in order to only load specific content:
loader = JSONLoader(
    file_path="./example_data/facebook_chat_messages.jsonl",
    jq_schema=".",
    content_key="sender_name",
    json_lines=True,
)
docs = loader.load()
print(docs[0])
page_content='User 2' metadata={'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat_messages.jsonl', 'seq_num': 1}
JSON file with jq schema content_key
To load documents from a JSON file using the content_key within the jq schema, set is_content_key_jq_parsable=True. Ensure that content_key is compatible and can be parsed using the jq schema.
loader = JSONLoader(
    file_path="./example_data/facebook_chat.json",
    jq_schema=".messages[]",
    content_key=".content",
    is_content_key_jq_parsable=True,
)
docs = loader.load()
print(docs[0])
page_content='Bye!' metadata={'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat.json', 'seq_num': 1}