PebbloSafeLoader#

class langchain_community.document_loaders.pebblo.PebbloSafeLoader(langchain_loader: BaseLoader, name: str, owner: str = '', description: str = '', api_key: str | None = None, load_semantic: bool = False, classifier_url: str | None = None, *, classifier_location: str = 'local')[source]#

Pebblo Safe Loader is a wrapper around document loaders that enables the loaded data to be scrutinized.
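A minimal usage sketch: wrap any `BaseLoader` and call `load()` as usual. The CSV path, app name, and owner below are hypothetical, and a Pebblo classifier (local server or cloud API) must be reachable for classification to take place.

```python
from langchain_community.document_loaders import CSVLoader
from langchain_community.document_loaders.pebblo import PebbloSafeLoader

# Wrap an ordinary loader; PebbloSafeLoader forwards load()/lazy_load()
# to it and submits the loaded documents for inspection.
loader = PebbloSafeLoader(
    CSVLoader(file_path="data/sensitive_data.csv"),  # hypothetical file
    name="acme-support-rag",        # app name (required)
    owner="Jane Doe",               # optional metadata
    description="Support RAG app",  # optional metadata
)
docs = loader.load()
```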

Methods

__init__(langchain_loader, name[, owner, ...])

alazy_load()

A lazy loader for Documents.

aload()

Load data into Document objects.

calculate_content_size(page_content)

Calculate the content size in bytes: encode the string to bytes using a specific encoding (e.g., UTF-8) and take the length of the encoded bytes.

classify_in_batches()

Classify documents in batches.

get_file_owner_from_path(file_path)

Fetch owner of local file path.

get_source_size(source_path)

Fetch size of source path.

lazy_load()

Load documents in lazy fashion.

load()

Load Documents.

load_and_split([text_splitter])

Load Documents and split into chunks.

set_discover_sent()

set_loader_sent()

Parameters:
  • langchain_loader (BaseLoader) –

  • name (str) –

  • owner (str) –

  • description (str) –

  • api_key (str | None) –

  • load_semantic (bool) –

  • classifier_url (str | None) –

  • classifier_location (str) –

__init__(langchain_loader: BaseLoader, name: str, owner: str = '', description: str = '', api_key: str | None = None, load_semantic: bool = False, classifier_url: str | None = None, *, classifier_location: str = 'local')[source]#
Parameters:
  • langchain_loader (BaseLoader) –

  • name (str) –

  • owner (str) –

  • description (str) –

  • api_key (str | None) –

  • load_semantic (bool) –

  • classifier_url (str | None) –

  • classifier_location (str) –

async alazy_load() β†’ AsyncIterator[Document]#

A lazy loader for Documents.

Return type:

AsyncIterator[Document]

async aload() β†’ List[Document]#

Load data into Document objects.

Return type:

List[Document]

static calculate_content_size(page_content: str) β†’ int[source]#

Calculate the content size in bytes: encode the string to bytes using a specific encoding (e.g., UTF-8) and take the length of the encoded bytes.

Parameters:

page_content (str) – Data string.

Returns:

Size of string in bytes.

Return type:

int
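The computation described above can be sketched as a plain-Python equivalent (a sketch of the documented behavior, not the library source):

```python
def calculate_content_size(page_content: str) -> int:
    """Return the size of the string in bytes under UTF-8 encoding."""
    # Multi-byte characters (e.g. accented letters) count per byte,
    # so the result can exceed len(page_content).
    return len(page_content.encode("utf-8"))

print(calculate_content_size("hello"))  # 5
print(calculate_content_size("héllo"))  # 6: "é" encodes to two bytes
```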

classify_in_batches() β†’ None[source]#

Classify documents in batches. This avoids API timeouts when sending a large number of documents. Batches are generated based on the page_content size.

Return type:

None
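The batching idea, grouping documents so each request stays under a size budget, can be sketched generically. The helper name and `max_batch_bytes` parameter are illustrative, not the library's actual internals:

```python
from typing import Iterator, List

def batch_by_content_size(
    contents: List[str], max_batch_bytes: int
) -> Iterator[List[str]]:
    """Yield groups of documents whose combined UTF-8 size fits the budget."""
    batch: List[str] = []
    size = 0
    for content in contents:
        content_size = len(content.encode("utf-8"))
        # Start a new batch when adding this document would exceed the budget.
        if batch and size + content_size > max_batch_bytes:
            yield batch
            batch, size = [], 0
        batch.append(content)
        size += content_size
    if batch:
        yield batch

batches = list(batch_by_content_size(["aa", "bb", "cc"], max_batch_bytes=4))
# Two batches: ["aa", "bb"] (4 bytes) and ["cc"]
```

A single oversized document still forms its own batch rather than being dropped, since the budget check only triggers when the current batch is non-empty.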

static get_file_owner_from_path(file_path: str) β†’ str[source]#

Fetch owner of local file path.

Parameters:

file_path (str) – Local file path.

Returns:

Name of owner.

Return type:

str
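On POSIX systems, this lookup can be sketched with the standard library (`pwd` is unavailable on Windows, and the actual implementation may differ):

```python
import os
import pwd  # POSIX-only

def get_file_owner_from_path(file_path: str) -> str:
    """Return the user name owning file_path, or 'unknown' on failure."""
    try:
        uid = os.stat(file_path).st_uid       # numeric owner id
        return pwd.getpwuid(uid).pw_name      # resolve to a user name
    except (OSError, KeyError):
        return "unknown"
```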

get_source_size(source_path: str) β†’ int[source]#

Fetch size of source path. Source can be a directory or a file.

Parameters:

source_path (str) – Local path of data source.

Returns:

Source size in bytes.

Return type:

int
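Since the source can be a file or a directory, the size computation can be sketched as a standalone function (a sketch of the documented behavior, not the library source):

```python
import os

def get_source_size(source_path: str) -> int:
    """Return the size in bytes of a file, or of all files under a directory."""
    if os.path.isfile(source_path):
        return os.path.getsize(source_path)
    total = 0
    # Walk the directory tree and sum the size of every regular file.
    for root, _dirs, files in os.walk(source_path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total
```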

lazy_load() β†’ Iterator[Document][source]#

Load documents in lazy fashion.

Raises:
  • NotImplementedError – raised when lazy_load is not implemented within the wrapped loader.

Yields:

list – Documents from loader’s lazy loading.

Return type:

Iterator[Document]

load() β†’ List[Document][source]#

Load Documents.

Returns:

Documents fetched from the load method of the wrapped loader.

Return type:

list

load_and_split(text_splitter: TextSplitter | None = None) β†’ List[Document]#

Load Documents and split into chunks. Chunks are returned as Documents.

Do not override this method. It should be considered deprecated!

Parameters:

text_splitter (Optional[TextSplitter]) – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.

Returns:

List of Documents.

Return type:

List[Document]

classmethod set_discover_sent() β†’ None[source]#
Return type:

None

classmethod set_loader_sent() β†’ None[source]#
Return type:

None
