PebbloSafeLoader#

class langchain_community.document_loaders.pebblo.PebbloSafeLoader(langchain_loader: BaseLoader, name: str, owner: str = '', description: str = '', api_key: str | None = None, load_semantic: bool = False, classifier_url: str | None = None, *, classifier_location: str = 'local')[source]#

Pebblo Safe Loader is a wrapper around document loaders that enables the loaded data to be scrutinized.
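A minimal usage sketch: wrap any `BaseLoader` and call `load()` as usual. The CSV path, app name, and owner below are hypothetical, and a Pebblo classifier (local server or cloud API) must be reachable for classification to take place.

```python
from langchain_community.document_loaders import CSVLoader
from langchain_community.document_loaders.pebblo import PebbloSafeLoader

# Wrap an ordinary loader; PebbloSafeLoader forwards load()/lazy_load()
# to it and submits the loaded documents for inspection.
loader = PebbloSafeLoader(
    CSVLoader(file_path="data/sensitive_data.csv"),  # hypothetical file
    name="acme-support-rag",        # app name (required)
    owner="Jane Doe",               # optional metadata
    description="Support RAG app",  # optional metadata
)
docs = loader.load()
```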

Methods

__init__(langchain_loader, name[, owner, ...])

alazy_load()

A lazy loader for Documents.

aload()

Load data into Document objects.

calculate_content_size(page_content)

Calculate the content size in bytes: encode the string to bytes using a specific encoding (e.g., UTF-8) and take the length of the encoded bytes.

classify_in_batches()

Classify documents in batches.

get_file_owner_from_path(file_path)

Fetch owner of local file path.

get_source_size(source_path)

Fetch size of source path.

lazy_load()

Load documents in lazy fashion.

load()

Load Documents.

load_and_split([text_splitter])

Load Documents and split into chunks.

set_discover_sent()

set_loader_sent()

Parameters:
  • langchain_loader (BaseLoader) –

  • name (str) –

  • owner (str) –

  • description (str) –

  • api_key (str | None) –

  • load_semantic (bool) –

  • classifier_url (str | None) –

  • classifier_location (str) –

__init__(langchain_loader: BaseLoader, name: str, owner: str = '', description: str = '', api_key: str | None = None, load_semantic: bool = False, classifier_url: str | None = None, *, classifier_location: str = 'local')[source]#
Parameters:
  • langchain_loader (BaseLoader) –

  • name (str) –

  • owner (str) –

  • description (str) –

  • api_key (str | None) –

  • load_semantic (bool) –

  • classifier_url (str | None) –

  • classifier_location (str) –

async alazy_load() β†’ AsyncIterator[Document]#

A lazy loader for Documents.

Return type:

AsyncIterator[Document]

async aload() β†’ List[Document]#

Load data into Document objects.

Return type:

List[Document]

static calculate_content_size(page_content: str) β†’ int[source]#

Calculate the content size in bytes: encode the string to bytes using a specific encoding (e.g., UTF-8) and take the length of the encoded bytes.

Parameters:

page_content (str) – Data string.

Returns:

Size of string in bytes.

Return type:

int
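The computation described above can be sketched as a plain-Python equivalent (a sketch of the documented behavior, not the library source):

```python
def calculate_content_size(page_content: str) -> int:
    """Return the size of the string in bytes under UTF-8 encoding."""
    # Multi-byte characters (e.g. accented letters) count per byte,
    # so the result can exceed len(page_content).
    return len(page_content.encode("utf-8"))

print(calculate_content_size("hello"))  # 5
print(calculate_content_size("héllo"))  # 6: "é" encodes to two bytes
```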

classify_in_batches() β†’ None[source]#

Classify documents in batches. This avoids API timeouts when sending a large number of documents. Batches are generated based on the page_content size.

Return type:

None
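The batching idea, grouping documents so each request stays under a size budget, can be sketched generically. The helper name and `max_batch_bytes` parameter are illustrative, not the library's actual internals:

```python
from typing import Iterator, List

def batch_by_content_size(
    contents: List[str], max_batch_bytes: int
) -> Iterator[List[str]]:
    """Yield groups of documents whose combined UTF-8 size fits the budget."""
    batch: List[str] = []
    size = 0
    for content in contents:
        content_size = len(content.encode("utf-8"))
        # Start a new batch when adding this document would exceed the budget.
        if batch and size + content_size > max_batch_bytes:
            yield batch
            batch, size = [], 0
        batch.append(content)
        size += content_size
    if batch:
        yield batch

batches = list(batch_by_content_size(["aa", "bb", "cc"], max_batch_bytes=4))
# Two batches: ["aa", "bb"] (4 bytes) and ["cc"]
```

A single oversized document still forms its own batch rather than being dropped, since the budget check only triggers when the current batch is non-empty.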

static get_file_owner_from_path(file_path: str) β†’ str[source]#

Fetch owner of local file path.

Parameters:

file_path (str) – Local file path.

Returns:

Name of owner.

Return type:

str
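On POSIX systems, this lookup can be sketched with the standard library (`pwd` is unavailable on Windows, and the actual implementation may differ):

```python
import os
import pwd  # POSIX-only

def get_file_owner_from_path(file_path: str) -> str:
    """Return the user name owning file_path, or 'unknown' on failure."""
    try:
        uid = os.stat(file_path).st_uid       # numeric owner id
        return pwd.getpwuid(uid).pw_name      # resolve to a user name
    except (OSError, KeyError):
        return "unknown"
```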

get_source_size(source_path: str) β†’ int[source]#

Fetch size of source path. Source can be a directory or a file.

Parameters:

source_path (str) – Local path of data source.

Returns:

Source size in bytes.

Return type:

int
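Since the source can be a file or a directory, the size computation can be sketched as a standalone function (a sketch of the documented behavior, not the library source):

```python
import os

def get_source_size(source_path: str) -> int:
    """Return the size in bytes of a file, or of all files under a directory."""
    if os.path.isfile(source_path):
        return os.path.getsize(source_path)
    total = 0
    # Walk the directory tree and sum the size of every regular file.
    for root, _dirs, files in os.walk(source_path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total
```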

lazy_load() β†’ Iterator[Document][source]#

Load documents in lazy fashion.

Raises:
  • NotImplementedError – raised when lazy_load is not implemented within the wrapped loader.

Yields:

list – Documents from loader’s lazy loading.

Return type:

Iterator[Document]

load() β†’ List[Document][source]#

Load Documents.

Returns:

Documents fetched from the load method of the wrapped loader.

Return type:

list

load_and_split(text_splitter: TextSplitter | None = None) β†’ List[Document]#

Load Documents and split into chunks. Chunks are returned as Documents.

Do not override this method. It should be considered deprecated!

Parameters:

text_splitter (Optional[TextSplitter]) – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.

Returns:

List of Documents.

Return type:

List[Document]

classmethod set_discover_sent() β†’ None[source]#
Return type:

None

classmethod set_loader_sent() β†’ None[source]#
Return type:

None
