VsdxParser

class langchain_community.document_loaders.parsers.vsdx.VsdxParser

Parser for vsdx files.

Methods

__init__()

get_pages_content(zfile, source)

Get the content of the pages of a vsdx file.

get_relationships(page, zfile, filelist, ...)

Recursively get the relationships of a page, including the relationships of its relationships.

lazy_parse(blob)

Retrieve the contents of pages from a .vsdx file and insert them into documents, one document per page.

parse(blob)

Parse a vsdx file.

__init__()

get_pages_content(zfile: ZipFile, source: str) → List[Tuple[int, str, str]]

Get the content of the pages of a vsdx file.

Parameters:
  • zfile (ZipFile) – The vsdx file, opened as a zip archive.

  • source (str) – The path of the vsdx file.

Returns:

A list of tuples, one per page of the vsdx file, each containing the page number, the page name, and the page content.

Return type:

List[Tuple[int, str, str]]
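The return shape described above can be illustrated with a standard-library sketch. This is not the parser's implementation: `sketch_pages_content` is a hypothetical helper, it omits the `source` argument, it uses the archive part name as a stand-in for the page name, and it takes the page number from the file name. It only assumes that page parts in a vsdx archive are stored as `visio/pages/pageN.xml`.

```python
import io
import re
import xml.etree.ElementTree as ET
import zipfile
from typing import List, Tuple


def sketch_pages_content(zfile: zipfile.ZipFile) -> List[Tuple[int, str, str]]:
    """Hypothetical helper: return (page number, part name, text) per page part."""
    pages = []
    for name in zfile.namelist():
        match = re.fullmatch(r"visio/pages/page(\d+)\.xml", name)
        if match is None:
            continue  # skip parts that are not page content
        root = ET.fromstring(zfile.read(name))
        # Join all non-empty text nodes in document order, ignoring namespaces.
        text = "\n".join(t.strip() for t in root.itertext() if t.strip())
        pages.append((int(match.group(1)), name, text))
    # Sort numerically so that page10 comes after page2.
    return sorted(pages)


# Build a tiny in-memory archive that mimics the assumed layout.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("visio/pages/page2.xml", "<PageContents><Text>End</Text></PageContents>")
    archive.writestr("visio/pages/page1.xml", "<PageContents><Text>Start</Text></PageContents>")
    archive.writestr("docProps/app.xml", "<Properties/>")

buffer.seek(0)
with zipfile.ZipFile(buffer) as zfile:
    pages = sketch_pages_content(zfile)

for number, name, content in pages:
    print(number, name, content)
```

Each tuple can be unpacked as `(page_number, page_name, page_content)`, matching the documented return type.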

get_relationships(page: str, zfile: ZipFile, filelist: List[str], pagexml_rels: List[dict]) → Set[str]

Recursively get the relationships of a page, including the relationships of its relationships. A page can be based on other pages (for example, a background page), so all relationships must be collected to get the full content of a single page.

Parameters:
  • page (str) –

  • zfile (ZipFile) –

  • filelist (List[str]) –

  • pagexml_rels (List[dict]) –

Return type:

Set[str]
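The recursive idea can be sketched independently of the vsdx format. The example below is an illustration only: `collect_relationships` and the `rels` mapping are hypothetical, and the real method works from the zip archive and parsed relationship entries rather than a prebuilt dictionary.

```python
from typing import Dict, List, Optional, Set


def collect_relationships(
    part: str,
    rels: Dict[str, List[str]],
    seen: Optional[Set[str]] = None,
) -> Set[str]:
    """Hypothetical helper: return every part reachable from `part`."""
    if seen is None:
        seen = set()
    for target in rels.get(part, []):
        if target not in seen:
            seen.add(target)
            # Follow the relationships of the related part as well.
            collect_relationships(target, rels, seen)
    return seen


# Made-up relationship map: page1 uses page2 as a background, and both use masters.
rels = {
    "pages/page1.xml": ["pages/page2.xml", "masters/master1.xml"],
    "pages/page2.xml": ["masters/master1.xml", "masters/master2.xml"],
}

related = collect_relationships("pages/page1.xml", rels)
print(sorted(related))
```

Tracking visited parts in a set keeps the traversal from looping if two parts reference each other, and it naturally produces the `Set[str]` return type.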

lazy_parse(blob: Blob) → Iterator[Document]

Retrieve the contents of pages from a .vsdx file and insert them into documents, one document per page.

Parameters:

blob (Blob) –

Return type:

Iterator[Document]
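A minimal usage sketch follows. It assumes that `langchain-community` is installed and that `Blob` can be imported from `langchain_community.document_loaders.blob_loaders`; the import path may differ between versions. `load_vsdx_pages` and the file name `diagram.vsdx` are hypothetical.

```python
from typing import Dict, List


def load_vsdx_pages(path: str) -> List[Dict[str, object]]:
    """Hypothetical helper: parse a .vsdx file into one record per page."""
    # Imported here so the function can be defined without the package installed.
    from langchain_community.document_loaders.blob_loaders import Blob
    from langchain_community.document_loaders.parsers.vsdx import VsdxParser

    parser = VsdxParser()
    blob = Blob.from_path(path)
    # lazy_parse yields one Document per page of the vsdx file.
    return [
        {"content": doc.page_content, "metadata": doc.metadata}
        for doc in parser.lazy_parse(blob)
    ]


if __name__ == "__main__":
    for record in load_vsdx_pages("diagram.vsdx"):
        print(record["metadata"], str(record["content"])[:80])
```

Because `lazy_parse` returns an iterator, pages can also be consumed one at a time instead of being collected into a list.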

parse(blob: Blob) → Iterator[Document]

Parse a vsdx file.

Parameters:

blob (Blob) –

Return type:

Iterator[Document]