LangChain Python API Reference: langchain-text-splitters 0.2.3

base

Classes

base.Language(value[, names, module, ...])

Enum of supported programming languages.
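
A minimal usage sketch, assuming the langchain-text-splitters package is installed. The language-aware splitter shown (RecursiveCharacterTextSplitter.from_language) lives in the character module, not in base; it appears here only to show where the enum is typically consumed.

```python
from langchain_text_splitters import Language, RecursiveCharacterTextSplitter

# Each member maps to a lowercase string value, e.g. Language.PYTHON.value == "python".
print([lang.value for lang in Language])

# The enum is typically passed to language-aware splitters.
splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON, chunk_size=200, chunk_overlap=0
)
chunks = splitter.split_text("def hello():\n    print('hello world')\n")
```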

base.TextSplitter(chunk_size, chunk_overlap, ...)

Interface for splitting text into chunks.
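
TextSplitter is abstract, so it is used by subclassing. A minimal sketch, assuming only that subclasses implement split_text and inherit helpers such as create_documents; LineSplitter is a hypothetical example, not part of the library.

```python
from typing import List

from langchain_text_splitters import TextSplitter


class LineSplitter(TextSplitter):
    """Hypothetical splitter: each non-empty line becomes one chunk."""

    def split_text(self, text: str) -> List[str]:
        return [line for line in text.splitlines() if line.strip()]


splitter = LineSplitter(chunk_size=100, chunk_overlap=0)
docs = splitter.create_documents(["first line\nsecond line\n\nthird line"])
```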

base.TokenTextSplitter([encoding_name, ...])

Splits text into token-based chunks using a model tokenizer.
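
A minimal sketch of token-based splitting, assuming tiktoken is installed; "cl100k_base" is just one valid tiktoken encoding name, not necessarily the class default.

```python
from langchain_text_splitters import TokenTextSplitter

# Chunk size and overlap are measured in tokens of the chosen encoding, not characters.
splitter = TokenTextSplitter(encoding_name="cl100k_base", chunk_size=256, chunk_overlap=32)

text = "LangChain splits long documents into overlapping, token-sized chunks. " * 50
chunks = splitter.split_text(text)
print(len(chunks))
```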

base.Tokenizer(chunk_overlap, ...)

Tokenizer data class.
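
A minimal sketch of constructing a Tokenizer by hand. The toy character-level encode/decode pair is purely illustrative; real usage would wrap a model tokenizer's functions.

```python
from langchain_text_splitters.base import Tokenizer

# Toy character-level "tokenizer": every character is one token.
tokenizer = Tokenizer(
    chunk_overlap=10,
    tokens_per_chunk=100,
    decode=lambda token_ids: "".join(chr(i) for i in token_ids),
    encode=lambda text: [ord(c) for c in text],
)
```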

Functions

base.split_text_on_tokens(*, text, tokenizer)

Splits incoming text and returns chunks using the given tokenizer.
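
A self-contained sketch that reuses the toy character-level tokenizer above to produce overlapping chunks of at most tokens_per_chunk tokens.

```python
from langchain_text_splitters.base import Tokenizer, split_text_on_tokens

tokenizer = Tokenizer(
    chunk_overlap=10,
    tokens_per_chunk=100,
    decode=lambda token_ids: "".join(chr(i) for i in token_ids),
    encode=lambda text: [ord(c) for c in text],
)

text = "Split incoming text into overlapping chunks of at most 100 tokens. " * 20
chunks = split_text_on_tokens(text=text, tokenizer=tokenizer)  # keyword-only arguments
print(len(chunks), max(len(c) for c in chunks))
```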
