rl_chain
RL (Reinforcement Learning) Chain leverages Vowpal Wabbit (VW) models for contextual reinforcement learning, with the goal of modifying the prompt before the LLM call.
[Vowpal Wabbit](https://vowpalwabbit.org/) provides fast, efficient, and flexible online machine learning techniques for reinforcement learning, supervised learning, and more.
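The idea of learning which value to insert into a prompt can be illustrated with a minimal, library-free sketch. It is not the `rl_chain` API and does not use Vowpal Wabbit: `TinyContextualBandit`, `build_prompt`, the prompt template, and the meal example are all hypothetical names chosen for illustration, and a simple epsilon-greedy rule stands in for the learned policy.

```python
import random
from collections import defaultdict


class TinyContextualBandit:
    """Hypothetical epsilon-greedy stand-in for a learned policy (not VW)."""

    def __init__(self, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Maps (context, action) -> [reward_sum, count].
        self.stats = defaultdict(lambda: [0.0, 0])

    def _mean(self, context, action):
        total, count = self.stats[(context, action)]
        return total / count if count else 0.0

    def select(self, context, actions):
        # Explore with probability epsilon, otherwise pick the best-known action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(actions)
        return max(actions, key=lambda a: self._mean(context, a))

    def learn(self, context, action, reward):
        # Record the score given to the action that was chosen for this context.
        entry = self.stats[(context, action)]
        entry[0] += reward
        entry[1] += 1


def build_prompt(template, policy, context, actions):
    """Let the policy choose an action, then substitute it into the prompt."""
    action = policy.select(context, actions)
    return action, template.format(user=context, meal=action)


# epsilon=0.0 disables exploration so the example is deterministic.
policy = TinyContextualBandit(epsilon=0.0)
template = "Recommend {meal} to {user} in one sentence."
meals = ["pizza", "salad", "sushi"]

# Simulated feedback: this user liked salad and disliked pizza.
policy.learn("Tom", "salad", 1.0)
policy.learn("Tom", "pizza", 0.0)

chosen, prompt = build_prompt(template, policy, "Tom", meals)
print(prompt)
```

In this sketch the context plays the role of the "based on" input and the candidate list plays the role of the "to select from" input; a scorer would normally supply the reward after the LLM responds.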
Classes
Auto selection scorer.
Abstract class to represent an embedder.
Abstract class to represent an event.
Abstract class to represent a policy.
Chain that leverages the Vowpal Wabbit (VW) model as a learned policy for reinforcement learning.
Abstract class to represent the selected item.
Abstract class to grade the chosen selection or the response of the LLM.
Vowpal Wabbit policy.
Metrics Tracker Average.
Metrics Tracker Rolling Window.
Model Repository.
Chain that leverages the Vowpal Wabbit (VW) model for reinforcement learning with a context, with the goal of modifying the prompt before the LLM call.
Event class for PickBest chain.
Embed the BasedOn and ToSelectFrom inputs into a format that can be used by the learning policy.
Random policy for PickBest chain.
Selected class for PickBest chain.
Vowpal Wabbit custom logger.
Functions
Wrap a value to indicate that it should be based on.
Wrap a value to indicate that it should be embedded.
Wrap a value to indicate that it should be embedded and kept.
Wrap a value to indicate that it should be selected from.
Embed the actions or context using the SentenceTransformer model (or a model that has an encode function).
Embed a dictionary item.
Embed a list item.
Embed a string or an _Embed object.
Get the BasedOn and ToSelectFrom from the inputs.
Check if an item is a string.
Parse the input string into a list of examples.
Prepare the inputs for auto embedding.
Convert an embedding to a string.