Adaptors

bocoel.models.adaptors

bocoel.Adaptor

Bases: Protocol

Adaptors are the glue between scores and the corpus. An adaptor handles running a particular score on a particular corpus or dataset.

evaluate `abstractmethod`

evaluate(data: Mapping[str, Sequence[Any]]) -> Sequence[float] | NDArray

Evaluate a particular set of entries with a language model. Returns a list of scores, one for each entry, in the same order.

Parameters:

Name	Type	Description	Default
`data`	`Mapping[str, Sequence[Any]]`	A mapping from column names to the data in that column.	required

Returns:

Type	Description
`Sequence[float] \| NDArray`	The scores for each entry. Scores must be floating point numbers.

Source code in src/bocoel/models/adaptors/interfaces/adaptors.py

@abc.abstractmethod
def evaluate(self, data: Mapping[str, Sequence[Any]]) -> Sequence[float] | NDArray:
    """
    Evaluate a particular set of entries with a language model.
    Returns a list of scores, one for each entry, in the same order.

    Parameters:
        data: A mapping from column names to the data in that column.

    Returns:
        The scores for each entry. Scores must be floating point numbers.
    """

    ...
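As a rough illustration of the contract above, the sketch below implements `evaluate` on a stand-in protocol instead of importing bocoel. `ExactMatchAdaptor` and the `prediction` / `target` column names are hypothetical, and the language model call is replaced by a precomputed prediction column to keep the example self-contained.

```python
from collections.abc import Mapping, Sequence
from typing import Any, Protocol


class Adaptor(Protocol):
    # Stand-in for bocoel.Adaptor, reduced to its one abstract method.
    def evaluate(self, data: Mapping[str, Sequence[Any]]) -> Sequence[float]: ...


class ExactMatchAdaptor:
    """Hypothetical adaptor: scores 1.0 when a prediction equals its target."""

    def __init__(self, prediction: str = "prediction", target: str = "target") -> None:
        self.prediction = prediction
        self.target = target

    def evaluate(self, data: Mapping[str, Sequence[Any]]) -> Sequence[float]:
        # Columns are parallel sequences, so zip pairs up the fields of each entry.
        return [
            float(pred == tgt)
            for pred, tgt in zip(data[self.prediction], data[self.target])
        ]


batch = {"prediction": ["a", "b", "c"], "target": ["a", "x", "c"]}
scores = ExactMatchAdaptor().evaluate(batch)
print(scores)  # one float per entry, in input order
```

The class satisfies the protocol structurally, so it does not need to inherit from `Adaptor` for a type checker to accept it.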

on_storage

on_storage(storage: Storage, indices: ArrayLike) -> NDArray

Evaluate a particular set of indices on a storage. Given indices and a storage, this method will extract the corresponding entries from the storage, and evaluate them with Adaptor.evaluate.

Parameters:

Name	Type	Description	Default
`storage`	`Storage`	The storage to evaluate.	required
`indices`	`ArrayLike`	The indices to evaluate.	required

Returns:

Type	Description
`NDArray`	The scores for each entry. The shape must be the same as the indices.

Source code in src/bocoel/models/adaptors/interfaces/adaptors.py

def on_storage(self, storage: Storage, indices: ArrayLike) -> NDArray:
    """
    Evaluate a particular set of indices on a storage.
    Given indices and a storage,
    this method will extract the corresponding entries from the storage,
    and evaluate them with `Adaptor.evaluate`.

    Parameters:
        storage: The storage to evaluate.
        indices: The indices to evaluate.

    Returns:
        The scores for each entry. The shape must be the same as the indices.
    """

    indices = np.array(indices).astype("i")

    # Reshape the indices into 1D to evaluate.
    indices_shape = indices.shape
    indices = indices.ravel()

    items = storage[indices.tolist()]
    result = np.array(self.evaluate(data=items))

    # Reshape back.
    return result.reshape(indices_shape)
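The flatten, evaluate, reshape sequence in `on_storage` can be tried in isolation. In this sketch, `columns`, `fetch`, and `evaluate` are hypothetical stand-ins for a `Storage` and an adaptor; only the NumPy calls mirror the listing above.

```python
import numpy as np

# Hypothetical column-oriented data standing in for a bocoel Storage.
columns = {"value": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]}


def fetch(idx):
    # Mimics storage[list_of_indices]: maps each column name to the selected rows.
    return {name: [col[i] for i in idx] for name, col in columns.items()}


def evaluate(data):
    # Toy score: one float per entry, in the same order as the input.
    return [v * 2 for v in data["value"]]


indices = np.array([[0, 1, 2], [3, 4, 5]]).astype("i")

# Remember the original shape, then flatten so evaluate sees a plain batch.
shape = indices.shape
flat = indices.ravel()

result = np.array(evaluate(fetch(flat.tolist()))).reshape(shape)
print(result.shape)
```

Because `evaluate` returns scores in input order, reshaping restores the correspondence between each index and its score.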

on_corpus

on_corpus(corpus: Corpus, indices: ArrayLike) -> NDArray

Evaluate a particular set of indices on a corpus. A convenience wrapper around Adaptor.on_storage.

Parameters:

Name	Type	Description	Default
`corpus`	`Corpus`	The corpus to evaluate.	required
`indices`	`ArrayLike`	The indices to evaluate.	required

Returns:

Type	Description
`NDArray`	The scores for each entry. The shape must be the same as the indices.

Source code in src/bocoel/models/adaptors/interfaces/adaptors.py

def on_corpus(self, corpus: Corpus, indices: ArrayLike) -> NDArray:
    """
    Evaluate a particular set of indices on a corpus.
    A convenience wrapper around `Adaptor.on_storage`.

    Parameters:
        corpus: The corpus to evaluate.
        indices: The indices to evaluate.

    Returns:
        The scores for each entry. The shape must be the same as the indices.
    """

    return self.on_storage(storage=corpus.storage, indices=indices)

bocoel.GlueAdaptor

GlueAdaptor(
    lm: ClassifierModel,
    texts: str = "text",
    label: str = "label",
    label_text: str = "label_text",
    choices: Sequence[str] = ("negative", "positive"),
)

Bases: Adaptor

The adaptor for the GLUE datasets published by SetFit.

GLUE is a collection of datasets for natural language understanding tasks. The datasets are designed to be challenging and diverse, and they are collected from a variety of sources. They are mostly sentence-level classification tasks.

This adaptor is compatible with all classifier models, and it is designed to work with the GLUE datasets in the format SetFit publishes on Hugging Face Datasets.

Setfit datasets have the following columns:

text: The text to classify.
label: The label of the text.
label_text: The text of the label.

Initialize the adaptor.

Parameters:

Name	Type	Description	Default
`lm`	`ClassifierModel`	The language model to use for classification.	required
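`texts`	`str`	The column name for the text to classify. The value is split on whitespace, so several column names can be passed as one space-separated string.	`'text'`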
`label`	`str`	The column name for the label of the text.	`'label'`
`label_text`	`str`	The column name for the text of the label.	`'label_text'`
`choices`	`Sequence[str]`	The valid choices for the label.	`('negative', 'positive')`

Source code in src/bocoel/models/adaptors/glue/setfit.py

def __init__(
    self,
    lm: ClassifierModel,
    texts: str = "text",
    label: str = "label",
    label_text: str = "label_text",
    choices: Sequence[str] = ("negative", "positive"),
) -> None:
    """
    Initialize the adaptor.

    Parameters:
        lm: The language model to use for classification.
        texts: The column name for the text to classify.
        label: The column name for the label of the text.
        label_text: The column name for the text of the label.
        choices: The valid choices for the label.
    """

    self.lm = lm

    self.texts = texts.split()
    self.label = label
    self.label_text = label_text
    self.choices = choices
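Note that the constructor stores `texts.split()` rather than the raw string. The snippet below only demonstrates that one call; `parse_columns` is a hypothetical helper, and `premise` / `hypothesis` are made-up column names used for illustration.

```python
def parse_columns(texts: str) -> list[str]:
    # Same call the constructor makes: str.split() with no arguments
    # splits on any run of whitespace and drops empty strings.
    return texts.split()


single = parse_columns("text")
pair = parse_columns("premise hypothesis")  # hypothetical column names

print(single)
print(pair)
```

So `self.texts` is always a list, even when a single column name is passed.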

task_choices `staticmethod`

task_choices(
    name: Literal["sst2", "mrpc", "mnli", "qqp", "rte", "qnli"],
    split: Literal["train", "validation", "test"],
) -> Sequence[str]

Get the valid choices for a particular task and split.

Parameters:

Name	Type	Description	Default
`name`	`Literal['sst2', 'mrpc', 'mnli', 'qqp', 'rte', 'qnli']`	The name of the task.	required
`split`	`Literal['train', 'validation', 'test']`	The split of the task.	required

Returns:

Type	Description
`Sequence[str]`	The valid choices for the task and split.

Source code in src/bocoel/models/adaptors/glue/setfit.py

@staticmethod
def task_choices(
    name: Literal["sst2", "mrpc", "mnli", "qqp", "rte", "qnli"],
    split: Literal["train", "validation", "test"],
) -> Sequence[str]:
    """
    Get the valid choices for a particular task and split.

    Parameters:
        name: The name of the task.
        split: The split of the task.

    Returns:
        The valid choices for the task and split.
    """

    LOGGER.debug("Getting choices for task", task=name)

    # Perform checks for supported kinds of datasets.
    match name:
        case "sst2" | "mrpc" | "mnli" | "qqp" | "rte" | "qnli":
            pass
        case _:
            raise ValueError(f"Unknown task name {name}")

    # Perform checks for supported kinds of splits.
    match split:
        case "train" | "validation" | "test":
            pass
        case _:
            raise ValueError(f"Unknown split {split}")

    # The actual mux.
    match name, split:
        case "sst2", _:
            return ["negative", "positive"]
        case "mrpc", _:
            return ["not equivalent", "equivalent"]
        # All following cases use "unlabeled" for "test".
        case _, "test":
            return ["unlabeled"]
        case "mnli", _:
            return ["entailment", "neutral", "contradiction"]
        case "qqp", _:
            return ["not duplicate", "duplicate"]
        case "rte", _:
            return ["entailment", "not entailment"]
        case "qnli", _:
            return ["entailment", "not entailment"]

    raise RuntimeError("Unreachable")
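The order of the cases matters: `sst2` and `mrpc` are matched before the `"test"` case, so they keep their labels on every split, while the remaining tasks fall back to `"unlabeled"` on `test`. The table-driven sketch below is a standalone mirror of that behavior, not the library's implementation; `choices_for` and the two dictionaries are hypothetical names.

```python
from collections.abc import Sequence

# Tasks whose choices do not depend on the split.
_SPLIT_INDEPENDENT = {
    "sst2": ["negative", "positive"],
    "mrpc": ["not equivalent", "equivalent"],
}

# Tasks whose test split is unlabeled.
_LABELLED = {
    "mnli": ["entailment", "neutral", "contradiction"],
    "qqp": ["not duplicate", "duplicate"],
    "rte": ["entailment", "not entailment"],
    "qnli": ["entailment", "not entailment"],
}


def choices_for(name: str, split: str) -> Sequence[str]:
    # Validate the task name first, then the split, as the listing above does.
    if name not in _SPLIT_INDEPENDENT and name not in _LABELLED:
        raise ValueError(f"Unknown task name {name}")
    if split not in ("train", "validation", "test"):
        raise ValueError(f"Unknown split {split}")

    if name in _SPLIT_INDEPENDENT:
        return _SPLIT_INDEPENDENT[name]
    return ["unlabeled"] if split == "test" else _LABELLED[name]


print(choices_for("sst2", "test"))
print(choices_for("mnli", "test"))
print(choices_for("mnli", "train"))
```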

bocoel.BigBenchAdaptor

Bases: Adaptor, Protocol

bocoel.BigBenchQuestionAnswer

BigBenchQuestionAnswer(
    lm: GenerativeModel,
    inputs: str = "inputs",
    targets: str = "targets",
    matching_type: str | BigBenchMatchType = BigBenchMatchType.EXACT,
)

Bases: BigBenchAdaptor

Source code in src/bocoel/models/adaptors/bigbench/matching.py

def __init__(
    self,
    lm: GenerativeModel,
    inputs: str = "inputs",
    targets: str = "targets",
    matching_type: str | BigBenchMatchType = BigBenchMatchType.EXACT,
) -> None:
    self.lm = lm

    self.inputs = inputs
    self.targets = targets

    self._matching_type = BigBenchMatchType.lookup(matching_type)
    self._score_fn = self._matching_type.score

bocoel.BigBenchMatchType

Bases: StrEnum

bocoel.BigBenchMultipleChoice

BigBenchMultipleChoice(
    lm: ClassifierModel,
    inputs: str = "inputs",
    multiple_choice_targets: str = "multiple_choice_targets",
    multiple_choice_scores: str = "multiple_choice_scores",
    choice_type: str | BigBenchChoiceType = BigBenchChoiceType.SUM_OF_SCORES,
)

Bases: BigBenchAdaptor

Source code in src/bocoel/models/adaptors/bigbench/multi.py

def __init__(
    self,
    lm: ClassifierModel,
    inputs: str = "inputs",
    multiple_choice_targets: str = "multiple_choice_targets",
    multiple_choice_scores: str = "multiple_choice_scores",
    choice_type: str | BigBenchChoiceType = BigBenchChoiceType.SUM_OF_SCORES,
) -> None:
    self.lm = lm

    self.inputs = inputs
    self.multiple_choice_targets = multiple_choice_targets
    self.multiple_choice_scores = multiple_choice_scores

    self._choice_type = BigBenchChoiceType.lookup(choice_type)
    self._score_fn = self._choice_type.score

numeric_choices `staticmethod`

numeric_choices(question: str, choices: Sequence[str]) -> str

Convert a multiple choice question into a numeric choice question. Returns the generated prompt as a single string, with the choices listed one per line and numbered from 1.

Source code in src/bocoel/models/adaptors/bigbench/multi.py

@staticmethod
def numeric_choices(question: str, choices: Sequence[str]) -> str:
    """
    Convert a multiple choice question into a numeric choice question.
    Returns the generated prompt, with the choices numbered from 1.
    """

    return (
        f"{question}\nSelect from one of the following (answer in number):\n"
        + "\n".join(f"{i}) {choice}" for i, choice in enumerate(choices, 1))
    )
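To see the prompt this produces, the function body from the listing above is repeated here as a free function so the example runs without bocoel installed. The question and choices are made up for illustration.

```python
from collections.abc import Sequence


def numeric_choices(question: str, choices: Sequence[str]) -> str:
    # Same body as the static method above, copied as a plain function.
    return (
        f"{question}\nSelect from one of the following (answer in number):\n"
        + "\n".join(f"{i}) {choice}" for i, choice in enumerate(choices, 1))
    )


prompt = numeric_choices("What is 2 + 2?", ["3", "4", "5"])
print(prompt)
```

The printed prompt has the question on the first line, the instruction on the second, and then `1) 3`, `2) 4`, `3) 5` on separate lines.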

bocoel.BigBenchChoiceType

Bases: StrEnum

bocoel.Sst2QuestionAnswer

Sst2QuestionAnswer(
    lm: ClassifierModel,
    sentence: str = "sentence",
    label: str = "label",
    choices: Sequence[str] = ("negative", "positive"),
)

Bases: Adaptor

The adaptor for the SST-2 dataset. This adaptor assumes that the dataset has the following columns:

idx: The index of the entry.
sentence: The sentence to classify.
label: The label of the sentence.

Each entry in the dataset must be a single sentence.

Parameters:

Name	Type	Description	Default
`lm`	`ClassifierModel`	The language model to use for classification.	required
`sentence`	`str`	The column name for the sentence to classify.	`'sentence'`
`label`	`str`	The column name for the label of the sentence.	`'label'`
`choices`	`Sequence[str]`	The valid choices for the label.	`('negative', 'positive')`

Source code in src/bocoel/models/adaptors/glue/sst.py

def __init__(
    self,
    lm: ClassifierModel,
    sentence: str = "sentence",
    label: str = "label",
    choices: Sequence[str] = ("negative", "positive"),
) -> None:
    """
    Parameters:
        lm: The language model to use for classification.
        sentence: The column name for the sentence to classify.
        label: The column name for the label of the sentence.
        choices: The valid choices for the label.
    """

    self.lm = lm

    self.sentence = sentence
    self.label = label
    self.choices = choices
