Adaptors
bocoel.models.adaptors
bocoel.Adaptor
Bases: Protocol
Adaptors are the glue between scores and the corpus. An adaptor is designed to handle running a particular score on a particular corpus / dataset.
evaluate abstractmethod
evaluate(data: Mapping[str, Sequence[Any]]) -> Sequence[float] | NDArray
Evaluate a particular set of entries with a language model. Returns a list of scores, one for each entry, in the same order.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Mapping[str, Sequence[Any]] | A mapping from column names to the data in that column. | required |
Returns:
Type | Description |
---|---|
Sequence[float] | NDArray | The scores for each entry. Scores must be floating point numbers. |
Source code in src/bocoel/models/adaptors/interfaces/adaptors.py
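As a rough illustration of the evaluate contract, the sketch below scores a batch passed as a column mapping and returns one float per entry, in input order. The class name ExactMatchAdaptor and the prediction / target column names are hypothetical, and the language-model call is replaced by a precomputed prediction column so that the example runs without bocoel installed.

```python
from collections.abc import Mapping, Sequence
from typing import Any


class ExactMatchAdaptor:
    """Toy adaptor (hypothetical): 1.0 when the prediction equals the target."""

    def __init__(self, prediction: str = "prediction", target: str = "target") -> None:
        # Column names are assumptions made for this sketch only.
        self.prediction = prediction
        self.target = target

    def evaluate(self, data: Mapping[str, Sequence[Any]]) -> Sequence[float]:
        predictions = data[self.prediction]
        targets = data[self.target]
        if len(predictions) != len(targets):
            raise ValueError("All columns must have the same number of entries.")
        # One float per entry, in the same order as the input columns.
        return [float(p == t) for p, t in zip(predictions, targets)]


batch = {"prediction": ["a", "b", "c"], "target": ["a", "x", "c"]}
scores = ExactMatchAdaptor().evaluate(batch)
```

Because the class only needs a matching evaluate method, it would satisfy a structural (Protocol-based) interface without inheriting from it.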
on_storage
on_storage(storage: Storage, indices: ArrayLike) -> NDArray
Evaluate a particular set of indices on a storage. Given indices and a storage, this method will extract the corresponding entries from the storage, and evaluate them with Adaptor.evaluate.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
storage | Storage | The storage to evaluate. | required |
indices | ArrayLike | The indices to evaluate. | required |
Returns:
Type | Description |
---|---|
NDArray | The scores for each entry. The shape must be the same as the indices. |
Source code in src/bocoel/models/adaptors/interfaces/adaptors.py
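The steps described for on_storage (look up entries by index, evaluate them, return scores shaped like the indices) can be sketched as follows. This is not the library's implementation: the storage is assumed to be a plain list of row dictionaries, nested Python lists stand in for NumPy arrays, and every helper name is hypothetical.

```python
from collections.abc import Callable, Mapping, Sequence
from typing import Any


def rows_to_columns(rows: Sequence[Mapping[str, Any]]) -> dict[str, list[Any]]:
    """Turn a list of rows into the column mapping that `evaluate` expects."""
    if not rows:
        return {}
    return {key: [row[key] for row in rows] for key in rows[0]}


def flatten(indices: Any) -> list[int]:
    """Flatten arbitrarily nested lists of integers in row-major order."""
    if isinstance(indices, int):
        return [indices]
    return [i for item in indices for i in flatten(item)]


def reshape_like(flat: Sequence[float], template: Any) -> Any:
    """Rebuild the nesting of `template`, consuming `flat` from the front."""
    iterator = iter(flat)

    def build(node: Any) -> Any:
        if isinstance(node, int):
            return next(iterator)
        return [build(child) for child in node]

    return build(template)


def on_storage(
    storage: Sequence[Mapping[str, Any]],
    indices: Any,
    evaluate: Callable[[Mapping[str, Sequence[Any]]], Sequence[float]],
) -> Any:
    flat = flatten(indices)
    columns = rows_to_columns([storage[i] for i in flat])
    scores = [float(s) for s in evaluate(columns)]
    # The result mirrors the nesting (shape) of the indices.
    return reshape_like(scores, indices)


toy_storage = [{"text": "a"}, {"text": "bb"}, {"text": "ccc"}, {"text": "dddd"}]
text_length = lambda data: [len(t) for t in data["text"]]  # stand-in for a real score
result = on_storage(toy_storage, [[0, 3], [2, 1]], text_length)
```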
on_corpus
on_corpus(corpus: Corpus, indices: ArrayLike) -> NDArray
Evaluate a particular set of indices on a corpus. A convenience wrapper around Adaptor.on_storage.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
corpus | Corpus | The corpus to evaluate. | required |
indices | ArrayLike | The indices to evaluate. | required |
Returns:
Type | Description |
---|---|
NDArray | The scores for each entry. The shape must be the same as the indices. |
Source code in src/bocoel/models/adaptors/interfaces/adaptors.py
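A convenience wrapper of this kind usually just forwards to the storage-level method. The sketch below assumes, purely for illustration, that a corpus exposes its storage as a storage attribute; ToyCorpus and both functions are hypothetical stand-ins rather than bocoel classes.

```python
from collections.abc import Mapping, Sequence
from dataclasses import dataclass
from typing import Any


@dataclass
class ToyCorpus:
    # Assumption: the corpus carries its storage as an attribute.
    storage: Sequence[Mapping[str, Any]]


def on_storage(storage: Sequence[Mapping[str, Any]], indices: Sequence[int]) -> list[float]:
    # Stand-in scoring: the length of the "text" field of each selected row.
    return [float(len(storage[i]["text"])) for i in indices]


def on_corpus(corpus: ToyCorpus, indices: Sequence[int]) -> list[float]:
    # Forward to the storage-level evaluation without any extra logic.
    return on_storage(corpus.storage, indices)


corpus = ToyCorpus(storage=[{"text": "ab"}, {"text": "abcd"}])
corpus_scores = on_corpus(corpus, [1, 0])
```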
bocoel.GlueAdaptor
GlueAdaptor(
lm: ClassifierModel,
texts: str = "text",
label: str = "label",
label_text: str = "label_text",
choices: Sequence[str] = ("negative", "positive"),
)
Bases: Adaptor
The adaptor for the GLUE dataset provided by SetFit.
GLUE is a collection of datasets for natural language understanding tasks. The datasets are designed to be challenging and diverse, and they are collected from a variety of sources. They are mostly sentence-level classification tasks.
This adaptor is compatible with all classifier models and is designed to work with the GLUE dataset (in the format of the SetFit datasets on Hugging Face Datasets).
Setfit datasets have the following columns:
- text: The text to classify.
- label: The label of the text.
- label_text: The text of the label.
Initialize the adaptor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
lm | ClassifierModel | The language model to use for classification. | required |
texts | str | The column name for the text to classify. | 'text' |
label | str | The column name for the label of the text. | 'label' |
label_text | str | The column name for the text of the label. | 'label_text' |
choices | Sequence[str] | The valid choices for the label. | ('negative', 'positive') |
Source code in src/bocoel/models/adaptors/glue/setfit.py
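To make the column layout concrete, the sketch below scores a small batch in the text / label / label_text format. It assumes that labels are integer positions into choices and that the classifier returns one score per choice for each text; both are assumptions of this example, and score_batch and keyword_classifier are hypothetical helpers rather than part of bocoel.

```python
from collections.abc import Callable, Mapping, Sequence
from typing import Any

# Hypothetical classifier type: one score per choice for each input text.
Classifier = Callable[[Sequence[str]], Sequence[Sequence[float]]]


def score_batch(
    data: Mapping[str, Sequence[Any]],
    classify: Classifier,
    texts: str = "text",
    label: str = "label",
    choices: Sequence[str] = ("negative", "positive"),
) -> list[float]:
    results = []
    for scores, gold in zip(classify(data[texts]), data[label]):
        if len(scores) != len(choices):
            raise ValueError("Expected one score per choice.")
        # Pick the highest-scoring choice and compare it with the gold label.
        predicted = max(range(len(choices)), key=lambda i: scores[i])
        results.append(float(predicted == gold))
    return results


def keyword_classifier(batch: Sequence[str]) -> list[list[float]]:
    # Toy model: favours "positive" whenever the text contains "good".
    return [[0.1, 0.9] if "good" in text else [0.8, 0.2] for text in batch]


glue_batch = {
    "text": ["a good film", "a dull film"],
    "label": [1, 1],
    "label_text": ["positive", "positive"],
}
glue_scores = score_batch(glue_batch, keyword_classifier)
```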
on_storage
on_storage(storage: Storage, indices: ArrayLike) -> NDArray
Evaluate a particular set of indices on a storage. Given indices and a storage, this method will extract the corresponding entries from the storage, and evaluate them with Adaptor.evaluate.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
storage | Storage | The storage to evaluate. | required |
indices | ArrayLike | The indices to evaluate. | required |
Returns:
Type | Description |
---|---|
NDArray | The scores for each entry. The shape must be the same as the indices. |
Source code in src/bocoel/models/adaptors/interfaces/adaptors.py
on_corpus
on_corpus(corpus: Corpus, indices: ArrayLike) -> NDArray
Evaluate a particular set of indices on a corpus. A convenience wrapper around Adaptor.on_storage.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
corpus | Corpus | The corpus to evaluate. | required |
indices | ArrayLike | The indices to evaluate. | required |
Returns:
Type | Description |
---|---|
NDArray | The scores for each entry. The shape must be the same as the indices. |
Source code in src/bocoel/models/adaptors/interfaces/adaptors.py
task_choices staticmethod
task_choices(
name: Literal["sst2", "mrpc", "mnli", "qqp", "rte", "qnli"],
split: Literal["train", "validation", "test"],
) -> Sequence[str]
Get the valid choices for a particular task and split.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | Literal['sst2', 'mrpc', 'mnli', 'qqp', 'rte', 'qnli'] | The name of the task. | required |
split | Literal['train', 'validation', 'test'] | The split of the task. | required |
Returns:
Type | Description |
---|---|
Sequence[str] | The valid choices for the task and split. |
Source code in src/bocoel/models/adaptors/glue/setfit.py
bocoel.BigBenchAdaptor
Bases: Adaptor, Protocol
evaluate abstractmethod
evaluate(data: Mapping[str, Sequence[Any]]) -> Sequence[float] | NDArray
Evaluate a particular set of entries with a language model. Returns a list of scores, one for each entry, in the same order.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Mapping[str, Sequence[Any]] | A mapping from column names to the data in that column. | required |
Returns:
Type | Description |
---|---|
Sequence[float] | NDArray | The scores for each entry. Scores must be floating point numbers. |
Source code in src/bocoel/models/adaptors/interfaces/adaptors.py
on_storage
on_storage(storage: Storage, indices: ArrayLike) -> NDArray
Evaluate a particular set of indices on a storage. Given indices and a storage, this method will extract the corresponding entries from the storage, and evaluate them with Adaptor.evaluate.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
storage | Storage | The storage to evaluate. | required |
indices | ArrayLike | The indices to evaluate. | required |
Returns:
Type | Description |
---|---|
NDArray | The scores for each entry. The shape must be the same as the indices. |
Source code in src/bocoel/models/adaptors/interfaces/adaptors.py
on_corpus
on_corpus(corpus: Corpus, indices: ArrayLike) -> NDArray
Evaluate a particular set of indices on a corpus. A convenience wrapper around Adaptor.on_storage.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
corpus | Corpus | The corpus to evaluate. | required |
indices | ArrayLike | The indices to evaluate. | required |
Returns:
Type | Description |
---|---|
NDArray | The scores for each entry. The shape must be the same as the indices. |
Source code in src/bocoel/models/adaptors/interfaces/adaptors.py
bocoel.BigBenchQuestionAnswer
BigBenchQuestionAnswer(
lm: GenerativeModel,
inputs: str = "inputs",
targets: str = "targets",
matching_type: str | BigBenchMatchType = BigBenchMatchType.EXACT,
)
Bases: BigBenchAdaptor
Source code in src/bocoel/models/adaptors/bigbench/matching.py
on_storage
on_storage(storage: Storage, indices: ArrayLike) -> NDArray
Evaluate a particular set of indices on a storage. Given indices and a storage, this method will extract the corresponding entries from the storage, and evaluate them with Adaptor.evaluate.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
storage | Storage | The storage to evaluate. | required |
indices | ArrayLike | The indices to evaluate. | required |
Returns:
Type | Description |
---|---|
NDArray | The scores for each entry. The shape must be the same as the indices. |
Source code in src/bocoel/models/adaptors/interfaces/adaptors.py
on_corpus
on_corpus(corpus: Corpus, indices: ArrayLike) -> NDArray
Evaluate a particular set of indices on a corpus. A convenience wrapper around Adaptor.on_storage.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
corpus | Corpus | The corpus to evaluate. | required |
indices | ArrayLike | The indices to evaluate. | required |
Returns:
Type | Description |
---|---|
NDArray | The scores for each entry. The shape must be the same as the indices. |
Source code in src/bocoel/models/adaptors/interfaces/adaptors.py
bocoel.BigBenchMatchType
Bases: StrEnum
bocoel.BigBenchMultipleChoice
BigBenchMultipleChoice(
lm: ClassifierModel,
inputs: str = "inputs",
multiple_choice_targets: str = "multiple_choice_targets",
multiple_choice_scores: str = "multiple_choice_scores",
choice_type: str | BigBenchChoiceType = BigBenchChoiceType.SUM_OF_SCORES,
)
Bases: BigBenchAdaptor
Source code in src/bocoel/models/adaptors/bigbench/multi.py
on_storage
on_storage(storage: Storage, indices: ArrayLike) -> NDArray
Evaluate a particular set of indices on a storage. Given indices and a storage, this method will extract the corresponding entries from the storage, and evaluate them with Adaptor.evaluate.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
storage | Storage | The storage to evaluate. | required |
indices | ArrayLike | The indices to evaluate. | required |
Returns:
Type | Description |
---|---|
NDArray | The scores for each entry. The shape must be the same as the indices. |
Source code in src/bocoel/models/adaptors/interfaces/adaptors.py
on_corpus
on_corpus(corpus: Corpus, indices: ArrayLike) -> NDArray
Evaluate a particular set of indices on a corpus. A convenience wrapper around Adaptor.on_storage.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
corpus | Corpus | The corpus to evaluate. | required |
indices | ArrayLike | The indices to evaluate. | required |
Returns:
Type | Description |
---|---|
NDArray | The scores for each entry. The shape must be the same as the indices. |
Source code in src/bocoel/models/adaptors/interfaces/adaptors.py
numeric_choices staticmethod
numeric_choices(question: str, choices: Sequence[str]) -> str
Convert a multiple choice question into a numeric choice question. Returns a tuple of generated prompt and list of valid choices.
Source code in src/bocoel/models/adaptors/bigbench/multi.py
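One plausible way to turn a multiple choice question into a numeric one is to number the options and ask for the number. The sketch below returns the prompt together with the valid numeric answers, as the description states; the function name numbered_prompt and the exact prompt layout are assumptions of this example and may differ from what the library generates.

```python
from collections.abc import Sequence


def numbered_prompt(question: str, choices: Sequence[str]) -> tuple[str, list[str]]:
    """Hypothetical helper: number the choices and list the valid answers."""
    # Options are numbered from 1 in the order they were given.
    numbered = [f"{i}) {choice}" for i, choice in enumerate(choices, start=1)]
    valid = [str(i) for i in range(1, len(choices) + 1)]
    prompt = "\n".join([question, *numbered])
    return prompt, valid


prompt, valid_answers = numbered_prompt("Which colour is the sky?", ["red", "blue"])
```

A classifier can then be restricted to the returned answers ("1", "2", ...) instead of the free-form choice texts.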
bocoel.BigBenchChoiceType
Bases: StrEnum
bocoel.Sst2QuestionAnswer
Sst2QuestionAnswer(
lm: ClassifierModel,
sentence: str = "sentence",
label: str = "label",
choices: Sequence[str] = ("negative", "positive"),
)
Bases: Adaptor
The adaptor for the SST-2 dataset. This adaptor assumes that the dataset has the following columns:
- idx: The index of the entry.
- sentence: The sentence to classify.
- label: The label of the sentence.
Each entry in the dataset must be a single sentence.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
lm | ClassifierModel | The language model to use for classification. | required |
sentence | str | The column name for the sentence to classify. | 'sentence' |
label | str | The column name for the label of the sentence. | 'label' |
choices | Sequence[str] | The valid choices for the label. | ('negative', 'positive') |
Source code in src/bocoel/models/adaptors/glue/sst.py
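A batch in the documented column layout can be checked before it is handed to an adaptor. The sketch below assumes that labels are integer positions into choices (0 for "negative", 1 for "positive"); that mapping and the helper name check_sst2_batch are assumptions of this example.

```python
from collections.abc import Mapping, Sequence
from typing import Any


def check_sst2_batch(
    data: Mapping[str, Sequence[Any]],
    sentence: str = "sentence",
    label: str = "label",
    choices: Sequence[str] = ("negative", "positive"),
) -> list[str]:
    """Validate the batch and return the label text for each entry."""
    for column in (sentence, label):
        if column not in data:
            raise KeyError(f"Missing column: {column}")
    if len(data[sentence]) != len(data[label]):
        raise ValueError("Columns must have the same number of entries.")
    for value in data[label]:
        # Assumption: labels index into `choices`.
        if not 0 <= value < len(choices):
            raise ValueError(f"Label {value} is outside the valid choices.")
    return [choices[value] for value in data[label]]


sst2_batch = {
    "idx": [0, 1],
    "sentence": ["a great film", "an awful film"],
    "label": [1, 0],
}
label_names = check_sst2_batch(sst2_batch)
```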
on_storage
on_storage(storage: Storage, indices: ArrayLike) -> NDArray
Evaluate a particular set of indices on a storage. Given indices and a storage, this method will extract the corresponding entries from the storage, and evaluate them with Adaptor.evaluate.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
storage | Storage | The storage to evaluate. | required |
indices | ArrayLike | The indices to evaluate. | required |
Returns:
Type | Description |
---|---|
NDArray | The scores for each entry. The shape must be the same as the indices. |
Source code in src/bocoel/models/adaptors/interfaces/adaptors.py
on_corpus
on_corpus(corpus: Corpus, indices: ArrayLike) -> NDArray
Evaluate a particular set of indices on a corpus. A convenience wrapper around Adaptor.on_storage.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
corpus | Corpus | The corpus to evaluate. | required |
indices | ArrayLike | The indices to evaluate. | required |
Returns:
Type | Description |
---|---|
NDArray | The scores for each entry. The shape must be the same as the indices. |
Source code in src/bocoel/models/adaptors/interfaces/adaptors.py