Exams
bocoel.core.exams
The exams module provides the functionality to create and manage exams. Here, an exam is used to measure how well the corpus or the model performs on a given task.
The module provides the following functionality:
Examinators are responsible for launching exams.
Exams are the tests that take in an accumulated history of model / corpus and return a score.
Managers are responsible for managing results across runs.
bocoel.Examinator
Examinator(exams: Mapping[str, Exam])
The examinator is responsible for launching exams. Examinators take in an index and the results of an optimizer run, and return a DataFrame of scores describing the optimizer's performance over its accumulated history.
Source code in src/bocoel/core/exams/examinators.py
examine
examine(index: Index, results: OrderedDict[int, float]) -> DataFrame
Perform the exams on the results. This method looks up results in the index and runs the exams on the results.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
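index | Index | The index in which the results are looked up. | required |
results | OrderedDict[int, float] | The results of the optimizer run, mapping the indices of the evaluated entries to their scores. | required |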
Returns:
Type | Description |
---|---|
DataFrame | The scores of the exams. |
TODO
Run the different exams in parallel. Currently the exams are run sequentially and can be slow.
Source code in src/bocoel/core/exams/examinators.py
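As a rough illustration of what `examine` does, the sketch below runs each exam in a name-to-exam mapping sequentially over the ordered results and collects one column per exam. It is a simplified, stdlib-only stand-in under stated assumptions: it ignores the `Index` argument, uses plain lists instead of NumPy arrays and a dict of columns instead of a pandas DataFrame, and the `step_idx` / `original` columns (borrowed from `bocoel.core.exams.columns`) are a guess at the output layout. `running_max` is a hypothetical toy exam.

```python
from __future__ import annotations

from collections import OrderedDict
from typing import Callable, Mapping, Sequence

# Hypothetical exam signature: takes the ordered results, returns one score per entry.
ExamFn = Callable[[Mapping[int, float]], Sequence[float]]


def examine(results: OrderedDict[int, float], exams: Mapping[str, ExamFn]) -> dict[str, list]:
    """Run every exam sequentially and collect the scores column by column."""
    table: dict[str, list] = {
        "step_idx": list(range(len(results))),
        "original": list(results.values()),
    }
    for name, exam in exams.items():
        scores = list(exam(results))
        # Mirrors the documented Exam contract: one score per entry in the results.
        if len(scores) != len(results):
            raise ValueError(f"Exam {name!r} returned {len(scores)} scores for {len(results)} results")
        table[name] = scores
    return table


def running_max(results: Mapping[int, float]) -> list[float]:
    """Toy exam: best score seen so far at each step."""
    best, out = float("-inf"), []
    for value in results.values():
        best = max(best, value)
        out.append(best)
    return out


table = examine(OrderedDict([(7, 0.2), (3, 0.9), (5, 0.4)]), {"acc_max": running_max})
# table["acc_max"] == [0.2, 0.9, 0.9]
```

Running the exams one after another keeps the sketch simple; it matches the sequential behavior noted in the TODO above.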
presets classmethod
presets() -> Self
Returns:
Type | Description |
---|---|
Self | The default examinator. |
Source code in src/bocoel/core/exams/examinators.py
bocoel.Exam
Bases: Protocol
Exams are designed to evaluate the performance of a particular index, using a particular set of results generated by the optimizer.
run
run(index: Index, results: OrderedDict[int, float]) -> NDArray
Run the exam on the given index and results.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
index | Index | The index to evaluate. | required |
results | OrderedDict[int, float] | The results generated by the optimizer. | required |
Returns:
Type | Description |
---|---|
NDArray | The scores, one for each entry in the results. The length must equal the number of results. |
Source code in src/bocoel/core/exams/interfaces.py
_run abstractmethod
_run(index: Index, results: OrderedDict[int, float]) -> NDArray
Run the exam on the given index and results.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
index | Index | The index to evaluate. | required |
results | OrderedDict[int, float] | The results generated by the optimizer. | required |
Returns:
Type | Description |
---|---|
NDArray | The scores, one for each entry in the results. The length must equal the number of results. |
Source code in src/bocoel/core/exams/interfaces.py
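The split between the public `run` and the abstract `_run` reads like a template method: `run` presumably delegates to `_run` and enforces the documented contract that the output holds one score per result. The sketch below shows that pattern with `typing.Protocol`; it is an assumption about the structure, not the library's source, and it drops the `Index` argument and uses lists instead of NumPy arrays. `Original` and `Truncated` are hypothetical exams.

```python
from __future__ import annotations

from abc import abstractmethod
from collections import OrderedDict
from typing import Protocol


class Exam(Protocol):
    def run(self, results: OrderedDict[int, float]) -> list[float]:
        """Public entry point: delegate to _run, then enforce the length contract."""
        scores = self._run(results)
        if len(scores) != len(results):
            raise ValueError(f"Expected {len(results)} scores, got {len(scores)}")
        return scores

    @abstractmethod
    def _run(self, results: OrderedDict[int, float]) -> list[float]:
        """Exam-specific scoring, implemented by each concrete exam."""
        ...


class Original(Exam):
    """Hypothetical exam that returns the raw scores unchanged."""

    def _run(self, results: OrderedDict[int, float]) -> list[float]:
        return list(results.values())


class Truncated(Exam):
    """Hypothetical faulty exam that drops the last score, violating the contract."""

    def _run(self, results: OrderedDict[int, float]) -> list[float]:
        return list(results.values())[:-1]
```

Concrete exams subclass the protocol explicitly, so they inherit the checking `run` and only implement `_run`.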
bocoel.AccType
Bases: StrEnum
Accumulation type.
MIN class-attribute instance-attribute
MIN = 'MINIMUM'
Minimum value accumulation.
MAX class-attribute instance-attribute
MAX = 'MAXIMUM'
Maximum value accumulation.
AVG class-attribute instance-attribute
AVG = 'AVERAGE'
Average value accumulation.
bocoel.Accumulation
Accumulation(typ: AccType)
Bases: Exam
Accumulation is an exam designed to evaluate the min / max / avg of the history.
Source code in src/bocoel/core/exams/stats/acc.py
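Accumulating over the history is most naturally read as a running (prefix) statistic: entry i of the output summarizes the first i + 1 results. The stdlib-only sketch below follows that reading with `itertools.accumulate`; treat the semantics as an assumption. The enum subclasses `(str, Enum)` instead of `StrEnum` so it also runs on Python versions before 3.11, and the function name `accumulation` is hypothetical.

```python
from enum import Enum
from itertools import accumulate
from typing import Iterable, List


class AccType(str, Enum):
    MIN = "MINIMUM"
    MAX = "MAXIMUM"
    AVG = "AVERAGE"


def accumulation(values: Iterable[float], typ: AccType) -> List[float]:
    """Running min / max / mean over the history of scores."""
    values = list(values)
    if typ is AccType.MIN:
        return list(accumulate(values, min))
    if typ is AccType.MAX:
        return list(accumulate(values, max))
    if typ is AccType.AVG:
        # Running sum divided by the number of entries seen so far.
        return [total / (i + 1) for i, total in enumerate(accumulate(values))]
    raise ValueError(f"Unknown accumulation type: {typ}")


history = [0.4, 0.1, 0.7, 0.3]
print(accumulation(history, AccType.MIN))  # [0.4, 0.1, 0.1, 0.1]
print(accumulation(history, AccType.MAX))  # [0.4, 0.4, 0.7, 0.7]
```

With NumPy, the min and max cases would correspond to `np.minimum.accumulate` and `np.maximum.accumulate`, and the average to a cumulative sum divided by the step count.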
run
run(index: Index, results: OrderedDict[int, float]) -> NDArray
Run the exam on the given index and results.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
index | Index | The index to evaluate. | required |
results | OrderedDict[int, float] | The results generated by the optimizer. | required |
Returns:
Type | Description |
---|---|
NDArray | The scores, one for each entry in the results. The length must equal the number of results. |
Source code in src/bocoel/core/exams/interfaces.py
_acc staticmethod
_acc(array: NDArray, accumulate: Callable[[NDArray], NDArray]) -> NDArray
Accumulate the array using the given function.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
array | NDArray | The array to accumulate. | required |
accumulate | Callable[[NDArray], NDArray] | The accumulation function to use. | required |
Returns:
Type | Description |
---|---|
NDArray | The accumulated array. |
Raises:
Type | Description |
---|---|
ValueError | If the array is not 1D. |
Source code in src/bocoel/core/exams/stats/acc.py
bocoel.Manager
Manager(root: str | Path | None = None, skip_rerun: bool = True)
The manager for running and saving evaluations.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
root | str \| Path \| None | The path to save the scores to. | None |
skip_rerun | bool | Whether to skip rerunning the optimizer if the scores already exist. | True |
Raises:
Type | Description |
---|---|
ValueError | If the path is not a directory. |
Source code in src/bocoel/core/exams/managers.py
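The constructor's documented failure mode (a `root` that is not a directory) can be checked with `pathlib`. The sketch below is an assumption about how that check might look: it accepts `None` (scores are not saved), accepts a path that does not exist yet, and rejects an existing path that is not a directory. Whether the real `Manager` creates a missing directory is not stated here, so the sketch leaves that alone; `ManagerSketch` is a hypothetical name.

```python
from __future__ import annotations

from pathlib import Path


class ManagerSketch:
    """Hypothetical stand-in for the root handling in Manager.__init__."""

    def __init__(self, root: str | Path | None = None, skip_rerun: bool = True) -> None:
        if root is not None:
            root = Path(root)
            # An existing path that is a regular file (or other non-directory) cannot hold scores.
            if root.exists() and not root.is_dir():
                raise ValueError(f"{root} is not a directory")
        self._root = root
        self._skip_rerun = skip_rerun
```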
_examinator instance-attribute
_examinator: Examinator = presets()
The examinator that performs evaluations on the results.
run
run(
steps: int | None = None,
*,
optimizer: Optimizer,
embedder: Embedder,
corpus: Corpus,
model: GenerativeModel | ClassifierModel,
adaptor: Adaptor
) -> DataFrame
Runs the optimizer until the end. If the root path is set in the constructor, the scores are saved to the path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
optimizer | Optimizer | The optimizer to run. | required |
embedder | Embedder | The embedder to run the optimizer with. | required |
corpus | Corpus | The corpus to run the optimizer on. | required |
model | GenerativeModel \| ClassifierModel | The model to run the optimizer with. | required |
adaptor | Adaptor | The adaptor to run the optimizer with. | required |
steps | int \| None | The number of steps to run the optimizer for. | None |
Returns:
Type | Description |
---|---|
DataFrame | The final state of the optimizer. Keys are the indices of the queries, and values are the corresponding scores. |
Source code in src/bocoel/core/exams/managers.py
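One plausible reading of `skip_rerun` together with `md5` and `load` is a cache keyed by the hash of the components: if scores for the same configuration already exist under `root`, they are reused instead of rerunning the optimizer. The sketch below illustrates only that control flow under stated assumptions; the `<md5>.csv` file naming, the `compute` callback, and the name `run_cached` are hypothetical, and rows are lists of dicts rather than a DataFrame.

```python
from __future__ import annotations

import csv
from pathlib import Path
from typing import Callable


def run_cached(
    root: Path | None,
    md5: str,
    compute: Callable[[], list[dict[str, str]]],
    skip_rerun: bool = True,
) -> list[dict[str, str]]:
    """Reuse saved scores for this configuration hash, otherwise compute (and save) them."""
    path = root / f"{md5}.csv" if root is not None else None  # hypothetical file naming
    if skip_rerun and path is not None and path.exists():
        with path.open(newline="") as f:
            return list(csv.DictReader(f))
    rows = compute()  # stands in for running the optimizer and the exams
    if path is not None and rows:
        with path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
    return rows
```

Keying the cache on a hash of the components means any change to the optimizer, embedder, corpus, model, or adaptor produces a fresh run rather than a stale result.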
save
save(
*,
scores: DataFrame,
optimizer: Optimizer,
corpus: Corpus,
model: GenerativeModel | ClassifierModel,
adaptor: Adaptor,
embedder: Embedder,
md5: str
) -> None
Saves the scores to the path. If the root path is not set in the constructor, the scores are not saved.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
scores | DataFrame | The scores to save. | required |
optimizer | Optimizer | The optimizer used to generate the scores. | required |
corpus | Corpus | The corpus used to generate the scores. | required |
model | GenerativeModel \| ClassifierModel | The model used to generate the scores. | required |
adaptor | Adaptor | The adaptor used to generate the scores. | required |
embedder | Embedder | The embedder used to generate the scores. | required |
md5 | str | The md5 hash of the identifier columns. | required |
Raises:
Type | Description |
---|---|
ValueError | If the path is not set. |
Source code in src/bocoel/core/exams/managers.py
with_cols
with_cols(df: DataFrame, columns: dict[str, Any]) -> DataFrame
Adds identifier columns to the DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The DataFrame to add the columns to. | required |
columns | dict[str, Any] | The columns to add to the DataFrame. | required |
Returns:
Type | Description |
---|---|
DataFrame | The DataFrame with the identifier columns added. |
Source code in src/bocoel/core/exams/managers.py
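`with_cols` tags every row of a score table with the same identifier values (for example, the names of the components that produced them), so that tables from different runs can later be concatenated and told apart. A stdlib sketch over a list-of-dicts table, assuming the input is copied rather than mutated; in pandas the equivalent is assigning each key as a constant column. The identifier values below are placeholders.

```python
from __future__ import annotations

from typing import Any


def with_cols(rows: list[dict[str, Any]], columns: dict[str, Any]) -> list[dict[str, Any]]:
    """Return copies of `rows` with every (name, value) pair in `columns` added to each row."""
    return [{**row, **columns} for row in rows]


scores = [{"step_idx": 0, "original": 0.5}, {"step_idx": 1, "original": 0.8}]
tagged = with_cols(scores, {"model": "my-model", "md5": "abc123"})
# Each row in `tagged` also carries "model" and "md5"; `scores` itself is unchanged.
```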
_launch staticmethod
_launch(
optimizer: Optimizer, steps: int | None = None
) -> Generator[Mapping[int, float], None, None]
Launches the optimizer as a generator.
Source code in src/bocoel/core/exams/managers.py
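`_launch` turns the optimizer into a generator of per-step results, which lets the caller stop after `steps` iterations or run until the optimizer is exhausted. The `step` callable and the use of `StopIteration` to signal the end are hypothetical stand-ins, since the `Optimizer` interface is not documented in this section; the function name `launch` is likewise illustrative.

```python
from __future__ import annotations

from itertools import count
from typing import Callable, Generator, Mapping


def launch(
    step: Callable[[], Mapping[int, float]], steps: int | None = None
) -> Generator[Mapping[int, float], None, None]:
    """Yield one step's results at a time, for `steps` steps or until the optimizer stops."""
    counter = range(steps) if steps is not None else count()
    for _ in counter:
        try:
            yield step()
        except StopIteration:
            # Hypothetical end-of-optimization signal; finish the generator cleanly.
            return


queue = iter([{3: 0.1}, {8: 0.6}, {1: 0.4}])
history = {}
for result in launch(lambda: next(queue)):
    history.update(result)
# history == {3: 0.1, 8: 0.6, 1: 0.4}
```

Catching `StopIteration` inside the generator matters: since PEP 479, letting it escape a generator body turns it into a `RuntimeError`.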
load staticmethod
load(path: str | Path) -> DataFrame
Loads the scores from the path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str \| Path | The path to load the scores from. | required |
Returns:
Type | Description |
---|---|
DataFrame | The loaded scores. |
Raises:
Type | Description |
---|---|
ValueError | If the path does not exist or is not a directory. |
ValueError | If no CSV files are found in the path. |
Source code in src/bocoel/core/exams/managers.py
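`load` reads every CSV file in a directory back into a single table. The sketch below does the same with `csv.DictReader` and raises the two documented errors; it returns a list of row dicts instead of a pandas DataFrame, and concatenating rows in sorted filename order is an assumption.

```python
from __future__ import annotations

import csv
from pathlib import Path


def load(path: str | Path) -> list[dict[str, str]]:
    """Concatenate the rows of all *.csv files directly under `path`."""
    path = Path(path)
    # is_dir() is False both for missing paths and for non-directories.
    if not path.is_dir():
        raise ValueError(f"{path} does not exist or is not a directory")
    files = sorted(path.glob("*.csv"))
    if not files:
        raise ValueError(f"No CSV files found in {path}")
    rows: list[dict[str, str]] = []
    for file in files:
        with file.open(newline="") as f:
            rows.extend(csv.DictReader(f))
    return rows
```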
md5 staticmethod
md5(
*,
optimizer: Optimizer,
embedder: Embedder,
corpus: Corpus,
model: GenerativeModel | ClassifierModel,
adaptor: Adaptor
) -> str
Generates an md5 hash from the given data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
optimizer | Optimizer | The optimizer used to generate the scores. | required |
corpus | Corpus | The corpus used to generate the scores. | required |
model | GenerativeModel \| ClassifierModel | The model used to generate the scores. | required |
adaptor | Adaptor | The adaptor used to generate the scores. | required |
embedder | Embedder | The embedder used to generate the scores. | required |
Returns:
Type | Description |
---|---|
str | The md5 hash of the given data. |
Source code in src/bocoel/core/exams/managers.py
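`md5` fingerprints a configuration so that scores from identical setups share a key, which is what the `md5` column records. A minimal sketch, assuming each component can be summarized by a deterministic string (default object `repr`s include memory addresses and would not be stable across runs): the components are serialized as sorted `key=value` lines and hashed with `hashlib.md5`. How the real method summarizes each component is not documented here.

```python
import hashlib


def md5(*, optimizer: str, embedder: str, corpus: str, model: str, adaptor: str) -> str:
    """Hash a canonical text description of the components into a hex digest."""
    parts = {
        "optimizer": optimizer,
        "embedder": embedder,
        "corpus": corpus,
        "model": model,
        "adaptor": adaptor,
    }
    # Sorted key=value lines give one canonical text per configuration.
    text = "\n".join(f"{key}={value}" for key, value in sorted(parts.items()))
    return hashlib.md5(text.encode("utf-8")).hexdigest()
```

MD5 is used here as a cheap identifier, not for security, which is the role the column description suggests.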
bocoel.core.exams.columns
This module contains the column names used in the manager DataFrames, which correspond to the different components and exams of the system.
components
TIME module-attribute
TIME = 'time'
Corresponds to the time at which the evaluation was performed.
INDEX module-attribute
INDEX = 'index'
Corresponds to the index.
STORAGE module-attribute
STORAGE = 'storage'
Corresponds to the storage.
EMBEDDER module-attribute
EMBEDDER = 'embedder'
Corresponds to the embedder.
OPTIMIZER module-attribute
OPTIMIZER = 'optimizer'
Corresponds to the optimizer.
MODEL module-attribute
MODEL = 'model'
Corresponds to the model.
ADAPTOR module-attribute
ADAPTOR = 'adaptor'
Corresponds to the adaptor.
MD5 module-attribute
MD5 = 'md5'
Corresponds to the MD5 hash of the evaluation. It is a hash of most of the interchangeable components.
exams
ORIGINAL module-attribute
ORIGINAL = 'original'
Corresponds to the original evaluation, i.e. the raw values.
STEP_IDX module-attribute
STEP_IDX = 'step_idx'
Corresponds to the step index.
ACC_MIN module-attribute
ACC_MIN = 'acc_min'
Corresponds to the minimum accuracy.
ACC_MAX module-attribute
ACC_MAX = 'acc_max'
Corresponds to the maximum accuracy.
ACC_AVG module-attribute
ACC_AVG = 'acc_avg'
Corresponds to the average accuracy.
MST_MAX_EDGE_QUERY module-attribute
MST_MAX_EDGE_QUERY = 'mst_max_edge_query'
Corresponds to the query for the maximum edge of the minimum spanning tree.
MST_MAX_EDGE_DATA module-attribute
MST_MAX_EDGE_DATA = 'mst_max_edge_data'
Corresponds to the data for the maximum edge of the minimum spanning tree.
SEGREGATION module-attribute
SEGREGATION = 'segregation'
Corresponds to the number of unique clusters.
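The `mst_max_edge_*` columns refer to the longest edge of a minimum spanning tree. This is a common coverage measure: it equals the largest gap separating one group of points from the rest, so a smaller value means the points are spread without large holes. The stdlib sketch below computes it with Prim's algorithm over Euclidean distances. Which point sets feed it (the queried embeddings for `_QUERY`, the corpus embeddings for `_DATA`) is an assumption based on the column names, as is returning 0.0 for fewer than two points.

```python
from __future__ import annotations

import math
from typing import Sequence

Point = Sequence[float]


def mst_max_edge(points: Sequence[Point]) -> float:
    """Longest edge of the Euclidean minimum spanning tree (Prim's algorithm, O(n^2))."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    # best[i]: distance from point i to the closest point already in the tree.
    best = [math.inf] * n
    best[0] = 0.0
    longest = 0.0
    for _ in range(n):
        # Add the closest point that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        longest = max(longest, best[u])
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return longest


print(mst_max_edge([(0, 0), (1, 0), (3, 0)]))  # 2.0
```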