Factories

bocoel.factories.IndexName

Bases: StrEnum

The names of the indices.

FAISS class-attribute instance-attribute

FAISS = 'FAISS'

Corresponds to FaissIndex.

HNSWLIB class-attribute instance-attribute

HNSWLIB = 'HNSWLIB'

Corresponds to HnswlibIndex.

POLAR class-attribute instance-attribute

POLAR = 'POLAR'

Corresponds to PolarIndex.

WHITENING class-attribute instance-attribute

WHITENING = 'WHITENING'

Corresponds to WhiteningIndex.

bocoel.factories.index_class

index_class(name: str | IndexName, /) -> type[Index]

Get the index class for the given name.

Parameters:

name (str | IndexName, required): The name of the index.

Source code in src/bocoel/factories/indices.py
def index_class(name: str | IndexName, /) -> type[Index]:
    """
    Get the index class for the given name.

    Parameters:
        name: The name of the index.
    """

    name = IndexName.lookup(name)

    match name:
        case IndexName.FAISS:
            return FaissIndex
        case IndexName.HNSWLIB:
            return HnswlibIndex
        case IndexName.POLAR:
            return PolarIndex
        case IndexName.WHITENING:
            return WhiteningIndex
        case _:
            raise ValueError(f"Unknown index name: {name}")
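
The listing first normalises `name` and then maps it to a class. A minimal, self-contained sketch of that pattern follows. The `lookup` classmethod here is an assumption about how such a helper might behave (the real `IndexName.lookup` is not shown on this page), `FaissIndex` and `HnswlibIndex` are empty placeholders, and a dictionary stands in for the `match` statement.

```python
from enum import Enum


class IndexName(str, Enum):
    """Stand-in for bocoel.factories.IndexName (two members only)."""

    FAISS = "FAISS"
    HNSWLIB = "HNSWLIB"

    @classmethod
    def lookup(cls, name):
        # Hypothetical helper: pass members through, otherwise match by value.
        if isinstance(name, cls):
            return name
        try:
            return cls(name)
        except ValueError:
            raise ValueError(f"Unknown index name: {name}") from None


class FaissIndex:  # placeholder, not the real index class
    pass


class HnswlibIndex:  # placeholder, not the real index class
    pass


_INDEX_CLASSES = {
    IndexName.FAISS: FaissIndex,
    IndexName.HNSWLIB: HnswlibIndex,
}


def index_class(name, /):
    # Normalise first so strings and enum members take the same path.
    return _INDEX_CLASSES[IndexName.lookup(name)]
```

Note that the function returns the class itself, not an instance; the caller decides how to construct it.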

bocoel.factories.StorageName

Bases: StrEnum

The storage names.

PANDAS class-attribute instance-attribute

PANDAS = 'PANDAS'

Corresponds to PandasStorage.

DATASETS class-attribute instance-attribute

DATASETS = 'DATASETS'

Corresponds to DatasetsStorage.

bocoel.factories.storage

storage(
    storage: str | StorageName,
    /,
    *,
    path: str = "",
    name: str = "",
    split: str = "",
) -> Storage

Create a single storage.

Parameters:

storage (str | StorageName, required): The name of the storage.
path (str, default ''): The path to the storage.
name (str, default ''): The name of the storage.
split (str, default ''): The split to use.

Returns:

Storage: The storage instance.

Raises:

ValueError: If the storage is unknown.

Source code in src/bocoel/factories/storages.py
@common.correct_kwargs
def storage(
    storage: str | StorageName, /, *, path: str = "", name: str = "", split: str = ""
) -> Storage:
    """
    Create a single storage.

    Parameters:
        storage: The name of the storage.
        path: The path to the storage.
        name: The name of the storage.
        split: The split to use.

    Returns:
        The storage instance.

    Raises:
        ValueError: If the storage is unknown.
    """

    storage = StorageName.lookup(storage)
    match storage:
        case StorageName.PANDAS:
            return common.correct_kwargs(PandasStorage.from_jsonl_file)(path)
        case StorageName.DATASETS:
            return common.correct_kwargs(DatasetsStorage)(
                path=path, name=name, split=split
            )
        case _:
            raise ValueError(f"Unknown storage name {storage}")
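
In the listing, the PANDAS branch forwards only `path`, while the DATASETS branch forwards `path`, `name`, and `split`. The sketch below reproduces that forwarding with placeholder loaders that return tuples. `load_jsonl` and `load_dataset_storage` are hypothetical names, and plain string comparison replaces `StorageName.lookup`.

```python
def load_jsonl(path):
    # Placeholder for PandasStorage.from_jsonl_file.
    return ("pandas", path)


def load_dataset_storage(*, path, name, split):
    # Placeholder for the DatasetsStorage constructor.
    return ("datasets", path, name, split)


def storage(kind, /, *, path="", name="", split=""):
    if kind == "PANDAS":
        # Only `path` is used; `name` and `split` are ignored in this branch.
        return load_jsonl(path)
    if kind == "DATASETS":
        return load_dataset_storage(path=path, name=name, split=split)
    raise ValueError(f"Unknown storage name {kind}")
```

A caller that passes `split` together with `"PANDAS"` gets no error in this sketch; the argument simply has no effect.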

bocoel.factories.EmbedderName

Bases: StrEnum

The names of the embedders.

SBERT class-attribute instance-attribute

SBERT = 'SBERT'

Corresponds to SbertEmbedder.

HUGGINGFACE class-attribute instance-attribute

HUGGINGFACE = 'HUGGINGFACE'

Corresponds to HuggingfaceEmbedder.

HUGGINGFACE_ENSEMBLE class-attribute instance-attribute

HUGGINGFACE_ENSEMBLE = 'HUGGINGFACE_ENSEMBLE'

Corresponds to EnsembleEmbedder concatenating HuggingfaceEmbedder.

bocoel.factories.embedder

embedder(
    name: str | EmbedderName,
    /,
    *,
    model_name: str | list[str],
    device: str = "auto",
    batch_size: int,
) -> Embedder

Create an embedder.

Parameters:

name (str | EmbedderName, required): The name of the embedder.
model_name (str | list[str], required): The model name to use.
device (str, default 'auto'): The device to use.
batch_size (int, required): The batch size to use.

Returns:

Embedder: The embedder instance.

Raises:

ValueError: If the name is unknown.
TypeError: If the model name is not a string for SBERT or Huggingface, or not a list of strings for HuggingfaceEnsemble.

Source code in src/bocoel/factories/embedders.py
def embedder(
    name: str | EmbedderName,
    /,
    *,
    model_name: str | list[str],
    device: str = "auto",
    batch_size: int,
) -> Embedder:
    """
    Create an embedder.

    Parameters:
        name: The name of the embedder.
        model_name: The model name to use.
        device: The device to use.
        batch_size: The batch size to use.

    Returns:
        The embedder instance.

    Raises:
        ValueError: If the name is unknown.
        TypeError: If the model name is not a string for SBERT or Huggingface,
            or not a list of strings for HuggingfaceEnsemble.
    """

    match EmbedderName.lookup(name):
        case EmbedderName.SBERT:
            if not isinstance(model_name, str):
                raise TypeError(
                    "SbertEmbedder requires a single model name. "
                    f"Got {model_name} instead."
                )

            return common.correct_kwargs(SbertEmbedder)(
                model_name=model_name,
                device=common.auto_device(device),
                batch_size=batch_size,
            )
        case EmbedderName.HUGGINGFACE:
            if not isinstance(model_name, str):
                raise TypeError(
                    "HuggingfaceEmbedder requires a single model name. "
                    f"Got {model_name} instead."
                )
            return common.correct_kwargs(HuggingfaceEmbedder)(
                path=model_name,
                device=common.auto_device(device),
                batch_size=batch_size,
            )
        case EmbedderName.HUGGINGFACE_ENSEMBLE:
            if not isinstance(model_name, list):
                raise TypeError(
                    "HuggingfaceEnsembleEmbedder requires a list of model names. "
                    f"Got {model_name} instead."
                )

            device_list = common.auto_device_list(device, len(model_name))
            return common.correct_kwargs(EnsembleEmbedder)(
                [
                    HuggingfaceEmbedder(path=model, device=dev, batch_size=batch_size)
                    for model, dev in zip(model_name, device_list)
                ]
            )
        case _:
            raise ValueError(f"Unknown embedder name: {name}")
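
The type checks on `model_name` and the model/device pairing in the ensemble branch can be sketched on their own. This is an illustrative reduction, not the library's code: `normalise_model_names` and `pair_models_with_devices` are hypothetical helpers, and the device list is passed in directly because the behaviour of `common.auto_device_list` is not shown on this page.

```python
def normalise_model_names(kind, model_name):
    """Return the model names as a list, or raise as the factory would."""
    if kind in ("SBERT", "HUGGINGFACE"):
        if not isinstance(model_name, str):
            raise TypeError(
                f"{kind} requires a single model name. Got {model_name!r} instead."
            )
        return [model_name]
    if kind == "HUGGINGFACE_ENSEMBLE":
        if not isinstance(model_name, list):
            raise TypeError(
                f"{kind} requires a list of model names. Got {model_name!r} instead."
            )
        return list(model_name)
    raise ValueError(f"Unknown embedder name: {kind}")


def pair_models_with_devices(model_names, devices):
    # zip() stops at the shorter input, so the device list must be as long
    # as the model list or trailing models are dropped without an error.
    return list(zip(model_names, devices))
```

The listing requests `len(model_name)` devices before zipping, which suggests the two lists are meant to have the same length.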

bocoel.factories.CorpusName

Bases: StrEnum

The names of the corpora.

COMPOSED class-attribute instance-attribute

COMPOSED = 'COMPOSED'

Corresponds to ComposedCorpus.

bocoel.factories.corpus

corpus(
    name: str | CorpusName = CorpusName.COMPOSED,
    /,
    *,
    storage: Storage,
    embedder: Embedder,
    keys: Sequence[str],
    index_name: str | IndexName,
    **index_kwargs: Any,
) -> Corpus

Create a corpus.

Parameters:

name (str | CorpusName, default CorpusName.COMPOSED): The name of the corpus.
storage (Storage, required): The storage to use.
embedder (Embedder, required): The embedder to use.
keys (Sequence[str], required): The keys to use for the index.
index_name (str | IndexName, required): The name of the index backend to use.
**index_kwargs (Any, default {}): The keyword arguments to pass to the index backend.

Returns:

Corpus: The corpus instance.

Raises:

ValueError: If the name is unknown.

Source code in src/bocoel/factories/corpora.py
def corpus(
    name: str | CorpusName = CorpusName.COMPOSED,
    /,
    *,
    storage: Storage,
    embedder: Embedder,
    keys: Sequence[str],
    index_name: str | IndexName,
    **index_kwargs: Any,
) -> Corpus:
    """
    Create a corpus.

    Parameters:
        name: The name of the corpus.
        storage: The storage to use.
        embedder: The embedder to use.
        keys: The keys to use for the index.
        index_name: The name of the index backend to use.
        **index_kwargs: The keyword arguments to pass to the index backend.

    Returns:
        The corpus instance.

    Raises:
        ValueError: If the name is unknown.
    """

    if CorpusName.lookup(name) is not CorpusName.COMPOSED:
        raise ValueError(f"Unknown corpus name: {name}")

    return common.correct_kwargs(ComposedCorpus.index_storage)(
        storage=storage,
        embedder=embedder,
        keys=keys,
        index_backend=indices.index_class(index_name),
        **indices.index_set_backends(index_kwargs),
    )

bocoel.factories.adaptor

adaptor(name: str | AdaptorName, /, **kwargs: Any) -> Adaptor

Create an adaptor.

Parameters:

name (str | AdaptorName, required): The name of the adaptor.
**kwargs (Any, default {}): The keyword arguments to pass to the adaptor. See the documentation of the corresponding adaptor for details.

Returns:

Adaptor: The adaptor instance.

Raises:

ValueError: If the name is unknown.

Source code in src/bocoel/factories/adaptors.py
def adaptor(name: str | AdaptorName, /, **kwargs: Any) -> Adaptor:
    """
    Create an adaptor.

    Parameters:
        name: The name of the adaptor.
        **kwargs: The keyword arguments to pass to the adaptor.
            See the documentation of the corresponding adaptor for details.

    Returns:
        The adaptor instance.

    Raises:
        ValueError: If the name is unknown.
    """

    name = AdaptorName.lookup(name)

    match name:
        case AdaptorName.BIGBENCH_MC:
            return common.correct_kwargs(BigBenchMultipleChoice)(**kwargs)
        case AdaptorName.BIGBENCH_QA:
            return common.correct_kwargs(BigBenchQuestionAnswer)(**kwargs)
        case AdaptorName.SST2:
            return common.correct_kwargs(Sst2QuestionAnswer)(**kwargs)
        case AdaptorName.GLUE:
            return common.correct_kwargs(GlueAdaptor)(**kwargs)
        case _:
            raise ValueError(f"Unknown adaptor name: {name}")

bocoel.factories.AdaptorName

Bases: StrEnum

The names of the adaptors.

BIGBENCH_MC class-attribute instance-attribute

BIGBENCH_MC = 'BIGBENCH_MULTIPLE_CHOICE'

Corresponds to BigBenchMultipleChoice.

BIGBENCH_QA class-attribute instance-attribute

BIGBENCH_QA = 'BIGBENCH_QUESTION_ANSWER'

Corresponds to BigBenchQuestionAnswer.

SST2 class-attribute instance-attribute

SST2 = 'SST2'

Corresponds to Sst2QuestionAnswer.

GLUE class-attribute instance-attribute

GLUE = 'GLUE'

Corresponds to GlueAdaptor.
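
Unlike the other enums on this page, two members here have names that differ from their values (`BIGBENCH_MC` versus `'BIGBENCH_MULTIPLE_CHOICE'`). The sketch below uses only the standard `enum` module to show the difference between lookup by member name and lookup by value. Whether `AdaptorName.lookup` accepts names, values, or both is not stated on this page, so the stand-in class below does not define it.

```python
from enum import Enum


class AdaptorName(str, Enum):
    """Stand-in with the same member names and values as listed above."""

    BIGBENCH_MC = "BIGBENCH_MULTIPLE_CHOICE"
    SST2 = "SST2"


# Subscript syntax looks a member up by its name.
by_name = AdaptorName["BIGBENCH_MC"]

# Call syntax looks a member up by its value.
by_value = AdaptorName("BIGBENCH_MULTIPLE_CHOICE")


def is_valid_value(text):
    # True only if `text` matches some member's value, not its name.
    try:
        AdaptorName(text)
    except ValueError:
        return False
    return True
```

Both lookups return the same member, but `"BIGBENCH_MC"` is not a valid value, so configuration files should be checked against whichever form the factory actually accepts.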

bocoel.factories.GeneratorName

Bases: StrEnum

The generator names.

HUGGINGFACE_GENERATIVE class-attribute instance-attribute

HUGGINGFACE_GENERATIVE = 'HUGGINGFACE_GENERATIVE'

Corresponds to HuggingfaceGenerativeLM.

bocoel.factories.generative

generative(
    name: str | GeneratorName,
    /,
    *,
    model_path: str,
    batch_size: int,
    device: str = "auto",
    add_sep_token: bool = False,
) -> GenerativeModel

Create a generative model.

Parameters:

name (str | GeneratorName, required): The name of the model.
model_path (str, required): The path to the model.
batch_size (int, required): The batch size to use.
device (str, default 'auto'): The device to use.
add_sep_token (bool, default False): Whether to add the sep token.

Returns:

GenerativeModel: The generative model instance.

Raises:

ValueError: If the name is unknown.

Source code in src/bocoel/factories/lms.py
def generative(
    name: str | GeneratorName,
    /,
    *,
    model_path: str,
    batch_size: int,
    device: str = "auto",
    add_sep_token: bool = False,
) -> GenerativeModel:
    """
    Create a generative model.

    Parameters:
        name: The name of the model.
        model_path: The path to the model.
        batch_size: The batch size to use.
        device: The device to use.
        add_sep_token: Whether to add the sep token.

    Returns:
        The generative model instance.

    Raises:
        ValueError: If the name is unknown.
    """

    device = common.auto_device(device)

    match GeneratorName.lookup(name):
        case GeneratorName.HUGGINGFACE_GENERATIVE:
            return common.correct_kwargs(HuggingfaceGenerativeLM)(
                model_path=model_path,
                batch_size=batch_size,
                device=device,
                add_sep_token=add_sep_token,
            )
        case _:
            raise ValueError(f"Unknown LM name {name}")

bocoel.factories.ClassifierName

Bases: StrEnum

The classifier names.

HUGGINGFACE_LOGITS class-attribute instance-attribute

HUGGINGFACE_LOGITS = 'HUGGINGFACE_LOGITS'

Corresponds to HuggingfaceLogitsLM.

HUGGINGFACE_SEQUENCE class-attribute instance-attribute

HUGGINGFACE_SEQUENCE = 'HUGGINGFACE_SEQUENCE'

Corresponds to HuggingfaceSequenceLM.

bocoel.factories.classifier

classifier(
    name: str | ClassifierName,
    /,
    *,
    model_path: str,
    batch_size: int,
    choices: Sequence[str],
    device: str = "auto",
    add_sep_token: bool = False,
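) -> ClassifierModel

Create a classifier model.

Parameters:

name (str | ClassifierName, required): The name of the model.
model_path (str, required): The path to the model.
batch_size (int, required): The batch size to use. In the listing below it is forwarded only to HuggingfaceLogitsLM.
choices (Sequence[str], required): The choices to classify among.
device (str, default 'auto'): The device to use.
add_sep_token (bool, default False): Whether to add the sep token.

Returns:

ClassifierModel: The classifier model instance.

Raises:

ValueError: If the name is unknown.

Source code in src/bocoel/factories/lms.py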
def classifier(
    name: str | ClassifierName,
    /,
    *,
    model_path: str,
    batch_size: int,
    choices: Sequence[str],
    device: str = "auto",
    add_sep_token: bool = False,
) -> ClassifierModel:
    device = common.auto_device(device)

    match ClassifierName.lookup(name):
        case ClassifierName.HUGGINGFACE_LOGITS:
            return common.correct_kwargs(HuggingfaceLogitsLM)(
                model_path=model_path,
                batch_size=batch_size,
                device=device,
                choices=choices,
                add_sep_token=add_sep_token,
            )
        case ClassifierName.HUGGINGFACE_SEQUENCE:
            return common.correct_kwargs(HuggingfaceSequenceLM)(
                model_path=model_path,
                device=device,
                choices=choices,
                add_sep_token=add_sep_token,
            )
        case _:
            raise ValueError(f"Unknown LM name {name}")

bocoel.factories.OptimizerName

Bases: StrEnum

The names of the optimizers.

BAYESIAN class-attribute instance-attribute

BAYESIAN = 'BAYESIAN'

Corresponds to AxServiceOptimizer.

KMEANS class-attribute instance-attribute

KMEANS = 'KMEANS'

Corresponds to KMeansOptimizer.

KMEDOIDS class-attribute instance-attribute

KMEDOIDS = 'KMEDOIDS'

Corresponds to KMedoidsOptimizer.

RANDOM class-attribute instance-attribute

RANDOM = 'RANDOM'

Corresponds to RandomOptimizer.

BRUTE class-attribute instance-attribute

BRUTE = 'BRUTE'

Corresponds to BruteForceOptimizer.

UNIFORM class-attribute instance-attribute

UNIFORM = 'UNIFORM'

Corresponds to UniformOptimizer.

bocoel.factories.optimizer

optimizer(
    name: str | OptimizerName,
    /,
    *,
    corpus: Corpus,
    adaptor: Adaptor,
    **kwargs: Any,
) -> Optimizer

Create an optimizer instance.

Parameters:

name (str | OptimizerName, required): The name of the optimizer.
corpus (Corpus, required): The corpus to optimize.
adaptor (Adaptor, required): The adaptor to use.
**kwargs (Any, default {}): Additional keyword arguments to pass to the optimizer. See the documentation for the specific optimizer for details.

Returns:

Optimizer: The optimizer instance.

Raises:

ValueError: If the name is unknown.

Source code in src/bocoel/factories/optim.py
def optimizer(
    name: str | OptimizerName, /, *, corpus: Corpus, adaptor: Adaptor, **kwargs: Any
) -> Optimizer:
    """
    Create an optimizer instance.

    Parameters:
        name: The name of the optimizer.
        corpus: The corpus to optimize.
        adaptor: The adaptor to use.
        **kwargs: Additional keyword arguments to pass to the optimizer.
            See the documentation for the specific optimizer for details.

    Returns:
        The optimizer instance.

    Raises:
        ValueError: If the name is unknown.
    """

    name = OptimizerName.lookup(name)

    klass: type[Optimizer]

    match name:
        case OptimizerName.BAYESIAN:
            klass = AxServiceOptimizer
        case OptimizerName.KMEANS:
            klass = KMeansOptimizer
        case OptimizerName.KMEDOIDS:
            klass = KMedoidsOptimizer
        case OptimizerName.BRUTE:
            klass = BruteForceOptimizer
        case OptimizerName.RANDOM:
            klass = RandomOptimizer
        case OptimizerName.UNIFORM:
            klass = UniformOptimizer
        case _:
            raise ValueError(f"Unknown optimizer name: {name}")

    corpus_evaluator = CorpusEvaluator(corpus=corpus, adaptor=adaptor)
    return klass(index_eval=corpus_evaluator, index=corpus.index, **kwargs)
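
The optimizer factory works in two phases: it selects a class from the name, then builds a `CorpusEvaluator` from the corpus and adaptor and hands it to that class together with `corpus.index` and any extra keyword arguments. A self-contained sketch of that flow is below. `CorpusEvaluator` and `RandomOptimizer` are placeholders that only record their arguments, and `samples` in the test is a made-up keyword argument.

```python
from types import SimpleNamespace


class CorpusEvaluator:  # placeholder that only stores its inputs
    def __init__(self, *, corpus, adaptor):
        self.corpus = corpus
        self.adaptor = adaptor


class RandomOptimizer:  # placeholder that only stores its inputs
    def __init__(self, *, index_eval, index, **kwargs):
        self.index_eval = index_eval
        self.index = index
        self.kwargs = kwargs


_OPTIMIZERS = {"RANDOM": RandomOptimizer}


def optimizer(name, /, *, corpus, adaptor, **kwargs):
    # Phase 1: pick the class; unknown names fail before anything is built.
    try:
        klass = _OPTIMIZERS[name]
    except KeyError:
        raise ValueError(f"Unknown optimizer name: {name}") from None

    # Phase 2: wrap corpus and adaptor, then forward the extra kwargs as-is.
    evaluator = CorpusEvaluator(corpus=corpus, adaptor=adaptor)
    return klass(index_eval=evaluator, index=corpus.index, **kwargs)
```

Because the extra keyword arguments are passed through unchanged, each optimizer class defines which of them are valid.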