Storages
bocoel.Storage
Bases: Protocol
Storage is responsible for storing the data. This can be thought of as a table.
__len__ abstractmethod
__len__() -> int
Returns the number of rows in the storage.
Source code in src/bocoel/corpora/storages/interfaces.py
_getitem abstractmethod
_getitem(idx: int) -> Mapping[str, Any]
Returns the row at the given index.
Source code in src/bocoel/corpora/storages/interfaces.py
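As a rough illustration of the table-like contract described above, the following sketch defines a stand-in protocol with the same two methods and a toy list-backed implementation. The names `StorageLike`, `ListStorage`, and `first_row` are hypothetical and not part of bocoel; the sketch only mirrors the documented signatures.

```python
from collections.abc import Mapping, Sequence
from typing import Any, Protocol


class StorageLike(Protocol):
    """Hypothetical stand-in mirroring the two documented abstract methods."""

    def __len__(self) -> int: ...

    def _getitem(self, idx: int) -> Mapping[str, Any]: ...


class ListStorage:
    """Toy row store backed by a list of dicts (illustration only)."""

    def __init__(self, rows: Sequence[Mapping[str, Any]]) -> None:
        self._rows = list(rows)

    def __len__(self) -> int:
        # Number of rows in the "table".
        return len(self._rows)

    def _getitem(self, idx: int) -> Mapping[str, Any]:
        # One row, keyed by column name.
        return self._rows[idx]


def first_row(storage: StorageLike) -> Mapping[str, Any]:
    # Accepts any object exposing the two methods; no inheritance needed.
    return storage._getitem(0)


toy = ListStorage(
    [
        {"question": "2 + 2?", "answer": "4"},
        {"question": "3 + 3?", "answer": "6"},
    ]
)
```

Because the check is structural, any class that provides both methods would satisfy a protocol of this shape.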
bocoel.PandasStorage
PandasStorage(df: DataFrame)
Bases: Storage
Storage backed by a pandas DataFrame. Because a DataFrame is held entirely in memory, access is fast, but large datasets may require a lot of RAM.
Source code in src/bocoel/corpora/storages/pandas.py
from_jsonl_file classmethod
from_jsonl_file(path: str | Path) -> Self
Load data from a JSONL file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str \| Path | The path to the file. | required |
Returns:
Type | Description |
---|---|
Self | A PandasStorage instance. |
Source code in src/bocoel/corpora/storages/pandas.py
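For readers unfamiliar with the format, a JSONL file holds one JSON object per line. The sketch below is a standard-library illustration of that parsing step, not the library's implementation; `read_jsonl` is a hypothetical helper, and the temporary file exists only to make the example self-contained.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Any


def read_jsonl(path: str | Path) -> list[dict[str, Any]]:
    """Hypothetical helper: parse one JSON object per non-empty line."""
    records: list[dict[str, Any]] = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


# Write a small sample file, then read it back as a list of records.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "rows.jsonl"
    sample.write_text(
        '{"question": "2 + 2?", "answer": "4"}\n'
        '{"question": "3 + 3?", "answer": "6"}\n',
        encoding="utf-8",
    )
    rows = read_jsonl(sample)
```

Each record becomes one row, and the JSON keys become the column names.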
from_jsonl classmethod
from_jsonl(data: Sequence[Mapping[str, str]]) -> Self
Load data from parsed JSONL records, that is, a sequence of JSON objects (mappings).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Sequence[Mapping[str, str]] | The parsed JSONL records, as a sequence of JSON objects. | required |
Returns:
Type | Description |
---|---|
Self | A PandasStorage instance. |
Source code in src/bocoel/corpora/storages/pandas.py
bocoel.DatasetsStorage
DatasetsStorage(path: str, name: str | None = None, split: str | None = None)
Bases: Storage
Storage for datasets from the Hugging Face Datasets library. Datasets are read from disk, so they may be slower to load but are more memory efficient.
Source code in src/bocoel/corpora/storages/datasets.py
bocoel.ConcatStorage
ConcatStorage(storages: Sequence[Storage])
Bases: Storage
Storage that concatenates multiple storages together. Concatenation is done along the first dimension (rows). The resulting storage is read-only, and its length equals the sum of the lengths of the underlying storages.
Source code in src/bocoel/corpora/storages/concat.py
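One common way to implement row-wise concatenation is to keep cumulative end offsets and binary-search them to find which underlying part holds a global index. The sketch below illustrates that idea under the assumption that each part is a plain sequence of rows; `ConcatRows` is a hypothetical class and does not necessarily reflect how ConcatStorage is written.

```python
from bisect import bisect_right
from collections.abc import Mapping, Sequence
from itertools import accumulate
from typing import Any


class ConcatRows:
    """Illustrative read-only concatenation of row sequences."""

    def __init__(self, parts: Sequence[Sequence[Mapping[str, Any]]]) -> None:
        self._parts = list(parts)
        # Cumulative end offsets: part lengths [2, 3] give [2, 5].
        self._ends = list(accumulate(len(p) for p in self._parts))

    def __len__(self) -> int:
        # Total length is the sum of the part lengths.
        return self._ends[-1] if self._ends else 0

    def _getitem(self, idx: int) -> Mapping[str, Any]:
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # First part whose end offset is strictly greater than idx.
        part = bisect_right(self._ends, idx)
        start = self._ends[part - 1] if part else 0
        return self._parts[part][idx - start]


a = [{"id": 0}, {"id": 1}]
b = [{"id": 2}, {"id": 3}, {"id": 4}]
combined = ConcatRows([a, b])
```

Lookups cost a binary search over the number of parts, and no rows are copied.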