phenonaut.transforms package

Submodules

phenonaut.transforms.dimensionality_reduction module

class phenonaut.transforms.dimensionality_reduction.LDA

Bases: object

LDA dimensionality reduction

Once instantiated, can be called like:

lda=LDA()
lda(dataset)
Parameters:
  • dataset (Union[Dataset, Phenonaut]) – The Phenonaut dataset or Phenonaut object containing one dataset on which to apply the transformation.

  • ndims (int, optional) – Number of dimensions to embed the data into, by default 2

  • center_on_perturbation_id (Optional[str], optional) – Optionally, recentre the embedding space on a named perturbation. The dataset should have perturbation_column set for this option. By default None.

  • center_by_median (bool, optional) – By default, any dataset centering will be performed on the median of controls or perturbations. If this argument is False, then centering is performed on the mean, by default True.

  • predict_proba (bool) – If True, then the probability of each datapoint belonging to each class is calculated and used in place of the output features (see the example below).
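For example, a minimal sketch using only the documented arguments (dataset is assumed to be a Phenonaut Dataset with perturbation_column set, as LDA is supervised and requires class labels):

lda=LDA()
lda(dataset, ndims=2, predict_proba=True)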

__call__(dataset: Dataset, ndims=2, center_on_perturbation_id=None, center_by_median: bool = True, predict_proba: bool = False)

LDA dimensionality reduction

Once instantiated, can be called like:

lda=LDA()
lda(dataset)
Parameters:
  • dataset (Union[Dataset, Phenonaut]) – The Phenonaut dataset or Phenonaut object containing one dataset on which to apply the transformation.

  • ndims (int, optional) – Number of dimensions to embed the data into, by default 2

  • center_on_perturbation_id (Optional[str], optional) – Optionally, recentre the embedding space on a named perturbation. The dataset should have perturbation_column set for this option. By default None.

  • center_by_median (bool, optional) – By default, any dataset centering will be performed on the median of controls or perturbations. If this argument is False, then centering is performed on the mean, by default True.

  • predict_proba (bool) – If True, then the probability of each datapoint belonging to each class is calculated and used in place of the output features.

make_scree_plot(output_filename: str | Path | None = None, title='Scree plot')

Produce a Scree plot showing ndims vs explained variance

Parameters:
  • output_filename (Optional[Union[str, Path]]) – Output filename (ending in .png) indicating where the Scree plot should be saved. If None, then the plot is displayed interactively. By default, None.

  • title (str, optional) – Plot title, by default “Scree plot”

Raises:

DataError – LDA must have been fitted before a Scree plot can be made.

class phenonaut.transforms.dimensionality_reduction.PCA(new_feature_names='PC', ndims: int = 2)

Bases: Transformer

Principal Component Analysis (PCA) dimensionality reduction.

Can be instantiated and called like:

pca=PCA()
pca(dataset)

The PCA transformer is rather verbose, duplicating a lot of functionality supplied by its parent class, because explaining feature variance is often important, along with generating scree plots etc.

Parameters:
  • dataset (Dataset) – The Phenonaut dataset on which to apply PCA

  • new_feature_names (Union[list[str], str]) – List of strings containing the names for the new features. Can also be just a single string, which then has numerical suffixes attached enumerating the number of new features generated. By default “PC”

  • ndims (int, optional) – Number of dimensions to embed the data into, by default 2

__call__(dataset: Dataset | Phenonaut, ndims: int | None = None, new_feature_names: list[str] | str = 'PC', groupby: str | list[str] | None = None, center_on_perturbation_id: str | None = None, centering_function: Callable = np.median, fit_perturbation_ids: str | list | None = None, fit_query: str | None = None, explain_variance_in_features: bool = False)

Principal Component Analysis (PCA) dimensionality reduction

Parameters:
  • dataset (Union[Dataset, Phenonaut]) – The Phenonaut dataset or Phenonaut object containing one dataset on which to apply PCA.

  • ndims (int) – Number of dimensions to embed the data into. If None, then the value of ndims passed to the constructor is used.

  • new_feature_names (Union[list[str], str]) – List of strings containing the names for the new features. Can also be just a single string, which then has numerical suffixes attached enumerating the number of new features generated. By default “PC”

  • groupby (Optional[Union[str, list[str]]], optional) – Often we would like to apply transformations on a plate-by-plate basis. This groupby argument accepts a column name (or list of column names) whose unique values define a group or plate, working through the pandas groupby function. By default None.

  • center_on_perturbation_id (Optional[str], optional) – Optionally, recentre the PCA on a named perturbation. The dataset should have perturbation_column set for this option. By default None.

  • centering_function (Callable) – Used with center_on_perturbation_id, this function is applied to all data for matching perturbations. By default, the median of matching perturbations is used (np.median); this behavior can be overridden by supplying a different function here.

  • fit_perturbation_ids (Optional[Union[str, list]], optional) – If only a subset of the data should be used for fitting, then perturbations for fitting may be listed here. If None, then every datapoint is used for fitting. By default None.

  • fit_query (str, optional) – A pandas style query may be supplied to perform fitting. By default None.

  • explain_variance_in_features (bool, optional) – If True, then the percentage explained variance of each PCA dimension is included in the new PCA descriptor feature name. Overrides new_feature_names. By default False.
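For example, a sketch using only the documented arguments, fitting a 2D PCA, annotating the new features with their explained variance, and saving a Scree plot:

pca=PCA()
pca(dataset, ndims=2, explain_variance_in_features=True)
pca.save_scree_plot('scree.png')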

save_scree_plot(output_filename: str | Path | None = None, title='Scree plot')

Produce a Scree plot showing ndims vs explained variance

Parameters:
  • output_filename (Optional[Union[str, Path]]) – Output filename (ending in .png) indicating where the Scree plot should be saved. If None, then the plot is displayed interactively. By default, None.

  • title (str, optional) – Plot title, by default “Scree plot”

Raises:

DataError – PCA must have been fitted before a Scree plot can be made.

class phenonaut.transforms.dimensionality_reduction.TSNE(constructor_kwargs={}, new_feature_names='TSNE', ndims: int = 2)

Bases: Transformer

t-SNE dimensionality reduction

Can be instantiated and called like:

tsne=TSNE()
tsne(dataset)
Parameters:
  • dataset (Dataset) – The Phenonaut dataset on which to apply the transformation

  • new_feature_names (Union[list[str], str]) – List of strings containing the names for the new features. Can also be just a single string, which then has numerical suffixes attached enumerating the number of new features generated. By default “TSNE”

  • ndims (int, optional) – Number of dimensions to embed the data into, by default 2
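A minimal sketch, assuming constructor_kwargs are forwarded to the underlying t-SNE implementation (perplexity is a standard scikit-learn TSNE argument; treat its availability here as an assumption):

tsne=TSNE(constructor_kwargs={'perplexity': 15}, ndims=2)
tsne(dataset)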

__call__(dataset: Dataset | Phenonaut, ndims: int = 2, new_feature_names: list[str] | str = 'tSNE', groupby: str | list[str] | None = None, center_on_perturbation_id: str | None = None, centering_function: Callable = np.median)

t-SNE dimensionality reduction

Once instantiated, can be called directly, like:

tsne=TSNE()
tsne(dataset)
Parameters:
  • dataset (Union[Dataset, Phenonaut]) – The Phenonaut dataset or Phenonaut object containing one dataset on which to apply t-SNE.

  • ndims (int) – Number of dimensions to embed the data into, by default 2

  • new_feature_names (Union[list[str], str]) – List of strings containing the names for the new features. Can also be just a single string, which then has numerical suffixes attached enumerating the number of new features generated. By default “tSNE”

  • groupby (Optional[Union[str, list[str]]], optional) – Often we would like to apply transformations on a plate-by-plate basis. This groupby argument accepts a column name (or list of column names) whose unique values define a group or plate, working through the pandas groupby function. By default None.

  • center_on_perturbation_id (Optional[str], optional) – Optionally, recentre the t-SNE on a named perturbation. The dataset should have perturbation_column set for this option. By default None.

  • centering_function (Callable) – Used with center_on_perturbation_id, this function is applied to all data for matching perturbations. By default, the median of matching perturbations is used (np.median); this behavior can be overridden by supplying a different function here.

class phenonaut.transforms.dimensionality_reduction.UMAP(new_feature_names='UMAP', ndims: int = 2, umap_kwargs: dict = {})

Bases: Transformer

UMAP dimensionality reduction

Can be instantiated and called like:

umap=UMAP()
umap(dataset)
Parameters:
  • dataset (Dataset) – The Phenonaut dataset on which to apply the transformation

  • new_feature_names (Union[list[str], str]) – List of strings containing the names for the new features. Can also be just a single string, which then has numerical suffixes attached enumerating the number of new features generated. By default “UMAP”

  • ndims (int, optional) – Number of dimensions to embed the data into, by default 2

  • umap_kwargs (dict) – Keyword arguments to pass to the umap-learn UMAP constructor. Often the number of neighbors requires changing, and this can be achieved here by passing in {‘n_neighbors’: 50}, for example, to run UMAP with 50 neighbors (see the example below). Any value of an n_components key within this dictionary will be overwritten by the value of the ndims argument.
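For example, following the umap_kwargs note above:

umap=UMAP(umap_kwargs={'n_neighbors': 50}, ndims=2)
umap(dataset)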

__call__(dataset: Dataset | Phenonaut, ndims: int = 2, new_feature_names: list[str] | str = 'UMAP', groupby: str | list[str] | None = None, center_on_perturbation_id: str | None = None, centering_function: Callable = np.median)

UMAP dimensionality reduction

Once instantiated, can be called directly, like:

umap=UMAP()
umap(dataset)
Parameters:
  • dataset (Union[Dataset, Phenonaut]) – The Phenonaut dataset or Phenonaut object containing one dataset on which to apply UMAP.

  • ndims (int) – Number of dimensions to embed the data into, by default 2

  • new_feature_names (Union[list[str], str]) – List of strings containing the names for the new features. Can also be just a single string, which then has numerical suffixes attached enumerating the number of new features generated. By default “UMAP”

  • groupby (Optional[Union[str, list[str]]], optional) – Often we would like to apply transformations on a plate-by-plate basis. This groupby argument accepts a column name (or list of column names) whose unique values define a group or plate, working through the pandas groupby function. By default None.

  • center_on_perturbation_id (Optional[str], optional) – Optionally, recentre the UMAP on a named perturbation. The dataset should have perturbation_column set for this option. By default None.

  • centering_function (Callable) – Used with center_on_perturbation_id, this function is applied to all data for matching perturbations. By default, the median of matching perturbations is used (np.median); this behavior can be overridden by supplying a different function here.

phenonaut.transforms.generic_transformations module

class phenonaut.transforms.generic_transformations.Log2

Bases: Transformer
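No parameters are documented for Log2; a minimal sketch, assuming it follows the standard Transformer calling convention used throughout this package (presumably applying a log2 transform to the features):

log2=Log2()
log2(dataset)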

phenonaut.transforms.imputers module

class phenonaut.transforms.imputers.KNNImputer(n_neighbors: int = 5, weights: str = 'uniform', new_feature_names: list[str] | str = 'KNNImputed_')

Bases: Transformer

SciKitLearn KNNImputer

SciKit’s KNNImputer wrapped in a Phenonaut Transformer. Allows passing a number of neighbors to use for imputation.

Can be used as follows:

imputer=KNNImputer()
imputer(dataset)
Parameters:
  • n_neighbors (int) – Use this many neighboring samples for imputation.

  • weights (str) – Weight function for neighbor points. Can be one of ‘uniform’ (all points in the neighborhood are weighted equally), ‘distance’ (neighbor points contribute by the inverse of their distance), or a callable accepting a distance matrix and returning an array of the same shape containing weights; see https://scikit-learn.org/stable/modules/generated/sklearn.impute.KNNImputer.html for further info.

  • new_feature_names (Union[list[str], str], optional) – Name of new features. If ending in _, then this is prepended to existing features. By default ‘KNNImputed_’.
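For example, imputing with ten distance-weighted neighbors, using only the documented arguments:

imputer=KNNImputer(n_neighbors=10, weights='distance')
imputer(dataset)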

class phenonaut.transforms.imputers.RFImputer(rf_kwargs={'n_jobs': -1}, max_iter: int = 25, tol: float = 0.001, new_feature_names: list[str] | str = 'RFImputed_')

Bases: Transformer

RandomForestImputer

RandomForestImputer inspired by SKLearn documentation here: https://scikit-learn.org/stable/auto_examples/impute/plot_iterative_imputer_variants_comparison.html#sphx-glr-auto-examples-impute-plot-iterative-imputer-variants-comparison-py

Can be very computationally expensive, so careful setting of max_iter is advised.

Can be used as follows:

imputer=RFImputer()
imputer(dataset)
Parameters:
  • rf_kwargs (dict) – Dictionary to use in constructing the SciKitLearn RandomForestRegressor, by default {‘n_jobs’:-1} (implying that all available processors should be used to fit the random forest regressor). May take any values shown in sklearn documentation: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html

  • max_iter (int) – The maximum number of times to impute the missing values and check convergence as dictated by tol.

  • tol (float) – The tolerance at which convergence is accepted. Once changes between iterations are smaller than this value, then iteration stops, unless max_iter has been reached in which case, iteration stops before tol is reached. By default 1e-3.

  • new_feature_names (Union[list[str], str], optional) – Name of new features. If ending in _, then this is prepended to existing features. By default ‘RFImputed_’.
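For example, a sketch capping the iteration count to keep runtime manageable (n_estimators is a standard RandomForestRegressor argument passed via rf_kwargs):

imputer=RFImputer(rf_kwargs={'n_jobs': -1, 'n_estimators': 100}, max_iter=10)
imputer(dataset)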

class phenonaut.transforms.imputers.SimpleImputer(strategy: str = 'median', fill_value: str | float | None = None, new_feature_names: list[str] | str = 'Imputed_')

Bases: Transformer

SciKitLearn SimpleImputer

SciKit’s SimpleImputer wrapped in a Phenonaut Transformer. Allows passing a strategy argument containing ‘mean’, ‘median’, ‘most_frequent’, or ‘constant’. If constant is passed, then must supply a fill_value argument.

Can be used as follows:

imputer=SimpleImputer()
imputer(dataset)
Parameters:
  • strategy (str, optional) – The imputation strategy to use, can be ‘mean’, ‘median’, ‘most_frequent’, or ‘constant’, by default “median”.

  • fill_value (Optional[Union[str, float]], optional) – If constant is passed as the strategy, then this argument should contain the constant value to fill with. By default None.

  • new_feature_names (Union[list[str], str], optional) – Name of new features. If ending in _, then this is prepended to existing features. By default ‘Imputed_’.
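For example, filling missing values with a constant:

imputer=SimpleImputer(strategy='constant', fill_value=0.0)
imputer(dataset)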

phenonaut.transforms.preparative module

class phenonaut.transforms.preparative.RemoveHighestCorrelatedThenVIF(verbose: bool = False)

Bases: object

__call__(ds: Dataset, n_before_vif=1000, vif_cutoff: float = 5.0, min_features=2, drop_columns=False, **corr_kwargs)

Run RemoveHighestCorrelatedThenVIF

Parameters:
  • ds (Dataset) – The dataset from which features are to be removed

  • n_before_vif (int, optional) – The number of features to remove before applying VIF. This is required when dealing with large datasets which would be too time consuming to process entirely with VIF. Features are removed iteratively, selecting the most correlated features and removing them, by default 1000

  • vif_cutoff (float, optional) – The VIF cutoff value, above which features are removed. Features with VIF scores above 5.0 are considered highly correlated, by default 5.0.

  • min_features (int, optional) – Remove features by VIF score all the way down to a minimum given by this argument, by default 2

  • drop_columns (bool, optional) – If drop columns is True, then not only will features be removed from the dataset features list, but the columns for these features will be removed from the dataframe, by default False

  • corr_kwargs (dict) – Keyword arguments which may be passed to the pd.DataFrame.corr function, allowing the correlation calculation method to be changed from Pearson to others.

filter(ds: Dataset, n_before_vif=1000, vif_cutoff: float = 5.0, min_features=2, drop_columns=False, **corr_kwargs)

Run RemoveHighestCorrelatedThenVIF

Parameters:
  • ds (Dataset) – The dataset from which features are to be removed

  • n_before_vif (int, optional) – The number of features to remove before applying VIF. This is required when dealing with large datasets which would be too time consuming to process entirely with VIF. Features are removed iteratively, selecting the most correlated features and removing them, by default 1000

  • vif_cutoff (float, optional) – The VIF cutoff value, above which features are removed. Features with VIF scores above 5.0 are considered highly correlated, by default 5.0.

  • min_features (int, optional) – Remove features by VIF score all the way down to a minimum given by this argument, by default 2

  • drop_columns (bool, optional) – If drop columns is True, then not only will features be removed from the dataset features list, but the columns for these features will be removed from the dataframe, by default False

  • corr_kwargs (dict) – Keyword arguments which may be passed to the pd.DataFrame.corr function, allowing the correlation calculation method to be changed from Pearson to others.
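A minimal usage sketch with the documented defaults (ds is a Phenonaut Dataset):

vif_filter=RemoveHighestCorrelatedThenVIF()
vif_filter(ds, n_before_vif=1000, vif_cutoff=5.0)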

class phenonaut.transforms.preparative.RemoveHighlyCorrelated(verbose: bool = False)

Bases: object

__call__(ds: Dataset, threshold: float | None = 0.9, min_features: int | None = None, drop_columns: bool = False, **corr_kwargs)

Run RemoveHighlyCorrelated - identical to calling the filter function.

Parameters:
  • ds (Dataset) – The dataset from which features are to be removed

  • threshold (Union[float, None]) – The threshold value for calculated correlations, above which a feature should be removed. If None, then it is expected that the min_features argument is given, and the most highly correlated features are then iteratively removed. By default 0.9.

  • min_features (int, optional) – The number of features to keep. If the threshold argument is None, then features are iteratively removed, ordered by the most correlated, until the number of features equals min_features. If threshold is a float, then min_features acts as a minimum number of features, and feature removal stops once it is reached, no matter the correlations present in the dataset.

  • drop_columns (bool, optional) – If drop columns is True, then not only will features be removed from the dataset features list, but the columns for these features will be removed from the dataframe, by default False

  • corr_kwargs (dict) – Keyword arguments which may be passed to the pd.DataFrame.corr function, allowing the correlation calculation method to be changed from Pearson to others.

filter(ds: Dataset, threshold: float | None = 0.9, min_features: int | None = None, drop_columns: bool = False, **corr_kwargs)

Run the RemoveHighlyCorrelated filter

Parameters:
  • ds (Dataset) – The dataset from which features are to be removed

  • threshold (Union[float, None]) – The threshold value for calculated correlations, above which a feature should be removed. If None, then it is expected that the min_features argument is given, and the most highly correlated features are then iteratively removed. Note that the absolute values of the calculated correlation coefficients are taken, so this number should always be positive. By default 0.9.

  • min_features (int, optional) – The number of features to keep. If the threshold argument is None, then features are iteratively removed, ordered by the most correlated, until the number of features equals min_features. If threshold is a float, then min_features acts as a minimum number of features, and feature removal stops once it is reached, no matter the correlations present in the dataset.

  • drop_columns (bool, optional) – If drop columns is True, then not only will features be removed from the dataset features list, but the columns for these features will be removed from the dataframe, by default False

  • corr_kwargs (dict) – Keyword arguments which may be passed to the pd.DataFrame.corr function, allowing the correlation calculation method to be changed from Pearson to others (see the example below).
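For example, filtering at the default threshold but using Spearman rather than Pearson correlation (method is a standard pd.DataFrame.corr argument forwarded via corr_kwargs):

rhc=RemoveHighlyCorrelated()
rhc(ds, threshold=0.9, method='spearman')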

class phenonaut.transforms.preparative.RobustMAD(mad_scale: str | float = 'normal', epsilon: float | None = None, new_feature_names: list[str] | str = 'RobustMAD_')

Bases: Transformer

class phenonaut.transforms.preparative.StandardScaler(new_feature_names='StdScaler_')

Bases: Transformer
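Neither RobustMAD nor StandardScaler documents further parameters here; a minimal sketch, assuming both follow the standard Transformer calling convention:

robust_mad=RobustMAD()
robust_mad(dataset)
scaler=StandardScaler()
scaler(dataset)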

class phenonaut.transforms.preparative.VIF(verbose: bool = False)

Bases: object

__call__(ds, vif_cutoff: float = 5.0, min_features=2, drop_columns: bool = False)

Run the variance inflation factor filter, the same as calling the filter method.

Parameters:
  • ds (Dataset) – The phenonaut dataset to be operated on

  • vif_cutoff (float, optional) – The variance inflation factor cutoff. Values above 5.0 indicate strong correlation, by default 5.0

  • min_features (int, optional) – Remove features by VIF score all the way down to a minimum given by this argument, by default 2

  • drop_columns (bool, optional) – If drop columns is True, then not only will features be removed from the dataset features list, but the columns for these features will be removed from the dataframe, by default False

filter(ds, vif_cutoff: float = 5.0, min_features=2, drop_columns: bool = False)

Run the variance inflation factor filter

Parameters:
  • ds (Dataset) – The phenonaut dataset to be operated on

  • vif_cutoff (float, optional) – The variance inflation factor cutoff. Values above 5.0 indicate strong correlation, by default 5.0

  • min_features (int, optional) – Remove features by VIF score all the way down to a minimum given by this argument, by default 2

  • drop_columns (bool, optional) – If drop columns is True, then not only will features be removed from the dataset features list, but the columns for these features will be removed from the dataframe, by default False

get_vif_scores(ds: Dataset, use_features: list[str] | None = None) → dict[str, float]

Get VIF scores dictionary from a phenonaut Dataset

Parameters:
  • ds (Dataset) – The dataset from which to calculate VIF scores

  • use_features (Optional[list[str]], optional) – If None, then the features found within the dataset with a call to Dataset.features are used. This behaviour can be changed by passing a list of features as this use_features argument, by default None

Returns:

Dictionary mapping feature name to VIF score

Return type:

dict[str, float]
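For example, inspecting the most collinear features before filtering:

vif=VIF()
scores=vif.get_vif_scores(ds)
# Show the five features with the highest VIF scores
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:5])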

class phenonaut.transforms.preparative.ZCA(new_feature_names: list[str] | str = 'ZCA_')

Bases: Transformer

ZCA whitening

Can be instantiated and called like:

zca=ZCA()
zca(dataset)
Parameters:
  • dataset (Dataset) – The Phenonaut dataset on which to apply ZCA whitening

  • new_feature_names (Union[list[str], str]) – List of strings containing the names for the new features. Can also be just a single string, which then has numerical suffixes attached enumerating the number of new features generated or, if ending in an underscore, is prepended to the existing feature names. By default ‘ZCA_’.

phenonaut.transforms.supervised_transformer module

class phenonaut.transforms.supervised_transformer.SupervisedTransformer(method, new_feature_names=None, callable_args: dict = {}, **kwargs)

Bases: object

callable_args = {}
fit(data: Dataset | DataFrame, y_or_ycolumnlabel: str | Series | array | None = None, fit_perturbation_ids: str | list | None = None)
fit_transform(data: Dataset | DataFrame, new_feature_names: list | None = None)
has_fit = False
has_fit_transform = False
has_transform = False
is_callable = False
method = None
method_kwargs = {}
new_feature_names = None
transform(data: Dataset | DataFrame, new_feature_names: List[str] | None = None)
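The class carries no docstring here; a hedged sketch, assuming method accepts a supervised scikit-learn class and that y_or_ycolumnlabel may name a metadata column holding class labels:

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
st=SupervisedTransformer(LinearDiscriminantAnalysis)
st.fit(dataset, y_or_ycolumnlabel='target')  # 'target' is a hypothetical label column
st.transform(dataset)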

phenonaut.transforms.transformer module

exception phenonaut.transforms.transformer.PheTransformerFitQueryMatchedNoRows(message)

Bases: Exception

class phenonaut.transforms.transformer.Transformer(method: object | Callable, new_feature_names: str | list[str] | None = None, transformer_name: str | None = None, constructor_kwargs: dict = {}, callable_kwargs: dict = {}, fit_kwargs: dict = {}, transform_kwargs: dict = {}, fit_transform_kwargs: dict = {})

Bases: object

Generic transformer to turn methods/objects into Phenonaut transforms

This generic transformer class may be used to wrap functions which perform a simple transform (like np.log2), or more complex objects, like PCA, t-SNE and StandardScaler from SciKit (which provide their own fit/transform/fit_transform functions).

This way, a Phenonaut transformer may be constructed which applies the given function/object method to a Phenonaut dataset and correctly updates features.

Wrapping an object which requires fitting also brings with it the advantage that we may apply the groupby keyword to perform a unique fit and transform in groups, which is useful if you require PCA to be performed on a per-plate basis.

May be used as follows - in this example, we are wrapping PCA from SciKit, which has the effect of performing a 2D PCA on our Phenonaut Dataset object. We also make use of the groupby keyword, to perform a unique PCA for each plate (denoted by unique BARCODE values).

from phenonaut import Phenonaut
from phenonaut.transforms import Transformer
import pandas as pd
import numpy as np

df=pd.DataFrame({
    'ROW':[1,1,1,1,1,1],
    'COLUMN':[1,1,1,2,2,2],
    'BARCODE':["Plate1","Plate1","Plate2","Plate2","Plate1","Plate1"],
    'feat_1':[1.2,1.3,5.2,6.2,0.1,0.2],
    'feat_2':[1.2,1.4,5.1,6.1,0.2,0.2],
    'feat_3':[1.3,1.5,5,6.8,0.3,0.38],
    'filename':['fileA.png','FileB.png','FileC.png','FileD.png','fileE.png','FileF.png'],
    'FOV':[1,2,1,2,1,2]})

phe=Phenonaut(df)
from sklearn.decomposition import PCA
t_pca=Transformer(PCA, constructor_kwargs={'n_components':2})
t_pca.fit(phe.ds, groupby="BARCODE")
t_pca.transform(phe.ds)

Along with the above, whereby a PCA transformer is generated from the SciKit PCA class, you may also use the built-in phenonaut.transforms.PCA, which has more PCA-specific functionality, allowing creation of Scree plots etc.

A transformer may also be made with a callable function (which operates on dataframes or numpy arrays) like np.log2, or np.abs, etc, as shown below.

The following squares all features.

from phenonaut import Phenonaut
from phenonaut.transforms import Transformer
import numpy as np
import pandas as pd
df=pd.DataFrame({
    'ROW':[1,1,1,1,1,1],
    'COLUMN':[1,1,1,2,2,2],
    'BARCODE':["Plate1","Plate1","Plate2","Plate2","Plate1","Plate1"],
    'feat_1':[1.2,1.3,5.2,6.2,0.1,0.2],
    'feat_2':[1.2,1.4,5.1,6.1,0.2,0.2],
    'feat_3':[1.3,1.5,5,6.8,0.3,0.38],
    'filename':['fileA.png','FileB.png','FileC.png','FileD.png','fileE.png','FileF.png'],
    'FOV':[1,2,1,2,1,2]})
phe=Phenonaut(df)
t=Transformer(np.square)
t(phe)
Parameters:
  • method (Union[object, Callable]) – Instantiatable object with fit, transform and or fit_transform, or __call__ methods. Alternatively, a callable function. Designed to be passed something like the PCA class from SciKit, or a simple function like np.log2.

  • new_feature_names (Optional[Union[str, list[str]]]) – List of strings containing the names for the new features. Can also be just a single string, which then has numerical suffixes attached enumerating the number of new features generated, however, if the string ends in an underscore, then the old feature name has that string prepended to it. For example, if ‘StandardScaler_’ is given and the original features are feat_1, feat2, and feat_3, then the new features will be StandardScaler_feat_1, StandardScaler_feat_2, and StandardScaler_feat_3. If None, then the names of new features are attempted to be derived from the name of the wrapped function. By default None.

  • transformer_name (Optional[str], optional) – When features are set on a Dataset, the history reflects what was carried out to generate those features. If None, then automatic naming of the passed function is attempted. By default None.

  • constructor_kwargs (dict, optional) – Additional constructor arguments which may be passed to the class passed in the method argument upon instantiation. By default {}.

  • callable_kwargs (dict, optional) – Additional arguments to pass to the function passed in the method argument. By default {}.

  • fit_kwargs (dict, optional) – Additional arguments to the fit function called on object instantiated from the method argument. By default {}.

  • transform_kwargs (dict, optional) – Additional arguments to the transform function called on object instantiated from the method argument. By default {}.

  • fit_transform_kwargs (dict, optional) – Additional arguments to the fit_transform function called on object instantiated from the method argument. By default {}.

__call__(dataset: Dataset | Phenonaut | numpy.ndarray, groupby: str | list[str] | None = None, fit_perturbation_ids: str | list | None = None, fit_query: str | None = None, fit_kwargs: dict | None = None, transform_kwargs: dict | None = None, fit_transform_kwargs: dict | None = None, new_feature_names: str | list[str] | None = None, method_name: str | None = None, free_memory_after_transform: bool = True, center_on_perturbation_id: str | None = False, centering_function: Callable = np.median)

Call transformer

If a simple callable method was passed to the constructor of the transformer, then it can be applied by calling the transformer here; the transform method can also be called with the same effect. If the wrapped method has no __call__ method, then transform is attempted; if that is also absent, then fit_transform is attempted.

Parameters:
  • dataset (Union[Dataset, Phenonaut]) – The Phenonaut dataset or Phenonaut object upon which the transform should be applied.

  • groupby (Optional[Union[str, list[str]]], optional) – Often we would like to apply transformations on a plate-by-plate basis. This groupby argument accepts a column name (or list of column names) whose unique values define a group or plate, working through the pandas groupby function. By default None.

  • fit_perturbation_ids (Union[str, list], optional) – If only a subset of the data should be used for fitting, then perturbations for fitting may be listed here. If None, then every datapoint is used for fitting. By default None.

  • fit_query (Optional[str], optional) – A pandas style query may be supplied to perform fitting. By default None.

  • fit_kwargs (Optional[dict], optional) – Optional arguments supplied to the fit function of the object passed in the constructor of this transformer. By default None.

  • transform_kwargs (Optional[dict], optional) – Additional arguments to the transform function called on the object instantiated from the method argument. If set here, and not None, then it overrides any transform_kwargs given to the object constructor. transform is only called as a fallback when the object has no fit_transform method but does have separate fit and transform methods, which are then called in series. By default None.

  • fit_transform_kwargs (Optional[dict], optional) – Additional arguments to the fit_transform function called on object instantiated from the method argument. If set here, and not None, then it overrides any fit_transform_kwargs given to the object constructor. By default None.

  • new_feature_names (Optional[Union[list[str], str]], optional) – List of strings containing the names for the new features. Can also be just a single string, which then has numerical suffixes attached enumerating the number of new features generated. If None, then features are attempted to be named through interrogation of the method_name argument. By default None.

  • method_name (Optional[str], optional) – When setting features, the history message should describe what was done. If None, then the method object/method is interrogated in an attempt to automatically deduce the name. By default None.

  • free_memory_after_transform (bool, optional) – Remove temporary objects once the transformation has been performed. When performing a groupby operation, intermediate fits are retained; these may be deleted after calling transform by setting this argument to True. By default True.

  • center_on_perturbation_id (Optional[str], optional) – Optionally, recentre the transformed data on a named perturbation. The dataset should have perturbation_column set for this option. By default None.

  • centering_function (Callable, optional) – Used with center_on_perturbation_id, this function is applied to all data for matching perturbations. By default, the median of matching perturbations is used (np.median); this behavior can be overridden by supplying a different function here.
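For example, a sketch reusing the dataframe from the class docstring above: fit a scikit-learn StandardScaler only on rows matching a pandas query, then transform every row, prefixing the new feature names:

from sklearn.preprocessing import StandardScaler
t=Transformer(StandardScaler)
t(phe.ds, fit_query="BARCODE=='Plate1'", new_feature_names='StdScaler_')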

fit(dataset: Dataset | Phenonaut, groupby: str | list[str] | None = None, fit_perturbation_ids: str | list | None = None, fit_query: str | None = None, fit_kwargs: dict | None = None)

Call fit on the transformer

Parameters:
  • dataset (Union[Dataset, Phenonaut]) – Dataset containing data to be fitted against

  • groupby (Optional[Union[str, list[str]]], optional) – Often we would like to apply transformations on a plate-by-plate basis. This groupby argument accepts a column name (or list of column names) whose unique values define a group or plate, working through the pandas groupby function. By default None.

  • fit_perturbation_ids (Union[str, list], optional) – If only a subset of the data should be used for fitting, then perturbations for fitting may be listed here. If None, then every datapoint is used for fitting. By default None.

  • fit_query (Optional[str], optional) – A pandas style query may be supplied to perform fitting. By default None.

  • fit_kwargs (Optional[dict], optional) – Optional arguments supplied to the fit function of the object passed in the constructor of this transformer. By default None.

fit_transform(dataset: Dataset | Phenonaut, groupby: str | list[str] | None = None, fit_perturbation_ids: str | list | None = None, fit_query: str | None = None, fit_kwargs: dict | None = None, transform_kwargs: dict | None = None, fit_transform_kwargs: dict | None = None, new_feature_names: str | list[str] | None = None, method_name: str | None = None, free_memory_after_transform: bool = True, center_on_perturbation_id: str | None = False, centering_function: Callable = np.mean)

Fit the wrapped method to the dataset and apply the transformation in a single step.

Parameters:
  • dataset (Union[Dataset, Phenonaut]) – The Phenonaut dataset or Phenonaut object upon which the fit_transform should be applied.

  • groupby (Optional[Union[str, list[str]]], optional) – Often we would like to apply transformations on a plate-by-plate basis. This groupby argument accepts a column name (or list of column names) whose unique values define a group or plate, working through the pandas groupby function. By default None.

  • fit_perturbation_ids (Union[str, list], optional) – If only a subset of the data should be used for fitting, then perturbations for fitting may be listed here. If None, then every datapoint is used for fitting. By default None.

  • fit_query (Optional[str], optional) – A pandas style query may be supplied to perform fitting. By default None.

  • fit_kwargs (Optional[dict], optional) – Optional arguments supplied to the fit function of the object passed in the constructor of this transformer. By default None.

  • transform_kwargs (Optional[dict], optional) – Additional arguments to the transform function called on the object instantiated from the method argument. If set here, and not None, then it overrides any transform_kwargs given to the object constructor. transform is only called as a fallback when the object has no fit_transform method but does have separate fit and transform methods, which are then called in series. By default None.

  • fit_transform_kwargs (Optional[dict], optional) – Additional arguments to the fit_transform function called on object instantiated from the method argument. If set here, and not None, then it overrides any fit_transform_kwargs given to the object constructor. By default None.

  • new_feature_names (Optional[Union[list[str], str]], optional) – List of strings containing the names for the new features. Can also be just a single string, which then has numerical suffixes attached enumerating the number of new features generated. If None, then features are attempted to be named through interrogation of the method_name argument. By default None.

  • method_name (Optional[str], optional) – When setting features, the history message should describe what was done. If None, then the method object/method is interrogated in an attempt to automatically deduce the name. By default None.

  • free_memory_after_transform (bool, optional) – Remove temporary objects once the transformation has been performed. When performing a groupby operation, intermediate fits are retained; these may be deleted after calling transform by setting this argument to True. By default True.

  • center_on_perturbation_id (Optional[str], optional) – Optionally, recentre the transformed data on a named perturbation. The dataset should have perturbation_column set for this option. By default None.

  • centering_function (Callable, optional) – Used with center_on_perturbation_id, this function is applied to all data for matching perturbations. By default, the mean of matching perturbations is used (np.mean, per the signature above); this behavior can be overridden by supplying a different function here.

transform(dataset: Dataset | Phenonaut, new_feature_names: str | list[str] | None = None, method_name: str | None = None, transform_kwargs: dict | None = None, free_memory_after_transform: bool = True, center_on_perturbation_id: str | None = False, centering_function: Callable = np.median)

Apply transform

If no transform method is found, then __call__ is called.

Parameters:
  • dataset (Union[Dataset, Phenonaut]) – The Phenonaut dataset or Phenonaut object upon which the transform should be applied.

  • new_feature_names (Optional[Union[list[str], str]], optional) – List of strings containing the names for the new features. Can also be just a single string, which then has numerical suffixes attached enumerating the number of new features generated. If None, then features are attempted to be named through interrogation of the method_name argument. By default None.

  • method_name (Optional[str], optional) – When setting features, the history message should describe what was done. If None, then the method object/method is interrogated in an attempt to automatically deduce the name. By default None.

  • transform_kwargs (Optional[dict], optional) – Additional arguments to the transform function called on object instantiated from the method argument. If set here, and not None, then it overrides any transform_kwargs given to the object constructor. By default None.

  • free_memory_after_transform (bool, optional) – Remove temporary objects once the transformation has been performed. When performing a groupby operation, intermediate fits are retained; these may be deleted after calling transform by setting this argument to True. By default True.

  • center_on_perturbation_id (Optional[str], optional) – Optionally, recentre the transformed data on a named perturbation. The dataset should have perturbation_column set for this option. By default None.

  • centering_function (Callable, optional) – Used with center_on_perturbation_id, this function is applied to all data for matching perturbations. By default, the median of matching perturbations is used (np.median); this behavior can be overridden by supplying a different function here.

Module contents