Validation documentation
Validation RandomForest
- supernnova.validation.validate_randomforest.get_predictions(settings, model_file=None)
Test random forest models on independent test set
Features are stored in a .FITRES file found in data_dir.
Use predefined splits to select the test set.
Save predicted target and probabilities to preds_dir.
- Parameters:
settings (ExperimentSettings) – custom class to hold hyperparameters
model_file (str) – path to saved randomforest model
Validation RNN
- supernnova.validation.validate_rnn.find_idx(array, value)
Utility to find the index of the element of array that most closely matches value
- Parameters:
array (np.array) – The array in which to search
value (float) – The value for which we are looking for a match
- Returns:
(int) the index of the element of array that most closely matches value
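The nearest-match lookup described above can be illustrated with a minimal sketch. It assumes a one-dimensional NumPy array; the name find_idx_sketch and the sample values are hypothetical, and this is not necessarily how find_idx is implemented in supernnova.

```python
import numpy as np


def find_idx_sketch(array, value):
    """Return the index of the element of `array` closest to `value`."""
    # Absolute distance of every element to the requested value
    distances = np.abs(np.asarray(array, dtype=float) - value)
    # argmin returns the first index at which the distance is smallest
    return int(distances.argmin())


# Hypothetical observation dates (MJD) for one lightcurve
mjd = np.array([57000.0, 57002.5, 57004.0])
idx = find_idx_sketch(mjd, 57002.7)
```

When two elements are equally close, this sketch returns the lower index, because argmin reports the first minimum.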
- supernnova.validation.validate_rnn.get_batch_predictions(rnn, X, target)
Utility to obtain predictions for a given batch
- Parameters:
rnn (torch.nn) – The RNN model
X (torch.Tensor) – The batch on which to carry out predictions
target (torch.LongTensor) – The true class of each element in the batch
- Returns:
Tuple containing
arr_preds (np.array): predictions
arr_target (np.array): actual targets
- supernnova.validation.validate_rnn.get_batch_predictions_MFE(rnn, X, target)
Utility to obtain predictions for a given batch
- Parameters:
rnn (torch.nn) – The RNN model
X (torch.Tensor) – The batch on which to carry out predictions
target (torch.LongTensor) – The true class of each element in the batch
- Returns:
Tuple containing
arr_preds (np.array): predictions
arr_target (np.array): actual targets
- supernnova.validation.validate_rnn.get_predictions(settings, model_file=None)
Obtain predictions for a given RNN model specified by the settings argument or, alternatively, by a model_file.
Models are benchmarked on the test data set.
Batch size can be controlled to speed up predictions.
For Bayesian models, multiple predictions are carried out to obtain a distribution of predictions.
Predictions are computed for full lightcurves, and around the peak light.
Predictions are saved to a pickle file (for faster loading).
- Parameters:
settings (ExperimentSettings) – custom class to hold hyperparameters
model_file (str) – Path to saved model weights. Default: None
- supernnova.validation.validate_rnn.get_predictions_for_speed_benchmark(settings)
Test RNN models inference speed
Models are benchmarked on the test data set
Batch size can be controlled to speed up predictions
For Bayesian models, multiple predictions are carried out to obtain a distribution of predictions
Results are saved to a .csv for future use
- Parameters:
settings (ExperimentSettings) – custom class to hold hyperparameters
Metrics
- supernnova.validation.metrics.aggregate_metrics(settings)
Aggregate all pre-computed METRICS files into a single dataframe for analysis
Save a csv dataframe aggregating all the metrics
- Parameters:
settings (ExperimentSettings) – custom class to hold hyperparameters
- supernnova.validation.metrics.get_metrics_singlemodel(settings, prediction_file=None, model_type='rnn')
Launch computation of all evaluation metrics for a given model, specified by the settings object or by a model file
Save a pickled dataframe (we pickle because we’re saving numpy arrays, which are not easily savable with the to_csv method).
- Parameters:
settings (ExperimentSettings) – custom class to hold hyperparameters
prediction_file (str) – Path to saved predictions. Default: None
model_type (str) – Choose rnn or randomforest
- Returns:
(pandas.DataFrame) holds the performance metrics for this dataframe
- supernnova.validation.metrics.get_rnn_performance_metrics_singlemodel(settings, df, host_zspe_list)
Compute performance metrics (accuracy, AUC, purity etc) for an RNN model
Compute metrics around peak light (i.e. PEAKMJD) and for the full lightcurve.
For Bayesian models, compute multiple predictions per lightcurve and then take the median
- Parameters:
settings (ExperimentSettings) – custom class to hold hyperparameters
df (pandas.DataFrame) – dataframe containing a model’s predictions
host_zspe_list (list) – available host galaxy spectroscopic redshifts
- Returns:
(pandas.DataFrame) holds the performance metrics for this dataframe
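The median step for Bayesian models can be sketched as follows. This is only an illustration under stated assumptions: the array draws, its shape (one row per stochastic prediction, one column per class) and its values are hypothetical, and the layout of the predictions dataframe in supernnova may differ.

```python
import numpy as np

# Hypothetical predictions for one lightcurve: 5 stochastic forward passes,
# each giving probabilities for 2 classes (every row sums to 1).
draws = np.array([
    [0.90, 0.10],
    [0.80, 0.20],
    [0.85, 0.15],
    [0.70, 0.30],
    [0.95, 0.05],
])

# Median probability per class across the draws
median_probs = np.median(draws, axis=0)

# The predicted class is the one with the highest median probability
predicted_class = int(np.argmax(median_probs))
```

Taking the median rather than the mean makes the summary less sensitive to a single outlying draw.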
- supernnova.validation.metrics.get_randomforest_performance_metrics_singlemodel(settings, df, host_zspe_list)
Compute performance metrics (accuracy, AUC, purity etc) for a randomforest model
- Parameters:
settings (ExperimentSettings) – custom class to hold hyperparameters
df (pandas.DataFrame) – dataframe containing a model’s predictions
host_zspe_list (list) – available host galaxy spectroscopic redshifts
- Returns:
(pandas.DataFrame) holds the performance metrics for this dataframe
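As a rough illustration of two of the metrics named above, the sketch below computes accuracy and purity for a binary task, plus efficiency (recall), which is not named in the text and is included here as an assumption. The function name binary_metrics_sketch, the choice of class 1 as the positive class and the sample labels are all hypothetical; AUC is omitted, and the actual supernnova computation may differ.

```python
import numpy as np


def binary_metrics_sketch(target, pred, positive=1):
    """Return (accuracy, purity, efficiency) for hard class predictions."""
    target = np.asarray(target)
    pred = np.asarray(pred)

    # Confusion-matrix counts with respect to the positive class
    tp = np.sum((pred == positive) & (target == positive))
    fp = np.sum((pred == positive) & (target != positive))
    fn = np.sum((pred != positive) & (target == positive))

    # Fraction of lightcurves classified correctly
    accuracy = float(np.mean(pred == target))
    # Purity (precision): how many predicted positives are true positives
    purity = float(tp / (tp + fp)) if (tp + fp) > 0 else float("nan")
    # Efficiency (recall): how many true positives are recovered
    efficiency = float(tp / (tp + fn)) if (tp + fn) > 0 else float("nan")
    return accuracy, purity, efficiency


# Hypothetical true and predicted classes for six lightcurves
target = [1, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 1, 1, 0]
accuracy, purity, efficiency = binary_metrics_sketch(target, pred)
```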
- supernnova.validation.metrics.get_uncertainty_metrics_singlemodel(df)
For any lightcurve, compute the standard deviation of the model’s predictions (this is only valid for Bayesian models, which yield a distribution of predictions).
Then, compute the mean and standard deviation of this distribution across all lightcurves.
A higher mean indicates a model that is less confident in its predictions.
- Parameters:
df (pandas.DataFrame) – dataframe containing a model’s predictions
- Returns:
(pandas.DataFrame) holds the uncertainty metrics for this dataframe
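The two-stage computation described above can be sketched with NumPy. The array preds (one row per lightcurve, one column per stochastic prediction of a single class probability) and its values are hypothetical; the real function operates on a predictions dataframe whose layout is not shown here.

```python
import numpy as np

# Hypothetical array: 3 lightcurves x 4 stochastic predictions
# of the probability assigned to one class
preds = np.array([
    [0.90, 0.92, 0.88, 0.90],  # confident, stable predictions
    [0.50, 0.70, 0.30, 0.50],  # predictions vary a lot between draws
    [0.10, 0.10, 0.10, 0.10],  # identical predictions, zero spread
])

# Stage 1: spread of the predictions for each lightcurve
per_lc_std = preds.std(axis=1)

# Stage 2: summarise that spread across all lightcurves
mean_std = per_lc_std.mean()
std_std = per_lc_std.std()
```

A larger mean_std means the draws disagree more on average, which matches the reading that a higher mean indicates a less confident model.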
- supernnova.validation.metrics.get_entropy_metrics_singlemodel(df, nb_classes)
Compute the entropy of the predictions.
Low entropy indicates a model that is very confident of its predictions.
- Parameters:
df (pandas.DataFrame) – dataframe containing a model’s predictions
nb_classes (int) – the number of classes in the classification task
- Returns:
(pandas.DataFrame) holds the entropy metrics for this dataframe
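To make the link between entropy and confidence concrete, here is a minimal sketch that assumes Shannon entropy with the natural logarithm; the text does not say which definition or log base supernnova uses. The function name prediction_entropy_sketch and the sample probabilities are hypothetical.

```python
import numpy as np


def prediction_entropy_sketch(probs, eps=1e-12):
    """Shannon entropy (in nats) of each row of class probabilities."""
    probs = np.asarray(probs, dtype=float)
    # Clip inside the log only, so a probability of exactly 0 contributes 0
    clipped = np.clip(probs, eps, 1.0)
    return -np.sum(probs * np.log(clipped), axis=-1)


nb_classes = 2
probs = np.array([
    [0.99, 0.01],  # confident prediction
    [0.50, 0.50],  # maximally uncertain prediction
])
entropy = prediction_entropy_sketch(probs)

# Upper bound, reached by a uniform prediction over nb_classes
max_entropy = np.log(nb_classes)
```

Under this definition the entropy ranges from 0 (all probability on one class) to log(nb_classes) (uniform prediction), which is one way nb_classes can enter the computation.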
- supernnova.validation.metrics.get_calibration_metrics_singlemodel(df)
Compute probability calibration dataframe. If the calibration curve is close to identity, the model is considered well-calibrated.
- Parameters:
df (pandas.DataFrame) – dataframe containing a model’s predictions
- Returns:
(pandas.DataFrame) holds the calibration metrics for this dataframe
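A calibration curve compares, within bins of predicted probability, the mean predicted probability with the observed fraction of positives. The sketch below shows one common way to build such a curve with equal-width bins; the function name calibration_curve_sketch, the binning scheme and the sample data are assumptions, not the supernnova implementation.

```python
import numpy as np


def calibration_curve_sketch(target, prob, n_bins=5):
    """Return (mean predicted probability, fraction of positives) per bin."""
    target = np.asarray(target, dtype=float)
    prob = np.asarray(prob, dtype=float)

    # Equal-width bins on [0, 1]
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bin index of each probability; clip so prob == 1.0 lands in the last bin
    bin_ids = np.clip(np.digitize(prob, edges) - 1, 0, n_bins - 1)

    mean_pred, frac_pos = [], []
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():  # skip empty bins
            mean_pred.append(prob[mask].mean())
            frac_pos.append(target[mask].mean())
    return np.array(mean_pred), np.array(frac_pos)


# Hypothetical predicted probabilities and true binary labels
prob = np.array([0.1, 0.1, 0.9, 0.9, 0.9, 0.9])
target = np.array([0, 0, 1, 1, 1, 0])
mean_pred, frac_pos = calibration_curve_sketch(target, prob, n_bins=2)
```

For a well-calibrated model the two returned arrays are close to each other, i.e. the curve lies near the identity line. In this toy example the upper bin predicts 0.9 on average but only 75% of its members are positive, so the model is slightly overconfident there.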
- supernnova.validation.metrics.get_classification_stats_singlemodel(df, nb_classes)
Find out how many lightcurves are classified in each class
- Parameters:
df (pandas.DataFrame) – dataframe containing a model’s predictions
nb_classes (int) – the number of classes in the classification task
- Returns:
(pandas.DataFrame) holds the classification statistics for this dataframe
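Counting how many lightcurves fall in each class can be sketched with NumPy as below. The array predicted and its values are hypothetical, and the real function works on a predictions dataframe instead of a bare array.

```python
import numpy as np

nb_classes = 3

# Hypothetical predicted class for six lightcurves
predicted = np.array([0, 2, 2, 1, 0, 2])

# Number of lightcurves per class; minlength keeps classes with zero counts
counts = np.bincount(predicted, minlength=nb_classes)

# Same information expressed as a fraction of all lightcurves
fractions = counts / counts.sum()
```

Passing minlength=nb_classes ensures that the result always has one entry per class, even when a class is never predicted.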