Application of a Pretrained Classifier
The project aims to use machine learning methods to emulate a cloud classification scheme. The classifier can be trained on large amounts of data and later be used to predict cloud types from satellite data. These two steps can be run separately.
This notebook contains a short explanation of how to use a pretrained classifier to predict labels from new input data.
Imports
First we need to point Python to the project folder. The path can be given as a relative path, as shown below, or as an absolute system path. Then the module can be imported with the import cloud_classifier command.
[53]:
import sys
sys.path.append('../cloud_classifier')
import cloud_classifier
import importlib
importlib.reload(cloud_classifier)
[53]:
<module 'cloud_classifier' from '/home/squidy/tropos/CTyPyTool/notebooks/../cloud_classifier/cloud_classifier.py'>
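If the notebook is started from a different working directory, a relative entry in sys.path may no longer point to the project folder. The following is a minimal sketch of one way to guard against that, using only the standard library. The helper name add_to_sys_path is hypothetical and not part of the project.

```python
import sys
from pathlib import Path


def add_to_sys_path(folder):
    """Resolve folder to an absolute path and append it to sys.path once."""
    resolved = str(Path(folder).resolve())
    if resolved not in sys.path:
        sys.path.append(resolved)
    return resolved


# '../cloud_classifier' is the relative path used in this notebook.
project_path = add_to_sys_path("../cloud_classifier")
print(project_path)
```

The path is resolved against the current working directory at the time of the call, so the cell should be run before changing directories.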
Initialization
Our first step is to create a classifier object:
[54]:
cc = cloud_classifier.cloud_classifier()
Then we need to point our classifier object to an already existing classifier. The load_project() method will load an existing classifier into our classifier object.
[55]:
path = "../classifiers/TreeClassifier"
cc.load_project(path)
Applying the Classifier: Prediction of Cloud Type Labels
Using a User-Defined File List
In order to predict labels with the now loaded classifier, we need to specify input files of satellite data. This can be done manually via the input_files option of the set_project_parameters method.
[56]:
file_1 = "../data/example_data/msevi-medi-20190317_1800.nc"
file_2 = "../data/example_data/msevi-medi-20190318_1100.nc"
cc.set_project_parameters(input_files = [file_1, file_2])
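A mistyped path in a manually written file list is easy to overlook. As a hedged sketch, the list can be checked with the standard library before it is handed to the classifier. The helper split_existing is a hypothetical name introduced here for illustration and is not part of the project.

```python
from pathlib import Path


def split_existing(paths):
    """Return (found, missing) lists for the given file paths."""
    found, missing = [], []
    for p in paths:
        # Sort each path into the list that matches its status on disk.
        (found if Path(p).is_file() else missing).append(p)
    return found, missing


found, missing = split_existing([
    "../data/example_data/msevi-medi-20190317_1800.nc",
    "../data/example_data/msevi-medi-20190318_1100.nc",
])
if missing:
    print("Missing input files:", missing)
```

Only the entries in found would then be passed on as input_files.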
We now run the prediction pipeline (with the run_prediction_pipeline() method), which
* applies the classifier to our input data and
* stores the predicted labels.
The option create_filelist is set to False to use the user-defined input file list.
[57]:
cc.run_prediction_pipeline(create_filelist = False)
/home/squidy/.local/share/virtualenvs/CTyPyTool-idbccFiL/lib/python3.8/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version 1.0 when using version 1.0.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/modules/model_persistence.html#security-maintainability-limitations
warnings.warn(
Classifier loaded!
Masked indices set!
Input vectors created!
Labels saved as nwcsaf_msevi-medi-20190317_1800_predicted.nc
Input vectors created!
Labels saved as nwcsaf_msevi-medi-20190318_1100_predicted.nc
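The log above suggests that each label file is named after its input file, with the prefix nwcsaf_ and the suffix _predicted.nc. The sketch below mimics that pattern so the expected output name can be derived from an input path. It is inferred from the log messages only, and the function predicted_label_name is a hypothetical helper, not the project's own code.

```python
from pathlib import Path


def predicted_label_name(input_file):
    """Mimic the naming seen in the log: nwcsaf_<input stem>_predicted.nc"""
    return f"nwcsaf_{Path(input_file).stem}_predicted.nc"


print(predicted_label_name("../data/example_data/msevi-medi-20190317_1800.nc"))
# prints: nwcsaf_msevi-medi-20190317_1800_predicted.nc
```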
Using an Automatically Generated Input File List
As an alternative to the manual definition, the input file list can be generated automatically.
The easiest way to do so is to put all input files into an input data folder (here it is set to ../data/example_data) and tell the classifier where to look via the input_source_folder option.
[58]:
%%bash
ls -l ../data/example_data
total 30120
-rw-rw-r-- 1 squidy squidy 14946418 Jun 4 2021 msevi-medi-20190317_1800.nc
-rw-rw-r-- 1 squidy squidy 15552552 Jun 4 2021 msevi-medi-20190318_1100.nc
-rw-rw-r-- 1 squidy squidy 155069 Jun 4 2021 nwcsaf_msevi-medi-20190317_1800.nc
-rw-rw-r-- 1 squidy squidy 178946 Jun 4 2021 nwcsaf_msevi-medi-20190318_1100.nc
[59]:
cc.set_project_parameters(input_source_folder = "../data/example_data")
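To illustrate what generating a file list from a folder can look like, here is a minimal standard-library sketch. It is not the project's implementation. It assumes that satellite input files match the pattern msevi-*.nc and that the files prefixed with nwcsaf_ in the listing above are label files rather than satellite input. The function name list_satellite_files is hypothetical.

```python
from pathlib import Path


def list_satellite_files(folder, pattern="msevi-*.nc"):
    """Return the sorted paths in folder whose file names match pattern."""
    return sorted(str(p) for p in Path(folder).glob(pattern))


# Returns an empty list if the folder does not exist or has no matching files.
print(list_satellite_files("../data/example_data"))
```

Because the pattern is matched against the whole file name, names starting with nwcsaf_ are not picked up.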
In the next step, we let the classifier predict labels from the input files in that folder. This is again done with the run_prediction_pipeline() method.
Because we want the classifier to generate the list of input files automatically, we set the option create_filelist to True.
[60]:
cc.run_prediction_pipeline(create_filelist = True)
Input filelist created!
Classifier loaded!
Masked indices set!
/home/squidy/.local/share/virtualenvs/CTyPyTool-idbccFiL/lib/python3.8/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version 1.0 when using version 1.0.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/modules/model_persistence.html#security-maintainability-limitations
warnings.warn(
Input vectors created!
Labels saved as nwcsaf_msevi-medi-20190318_1100_predicted.nc
Input vectors created!
Labels saved as nwcsaf_msevi-medi-20190317_1800_predicted.nc
Accessing Predicted Labels
The predicted labels are stored in the folder of the classifier we are using, in the subfolder labels.
[61]:
%%bash
ls ../classifiers/TreeClassifier/labels
nwcsaf_msevi-medi-20190317_1800_predicted.nc
nwcsaf_msevi-medi-20190318_1100_predicted.nc
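When many label files accumulate, it can be handy to sort or select them by time. The sketch below assumes that the YYYYMMDD_HHMM part of each file name is the observation time, which the file names suggest but the notebook does not state. The helper timestamp_from_name is a hypothetical name.

```python
import re
from datetime import datetime


def timestamp_from_name(filename):
    """Parse the YYYYMMDD_HHMM part of a file name, or return None."""
    match = re.search(r"(\d{8}_\d{4})", filename)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y%m%d_%H%M")


print(timestamp_from_name("nwcsaf_msevi-medi-20190317_1800_predicted.nc"))
# prints: 2019-03-17 18:00:00
```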