Experimental machine learning study for liver disease classification
This project evaluates a portfolio of machine learning algorithms for liver disease classification. The study consists of three experimental scenarios:
1) Multiclass classification of the largest dataset
2) Binary classification of the largest dataset
3) Binary classification of all three datasets
Prerequisites
sudo snap install astral-uv --classic
Datasets
The following datasets are being used:
- indian-liver-disease-dataset
- hcv-data
- liver-data
How to run?
To run the experiments, first set up the environment:
source setupenv
The liver command is then available. Run liver -h to see its options:
$ liver -h
usage: liver [-h] --experiment {1,2,3} [--debug]
             [--learners-group {logistic-regression,random-forest,tree,gradient-boosting,neural-network,svm} [{logistic-regression,random-forest,tree,gradient-boosting,neural-network,svm} ...]]
             [--config {default,global,experiment1,experiment2,experiment3}] [--plot-only]

options:
  -h, --help            show this help message and exit
  --experiment {1,2,3}
  --debug               Enable debug logging
  --learners-group {logistic-regression,random-forest,tree,gradient-boosting,neural-network,svm} [{logistic-regression,random-forest,tree,gradient-boosting,neural-network,svm} ...]
                        Run specific family(ies) of learners.
  --config {default,global,experiment1,experiment2,experiment3}
                        Configuration to use for the experiment
  --plot-only           Plot only on already existing results.
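The help text above implies a fairly conventional argparse setup. As a minimal sketch (not the project's actual code), a parser with the same flags and choices could look like the following. The function name build_parser and the constant names are assumptions made for illustration; only the flags, choices, and help strings come from the help output.

```python
import argparse

# Choices copied from the help output above.
LEARNER_GROUPS = [
    "logistic-regression", "random-forest", "tree",
    "gradient-boosting", "neural-network", "svm",
]
CONFIGS = ["default", "global", "experiment1", "experiment2", "experiment3"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the CLI described by `liver -h`."""
    parser = argparse.ArgumentParser(prog="liver")
    # --experiment appears without brackets in the usage line, so it is required.
    parser.add_argument("--experiment", type=int, choices=[1, 2, 3], required=True)
    parser.add_argument("--debug", action="store_true", help="Enable debug logging")
    # One or more learner families may be listed after the flag.
    parser.add_argument(
        "--learners-group", nargs="+", choices=LEARNER_GROUPS,
        help="Run specific family(ies) of learners.",
    )
    parser.add_argument(
        "--config", choices=CONFIGS,
        help="Configuration to use for the experiment",
    )
    parser.add_argument(
        "--plot-only", action="store_true",
        help="Plot only on already existing results.",
    )
    return parser


if __name__ == "__main__":
    # Equivalent to: liver --experiment 1 --learners-group svm tree --config default
    args = build_parser().parse_args(
        ["--experiment", "1", "--learners-group", "svm", "tree", "--config", "default"]
    )
    print(args)
```

With such a parser, several learner families can be passed to a single --learners-group flag, and omitting --experiment is rejected with a usage error.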
To check the generated results/report, run:
wslview results/experiment<experiment>-<config>.csv
wslview reports/experiment<experiment>-<config>.html
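The placeholders in the two paths above follow one naming pattern. As a small sketch, the helper below builds both paths from an experiment number and a configuration name. The function name output_paths is hypothetical; the directory names and file name pattern are taken from the commands above.

```python
from pathlib import Path


def output_paths(experiment: int, config: str) -> tuple[Path, Path]:
    """Return (results CSV, HTML report) for one experiment/config pair.

    Hypothetical helper; it only mirrors the naming pattern
    results/experiment<experiment>-<config>.csv shown in the README.
    """
    stem = f"experiment{experiment}-{config}"
    return Path("results") / f"{stem}.csv", Path("reports") / f"{stem}.html"


csv_path, html_path = output_paths(1, "default")
print(csv_path.as_posix())   # results/experiment1-default.csv
print(html_path.as_posix())  # reports/experiment1-default.html
```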
Configuration files
Configuration files are located in the configs folder.
The structure is as follows:
{
    "logistic-regression": {
        "penalty": [
            "l2"
        ],
        "C": [
            1.0
        ],
        "class_weight": [
            null
        ]
    },
    "random-forest": {...},
    "svm": {...},
    "gradient-boosting": {...},
    "tree": {...},
    "neural-network": {...}
}
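Every parameter in the structure above maps to a list of values. The README does not say how those lists are consumed, so the following is only a sketch under the assumption that each list holds candidate values and that all combinations are tried, as in a grid search. The helper expand_grid is hypothetical, and the embedded JSON repeats only the logistic-regression entry shown above.

```python
import itertools
import json

# Only the fully specified learner from the example above is reproduced here.
CONFIG_TEXT = """
{
    "logistic-regression": {
        "penalty": ["l2"],
        "C": [1.0],
        "class_weight": [null]
    }
}
"""


def expand_grid(params: dict) -> list[dict]:
    """Hypothetical helper: every combination of the listed candidate values."""
    names = list(params)
    value_lists = [params[name] for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


config = json.loads(CONFIG_TEXT)
for learner, params in config.items():
    for combination in expand_grid(params):
        # JSON null arrives here as Python None.
        print(learner, combination)
```

Under this assumption, a learner with lists of lengths 2 and 3 would yield 6 parameter combinations, while the single-valued example above yields exactly one.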
Known issues and limitations
- Disabled GUI options should usually be represented by omitting the parameter from the JSON, not by passing null. Passing null becomes None in Python and is only valid for parameters whose API explicitly accepts None, such as class_weight, max_depth, or random_state.
- Orange.evaluation.testing.sample() uses a different splitting implementation/row-selection logic than Orange GUI’s Data Sampler widget, so the same n=0.8, stratified=True, and random_state=42 do not guarantee the same train/test rows.
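The difference between passing null and omitting a parameter can be seen with the standard json module alone. This is a minimal sketch: the values are simplified to scalars for readability (the project's configs wrap values in lists), n_estimators is just an illustrative parameter name, and drop_nulls is a hypothetical helper, not part of the project.

```python
import json

# Explicit null: the key is present and its value becomes None in Python.
explicit_null = json.loads('{"max_depth": null, "n_estimators": 100}')
# Omitted parameter: the key is absent, so a learner would fall back to its own default.
omitted = json.loads('{"n_estimators": 100}')


def drop_nulls(params: dict) -> dict:
    """Hypothetical helper: remove null entries so they are treated as omitted."""
    return {name: value for name, value in params.items() if value is not None}


print(explicit_null["max_depth"] is None)  # True
print("max_depth" in omitted)              # False
print(drop_nulls(explicit_null))           # {'n_estimators': 100}
```

Filtering like this is only safe for parameters where None is not itself a meaningful setting. For max_depth or class_weight, None is a valid value and dropping it would silently change the request if the default were ever different.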
TODO
- Make default JSON files for Orange and for Python, i.e. default-ow and default-py.
- Make a full default JSON with all parameters described for each learner.