The solubility classes are defined as follows (S7.4 denotes the solubility in buffer at pH 7.4):
Extremely Insoluble | S7.4 < 0.01 mg/ml
Highly Insoluble | 0.01 mg/ml < S7.4 < 0.1 mg/ml
Insoluble | 0.1 mg/ml < S7.4 < 1 mg/ml
Slightly Soluble | 1 mg/ml < S7.4 < 10 mg/ml
Soluble | S7.4 > 10 mg/ml
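The class boundaries can be expressed as a simple lookup. The following is a minimal sketch rather than the predictor's own logic: the function name `solubility_class` is hypothetical, and the treatment of a value that falls exactly on a boundary is an assumption, since the definitions above use strict inequalities on both sides.

```python
from bisect import bisect_right

# Class boundaries in mg/ml, taken from the class definitions above.
THRESHOLDS = [0.01, 0.1, 1.0, 10.0]
CLASSES = [
    "Extremely Insoluble",
    "Highly Insoluble",
    "Insoluble",
    "Slightly Soluble",
    "Soluble",
]


def solubility_class(s74_mg_per_ml: float) -> str:
    """Map an S7.4 value (mg/ml) to its solubility class.

    Assumption: a value exactly equal to a boundary is assigned to the
    higher class; the source definitions leave this case open.
    """
    if s74_mg_per_ml < 0:
        raise ValueError("solubility must be non-negative")
    # bisect_right counts how many thresholds are <= the value,
    # which is exactly the index of the matching class.
    return CLASSES[bisect_right(THRESHOLDS, s74_mg_per_ml)]


if __name__ == "__main__":
    for value in (0.005, 0.05, 0.5, 5.0, 25.0):
        print(f"{value} mg/ml: {solubility_class(value)}")
```

Keeping the thresholds in a sorted list means the same four values also describe the four binary sub-models listed below.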
Training set size:
Sub-model and threshold | Number of compounds
non-Extremely Insoluble (S7.4 > 0.01 mg/ml) | 5,310
non-Highly Insoluble (S7.4 > 0.1 mg/ml) | 5,310
non-Insoluble (S7.4 > 1 mg/ml) | 5,692
Soluble (S7.4 > 10 mg/ml) | 5,561
Internal validation set size:
Sub-model and threshold | Number of compounds
non-Extremely Insoluble (S7.4 > 0.01 mg/ml) | 2,277
non-Highly Insoluble (S7.4 > 0.1 mg/ml) | 2,277
non-Insoluble (S7.4 > 1 mg/ml) | 2,441
Soluble (S7.4 > 10 mg/ml) | 2,378
Main sources of experimental data:
- Reference books:
  - The Merck Index. An Encyclopedia of Chemicals, Drugs, and Biologicals, O'Neil, M.J., Smith, A., Heckelman, P.E., Budavari, S., Eds. 13th Edition, Merck & Co., Inc., Whitehouse Station, NJ, 2001
  - Therapeutic Drugs, Dollery, C., Ed. 2nd Edition, Churchill Livingstone, New York, NY, 1999
  - Clarke's Isolation and Identification of Drugs, Moffat, A.C., Jackson, J.V., Moss, M.S., Widdop, B., Eds. 2nd Edition, The Pharmaceutical Press, London, 1986
- Various articles from peer-reviewed scientific journals*
* - Most of the analyzed articles were publications reporting solubility models by other authors. Each such publication contained a larger collection of experimental data (usually on the order of tens or hundreds of compounds) compiled from the corresponding original experimental articles.
Internal Validation
Each sub-model was internally validated on its own separate internal validation set, which constitutes ca. 30% of the entire dataset available for that threshold model.
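The "ca. 30%" figure can be checked against the training and internal validation set sizes listed above. The snippet below is only an arithmetic sketch; it assumes that the "entire dataset" of a sub-model is the sum of its training and internal validation sets, and the helper name `validation_fraction` is hypothetical.

```python
# (training, internal validation) compound counts per sub-model,
# copied from the two set-size tables above.
SET_SIZES = {
    "non-Extremely Insoluble (S7.4 > 0.01 mg/ml)": (5310, 2277),
    "non-Highly Insoluble (S7.4 > 0.1 mg/ml)": (5310, 2277),
    "non-Insoluble (S7.4 > 1 mg/ml)": (5692, 2441),
    "Soluble (S7.4 > 10 mg/ml)": (5561, 2378),
}


def validation_fraction(n_train: int, n_valid: int) -> float:
    """Share of the combined dataset held out for internal validation."""
    return n_valid / (n_train + n_valid)


if __name__ == "__main__":
    for name, (n_train, n_valid) in SET_SIZES.items():
        print(f"{name}: {validation_fraction(n_train, n_valid):.1%}")
```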
Table 1. Performance statistics for the various fractions of the internal validation set of the non-Extremely Insoluble (S7.4 > 0.01 mg/ml) sub-model of the ACD/Qualitative Solubility predictor.
Subset | Coverage of the entire internal validation set (N = 2,277) | Observed* | Calculated probability p > 0.5 | Calculated probability p < 0.5
RI > 0.3, N = 2,146 | | True | 1,800 (83.9%) | 51 (2.4%)
RI > 0.3, N = 2,146 | | False | 71 (3.3%) | 224 (10.4%)
RI > 0.3, N = 2,146 | | Accuracy | |
RI > 0.3, N = 2,146 | | Sensitivity | |
RI > 0.3, N = 2,146 | | Specificity | |
RI > 0.5, N = 1,800 | | True | 1,559 (86.6%) | 24 (1.3%)
RI > 0.5, N = 1,800 | | False | 45 (2.5%) | 172 (9.6%)
RI > 0.5, N = 1,800 | | Accuracy | |
RI > 0.5, N = 1,800 | | Sensitivity | |
RI > 0.5, N = 1,800 | | Specificity | |
RI > 0.75, N = 1,054 | | True | 936 (88.8%) | 5 (0.5%)
RI > 0.75, N = 1,054 | | False | 11 (1.0%) | 102 (9.7%)
RI > 0.75, N = 1,054 | | Accuracy | |
RI > 0.75, N = 1,054 | | Sensitivity | |
RI > 0.75, N = 1,054 | | Specificity | |
* - True means that the compound's solubility in buffer at pH 7.4 exceeds the threshold indicated in the table title, while False means that it is below that threshold.
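Accuracy, sensitivity and specificity follow from the four counts given for each subset. The sketch below uses the standard definitions, which is an assumption because the source does not state its formulas; a compound counts as predicted True when p > 0.5. The class `ConfusionMatrix` and the dictionary `TABLE_1` are illustrative names, and the counts are those of Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # observed True,  calculated p > 0.5
    fn: int  # observed True,  calculated p < 0.5
    fp: int  # observed False, calculated p > 0.5
    tn: int  # observed False, calculated p < 0.5

    @property
    def n(self) -> int:
        return self.tp + self.fn + self.fp + self.tn

    @property
    def accuracy(self) -> float:
        # Fraction of all compounds in the subset classified correctly.
        return (self.tp + self.tn) / self.n

    @property
    def sensitivity(self) -> float:
        # Fraction of observed-True compounds predicted True.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Fraction of observed-False compounds predicted False.
        return self.tn / (self.tn + self.fp)


# Counts for the non-Extremely Insoluble sub-model (Table 1).
TABLE_1 = {
    "RI > 0.3": ConfusionMatrix(tp=1800, fn=51, fp=71, tn=224),
    "RI > 0.5": ConfusionMatrix(tp=1559, fn=24, fp=45, tn=172),
    "RI > 0.75": ConfusionMatrix(tp=936, fn=5, fp=11, tn=102),
}

if __name__ == "__main__":
    for subset, cm in TABLE_1.items():
        print(
            f"{subset} (N = {cm.n}): accuracy {cm.accuracy:.1%}, "
            f"sensitivity {cm.sensitivity:.1%}, "
            f"specificity {cm.specificity:.1%}"
        )
```

The same class applies unchanged to the counts of the other sub-models.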
Table 2. Performance statistics for the various fractions of the internal validation set of the non-Highly Insoluble (S7.4 > 0.1 mg/ml) sub-model of the ACD/Qualitative Solubility predictor.
Subset | Coverage of the entire internal validation set (N = 2,277) | Observed* | Calculated probability p > 0.5 | Calculated probability p < 0.5
RI > 0.3, N = 2,037 | | True | 1,473 (72.3%) | 60 (2.9%)
RI > 0.3, N = 2,037 | | False | 90 (4.4%) | 414 (20.3%)
RI > 0.3, N = 2,037 | | Accuracy | |
RI > 0.3, N = 2,037 | | Sensitivity | |
RI > 0.3, N = 2,037 | | Specificity | |
RI > 0.5, N = 1,628 | | True | 1,236 (75.9%) | 29 (1.8%)
RI > 0.5, N = 1,628 | | False | 46 (2.8%) | 317 (19.5%)
RI > 0.5, N = 1,628 | | Accuracy | |
RI > 0.5, N = 1,628 | | Sensitivity | |
RI > 0.5, N = 1,628 | | Specificity | |
RI > 0.75, N = 908 | | True | 725 (79.8%) | 4 (0.4%)
RI > 0.75, N = 908 | | False | 9 (1.0%) | 170 (18.7%)
RI > 0.75, N = 908 | | Accuracy | |
RI > 0.75, N = 908 | | Sensitivity | |
RI > 0.75, N = 908 | | Specificity | |
* - True means that the compound's solubility in buffer at pH 7.4 exceeds the threshold indicated in the table title, while False means that it is below that threshold.
Table 3. Performance statistics for the various fractions of the internal validation set of the non-Insoluble (S7.4 > 1 mg/ml) sub-model of the ACD/Qualitative Solubility predictor.
Subset | Coverage of the entire internal validation set (N = 2,441) | Observed* | Calculated probability p > 0.5 | Calculated probability p < 0.5
RI > 0.3, N = 2,153 | | True | 1,142 (53.0%) | 100 (4.6%)
RI > 0.3, N = 2,153 | | False | 136 (6.3%) | 775 (36.0%)
RI > 0.3, N = 2,153 | | Accuracy | |
RI > 0.3, N = 2,153 | | Sensitivity | |
RI > 0.3, N = 2,153 | | Specificity | |
RI > 0.5, N = 1,634 | | True | 918 (56.2%) | 47 (2.9%)
RI > 0.5, N = 1,634 | | False | 67 (4.1%) | 602 (36.8%)
RI > 0.5, N = 1,634 | | Accuracy | |
RI > 0.5, N = 1,634 | | Sensitivity | |
RI > 0.5, N = 1,634 | | Specificity | |
RI > 0.75, N = 847 | | True | 525 (62.0%) | 7 (0.8%)
RI > 0.75, N = 847 | | False | 15 (1.8%) | 300 (35.4%)
RI > 0.75, N = 847 | | Accuracy | |
RI > 0.75, N = 847 | | Sensitivity | |
RI > 0.75, N = 847 | | Specificity | |
* - True means that the compound's solubility in buffer at pH 7.4 exceeds the threshold indicated in the table title, while False means that it is below that threshold.
Table 4. Performance statistics for the various fractions of the internal validation set of the Soluble (S7.4 > 10 mg/ml) sub-model of the ACD/Qualitative Solubility predictor.
Subset | Coverage of the entire internal validation set (N = 2,378) | Observed* | Calculated probability p > 0.5 | Calculated probability p < 0.5
RI > 0.3, N = 2,114 | | True | 688 (32.5%) | 98 (4.6%)
RI > 0.3, N = 2,114 | | False | 99 (4.7%) | 1,229 (58.1%)
RI > 0.3, N = 2,114 | | Accuracy | |
RI > 0.3, N = 2,114 | | Sensitivity | |
RI > 0.3, N = 2,114 | | Specificity | |
RI > 0.5, N = 1,649 | | True | 560 (34.0%) | 47 (2.9%)
RI > 0.5, N = 1,649 | | False | 65 (3.9%) | 977 (59.2%)
RI > 0.5, N = 1,649 | | Accuracy | |
RI > 0.5, N = 1,649 | | Sensitivity | |
RI > 0.5, N = 1,649 | | Specificity | |
RI > 0.75, N = 869 | | True | 351 (40.4%) | 9 (1.0%)
RI > 0.75, N = 869 | | False | 14 (1.6%) | 495 (57.0%)
RI > 0.75, N = 869 | | Accuracy | |
RI > 0.75, N = 869 | | Sensitivity | |
RI > 0.75, N = 869 | | Specificity | |
* - True means that the compound's solubility in buffer at pH 7.4 exceeds the threshold indicated in the table title, while False means that it is below that threshold.
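Taken together, Tables 1 to 4 show how raising the RI cutoff trades coverage for accuracy. The sketch below recomputes both quantities for every sub-model and cutoff from the counts in the tables. It assumes that coverage is the subset size divided by the size of the whole internal validation set and that accuracy is the share of correctly classified compounds within the subset; the names `VALIDATION`, `coverage_and_accuracy` and `summarize` are illustrative.

```python
# Per sub-model: internal validation set size and, for each RI cutoff,
# the counts (tp, fn, fp, tn) read from Tables 1-4, where
#   tp = observed True,  p > 0.5    fn = observed True,  p < 0.5
#   fp = observed False, p > 0.5    tn = observed False, p < 0.5
VALIDATION = {
    "S7.4 > 0.01 mg/ml": (2277, {
        0.3: (1800, 51, 71, 224),
        0.5: (1559, 24, 45, 172),
        0.75: (936, 5, 11, 102),
    }),
    "S7.4 > 0.1 mg/ml": (2277, {
        0.3: (1473, 60, 90, 414),
        0.5: (1236, 29, 46, 317),
        0.75: (725, 4, 9, 170),
    }),
    "S7.4 > 1 mg/ml": (2441, {
        0.3: (1142, 100, 136, 775),
        0.5: (918, 47, 67, 602),
        0.75: (525, 7, 15, 300),
    }),
    "S7.4 > 10 mg/ml": (2378, {
        0.3: (688, 98, 99, 1229),
        0.5: (560, 47, 65, 977),
        0.75: (351, 9, 14, 495),
    }),
}


def coverage_and_accuracy(n_total, counts):
    """Return (coverage, accuracy) for one RI subset."""
    tp, fn, fp, tn = counts
    n_subset = tp + fn + fp + tn
    return n_subset / n_total, (tp + tn) / n_subset


def summarize(validation=VALIDATION):
    """List (model, cutoff, coverage, accuracy) rows, cutoffs ascending."""
    rows = []
    for model, (n_total, by_cutoff) in validation.items():
        for cutoff, counts in sorted(by_cutoff.items()):
            cov, acc = coverage_and_accuracy(n_total, counts)
            rows.append((model, cutoff, cov, acc))
    return rows


if __name__ == "__main__":
    for model, cutoff, cov, acc in summarize():
        print(f"{model:18s} RI > {cutoff:<4} "
              f"coverage {cov:6.1%}  accuracy {acc:6.1%}")
```

Under these definitions, every sub-model covers fewer compounds but classifies them more accurately as the RI cutoff rises from 0.3 to 0.75.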