6th Try (Sensitivity and Specificity: same algorithm, two separate runs with a flipped flag)
Found another single variable that accounts for 97.4% of my sample’s diagnoses:
End Stage Renal Disease
The cutoff was .43
I modified my algorithm to find the cutoffs from the training partition instead of the cross validation test partitions. I was still trying to solve for specificity, but alas, it converges on sensitivity.
I’m not overly worried about it. I can always recode the response variable and converge on sensitivity.
Okay… so I tried for sensitivity and it converged on specificity.
The only explanation I can think of is the change to taking cutoffs from the training partition instead of the cross validation test partitions.
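A minimal sketch, in base R, of picking the cutoff on the training partition and then scoring it on the held-out partition of each fold. This is not the code from this post: the data is simulated, the 90% sensitivity target is an arbitrary assumption, and `rates` and `pick_cutoff` are hypothetical helpers standing in for whatever optimalCutoff and confusionMatrix do in the real script.

```r
# Simulated stand-in data (assumption): one predictor, binary response.
set.seed(42)
n <- 400
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 2 * x))
dat <- data.frame(x = x, y = y)

# Sensitivity and specificity at a cutoff, treating 1 as the class of interest.
rates <- function(actual, score, cutoff) {
  pred <- as.integer(score >= cutoff)
  c(sensitivity = sum(pred == 1 & actual == 1) / sum(actual == 1),
    specificity = sum(pred == 0 & actual == 0) / sum(actual == 0))
}

# Hypothetical helper: highest cutoff on a grid that still reaches the
# target sensitivity on the data it is given.
pick_cutoff <- function(actual, score, target = 0.90) {
  grid <- seq(0.01, 0.99, by = 0.01)
  sens <- sapply(grid, function(k) rates(actual, score, k)[["sensitivity"]])
  ok <- grid[sens >= target]
  if (length(ok) == 0) min(grid) else max(ok)
}

# 5-fold cross validation: fit and choose the cutoff on the training
# partition only, then report metrics on the untouched test partition.
folds <- sample(rep(1:5, length.out = n))
results <- t(sapply(1:5, function(f) {
  train <- dat[folds != f, ]
  test  <- dat[folds == f, ]
  fit <- glm(y ~ x, data = train, family = binomial)
  cutoff <- pick_cutoff(train$y, predict(fit, type = "response"))
  test_score <- predict(fit, newdata = test, type = "response")
  c(cutoff = cutoff, rates(test$y, test_score, cutoff))
}))
print(round(results, 3))
```

Choosing the cutoff on the training partition keeps the test partition out of every tuning decision, so the reported sensitivity and specificity are not flattered by a cutoff fitted to the same rows they are measured on.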
I use the function optimalCutoff with a variable optimizeFor set to Zeros or Ones, depending on a flag set at the beginning.
I also read either specificity or sensitivity from the confusionMatrix output based on this same flag.
Anyways… if I flip this flag, it does either sensitivity or specificity. So that is working. Why it’s inverted from the default parameters… still not sure.
But this IS better than it ALWAYS converging on sensitivity.
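One possible reason for the inversion, offered as an assumption rather than a diagnosis: sensitivity and specificity trade places whenever the "positive" class changes. If the confusionMatrix here is the one from caret, it treats the first factor level as positive for a two-level outcome unless `positive` is set, which for 0/1 coding usually means 0. The base R sketch below uses made-up scores and a hypothetical `rates` helper to show the swap.

```r
# Made-up example data (assumption): 4 ones and 6 zeros with model scores.
actual <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
score  <- c(0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1)
cutoff <- 0.5

# Hypothetical helper: sensitivity and specificity for a chosen positive class.
rates <- function(actual, score, cutoff, positive = 1) {
  pred <- as.integer(score >= cutoff)   # predicted label, 0 or 1
  is_pos   <- actual == positive        # truly in the positive class
  pred_pos <- pred == positive          # predicted as the positive class
  c(sensitivity = sum(pred_pos & is_pos) / sum(is_pos),
    specificity = sum(!pred_pos & !is_pos) / sum(!is_pos))
}

ones_positive  <- rates(actual, score, cutoff, positive = 1)
zeros_positive <- rates(actual, score, cutoff, positive = 0)

print(ones_positive)
print(zeros_positive)
```

The two printed vectors hold the same two numbers in opposite slots, so a flag that recodes the response and a report that reads the "other" metric can cancel out or double up depending on which class each function assumes is positive.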
The slightly different results on each pass are, I suspect, due to the imputed variables, and to my static SPSS dataset possibly being the output of just one imputed set. I use the same seed (poor programming practice, I know, but data science is supposed to converge on the same results regardless of randomization, which is the point of cross validation). What varies is not so much the selected factors as the classification scores (the confusion matrix results).
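A small base R sketch of what a fixed seed does and does not pin down. `make_folds` is a hypothetical helper, not something from the original script: the same seed reproduces the same fold assignment, so any remaining run-to-run differences would have to come from somewhere else, such as the imputation step.

```r
# Hypothetical helper: assign n rows to k folds, reproducibly for a given seed.
make_folds <- function(n, k, seed) {
  set.seed(seed)
  sample(rep(seq_len(k), length.out = n))
}

same_a  <- make_folds(100, 5, seed = 123)
same_b  <- make_folds(100, 5, seed = 123)
other   <- make_folds(100, 5, seed = 456)

# Same seed, same partitions; a different seed reshuffles the rows.
print(identical(same_a, same_b))
print(identical(same_a, other))
print(table(same_a))
```

Rerunning the whole pipeline across several seeds and comparing the spread of the confusion matrix results is one way to check how much of the variation is down to the partitioning alone.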
5th Try (Solved Sensitivity)
Note: “3rd Try” is my specificity model (I coded the 1’s and 0’s backwards and mistook it for the true sensitivity model I was looking for)
An even better model
- Serum Creatinine
Optimizing for cutoffs
I do not understand why, but when I tell R to test for specificity, I converge on sensitivity.
The Answer is
- Constant Term
- Cutoff: .475
- Sensitivity: 93.5%
All metrics, including the cutoffs, are derived from the cross validation test partitions. I’m hitting the ball all right.
3rd Try (Solved Specificity)
- Cross Validation
- Binary Logistic
This is my 3rd try at that medical data problem, and the ask was to test for sensitivity.
This is my final result
I hit the ball out of the park.
Problem with this: I forgot that my 1’s were incorrectly coded for the class that is not of interest.
Optimized for sensitivity using cross validation 🙂
Code is saved on my private GitHub. It was a lot of trial and error, but I got it.
Using guide here: http://www.sthda.com/english/articles/36-classification-methods-essentials/150-stepwise-logistic-regression-essentials-in-r/
I initially tried cross validation using this hash matrix, but it’s still a WIP
Fundamental weakness: not optimized for sensitivity. Bugs in the code. Two-level variables shouldn’t be factors.
I finished a work sample challenge for medical data
I even imputed data!
Yeah, I consider myself a data scientist
Fundamental weakness: didn’t use factors