Category Archives: Data

Solid Time Series Correlations

Modified to include a rolling mean of the past 4 quarters' returns, computed before the returns are fed into a rolling 4-period correlation.

I came up with an ingenious way to find stable time series correlations, using pandas' rolling function.

Take 10 years of data and reduce it to quarterly data.
corr_set = df.pct_change(1).rolling(4).corr()

Then derive all 2-column combinations of corr_set.columns.
Iterate over those pairs (this can take a long time, which is where clustering comes in handy).
Each pair yields a single list of correlations, one value per date. Because the rolling window is 4 quarters, each value measures the correlation over one year of returns. Take the median of this list.
If the median correlation exceeds the critical value, then half of the windows in the dataset show a correlation at least that strong. This measure is also the median correlation of returns within any given one-year period.
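The steps above can be sketched roughly as follows. This is a minimal sketch, not the original notebook: the function name `median_rolling_correlations`, the tickers `AAA`/`BBB`/`CCC`, and the synthetic prices are all made up for illustration, and the frame is assumed to already hold one row per quarter.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def median_rolling_correlations(prices: pd.DataFrame, window: int = 4) -> pd.Series:
    """Median rolling correlation of smoothed returns for every column pair."""
    # Quarter-over-quarter returns, smoothed with a rolling mean of the same
    # length before the rolling correlation is taken.
    returns = prices.pct_change(1)
    smoothed = returns.rolling(window).mean()

    medians = {}
    for left, right in combinations(smoothed.columns, 2):
        # One correlation per date; early rows without a full window are NaN.
        rolling_corr = smoothed[left].rolling(window).corr(smoothed[right])
        medians[(left, right)] = rolling_corr.median()
    return pd.Series(medians).sort_values(ascending=False)


# Synthetic quarterly prices for three hypothetical tickers (40 quarters).
rng = np.random.default_rng(0)
shared = rng.normal(0.0, 0.05, size=40)
shocks = {
    "AAA": shared + rng.normal(0.0, 0.01, size=40),
    "BBB": shared + rng.normal(0.0, 0.01, size=40),
    "CCC": rng.normal(0.0, 0.05, size=40),
}
prices = pd.DataFrame(
    {name: 100.0 * np.exp(np.cumsum(s)) for name, s in shocks.items()}
)

result = median_rolling_correlations(prices)
print(result)
```

Looping over pairs with `Series.rolling(...).corr(other)` keeps each pair's list of correlations separate, which makes it easy to take the median per pair.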

I also derived the percentage of 4-period windows with positive and with negative correlations. For every pair, one sign always accounted for more than 50% of the windows, and in every case that sign was positive. So I simply display the median correlation found.
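The sign breakdown for a single pair might look like the sketch below. The helper name `sign_shares` and the example values are hypothetical; the input is assumed to be one pair's list of rolling correlations, with NaN for rows that lack a full window.

```python
import numpy as np
import pandas as pd


def sign_shares(rolling_corr: pd.Series) -> dict:
    """Share of windows with positive and negative correlation, plus the median."""
    valid = rolling_corr.dropna()  # drop rows without a full window
    return {
        "positive": float((valid > 0).mean()),
        "negative": float((valid < 0).mean()),
        "median": float(valid.median()),
    }


# Made-up rolling correlations for one pair.
example = pd.Series([np.nan, np.nan, np.nan, 0.8, 0.6, -0.2, 0.9, 0.4])
shares = sign_shares(example)
print(shares)
```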

This is over a 10 year period.

Chimera

Powerpoint

Report

Project homepage Readme

Using ICPSR polling data on American 8th and 10th graders, I transform a set of predictor terms into what I call a “semiotic grid” of 1’s and 0’s. The grid is then used to identify a class of 1’s and 0’s for the desired outcomes of 3 specific response terms: GPA, gang fights, and (gasp) presence of psychedelic drug use.
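To make the idea of a grid of 1’s and 0’s concrete, here is a small sketch. The post does not state the rule used to cut each answer into 1 or 0, so the median split below is purely an assumption, as are the helper name `to_binary_grid` and the answer values. Only the column codes come from the factor lists in this post.

```python
import pandas as pd

# Made-up ordinal survey answers; the values are illustrative only.
answers = pd.DataFrame(
    {
        "V7552": [1, 5, 3, 4, 2, 5],
        "V8509": [2, 2, 4, 1, 5, 3],
        "V8512": [5, 4, 1, 3, 2, 4],
    }
)


def to_binary_grid(frame: pd.DataFrame) -> pd.DataFrame:
    """Code each cell 1 if it is above its column median, else 0 (assumed rule)."""
    return (frame > frame.median()).astype(int)


grid = to_binary_grid(answers)
print(grid)
```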

I use Monte Carlo resampling to achieve class balancing and run a modified bestglm algorithm to get a wider set of terms via cross validation. Those terms then go through cross-validated holdout analysis and are tabulated. That’s just for initial factor reduction and pooling of potential candidates. The terms then go through more class balancing and another round of cross validation, this time using the actual, unmodified bestglm, to arrive at a final regression formula as well as the terms that are always population significant, closing with ROC.
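The class-balancing step could be sketched like this. The project itself is built around bestglm in R; this Python sketch only illustrates the resampling idea. Drawing with replacement, the helper name `balanced_sample`, and the toy rows are assumptions, not details from the project.

```python
import random


def balanced_sample(rows, label_of, n_per_class, seed):
    """Draw the same number of rows from each class, with replacement."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)

    sample = []
    for label in sorted(by_class):
        # Sampling with replacement lets the rare class reach n_per_class.
        sample.extend(rng.choices(by_class[label], k=n_per_class))
    rng.shuffle(sample)
    return sample


# Toy data: 90 rows of class 0 and 10 rows of class 1 (made-up values).
rows = [{"id": i, "y": 0} for i in range(90)]
rows += [{"id": i, "y": 1} for i in range(90, 100)]

# One balanced resample per seed; each could feed a cross-validation run.
resamples = [balanced_sample(rows, lambda r: r["y"], 50, seed) for seed in range(5)]
print([sum(r["y"] for r in sample) for sample in resamples])
```

Changing the seed changes which rows land in each resample, which is why results are tabulated across many resamples rather than read from one.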

I am offering the project as a type of open house to potential employers, so they can determine whether my skill set would be a good fit for what they hope to do with numbers.

Coefficient Finder

This model was derived from samples only. I did not dare touch the population until I was ready to do so. I used cross validation on the training/validation partitions and then ran a set of factors through a holdout partition, doing cross validation there as well. I then boiled these factors up across multiple samples, kept the common elements, and derived a population extraction of those samples. What you see is a scientifically reproducible significant factor finder. If I increase the specificity or change the seed, different elements will surface to the top, but the overall patterns should be the same. I’m really excited. I’ve been wanting to do this for a really long time. The only things I have left to do are to finish the classification matrix and do more work on time series forecasting, and then I will have learned what I really wanted to at Fullerton. This is for GPA. Those factors are:
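Keeping the common elements across samples amounts to a tally. A minimal sketch follows, assuming each run returns a list of selected variable codes. The helper names, the 90% cutoff, and the example selections are hypothetical; only the variable codes are taken from the lists in this post.

```python
from collections import Counter


def tally_factors(selections):
    """Count in how many runs each factor was selected."""
    counts = Counter()
    for selected in selections:
        counts.update(set(selected))  # count a factor once per run
    return counts


def common_factors(selections, min_share=0.9):
    """Factors selected in at least min_share of the runs."""
    counts = tally_factors(selections)
    threshold = min_share * len(selections)
    return sorted(f for f, n in counts.items() if n >= threshold)


# Made-up selections from 10 runs with different seeds.
selections = (
    [["V8509", "V7552"]] * 6
    + [["V8509", "V7552", "V8512"]] * 3
    + [["V8509", "V8512"]]
)

counts = tally_factors(selections)
print(counts)
print(common_factors(selections))
```

Factors that survive the cutoff are the ones that keep surfacing regardless of seed, which is the sense in which the overall pattern stays the same.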

V7552,2,"DALY WEB FACEBK",0
V7553,2,"#HR GAMING",0
V7563,2,"#HR TALK CELL",0
V8509,3,"FUTURE HOPELESS",0
V8512,3,"SATISFD W MYSELF",0
V8536,3,"FUTR R LIFE WRSE",0

Correlated Factors that contribute to presence of substance use, GPA, and Gang Fights

https://github.com/thistleknot/Capstone-577/blob/master/readme.txt

V7133,1,"#X TRQL/LIFETIME",1        
V8527,3,"OFTN EAT GN VEG",0
V8529,3,"OFTN EXERCISE",0
V8531,3,"OFTN SLEEP <SHLD",0
V8505,3,"I ENJOY LIFE",0
V8514,3,"GOOD TO BE ALIVE",0

(x9)
V8526,3,"OFTN EAT BRKFST",0
V7142,1,"#X INHL/LIFETIME",1
V8526,3,"OFTN EAT BRKFST",0
V8527,3,"OFTN EAT GN VEG",0
V8528,3,"OFTN EAT FRUIT",0
V8529,3,"OFTN EXERCISE",0
V8530,3,"OFTN 7HRS SLEEP",0
V8531,3,"OFTN SLEEP <SHLD",0
V8502,3,"LIFE MEANINGLESS",0
V8565,3,"I AM OFTEN BORED",0
V7145,1,"#X STRD/LIFETIME",1
(x10)
V8502,3,"LIFE MEANINGLESS",0

(x9)
V8565,3,"I AM OFTEN BORED",0
V7139,1,"#X NARC/LIFETIME",1
V7552,2,"DALY WEB FACEBK",0

(x9)
V7553,2,"#HR GAMING",0
V7115,1,"#X LSD/LIFETIME",1
V8528,3,"OFTN EAT FRUIT",0
V8530,3,"OFTN 7HRS SLEEP",0

(x9)
V8565,3,"I AM OFTEN BORED",0
V7158,1,"#X INJECTOTH/LIF",1
V8565,3,"I AM OFTEN BORED",0
V8528,3,"OFTN EAT FRUIT",0

(x9)
V8530,3,"OFTN 7HRS SLEEP",0

V7112,1,"#XMJ+HS/LIFETIME",1
V7553,2,"#HR GAMING",0
V7101,1,"EVR SMK CIG,REGL",1
(x9)
V7507,3,"OFT WSH MOR FRND",0
V7118,1,"#X PSYD/LIFETIME",1
(x10)
V8512,3,"SATISFD W MYSELF",0

(x9)
V8528,3,"OFTN EAT FRUIT",0
V7097,1,"#X SED/BARB/LIFE",1
"V8509" - V8509,3,"FUTURE HOPELESS",0
"V8514" - V8514,3,"GOOD TO BE ALIVE",0
V7133,1,"#X TRQL/LIFETIME",1
V8509,3,"FUTURE HOPELESS",0
V8512,3,"SATISFD W MYSELF",0
V8514,3,"GOOD TO BE ALIVE",0
V8451,1,"#X BEER/LIFETIME",1
V8526,3,"OFTN EAT BRKFST",0
V8527,3,"OFTN EAT GN VEG",0
V8528,3,"OFTN EAT FRUIT",0
V8529,3,"OFTN EXERCISE",0
V8530,3,"OFTN 7HRS SLEEP",0
V8531,3,"OFTN SLEEP <SHLD",0
V8502,3,"LIFE MEANINGLESS",0
V8565,3,"I AM OFTEN BORED",0

(x9)
V8502,3,"LIFE MEANINGLESS",0
V7221,2,"R HS GRADE/D=1",0   

(x8)
V7127,1,"#X AMPH/LIFETIME",1
V7097,1,"#X SED/BARB/LIFE",1
V7133,1,"#X TRQL/LIFETIME",1
V7142,1,"#X INHL/LIFETIME",1
V7426,1,"#X SMKLESS/EVER",1
V7164,1,"#X MDMA/LIFETIM",1
V7145,1,"#X STRD/LIFETIME",1
V7158,1,"#X INJECTOTH/LIF",1
V7161,1,"#X ROHYPNOL/LIFE",1

(x7)
V7152,1,"#X H LIF USE NDL",1

(x6)
V7155,1,"#X H LIF W/O NDL",1
V8517 "FRQ GANG FIGHT" - 
(x10)
V7507,3,"OFT WSH MOR FRND",0

(x8)
V7501,3,"OFTN FEEL LONELY",0
V8514,3,"GOOD TO BE ALIVE",0

(x7)
V8505,3,"I ENJOY LIFE",0