Stable Correlations

Max flow sorted correlations

- Correlations sorted on two axes:
- X: Max Flow from highest descending
- Y: Average correlation


How to find out what a language model knows, so you can make sure you're on the same page with it.

V3

V2 https://gist.github.com/thistleknot/3695adf9114793e82f1eaddbecbd544e

I create relationships between a dataframe's columns whenever a given record (in this case, a state) sits more than one median absolute deviation above the median in the direction of a beneficial outcome (for example, low unemployment or population, or high income).
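The median-absolute-deviation helper column can be sketched like this; the dataframe, column names, and values are hypothetical stand-ins for the real state-level data:

```python
import pandas as pd

# Hypothetical state-level data; the real dataset joins states to regions.
df = pd.DataFrame({
    "state": ["CA", "TX", "NY", "FL", "OH"],
    "income": [75_000, 64_000, 72_000, 59_000, 58_000],
})

def above_one_mad(series: pd.Series) -> pd.Series:
    """Binary helper column: 1 if the value sits more than one
    median absolute deviation above the median, else 0."""
    med = series.median()
    mad = (series - med).abs().median()
    return (series > med + mad).astype(int)

# Flag states whose income is high by this MAD criterion.
df["income_high"] = above_one_mad(df["income"])
```

For a "lower is better" metric like unemployment, the same helper would flag values more than one MAD below the median instead.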

Here is the code that creates the graph (I asked ChatGPT a lot of questions to get to this result, but I now know how to implement it properly with this POC).

I’m enjoying the ways you can slice and dice a graph database.

This is showcasing states (and the regions I joined them to) that identify as being one median absolute deviation above the median (a binary factor derived in what would otherwise be known as a helper column; all of the ETL logic is done in Python at the moment). This way of splitting the data made the most sense to me for non-normal distributions (for a POC). Otherwise, medians are too wishy-washy: their centers can shift and you would get a different mix, whereas this approach is more akin to identifying distinct upper and lower groups.

So here is why ChatGPT is so disruptive.

You can basically ask it advanced scientific questions about concepts you don’t fully grasp, as long as you know how the technology has been used in certain areas. Case in point: quantum cross-validation.

I figured I know about IBM’s Qiskit, and I know about quantum cross-validation, but I’ve never used Qiskit and was unsure how I would set up the problem.

So… I described my understanding of the problem to ChatGPT.

Then I recalled the bright idea of refining the prompt through a feedback loop: volleying the inferences back into ChatGPT (essentially iterating over the inference system), asking it to rephrase, provide clarity where necessary, and make any suggested scientific corrections (important: I upped the “top p” in hopes of a higher-quality answer). I then fed the refined question back into ChatGPT until “is the above information accurate? clarify where it’s not” came back as “True” with nothing left to clarify, and only then did I take away the code it wrote for me.

I have yet to test this, as I’m still working toward fine-tuning my own GPT-Neo, but this is what I’ve been hoping people understand about these systems: they have generalized over the relationships in language and can essentially query up these results for us. The more data a system has been exposed to, the more relationships it derives, and the wider the set of questions it can respond to.

(abs(difference)-abs(difference.rolling(20).min()))/(abs(difference.rolling(20).max())-abs(difference.rolling(20).min()))
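Written out as pandas, the formula above is a 20-period rolling min-max normalization of the absolute difference. It is transcribed literally here, including the detail that abs() wraps the rolling min/max rather than the series; the random `difference` series is only an illustrative stand-in:

```python
import numpy as np
import pandas as pd

# `difference` stands in for the author's series (e.g. price differences);
# random data here is purely for illustration.
rng = np.random.default_rng(0)
difference = pd.Series(rng.normal(size=100))

# The formula above, transcribed literally: abs() is applied to the
# rolling min and max, exactly as written in the post.
lo = difference.rolling(20).min().abs()
hi = difference.rolling(20).max().abs()
indicator = (difference.abs() - lo) / (hi - lo)
```

The first 19 values are NaN, since a 20-period window needs 20 observations before it produces output.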

I made a custom indicator using the empirical cumulative distribution function (ECDF).

I use it with split_sequences (Dr. Brownlee’s function) to create moving windows of size 20, then apply the ECDF to each window, take the last value (i.e. a rolling ECDF), and reconstruct the original dates.
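A rolling ECDF like the one described can be sketched with pandas' `rolling().apply()` standing in for split_sequences (Dr. Brownlee's window-building helper, not reproduced here); the tiny `prices` series and the window of 3 are only for illustration, where the post uses a window of 20:

```python
import pandas as pd

def rolling_ecdf_last(series: pd.Series, window: int = 20) -> pd.Series:
    """For each window, evaluate the ECDF at the window's last value:
    the fraction of the window at or below the most recent observation."""
    return series.rolling(window).apply(
        lambda w: (w <= w[-1]).mean(), raw=True
    )

# Toy example with a window of 3; the index (dates, in the real data)
# is preserved automatically by pandas, so no reconstruction is needed.
prices = pd.Series([1.0, 2.0, 3.0, 2.0, 5.0])
ind = rolling_ecdf_last(prices, window=3)
```

A value near 1 means the latest observation sits at the top of its recent range (a candidate top), and a value near 1/window means it sits at the bottom.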

I think it detects market tops and bottoms accurately.

https://gist.github.com/thistleknot/05c6dd68aca1e20a9586c08c0f564ba6

I finished my factor model; at least, I’ve coded up all the metrics. Revisions and additions will likely follow, but at the moment I have about 15 equally weighted indicators, and the highest-ranking stocks are listed at the top.

A summary of what I have (I gleaned the requirements from https://seekingalpha.com/article/4407684-why-piotroskis-f-score-no-longer-works and https://www.labsterx.com/blog/fundamental-analysis-using-yahoo-finance/, but I also included a few extra bells and whistles, like the average mean buy/sell recommendation from analysts, a trending indicator based on a quarterly rolling correlation with date, and preferred cycles chosen based on the business cycle):

* Positive three-year average retained earnings

* Sum of TTM cash flow from operations and cash flow from investments greater than 10% of revenue

* At least eight of last twelve quarters’ EPS greater than same quarter previous year

* Cash flow from operations greater than net income each of last three fiscal years

* TTM EBITDA greater than one-third of total debt

* Current ratio greater than one

* TTM equity purchased greater than equity issued

* TTM gross margin greater than subsector median

* Average five-year asset turnover greater than subsector median
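With equal weights, ranking reduces to counting how many screens each stock passes. A minimal sketch of that scoring step, where the tickers and the three screens are hypothetical stand-ins for the roughly 15 real indicators:

```python
import pandas as pd

# Hypothetical pass/fail (1/0) screens per ticker; the real model
# has about 15 indicators like the checklist above.
screens = pd.DataFrame({
    "positive_retained_earnings": [1, 0, 1],
    "current_ratio_gt_1":         [1, 1, 0],
    "gross_margin_gt_median":     [1, 0, 1],
}, index=["AAA", "BBB", "CCC"])

# Equal weight: the score is simply the count of screens passed,
# and the highest-ranking tickers sort to the top.
ranked = screens.sum(axis=1).sort_values(ascending=False)
```

Because every screen is binary and unweighted, adding or revising an indicator later just means adding a column; the ranking logic never changes.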

I finished the database I was working on for stock market data.

For the S&P 1500, it includes:

- SEC filings for financial statements
- what Yahoo offers (annual and quarterly financial statements, earnings trend estimates)
- commodities
- bonds
- FRED data for econometrics

The whole ETL job now finishes in about 30 minutes, and I’ve encapsulated it into a single folder.

I intend to use Tableau to parse through this and create some choice dashboards.

Once I finalize the dashboards, I intend to migrate them over to Flask.

I’ve been working on my stock data.

Some SQL queries used for the data mart:

https://gist.github.com/thistleknot/dcb21713f11dd3c30632dff990f2804d

https://gist.github.com/thistleknot/5678d338f57d9bbd2a6c3a6e91384857

I have a lot more information in it than simply sector information.

What I’m showcasing here is a special metric based solely on a rolling quarterly return, shown as a ratio (1 = no return), alongside the top-performing stocks and sectors by this metric, plus a line chart of the rolling quarterly return showing how the sectors have performed over time.
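The rolling quarterly return ratio can be sketched as the current close divided by the close one quarter back. Here roughly 63 trading days per quarter is an assumption, and the price series is simulated rather than taken from the real data mart:

```python
import numpy as np
import pandas as pd

# Simulated daily closes; ~63 trading days per quarter is assumed.
rng = np.random.default_rng(1)
close = pd.Series(100 * np.cumprod(1 + rng.normal(0, 0.01, 252)))

# Rolling quarterly return as a ratio: 1.0 means no return over the
# trailing quarter, above 1 a gain, below 1 a loss.
quarterly_ratio = close / close.shift(63)
```

Grouping this ratio by sector and plotting it over time gives the sector line chart described above.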