Category Archives: My Work

Neo4j Graph Database


Match 5 conditions
Match 4 conditions


I create relationships from a dataframe’s columns when a given record (in this case, a state) is more than 1 median absolute deviation beyond the median in the direction of a beneficial outcome (for example, low unemployment, population, or high income).
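A minimal sketch of how such a flag could be computed, using only the standard library rather than a dataframe. The function name `mad_flags`, the `k` parameter, and the sample numbers are made up for illustration; they are not from the actual ETL.

```python
from statistics import median

def mad_flags(values, higher_is_better=True, k=1.0):
    """Return 1 where a value lies more than k MADs beyond the median
    in the beneficial direction, else 0 (missing values default to 0)."""
    present = [v for v in values.values() if v is not None]
    med = median(present)
    # Median absolute deviation: median of the distances from the median.
    mad = median(abs(v - med) for v in present)
    flags = {}
    for key, v in values.items():
        if v is None:
            flags[key] = 0
        elif higher_is_better:
            flags[key] = int(v > med + k * mad)
        else:
            flags[key] = int(v < med - k * mad)
    return flags

# Illustrative, made-up unemployment rates (lower is better).
unemployment = {"State A": 2.0, "State B": 3.5, "State C": 4.0,
                "State D": 4.5, "State E": 7.0}
low_unemployment = mad_flags(unemployment, higher_is_better=False)
print(low_unemployment)
```

With these numbers the median is 4.0 and the MAD is 0.5, so only values strictly below 3.5 are flagged.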

Here is the code that creates the graph. (I asked ChatGPT a lot of questions to get to this result, but after this POC I now know how to implement it properly.)

I’m enjoying the ways you can slice and dice a graph database.

This showcases the states (and the regions I joined them to) that are flagged as being 1 median absolute deviation above the median. The flag is a binary factor derived in what would otherwise be known as a helper column; all of that ETL logic is done in Python atm. This way of splitting the data made the most sense to me for non-normal distributions (for a POC). Otherwise medians are too wishy-washy, as their centers can change and you would get a different mix, whereas this is more akin to identifying upper and lower groups.
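As a rough sketch of how binary flags like these could be turned into graph relationships, the snippet below builds parameterized Cypher statements. The labels `State`, `Region`, and `Factor`, the relationship types `IN_REGION` and `MEETS`, and the sample data are all hypothetical. Actually running the statements through a Neo4j driver session is left out.

```python
# Parameterized Cypher; MERGE avoids creating duplicate nodes or relationships.
STATE_IN_REGION = (
    "MERGE (s:State {name: $state}) "
    "MERGE (r:Region {name: $region}) "
    "MERGE (s)-[:IN_REGION]->(r)"
)
STATE_MEETS_FACTOR = (
    "MERGE (s:State {name: $state}) "
    "MERGE (f:Factor {name: $factor}) "
    "MERGE (s)-[:MEETS]->(f)"
)
# Query for states that meet at least $n conditions.
STATES_MEETING_AT_LEAST = (
    "MATCH (s:State)-[:MEETS]->(f:Factor) "
    "WITH s, count(f) AS met WHERE met >= $n "
    "RETURN s.name AS state, met ORDER BY met DESC"
)

def build_statements(region_of, flags_by_factor):
    """Return (query, params) pairs: one per state/region, one per flag equal to 1."""
    statements = []
    for state, region in region_of.items():
        statements.append((STATE_IN_REGION, {"state": state, "region": region}))
    for factor, flags in flags_by_factor.items():
        for state, flag in flags.items():
            if flag == 1:  # only flagged records get a relationship
                statements.append(
                    (STATE_MEETS_FACTOR, {"state": state, "factor": factor})
                )
    return statements

# Made-up example data.
region_of = {"State A": "West", "State B": "South"}
flags_by_factor = {
    "low_unemployment": {"State A": 1, "State B": 0},
    "high_income": {"State A": 1, "State B": 1},
}
statements = build_statements(region_of, flags_by_factor)
for query, params in statements:
    print(params)
```

Keeping the values in parameters rather than formatting them into the query string avoids quoting problems with names that contain apostrophes.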

Quantum Cross Validation

So here is why ChatGPT is so disruptive.

You can basically ask it advanced scientific questions about concepts you don’t fully grasp, as long as you know how the technology has been used in certain areas. Case in point: quantum cross validation.

I figured I know about IBM’s Qiskit, and I know about quantum cross validation, but I’ve never used Qiskit and was unsure how I would set up the problem.

So I gave ChatGPT my understanding of the problem.

Then I recalled the bright idea of refining the prompt through a feedback loop: volley ChatGPT’s inferences back into it (essentially iterating over the inference system), asking it to rephrase, provide clarity where necessary, and make any suggested scientific corrections. (Important: I upped the “top p” in the hope of getting a higher-quality answer.) I kept feeding the refined question back into ChatGPT until “is the above information accurate, clarify where it’s not” was answered as “True” and there was nothing left to clarify. Only then did I take away what it coded for me.
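The loop described above could be sketched roughly like this. The `ask` argument stands in for whatever function sends a prompt to the chat model and returns its reply; `refine_prompt`, `max_rounds`, and the `fake_ask` stub are hypothetical names used only so the example runs without a real model.

```python
REPHRASE = ("Rephrase the following for clarity and make any needed "
            "scientific corrections:\n\n")
CHECK = "\n\nIs the above information accurate? Clarify where it's not."

def refine_prompt(ask, question, max_rounds=5):
    """Feed the question back through `ask` until the accuracy check says True."""
    current = question
    for _ in range(max_rounds):
        current = ask(REPHRASE + current)
        verdict = ask(current + CHECK)
        if verdict.strip().lower().startswith("true"):
            break  # nothing left to clarify
    return current

# Stand-in for a real model: approves the text on the second accuracy check.
calls = []
def fake_ask(prompt):
    calls.append(prompt)
    if prompt.endswith(CHECK):
        checks = sum(p.endswith(CHECK) for p in calls)
        return "True" if checks >= 2 else "Not quite; please clarify."
    return prompt[len(REPHRASE):] + " (refined)"

refined = refine_prompt(fake_ask, "How would I set up quantum cross validation?")
print(refined)
```

The `max_rounds` cap is there because a real model may never answer with a clean “True”.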

I have yet to test this, as I’m still working towards fine-tuning my own GPT-Neo, but this is what I’ve been hoping people understand about these systems. They have generalized on the relationships in language to basically query up these results for us. The more data you have exposure to, the more relationships are derived, and the wider the set of questions the system can respond to.

CAPM Portfolios

I know how to build a Markowitz-weighted portfolio, and how to ‘hack it’: just raise the quantities of the holdings with higher betas, which carry more of the risk premium (i.e. how much return is expected over the risk-free rate, also known as the market risk premium, here based on DGS3MO).

But I let it resolve to the optimal Sharpe ratio and simply display the betas as derived from MDYG (SP1500).

So, based on the CAPM expected return: the average risk premium for the past 5 years is 0.0142 (1.42%), so the CAPM return is 4.33% + 1.42% * the portfolio beta of 1.00116592, which comes out to 5.75% for next quarter.
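The arithmetic above can be checked with a few lines. The function name is just for illustration; the inputs are the figures quoted in this post.

```python
def capm_expected_return(risk_free, beta, risk_premium):
    """CAPM: expected return = risk-free rate + beta * market risk premium."""
    return risk_free + beta * risk_premium

capm_return = capm_expected_return(
    risk_free=0.0433,      # risk-free rate quoted above (4.33%)
    beta=1.00116592,       # portfolio beta
    risk_premium=0.0142,   # 5-year average risk premium (1.42%)
)
print(f"{capm_return:.2%}")  # prints 5.75%
```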

A different forecast, one based on Markowitz simulations, has 9% for next quarter.

Another forecast comes from an expected-return factor model that has a 13% MAPE. Its weighted forecasted return is 13% for next quarter, i.e. 13% +/- (13% * 13%), or 13% +/- 1.69 percentage points.
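A quick sketch of that error band, assuming the MAPE is applied as a relative error on the forecast itself (that reading is an assumption, and the variable names are illustrative):

```python
forecast = 0.13  # weighted forecasted return for next quarter
mape = 0.13      # model's mean absolute percentage error

# A 13% relative error on a 13% forecast is 0.13 * 0.13 = 0.0169,
# i.e. 1.69 percentage points either side.
band = forecast * mape
low, high = forecast - band, forecast + band
print(f"{forecast:.2%} +/- {band:.2%} -> {low:.2%} to {high:.2%}")
```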

What’s frustrating is knowing I hit the ball out of the park when it comes to CAPM portfolios and Markowitz, only to find that those in academia who actively trade are not fans of the material they are hamstrung to teach. So I get various strong opinions about what works. It’s very cult of personality about methodologies, but not me. I’m open to trying as much as I can just for the opportunity to learn.

The Inefficient Stock Market is a gold mine in terms of what factors to look for. I’ve been doing my own research (FRED data, commodities, foreign exchange, indexes, sectors, SP1500 prices, fundamentals, financial statements, critiques of Piotroski, the Fama-French 3- and 5-factor models, Arbitrage Pricing Theory). The book suggests improved/revised factor models using a mix of financials and fundamentals, offering 30 factors to look out for.

If it works and the results match the projected expected returns within the risks shown, then this could be used to borrow money on margin, knowing your returns are modeled/controlled for, and you can make money on the spread. But it’s risky. Borrowed money is usually at the risk-free rate, so you aim for a risk premium return by controlling for risk.

The philosophy behind the filters is “this vs. that”: bifurcation. Split everything, somewhat subjectively, into a simple filter no matter how complex the calculation is on the back end. That is, a 1 or 0 is coded for every value, with the default being 0 (such as for NAs). Then add these filters together across ETFs and sift the top results. This lets me focus on revising and expanding the individual logic of each factor, encapsulated in SQL and/or Python files, for example by modifying the thresholds that affect the proportion of occurrence for a given factor (field). If the query logic is based on medians, it’s easy to get 50% of the values every time for each factor.
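A minimal sketch of that idea in plain Python: each filter is a field plus a pass/fail rule, every value is coded as 1 or 0 (missing values default to 0), and the flags are summed and ranked. The tickers, field names, and thresholds are made up for illustration.

```python
def score(rows, filters):
    """Sum binary filter flags per ticker and rank from highest to lowest."""
    totals = {}
    for ticker, fields in rows.items():
        total = 0
        for field, passes in filters.values():
            value = fields.get(field)
            # Missing values (None) are coded as 0 by default.
            total += int(value is not None and passes(value))
        totals[ticker] = total
    # Highest score first; ties broken alphabetically.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical data and thresholds.
rows = {
    "AAA": {"momentum": 1.08, "yield": 0.03},
    "BBB": {"momentum": 0.97, "yield": None},
    "CCC": {"momentum": 1.02, "yield": 0.01},
}
filters = {
    "momentum_up": ("momentum", lambda v: v > 1.0),
    "yield_ok": ("yield", lambda v: v >= 0.02),
}
ranked = score(rows, filters)
print(ranked)
```

Changing a threshold only touches one entry in `filters`, which is the point of keeping each factor’s logic separate from the scoring.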

Stock Screener Factor Model

I finished my factor model, at least I coded up all the metrics. Revisions and additions will likely follow, but atm I have about 15 indicators that all have equal weight, and the highest ranking stocks are listed at the top.

A summary of what I have: I gleaned the requirements from here, but I also included a few extra bells and whistles (like the mean buy/sell recommendation from analysts, a trending indicator based on a quarterly rolling correlation with date, and preferred cycles chosen based on the business cycle).

* Positive three-year average retained earnings

* Sum of TTM cash flow from operations and cash flow from investments greater than 10% of revenue

* At least eight of last twelve quarters’ EPS greater than same quarter previous year

* Cash flow from operations greater than net income each of last three fiscal years

* TTM EBITDA greater than one-third of total debt

* Current ratio greater than one

* TTM equity purchased greater than equity issued

* TTM gross margin greater than subsector median

* Average five-year asset turnover greater than subsector median
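A few of the criteria above could be coded along these lines. This is only a sketch: the field names, tickers, and figures are hypothetical, and only three of the criteria are shown (current ratio, EBITDA vs. debt, and gross margin vs. subsector median).

```python
from collections import defaultdict
from statistics import median

def above_subsector_median(rows, field):
    """Flag 1 where a stock's value is strictly above its subsector's median."""
    by_subsector = defaultdict(list)
    for r in rows:
        if r.get(field) is not None:
            by_subsector[r["subsector"]].append(r[field])
    medians = {s: median(v) for s, v in by_subsector.items()}
    return {
        r["ticker"]: int(r.get(field) is not None
                         and r[field] > medians[r["subsector"]])
        for r in rows
    }

# Made-up figures for three stocks in one subsector.
rows = [
    {"ticker": "AAA", "subsector": "Software", "gross_margin": 0.70,
     "current_ratio": 1.8, "ttm_ebitda": 50, "total_debt": 120},
    {"ticker": "BBB", "subsector": "Software", "gross_margin": 0.55,
     "current_ratio": 0.9, "ttm_ebitda": 30, "total_debt": 100},
    {"ticker": "CCC", "subsector": "Software", "gross_margin": 0.60,
     "current_ratio": 1.2, "ttm_ebitda": 10, "total_debt": 60},
]

margin_flag = above_subsector_median(rows, "gross_margin")
scores = {}
for r in rows:
    scores[r["ticker"]] = (
        int(r["current_ratio"] > 1)                    # current ratio > 1
        + int(r["ttm_ebitda"] > r["total_debt"] / 3)   # EBITDA > 1/3 of debt
        + margin_flag[r["ticker"]]                     # margin > subsector median
    )
print(scores)
```

Each indicator contributes equally, so the score is simply the number of criteria a stock passes.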

Stock Database

I finished the database I was working on for stock market data for the SP1500:

* SEC filings for financial statements

* What Yahoo offers (annual and quarterly financial statements, earnings trend estimates)

* FRED data for econometrics

The whole ETL job, which I’ve encapsulated into a single folder, now finishes in about 30 minutes.

I intend to use Tableau to parse through this and create some choice dashboards.

Once I finalize the dashboards, I intend to migrate them over to Flask.

Tableau Dashboarding of Sectors

I’ve been working on my stock data.

Some SQL queries used for the data mart.

I have a lot more information in it than simply sector information.

What I’m showcasing here is a special metric based solely on a rolling quarterly return, shown as a ratio (1 = no return). Alongside it are the top-performing stocks and sectors by this metric, plus the rolling quarterly return as a line chart showing how the sectors have performed over time.
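A rolling return ratio of this kind could be computed roughly as follows. The 63-observation default (about one quarter of trading days) is an assumption, as are the function name and the sample prices.

```python
def rolling_return_ratio(prices, window=63):
    """Ratio of each price to the price `window` observations earlier.

    A value of 1 means no return over the window; entries without
    enough history are None.
    """
    return [
        None if i < window else prices[i] / prices[i - window]
        for i in range(len(prices))
    ]

# Made-up prices with a short window so the result is easy to follow.
prices = [100, 102, 101, 105, 110]
ratios = rolling_return_ratio(prices, window=2)
print(ratios)
```

Here the third entry compares 101 to 100, so it sits just above 1, while the first two entries have no history to compare against.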