S&P 500 Correlation Since The Election, with R

[Chart: S&P 500 member correlation since the election]

Data Source: Capital IQ, chart by R

Active managers, take heart: after a challenging 2016, there has been a substantial drop-off in correlation among S&P 500 member components since November. The correlation between S&P 500 components and the index itself is down to 0.53.

2016-12-22 0.5295475
2016-12-23 0.5300660
2016-12-27 0.5302740
2016-12-28 0.5305382
2016-12-29 0.5298621
2016-12-30 0.5292927

Compare this chart with the Barclay Hedge Fund Index return data over the same period (shown below), which suggests a breakdown in returns around mid-2015, just as stock correlations were rising.

By contrast, December 2016 was the best month for hedge fund returns since 2013.

The takeaway: if correlations remain low, hedge fund returns should improve in 2017. And managers, take heed: chips should come off the table if correlations rise again.

Barclay Hedge Fund Index:

[Chart: Barclay Hedge Fund Index returns]

The correlation chart above is simply a rework of the excellent work done by the SystematicInvestor blog. I'm perpetually amazed by how many smart people have contributed to the open-source universe of financial analysis tools for R. Hats off here; all I did was fiddle with the dates and formatting.
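
For readers who want the flavor of the calculation without digging through the SystematicInvestor toolkit, here is a minimal, hedged sketch of the same idea: rolling correlations of a handful of members against the SPY ETF as an index proxy, then averaged. The ticker subset and the 60-day window are my own illustrative assumptions, not what the chart above actually uses.

library(quantmod)

tickers <- c("AAPL", "MSFT", "JPM", "XOM", "JNJ")   # small illustrative subset
spy.ret <- ROC(Ad(getSymbols("SPY", auto.assign = FALSE)), type = "discrete")

roll.cor <- NULL
for (tk in tickers) {
  stk.ret <- ROC(Ad(getSymbols(tk, auto.assign = FALSE)), type = "discrete")
  both <- na.omit(merge(stk.ret, spy.ret))
  rc <- runCor(both[, 1], both[, 2], n = 60)        # 60-day rolling correlation
  roll.cor <- if (is.null(roll.cor)) rc else merge(roll.cor, rc)
}

avg.cor <- xts(rowMeans(roll.cor, na.rm = TRUE), order.by = index(roll.cor))
plot(avg.cor, main = "Average 60-day correlation of members vs. SPY")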

Transcript Sentiment History with R: Beta

[Chart: transcript sentiment history with regression trend line]

The link above points to the beta version of the transcript sentiment analysis, now with history, hosted on the web via RStudio's Shiny. The chart takes about 15 seconds to generate.

This app pulls historical transcripts for a given stock ticker (earnings calls and conference transcripts) and displays a sentiment "score" (net sentiment value of the words / number of words) based on the "bing" lexicon from Bing Liu at the University of Illinois at Chicago, along with a red regression line to give a sense of the sentiment trend over time. The graphic above shows the sentiment history of Zions Bank over the past three years.

It also outputs the transcript links in a table below the chart, so you can go read the source for yourself.
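
The scoring itself is simple. Here is a hedged sketch of the per-transcript score and the red trend line; the score_transcript() helper, the transcripts data frame, and its toy contents are illustrative assumptions, not the app's actual code.

library(dplyr)
library(tidytext)
library(ggplot2)

score_transcript <- function(text) {
  words  <- tibble(text = text) %>% unnest_tokens(word, text)
  scored <- words %>% inner_join(get_sentiments("bing"), by = "word")
  net <- sum(scored$sentiment == "positive") - sum(scored$sentiment == "negative")
  net / nrow(words)   # net sentiment value / number of words
}

# one row per call, with a date and the full transcript text (assumed shape, toy data)
transcripts <- tibble(
  date = as.Date(c("2016-01-25", "2016-04-25", "2016-07-26")),
  text = c("solid quarter strong growth", "weak results and a loss", "record revenue great momentum")
)

scores <- transcripts %>%
  rowwise() %>%
  mutate(score = score_transcript(text)) %>%
  ungroup()

ggplot(scores, aes(date, score)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "red")   # the red trend line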

Please PM me with any ideas. I've got buttons and individual transcript links going, but the interface design obviously needs some work. If you get an error, simply wait a couple of seconds and try again. There's a transient error (the best kind of error) that I am debugging, and the code isn't yet as full of tryCatch() as I would like. Hence, this is a beta. Stay tuned.
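
For what it's worth, the pattern I mean is just a wrapper like the following. This is a hedged sketch; safe_get() is an illustrative name, not the app's actual function.

library(RCurl)

# Wrap the flaky download step so a transient failure returns NA instead of
# crashing the app.
safe_get <- function(url) {
  tryCatch(getURL(url, followlocation = TRUE),
           error = function(e) NA)
}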

Next step: marry the chart with stock action…

Web Hosted R Applications with Shiny

In an effort to make these examples interactive, I signed up for an account with Shiny by RStudio (see link below).  Shiny is, for lack of a better analogy, a client-server implementation of R.

With Shiny, R scripts are "served" by a server file, and the user interface is handled in turn by a ui file. It's a pretty cool little system: if you set up the files in the required schema, you can quickly and easily access the power of R and share your apps with a wider audience.
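
To make the schema concrete, here is a minimal, hedged sketch of the two-file layout: a ticker text box wired to a quantmod chart. The input names are my own; this is not the exact code behind the app linked below.

# ui.R - the user interface file
library(shiny)
shinyUI(fluidPage(
  textInput("ticker", "Ticker", value = "SPY"),
  plotOutput("chart")
))

# server.R - the R script "served" behind the UI
library(shiny)
library(quantmod)
shinyServer(function(input, output) {
  output$chart <- renderPlot({
    prices <- getSymbols(input$ticker, auto.assign = FALSE)
    chartSeries(prices, name = input$ticker)
  })
})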

For the first installment of interactive examples, I'll just do that easy stock chart from the quantmod library.

The link is here: simple interactive stock charting form hosted by Shiny

Change the ticker, and change the chart.  Of course, adding fields for date ranges and chart options would be nice, and if I get time away from indoor rock climbing I will do that.

Next I plan to port the conference call text and portfolio analytics examples etc. to this platform to continue my mission of showing how easy it is to use R for traditional securities analysis applications.  Stay tuned!

Shiny for R, built by RStudio

Fundamental Sharpe Ratio Screening in R

Another simple example of how easy it is to use R in the investment process.

In this case, I’m trying to come up with a way of sorting my target universe to improve stock selection, as opposed to an ad hoc approach to portfolio construction.

The quantmod library's getFinancials() function sources four years of fundamental data from Google Finance.

The first step is to download and clean the data. (Frankly, the getFinancials() function is of middling value; for example, many of the income statement items for the bank sector come back as NA. In a production environment, a better way to attack the data-gathering part of this study would be to use Bloomberg, cover more than four years, and download the data in CSV form for import into R.)
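
That alternative path is straightforward; something like the following would do it, where the file name and column layout are purely hypothetical.

# Read a Bloomberg CSV export instead of relying on getFinancials()
fundamentals <- read.csv("bloomberg_fundamentals.csv", stringsAsFactors = FALSE)
head(fundamentals)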

Second step: I'm applying a simple Sharpe-ratio-type formula, return / standard deviation, to the year-over-year growth rates of the income statement items. The idea is to see which company shows the most consistent growth with the least variability in revenue, net income, and EPS. With a better data set we could extend this analysis to include gross profit, and perhaps even some balance sheet or cash flow items.
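
In isolation, the formula is just this; a toy illustration with made-up growth rates (the full script below computes the same thing with apply()).

# "Fundamental Sharpe": mean year-over-year growth divided by its standard deviation
fund_sharpe <- function(growth_rates) {
  mean(growth_rates, na.rm = TRUE) / sd(growth_rates, na.rm = TRUE)
}

fund_sharpe(c(0.12, 0.15, 0.10))   # steady grower: high score
fund_sharpe(c(0.40, -0.10, 0.25))  # choppy grower: lower score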

So, we loop through each ticker (I'm using high-margin, high-growth financials for this example) and then generate the Sharpe ratios for each income statement item.

The last step is to somewhat arbitrarily sum the Sharpe ratios, then sort and plot the sums, so we can get a look at the “winner” of this sample data set.

As you can see, ICE and AXP fall to the bottom of the list, while CBOE (hmmm, worth a closer look for reasonableness) and Visa (makes sense) are at the top. The reason, as shown in the second graphic, is that ICE has a negative Sharpe in net income and EPS, largely due to various acquisitions. Were we to sort by revenue alone, and/or adjust for acquisitions, the picture might look different.

Bibliography: Financial Analytics with R by Mark Bennett.

[Chart: sum of fundamental Sharpe ratios by ticker]

[Chart: individual fundamental Sharpe ratios by ticker and line item]

library(quantmod)
library(tibble)
library(dplyr)    # needed for the filter/mutate pipeline below
library(ggplot2)  # needed for the plot below

mufree <- 0.02   # risk-free rate; not used in the simple version below
symbols <- c("MA","AXP","V","VNTV","FDC","TSS","NDAQ","CME","ICE","CBOE")
stmt <- "A"      # annual statements
D <- length(symbols)

for (d in 1:D) {
  symbol <- symbols[d]
  getFinancials(symbol, src = "google")

  # Net income, most recent four fiscal years
  # (getFinancials returns the most recent year first, so y1..y4 run oldest to newest)
  net_income <- eval(parse(text = paste(symbol, ".f$IS$", stmt, '["Net Income",]', sep = "")))
  base_date <- names(net_income[4])
  y4 <- net_income[1]
  y3 <- net_income[2]
  y2 <- net_income[3]
  y1 <- net_income[4]
  if (d == 1) {
    ni_ret <- data.frame(symbol, base_date, y1, y2, y3, y4)
  } else {
    ni_ret <- rbind(ni_ret, data.frame(symbol, base_date, y1, y2, y3, y4))
  }

  # Revenue
  revenue <- eval(parse(text = paste(symbol, ".f$IS$", stmt, '["Revenue",]', sep = "")))
  base_date <- names(revenue[4])
  y4 <- revenue[1]
  y3 <- revenue[2]
  y2 <- revenue[3]
  y1 <- revenue[4]
  if (d == 1) {
    rev_ret <- data.frame(symbol, base_date, y1, y2, y3, y4)
  } else {
    rev_ret <- rbind(rev_ret, data.frame(symbol, base_date, y1, y2, y3, y4))
  }

  # Diluted normalized EPS
  d_norm_eps <- eval(parse(text = paste(symbol, ".f$IS$", stmt, '["Diluted Normalized EPS",]', sep = "")))
  base_date <- names(d_norm_eps[4])
  y4 <- d_norm_eps[1]
  y3 <- d_norm_eps[2]
  y2 <- d_norm_eps[3]
  y1 <- d_norm_eps[4]
  if (d == 1) {
    dneps_ret <- data.frame(symbol, base_date, y1, y2, y3, y4)
  } else {
    dneps_ret <- rbind(dneps_ret, data.frame(symbol, base_date, y1, y2, y3, y4))
  }
}

calc_growth <- function(a, b) {
  if (is.na(a) || is.infinite(a) ||
      is.na(b) || is.infinite(b) || abs(a) < .001)
    return(NA)
  if (sign(a) == -1 && sign(b) == -1)
    return(-abs(b) / abs(a))
  if (sign(a) == -1 && sign(b) == +1)
    return(NA)  # ((-a+b)/-a)
  if (sign(a) == +1 && sign(b) == -1)
    return(NA)  # (-(a+abs(b))/a)
  return(round(abs(b) / abs(a), 2) * sign(b))
}

# calc_growth() takes the prior-year and current-year values as two arguments,
# so apply it pairwise across the year columns with mapply()
ni_ret_g <- data.frame(ni_ret$symbol, ni_ret$base_date,
                       mapply(calc_growth, ni_ret$y1, ni_ret$y2),
                       mapply(calc_growth, ni_ret$y2, ni_ret$y3),
                       mapply(calc_growth, ni_ret$y3, ni_ret$y4))
colnames(ni_ret_g) <- c("symbol", "base_date", "y2", "y3", "y4")

rev_ret_g <- data.frame(rev_ret$symbol, rev_ret$base_date,
                        mapply(calc_growth, rev_ret$y1, rev_ret$y2),
                        mapply(calc_growth, rev_ret$y2, rev_ret$y3),
                        mapply(calc_growth, rev_ret$y3, rev_ret$y4))
colnames(rev_ret_g) <- c("symbol", "base_date", "y2", "y3", "y4")

dneps_ret_g <- data.frame(dneps_ret$symbol, dneps_ret$base_date,
                          mapply(calc_growth, dneps_ret$y1, dneps_ret$y2),
                          mapply(calc_growth, dneps_ret$y2, dneps_ret$y3),
                          mapply(calc_growth, dneps_ret$y3, dneps_ret$y4))
colnames(dneps_ret_g) <- c("symbol", "base_date", "y2", "y3", "y4")

cols <- c(3, 4, 5)   # the three growth-rate columns

# "Fundamental Sharpe": mean growth divided by the standard deviation of growth
sharpe_ni    <- apply(ni_ret_g[, cols], 1, mean) / apply(ni_ret_g[, cols], 1, sd)
sharpe_rev   <- apply(rev_ret_g[, cols], 1, mean) / apply(rev_ret_g[, cols], 1, sd)
sharpe_dneps <- apply(dneps_ret_g[, cols], 1, mean) / apply(dneps_ret_g[, cols], 1, sd)

all_sharpes <- rbind(sharpe_ni, sharpe_rev, sharpe_dneps)
colnames(all_sharpes) <- ni_ret_g[, 1]
z <- colSums(all_sharpes)

tb <- tibble(symbol = ni_ret_g[, 1], value = z)

tb %>%
  filter(value > 1) %>%
  mutate(symbol = reorder(symbol, value)) %>%
  ggplot(aes(symbol, value, fill = value)) +
  geom_bar(alpha = 0.8, stat = "identity") +
  labs(y = "2012-2016: Sum of Three Fundamental Sharpes - Net Income, Revenue, and EPS",
       x = NULL) +
  coord_flip()

all_sharpes

The calc_growth function comes from the following excellent book:

https://www.amazon.com/Financial-Analytics-Building-Laboratory-Science/dp/1107150752/ref=sr_1_1?s=instant-video&ie=UTF8&qid=1482169831&sr=8-1&keywords=financial+analytics+with+r

Using quadprog() in R for optimization

Here's an example of simple portfolio optimization in R using the quadprog library. This example largely derives from a fantastic entry-level book called "Analyzing Financial Data and Implementing Financial Models Using R" by Clifford Ang (Amazon link below). Here I've modified the example a little to change the tickers (S&P 500, long-bond, XLF, and QQQ ETFs) and to allow short selling. It uses quantmod to get data, the to.monthly() function (how easy), and the quadprog library for the quadratic programming solution.

As you can see, the tangency portfolio weights (shown below) improve the Sharpe ratio and returns, with a slight increase in volatility.

I plan to genericize and extend this example to multiple securities in the next week or so.

Scroll down for the Amazon link to Ang's book. It is an easy read.

[Chart: mean-variance efficient frontier with tangency and minimum variance portfolios]

library(quantmod)

# NOTE: the assignments in this block were garbled when the post was published;
# the right-hand sides below are a reconstruction of the Ang-style example the
# text describes.  The date range and the monthly risk-free rate are assumptions.

data.SPY <- getSymbols("SPY", from = "2011-12-31", to = "2016-12-31", auto.assign = FALSE)
data.SPY[c(1:3, nrow(data.SPY)), ]
SPY.monthly <- to.monthly(data.SPY)
SPY.monthly <- Ad(SPY.monthly)
SPY.ret <- Delt(SPY.monthly)
names(SPY.ret) <- paste("SPY.Ret")
SPY.ret[c(1:3, nrow(SPY.ret)), ]

data.TLT <- getSymbols("TLT", from = "2011-12-31", to = "2016-12-31", auto.assign = FALSE)
data.TLT[c(1:3, nrow(data.TLT)), ]
TLT.monthly <- to.monthly(data.TLT)
TLT.monthly <- Ad(TLT.monthly)
TLT.ret <- Delt(TLT.monthly)
names(TLT.ret) <- paste("TLT.Ret")
TLT.ret[c(1:3, nrow(TLT.ret)), ]

data.XLF <- getSymbols("XLF", from = "2011-12-31", to = "2016-12-31", auto.assign = FALSE)
data.XLF[c(1:3, nrow(data.XLF)), ]
XLF.monthly <- to.monthly(data.XLF)
XLF.monthly <- Ad(XLF.monthly)
XLF.ret <- Delt(XLF.monthly)
names(XLF.ret) <- paste("XLF.Ret")
XLF.ret[c(1:3, nrow(XLF.ret)), ]

data.QQQ <- getSymbols("QQQ", from = "2011-12-31", to = "2016-12-31", auto.assign = FALSE)
data.QQQ[c(1:3, nrow(data.QQQ)), ]
QQQ.monthly <- to.monthly(data.QQQ)
QQQ.monthly <- Ad(QQQ.monthly)
QQQ.ret <- Delt(QQQ.monthly)
names(QQQ.ret) <- paste("QQQ.Ret")
QQQ.ret[c(1:3, nrow(QQQ.ret)), ]

# Combine the monthly returns (dropping the leading NA row) into a plain matrix
Ret.monthly <- cbind(SPY.ret[-1, ], TLT.ret[-1, ], XLF.ret[-1, ], QQQ.ret[-1, ])
Ret.monthly[c(1:3, nrow(Ret.monthly)), ]
mat.ret <- matrix(Ret.monthly, nrow(Ret.monthly))
mat.ret[1:3, ]
colnames(mat.ret) <- c("SPY", "TLT", "XLF", "QQQ")
head(mat.ret)

# Variance-covariance matrix and average monthly returns
VCOV <- cov(mat.ret)
VCOV
avg.ret <- matrix(apply(mat.ret, 2, mean))
rownames(avg.ret) <- c("SPY", "TLT", "XLF", "QQQ")
colnames(avg.ret) <- "avg.ret"
avg.ret
min.ret <- min(avg.ret)
max.ret <- max(avg.ret)

# Target returns along the frontier
increments <- 100
tgt.ret <- seq(min.ret, max.ret, length = increments)
head(tgt.ret)
tgt.sd <- rep(0, length = increments)
wgt <- matrix(0, nrow = increments, ncol = length(avg.ret))
head(wgt)
colnames(wgt) <- c("SPY", "TLT", "XLF", "QQQ")

library(quadprog)
# Minimize portfolio variance at each target return.  Short selling is allowed,
# so the only constraints are full investment and hitting the target return.
for (i in 1:increments) {
  Dmat <- 2 * VCOV
  dvec <- rep(0, length(avg.ret))
  Amat <- cbind(rep(1, length(avg.ret)), avg.ret)
  bvec <- c(1, tgt.ret[i])
  soln <- solve.QP(Dmat, dvec, Amat, bvec = bvec, meq = 2)
  tgt.sd[i] <- sqrt(soln$value)
  wgt[i, ] <- soln$solution
}
head(tgt.sd)
head(wgt)
CHECK.wgt <- rowSums(wgt)   # each row of weights should sum to one
CHECK.wgt

tgt.port <- data.frame(tgt.ret, tgt.sd, wgt)
head(tgt.port)
with.short.tgt.port <- tgt.port   # keep a copy of the short-selling frontier

riskfree <- 0.0007   # assumed monthly risk-free rate
tgt.port$Sharpe <- (tgt.port$tgt.ret - riskfree) / tgt.port$tgt.sd
minvar.port <- subset(tgt.port, tgt.port$tgt.sd == min(tgt.port$tgt.sd))

head(tgt.port)

tangency.port <- subset(tgt.port, tgt.port$Sharpe == max(tgt.port$Sharpe))

eff.frontier <- subset(tgt.port, tgt.port$tgt.ret >= minvar.port$tgt.ret)
plot(x = tgt.sd, y = tgt.ret, xlab = "Risk", ylab = "Return",
     main = "Mean Variance Efficient Frontier with Short Selling")
abline(h = 0, lty = 1)
points(x = minvar.port$tgt.sd, y = minvar.port$tgt.ret, pch = 17, cex = 3)
points(x = tangency.port$tgt.sd, y = tangency.port$tgt.ret, pch = 19, cex = 3)
points(x = eff.frontier$tgt.sd, y = eff.frontier$tgt.ret)
text(x = tangency.port$tgt.sd, y = tangency.port$tgt.ret, "Tangency", pos = 4)
text(x = minvar.port$tgt.sd, y = minvar.port$tgt.ret, "Minimum Variance", pos = 4)

tangency.port
minvar.port

https://www.amazon.com/Analyzing-Financial-Implementing-Springer-Economics/dp/3319140744/ref=sr_1_1?ie=UTF8&qid=1482006627&sr=8-1&keywords=clifford+ang

Earnings Call Sentiment Analysis with R

As an upgrade to the Python-based earnings transcripts web scraper:

A short vignette to show how simple it is to use R for investment analysis.

In this case, I'm scraping Visa's Q4 2016 earnings call (Visa is on an off-calendar fiscal year) from Seeking Alpha and running a sentiment lexicon against it to score the positive and negative sentiment of all the words in the text. I filtered out words with no sentiment attached.

Data visualization below. The next step would be to do this for a large list of securities and sort by the total sentiment scores of each security, to get a sorted list of whose calls were most positive or negative in the sample group. Stay tuned!

[Chart: Visa Q4 2016 call, contribution to sentiment by word]

library(tidytext)
library(dplyr)
library(RCurl)
library(XML)
library(ggplot2)
library(tidyr)

# download html
html <- getURL("http://seekingalpha.com/article/4014422-visa-v-q4-2016-results-earnings-call-transcript", followlocation = TRUE)

# parse html and pull the text out of the <p> tags
doc <- htmlParse(html, asText = TRUE)
plain.text <- xpathSApply(doc, "//p", xmlValue)

# put the extracted text into a data frame
text_df <- data_frame(plain.text)

# use tidytext to tokenize the text into words
tidy_text <- text_df %>%
  unnest_tokens(word, plain.text)

# count each word
tidy_text <- tidy_text %>%
  count(word, sort = TRUE)

# inner join with the "bing" sentiment lexicon
tidy_sentiment <- tidy_text %>%
  inner_join(get_sentiments("bing")) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative)
tidy_sentiment_ordered <- tidy_sentiment[order(tidy_sentiment$sentiment), ]

# subset words and sentiment for graphing
tsosub <- subset(tidy_sentiment_ordered, select = c("word", "sentiment"))

# plot the subset, with a filter
tsosub %>%
  filter(sentiment > 2 | sentiment < -2) %>%
  mutate(word = reorder(word, sentiment)) %>%
  ggplot(aes(word, sentiment, fill = sentiment)) +
  geom_bar(alpha = 0.8, stat = "identity") +
  labs(y = "Contribution to sentiment",
       x = NULL) +
  coord_flip()

Transcript Web Scraper, Python

This is an example of a script written to run locally. The script looks up a symbol list and pulls conference call transcripts from Seeking Alpha. Each transcript is then parsed word by word with a regex into a CSV file for storage. The point: you can search for a given word across multiple transcripts / stocks. The next step here would be a sentiment heuristic. More on that later.

import urllib2, sys
from bs4 import BeautifulSoup
import re
from collections import Counter
import time
import csv
import pandas as pd
from urllib2 import HTTPError

symbol_list=['STI','USB','RF','FITB']
df=pd.DataFrame(columns=['ticker','word','count'])

for symbol in symbol_list:
    # each symbol's transcript index page on Seeking Alpha
    site='http://seekingalpha.com/symbol/'+symbol+'/earnings/transcripts'
    print symbol
    hdr={'User-Agent':'Mozilla/5.0'}
    req=urllib2.Request(site,headers=hdr)
    try:
        page=urllib2.urlopen(req)
        soup=BeautifulSoup(page,'lxml')
        # walk every link on the index page and keep the Q3 2016 transcripts
        for link in soup.find_all('a'):
            x=link.get('href')
            #print x
            if isinstance(x,basestring):
                wordlist=['transcript','2016','q3']
                if all(x.find(s)>=0 for s in wordlist):
                    # request the single-page version of the transcript
                    parse_site='http://seekingalpha.com/'+x+'?part=single'
                    print parse_site
                    parse_req=urllib2.Request(parse_site,headers=hdr)
                    try:
                        parse_page=urllib2.urlopen(parse_req)
                        parse_soup=BeautifulSoup(parse_page,'lxml')
                        # strip non-visible elements, then split the text into words
                        [s.extract() for s in parse_soup(['style','script','[document]','head','title'])]
                        visible_text=parse_soup.text
                        lst=re.findall(r'\b[^\W\d_]+\b',visible_text)
                        lst=[x.lower() for x in lst]
                        # count word frequencies and append (ticker, word, count) rows to the CSV
                        counter=Counter(lst)
                        occs=[(symbol,word,count) for word,count in counter.items() if count > 0]
                        occs.sort(key=lambda x:x[0],reverse=False)
                        df=pd.DataFrame(occs)
                        df.to_csv('out.csv',header=False,mode='a')
                    except HTTPError as e:
                        print 'Error code:',e.code
                    # be polite to the server between transcript requests
                    time.sleep(5)
    except HTTPError as e:
        print 'Error code:',e.code

Where to Start

First, combine intellectual curiosity with a stubborn willingness to succeed through multiple failures.

I find it odd that "data science" tools (usually known in the business as quant or high-frequency trading) are not used in the analytical work of traditional Excel-based long/short equity funds. To me it's the same job.

Sign up at Quantopian and review the video tutorials. It's a Python site designed to facilitate backtesting, but you can learn a lot of more general material along the way.

http://www.quantopian.com

Concepts to master: pandas DataFrames, a data frame concept inherited from R. They function like virtual Excel spreadsheets, but also have a few stock-data-specific functions.

http://pandas.pydata.org/

You can install Python natively on a PC and work with local code (which will be necessary for most web scraping tasks) rather than on Quantopian. I'd suggest the Anaconda install because it comes with Jupyter notebooks (a browser-based editor) and automatically updates packages. Don't bother with Python on a Mac. Just don't.

https://www.continuum.io/downloads

Concepts to master: the BeautifulSoup library for web scraping, which is squarely in Python's wheelhouse. Run the install from a command line (not the Python interpreter).

pip install beautifulsoup4

Install RStudio. It features notebooks, which let you save and run code in chunks. R is a descendant of S, which you probably used in your stats classes. It has a billion powerful libraries for stock work. I'm particularly enthused with R; the charting is amazing.

RStudio

Concepts to master: the quantmod library for R is an important one to look at.

http://www.quantmod.com/
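
As a taste, here are two lines of the sort quantmod makes trivial; the ticker and date subset are just examples.

library(quantmod)
getSymbols("SPY")                           # pulls the price history into SPY
chartSeries(SPY, subset = "last 6 months")  # candlestick chart with volume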

Good luck!  Message me with questions.  Some code samples below: