This is a follow-up from my previous post explaining my concept of a CAPE <> F-Score <> Triple Top equity trading strategy.
In this post, I'll walk through the coding implementation of generating F-Scores in Python. In this example I'll focus specifically on the S&P 500; however, the general process could be applied to any equity market. Additionally, while there are a plethora of ways to aggregate financial data, such as with an API, I'm going to keep this simple and straightforward and scrape data from Yahoo Finance.
To start, let's take a second and discuss the modules we'll need for this section of the strategy. You'll need to install them before running your script. These modules include requests (for fetching web pages), BeautifulSoup from bs4 (for parsing HTML), and pandas (for organizing the scraped data).
Importing these modules into our script is as easy as including them as the first few lines of code:
import requests
from bs4 import BeautifulSoup
import pandas as pd
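If any of these imports fail, one way to install the packages is from the command line with pip (assuming pip is available on your path):

```shell
# Install the three third-party packages used in this post
pip install requests beautifulsoup4 pandas
```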
Once you have the appropriate packages installed, the first step in our process is to gather the ticker symbols for the equities we are going to parse through to calculate F-Scores. To do this, we're going to create a list of tickers from the stocks that currently comprise the S&P 500. Here again, there are a ton of ways to do this, but one of the most straightforward ways is to scrape the data from Wikipedia, like so:
tickers = []
url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
page = requests.get(url)
page_content = page.content
soup = BeautifulSoup(page_content,'html.parser')
tabl = soup.find("table", {"class" : "wikitable sortable"})
links = tabl.findAll('a',{"class" : "external text", "rel" : "nofollow"})
for link in links:
    if link.get_text() != "reports":
        tickers.append(link.get_text())
Recognizing that this is a pretty bulky way of creating a list of tickers, here is an alternative way of doing the same thing with much less code:
sp500_wiki = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
tickers = sp500_wiki[0]['Symbol'].tolist()
Now that we've created a list of the tickers, our next step is to create a few dictionaries to store the financial data we'll scrape from Yahoo Finance. We'll create three dictionaries that we'll reference later:
financial_dir_cy = {} #dictionary to store the current year's information
financial_dir_py = {} #dictionary to store last year's information
financial_dir_py2 = {} #dictionary to store information from two years ago
It's very important that we create all three of these dictionaries, as generating F-Scores relies on comparing financial metrics year over year.
The next step in the process is to scrape the appropriate financial data for each ticker from Yahoo Finance. F-Score screening criteria are split into three main areas of financial performance/health and can be explained as follows:
Profitability Signals
- Net Income: score 1 if there is positive net income in the current year.
- Operating Cash Flow: score 1 if there is positive cash flow from operations in the current year.
- Return on Assets: score 1 if ROA is higher in the current period compared to the previous year.
- Quality of Earnings: score 1 if cash flow from operations exceeds net income before extraordinary items.
Leverage, Liquidity and Source of Funds
- Decrease in Leverage: score 1 if there is a lower ratio of long-term debt in the current period compared to the value in the previous year.
- Increase in Liquidity: score 1 if there is a higher current ratio this year compared to the previous year.
- Absence of Dilution: score 1 if the company did not issue new shares (equity) in the preceding year.
Operating Efficiency
- Gross Margin: score 1 if there is a higher gross margin compared to the previous year.
- Asset Turnover: score 1 if there is a higher asset turnover ratio year on year (as a measure of productivity).
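Before wiring these rules up to scraped data, it can help to see the nine signals as code. Here's a minimal sketch, assuming the raw metrics have already been extracted as plain numbers; the dictionary keys and sample figures are illustrative, not Yahoo Finance's exact labels:

```python
# A minimal sketch of the nine F-Score signals, assuming the raw metrics
# have already been extracted as plain numbers.
def f_score(cy, py):
    """cy / py: metric dictionaries for the current and previous year."""
    score = 0
    # Profitability signals
    score += cy["net_income"] > 0
    score += cy["operating_cash_flow"] > 0
    score += (cy["net_income"] / cy["total_assets"]
              > py["net_income"] / py["total_assets"])             # ROA improving
    score += cy["operating_cash_flow"] > cy["net_income"]          # quality of earnings
    # Leverage, liquidity and source of funds
    score += (cy["long_term_debt"] / cy["total_assets"]
              < py["long_term_debt"] / py["total_assets"])         # decreasing leverage
    score += (cy["current_assets"] / cy["current_liabilities"]
              > py["current_assets"] / py["current_liabilities"])  # rising current ratio
    score += cy["shares_outstanding"] <= py["shares_outstanding"]  # no dilution
    # Operating efficiency
    score += (cy["gross_profit"] / cy["revenue"]
              > py["gross_profit"] / py["revenue"])                # gross margin up
    score += (cy["revenue"] / cy["total_assets"]
              > py["revenue"] / py["total_assets"])                # asset turnover up
    return int(score)

# Made-up figures for a single hypothetical company
cy = {"net_income": 100, "operating_cash_flow": 150, "total_assets": 1000,
      "long_term_debt": 200, "current_assets": 500, "current_liabilities": 250,
      "shares_outstanding": 100, "gross_profit": 400, "revenue": 900}
py = {"net_income": 50, "operating_cash_flow": 60, "total_assets": 1000,
      "long_term_debt": 300, "current_assets": 400, "current_liabilities": 250,
      "shares_outstanding": 100, "gross_profit": 300, "revenue": 900}
print(f_score(cy, py))  # 8 (every signal passes except asset turnover)
```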
To assess the financial performance/health of any given company based on these criteria, we need to gather information from all three major financial statements, namely a company's balance sheet, income statement, and cash flow statement. To do so, we'll parse through our list of S&P 500 tickers and scrape data from Yahoo Finance for the time periods we indicated in our three dictionaries. The process to do this is as follows:
for ticker in tickers:
    try:
        print("scraping financial statement data for", ticker)
        temp_dir = {}
        temp_dir2 = {}
        temp_dir3 = {}
        #getting balance sheet data from yahoo finance for the given ticker
        url = 'https://in.finance.yahoo.com/quote/'+ticker+'/balance-sheet?p='+ticker
        page = requests.get(url)
        page_content = page.content
        soup = BeautifulSoup(page_content,'html.parser')
        tabl = soup.find_all("div", {"class" : "M(0) Whs(n) BdEnd Bdc($seperatorColor) D(itb)"})
        for t in tabl:
            rows = t.find_all("div", {"class" : "rw-expnded"})
            for row in rows:
                #each row flattens to: label | current year | last year | two years ago
                cells = row.get_text(separator='|').split("|")
                temp_dir[cells[0]] = cells[1]
                temp_dir2[cells[0]] = cells[2]
                temp_dir3[cells[0]] = cells[3]
        #getting income statement data from yahoo finance for the given ticker
        url = 'https://in.finance.yahoo.com/quote/'+ticker+'/financials?p='+ticker
        page = requests.get(url)
        page_content = page.content
        soup = BeautifulSoup(page_content,'html.parser')
        tabl = soup.find_all("div", {"class" : "M(0) Whs(n) BdEnd Bdc($seperatorColor) D(itb)"})
        for t in tabl:
            rows = t.find_all("div", {"class" : "rw-expnded"})
            for row in rows:
                cells = row.get_text(separator='|').split("|")
                temp_dir[cells[0]] = cells[1]
                temp_dir2[cells[0]] = cells[2]
                temp_dir3[cells[0]] = cells[3]
        #getting cashflow statement data from yahoo finance for the given ticker
        url = 'https://in.finance.yahoo.com/quote/'+ticker+'/cash-flow?p='+ticker
        page = requests.get(url)
        page_content = page.content
        soup = BeautifulSoup(page_content,'html.parser')
        tabl = soup.find_all("div", {"class" : "M(0) Whs(n) BdEnd Bdc($seperatorColor) D(itb)"})
        for t in tabl:
            rows = t.find_all("div", {"class" : "rw-expnded"})
            for row in rows:
                cells = row.get_text(separator='|').split("|")
                temp_dir[cells[0]] = cells[1]
                temp_dir2[cells[0]] = cells[2]
                temp_dir3[cells[0]] = cells[3]
        #combining all extracted information with the corresponding ticker
        financial_dir_cy[ticker] = temp_dir
        financial_dir_py[ticker] = temp_dir2
        financial_dir_py2[ticker] = temp_dir3
    except Exception:
        print("Problem scraping data for", ticker)
This is a good amount of code so it may seem daunting, but if you take the time to look through it you'll see that the individual steps are actually very simple.
I parse through our list of tickers, collect the financial data from the balance sheet, income statement, and cash flow statement, and store each year's values in temporary dictionaries. After I've parsed through the tickers list and collected the information, I combine the extracted info with the corresponding ticker in our original dictionaries.
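To make the parsing trick concrete: get_text(separator='|') flattens each statement row into a single pipe-delimited string (the label followed by one value per year), and split('|') breaks it back into columns. Note also that Yahoo serves the figures as comma-formatted strings, so they'll need a numeric conversion before any year-over-year comparison; that conversion isn't part of the loop above, but you'll want something like it before scoring. The row text below is made up for illustration:

```python
# Illustrative row text in the shape produced by row.get_text(separator='|')
# in the loop above: label | current year | last year | two years ago
row_text = 'Total Revenue|365,817,000|274,515,000|260,174,000'

cells = row_text.split('|')
label = cells[0]                                        # the metric name
values = [float(v.replace(',', '')) for v in cells[1:]]  # strip commas, cast to float
print(label, values)  # Total Revenue [365817000.0, 274515000.0, 260174000.0]
```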
Do you remember earlier when we made a few dictionaries that I said we'd reference later? Well, the next step, and the final step I'll talk about in this post, is to store this newly scraped data in separate pandas dataframes for each year that we gathered financial data. To do this we'll use the three dictionaries we made earlier and create three dataframes for the various years.
Note: You'll also notice that we update the tickers list to include only those tickers whose values were successfully extracted.
#storing information in pandas dataframe
combined_financials_cy = pd.DataFrame(financial_dir_cy)
#combined_financials_cy.dropna(axis=1,inplace=True) #dropping columns with NaN values
combined_financials_py = pd.DataFrame(financial_dir_py)
#combined_financials_py.dropna(axis=1,inplace=True)
combined_financials_py2 = pd.DataFrame(financial_dir_py2)
#combined_financials_py2.dropna(axis=1,inplace=True)
tickers = combined_financials_cy.columns #updating the tickers list based on only those tickers whose values were successfully extracted
This process gives us the information we need about the financial performance/health of each ticker in our list of the S&P 500. With this information in hand, now we can begin to generate an F-Score for each ticker based on the specific criteria previously mentioned.
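As a hedged preview of that scoring step, here's one way to pull a single metric out of the year-by-year dataframes and score one criterion (the ROA comparison). The dataframes below are tiny synthetic stand-ins in the same shape as combined_financials_cy and combined_financials_py, with metrics as rows and tickers as columns; the row labels, ticker, and figures are made up and may not match Yahoo Finance's exact labels:

```python
import pandas as pd

# Synthetic stand-ins for combined_financials_cy / combined_financials_py
cy = pd.DataFrame({'AAA': {'Net income': '120,000', 'Total assets': '1,000,000'}})
py = pd.DataFrame({'AAA': {'Net income': '80,000', 'Total assets': '1,000,000'}})

def num(df, row, ticker):
    """Look up one cell and convert its comma-formatted string to a float."""
    return float(str(df.loc[row, ticker]).replace(',', ''))

# ROA signal: score 1 if the current-year ROA beats last year's
roa_cy = num(cy, 'Net income', 'AAA') / num(cy, 'Total assets', 'AAA')
roa_py = num(py, 'Net income', 'AAA') / num(py, 'Total assets', 'AAA')
roa_signal = int(roa_cy > roa_py)
print(roa_signal)  # 1
```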
In recognition that this post is getting pretty long, I'll break and pick it up from here next time.
Thanks for reading!
DISCLAIMER: To be brutally honest, I do not know if this strategy can consistently generate abnormal risk-adjusted rates of return. I am simply sharing this idea so others can contest or explore its potential usefulness. The information in this post does not constitute investment advice. I will not accept liability for any loss or damage, including without limitation any loss of profit, which may arise directly or indirectly from use of or reliance on such information.