Set up a database and store downloaded financial data for future use

Publish Date: 2024-04-26

Update Date: 2024-05-03

Word Count: 562

Read Times: 3 Min

Read Count:

Once the freely available data is downloaded from the web, it is better to setup a database to store that information for future use.

Download S&P500 using yfinance

import pandas as pd
import warnings
warnings.filterwarnings('ignore')

def load_data(url):
    html = pd.read_html(url, header=0)
    return html

# Load the list of S&P 500 companies
url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
sp500 = load_data(url)[0]
sp500.head()

	Symbol	Security	GICS Sector	GICS Sub-Industry	Headquarters Location	Date added	CIK	Founded
0	MMM	3M	Industrials	Industrial Conglomerates	Saint Paul, Minnesota	1957-03-04	66740	1902
1	AOS	A. O. Smith	Industrials	Building Products	Milwaukee, Wisconsin	2017-07-26	91142	1916
2	ABT	Abbott	Health Care	Health Care Equipment	North Chicago, Illinois	1957-03-04	1800	1888
3	ABBV	AbbVie	Health Care	Biotechnology	North Chicago, Illinois	2012-12-31	1551152	2013 (1888)
4	ACN	Accenture	Information Technology	IT Consulting & Other Services	Dublin, Ireland	2011-07-06	1467373	1989

# retrieve the S&P500 symbols
sp500['Symbol'] = sp500['Symbol'].str.replace('.','-')
symbols_list = sp500['Symbol'].unique().tolist()

# define start date and end date
end_date = '2024-04-26'
start_date = pd.to_datetime(end_date) - pd.DateOffset(365*10)

# download all S&P500 data in the period
import yfinance as yf
df = yf.download(tickers=symbols_list,
                 start=start_date,
                 end=end_date).stack()
df.head()

[*********************100%%**********************]  503 of 503 completed

	Price	Adj Close	Close	High	Low	Open	Volume
Date	Ticker
2014-04-29	A	35.049534	38.125893	38.304722	37.174534	38.118740	4688612.0
	AAL	33.476742	35.509998	35.650002	34.970001	35.200001	8994200.0
	AAPL	18.633333	21.154642	21.285000	21.053928	21.205000	337377600.0
	ABBV	34.034985	51.369999	51.529999	50.759998	50.939999	5601300.0
	ABT	31.831518	38.540001	38.720001	38.259998	38.369999	4415600.0

Setup Sqlite3 database

SQLite-database is the most efficient way to use as it is a small, fast, self-contained, high-reliability, full-featured SQL database engine which is implemented by C-language.

import sqlite3

# save the downloaded S&P500 data into the database
sp500_db = sqlite3.connect(database='sq500_data.sqlite')
df.to_sql(name="SP500",
          con=sp500_db,
          if_exists="replace",
          index=True)

# load the data into the memory from database
df_new = pd.read_sql_query(sql="SELECT * FROM SP500",
                  con=sp500_db,
                  parse_dates={"Date"})

df_new.head()

	Date	Ticker	Adj Close	Close	High	Low	Open	Volume
0	2014-04-29	A	35.049534	38.125893	38.304722	37.174534	38.118740	4688612.0
1	2014-04-29	AAL	33.476742	35.509998	35.650002	34.970001	35.200001	8994200.0
2	2014-04-29	AAPL	18.633333	21.154642	21.285000	21.053928	21.205000	337377600.0
3	2014-04-29	ABBV	34.034985	51.369999	51.529999	50.759998	50.939999	5601300.0
4	2014-04-29	ABT	31.831518	38.540001	38.720001	38.259998	38.369999	4415600.0