Set up a database and store downloaded financial data for future use


Once the freely available data is downloaded from the web, it is better to setup a database to store that information for future use.

Download S&P500 using yfinance

import pandas as pd
import warnings
warnings.filterwarnings('ignore')

def load_data(url):
    html = pd.read_html(url, header=0)
    return html

# Load the list of S&P 500 companies
url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
sp500 = load_data(url)[0]
sp500.head()

SymbolSecurityGICS SectorGICS Sub-IndustryHeadquarters LocationDate addedCIKFounded
0MMM3MIndustrialsIndustrial ConglomeratesSaint Paul, Minnesota1957-03-04667401902
1AOSA. O. SmithIndustrialsBuilding ProductsMilwaukee, Wisconsin2017-07-26911421916
2ABTAbbottHealth CareHealth Care EquipmentNorth Chicago, Illinois1957-03-0418001888
3ABBVAbbVieHealth CareBiotechnologyNorth Chicago, Illinois2012-12-3115511522013 (1888)
4ACNAccentureInformation TechnologyIT Consulting & Other ServicesDublin, Ireland2011-07-0614673731989
# retrieve the S&P500 symbols
sp500['Symbol'] = sp500['Symbol'].str.replace('.','-')
symbols_list = sp500['Symbol'].unique().tolist()
# define start date and end date
end_date = '2024-04-26'
start_date = pd.to_datetime(end_date) - pd.DateOffset(365*10)

# download all S&P500 data in the period
import yfinance as yf
df = yf.download(tickers=symbols_list,
                 start=start_date,
                 end=end_date).stack()
df.head()
[*********************100%%**********************]  503 of 503 completed

PriceAdj CloseCloseHighLowOpenVolume
DateTicker
2014-04-29A35.04953438.12589338.30472237.17453438.1187404688612.0
AAL33.47674235.50999835.65000234.97000135.2000018994200.0
AAPL18.63333321.15464221.28500021.05392821.205000337377600.0
ABBV34.03498551.36999951.52999950.75999850.9399995601300.0
ABT31.83151838.54000138.72000138.25999838.3699994415600.0

Setup Sqlite3 database

SQLite-database is the most efficient way to use as it is a small, fast, self-contained, high-reliability, full-featured SQL database engine which is implemented by C-language.

import sqlite3

# save the downloaded S&P500 data into the database
sp500_db = sqlite3.connect(database='sq500_data.sqlite')
df.to_sql(name="SP500",
          con=sp500_db,
          if_exists="replace",
          index=True)
1232390
# load the data into the memory from database
df_new = pd.read_sql_query(sql="SELECT * FROM SP500",
                  con=sp500_db,
                  parse_dates={"Date"})

df_new.head()

DateTickerAdj CloseCloseHighLowOpenVolume
02014-04-29A35.04953438.12589338.30472237.17453438.1187404688612.0
12014-04-29AAL33.47674235.50999835.65000234.97000135.2000018994200.0
22014-04-29AAPL18.63333321.15464221.28500021.05392821.205000337377600.0
32014-04-29ABBV34.03498551.36999951.52999950.75999850.9399995601300.0
42014-04-29ABT31.83151838.54000138.72000138.25999838.3699994415600.0

Author: wenvenn
Reprint policy: All articles in this blog are used except for special statements CC BY 4.0 reprint policy. If reproduced, please indicate source wenvenn !