Once the freely available data is downloaded from the web, it is better to setup a database to store that information for future use.
Download S&P500 using yfinance
import pandas as pd
import warnings
warnings.filterwarnings('ignore')
def load_data(url):
html = pd.read_html(url, header=0)
return html
url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
sp500 = load_data(url)[0]
sp500.head()
| Symbol | Security | GICS Sector | GICS Sub-Industry | Headquarters Location | Date added | CIK | Founded |
---|
0 | MMM | 3M | Industrials | Industrial Conglomerates | Saint Paul, Minnesota | 1957-03-04 | 66740 | 1902 |
---|
1 | AOS | A. O. Smith | Industrials | Building Products | Milwaukee, Wisconsin | 2017-07-26 | 91142 | 1916 |
---|
2 | ABT | Abbott | Health Care | Health Care Equipment | North Chicago, Illinois | 1957-03-04 | 1800 | 1888 |
---|
3 | ABBV | AbbVie | Health Care | Biotechnology | North Chicago, Illinois | 2012-12-31 | 1551152 | 2013 (1888) |
---|
4 | ACN | Accenture | Information Technology | IT Consulting & Other Services | Dublin, Ireland | 2011-07-06 | 1467373 | 1989 |
---|
sp500['Symbol'] = sp500['Symbol'].str.replace('.','-')
symbols_list = sp500['Symbol'].unique().tolist()
end_date = '2024-04-26'
start_date = pd.to_datetime(end_date) - pd.DateOffset(365*10)
import yfinance as yf
df = yf.download(tickers=symbols_list,
start=start_date,
end=end_date).stack()
df.head()
[*********************100%%**********************] 503 of 503 completed
| Price | Adj Close | Close | High | Low | Open | Volume |
---|
Date | Ticker | | | | | | |
---|
2014-04-29 | A | 35.049534 | 38.125893 | 38.304722 | 37.174534 | 38.118740 | 4688612.0 |
---|
AAL | 33.476742 | 35.509998 | 35.650002 | 34.970001 | 35.200001 | 8994200.0 |
---|
AAPL | 18.633333 | 21.154642 | 21.285000 | 21.053928 | 21.205000 | 337377600.0 |
---|
ABBV | 34.034985 | 51.369999 | 51.529999 | 50.759998 | 50.939999 | 5601300.0 |
---|
ABT | 31.831518 | 38.540001 | 38.720001 | 38.259998 | 38.369999 | 4415600.0 |
---|
Setup Sqlite3 database
SQLite-database is the most efficient way to use as it is a small, fast, self-contained, high-reliability, full-featured SQL database engine which is implemented by C-language.
import sqlite3
sp500_db = sqlite3.connect(database='sq500_data.sqlite')
df.to_sql(name="SP500",
con=sp500_db,
if_exists="replace",
index=True)
1232390
df_new = pd.read_sql_query(sql="SELECT * FROM SP500",
con=sp500_db,
parse_dates={"Date"})
df_new.head()
| Date | Ticker | Adj Close | Close | High | Low | Open | Volume |
---|
0 | 2014-04-29 | A | 35.049534 | 38.125893 | 38.304722 | 37.174534 | 38.118740 | 4688612.0 |
---|
1 | 2014-04-29 | AAL | 33.476742 | 35.509998 | 35.650002 | 34.970001 | 35.200001 | 8994200.0 |
---|
2 | 2014-04-29 | AAPL | 18.633333 | 21.154642 | 21.285000 | 21.053928 | 21.205000 | 337377600.0 |
---|
3 | 2014-04-29 | ABBV | 34.034985 | 51.369999 | 51.529999 | 50.759998 | 50.939999 | 5601300.0 |
---|
4 | 2014-04-29 | ABT | 31.831518 | 38.540001 | 38.720001 | 38.259998 | 38.369999 | 4415600.0 |
---|