
ETF Screener Agent: Finding the Perfect ETF Match with FMP’s Holdings API | by Pranjal Saxena | Sep, 2025

Pulling Holdings with FMP

Before we build any agent, we need the data it will use. For ETFs, the most important data is holdings — the list of stocks an ETF owns, along with their weights. If you know what’s inside an ETF, you can decide whether it matches your investment goals.

FMP makes this easy with their ETF Holdings API. For example, if we want to see what’s inside the popular SPDR S&P 500 ETF (SPY), we can call:

https://financialmodelingprep.com/api/v3/etf-holder/SPY?apikey=YOUR_API_KEY

Here’s a trimmed version of the JSON you’d get back:

[
  {
    "asset": "AAPL",
    "name": "Apple Inc",
    "weightPercentage": 7.12,
    "sharesNumber": 155432000,
    "marketValue": 31890000000
  },
  {
    "asset": "MSFT",
    "name": "Microsoft Corp",
    "weightPercentage": 6.45,
    "sharesNumber": 142210000,
    "marketValue": 28970000000
  },
  {
    "asset": "AMZN",
    "name": "Amazon.com Inc",
    "weightPercentage": 3.05,
    "sharesNumber": 125600000,
    "marketValue": 13780000000
  }
  ...
]

This response tells us which companies the ETF holds, how many shares of each, their market value, and their weight within the fund: exactly the kind of information we need to screen ETFs.

Now let’s fetch this data in Python:

import requests
import pandas as pd

API_KEY = "YOUR_API_KEY"
symbol = "SPY"

url = f"https://financialmodelingprep.com/api/v3/etf-holder/{symbol}?apikey={API_KEY}"
response = requests.get(url)
data = response.json()

# Convert to DataFrame for easy handling
df = pd.DataFrame(data)

# Show top 5 holdings
print(df[["asset", "name", "weightPercentage"]].head())

Output:

  asset                name  weightPercentage
0  AAPL           Apple Inc              7.12
1  MSFT      Microsoft Corp              6.45
2  AMZN      Amazon.com Inc              3.05
3  NVDA         NVIDIA Corp              2.95
4  META  Meta Platforms Inc              2.45

Just like that, we can see SPY’s top holdings. This is the backbone of our screener: raw ETF holdings data we can query, compare, and filter.

Expanding ETF Search Scope

Screening just one ETF is useful, but the real power comes when we compare multiple ETFs side by side. Investors don’t want to know only what’s inside SPY — they also want to see how it stacks up against QQQ, VGT, or XLK. That’s where our ETF universe comes in.

FMP provides an ETF List API that gives us all available ETFs. From this universe, we can pick the ones we care about and fetch their holdings on demand. To avoid making repeated API calls, we’ll also add a small caching layer so once we’ve fetched an ETF’s holdings, we can reuse them.

Step 1 — Get the ETF list

Here’s how we can pull the list and keep only the basic details like symbol, name, and price:

import requests
import pandas as pd

API = "https://financialmodelingprep.com/api/v3"
KEY = "YOUR_API_KEY"

def etf_list():
    url = f"{API}/etf/list?apikey={KEY}"
    data = requests.get(url).json()
    df = pd.DataFrame(data)
    return df[["symbol", "name", "price"]].dropna()

universe = etf_list()
print(universe.head(10))

This gives you a quick snapshot of available ETFs you can explore.
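Once you have the universe as a DataFrame, narrowing it down is a one-liner with pandas string matching. Here's a small sketch using a few hardcoded sample rows standing in for the live etf_list() result (the names and prices are illustrative):

```python
import pandas as pd

# Sample rows standing in for the live etf_list() result (illustrative data).
universe = pd.DataFrame([
    {"symbol": "SPY", "name": "SPDR S&P 500 ETF Trust", "price": 545.2},
    {"symbol": "QQQ", "name": "Invesco QQQ Trust", "price": 470.1},
    {"symbol": "VGT", "name": "Vanguard Information Technology ETF", "price": 560.3},
    {"symbol": "XLK", "name": "Technology Select Sector SPDR Fund", "price": 215.8},
])

def search_universe(df, keyword):
    """Case-insensitive substring match on the ETF name."""
    mask = df["name"].str.contains(keyword, case=False, na=False)
    return df[mask].reset_index(drop=True)

tech = search_universe(universe, "technology")
print(tech["symbol"].tolist())  # ['VGT', 'XLK']
```

The same helper works unchanged on the full universe returned by the API.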

Step 2 — Fetch and cache holdings

Instead of calling the API every time, let’s add a basic cache using SQLite.

import json, sqlite3, time

CACHE_DB = "etf_cache.sqlite"

def init_cache():
    con = sqlite3.connect(CACHE_DB)
    con.execute("""
        CREATE TABLE IF NOT EXISTS holdings_cache(
            symbol  TEXT PRIMARY KEY,
            payload TEXT NOT NULL,
            ts      INTEGER NOT NULL
        )""")
    con.close()

def cache_get(symbol, max_age=24 * 3600):
    con = sqlite3.connect(CACHE_DB)
    row = con.execute(
        "SELECT payload, ts FROM holdings_cache WHERE symbol=?", (symbol,)
    ).fetchone()
    con.close()
    if not row:
        return None
    payload, ts = row
    if time.time() - ts > max_age:
        return None
    return json.loads(payload)

def cache_put(symbol, payload):
    con = sqlite3.connect(CACHE_DB)
    con.execute(
        "REPLACE INTO holdings_cache(symbol, payload, ts) VALUES(?,?,?)",
        (symbol, json.dumps(payload), int(time.time())),
    )
    con.commit()
    con.close()

def etf_holdings(symbol):
    cached = cache_get(symbol)
    if cached:
        return pd.DataFrame(cached)
    url = f"{API}/etf-holder/{symbol}?apikey={KEY}"
    data = requests.get(url).json()
    cache_put(symbol, data)
    return pd.DataFrame(data)

init_cache()

Now we can fetch holdings once and reuse them later.
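To see the expiry logic in isolation, here's a compact standalone sketch of the same pattern using an in-memory SQLite database, so it runs without touching the real etf_cache.sqlite file (the entries are fake, just to exercise the TTL check):

```python
import json, sqlite3, time

# In-memory database so this demo doesn't touch the real cache file.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE holdings_cache(
        symbol  TEXT PRIMARY KEY,
        payload TEXT NOT NULL,
        ts      INTEGER NOT NULL
    )""")

def put(symbol, payload, ts):
    con.execute("REPLACE INTO holdings_cache VALUES(?,?,?)",
                (symbol, json.dumps(payload), ts))

def get(symbol, max_age=24 * 3600):
    row = con.execute("SELECT payload, ts FROM holdings_cache WHERE symbol=?",
                      (symbol,)).fetchone()
    if not row:
        return None
    payload, ts = row
    if time.time() - ts > max_age:  # stale entries are treated as misses
        return None
    return json.loads(payload)

now = int(time.time())
put("SPY", [{"asset": "AAPL"}], now)               # fresh entry
put("QQQ", [{"asset": "MSFT"}], now - 48 * 3600)   # two days old: expired

print(get("SPY"))  # [{'asset': 'AAPL'}]  -> cache hit
print(get("QQQ"))  # None                 -> stale, would trigger a refetch
```

A stale or missing entry returns None, which is exactly what makes etf_holdings fall through to the API call.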

Step 3 — Compare ETFs: SPY vs QQQ

Let’s see what happens when we compare SPY with QQQ.

def compare_overlap(a="SPY", b="QQQ"):
    df_a = etf_holdings(a)[["asset", "name", "weightPercentage"]].rename(
        columns={"weightPercentage": f"{a}_weight"})
    df_b = etf_holdings(b)[["asset", "name", "weightPercentage"]].rename(
        columns={"weightPercentage": f"{b}_weight"})
    merged = pd.merge(df_a, df_b, on=["asset", "name"], how="inner")
    merged["combined_weight"] = merged[f"{a}_weight"] + merged[f"{b}_weight"]
    return merged.sort_values("combined_weight", ascending=False).head(10)

overlap = compare_overlap("SPY", "QQQ")
print(overlap)

The result is a table of overlapping holdings — Apple, Microsoft, NVIDIA, and so on — showing how much weight each ETF assigns to them.
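A common way to condense this table into a single number is an "overlap score": for each shared holding, take the smaller of the two weights and sum them. Here's a sketch on a few sample rows standing in for the merged output (the weights are illustrative, not live data):

```python
import pandas as pd

# Sample merged holdings standing in for compare_overlap() output (illustrative).
merged = pd.DataFrame({
    "asset": ["AAPL", "MSFT", "NVDA"],
    "SPY_weight": [7.12, 6.45, 2.95],
    "QQQ_weight": [8.90, 8.10, 4.20],
})

def overlap_score(df, col_a, col_b):
    """Sum of the smaller weight per shared holding, on a 0-100 scale."""
    return df[[col_a, col_b]].min(axis=1).sum()

score = overlap_score(merged, "SPY_weight", "QQQ_weight")
print(round(score, 2))  # 16.52
```

A higher score means the two ETFs give you more of the same exposure, which is useful when deciding whether holding both is worth it.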

This is the kind of analysis our agent will later use to filter and rank ETFs. By building a universe of ETFs and caching their holdings, we’ve laid the foundation for natural-language queries.

The Agent Core: Natural Language to Filters

Now that we have ETF holdings data ready, the next challenge is making it usable. Investors don’t want to write filters in Python; they want to ask questions in plain English, like:

“Which ETFs have more than 10% in NVIDIA?”
“Find ETFs where Apple and Microsoft are top holdings.”

To bridge this gap, we’ll use a lightweight agent powered by a free LLM from Groq. The role of this agent is simple:

  1. Take the user’s natural-language query.
  2. Convert it into a structured filter (a small JSON object).
  3. Use that filter on our ETF holdings universe to find matches.
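Before wiring in the LLM, the shape of this pipeline can be sketched with a toy rule-based parser. The keyword table and regex here are a deliberately crude stand-in for the Groq call, just to show the contract: plain English in, structured filter out.

```python
import re

# Toy stand-in for the LLM parsing step (hypothetical mapping, not exhaustive).
NAME_TO_TICKER = {"nvidia": "NVDA", "amd": "AMD", "apple": "AAPL", "microsoft": "MSFT"}

def toy_query_to_filter(query):
    q = query.lower()
    tickers = [t for name, t in NAME_TO_TICKER.items() if name in q]
    m = re.search(r"(\d+(?:\.\d+)?)\s*%", q)  # first percentage in the query
    return {"tickers": tickers, "min_weight": float(m.group(1)) if m else 0}

print(toy_query_to_filter("Find ETFs with more than 15% exposure to NVIDIA and AMD"))
# {'tickers': ['NVDA', 'AMD'], 'min_weight': 15.0}
```

The real agent replaces this function with an LLM call, but everything downstream consumes the same JSON shape.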

Step 1 — Converting queries into filters

Let’s imagine the query:

Find ETFs with more than 15% exposure to NVIDIA and AMD

The agent should turn this into:

{
  "tickers": ["NVDA", "AMD"],
  "min_weight": 15
}

This JSON is much easier to work with in Python.
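Because LLM output can occasionally drift from the expected shape, it's worth validating the parsed filter before using it downstream. A minimal sketch (validate_filter is a hypothetical helper; the field names mirror the JSON shape above):

```python
def validate_filter(raw):
    """Defensive check of a parsed filter before it touches the screener.

    Hypothetical helper: rejects malformed input, normalizes tickers to
    uppercase, and coerces min_weight to a float.
    """
    if not isinstance(raw, dict):
        raise ValueError("filter must be a JSON object")
    tickers = raw.get("tickers")
    if not isinstance(tickers, list) or not all(isinstance(t, str) for t in tickers):
        raise ValueError("'tickers' must be a list of strings")
    min_weight = raw.get("min_weight", 0)
    if not isinstance(min_weight, (int, float)) or not 0 <= min_weight <= 100:
        raise ValueError("'min_weight' must be a number between 0 and 100")
    return {"tickers": [t.upper() for t in tickers], "min_weight": float(min_weight)}

print(validate_filter({"tickers": ["nvda", "AMD"], "min_weight": 15}))
# {'tickers': ['NVDA', 'AMD'], 'min_weight': 15.0}
```

Failing fast here keeps a single malformed LLM response from producing a silently wrong screening result.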

Step 2 — Using Groq for query parsing

We can use Groq’s free Llama models to parse queries. Here’s a minimal setup:

import requests, json

GROQ_API_KEY = "YOUR_GROQ_KEY"
GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

SYSTEM_PROMPT = (
    "You are an assistant that converts ETF queries into JSON filters. "
    'Respond only with a JSON object of the form '
    '{"tickers": [...], "min_weight": <number>}.'
)

def query_to_filter(user_query):
    payload = {
        "model": "llama-3.1-8b-instant",
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_query}
        ],
        "temperature": 0,
        "response_format": {"type": "json_object"}
    }
    headers = {"Authorization": f"Bearer {GROQ_API_KEY}"}
    response = requests.post(GROQ_URL, headers=headers, json=payload)
    response.raise_for_status()
    return json.loads(response.json()["choices"][0]["message"]["content"])

# Example
user_query = "Find ETFs with more than 15% exposure to NVIDIA and AMD"
print(query_to_filter(user_query))

Output:

{"tickers": ["NVDA","AMD"], "min_weight": 15}

Once we have this filter, we can use it to check each ETF’s holdings:

def screen_etfs(filter_json, etfs=("SPY", "QQQ", "VGT")):
    results = []
    for etf in etfs:
        df = etf_holdings(etf)
        df_match = df[df["asset"].isin(filter_json["tickers"])]
        exposure = df_match["weightPercentage"].sum()
        if exposure >= filter_json.get("min_weight", 0):
            results.append({"ETF": etf, "Exposure": exposure})
    return pd.DataFrame(results)

filter_json = {"tickers": ["NVDA", "AMD"], "min_weight": 15}
print(screen_etfs(filter_json))

This will return ETFs that meet the condition.
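One practical tweak: passing the holdings fetcher in as a parameter makes the screening logic testable with canned data instead of live API calls. A sketch under that assumption, where fetch is any callable that returns a holdings DataFrame and the weights below are illustrative:

```python
import pandas as pd

def screen_etfs_with(fetch, filter_json, etfs=("SPY", "QQQ", "VGT")):
    """Same screening logic, but `fetch` is injected so it can be faked in tests."""
    results = []
    for etf in etfs:
        df = fetch(etf)
        exposure = df[df["asset"].isin(filter_json["tickers"])]["weightPercentage"].sum()
        if exposure >= filter_json.get("min_weight", 0):
            results.append({"ETF": etf, "Exposure": round(exposure, 2)})
    return pd.DataFrame(results)

# Canned holdings standing in for live API responses (illustrative weights).
FAKE = {
    "SPY": pd.DataFrame({"asset": ["NVDA", "AMD", "AAPL"], "weightPercentage": [6.0, 1.5, 7.1]}),
    "QQQ": pd.DataFrame({"asset": ["NVDA", "AMD", "MSFT"], "weightPercentage": [9.0, 7.0, 8.1]}),
    "VGT": pd.DataFrame({"asset": ["AAPL", "MSFT"], "weightPercentage": [16.0, 15.5]}),
}

out = screen_etfs_with(FAKE.get, {"tickers": ["NVDA", "AMD"], "min_weight": 15}, FAKE)
print(out)  # only QQQ clears the 15% combined threshold
```

In production you'd pass the cached etf_holdings function as fetch and get the same behavior as screen_etfs, with the screening logic verified independently of the network.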

By doing this, we’ve turned ETF data into an interactive screener. The user speaks in plain English, the agent translates it into structured filters, and Python applies those filters to the ETF universe.
