Methodology
Stockadora runs a structured daily pipeline that crawls SEC EDGAR, extracts fields from filings, generates machine-readable summaries, and publishes the output as a static website. The system is fully automated. There is no systematic manual editorial review prior to publication.
This architecture prioritizes transparency, reproducibility, and performance. By publishing static pages generated from structured SEC data, the system avoids real-time interpretation, market predictions, or editorial opinion. The source of truth is always the SEC filing itself — the summaries are a navigational layer on top of it.
Pipeline Overview
Daily at ~8:42 UTC. Structured and fully automated.
Crawl
EDGAR polled daily at ~8:42 UTC via GitHub Actions
Extract
Structured fields pulled from raw filings (form type, CIK, dates, text blocks)
Summarize
Gemini 2.5 Flash generates structured summaries from filing text
Store
JSON written to AWS S3 with structured, predictable file paths
Publish
Astro static build pulls from S3, compiles to pure HTML — no server runtime
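The Extract step is described only by the fields it produces (form type, CIK, dates, text blocks). As one illustration, assuming the pipeline reads EDGAR's full-submission `.txt` files, the core fields can be pulled from the plain-text header that precedes the documents. The labels `CONFORMED SUBMISSION TYPE`, `CENTRAL INDEX KEY`, `COMPANY CONFORMED NAME`, and `FILED AS OF DATE` are the ones EDGAR uses in those headers; the `parse_header` function and `FilingHeader` type are hypothetical names for this sketch, not Stockadora's code.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime

# Header labels as they appear in EDGAR full-submission .txt files.
_FIELDS = {
    "form_type": "CONFORMED SUBMISSION TYPE",
    "cik": "CENTRAL INDEX KEY",
    "company": "COMPANY CONFORMED NAME",
    "filed": "FILED AS OF DATE",
}


@dataclass
class FilingHeader:  # hypothetical container for the extracted fields
    form_type: str
    cik: str
    company: str
    filed: date


def parse_header(raw: str) -> FilingHeader:
    """Pull core fields from the header block of an EDGAR submission."""
    values = {}
    for key, label in _FIELDS.items():
        # Match "LABEL:<tabs>value" on its own line; the first match wins,
        # since multi-filer submissions repeat the company block.
        pattern = rf"^[ \t]*{re.escape(label)}:[ \t]*(.+?)[ \t]*$"
        match = re.search(pattern, raw, re.MULTILINE)
        if match is None:
            raise ValueError(f"missing header field: {label}")
        values[key] = match.group(1)
    return FilingHeader(
        form_type=values["form_type"],
        cik=values["cik"],  # kept as the zero-padded string EDGAR prints
        company=values["company"],
        filed=datetime.strptime(values["filed"], "%Y%m%d").date(),
    )
```

Keeping only the first company block is a simplification; a real parser would have to decide which filer a multi-filer submission's summary belongs to.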
Technical Details
For technical readers, journalists, and skeptics.
Data Ingestion
Parsing & Field Extraction
AI Summarization
Model: Gemini 2.5 Flash (Google). Each filing goes through a multi-round process: the model iteratively builds a narrative summary from the filing text, then a final pass extracts structured fields into predefined output schemas. Fields vary by form type but include: key highlights, business description, risk factors, financial overview, and SEO metadata.
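The multi-round process can be pictured as a fold over chunks of the filing, followed by one extraction call against a per-form schema. This sketch assumes the model is wrapped as a plain `prompt -> text` function, so no particular Gemini SDK call is implied; the chunk size, prompt wording, and `SCHEMAS` contents are illustrative assumptions, not Stockadora's actual prompts or schemas.

```python
import json
from typing import Callable

# Hypothetical per-form schemas, using field names from the list above.
SCHEMAS: dict[str, list[str]] = {
    "10-K": ["key_highlights", "business_description", "risk_factors",
             "financial_overview", "seo_metadata"],
    "8-K": ["key_highlights", "seo_metadata"],
}

Model = Callable[[str], str]  # prompt in, text out (e.g. a Gemini wrapper)


def chunk(text: str, size: int = 8000) -> list[str]:
    # Fixed-size character chunks; always at least one (possibly empty) chunk.
    return [text[i:i + size] for i in range(0, len(text), size)] or [""]


def summarize(filing_text: str, form_type: str, model: Model) -> dict:
    # Rounds 1..n: fold each chunk into a running narrative summary.
    narrative = ""
    for part in chunk(filing_text):
        narrative = model(
            "Update this summary using only the filing excerpt. No opinions.\n"
            f"SUMMARY:\n{narrative}\nEXCERPT:\n{part}"
        )
    # Final pass: extract the schema's fields from the narrative as JSON.
    fields = SCHEMAS.get(form_type, ["key_highlights", "seo_metadata"])
    raw = model(
        f"Return a JSON object with exactly these keys: {', '.join(fields)}.\n"
        f"SUMMARY:\n{narrative}"
    )
    data = json.loads(raw)
    missing = [f for f in fields if f not in data]
    if missing:
        raise ValueError(f"model output missing fields: {missing}")
    # Drop any keys the schema does not define.
    return {f: data[f] for f in fields}
```

Checking the final JSON for every expected key turns a silently incomplete extraction into an explicit error instead of a published page with empty sections.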
The model is instructed to extract and structure information from the filing rather than generate opinionated commentary. However, complex or unusually formatted filings may result in incomplete or imperfect summaries — the model operates on the text as presented, without access to outside context.
Storage Format
Each summary is written to S3 as JSON at:
<data-type>/companies/<YYYY-MM-DD>/<cik>-<slug>.json
Index files support the listings (latest, by-date, by-week). Paths are derived from filing metadata (CIK, filing date, company slug) and are stable across reruns.
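A layout like this can be produced by a pure function of filing metadata, which is what keeps paths stable across reruns: the same CIK, date, and company name always yield the same key. Only the path template comes from the layout above; the slug rules (ASCII folding, lowercase, hyphens), the `filing_key` and `build_indexes` names, and the index key names are assumptions for illustration.

```python
import re
import unicodedata
from collections import defaultdict
from datetime import date


def slugify(name: str) -> str:
    # ASCII-fold accents, lowercase, collapse non-alphanumerics to hyphens.
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def filing_key(data_type: str, cik: str, filed: date, company: str) -> str:
    # <data-type>/companies/<YYYY-MM-DD>/<cik>-<slug>.json
    return f"{data_type}/companies/{filed.isoformat()}/{cik}-{slugify(company)}.json"


def build_indexes(filings: list[dict], latest_n: int = 50) -> dict[str, list[str]]:
    """Group object keys into hypothetical latest / by-date / by-week listings."""
    # Newest first; ties on date are broken by key so output is deterministic.
    ordered = sorted(filings, key=lambda f: (f["filed"], f["key"]), reverse=True)
    indexes = defaultdict(list)
    indexes["latest"] = [f["key"] for f in ordered[:latest_n]]
    for f in ordered:
        indexes[f"by-date/{f['filed'].isoformat()}"].append(f["key"])
        year, week, _ = f["filed"].isocalendar()  # ISO year and week number
        indexes[f"by-week/{year}-W{week:02d}"].append(f["key"])
    return dict(indexes)
```

For example, `filing_key("summaries", "0000123456", date(2024, 3, 5), "Example Corp, Inc.")` returns `"summaries/companies/2024-03-05/0000123456-example-corp-inc.json"`, where `"summaries"` stands in for a hypothetical data type.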
Website Build
The Astro build pulls the JSON from S3 with aws s3 sync. All data is baked into static HTML, so there is zero server-side runtime. Pages are delivered from a CDN, and no JavaScript is required to read content.
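To make "baked into static HTML" concrete: at build time each JSON record becomes a complete HTML document with every value escaped and no script tags. The real site uses Astro templates; this Python stand-in and its field names (`title`, `source_url`, `key_highlights`) are hypothetical and only show the shape of the output.

```python
import html


def render_page(summary: dict) -> str:
    """Render one summary record to a self-contained HTML page (no <script>)."""
    esc = html.escape  # escapes &, <, > and quotes by default
    title = esc(summary["title"])
    source = esc(summary["source_url"])
    parts = [
        "<!doctype html>",
        '<html lang="en">',
        f'<head><meta charset="utf-8"><title>{title}</title></head>',
        "<body>",
        f"  <h1>{title}</h1>",
        # Link back to the filing, which remains the source of truth.
        f'  <p><a href="{source}">Original SEC filing</a></p>',
        "  <ul>",
        *(f"    <li>{esc(h)}</li>" for h in summary.get("key_highlights", [])),
        "  </ul>",
        "</body>",
        "</html>",
    ]
    return "\n".join(parts) + "\n"
```

Because the output is a plain file, any CDN can serve it as-is, and a reader with JavaScript disabled sees exactly the same content.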
Automation Boundary
Limitations & Caveats
- The summarization model may misinterpret or omit information from complex filings, particularly those with non-standard formatting or dense technical language.
- Amended filings (e.g., S-1/A, 10-K/A) may not be reflected immediately. The pipeline processes new filings as they appear; amendments follow the same crawl schedule.
- Coverage is not exhaustive. Not all SEC filers are processed. The pipeline is scoped to specific form types and volume thresholds.
- Summaries reflect the filing content at time of processing. Subsequent company developments are not incorporated retroactively.