Methodology

Stockadora runs a structured daily pipeline that crawls SEC EDGAR, extracts fields from filings, generates machine-readable summaries, and publishes the output as a static website. The system is fully automated. There is no systematic manual editorial review prior to publication.

This architecture prioritizes transparency, reproducibility, and performance. By publishing static pages generated from structured SEC data, the system avoids real-time interpretation, market predictions, or editorial opinion. The source of truth is always the SEC filing itself — the summaries are a navigational layer on top of it.

Pipeline Overview

SEC EDGAR (primary source)
  → daily crawl →
Kotlin Backend (parse + extract)
  → filing text →
Google Gemini (AI summarize)
  → structured JSON →
AWS S3 (store)
  → s3 sync →
Astro Build (static HTML)
  → CDN deliver →
CDN / User (read)

Daily at 8:42 UTC — structured, fully automated

1. Crawl: EDGAR polled daily at ~8:42 UTC via GitHub Actions

2. Extract: Structured fields pulled from raw filings (form type, CIK, dates, text blocks)

3. Summarize: Gemini 2.5 Flash generates structured summaries from filing text

4. Store: JSON written to AWS S3 with structured, predictable file paths

5. Publish: Astro static build pulls from S3 and compiles to pure HTML, with no server runtime

Technical Details

For technical readers, journalists, and skeptics.

Data Ingestion
A GitHub Actions cron schedule triggers the pipeline daily at 8:42 UTC. Each run downloads EDGAR's daily master filing index, which lists every submission accepted by the SEC that day, and looks back 3 days to catch filings whose processing was delayed. Form types currently covered: S-1 and F-1 (IPOs); 10-K, 20-F, and 40-F (annual reports); 8-K (material events); Form 4 (insider trades). The pipeline skips weekends, matching the SEC's publication schedule.
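
The lookback and form-type filter described above can be sketched roughly as follows. This is an illustrative Python sketch, not the production Kotlin code. The URL template reflects the commonly documented layout of EDGAR's daily index and should be verified against the SEC's own documentation. The function names, and the reading of "3 days" as three calendar days including the run date, are assumptions.

```python
from datetime import date, timedelta

# Form types named in the text above. Amendment variants such as "S-1/A"
# are deliberately not included in this sketch.
COVERED_FORMS = {"S-1", "F-1", "10-K", "20-F", "40-F", "8-K", "4"}

# Assumed layout of EDGAR's daily master index URLs.
DAILY_INDEX_URL = (
    "https://www.sec.gov/Archives/edgar/daily-index/"
    "{year}/QTR{quarter}/master.{stamp}.idx"
)


def lookback_dates(run_date: date, days: int = 3) -> list[date]:
    """Return the weekdays among the `days` calendar days ending on run_date."""
    candidates = [run_date - timedelta(days=offset) for offset in range(days)]
    return [d for d in candidates if d.weekday() < 5]  # Monday=0 .. Friday=4


def daily_index_url(day: date) -> str:
    """Build the (assumed) daily master index URL for one date."""
    quarter = (day.month - 1) // 3 + 1
    return DAILY_INDEX_URL.format(
        year=day.year, quarter=quarter, stamp=day.strftime("%Y%m%d")
    )


def is_covered(form_type: str) -> bool:
    """True if the form type is one of the covered, non-amended types."""
    return form_type.strip() in COVERED_FORMS
```

Under these assumptions, a Monday run only considers Monday's index, because the two preceding calendar days fall on a weekend. A lookback counted in business days would behave differently.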
Parsing & Field Extraction
Kotlin/JVM backend parses HTML and structured EDGAR metadata. Extracted fields include: company name, CIK (SEC entity identifier), form type, filing date, period of report, accession number, and relevant text sections from the filing body.
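
As a rough illustration of the metadata side of this step, the sketch below parses one row of a master index into a typed record. It is written in Python for brevity, while the actual backend is Kotlin. It assumes the pipe-delimited layout `CIK|Company Name|Form Type|Date Filed|File Name` and an accession number embedded in the file name. The `IndexEntry` type and `parse_index_line` function are hypothetical names. Period of report and the text sections come from the filing body and are not covered here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IndexEntry:
    cik: str
    company_name: str
    form_type: str
    filing_date: str        # normalized to YYYY-MM-DD
    accession_number: str
    file_name: str


def parse_index_line(line: str) -> Optional[IndexEntry]:
    """Parse one pipe-delimited index row; return None for non-data lines."""
    parts = [p.strip() for p in line.split("|")]
    if len(parts) != 5 or not parts[0].isdigit():
        return None  # header, separator, or malformed row
    cik, company, form_type, filed, file_name = parts
    # Assumption: daily indexes write dates as YYYYMMDD; normalize to ISO.
    if len(filed) == 8 and filed.isdigit():
        filed = f"{filed[:4]}-{filed[4:6]}-{filed[6:]}"
    # Assumption: the accession number is the file's base name without ".txt".
    base = file_name.rsplit("/", 1)[-1]
    accession = base[:-4] if base.endswith(".txt") else base
    return IndexEntry(cik, company, form_type, filed, accession, file_name)


# Made-up row for illustration only; not a real filing.
sample = "1234567|Example Corp|8-K|20240108|edgar/data/1234567/0001234567-24-000001.txt"
entry = parse_index_line(sample)
```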
AI Summarization

Model: Gemini 2.5 Flash (Google). Each filing goes through a multi-round process: the model iteratively builds a narrative summary from the filing text, then a final pass extracts structured fields into predefined output schemas. Fields vary by form type but include: key highlights, business description, risk factors, financial overview, and SEO metadata.

The model is instructed to extract and structure information from the filing rather than generate opinionated commentary. However, complex or unusually formatted filings may result in incomplete or imperfect summaries — the model operates on the text as presented, without access to outside context.
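
One way to picture the multi-round process is the sketch below. It is an assumption-heavy illustration, not Stockadora's implementation: the chunking strategy, prompt wording, field key names, and the `model` callable (a stand-in for whatever client calls Gemini) are all hypothetical. It shows an iterative narrative pass followed by a final structured-extraction pass whose output is checked against a required field list.

```python
import json
from typing import Callable

# Hypothetical key names for the fields listed in the text above.
REQUIRED_FIELDS = (
    "key_highlights",
    "business_description",
    "risk_factors",
    "financial_overview",
    "seo_metadata",
)


def chunk_text(text: str, max_chars: int = 8000) -> list[str]:
    """Split filing text into fixed-size chunks (a simplifying assumption)."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def summarize_filing(
    filing_text: str, model: Callable[[str], str], max_chars: int = 8000
) -> dict:
    """Build a narrative round by round, then extract structured fields."""
    narrative = ""
    for chunk in chunk_text(filing_text, max_chars):
        prompt = (
            "Extract and structure facts from the filing; add no opinions.\n"
            f"Summary so far:\n{narrative}\n\nNext filing excerpt:\n{chunk}"
        )
        narrative = model(prompt)  # each round refines the running summary

    final_prompt = (
        "Return a JSON object with keys "
        + ", ".join(REQUIRED_FIELDS)
        + " based on this summary:\n"
        + narrative
    )
    data = json.loads(model(final_prompt))  # raises ValueError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("model output is not a JSON object")
    missing = [key for key in REQUIRED_FIELDS if key not in data]
    if missing:
        raise ValueError(f"model output missing fields: {missing}")
    return data
```

Rejecting output that fails the schema check, rather than publishing it, is one possible way to handle the incomplete summaries mentioned above. The text does not say how the real pipeline handles them.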

Storage Format
JSON files stored on AWS S3. Path convention: <data-type>/companies/<YYYY-MM-DD>/<cik>-<slug>.json plus index files for listings (latest, by-date, by-week). Paths are derived from filing metadata (CIK, filing date, company slug) and are stable across reruns.
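
A minimal sketch of how such a path could be derived deterministically from filing metadata is shown below. The `slugify` rules (lowercase ASCII, hyphen-separated) and the `"ipos"` data-type value are assumptions for illustration, since the text only specifies the overall path convention.

```python
import re
import unicodedata
from datetime import date


def slugify(name: str) -> str:
    """Lowercase ASCII slug: runs of non-alphanumerics become one hyphen."""
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def summary_key(data_type: str, filing_date: date, cik: str, company_name: str) -> str:
    """Build <data-type>/companies/<YYYY-MM-DD>/<cik>-<slug>.json."""
    return (
        f"{data_type}/companies/{filing_date.isoformat()}/"
        f"{cik}-{slugify(company_name)}.json"
    )


# Hypothetical inputs; the same metadata always yields the same key.
key = summary_key("ipos", date(2024, 1, 8), "1234567", "Example Corp.")
```

Because the key depends only on the filing metadata and not on run time or random identifiers, rerunning the pipeline for the same filing overwrites the same object rather than creating a duplicate.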
Website Build
Astro static site generator. Data pulled from S3 at build time via aws s3 sync. All data baked into static HTML. Zero server-side runtime. Pages delivered from CDN. No JavaScript required to read content.
Automation Boundary
Everything above is fully automated. There is no systematic human review of individual summaries prior to publication. Accuracy derives from the SEC filing itself being the authoritative primary source — not from editorial oversight.

Limitations & Caveats

  • The summarization model may misinterpret or omit information from complex filings, particularly those with non-standard formatting or dense technical language.
  • Amended filings (e.g., S-1/A, 10-K/A) may not be reflected immediately. The pipeline processes new filings as they appear; amendments follow the same crawl schedule.
  • Coverage is not exhaustive. Not all SEC filers are processed. The pipeline is scoped to specific form types and volume thresholds.
  • Summaries reflect the filing content at time of processing. Subsequent company developments are not incorporated retroactively.