ESG / Climate · Earth Security

Climate Research Engine

Analysts spent weeks manually matching companies to carbon offset projects. This tool does it in seconds — vector search over sustainability disclosures, 60+ data points per company.

LangChain · FAISS · OpenAI · Streamlit · PostgreSQL · Pydantic · Docker · AWS ECS · Terraform

Challenge

Earth Security helps organisations navigate the carbon credit market. Their research team identifies which companies are most likely to invest in specific types of carbon projects — from mangrove restoration to clean energy. The intelligence they need is buried across sustainability reports, annual filings, and regulatory disclosures. Each research cycle meant weeks of manual document review.

The brief was tight: build an in-house analysis tool that helps their analysts identify leads and opportunities — discovery to working MVP, under £10k.

Approach

I worked with the analysts through a discovery phase to define the data points, data sources, and queries that mattered most to their research. The result was 60+ fields per company: climate commitments, carbon credit history, biodiversity targets, financial instruments, SBTi status. Each extraction prompt was specified with validation criteria so the LLM pipeline could reliably pull structured data from unstructured reports.
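To make "validation criteria" concrete, here is a minimal sketch of one extracted field group that rejects malformed model output. It is illustrative only. The stack lists Pydantic, but this uses standard-library dataclasses as a dependency-free stand-in, and the field names (`sbti_status`, `reduction_target_pct`, `target_year`, `evidence`) and allowed values are hypothetical, not the project's actual schema.

```python
import json
from dataclasses import dataclass

# Hypothetical allowed values; the real validation criteria are not given here.
SBTI_STATUSES = {"committed", "targets_set", "none", "unknown"}


@dataclass(frozen=True)
class ClimateTarget:
    """One structured record extracted from an unstructured report."""
    sbti_status: str
    reduction_target_pct: float
    target_year: int
    evidence: str  # verbatim passage that supports the extracted values

    def __post_init__(self) -> None:
        # Each check mirrors one validation criterion attached to the prompt.
        if self.sbti_status not in SBTI_STATUSES:
            raise ValueError(f"unexpected sbti_status: {self.sbti_status!r}")
        if not 0 <= self.reduction_target_pct <= 100:
            raise ValueError("reduction_target_pct must be between 0 and 100")
        if not 2000 <= self.target_year <= 2100:
            raise ValueError("target_year outside plausible range")
        if not self.evidence.strip():
            raise ValueError("evidence passage is required")


def parse_llm_output(raw: str) -> ClimateTarget:
    """Parse a JSON string returned by the model and validate it."""
    data = json.loads(raw)
    return ClimateTarget(
        sbti_status=str(data["sbti_status"]),
        reduction_target_pct=float(data["reduction_target_pct"]),
        target_year=int(data["target_year"]),
        evidence=str(data["evidence"]),
    )
```

A record that fails validation raises `ValueError`, so the pipeline can retry the prompt or flag the field for analyst review instead of storing a bad value.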

01 Ingest
Document Pipeline: Reports, filings & disclosures
Source Documents: Annual sustainability reports, ESG filings, CDP disclosures, net-zero strategy PDFs
Chunking & Embedding: Documents split into semantic sections and embedded with OpenAI for vector search
Vector Indexing: Indexed into FAISS for sub-second similarity search across the full corpus

02 Enrich
Multi-Source Enrichment: 7 sources · 60+ data points
Company Firmographics: Industry, HQ, employee count, revenue band, and key contacts via Apollo & HubSpot
Corporate Structure: Legal entity mapping and parent-subsidiary hierarchy from GLEIF registry
Climate Intelligence: LLM-extracted emission targets, SBTi status, credit history, green bonds, nature commitments

03 Match
Prospect Matching: Scored & ranked by fit
Vector Similarity: Matches company climate profiles against project characteristics using semantic search
Evidence Citations: Every match links back to the exact passage in the source document that supports it
Analyst-Ready Output: Ranked company profiles with match scores, ready for review and outreach decisions
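
Evidence citations depend on each chunk remembering where it came from. The sketch below shows one simple way to do that, assuming plain-text input. It splits on paragraph boundaries as a stand-in for the "semantic sections" described above, and `Chunk`, `chunk_paragraphs`, and `max_chars` are hypothetical names rather than the project's code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    start: int  # character offset of the chunk within the source text
    end: int
    text: str


def chunk_paragraphs(doc_id: str, text: str, max_chars: int = 500) -> list[Chunk]:
    """Group consecutive paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole rather than cut.
    """
    chunks: list[Chunk] = []
    pos = 0
    cur_start = None
    cur_end = 0
    for para in text.split("\n\n"):
        start, end = pos, pos + len(para)
        pos = end + 2  # skip the blank-line separator
        if not para.strip():
            continue
        # Close the current chunk if adding this paragraph would exceed the limit.
        if cur_start is not None and end - cur_start > max_chars:
            chunks.append(Chunk(doc_id, cur_start, cur_end, text[cur_start:cur_end]))
            cur_start = None
        if cur_start is None:
            cur_start = start
        cur_end = end
    if cur_start is not None:
        chunks.append(Chunk(doc_id, cur_start, cur_end, text[cur_start:cur_end]))
    return chunks
```

Because every chunk carries `doc_id`, `start`, and `end`, a match can link back to the exact passage that supports it.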

We started with their real research questions, not a product roadmap. The data points, sources, and scoring logic were defined with the analysts who’d use the tool every day.

Built from 7 sources, and growing.

Firmographics, corporate hierarchy, climate commitments, carbon credit history, and contact data — stitched together automatically so analysts never start from a blank spreadsheet.

Apollo (Company data): Firmographics, industry classification, employee count, contacts
HubSpot (CRM): CRM records, engagement history, deal stage and pipeline status
GLEIF (Legal entities): Legal entity identifiers, corporate hierarchy, parent-subsidiary mapping
Verra Registry (Carbon registry): VCS carbon project listings, verification status, credit issuance volumes
Ocean 100 (Marine data): Top 100 ocean economy companies, marine sustainability commitments
LLM Extraction (AI-extracted): Emission targets, SBTi status, carbon credit usage, green bond history
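
As a rough illustration of how records from several sources might be stitched into one profile, the sketch below merges per-source dictionaries in priority order and records which source supplied each field. The priority ordering, field names, and sample values are assumptions for the example, not the tool's actual merge rules.

```python
from __future__ import annotations


def merge_sources(records: list[tuple[str, dict]]) -> tuple[dict, dict]:
    """Merge per-source records for one company.

    `records` is ordered from highest to lowest priority. The first non-empty
    value for a field wins, and `provenance` notes which source supplied it.
    """
    profile: dict = {}
    provenance: dict = {}
    for source, fields in records:
        for key, value in fields.items():
            if value is None or value == "":
                continue  # treat missing values as gaps another source may fill
            if key not in profile:
                profile[key] = value
                provenance[key] = source
    return profile, provenance


# Illustrative values only.
profile, provenance = merge_sources([
    ("gleif", {"legal_name": "Example Holdings plc", "parent": None}),
    ("apollo", {"legal_name": "Example", "employees": 1200, "industry": "Shipping"}),
    ("hubspot", {"employees": 1000, "deal_stage": "prospect"}),
])
```

Keeping provenance alongside the merged profile lets an analyst see where each value came from when two sources disagree.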

Sixty data points sounds like a lot. It’s the minimum. Every field maps to a question their analysts were already asking manually — we just made the answers show up before they had to go looking.

The core of the tool is a vector store built on FAISS, indexing sustainability disclosures with OpenAI embeddings. LangChain orchestrates the RAG pipeline — from document ingestion through to the final matching output. Each company is scored against project types with evidence pulled directly from source documents.
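
The scoring idea can be sketched without any external services. The example below ranks project types by cosine similarity to a company vector, using plain Python as a stand-in for a FAISS index and toy 3-dimensional vectors in place of OpenAI embeddings. How the real match scores are computed is not specified here, so treat `rank_projects` and the vectors as hypothetical.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_projects(
    company_vec: list[float],
    project_vecs: dict[str, list[float]],
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Return the top_k project types most similar to the company profile."""
    scored = [(name, cosine(company_vec, vec)) for name, vec in project_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy vectors for illustration; real embeddings have many more dimensions.
projects = {
    "REDD+": [1.0, 0.0, 0.0],
    "Blue Carbon": [0.0, 1.0, 0.0],
    "Clean Cookstoves": [0.0, 0.0, 1.0],
}
ranked = rank_projects([0.9, 0.4, 0.1], projects, top_k=2)
```

A vector index performs the same nearest-neighbour lookup, but is built to stay fast as the corpus grows to many thousands of chunks.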

Companies: BP plc (Energy), Shell (Energy), Maersk (Shipping), Nestlé (Consumer)
Carbon Projects: REDD+, Blue Carbon, Clean Cookstoves, Renewable Energy

Analysts query the vector store in natural language through a Streamlit interface. No SQL, no filters — just ask the question. The system returns ranked matches with evidence pulled directly from source documents.

Climate Intelligence — Natural Language Search
Example queries
FTSE 100 companies investing in blue carbon projects: 23 results
Shipping companies with SBTi-approved reduction targets: 14 results
Companies that purchased REDD+ credits in the last 2 years: 31 results
European consumer brands with mangrove restoration investment: 8 results

The best tool is the one your team actually uses. Natural language search meant analysts didn’t need to learn SQL or wrestle with filters — they just asked the question the same way they’d ask a colleague.

Company profile snapshot
Company Profile — BP plc
BP plc · Energy · London, UK
Match score: 70%
Sector: Oil & Gas
Revenue: $164B
Employees: 67,600
Listed: LSE: BP
Science Based Targets · Nature Disclosure · Active Credit Buyer
Project Fit
REDD+ (Avoided deforestation): 82%, “$50M+ committed to nature-based solutions”
Blue Carbon (Coastal & marine): 71%, “Active mangrove restoration investment”
Clean Cookstoves (Household energy): 45%, “Scope 3 offset strategy includes household”
Key Evidence
50% emission reduction target by 2030
$4.75B in green bonds issued
Active VCS credit buyer: Forest Protection, Mangrove, Cookstove

Result

Discovery to MVP in 2 months, under £10k. What used to take weeks of manual document review now takes seconds.


The tool became a core part of the research workflow — replacing manual document review with structured, enriched company intelligence that surfaces the right prospects in seconds.