We're looking for a highly skilled Data Engineer to join our team. You'll be the first data hire, working closely with the backend team and reporting directly to the CEO/CTO. This is a fully on-site role in San Francisco, ideal for someone who wants to work with a lean, high-impact product team.
Core Responsibilities:
- You will be part of a small team, with significant ownership and autonomy over the work you manage.
- You will have the freedom to suggest and drive organization-wide initiatives.
- Build & Own ETL Pipelines: Aggregate and normalize data from public APIs and web-scraped sources; clean and structure property data for downstream use
- Design Scalable MySQL Schemas: Optimize relational models for query performance, indexing, and future scalability
- Automate Workflows: Set up robust orchestration using tools such as Airflow or dbt, replacing ad-hoc scripting with maintainable pipelines
- Ensure Data Quality: Implement validation checks and monitoring across pipelines to ensure consistency, completeness, and reliability
- Work Cross-Functionally: Collaborate with backend developers to deliver structured data that feeds app features, analytics, and AI models
Requirements:
- Strong experience with TypeScript and Python for building production-grade data pipelines
- Expertise in MySQL schema & database design, performance tuning, and query optimization
- Comfortable working with semi-structured data from various sources (JSON, CSV, XML, APIs, etc.)
- Experience building web scrapers and scripts to aggregate unstructured data from web sources
- Familiarity with orchestration and transformation tools (e.g., Airflow, dbt) and data processing best practices
- Solid understanding of data engineering principles: lineage, auditing, versioning, and scalability
- Familiarity with integrating LLM APIs
- Familiarity with prop-tech/appraisals, real estate, or geospatial data (nice to have)