See 🏎️ Available Positions.
VLM Run: The Unified Gateway for Visual AI
VLM Run is an end-to-end platform for developers to fine-tune, specialize, and operationalize Vision Language Models (VLMs). We aim to make VLM Run the go-to platform for running VLMs, with a unified structured-output API that’s versatile, powerful, and developer-friendly.

VLM Run provides a unified API for modern visual AI tasks:
- 📦 Unified Vision API: Harness the power of VLMs for tasks like OCR, tagging, image captioning, visual recommendations, and search—all under one roof.
- 🔄 ETL-Ready: Designed for visual ETL, our models extract JSON from diverse visual content—images, videos, presentations, and more—easily and accurately.
- 🎯 Hyper-Specialized: Fine-tune models for specific verticals or use cases within hours, ensuring enterprise-level outcomes with SLAs.
- 🛡️ Private: Deploy and operationalize visual AI securely in your private cloud, keeping sensitive data protected.
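To make the "extract JSON from visual content" idea concrete, here is a minimal sketch of consuming such a structured-output payload. The task name, schema fields, and response shape below are hypothetical illustrations for this sketch, not the documented VLM Run API.

```python
import json

# Hypothetical structured-output payload, as a visual-ETL API might return
# after extracting fields from an invoice image. The task identifier and
# schema fields here are illustrative assumptions, not VLM Run's actual API.
SAMPLE_RESPONSE = json.dumps({
    "task": "document.invoice",          # hypothetical task identifier
    "data": {                            # structured JSON extracted from the image
        "invoice_number": "INV-001",
        "total_amount": 42.50,
        "currency": "USD",
    },
})

def parse_extraction(raw: str) -> dict:
    """Decode a structured-output payload and return its extracted fields."""
    payload = json.loads(raw)
    return payload["data"]

fields = parse_extraction(SAMPLE_RESPONSE)
print(fields["invoice_number"])  # INV-001
```

Because the output is plain JSON against a fixed schema, it can be validated and loaded straight into downstream pipelines, which is what makes the extraction "ETL-ready".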
🧑‍💻 Why join us?
Our founding team is deeply technical, with decades of combined experience in computer vision, robotics, and production-scale ML infrastructure. We’ve built systems for autonomous vehicles, production ML platforms, and large-scale VLMs, and have led initiatives at top AI labs (Toyota Research, AWS AI Labs, MIT/CMU PhDs) and infrastructure platforms (PyTorch Lightning). Backed by leading Silicon Valley and New York VCs, we combine rigorous academic foundations with real-world impact to build an enterprise-ready platform for visual intelligence.
- Work with domain experts: From self-driving cars and AR/VR to robotics, we’ve tackled cutting-edge computer vision challenges in academia and industry.
- Build with AI-native tools: We’re constantly reimagining our developer stack with tools like Devin, Cursor, and Deep Research to 100x our developer leverage and ship faster.
- Work on the next frontier for visual AI: We believe that VLMs will make the last decade of computer-vision methods obsolete. In 5 years, using ConvNets or OCR will be like using a fax machine, and more than 90% of computer-vision workloads will be powered by VLMs.
<aside>
📫
Send us your GitHub profile with links to popular repos/work to [email protected].
</aside>