Lin Qiao, CEO & Co-Founder of Fireworks AI – Interview Series – AI Guido (2024)

Lin Qiao, formerly head of Meta’s PyTorch, is the Co-Founder and CEO of Fireworks AI. Fireworks AI is a production AI platform built for developers. Fireworks partners with the world’s leading generative AI researchers to serve the best models at the fastest speeds. Fireworks AI recently raised a $25M Series A.

What initially attracted you to computer science?

My dad was a really senior mechanical engineer at a shipyard, where he built cargo ships from scratch. From a young age, I learned to read the precise angles and measurements of ship blueprints, and I loved it.

I was very much into STEM from middle school onward – I devoured everything math, physics and chemistry. One of my high school assignments was to learn BASIC programming, and I coded a game about a snake eating its tail. After that, I knew computer science was in my future.

While at Meta you led 300+ world-class engineers in AI frameworks & platforms where you built and deployed Caffe2, and later PyTorch. What were a few of your key takeaways from this experience?

Big Tech companies like Meta are always five or more years ahead of the curve. When I joined Meta in 2015, we were at the beginning of our AI journey – making the shift from CPUs to GPUs. We had to design AI infrastructure from the ground up. Frameworks like Caffe2 were groundbreaking when they were created, but AI evolved so fast that they quickly grew outdated. We developed PyTorch and the entire system around it as a solution.

PyTorch is where I learned about the biggest roadblocks developers face in the race to build AI. The first challenge is finding a stable and reliable model architecture that is low latency and flexible enough for models to scale. The second challenge is total cost of ownership, so companies don’t go bankrupt trying to grow their models.

My time at Meta showed me how important it is to keep models and frameworks like PyTorch open source. It encourages innovation. We would not have grown as much as we did at PyTorch without open-source opportunities for iteration. Plus, it’s impossible to stay up to date on all the latest research without collaboration.

Can you discuss what led you to launch Fireworks AI?

I’ve been in the tech industry for more than 20 years, and I’ve seen wave after wave of industry-level shifts – from the cloud to mobile apps. But this AI shift is a complete tectonic realignment. I saw plenty of companies struggling with this change. Everyone wanted to move fast and put AI first, but they lacked the infrastructure, resources and talent to make it happen. The more I talked to these companies, the more I realized I could solve this gap in the market.

I launched Fireworks AI both to solve this problem and to serve as an extension of the incredible work we achieved at PyTorch. It even inspired our name! PyTorch is the torch holding the fire – but we want that fire to spread everywhere. Hence: Fireworks.

I have always been passionate about democratizing technology, and about making it affordable and easy for developers to innovate regardless of their resources. That’s why we have such a user-friendly interface and robust support systems to empower builders to bring their visions to life.

Could you discuss what developer-centric AI is and why it is so important?

It’s simple: “developer-centric” means prioritizing the needs of AI developers. For example: creating tools, communities and processes that make developers more efficient and autonomous.

Developer-centric AI platforms like Fireworks should integrate into existing workflows and tech stacks. They should make it easy for developers to experiment, make mistakes and improve their work. They should encourage feedback, because it’s developers themselves who understand what they need to be successful. Lastly, it’s about more than just being a platform. It’s about being a community – one where collaborating developers can push the boundaries of what’s possible with AI.

The GenAI platform you’ve developed is a significant advancement for developers working with large language models (LLMs). Can you elaborate on the unique features and advantages of your platform, especially compared to existing solutions?

Our entire approach as an AI production platform is unique, but some of our best features are:

Efficient inference – We engineered Fireworks AI for efficiency and speed. Developers using our platform can run their LLM applications at the lowest possible latency and cost. We achieve this with the latest model and service optimization techniques, including prompt caching, adaptable sharding, quantization, continuous batching, FireAttention, and more.
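Of the techniques listed above, weight quantization is the easiest to illustrate. Below is a minimal sketch of symmetric int8 quantization – a toy illustration of the general idea, not Fireworks’ actual implementation:

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map floats onto integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [qi * scale for qi in q]

weights = [0.5, -1.27, 0.02, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# int8 storage uses 1 byte per weight instead of 4 for float32
# (4x less memory bandwidth), at the cost of a small rounding
# error of at most half the scale per weight.
```

The same trade-off – less memory traffic in exchange for bounded precision loss – is what makes quantization a standard lever for lowering inference latency and cost.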

Affordable support for LoRA-tuned models – We provide affordable serving of low-rank adaptation (LoRA) fine-tuned models via multi-tenancy on base models. This means developers can experiment with many different use cases or variations on the same model without breaking the bank.
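The cost advantage of LoRA multi-tenancy follows from simple parameter arithmetic: each tenant’s adapted weight is W + B·A, where B and A are small low-rank matrices, so the large base W can be shared across all tenants. A back-of-the-envelope sketch with illustrative numbers (not Fireworks internals):

```python
# Toy LoRA parameter count for one d x d weight matrix: the shared
# base W is stored once, while each tenant adds only B (d x r) and
# A (r x d) with rank r much smaller than d.
d, r = 1024, 8

base_params = d * d                 # shared once across all tenants
adapter_params = d * r + r * d      # per-tenant LoRA matrices B and A

def total_params(n_tenants):
    """Multi-tenant serving: one base plus n small adapters."""
    return base_params + n_tenants * adapter_params

def naive_params(n_tenants):
    """Naive alternative: host a full model copy per fine-tune."""
    return n_tenants * base_params
```

With r = 8 and d = 1024, each adapter is about 1.6% of the base layer, so serving 100 fine-tuned variants costs roughly 2.6x one base model rather than 100x – which is where the "without breaking the bank" claim comes from.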

Simple interfaces and APIs – Our interfaces and APIs are straightforward and easy for developers to integrate into their applications. Our APIs are also OpenAI compatible for ease of migration.
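OpenAI compatibility means a chat-completions request to such a platform has the same shape as one sent to OpenAI – typically only the base URL, API key, and model identifier change. A sketch of the request payload (the model name below is hypothetical, for illustration only):

```python
import json

# An OpenAI-compatible chat-completions payload. Migrating existing
# OpenAI client code generally means swapping the base URL, API key,
# and model name while keeping this structure unchanged.
payload = {
    "model": "accounts/fireworks/models/some-oss-llm",  # hypothetical name
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize LoRA in one sentence."},
    ],
    "max_tokens": 128,
    "temperature": 0.2,
}

# This body would be POSTed to <base_url>/chat/completions with an
# Authorization: Bearer <key> header.
body = json.dumps(payload)
```

Because the schema is shared, switching providers is a configuration change rather than a rewrite.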

Off-the-shelf models and fine-tuned models – We offer more than 100 pre-trained models that developers can use out of the box. We cover the best LLMs, image generation models, embedding models, etc. But developers can also choose to host and serve their own custom models. We also offer self-serve fine-tuning services to help developers tailor these custom models with their proprietary data.

Community collaboration – We believe in the open-source ethos of community collaboration. Our platform encourages (but doesn’t require) developers to share their fine-tuned models and contribute to a growing bank of AI assets and knowledge. Everyone benefits from growing our collective expertise.

Could you discuss the hybrid approach that is offered between model parallelism and data parallelism?

Parallelizing machine learning models improves the efficiency and speed of model training and helps developers handle models too large for a single GPU to process.

Model parallelism involves dividing a model into multiple parts and training each part on separate processors. Data parallelism, on the other hand, divides datasets into subsets and trains the same model on each subset at the same time across separate processors. A hybrid approach combines these two methods: models are divided into separate parts, each of which is trained on different subsets of data, improving efficiency, scalability and flexibility.
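The hybrid idea can be simulated in plain Python, with no real GPUs: the “model” here is a single weight vector whose slices live on different workers (model parallelism), while the batch is sharded across replicas whose results are merged back together (data parallelism). Purely illustrative:

```python
def dot(w, x):
    """The full 'model': a dot product over the whole weight vector."""
    return sum(wi * xi for wi, xi in zip(w, x))

def model_parallel_dot(w, x, n_parts=2):
    """Model parallelism: each worker holds one slice of the weights,
    computes a partial dot product, and the partials are summed."""
    k = len(w) // n_parts
    partials = [dot(w[i * k:(i + 1) * k], x[i * k:(i + 1) * k])
                for i in range(n_parts)]
    return sum(partials)

def hybrid_forward(w, batch, n_replicas=2):
    """Hybrid: shard the batch across replicas (data parallelism),
    and run the model-parallel computation inside each replica."""
    shards = [batch[i::n_replicas] for i in range(n_replicas)]
    outs = [[model_parallel_dot(w, x) for x in shard] for shard in shards]
    merged = [None] * len(batch)           # restore original batch order
    for i, shard_out in enumerate(outs):
        merged[i::n_replicas] = shard_out
    return merged

w = [0.5, -1.0, 2.0, 0.25]
batch = [[1, 2, 3, 4], [4, 3, 2, 1], [0, 1, 0, 1]]
```

The key property is that the sharded computation produces exactly the same outputs as running the full model on the full batch on one device – the parallelism changes where the work happens, not the result.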

Fireworks AI is used by over 20,000 developers and is currently serving over 60 billion tokens per day. What challenges have you faced in scaling your operations to this level, and how have you overcome them?

I’ll be honest, there have been many high mountains to cross since we founded Fireworks AI in 2022.

Our customers first came to us looking for very low latency support, because they’re building applications for consumers, prosumers or other developers — all audiences that need speedy solutions. Then, when our customers’ applications began to scale fast, they realized they couldn’t afford the typical costs associated with that scale. They then asked us to help lower total cost of ownership (TCO), which we did. Then our customers wanted to migrate from OpenAI to OSS models, and they asked us to provide on-par or even better quality than OpenAI. We made that happen too.

Each step in our product’s evolution was a difficult problem to tackle, but it meant our customers’ needs truly shaped Fireworks into what it is today: a lightning-fast inference engine with low TCO. Plus, we offer both an assortment of high-quality, out-of-the-box models to choose from, and fine-tuning services for developers to create their own.

With the rapid advancements in AI and machine learning, ethical considerations are more important than ever. How does Fireworks AI address concerns related to bias, privacy, and ethical use of AI?

I have two teenage daughters who use genAI apps like ChatGPT often. As a mom, I worry about them finding misleading or inappropriate content, because the industry is just starting to tackle the critical problem of content safety. Meta is doing a lot with the Purple Llama project, and Stability AI’s new SD3 models are great. Both companies are working hard to bring safety to their new Llama 3 and SD3 models with multiple layers of filters. The input-output safeguard model, Llama Guard, does get a good amount of usage on our platform, but its adoption is not yet on par with other LLMs. The industry as a whole still has a long way to go to bring content safety and AI ethics to the forefront.

We at Fireworks care deeply about privacy and security. We’re HIPAA and SOC 2 compliant, and we offer secure VPC and VPN connectivity. Companies trust Fireworks with their proprietary data and models to build their business moat.

What is your vision for how AI will evolve?

Just as AlphaGo demonstrated autonomy while learning to play Go on its own, I think we’ll see genAI applications become more and more autonomous. Apps will automatically route and direct requests to the right agent or API to process, and course-correct until they retrieve the right output. And instead of one function-calling model polling others as a controller, we’ll see more self-organized, self-coordinated agents working in unison to solve problems.
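The routing-and-course-correcting behavior described here can be pictured as a small dispatch loop: a controller matches each request to an agent capable of handling it, and falls back to the next candidate when one fails. A toy sketch with invented agent names (not any real framework’s API):

```python
# Toy request router. Agent names and capabilities are made up
# purely to illustrate the routing idea.
AGENTS = {
    "search_agent": {"handles": "lookup"},
    "math_agent": {"handles": "calculate"},
    "summarizer_agent": {"handles": "summarize"},
}

def route(request_kind):
    """Return the names of agents able to handle this kind of request."""
    return [name for name, spec in AGENTS.items()
            if spec["handles"] == request_kind]

def dispatch(request_kind, run):
    """Try each capable agent in turn until one returns a result
    (a crude form of course-correction on failure)."""
    for name in route(request_kind):
        result = run(name)
        if result is not None:
            return name, result
    raise RuntimeError(f"no agent could handle {request_kind!r}")
```

In the self-organized future described above, this central controller would dissolve into agents that negotiate routing among themselves, but the match-dispatch-retry loop is the same primitive.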

Fireworks’ lightning-fast inference, function-calling models and fine-tuning service have paved the way for this reality. Now it’s up to innovative developers to make it happen.
