Generative AI’s path to production echoes traditional AI, but with new twists

Subbiah Sethuraman
May 27, 2024


AI proof of concept? Nailed it. Production rollout? Take a breath. While gen AI’s initial promise has ignited excitement across industries, transitioning PoCs into real-world applications presents a new set of hurdles demanding thoughtful leadership.

Gartner has been showcasing the massive leap in AI adoption: over 80% of enterprises are projected to use generative AI APIs or models, or deploy gen AI-enabled applications, in production by 2026, compared to a mere 5% in 2023. This is a phenomenal shift given the historical challenge of moving AI to production, where roughly 8 in 10 AI solutions have traditionally failed to deliver in the real world.

And while generative AI, combined with classical AI, offers immense potential, navigating the journey from PoC to production requires careful consideration of its unique challenges.

The stakes are higher for scaling generative AI solutions

In contrast to classical AI, generative AI models are now often set up to interact directly with end consumers, raising the stakes for robust performance and responsible implementation. Consider the case of a Chevrolet dealership chatbot that was tricked into agreeing to sell a car for $1, a stark reminder of the potential pitfalls.

This shift in focus, from model-centric to product-centric, is exposing new twists, and leaders must watch out for these common pitfalls:

  1. A lack of domain knowledge for Large Language Model (LLM) domain translation. For example, in the life sciences industry, patient line-of-therapy definitions can vary based on the therapy and the organizations involved. Similarly, customer segmentation definitions can vary by industry, business needs, targeting points and more. To handle this translation, gen AI solutions must use either in-context learning or domain fine-tuning. However, both approaches come with their own challenges. The effectiveness of in-context learning is highly sensitive to the input examples, while domain-specific fine-tuning can lead to over-specialization, reducing the model’s zero-shot capabilities. Consequently, both approaches may cause scaling issues during deployment if consumer input is not controlled (a minimal few-shot prompting sketch follows this list).
  2. Return-on-investment roadblocks. With conventional AI, organizations report minimal to no return on investment from 70% of applications, hindering wider adoption. Today’s wave of fast gen AI solutions is largely made up of chatbots and co-pilots, some 75% of solutions, and most are aimed at areas such as call centers, sales reps, customer interactions and email generation.

    However, many of these solutions prioritize speed over value, focusing on tasks where faster completion doesn’t always translate to higher impact. Organizations might miss the bigger picture by solely focusing on immediate gains.

    Consider this: a patent-authoring co-pilot can significantly reduce patent filing time, allowing an organization to file more patents efficiently, an outcome that may hold far higher value than simply speeding up routine tasks.
  3. LLMs can produce many different answers to the same prompt. This variability sometimes helps them find better solutions, but it can also lead to inconsistent user experiences that undermine trust. The stochastic nature of LLMs affects the experience on several dimensions, including repeatability, reproducibility, adaptability and latency (a decoding-controls sketch follows this list).
  4. Gen AI applications in production introduce a range of challenges in evaluation and troubleshooting. One key dimension involves monitoring and logging key performance parameters and errors. Many gen AI applications rely heavily on third-party service APIs for LLM calls, which introduces potential fault points beyond an organization’s direct control. That’s why it’s vital to track application parameters such as response time, service downtime, rate limits and error logs generated by these APIs. This monitoring helps teams proactively manage potential issues and ensure smoother operations for their gen AI-based applications (a logging-wrapper sketch follows this list).
  5. The cost of operating LLMs must be managed carefully. For example, depending on the model chosen, the cost of processing the same 1 million input tokens can vary significantly, ranging from $2 to $60. Therefore, monitoring cost and usage patterns within an application is essential, and implementing hard limits on user-level tokens helps manage usage anomalies that may arise while the application is live (a cost-tracking sketch follows this list). Other operating cost elements include the infrastructure and personnel needed to maintain the application’s operation and functionality. Proving out good cost-management practices during scaling contributes significantly to overall sustainability and profitability in the eyes of business decision-makers.
  6. Without the right guardrails in place, gen AI may be vulnerable to misuse or adversarial attacks, posing significant risks to both organizations and their customers. While the promise of generative AI is undeniable, successfully scaling these solutions requires a thoughtful and strategic approach to risk. The EU’s AI Act provides detailed guidelines on what constitutes safe and responsible AI to help shape your approach.
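
To make the in-context learning option from pitfall 1 concrete, here is a minimal few-shot prompting sketch for domain translation. The line-of-therapy and segmentation definitions are invented for illustration, and the message format simply follows the common chat-completion convention; adapt both to your own domain rules and provider.

```python
# A minimal sketch of in-context learning for domain translation.
# The example definitions below are hypothetical; swap in your organization's
# own rules and send the resulting messages to whichever chat client you use.

FEW_SHOT_EXAMPLES = [
    {
        "question": "When does a new line of therapy start for oncology patients?",
        "answer": "For this organization, a new line starts when a novel drug class "
                  "is added or treatment restarts after a gap of more than 60 days.",
    },
    {
        "question": "How do we define a 'high-value' customer segment?",
        "answer": "Accounts in the top revenue decile with at least two product "
                  "categories purchased in the trailing 12 months.",
    },
]

def build_messages(user_question: str) -> list[dict]:
    """Assemble a few-shot prompt that grounds the model in domain definitions."""
    messages = [{"role": "system",
                 "content": "Answer using only the domain definitions shown in the examples."}]
    for ex in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": ex["question"]})
        messages.append({"role": "assistant", "content": ex["answer"]})
    messages.append({"role": "user", "content": user_question})
    return messages

if __name__ == "__main__":
    msgs = build_messages("A patient paused treatment for 90 days and resumed "
                          "the same regimen. Is that a new line of therapy?")
    for m in msgs:
        print(f"{m['role']}: {m['content']}")
```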
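
For pitfall 3, here is a sketch of decoding controls that trade creativity for repeatability. The call_llm stub stands in for your provider’s SDK; temperature, top_p and seed are common knobs, though not every provider honors a seed deterministically. Caching identical prompts also helps with latency.

```python
# A sketch of decoding controls and caching for more repeatable answers.
# call_llm() is a placeholder for a real API call, not a specific SDK.
import hashlib

DETERMINISTIC_PARAMS = {
    "temperature": 0.0,   # pick the most likely token instead of sampling
    "top_p": 1.0,         # no nucleus truncation needed once temperature is 0
    "seed": 42,           # best-effort reproducibility where the API supports it
}

_cache: dict[str, str] = {}

def call_llm(prompt: str, **params) -> str:
    """Placeholder for a real API call; returns a canned answer in this sketch."""
    return f"[model answer to: {prompt[:40]}...]"

def repeatable_answer(prompt: str) -> str:
    """Serve identical prompts from a cache so users see consistent answers."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_llm(prompt, **DETERMINISTIC_PARAMS)
    return _cache[key]

if __name__ == "__main__":
    print(repeatable_answer("Summarize our returns policy in one sentence."))
    print(repeatable_answer("Summarize our returns policy in one sentence."))  # cache hit
```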
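
For pitfall 4, here is a sketch of instrumenting third-party LLM calls so operations teams can see latency, errors and rate-limit retries. The call_llm_api function and RateLimitError class are placeholders for whichever provider client and exceptions you actually use.

```python
# A sketch of monitoring third-party LLM calls: log latency, errors and
# rate-limit retries so API problems surface before users report them.
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("genai.monitoring")

class RateLimitError(Exception):
    """Placeholder for the provider-specific rate-limit exception."""

def call_llm_api(prompt: str) -> str:
    """Placeholder for the real third-party API call."""
    return f"[answer to: {prompt[:30]}...]"

def monitored_call(prompt: str, max_retries: int = 3) -> str:
    for attempt in range(1, max_retries + 1):
        start = time.perf_counter()
        try:
            response = call_llm_api(prompt)
            log.info("llm_call ok attempt=%d latency_ms=%.0f",
                     attempt, (time.perf_counter() - start) * 1000)
            return response
        except RateLimitError:
            wait = 2 ** attempt
            log.warning("llm_call rate-limited attempt=%d backoff_s=%d", attempt, wait)
            time.sleep(wait)
        except Exception:
            log.exception("llm_call failed attempt=%d", attempt)
            raise
    raise RuntimeError("LLM API unavailable after retries")

if __name__ == "__main__":
    print(monitored_call("Draft a follow-up email for an unresponsive customer."))
```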
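
For pitfall 5, here is a sketch of cost estimation and per-user token budgets. The model names and prices are illustrative placeholders reflecting the rough $2-to-$60-per-million-input-tokens spread mentioned above; check your provider’s current price list, and move the in-memory counters to a shared store in a real deployment.

```python
# A sketch of cost tracking and per-user token budgets with illustrative prices.
from collections import defaultdict

PRICE_PER_MILLION_INPUT_TOKENS = {
    "small-model": 2.00,     # hypothetical budget-tier model
    "frontier-model": 60.00  # hypothetical premium-tier model
}

DAILY_TOKEN_BUDGET_PER_USER = 50_000  # hard cap to contain usage anomalies

_usage: dict[str, int] = defaultdict(int)

def estimate_cost(model: str, input_tokens: int) -> float:
    """Dollar cost of the input side of a request."""
    return PRICE_PER_MILLION_INPUT_TOKENS[model] * input_tokens / 1_000_000

def check_and_record(user_id: str, input_tokens: int) -> None:
    """Reject requests that would push a user past the daily budget."""
    if _usage[user_id] + input_tokens > DAILY_TOKEN_BUDGET_PER_USER:
        raise RuntimeError(f"user {user_id} exceeded the daily token budget")
    _usage[user_id] += input_tokens

if __name__ == "__main__":
    check_and_record("analyst-42", 12_000)
    # The same 1 million input tokens costs $2 on one model and $60 on another:
    print(estimate_cost("small-model", 1_000_000), estimate_cost("frontier-model", 1_000_000))
```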

In conclusion, the journey of generative AI to production mirrors many aspects of traditional AI deployment, yet it introduces unique challenges and opportunities. As organizations integrate generative AI into their workflows, they must navigate complexities such as model transparency, ethical considerations, and the need for robust data governance. The potential for generative AI to revolutionize industries is immense, but it requires careful planning, continuous learning, and adaptive strategies to harness its full capabilities responsibly.

You can see the full version of this article here: Navigating Gen AI: From proof of concept to production | ZS
