Design A/B rollout analysis and drift detection for BERTScore on a production LLM app in agent-based workflows.
Design A/B rollout analysis and drift detection for BERTScore on a production LLM app in a code assistant.
Design A/B rollout analysis and drift detection for BERTScore on a production LLM app in a summarization feed.
Design A/B rollout analysis and drift detection for BERTScore on a production LLM app in search + answer over docs.
Design A/B rollout analysis and drift detection for ROUGE-L on a production LLM app in agent-based workflows.
Design A/B rollout analysis and drift detection for ROUGE-L on a production LLM app in a code assistant.
Design A/B rollout analysis and drift detection for ROUGE-L on a production LLM app in a customer support chat.
Design A/B rollout analysis and drift detection for ROUGE-L on a production LLM app in a summarization feed.
Design A/B rollout analysis and drift detection for ROUGE-L on a production LLM app in search + answer over docs.
Design A/B rollout analysis and drift detection for an instruction-following score on a production LLM app in a customer support chat.
Design A/B rollout analysis and drift detection for an instruction-following score on a production LLM app in a summarization feed.
Design A/B rollout analysis and drift detection for an instruction-following score on a production LLM app in a code assistant.