Design A/B rollout analysis and drift detection for pass@1 (code) on a production LLM app in a summarization feed.
Design A/B rollout analysis and drift detection for groundedness on a production LLM app in agent-based workflows.
Design A/B rollout analysis and drift detection for groundedness on a production LLM app in search + answer over docs.
Design A/B rollout analysis and drift detection for groundedness on a production LLM app in customer support chat.
Design A/B rollout analysis and drift detection for groundedness on a production LLM app in a code assistant.
Design A/B rollout analysis and drift detection for groundedness on a production LLM app in a summarization feed.
Design A/B rollout analysis and drift detection for exact match on a production LLM app in search + answer over docs.
Design A/B rollout analysis and drift detection for exact match on a production LLM app in customer support chat.
Design A/B rollout analysis and drift detection for exact match on a production LLM app in agent-based workflows.
Design A/B rollout analysis and drift detection for exact match on a production LLM app in a code assistant.
Design A/B rollout analysis and drift detection for exact match on a production LLM app in a summarization feed.
Design A/B rollout analysis and drift detection for BERTScore on a production LLM app in customer support chat.