LLM Model Evaluator

A side-by-side comparison tool that sends a prompt to two LLMs simultaneously and displays both responses in real time. Built to explore how different models handle the same input — useful for evaluating tone, accuracy, verbosity, and reasoning style.

Stack: Python · Streamlit · Groq API

Features:

  • Enter any prompt once and send it to two models at the same time
  • Responses displayed side-by-side for easy comparison
  • Powered by Groq’s fast inference API