LLM Model Evaluator

A side-by-side comparison tool that sends a single prompt to two LLMs at the same time and displays both responses in real time. Built to explore how different models handle the same input, which is useful for comparing tone, accuracy, verbosity, and reasoning style.

Stack: Python · Streamlit · Groq API

Features:

Enter any prompt once and send it to two models at the same time
Responses displayed side-by-side for easy comparison
Powered by Groq’s fast inference API
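
The "send once, compare two" behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `compare_models`, `AskFn`, and `fake_ask` are hypothetical names, the model names are placeholders, and the API call is replaced by an offline stand-in so the example runs without a Groq key.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence

# Hypothetical signature: takes (model_name, prompt) and returns the reply text.
AskFn = Callable[[str, str], str]


def compare_models(prompt: str, models: Sequence[str], ask: AskFn) -> Dict[str, str]:
    """Send one prompt to every model concurrently and return {model: reply}."""
    # One worker per model so the requests overlap instead of running back to back.
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        futures = {model: pool.submit(ask, model, prompt) for model in models}
        # Collect results in the order the models were given.
        return {model: future.result() for model, future in futures.items()}


def fake_ask(model: str, prompt: str) -> str:
    """Offline stand-in for a real API call (assumption, for illustration only)."""
    return f"[{model}] reply to: {prompt}"


if __name__ == "__main__":
    replies = compare_models(
        "Explain recursion in one sentence.",
        ["model-a", "model-b"],  # placeholder model names
        fake_ask,
    )
    for model, text in replies.items():
        print(model, "->", text)
```

In a real app, `ask` would presumably wrap a Groq chat-completion request, and the two replies could be rendered in Streamlit with `st.columns(2)`; both details are assumptions about how this project is wired, not taken from its source.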

Jenny Faulkner
