https://github.com/ask-fini/paramount

# paramount

Agent accuracy measurements for LLMs.

Paramount lets your expert agents evaluate AI chats, enabling:

* quality assurance
* ground truth capturing
* automated regression testing

## Usage

![Example usage](usage.gif)

## Getting Started

1. Install the package:

    ```bash
    pip install paramount
    ```

2. Decorate your AI function:

    ```python
    @paramount.record()
    def my_ai_function(message_history, new_question):  # Inputs
        new_message = {'role': 'user', 'content': new_question}
        updated_history = message_history + [new_message]
        return updated_history  # Outputs
    ```

3. After `my_ai_function(...)` has run several times, launch the Paramount UI to evaluate the recorded results (see the sketch after these steps):

    ```bash
    paramount
    ```
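Put together, the steps above might look like the short script below. This is a minimal sketch rather than part of the package: the sample questions and the loop are invented for illustration, while `paramount.record()` and the `paramount` command come straight from the steps above.

```python
import paramount


@paramount.record()
def my_ai_function(message_history, new_question):  # Inputs
    # In a real assistant, the LLM call would typically happen here as well.
    new_message = {'role': 'user', 'content': new_question}
    updated_history = message_history + [new_message]
    return updated_history  # Outputs


if __name__ == "__main__":
    history = []
    # Illustrative questions only - each call is recorded by the decorator.
    for question in ["What does Paramount record?", "Where is the data stored?"]:
        history = my_ai_function(history, question)

    # Once several invocations have been recorded, launch the UI from a shell:
    #   paramount
```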
Your SMEs can now evaluate the recordings and track accuracy improvements over time. Paramount runs completely offline in your private environment.

## Usage

After installation, run `python example.py` for a minimal working example.

## Configuration

To set up Paramount, define which input and output parameters represent the chat list passed to the LLM. This is done via the `paramount.toml` configuration file in your project root directory. If it doesn't already exist, it will be autogenerated with defaults on first run.

```toml
[record]
enabled = true
function_url = "http://localhost:9000"  # The url to your LLM API flask app, for replay

[db]
type = "csv"  # postgres also available

[db.postgres]
connection_string = ""

[api]
endpoint = "http://localhost"  # url and port for paramount UI/API
port = 9001
split_by_id = false  # In case you have several bots and want to split them by ID
identifier_colname = ""

[ui]
# These are display elements for the UI
# For the table display - define which columns should be shown
meta_cols = ['recorded_at']
input_cols = ['args__message_history', 'args__new_question']  # Matches my_ai_function() example
output_cols = ['1', '2']  # 1 and 2 are indexes for llm_answer and llm_references in example above

# For the chat display - describe how your chat structure is set up. This example uses OpenAI format.
chat_list = "output__1"  # Matches output updated_history. Must be a list of dicts to display chat format
chat_list_role_param = "role"  # Key in list of dicts describing the role in the chat
chat_list_content_param = "content"  # Key in list of dicts describing the content
```

It is also possible to describe references via the config, but this is not shown here for simplicity. See `paramount.toml.example` for more info.

## For Developers

Deeper configuration instructions for the client and server can be seen here.

### Docker

Using `Dockerfile.server`, you can containerize and deploy the whole package (including the client). With Docker, you will need to mount the `paramount.toml` file dynamically into the container for it to work.

```bash
docker build -t paramount-server -f Dockerfile.server .  # or: make docker-build-server
docker run -dp 9001:9001 paramount-server                # or: make docker-run-server
```

## License

This project is under the GPL License for individuals. Companies with more than 1000 invocations per month or more than 100 employees require a commercial license.