Influxion

Create and Manage Model Sets

Background

If this is your first time here, we recommend reading our Mission Statement before starting with Model Sets. You will find Model Sets immensely more useful—and easier to configure—if you can answer the question, "What do you care about?"

For additional background, we also recommend reading the Model Sets Overview.

Create a Model Set

A Model Set can be created using one of two methods:

  1. Clone an existing Model Set.
  2. Create a new Model Set from scratch.

Only API keys owned by your account can use your Model Sets in the Influxion gateway. However, a Model Set's configuration and behavior are publicly visible in the web interface, where both you and other users can clone it.

Clone an existing Model Set

It's often easier to start from an existing Model Set that performs similar tasks and has similar requirements as your use case. You can then tune the configuration from a known starting point.

  1. Navigate to Model Registry → Model Sets in the sidebar.
  2. Search for existing Model Sets based on names, categories, or tags.
  3. If you find one you're interested in, click on its card to view its details.
  4. Click the Clone button in the upper right.
  5. Adjust the behavior settings, Evals integrations, and model selections as desired. Use the Behavior Projections to iterate on your choices.
  6. Fill out the details at the end.
  7. Click Clone.

Create a new Model Set

  1. Navigate to Model Registry → Model Sets in the sidebar.
  2. Click New.
  3. Start specifying the behaviors you desire.
  4. Select models from the available providers. Use the Behavior Projections to iterate on behavior settings and model selections.
  5. Fill out the details at the end.
  6. Click Deploy.

Feasibility

When creating or editing a Model Set, Influxion attempts to validate your requirements against recent Model and Model Set behaviors. There are several possible results:

If the interface says, "Achievable: Yes", then it's likely that the Model Set configuration will be able to satisfy your requirements.

If the interface says, "Achievable: Partial", then you may be close, but some dimensions may still need tuning. You can still deploy the Model Set, and Influxion will make a best effort to satisfy your requirements.

If the interface says, "Achievable: No", then either your requirements are unrealistic or you should look for alternative model deployments to use. You can still deploy the Model Set, but its behavior is unlikely to be satisfactory.

If there is insufficient historical data to determine feasibility, Influxion will sample models in the Model Set for a short period after you deploy it to collect the required metrics. Monitor the Model Set's behavior afterward and edit its configuration as needed.

Evals

Influxion provides LLM Evals integration with DeepEval. Evals dimensions can be used as part of your behavioral requirements, or simply added for observability as additional usage metrics.

Evals require additional LLM usage, so they are charged in addition to your Model Set gateway requests, using the same pricing structure.

LLM Evals currently execute using the openai/gpt-4o-mini model. When creating or editing a Model Set, you can specify the sampling probability, i.e., the likelihood of evaluating each individual gateway request. This value defaults to 0.1, meaning roughly 10% of requests in the Model Set will be evaluated.
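The gateway's internal sampling logic isn't exposed, but per-request sampling at a fixed probability can be sketched as a simple Bernoulli trial (the function and variable names below are illustrative):

```python
import random

def should_evaluate(sampling_probability: float, rng: random.Random) -> bool:
    # Each gateway request is independently selected for evaluation
    # with the configured probability (a Bernoulli trial).
    return rng.random() < sampling_probability

# With the default of 0.1, roughly 10% of requests are evaluated.
rng = random.Random(42)
evaluated = sum(should_evaluate(0.1, rng) for _ in range(10_000))
print(evaluated)  # close to 1,000 out of 10,000
```

Because sampling is per-request, the evaluated share over short windows will fluctuate around the configured probability rather than match it exactly.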

Deployed Model Sets

Once deployed, use the Model Set Slug as the model parameter in your API requests. It behaves just like a plain model ID, except that the slug begins with an @ symbol.

Shell

curl -X POST https://api.influxion.io/v1/chat/completions \
  -H "Authorization: Bearer $INFLUXION_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "@your-username/your-deployment-name",
    "messages": [
      {
        "role": "user",
        "content": "Hello, world!"
      }
    ]
  }'

Python

import os
import requests

response = requests.post(
    "https://api.influxion.io/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['INFLUXION_API_KEY']}",
        "Content-Type": "application/json"
    },
    json={
        "model": "@your-username/your-deployment-name",
        "messages": [
            {
                "role": "user",
                "content": "Hello, world!"
            }
        ]
    }
)

print(response.json())

JavaScript

const response = await fetch(
  "https://api.influxion.io/v1/chat/completions",
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.INFLUXION_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "@your-username/your-deployment-name",
      messages: [
        {
          role: "user",
          content: "Hello, world!",
        },
      ],
    }),
  }
);

const data = await response.json();
console.log(data);

Monitor a Model Set

As you use a Model Set in your application, behavior metrics are measured and summarized on the Model Set page. Expand individual Performance, Cost, Usage, and Error metrics to view a time series of each behavior dimension.

By default, the behavior for the entire Model Set is shown. Using the dropdown on each figure, you can show the behaviors of the individual models in the set, too.

You can also switch between average behaviors (the default), and p50, p90, and p99 values.
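As a reference for reading those views, the pN value is the latency that N% of requests come in at or under. A quick nearest-rank sketch over made-up latency samples:

```python
import math

def percentile(values, p):
    # Nearest-rank percentile: the smallest sample such that at least
    # p% of the samples are less than or equal to it.
    ordered = sorted(values)
    k = math.ceil(p / 100 * len(ordered)) - 1
    return ordered[max(k, 0)]

# Hypothetical per-request latencies in milliseconds.
latencies = [120, 95, 180, 210, 640, 130, 115, 990, 105, 160]

print(percentile(latencies, 50))  # 130
print(percentile(latencies, 90))  # 640
print(percentile(latencies, 99))  # 990
```

Note the gap between p50 and p99 here: a handful of slow requests dominates the tail, which is why the percentile views are worth checking alongside the average.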

Edit a Model Set

Model Sets are reconfigurable. Models come and go and requirements change with time. After all, that's why we're here.

From a Model Set's page, click the edit icon next to the Clone button in the upper right corner. Modify the settings as needed—everything except the name and slug is editable.

Delete a Model Set

COMING SOON!

Caveats

The Chat Completions API is publicly known and used by different model providers. However, providers may respond in their own "dialect" or with other quirks. Influxion currently returns their responses unmodified. We recommend making your application robust to these variations, especially when using Model Sets that route across multiple providers.
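One way to build in that robustness is a defensive extraction helper that tolerates common shape differences. The fallback field names below are illustrative, not a guaranteed list of provider dialects:

```python
def extract_text(completion: dict) -> str:
    # Pull the assistant text out of a Chat Completions-style response,
    # tolerating minor differences in where providers put the content.
    choices = completion.get("choices") or []
    if not choices:
        raise ValueError(f"no choices in response: {completion!r}")
    choice = choices[0]
    # Most providers nest the reply under "message"; streaming chunks
    # and some older dialects use "delta" or a top-level "text" field.
    message = choice.get("message") or choice.get("delta") or {}
    text = message.get("content")
    if not isinstance(text, str):
        text = choice.get("text")
    if not isinstance(text, str):
        raise ValueError(f"unrecognized choice shape: {choice!r}")
    return text
```

Failing loudly on an unrecognized shape, rather than silently returning an empty string, makes it easier to notice when a newly routed provider responds in a dialect you haven't handled yet.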

Influxion does not currently route based on features in individual requests. We recommend that you choose models that support any capabilities your application needs, like tool calling or different modalities.

Model Set constraints are satisfied in terms of average behavior. Individual request behavior is generally too noisy to control without concrete SLAs from downstream providers, which typically require expensive provisioning of reserved GPUs.
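To make that concrete, here is a toy illustration with made-up latencies: a Model Set can satisfy an average-latency requirement even when individual requests occasionally blow past it.

```python
import statistics

# Hypothetical latencies (ms) for ten gateway requests, with one outlier.
latencies = [80, 90, 85, 95, 100, 700, 88, 92, 86, 84]
target_ms = 150  # a hypothetical requirement: average latency <= 150 ms

print(statistics.fmean(latencies))             # 150.0: the average meets the target
print(all(l <= target_ms for l in latencies))  # False: one request far exceeds it
```

If your application is sensitive to tail latency rather than the mean, monitor the p90 and p99 views rather than relying on the average alone.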