Replicate API
    • List collections of models (GET)
    • Get a collection of models (GET)
    • List available hardware for models (GET)
    • List public models (GET)
    • Create a model (POST)
    • Get a model (GET)
    • List model versions (GET)
    • Delete a model version (DELETE)
    • Get a model version (GET)
    • Create a training (POST)
    • List predictions (GET)
    • Create a prediction (POST)
    • Get a prediction (GET)
    • Cancel a prediction (POST)
    • List trainings (GET)
    • Get a training (GET)
    • Cancel a training (POST)

      Create a prediction

      POST
      /v1/predictions
      Start a new prediction for the model version and inputs you provide.
      Example request body:
      {
        "version": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
        "input": {
          "text": "Alice"
        }
      }
      Example cURL request:
      curl -s -X POST \
        -d '{"version": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa", "input": {"text": "Alice"}}' \
        -H "Authorization: Token <paste-your-token-here>" \
        -H 'Content-Type: application/json' \
        https://api.replicate.com/v1/predictions
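      The same request can be sketched with Python's standard library alone (a minimal sketch; reading the token from a `REPLICATE_API_TOKEN` environment variable is an assumption here, use wherever you keep your token):

```python
import json
import os
import urllib.request

# Build the same request as the cURL example above.
# Assumes the token lives in the REPLICATE_API_TOKEN environment variable.
token = os.environ.get("REPLICATE_API_TOKEN", "<paste-your-token-here>")

body = json.dumps({
    "version": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
    "input": {"text": "Alice"},
}).encode()

req = urllib.request.Request(
    "https://api.replicate.com/v1/predictions",
    data=body,
    method="POST",
    headers={
        "Authorization": f"Token {token}",
        "Content-Type": "application/json",
    },
)

# response = urllib.request.urlopen(req)   # uncomment to actually send
# prediction = json.load(response)
```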
      The response will be the prediction object:
      {
        "id": "gm3qorzdhgbfurvjtvhg6dckhu",
        "model": "replicate/hello-world",
        "version": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
        "input": {
          "text": "Alice"
        },
        "logs": "",
        "error": null,
        "status": "starting",
        "created_at": "2023-09-08T16:19:34.765994657Z",
        "urls": {
          "cancel": "https://api.replicate.com/v1/predictions/gm3qorzdhgbfurvjtvhg6dckhu/cancel",
          "get": "https://api.replicate.com/v1/predictions/gm3qorzdhgbfurvjtvhg6dckhu"
        }
      }
      As models can take several seconds or more to run, the output will not be available immediately. To get the final result of the prediction you should either provide a webhook HTTPS URL for us to call when the results are ready, or poll the get a prediction endpoint until it has finished.
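      The polling approach can be sketched as a small helper. This is illustrative only: `wait_for_prediction` and its `get_prediction` callable are not part of the API; the callable stands in for a GET to the prediction's `urls.get` URL.

```python
import time

# Terminal states per the prediction lifecycle described above.
TERMINAL_STATES = {"succeeded", "failed", "canceled"}

def wait_for_prediction(get_prediction, interval=1.0, timeout=300.0):
    """Poll until the prediction reaches a terminal state.

    `get_prediction` is any callable returning the prediction JSON as a
    dict (e.g. a GET to the prediction's `urls.get` URL).
    """
    deadline = time.monotonic() + timeout
    while True:
        prediction = get_prediction()
        if prediction["status"] in TERMINAL_STATES:
            return prediction
        if time.monotonic() > deadline:
            raise TimeoutError("prediction did not finish in time")
        time.sleep(interval)
```

      For production use, prefer the webhook mechanism over tight polling loops.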
      Input and output (including any files) will be automatically deleted after an hour, so you must save a copy of any files in the output if you'd like to continue using them.
      Output files are served by replicate.delivery and its subdomains. If you use an allow list of external domains for your assets, add replicate.delivery and *.replicate.delivery to it.
      Request example (Shell):
      curl --location --request POST 'https://api.replicate.com/v1/predictions' \
      --header 'Content-Type: application/json' \
      --data-raw '{
          "input": {},
          "stream": true,
          "version": "string",
          "webhook": "string",
          "webhook_events_filter": [
              "start"
          ]
      }'
      Response example:
      {}

      Request

      Authorization
      Provide your bearer token in the Authorization header when making requests to protected resources.
      Example:
      Authorization: Bearer ********************
      Body Params application/json
      input
      object 
      optional

      The model's input as a JSON object. The input schema depends on what model you are running. To see the available inputs, click the "API" tab on the model you are running or get the model version and look at its openapi_schema property. For example, stability-ai/sdxl takes prompt as an input.

      Files should be passed as HTTP URLs or data URLs.

      Use an HTTP URL when:

      • you have a large file (larger than 256 kB)
      • you want to be able to use the file multiple times
      • you want your prediction metadata to be associable with your input files

      Use a data URL when:

      • you have a small file (256 kB or smaller)
      • you don't want to upload and host the file somewhere
      • you don't need to use the file again (Replicate will not store it)
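      Building a data URL from a small file can be sketched like this (a minimal sketch; `file_to_data_url` is an illustrative helper, not part of the API):

```python
import base64

def file_to_data_url(data: bytes, mime_type: str) -> str:
    """Encode a small (<= 256 kB) file as a data URL for a prediction input."""
    if len(data) > 256 * 1024:
        raise ValueError("file too large for a data URL; host it and pass an HTTP URL")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

      The result can be passed directly as a value inside the `input` object.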
      stream
      boolean 
      optional
      Request a URL to receive streaming output using server-sent events (SSE).
      If the requested model version supports streaming, the returned prediction will have a stream entry in its urls property with an HTTPS URL that you can use to construct an EventSource.
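      Outside a browser there is no built-in EventSource, but the SSE wire format is simple to parse. A minimal sketch (`parse_sse` is an illustrative helper; it assumes the stream URL's response lines have already been fetched and decoded):

```python
def parse_sse(lines):
    """Minimal server-sent-events parser: yields (event, data) pairs.

    `lines` is an iterable of decoded text lines from the stream URL
    (the `stream` entry in the prediction's `urls` property).
    """
    event, data = "message", []
    for line in lines:
        if line == "":                        # blank line terminates one event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
    if data:                                  # flush a trailing event
        yield event, "\n".join(data)
```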
      version
      string 
      optional
      The ID of the model version that you want to run.
      webhook
      string 
      optional
      An HTTPS URL for receiving a webhook when the prediction has new output. The webhook will be a POST request where the request body is the same as the response body of the get prediction operation. If there are network problems, we will retry the webhook a few times, so make sure it can be safely called more than once.
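      Because deliveries may be retried, webhook handlers should be idempotent. One minimal sketch (illustrative only, with an in-memory store standing in for a database) deduplicates by prediction id and status:

```python
# In-memory stand-in for a persistent store keyed by (id, status).
processed = {}

def handle_webhook(prediction: dict) -> bool:
    """Process a webhook delivery at most once per (id, status) pair.

    Returns True if the payload was processed, False if it was a
    duplicate retry and was ignored.
    """
    key = (prediction["id"], prediction["status"])
    if key in processed:
        return False
    processed[key] = prediction
    # ... do real work here (store output, notify user, etc.)
    return True
```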
      webhook_events_filter
      array[string]
      optional
      By default, we will send requests to your webhook URL whenever there are new outputs or the prediction has finished. You can change which events trigger webhook requests by specifying webhook_events_filter in the prediction request:
      start: immediately on prediction start
      output: each time a prediction generates an output (note that predictions can generate multiple outputs)
      logs: each time log output is generated by a prediction
      completed: when the prediction reaches a terminal state (succeeded/canceled/failed)
      For example, if you only wanted requests to be sent at the start and end of the prediction, you would provide:
      {
        "version": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
        "input": {
          "text": "Alice"
        },
        "webhook": "https://example.com/my-webhook",
        "webhook_events_filter": ["start", "completed"]
      }
      Requests for event types output and logs will be sent at most once every 500ms. If you request start and completed webhooks, then they'll always be sent regardless of throttling.
      Allowed values: start, output, logs, completed
      Responses

      200 Success
      application/json
      Body
      object (empty)