66 changes: 66 additions & 0 deletions docs/02_concepts/13_exceptions.mdx
@@ -0,0 +1,66 @@
---
id: error-handling
title: Error handling
description: The exceptions an Actor can raise and how to handle them
---

import HandleCallErrorsSource from '!!raw-loader!roa-loader!./code/13_handle_call_errors.py';
import RetryTimedOutSource from '!!raw-loader!roa-loader!./code/13_retry_timed_out.py';
import ApiLink from '@theme/ApiLink';
import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';

When you run an Actor, exceptions come from a few layers: the Apify API client for failed API requests, the Apify SDK for misuse and invalid input, and the libraries you build on, such as Crawlee. This page maps the ones you are most likely to meet and how to approach each.

## Errors from the Apify API

Every SDK operation that talks to the Apify API can raise <ApiLink to="class/ApifyApiError">`ApifyApiError`</ApiLink>. This includes <ApiLink to="class/Actor#start">`Actor.start`</ApiLink>, <ApiLink to="class/Actor#call">`Actor.call`</ApiLink>, <ApiLink to="class/Actor#abort">`Actor.abort`</ApiLink>, <ApiLink to="class/Actor#metamorph">`Actor.metamorph`</ApiLink>, <ApiLink to="class/Actor#add_webhook">`Actor.add_webhook`</ApiLink>, charging, and all storage operations on datasets, key-value stores, and request queues. The SDK raises these client exceptions as-is, so you keep the HTTP status code, the error type, and the response data on the exception.

<ApiLink to="class/ApifyApiError">`ApifyApiError`</ApiLink> dispatches to a subclass based on the HTTP status code:

- <ApiLink to="class/InvalidRequestError">`InvalidRequestError`</ApiLink> (400) when the API rejects the request as malformed.
- <ApiLink to="class/UnauthorizedError">`UnauthorizedError`</ApiLink> (401) and <ApiLink to="class/ForbiddenError">`ForbiddenError`</ApiLink> (403) for an unauthorized or forbidden request.
- <ApiLink to="class/NotFoundError">`NotFoundError`</ApiLink> (404) when the Actor, run, or storage does not exist.
- <ApiLink to="class/ConflictError">`ConflictError`</ApiLink> (409) for a conflicting request.
- <ApiLink to="class/RateLimitError">`RateLimitError`</ApiLink> (429) when the API rate limit is hit.
- <ApiLink to="class/ServerError">`ServerError`</ApiLink> for any 5xx response.
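
As a rough illustration of the mapping above, the following sketch resolves a status code to the name of the subclass you would expect to catch. The `expected_error_name` helper is hypothetical and exists only for this example. It is not how the client performs the dispatch internally, and the fallback to the base class for other status codes is an assumption.

```python
# Status codes that have a dedicated subclass, as listed above.
_EXACT = {
    400: 'InvalidRequestError',
    401: 'UnauthorizedError',
    403: 'ForbiddenError',
    404: 'NotFoundError',
    409: 'ConflictError',
    429: 'RateLimitError',
}


def expected_error_name(status_code: int) -> str:
    """Return the name of the error class expected for an HTTP status code.

    Hypothetical helper for illustration only, not part of the SDK or the client.
    """
    if status_code in _EXACT:
        return _EXACT[status_code]
    if 500 <= status_code <= 599:
        return 'ServerError'
    # Assumption: any other failing status is reported as the base class.
    return 'ApifyApiError'
```

A lookup like this can help when you log or count failures by kind, while the `except` clauses still do the actual handling.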

The client retries rate-limited and server errors on its own, so you only see <ApiLink to="class/RateLimitError">`RateLimitError`</ApiLink> or <ApiLink to="class/ServerError">`ServerError`</ApiLink> once those retries are exhausted. The `apify.errors` module re-exports the whole client error hierarchy, so you can import everything from one place:

```python
from apify.errors import ApifyApiError, NotFoundError, RateLimitError
```

Catch <ApiLink to="class/ApifyApiError">`ApifyApiError`</ApiLink> to handle any API failure in one place, then branch on the subclass or the HTTP `status_code`. To react to a specific failure, catch its subclass first:

<RunnableCodeBlock className="language-python" language="python">
{HandleCallErrorsSource}
</RunnableCodeBlock>

## Misuse and invalid input

The SDK raises standard Python exceptions when it is used incorrectly or given invalid input. These usually point to a bug, a bad argument, or a misconfiguration on your side, so the fix is to correct the call or the setup rather than to catch the exception.

- [`RuntimeError`](https://docs.python.org/3/library/exceptions.html#RuntimeError) when an `Actor` method is used outside the `async with Actor:` block, either before initialization or after exit, or when the Actor is initialized twice.
- [`ValueError`](https://docs.python.org/3/library/exceptions.html#ValueError) for an invalid argument, such as a malformed `timeout`, an invalid proxy configuration, charging an automatically charged event by hand, or pushing data that is not JSON-serializable or is over the size limit.
- [`TypeError`](https://docs.python.org/3/library/exceptions.html#TypeError) for an argument of the wrong type.
- [`ConnectionError`](https://docs.python.org/3/library/exceptions.html#ConnectionError) when <ApiLink to="class/Actor#create_proxy_configuration">`Actor.create_proxy_configuration`</ApiLink> verifies Apify Proxy access and the proxy reports that you have none.
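
Because these exceptions indicate a bad call, one option is to validate data before it reaches the SDK. The sketch below checks JSON-serializability using only the standard library. The `is_json_serializable` helper is hypothetical, and it does not reproduce the SDK's own validation or its size limit.

```python
import json
from typing import Any


def is_json_serializable(item: Any) -> bool:
    """Return True if `item` can be encoded with the standard `json` module."""
    try:
        json.dumps(item)
    except (TypeError, ValueError):
        # TypeError: unsupported type; ValueError: for example a circular reference.
        return False
    return True
```

You could run such a check on each item while developing, then remove it once the data shape is stable.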

## Run failures

<ApiLink to="class/Actor#call">`Actor.call`</ApiLink> and <ApiLink to="class/Actor#call_task">`Actor.call_task`</ApiLink> wait for the run to finish and return it, whatever its final status. A finished run can be `SUCCEEDED`, `FAILED`, `ABORTED`, or `TIMED-OUT`, so check `run.status` before you rely on the run's output. A timed-out run is the one case where retrying can help, as long as you give it more time:

<RunnableCodeBlock className="language-python" language="python">
{RetryTimedOutSource}
</RunnableCodeBlock>
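
To make the doubling in the example above concrete, this small sketch computes the timeout used on each attempt without calling any Actor. The `timeout_schedule` function is a hypothetical helper written for this page.

```python
from datetime import timedelta


def timeout_schedule(initial: timedelta, max_attempts: int) -> list[timedelta]:
    """Return the timeout for each attempt when it doubles after every timed-out run."""
    return [initial * (2**attempt) for attempt in range(max_attempts)]


schedule = timeout_schedule(timedelta(minutes=5), max_attempts=3)
print([str(t) for t in schedule])  # ['0:05:00', '0:10:00', '0:20:00']
```

With these values, three timed-out attempts add up to 35 minutes of allowed run time, which is worth weighing against the cost of the calls.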

## The pay-per-event charge limit

Reaching the pay-per-event charge limit does not raise an error. The SDK caps charging and data pushing instead, and your Actor keeps running. When a single <ApiLink to="class/Actor#charge">`Actor.charge`</ApiLink> call would cross the limit, only the part that fits within the budget is billed, and `charged_count` on the returned <ApiLink to="class/ChargeResult">`ChargeResult`</ApiLink> reports how many events went through. <ApiLink to="class/Actor#push_data">`Actor.push_data`</ApiLink> behaves the same way when given a `charged_event_name`, writing only the items that fit within the budget. To detect the limit, check the `event_charge_limit_reached` field on the `ChargeResult`. It is a return value rather than an exception, so you can read it in a tight charging loop and stop your work once the budget runs out. For details, see [Pay-per-event monetization](./pay-per-event).
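
The loop described above might look like the following sketch. It runs without the SDK: `StubChargeResult` and `StubCharger` are hypothetical stand-ins that only mimic the `charged_count` and `event_charge_limit_reached` fields, and the real `Actor.charge` is asynchronous and takes an event name.

```python
from dataclasses import dataclass


@dataclass
class StubChargeResult:
    """Stand-in for `ChargeResult` with only the two fields discussed above."""

    charged_count: int
    event_charge_limit_reached: bool


class StubCharger:
    """Hypothetical in-memory charger that caps charging at a fixed event budget."""

    def __init__(self, budget: int) -> None:
        self.remaining = budget

    def charge(self, count: int = 1) -> StubChargeResult:
        charged = min(count, self.remaining)  # Bill only what fits in the budget.
        self.remaining -= charged
        return StubChargeResult(charged, self.remaining == 0)


def process(items: list[str], charger: StubCharger) -> list[str]:
    """Process items one by one and stop once the charge limit is reached."""
    done: list[str] = []
    for item in items:
        result = charger.charge(count=1)
        if result.charged_count == 0:
            break  # Nothing was billed, so skip the unpaid work.
        done.append(item)
        if result.event_charge_limit_reached:
            break  # The budget is exhausted; stop before the next item.
    return done
```

Checking `charged_count` before doing the work and the limit flag after it keeps the loop from producing results that cannot be billed.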

## Errors while crawling

If your Actor runs a [Crawlee](https://crawlee.dev/python) crawler, failures inside request handlers surface as Crawlee exceptions, and Crawlee handles the retries and session rotation around them, so a single failing request does not stop the crawl. API calls you make from inside a handler still raise <ApiLink to="class/ApifyApiError">`ApifyApiError`</ApiLink>, so handle those as in the first section. For details, see the [Crawlee documentation](https://crawlee.dev/python).

## Conclusion

Most failures you handle at runtime are <ApiLink to="class/ApifyApiError">`ApifyApiError`</ApiLink> from the API client. Catch it to cover any API failure, and reach for a subclass or the HTTP `status_code` when you need finer control. The standard [`RuntimeError`](https://docs.python.org/3/library/exceptions.html#RuntimeError), [`ValueError`](https://docs.python.org/3/library/exceptions.html#ValueError), and [`TypeError`](https://docs.python.org/3/library/exceptions.html#TypeError) signal a bug or bad input, so correct the call rather than catch them. After <ApiLink to="class/Actor#call">`Actor.call`</ApiLink>, check `run.status` to react to a failed run, and let Crawlee handle the errors raised inside a crawler.
29 changes: 29 additions & 0 deletions docs/02_concepts/code/13_handle_call_errors.py
@@ -0,0 +1,29 @@
import asyncio

from apify import Actor
from apify.errors import ApifyApiError, NotFoundError


async def main() -> None:
    async with Actor:
        try:
            run = await Actor.call('apify/web-scraper', run_input={'startUrls': []})
        except NotFoundError:
            # Catch a specific subclass first.
            Actor.log.error('The Actor to call does not exist.')
            return
        except ApifyApiError as exc:
            # Any other API failure, e.g. an invalid token or a server error.
            Actor.log.error(f'Calling the Actor failed: {exc} (HTTP {exc.status_code}).')
            return

        # `Actor.call` returns the finished run whatever its status, so check it.
        if run.status != 'SUCCEEDED':
            Actor.log.error(f'Run {run.id} ended with status {run.status}.')
            return

        Actor.log.info(f'Run {run.id} finished successfully.')


if __name__ == '__main__':
    asyncio.run(main())
24 changes: 24 additions & 0 deletions docs/02_concepts/code/13_retry_timed_out.py
@@ -0,0 +1,24 @@
import asyncio
from datetime import timedelta

from apify import Actor


async def main() -> None:
    async with Actor:
        timeout = timedelta(minutes=5)
        max_attempts = 3

        for attempt in range(1, max_attempts + 1):
            run = await Actor.call('apify/web-scraper', timeout=timeout)

            if run.status != 'TIMED-OUT' or attempt == max_attempts:
                Actor.log.info(f'Run {run.id} ended with status {run.status}.')
                break

            timeout *= 2
            Actor.log.warning(f'Timed out, retrying with timeout {timeout}.')


if __name__ == '__main__':
    asyncio.run(main())
1 change: 1 addition & 0 deletions src/apify/_utils.py
@@ -74,6 +74,7 @@ def is_running_in_ipython() -> bool:
'Actor',
'Charging',
'Configuration',
'Errors',
'Event data',
'Event managers',
'Events',
34 changes: 34 additions & 0 deletions src/apify/errors.py
@@ -0,0 +1,34 @@
"""`apify.errors` re-exports the Apify API client's error hierarchy.

Callers get a single import location for every error raised by an operation that talks to the Apify API. The SDK
raises these client exceptions as-is and does not wrap them in its own types. See
https://docs.apify.com/api/client/python for the full client error reference.
"""

from __future__ import annotations

from apify_client.errors import (
    ApifyApiError,
    ApifyClientError,
    ConflictError,
    ForbiddenError,
    InvalidRequestError,
    InvalidResponseBodyError,
    NotFoundError,
    RateLimitError,
    ServerError,
    UnauthorizedError,
)

__all__ = [
    'ApifyApiError',
    'ApifyClientError',
    'ConflictError',
    'ForbiddenError',
    'InvalidRequestError',
    'InvalidResponseBodyError',
    'NotFoundError',
    'RateLimitError',
    'ServerError',
    'UnauthorizedError',
]
24 changes: 24 additions & 0 deletions tests/unit/test_errors.py
@@ -0,0 +1,24 @@
from __future__ import annotations

import apify_client.errors as client_errors

import apify.errors as sdk_errors


def test_client_errors_are_re_exported() -> None:
    """`apify.errors` re-exports the API client error hierarchy so callers have a single import location."""
    names = [
        'ApifyApiError',
        'ApifyClientError',
        'ConflictError',
        'ForbiddenError',
        'InvalidRequestError',
        'InvalidResponseBodyError',
        'NotFoundError',
        'RateLimitError',
        'ServerError',
        'UnauthorizedError',
    ]
    assert set(sdk_errors.__all__) == set(names)
    for name in names:
        assert getattr(sdk_errors, name) is getattr(client_errors, name)
42 changes: 42 additions & 0 deletions website/docusaurus.config.js
@@ -9,6 +9,7 @@ const GROUP_ORDER = [
'Actor',
'Charging',
'Configuration',
'Errors',
'Event data',
'Event managers',
'Events',
@@ -149,6 +150,47 @@ module.exports = {
moduleShortcutsPath: join(__dirname, '/module_shortcuts.json'),
},
reexports: [
// Errors
{
url: 'https://docs.apify.com/api/client/python/reference/class/ApifyApiError',
group: 'Errors',
},
{
url: 'https://docs.apify.com/api/client/python/reference/class/ApifyClientError',
group: 'Errors',
},
{
url: 'https://docs.apify.com/api/client/python/reference/class/ConflictError',
group: 'Errors',
},
{
url: 'https://docs.apify.com/api/client/python/reference/class/ForbiddenError',
group: 'Errors',
},
{
url: 'https://docs.apify.com/api/client/python/reference/class/InvalidRequestError',
group: 'Errors',
},
{
url: 'https://docs.apify.com/api/client/python/reference/class/InvalidResponseBodyError',
group: 'Errors',
},
{
url: 'https://docs.apify.com/api/client/python/reference/class/NotFoundError',
group: 'Errors',
},
{
url: 'https://docs.apify.com/api/client/python/reference/class/RateLimitError',
group: 'Errors',
},
{
url: 'https://docs.apify.com/api/client/python/reference/class/ServerError',
group: 'Errors',
},
{
url: 'https://docs.apify.com/api/client/python/reference/class/UnauthorizedError',
group: 'Errors',
},
// Storages
{
url: 'https://crawlee.dev/python/api/class/Storage',