This is a personal note that I decided to share. It reflects my understanding of a subject and may contain errors and approximations. Feel free to contribute by contacting me here. Any help will be credited!

Understanding Bedrock Model Resources

In Bedrock, we can invoke three kinds of model resources:

Foundation Model (managed by AWS): Targets a model deployed by Bedrock in a specific region.
Inference Profile (managed by AWS): Targets a model; requests are routed to multiple regions. It provides better throughput and performance.
Application Inference Profile (managed by developers): Targets a model either via Inference Profile or Foundation Model directly. Application Inference Profiles are wrappers used for governance. We can tag them to track costs and usage via CloudWatch and Cost Explorer.

Diagram showing the relationships between Amazon Bedrock model resources.

Direct Model Invocation Bypasses Cost Tracking

From a policy perspective, if you want to call an Application Inference Profile with InvokeModel or InvokeModelWithResponseStream, you need to allow actions on the Application Inference Profile, as well as on the underlying Inference Profile and Foundation Model. This is because Bedrock evaluates permissions against the final resolved model ARN, not only the entry-point ARN.

However, some clients that use Bedrock may resolve model ARNs directly, bypassing higher-level abstractions like Application Inference Profiles. So they call directly an Inference Profile or a Foundation Model. This is what I experienced with Claude Code, particularly with the small Haiku model. By bypassing the Application Inference Profile, it also bypasses the tracking. Which can lead to uncontrolled costs and usage.

Restricting Bedrock Usage to Application Inference Profiles

To address this issue, we should configure the policy to restrict calls to the Inference Profile and Foundation Models, so they can only be invoked via the Application Inference Profile.

Example of Policy for Claude Code with two Application Inference Profiles :

one for Sonnet
one for Haiku.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            // Allow resolved invocation only when coming from an Application Inference Profile
	        "Sid": "AllowInvokeViaApplicationInferenceProfileOnly",
	        "Effect": "Allow",
	        "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream"
            ],
            "Resource": [
                "arn:aws:bedrock:<AWS_REGION>:<AWS_ACCOUNT_NUMBER>:inference-profile/eu.anthropic.claude-sonnet-4-5-20250929-v1:0",
                "arn:aws:bedrock:<AWS_REGION>:<AWS_ACCOUNT_NUMBER>:inference-profile/eu.anthropic.claude-haiku-4-5-20251001-v1:0",
                "arn:aws:bedrock:<AWS_REGION>::foundation-model/anthropic.claude-sonnet-4-5-20250929-v1:0",
                "arn:aws:bedrock:<AWS_REGION>::foundation-model/anthropic.claude-haiku-4-5-20251001-v1:0",
            ],
            "Condition": {
                "StringEquals": {
                    "bedrock:InferenceProfileArn": [
                        "arn:aws:bedrock:<AWS_REGION>:<AWS_ACCOUNT_NUMBER>:application-inference-profile/<MODEL_ID>",
                          "arn:aws:bedrock:<AWS_REGION>:<AWS_ACCOUNT_NUMBER>:application-inference-profile/<MODEL_ID>"
                    ]
                }
            }
        },
        {
            // Allow direct invocation of the Application Inference Profiles
            "Sid": "AllowInvokeApplicationInferenceProfileOnly",
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream"
            ],
            "Resource": [
                "arn:aws:bedrock:<AWS_REGION>:<AWS_ACCOUNT_NUMBER>:application-inference-profile/<MODEL_ID>",
                "arn:aws:bedrock:<AWS_REGION>:<AWS_ACCOUNT_NUMBER>:application-inference-profile/<MODEL_ID>"
            ]
        }
    ]
}

⚠

In IAM Policy, bedrock:InferenceProfileArn takes also Application Inference Profile ARN. bedrock:ApplicationInferenceProfileArn does not exist at the time.

Moreover, we can Deny all Inference Profile and Foundation Models invocations when the source Inference Profile arn is null, ensuring that clients cannot bypass the Application Inference Profiles.

{
    "Version": "2012-10-17",
    "Statement": [
        {
	        // Guardrail against clients bypassing application inference profiles
            "Sid": "DenyDirectModelInvoke",
            "Effect": "Deny",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream"
            ],
            "Resource": [
                "arn:aws:bedrock:*::foundation-model/*",
                "arn:aws:bedrock:*:<AWS_ACCOUNT_NUMBER>:inference-profile/*"
            ],
            "Condition": {
                "Null": {
                    "bedrock:InferenceProfileArn": "true"
                }
            },
        }
    ]
}

Diagram showing Model Invocation & IAM Permissions.

A Note on Claude Code, Amazon Bedrock, and Cost Tracking

Understanding Bedrock Model Resources

Direct Model Invocation Bypasses Cost Tracking

Restricting Bedrock Usage to Application Inference Profiles

References

Understanding Bedrock Model Resources#

Direct Model Invocation Bypasses Cost Tracking#

Restricting Bedrock Usage to Application Inference Profiles#

References#

Understanding Bedrock Model Resources

Direct Model Invocation Bypasses Cost Tracking

Restricting Bedrock Usage to Application Inference Profiles

References