Describe the question/issue
Fluent Bit logs show an AccessDeniedException error because Fluent Bit tries to create a log group that it is not allowed to create and that we never configured:
time="2024-06-13T18:48:52Z" level=error msg="AccessDeniedException: User: arn:aws:sts::xxxxxxxxxxxx:assumed-role/fluentbit-task-role/xxxxxxx is not authorized to perform: logs:CreateLogGroup on resource: arn:aws:logs:us-east-1:xxxxxxxxxxxx:log-group:fluent-bit-cloudwatch:log-stream: because no identity-based policy allows the logs:CreateLogGroup action\n\tstatus code: 400, request id: xxxxxxxx"
However, our output plugin setting is:
[OUTPUT]
    Name              cloudwatch_logs
    Match             *
    region            ca-central-1
    log_group_name    testname
    log_stream_name   teststream
    auto_create_group false
    Retry_Limit       no_limits
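For reference, with auto_create_group false we would expect stream-level permissions to be enough for this output. A minimal task-role policy sketch (the actions are the ones the Fluent Bit cloudwatch_logs docs list, minus logs:CreateLogGroup; the account id here is a placeholder, not our real policy):

data "aws_iam_policy_document" "fluentbit_task_logs" {
  statement {
    actions = [
      "logs:CreateLogStream",
      "logs:DescribeLogStreams",
      "logs:PutLogEvents",
    ]
    # Log-group name matches the [OUTPUT] above; account id is illustrative.
    resources = [
      "arn:aws:logs:ca-central-1:123456789012:log-group:testname:*",
    ]
  }
}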
During fluent-bit startup we see the following logs:
[2024/06/13 19:22:22] [ info] cloudwatch.0
...
time="2024-06-13T19:22:22Z" level=info msg="[cloudwatch 0] plugin parameter auto_create_stream = 'true'"
time="2024-06-13T19:22:22Z" level=info msg="[cloudwatch 0] plugin parameter auto_create_group = 'true'"
...
time="2024-06-13T19:22:22Z" level=info msg="[cloudwatch 0] plugin parameter region = 'us-east-1'"
...
time="2024-06-13T19:22:22Z" level=info msg="[cloudwatch 0] plugin parameter default_log_group_name = 'fluentbit-default'"
time="2024-06-13T19:22:22Z" level=info msg="[cloudwatch 0] plugin parameter log_group_name = 'fluent-bit-cloudwatch'"
Our configuration only uses the newer cloudwatch_logs plugin; we do not specify or use the older cloudwatch plugin.
Nevertheless, the cloudwatch plugin appears to be loaded as well, even though we never configure it. As the startup logs show, it uses some configuration that specifies the us-east-1 region and the fluent-bit-cloudwatch log group, and this then causes the AccessDeniedException error.
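For what it's worth, every value in the startup log (region us-east-1, log group fluent-bit-cloudwatch, auto_create_group true) would be explained by a default [OUTPUT] baked into the image along these lines — this is our assumption about the image contents, not something in our configuration files:

[OUTPUT]
    Name              cloudwatch
    Match             *
    region            us-east-1
    log_group_name    fluent-bit-cloudwatch
    auto_create_group true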
Regarding our configured cloudwatch_logs plugin: logs are written to the specified log group / log stream correctly.
As shown in the output config above, we set Retry_Limit to no_limits.
However, logs show:
[2024/06/13 19:31:07] [ warn] [engine] chunk '1-1718307049.471694794.flb' cannot be retried: task_id=0, input=syslog.1 > output=cloudwatch.0
[2024/06/13 19:31:07] [debug] [task] task_id=0 reached retry-attempts limit 1/1
Earlier startup logs show:
[2024/06/13 19:30:50] [debug] [output:cloudwatch_logs:cloudwatch_logs.1] task_id=0 assigned to thread #0
It is not entirely clear to us whether the task_id=0 reached retry-attempts limit 1/1 message refers to the cloudwatch_logs plugin. If so, why is it not respecting our Retry_Limit no_limits setting? (We have also tried other values, e.g. 5 instead of no_limits.) Or does it relate to the preceding warning line, which references cloudwatch.0 — meaning it is also tied to our mysterious cloudwatch plugin?
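One thing worth noting: Retry_Limit is a per-output property, so a limit set in one [OUTPUT] section does not apply to any other output instance. If the warn line really refers to cloudwatch.0 (a separate output), our setting on cloudwatch_logs.1 would not cover it. For clarity, this is the shape we are assuming (per the Fluent Bit scheduler docs, Retry_Limit accepts an integer, no_limits/False for unlimited, or no_retries):

[OUTPUT]
    Name        cloudwatch_logs
    Match       *
    Retry_Limit no_limits   # applies only to this output instance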
Configuration
ECS Config:
resource "aws_ecs_service" "fluentbit" {
  name                   = "fluentbit"
  task_definition        = aws_ecs_task_definition.fluentbit.arn
  cluster                = aws_ecs_cluster.fluentbit.id
  launch_type            = "FARGATE"
  desired_count          = 2
  enable_execute_command = true

  network_configuration {
    assign_public_ip = false
    security_groups = [
      aws_security_group.fluentbit-container-sg.id,
    ]
    subnets = [
      data.aws_ssm_parameter.subnet1.value,
      data.aws_ssm_parameter.subnet2.value,
    ]
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.fluentbit_ecs_syslog_tg.arn
    container_name   = "fluentbit"
    container_port   = "5140"
  }
}

resource "aws_ecs_task_definition" "fluentbit" {
  family = "fluentbit"
  container_definitions = jsonencode([{
    name      = "fluentbit"
    essential = true
    # readonlyRootFilesystem = true can't be enabled because of the AWS Fargate S3 init files: https://github.com/fluent/fluent-bit/issues/7308
    image      = data.aws_ssm_parameter.fluent-latest-image.value
    entrypoint = ["/bin/sh", "-c"]
    command    = ["/init/fluent_bit_init_entrypoint.sh"]
    environment = [
      {
        name  = "aws_fluent_bit_init_s3_1"
        value = "${aws_s3_bucket.syslog-config.arn}/fluent/syslog-fluent-base.conf"
      },
      {
        name  = "aws_fluent_bit_init_s3_2"
        value = "${aws_s3_bucket.syslog-config.arn}/fluent/syslog-fluent-input.conf"
      },
      {
        name  = "aws_fluent_bit_init_s3_3"
        value = "${aws_s3_bucket.syslog-config.arn}/fluent/syslog-fluent-parser.conf"
      },
      {
        name  = "aws_fluent_bit_init_s3_4"
        value = "${aws_s3_bucket.syslog-config.arn}/fluent/syslog-fluent-output.conf"
      }
    ]
    portMappings = [{
      containerPort = 5140
      hostPort      = 5140
      protocol      = "tcp"
    }, {
      containerPort = 2020
      hostPort      = 2020
      protocol      = "tcp"
    }]
    healthcheck = {
      command      = ["CMD-SHELL", "curl -f http://localhost:2020/api/v1/health || exit 1"]
      interval     = 60
      timeout      = 5
      retries      = 3
      start_period = 90
    }
    logConfiguration = {
      logDriver = "awslogs"
      options = {
        "awslogs-region"        = "ca-central-1"
        "awslogs-group"         = aws_cloudwatch_log_group.ecs_fluentbit_service.id
        "awslogs-stream-prefix" = "ecs"
      }
    }
  }])
}
Fluent Bit Log Output
See above.
Fluent Bit Version Info
Container: aws-for-fluent-bit:init-latest
Fluent-bit version: Fluent Bit v1.9.10
Cluster Details
Application Details
Steps to reproduce issue
Related Issues