fluent-bit init uses cloudwatch plugin not specified in the config #835

@borkod

Describe the question/issue

  • Issue 1

Fluent Bit logs show an AccessDeniedException because it tries to create a log group that the task role is not permitted to create and that we never configured:

time="2024-06-13T18:48:52Z" level=error msg="AccessDeniedException: User: arn:aws:sts::xxxxxxxxxxxx:assumed-role/fluentbit-task-role/xxxxxxx is not authorized to perform: logs:CreateLogGroup on resource: arn:aws:logs:us-east-1:xxxxxxxxxxxx:log-group:fluent-bit-cloudwatch:log-stream: because no identity-based policy allows the logs:CreateLogGroup action\n\tstatus code: 400, request id: xxxxxxxx"

However, our output plugin setting is:


[OUTPUT]
  Name              cloudwatch_logs
  Match             *
  region            ca-central-1
  log_group_name    testname
  log_stream_name   teststream
  auto_create_group  false
  Retry_Limit   no_limits

During Fluent Bit startup we see the following logs:

[2024/06/13 19:22:22] [ info] cloudwatch.0
...
time="2024-06-13T19:22:22Z" level=info msg="[cloudwatch 0] plugin parameter auto_create_stream = 'true'"
time="2024-06-13T19:22:22Z" level=info msg="[cloudwatch 0] plugin parameter auto_create_group = 'true'"
...
time="2024-06-13T19:22:22Z" level=info msg="[cloudwatch 0] plugin parameter region = 'us-east-1'"
...
time="2024-06-13T19:22:22Z" level=info msg="[cloudwatch 0] plugin parameter default_log_group_name = 'fluentbit-default'"
time="2024-06-13T19:22:22Z" level=info msg="[cloudwatch 0] plugin parameter log_group_name = 'fluent-bit-cloudwatch'"

Our configuration only uses the newer cloudwatch_logs plugin. We do not specify or use the cloudwatch plugin.

It seems that the cloudwatch plugin is also being loaded, even though we do not specify it. As the startup logs show, it picks up a configuration that sets the region to us-east-1 and the log group to fluent-bit-cloudwatch, with auto_create_group set to true. That configuration is what triggers the AccessDeniedException above.

Our own cloudwatch_logs output works as expected: logs are written to the specified log group and log stream.
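To double-check which output plugins our own config files declare, something like the following could be run against the contents of the four S3 config files. This is only a rough sketch: `list_outputs` is a hypothetical helper, it assumes the classic-mode `[SECTION]` / `key value` format shown above, and it does not follow `@INCLUDE` directives. It only sees the text it is given, so any configuration the image might add on its own would not show up here.

```python
def list_outputs(config_text):
    """Return one dict of lowercase key -> value per [OUTPUT] section.

    Assumes classic-mode Fluent Bit config: a section header in square
    brackets followed by whitespace-separated "key value" lines.
    """
    outputs = []
    current = None  # dict for the [OUTPUT] section being read, if any
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            if line[1:-1].strip().upper() == "OUTPUT":
                current = {}
                outputs.append(current)
            else:
                current = None  # some other section, e.g. [INPUT]
            continue
        if current is not None:
            parts = line.split(None, 1)
            key = parts[0].lower()
            value = parts[1].strip() if len(parts) > 1 else ""
            current[key] = value
    return outputs


# The output section from this issue, used as sample input.
sample = """
[OUTPUT]
  Name              cloudwatch_logs
  Match             *
  region            ca-central-1
  log_group_name    testname
  log_stream_name   teststream
  auto_create_group  false
  Retry_Limit   no_limits
"""

for section in list_outputs(sample):
    print(section.get("name"), section.get("region"), section.get("log_group_name"))
```

If the only name reported for our files is cloudwatch_logs, then the cloudwatch.0 instance in the startup logs must come from somewhere other than the files we supply.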

  • Issue 2

As shown above in the output config, we set the Retry_Limit to no_limits.

However, logs show:

[2024/06/13 19:31:07] [ warn] [engine] chunk '1-1718307049.471694794.flb' cannot be retried: task_id=0, input=syslog.1 > output=cloudwatch.0
[2024/06/13 19:31:07] [debug] [task] task_id=0 reached retry-attempts limit 1/1

Earlier startup logs show:

[2024/06/13 19:30:50] [debug] [output:cloudwatch_logs:cloudwatch_logs.1] task_id=0 assigned to thread #0

It's not clear to me whether "task_id=0 reached retry-attempts limit 1/1" refers to the cloudwatch_logs plugin. If it does, why is our Retry_Limit no_limits setting not respected? (We've also tried other values, e.g. 5 instead of no_limits.) Or does it belong to the preceding warning, which references cloudwatch.0? In that case it would also be related to the cloudwatch plugin we never configured.
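One way to see which output instance the retry warnings point at is to pull the `output=` field out of the engine warnings. The snippet below is a minimal sketch: `failed_outputs` is a hypothetical helper and the regular expression is written only against the warning format quoted above, so other Fluent Bit versions may need a different pattern.

```python
import re

# Matches engine warnings of the form quoted in this issue:
#   chunk '<name>' cannot be retried: task_id=N, input=<in> > output=<out>
WARN_RE = re.compile(
    r"chunk '(?P<chunk>[^']+)' cannot be retried: "
    r"task_id=(?P<task>\d+), input=(?P<input>\S+) > output=(?P<output>\S+)"
)


def failed_outputs(log_text):
    """Return (task_id, input_instance, output_instance) for each warning."""
    return [
        (m["task"], m["input"], m["output"])
        for m in WARN_RE.finditer(log_text)
    ]


# Log lines copied from this issue.
sample_log = """\
[2024/06/13 19:30:50] [debug] [output:cloudwatch_logs:cloudwatch_logs.1] task_id=0 assigned to thread #0
[2024/06/13 19:31:07] [ warn] [engine] chunk '1-1718307049.471694794.flb' cannot be retried: task_id=0, input=syslog.1 > output=cloudwatch.0
[2024/06/13 19:31:07] [debug] [task] task_id=0 reached retry-attempts limit 1/1
"""

for task_id, input_name, output_name in failed_outputs(sample_log):
    print(f"task {task_id}: {input_name} -> {output_name}")
```

For the lines above this reports cloudwatch.0 as the failing output, while our configured plugin appears in the logs as cloudwatch_logs.1. That suggests, but does not prove, that the 1/1 retry limit applies to the unexpected cloudwatch instance rather than to our cloudwatch_logs output.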

Configuration

ECS Config:

resource "aws_ecs_service" "fluentbit" {
  name            = "fluentbit"
  task_definition = aws_ecs_task_definition.fluentbit.arn
  cluster = aws_ecs_cluster.fluentbit.id
  launch_type = "FARGATE"
  desired_count = 2
  enable_execute_command = true

  network_configuration {
    assign_public_ip = false

    security_groups = [
      aws_security_group.fluentbit-container-sg.id,
    ]

    subnets = [
      data.aws_ssm_parameter.subnet1.value,
      data.aws_ssm_parameter.subnet2.value,
    ]
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.fluentbit_ecs_syslog_tg.arn
    container_name   = "fluentbit"
    container_port   = "5140"
  }
}

resource "aws_ecs_task_definition" "fluentbit" {
  family = "fluentbit"
  
  container_definitions = jsonencode([{
    name = "fluentbit"
    essential = true
    # readonlyRootFilesystem = true  # can't be enabled on AWS Fargate because of the S3 init files; see https://github.com/fluent/fluent-bit/issues/7308
    image = "${data.aws_ssm_parameter.fluent-latest-image.value}"
    entrypoint = ["/bin/sh","-c"]
    command = ["/init/fluent_bit_init_entrypoint.sh"]
    environment = [
      {
        name = "aws_fluent_bit_init_s3_1"
        value = "${aws_s3_bucket.syslog-config.arn}/fluent/syslog-fluent-base.conf"
      },
      {
        name = "aws_fluent_bit_init_s3_2"
        value = "${aws_s3_bucket.syslog-config.arn}/fluent/syslog-fluent-input.conf"
      },
      {
        name = "aws_fluent_bit_init_s3_3"
        value = "${aws_s3_bucket.syslog-config.arn}/fluent/syslog-fluent-parser.conf"
      },
      {
        name = "aws_fluent_bit_init_s3_4"
        value = "${aws_s3_bucket.syslog-config.arn}/fluent/syslog-fluent-output.conf"
      }
    ] 
    portMappings = [{
      containerPort = 5140
      hostPort = 5140
      protocol = "tcp"
    },{
      containerPort = 2020
      hostPort = 2020
      protocol = "tcp"
    }]
    healthcheck = {
      command = ["CMD-SHELL","curl -f http://localhost:2020/api/v1/health || exit 1"] 
      interval = 60
      timeout = 5
      retries = 3
      start_period = 90
    } 
    logConfiguration = {
      logDriver = "awslogs"
      options = {
        "awslogs-region" = "ca-central-1"
        "awslogs-group" = "${aws_cloudwatch_log_group.ecs_fluentbit_service.id}"
        "awslogs-stream-prefix" = "ecs"
      }
    }
  }])

Fluent Bit Log Output

See above.

Fluent Bit Version Info

Container: aws-for-fluent-bit:init-latest
Fluent-bit version: Fluent Bit v1.9.10

Cluster Details

Application Details

Steps to reproduce issue

Related Issues
