Loggregator Guide for Cloud Foundry Operators

Page last updated: December 10, 2015

This topic contains information about the Loggregator System that may be of use to operators of Cloud Foundry deployments.

Scaling Loggregator

Dopplers take in log and metric data from Cloud Foundry system components, store it in a buffer, and periodically forward it to the Traffic Controller, which serves the aggregated data stream through the Firehose WebSocket endpoint. When input from Metron agents exceeds a Doppler’s buffer size for a given interval, log and metric data can be lost. There are several ways to minimize this loss.

Increase buffer size

To increase the Doppler internal buffer size, change the doppler.message_drain_buffer_size property from its default value of 100 in your Cloud Foundry BOSH deployment manifest.
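
As a rough sketch, the manifest change might look like the fragment below. It assumes the dotted property name maps onto nested keys under the global properties hash, as is usual for BOSH manifests. The value 500 is an arbitrary illustration, not a recommendation.

```yaml
# Illustrative fragment of a Cloud Foundry BOSH deployment manifest.
properties:
  doppler:
    # Default is 100. 500 is an example value only; size it for your own traffic.
    message_drain_buffer_size: 500
```

A larger buffer lets each Doppler absorb bigger bursts at the cost of more memory per instance.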

Add additional Doppler instances

To increase the number of Doppler servers, modify the instances property for the doppler_z1 and doppler_z2 jobs in your Cloud Foundry BOSH deployment manifest.
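
A minimal sketch of that change, assuming a manifest with a top-level jobs list. All other keys that each job needs (templates, networks, resource pool, and so on) are omitted here, and the instance counts are example values.

```yaml
# Illustrative fragment: only the instances key is shown for each job.
jobs:
- name: doppler_z1
  instances: 2   # example value
- name: doppler_z2
  instances: 2   # example value
```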

Scaling Nozzles

Scale nozzles by using the subscription ID, which is specified when the nozzle connects to the Firehose. If you use the same subscription ID on each nozzle instance, the Firehose evenly distributes events across all instances of the nozzle. For example, if you have two nozzles with the same subscription ID, then half the events go to one nozzle and half to the other. Similarly, with three instances of the nozzle, each instance gets one-third of the traffic.

A stateless nozzle should handle scaling gracefully. If the nozzle buffers or caches the data, the nozzle author must test what happens when the nozzle is scaled up or scaled down.

Slow Nozzle Alerts

The Traffic Controller alerts nozzles if they are consuming events too slowly. If the nozzle falls behind, Loggregator will alert the nozzle in two ways:

  • TruncatingBuffer alerts: If the nozzle is consuming messages more slowly than they are being produced, the Loggregator system may drop messages. In this case, the Loggregator system sends the log message “TB: Output channel too full. Dropped (n) messages”, where “n” is the number of dropped messages. It also emits a CounterEvent with the name TruncatingBuffer.DroppedMessages. The nozzle receives both messages from the Firehose, alerting the operator to the performance issue.

  • PolicyViolation error: The Traffic Controller periodically sends ping control messages over the Firehose WebSocket connection. If a client does not respond to a ping message with a pong message within 30 seconds, the Traffic Controller closes the WebSocket connection with the WebSocket error code “ClosePolicyViolation” (1008). The nozzle should intercept this WebSocket close error, alerting the operator to the performance issue.

An operator can choose to scale her nozzles in response to these alerts, in order to minimize the loss of data.

Managing Syslog Forwarding from Cloud Foundry Components

Syslog data can be forwarded directly from Cloud Foundry components to an external aggregator.

To do this, add the following properties to the properties hash in your CF deployment manifest.

properties:
  syslog_daemon_config:
    address: YOUR-SYSLOG-AGGREGATOR-IP
    port: YOUR-SYSLOG-AGGREGATOR-TCP-PORT
    transport: YOUR-TRANSPORT-PROTOCOL

Replace YOUR-TRANSPORT-PROTOCOL with one of the following transport protocols:

  • tcp
  • udp
  • relp
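
For illustration, a filled-in version of these properties could look like the following. The IP address and port are hypothetical placeholders for your own aggregator.

```yaml
# Illustrative values only: substitute your own aggregator's address and port.
properties:
  syslog_daemon_config:
    address: 10.0.0.50   # hypothetical aggregator IP
    port: 514            # hypothetical aggregator port
    transport: tcp       # one of tcp, udp, or relp
```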

Customizing Loggregator Components

Each Loggregator component can be customized in many ways by changing its properties in the CF deployment manifest. Some of the most useful are detailed below.

DEA Logging Agent

| Property | Description | Default |
|---|---|---|
| dea_logging_agent.debug | Boolean value to turn on verbose mode | false |
| metron_endpoint.host | The host used to emit messages to the Metron agent | 127.0.0.1 |
| metron_endpoint.dropsonde_port | The port used to emit dropsonde messages to the Metron agent | 3457 |
| metron_endpoint.shared_secret | The key used to sign log messages | No default value |
| dea_logging_agent.status.port | Port used to run the varz endpoint | 0 |
| nats.user | Username for cc client to connect to NATS | No default value |
| nats.password | Password for cc client to connect to NATS | No default value |
| nats.machines | IP addresses of Cloud Foundry NATS servers | No default value |
| nats.port | IP port of Cloud Foundry NATS server | No default value |

Doppler

| Property | Description | Default |
|---|---|---|
| doppler.zone | Zone of the Doppler server | No default value |
| doppler.debug | Boolean value to turn on verbose logging for the Doppler system (DEA agent and Doppler server) | false |
| doppler.status.user | Username used to log into the varz endpoint | |
| doppler.status.port | Port used to run the varz endpoint | 0 |
| doppler.maxRetainedLogMessages | Number of log messages to retain per application | 100 |
| doppler.incoming_port | Port for incoming log messages in the legacy format | 3456 |
| doppler.dropsonde_incoming_port | Port for incoming messages in the dropsonde format | 3457 |
| doppler.outgoing_port | Port for outgoing log messages | 8081 |
| doppler.blacklisted_syslog_ranges | Blacklist for IPs that should not be used as syslog drains, for example internal IP addresses | No default value |
| doppler.container_metric_ttl_seconds | TTL (in seconds) for container usage metrics | 120 |
| doppler.unmarshaller_count | Number of parallel unmarshallers to run within Doppler | 5 |
| doppler.sink_inactivity_timeout_seconds | Interval before removing a sink due to inactivity | 3600 |
| doppler.sink_dial_timeout_seconds | Dial timeout for sinks | 1 |
| doppler.sink_io_timeout_seconds | I/O timeout on sinks | 0 |
| doppler_endpoint.shared_secret | Shared secret used to verify cryptographically signed Doppler messages | No default value |
| doppler.message_drain_buffer_size | Size of the internal buffer used by Doppler to store messages. If the buffer fills up, Doppler drops messages. | 100 |
| etcd.machines | IPs pointing to the ETCD cluster | No default value |
| metron_endpoint.host | The host used to emit messages to the Metron agent | 127.0.0.1 |
| metron_endpoint.dropsonde_port | The port used to emit dropsonde messages to the Metron agent | 3457 |
| ssl.skip_cert_verify | When connecting over TLS, do not verify certificates | false |

Traffic Controller

| Property Name | Description | Default |
|---|---|---|
| traffic_controller.zone | Zone of the loggregator_trafficcontroller | |
| traffic_controller.debug | Boolean value to turn on verbose logging for the Loggregator system (DEA agent and Loggregator server) | false |
| loggregator.outgoing_dropsonde_port | Port for outgoing dropsonde messages | 8081 |
| loggregator.doppler_port | Port for outgoing Doppler messages | 8081 |
| traffic_controller.outgoing_port | Port on which the Traffic Controller listens for requests | 8080 |
| traffic_controller.status.user | Username used to log into the varz endpoint | |
| traffic_controller.status.password | Password used to log into the varz endpoint | |
| traffic_controller.status.port | Port used to run the varz endpoint | 0 |
| doppler.uaa_client_id | Doppler’s client ID to connect to UAA | doppler |
| uaa.clients.doppler.secret | Doppler’s client secret to connect to UAA | |
| uaa.url | URL of UAA | |
| uaa.no_ssl | Do not use SSL to connect to UAA (used in case uaa.url is not set) | false |
| metron_endpoint.dropsonde_port | The port used to emit dropsonde messages to the Metron agent | 3457 |
| loggregator.etcd.machines | IPs pointing to the ETCD cluster | |
| loggregator.etcd.maxconcurrentrequests | Number of concurrent requests to ETCD | 10 |
| system_domain | Domain reserved for CF operator, base URL where the login, uaa, and other non-user apps listen | |
| nats.user | Username for cc client to connect to NATS | |
| nats.password | Password for cc client to connect to NATS | |
| nats.machines | IP addresses of Cloud Foundry NATS servers | |
| nats.port | IP port of Cloud Foundry NATS server | 4222 |
| loggregator_endpoint.shared_secret | Shared secret used to verify cryptographically signed Loggregator messages | |
| ssl.skip_cert_verify | When connecting over HTTPS, ignore bad SSL certificates | false |
| cc.srv_api_uri | API URI of Cloud Controller | |

Metron Agent

| Property Name | Description | Default |
|---|---|---|
| syslog_daemon_config.address | IP address for syslog aggregator | |
| syslog_daemon_config.port | TCP port of syslog aggregator | |
| syslog_daemon_config.transport | Transport to be used when forwarding logs (tcp, udp, or relp) | |
| syslog_daemon_config.fallback_addresses | Addresses of fallback servers to be used if the primary syslog server is down. Only tcp or relp are supported. Each list entry should consist of address, transport, and port keys. | [] |
| syslog_daemon_config.custom_rule | Custom rule for syslog forward daemon | |
| metron_agent.incoming_port | Incoming port for legacy log messages | 3456 |
| metron_agent.dropsonde_incoming_port | Incoming port for dropsonde log messages | 3457 |
| metron_agent.debug | Boolean value to turn on verbose mode | false |
| metron_agent.zone | Availability zone where this agent is running | |
| metron_agent.deployment | Name of deployment (added as tag on all outgoing metrics) | |
| metron_agent.etcd_query_interval_milliseconds | Interval for querying ETCD for trafficcontroller heartbeats | 5000 |
| metron_agent.logrotate.freq_min | The frequency, in minutes, at which logrotate rotates VM logs | 5 |
| metron_agent.logrotate.rotate | The number of files that logrotate keeps on the VM | 7 |
| metron_agent.logrotate.size | The size at which logrotate rotates the log file | 50M |
| loggregator.dropsonde_incoming_port | Port where Loggregator listens for dropsonde log messages | 3457 |
| loggregator_endpoint.shared_secret | Shared secret used to verify cryptographically signed Loggregator messages | |
| loggregator.etcd.machines | IPs pointing to the ETCD cluster | |
| loggregator.etcd.maxconcurrentrequests | Number of concurrent requests to ETCD | 106 |
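
As a sketch of how the fallback_addresses property described above might be written, the fragment below lists one fallback server with the address, transport, and port keys the description calls for. All addresses and ports are hypothetical, and the exact list layout is an assumption based on that description.

```yaml
# Illustrative fragment: every address and port here is a placeholder.
properties:
  syslog_daemon_config:
    address: 10.0.0.50       # hypothetical primary aggregator
    port: 514
    transport: relp
    fallback_addresses:      # used only if the primary server is down
    - address: 10.0.0.51     # hypothetical fallback aggregator
      port: 514
      transport: tcp         # only tcp or relp are supported for fallbacks
```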

Syslog Drain Binder

See Using Log Management Services.

| Property Name | Description | Default |
|---|---|---|
| metron_endpoint.host | The host used to emit messages to the Metron agent | 127.0.0.1 |
| metron_endpoint.dropsonde_port | The port used to emit dropsonde messages to the Metron agent | 3457 |
| loggregator.etcd.machines | IPs pointing to the ETCD cluster | |
| loggregator.etcd.maxconcurrentrequests | Number of concurrent requests to ETCD | 10 |
| system_domain | Domain reserved for CF operator, base URL where the login, uaa, and other non-user apps listen | |
| syslog_drain_binder.drain_url_ttl_seconds | Time to live for drain URLs in seconds | 60 |
| syslog_drain_binder.update_interval_seconds | Interval on which to poll Cloud Controller in seconds | 15 |
| syslog_drain_binder.polling_batch_size | Batch size for the poll from Cloud Controller | 1000 |
| syslog_drain_binder.debug | Boolean value to turn on verbose logging for syslog_drain_binder | false |
| cc.bulk_api_password | Password for the bulk API | |
| cc.srv_api_uri | API URI of Cloud Controller | |
| ssl.skip_cert_verify | When connecting over HTTPS, ignore bad SSL certificates | false |