Loggregator Guide for Cloud Foundry Operators
Page last updated: December 10, 2015
This topic contains information about the Loggregator System that may be of use to operators of Cloud Foundry deployments.
Scaling Loggregator
Dopplers are components that receive log and metric data from Cloud Foundry system components (through Metron agents), store this data in a buffer, and periodically forward it to the Traffic Controller, which serves the aggregated data stream through the Firehose WebSocket endpoint. When input from Metron agents exceeds a Doppler’s buffer size for a given interval, log and metric data can be lost. There are two ways to minimize this loss: increase the buffer size or add Doppler instances.
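To make the loss mechanism concrete, here is a simplified Python model of a bounded buffer that is flushed once per interval. It is a sketch under stated assumptions, not Doppler's implementation: a message that arrives while the buffer is full is dropped, and each flush forwards everything buffered. The function name simulate_buffer is hypothetical.

```python
from collections import deque


def simulate_buffer(arrivals_per_interval, buffer_size):
    """Model a buffer of `buffer_size` messages that is flushed once per interval.

    Simplified, hypothetical model: a message that arrives while the buffer
    is full is dropped, and each flush forwards every buffered message.
    Returns a (forwarded, dropped) tuple of totals.
    """
    buffer = deque()
    forwarded = dropped = 0
    for interval, arrivals in enumerate(arrivals_per_interval):
        for seq in range(arrivals):
            if len(buffer) < buffer_size:
                buffer.append((interval, seq))  # stand-in for a log/metric envelope
            else:
                dropped += 1  # input exceeded the buffer for this interval
        # Periodic flush downstream (toward the Traffic Controller).
        forwarded += len(buffer)
        buffer.clear()
    return forwarded, dropped


# A 150-message burst overflows the default 100-message buffer.
print(simulate_buffer([80, 150, 90], buffer_size=100))  # (270, 50)
print(simulate_buffer([80, 150, 90], buffer_size=200))  # (320, 0)
```

In this model, doubling the buffer absorbs the burst; the headroom you actually need depends on your own traffic, so treat the numbers as illustrative.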
Increase buffer size
You can increase the Doppler internal buffer size by changing the doppler.message_drain_buffer_size property from its default value of 100 in your Cloud Foundry BOSH deployment manifest.
Add additional Doppler instances
You can increase the number of Doppler servers by modifying the instances property of the doppler_z1 and doppler_z2 jobs in your Cloud Foundry BOSH deployment manifest.
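A rough sizing calculation can help decide how many instances to add. The Python sketch below assumes, hypothetically, that Metron agents spread traffic evenly across Dopplers and that the buffer size is a per-instance limit per interval; it rounds up so that doppler_z1 and doppler_z2 receive the same instances value. The function name is illustrative.

```python
import math


def doppler_instances_per_zone(peak_messages_per_interval, buffer_size, zones=2):
    """Instances to set on each doppler_zN job so every Doppler's share fits.

    Assumes an even spread of traffic across all Dopplers (an assumption,
    not a documented guarantee) and one Doppler job per zone.
    """
    if buffer_size <= 0 or zones <= 0:
        raise ValueError("buffer_size and zones must be positive")
    total = math.ceil(peak_messages_per_interval / buffer_size)
    return max(1, math.ceil(total / zones))


print(doppler_instances_per_zone(450, buffer_size=100))  # 3 per zone, 6 in total
print(doppler_instances_per_zone(450, buffer_size=200))  # 2 per zone, 4 in total
```

The second call shows the two knobs interacting: a larger buffer lowers the number of instances needed for the same peak.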
Scaling Nozzles
Scale nozzles by using the subscription ID, which is specified when the nozzle connects to the Firehose. If you use the same subscription ID on each nozzle instance, the Firehose evenly distributes events across all instances of the nozzle. For example, if you have two nozzles with the same subscription ID, half the events go to one nozzle and half to the other. Similarly, if there are three instances of the nozzle, each instance gets one-third of the traffic.
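The even split described above can be modeled in a few lines. This Python sketch assumes round-robin delivery purely for illustration; the text only says events are distributed evenly among nozzles that share a subscription ID, and the nozzle names are hypothetical.

```python
from itertools import cycle


def distribute(events, nozzle_ids):
    """Give each event to exactly one nozzle that shares a subscription ID.

    Round-robin is an assumption for illustration; the Firehose is only
    described as distributing events evenly across such nozzles.
    """
    if not nozzle_ids:
        raise ValueError("at least one nozzle is required")
    received = {nozzle: [] for nozzle in nozzle_ids}
    for event, nozzle in zip(events, cycle(nozzle_ids)):
        received[nozzle].append(event)
    return received


shares = distribute(range(9000), ["nozzle-a", "nozzle-b", "nozzle-c"])
print({nozzle: len(events) for nozzle, events in shares.items()})
# {'nozzle-a': 3000, 'nozzle-b': 3000, 'nozzle-c': 3000}
```

Each event reaches exactly one nozzle, which is why a nozzle that buffers or caches data must be tested as instances are added or removed.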
A stateless nozzle should handle scaling gracefully. If the nozzle buffers or caches the data, the nozzle author must test what happens when the nozzle is scaled up or scaled down.
Slow Nozzle Alerts
The Traffic Controller alerts a nozzle that consumes events too slowly. If a nozzle falls behind, Loggregator alerts it in two ways:
TruncatingBuffer alerts: If the nozzle consumes messages more slowly than they are produced, the Loggregator system may drop messages. In this case, Loggregator sends the log message “TB: Output channel too full. Dropped (n) messages”, where “n” is the number of dropped messages. It also emits a CounterEvent with the name TruncatingBuffer.DroppedMessages. The nozzle receives both messages from the Firehose, alerting the operator to the performance issue.
PolicyViolation error: The Traffic Controller periodically sends ping control messages over the Firehose WebSocket connection. If a client does not respond to a ping message with a pong message within 30 seconds, the Traffic Controller closes the WebSocket connection with the WebSocket error code “ClosePolicyViolation” (1008). The nozzle should intercept this WebSocket close error and alert the operator to the performance issue.
Operators can scale their nozzles in response to these alerts to minimize data loss.
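A nozzle can turn both alerts into operator-facing signals. The Python helpers below are a hypothetical sketch: they classify a log message, a counter name, and a WebSocket close code using only the strings and numbers quoted above, and they leave out the WebSocket client itself. The regular expression's tolerance for an optional parenthesized count and trailing text is an assumption about the log format, and pong_overdue models the Traffic Controller's 30-second rule for illustration only.

```python
import re

# The count may or may not be wrapped in parentheses, and the log may carry
# trailing text; the pattern tolerates both (an assumption about the format).
TB_LOG = re.compile(r"TB: Output channel too full\. Dropped \(?(\d+)\)? messages")
TB_COUNTER = "TruncatingBuffer.DroppedMessages"
CLOSE_POLICY_VIOLATION = 1008  # WebSocket close code defined in RFC 6455
PONG_TIMEOUT_S = 30.0


def dropped_from_log(message):
    """Return the dropped-message count from a TruncatingBuffer log, else None."""
    match = TB_LOG.search(message)
    return int(match.group(1)) if match else None


def is_drop_counter(counter_name):
    """True for the CounterEvent that reports TruncatingBuffer drops."""
    return counter_name == TB_COUNTER


def closed_for_slowness(close_code):
    """True if the Firehose closed the connection with ClosePolicyViolation."""
    return close_code == CLOSE_POLICY_VIOLATION


def pong_overdue(ping_sent_at, pong_received_at, now):
    """Apply the 30-second ping/pong rule to timestamps given in seconds."""
    answered_at = now if pong_received_at is None else pong_received_at
    return answered_at - ping_sent_at > PONG_TIMEOUT_S


print(dropped_from_log("TB: Output channel too full. Dropped 42 messages"))  # 42
print(closed_for_slowness(1008))  # True
print(pong_overdue(0.0, None, now=31.0))  # True
```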
Managing Syslog Forwarding from Cloud Foundry Components
Syslog data can be forwarded directly from Cloud Foundry components to an external aggregator.
To do this, add the following properties to the properties hash in your CF deployment manifest:
properties:
  syslog_daemon_config:
    address: YOUR-SYSLOG-AGGREGATOR-IP
    port: YOUR-SYSLOG-AGGREGATOR-TCP-PORT
    transport: YOUR-TRANSPORT-PROTOCOL
Replace YOUR-TRANSPORT-PROTOCOL with one of the following transport protocols:
tcp
udp
relp
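Before deploying, it can help to check the hash for the mistakes this section warns about. The Python function below is a hypothetical pre-deploy check over a manifest that has already been parsed into dicts; it covers only the three keys shown above, and the address and port in the usage lines are placeholders.

```python
ALLOWED_TRANSPORTS = ("tcp", "udp", "relp")


def validate_syslog_config(properties):
    """Return a list of problems with properties["syslog_daemon_config"].

    Hypothetical pre-deploy check over a manifest that has already been
    parsed into Python dicts (for example with a YAML library).
    """
    config = properties.get("syslog_daemon_config")
    if not isinstance(config, dict):
        return ["syslog_daemon_config hash is missing"]
    problems = []
    if not config.get("address"):
        problems.append("address is required")
    port = config.get("port")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(port, bool) or not isinstance(port, int) or not 1 <= port <= 65535:
        problems.append(f"port must be an integer from 1 to 65535, got {port!r}")
    if config.get("transport") not in ALLOWED_TRANSPORTS:
        problems.append("transport must be one of: " + ", ".join(ALLOWED_TRANSPORTS))
    return problems


manifest_properties = {
    "syslog_daemon_config": {"address": "10.0.16.5", "port": 514, "transport": "relp"}
}
print(validate_syslog_config(manifest_properties))  # []
```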
Customizing Loggregator Components
Each Loggregator component can be customized in many ways by changing its properties in the CF deployment manifest. Some of the most useful are detailed below.
DEA Logging Agent
Property | Description | Default |
---|---|---|
dea_logging_agent.debug | Boolean value to turn on verbose mode | false |
metron_endpoint.host | The host used to emit messages to the Metron agent | 127.0.0.1 |
metron_endpoint.dropsonde_port | The port used to emit dropsonde messages to the Metron agent | 3457 |
metron_endpoint.shared_secret | The key used to sign log messages | No default value |
dea_logging_agent.status.port | Port used to run the varz endpoint | 0 |
nats.user | Username for cc client to connect to NATS | No default value |
nats.password | Password for cc client to connect to NATS | No default value |
nats.machines | IP addresses of Cloud Foundry NATS servers | No default value |
nats.port | IP port of Cloud Foundry NATS server | No default value |
Doppler
Property | Description | Default |
---|---|---|
doppler.zone | Zone of the doppler server | No default value |
doppler.debug | boolean value to turn on verbose logging for doppler system (dea agent & doppler server) | false |
doppler.status.user | Username used to log into varz endpoint | |
doppler.status.port | Port used to run the varz endpoint | 0 |
doppler.maxRetainedLogMessages | number of log messages to retain per application | 100 |
doppler.incoming_port | Port for incoming log messages in the legacy format | 3456 |
doppler.dropsonde_incoming_port | Port for incoming messages in the dropsonde format | 3457 |
doppler.outgoing_port | Port for outgoing log messages | 8081 |
doppler.blacklisted_syslog_ranges | Blacklist for IPs that should not be used as syslog drains, e.g., internal IP addresses | No default value |
doppler.container_metric_ttl_seconds | TTL (in seconds) for container usage metrics | 120 |
doppler.unmarshaller_count | Number of parallel unmarshallers to run within Doppler | 5 |
doppler.sink_inactivity_timeout_seconds | Interval before removing a sink due to inactivity | 3600 |
doppler.sink_dial_timeout_seconds | Dial timeout for sinks | 1 |
doppler.sink_io_timeout_seconds | I/O timeout on sinks, in seconds | 0 |
doppler_endpoint.shared_secret | Shared secret used to verify cryptographically signed doppler messages | No default value |
doppler.message_drain_buffer_size | Size of the internal buffer used by doppler to store messages. If the buffer gets full doppler will drop the messages. | 100 |
etcd.machines | IPs pointing to the ETCD cluster | No default value |
metron_endpoint.host | The host used to emit messages to the Metron agent | 127.0.0.1 |
metron_endpoint.dropsonde_port | The port used to emit dropsonde messages to the Metron agent | 3457 |
ssl.skip_cert_verify | When connecting over TLS, don’t verify certificates | false |
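The doppler.blacklisted_syslog_ranges property keeps syslog drains away from addresses such as internal ones. The Python sketch below shows the kind of check this implies, under an assumption: each range is written as a pair of inclusive start and end IPv4 addresses. That shape is not stated in this guide, so confirm it against your release's doppler job spec; the function name is hypothetical.

```python
import ipaddress


def is_blacklisted(drain_ip, ranges):
    """True if drain_ip lies in any inclusive {"start": ..., "end": ...} range.

    The start/end shape is an assumption; only IPv4 addresses are handled.
    """
    ip = ipaddress.IPv4Address(drain_ip)
    return any(
        ipaddress.IPv4Address(r["start"]) <= ip <= ipaddress.IPv4Address(r["end"])
        for r in ranges
    )


internal = [{"start": "10.0.0.0", "end": "10.255.255.255"}]
print(is_blacklisted("10.1.2.3", internal))     # True
print(is_blacklisted("203.0.113.7", internal))  # False
```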
Traffic Controller
Property Name | Description | Default |
---|---|---|
traffic_controller.zone | Zone of the loggregator_trafficcontroller | |
traffic_controller.debug | boolean value to turn on verbose logging for loggregator system (dea agent & loggregator server) | false |
loggregator.outgoing_dropsonde_port | Port for outgoing dropsonde messages | 8081 |
loggregator.doppler_port | Port for outgoing doppler messages | 8081 |
traffic_controller.outgoing_port | Port on which the traffic controller listens for requests | 8080 |
traffic_controller.status.user | username used to log into varz endpoint | |
traffic_controller.status.password | password used to log into varz endpoint | |
traffic_controller.status.port | port used to run the varz endpoint | 0 |
doppler.uaa_client_id | Doppler’s client ID to connect to UAA | doppler |
uaa.clients.doppler.secret | Doppler’s client secret to connect to UAA | |
uaa.url | URL of UAA | |
uaa.no_ssl | Do not use SSL to connect to UAA (used in case uaa.url is not set) | false |
metron_endpoint.dropsonde_port | The port used to emit dropsonde messages to the Metron agent | 3457 |
loggregator.etcd.machines | IPs pointing to the ETCD cluster | |
loggregator.etcd.maxconcurrentrequests | Number of concurrent requests to ETCD | 10 |
system_domain | Domain reserved for CF operator, base URL where the login, uaa, and other non-user apps listen | |
nats.user | Username for cc client to connect to NATS | |
nats.password | Password for cc client to connect to NATS | |
nats.machines | IP addresses of Cloud Foundry NATS servers | |
nats.port | IP port of Cloud Foundry NATS server | 4222 |
loggregator_endpoint.shared_secret | Shared secret used to verify cryptographically signed loggregator messages | |
ssl.skip_cert_verify | when connecting over https, ignore bad ssl certificates | false |
cc.srv_api_uri | API URI of Cloud Controller | |
Metron Agent
Property Name | Description | Default |
---|---|---|
syslog_daemon_config.address | IP address for syslog aggregator | |
syslog_daemon_config.port | TCP port of syslog aggregator | |
syslog_daemon_config.transport | Transport to be used when forwarding logs: tcp, udp, or relp | |
syslog_daemon_config.fallback_addresses | Addresses of fallback servers to be used if the primary syslog server is down. Only tcp or relp are supported. Each list entry should consist of address, transport, and port keys. | [] |
syslog_daemon_config.custom_rule | Custom rule for syslog forward daemon | |
metron_agent.incoming_port | Incoming port for legacy log messages | 3456 |
metron_agent.dropsonde_incoming_port | Incoming port for dropsonde log messages | 3457 |
metron_agent.debug | boolean value to turn on verbose mode | false |
metron_agent.zone | Availability zone where this agent is running | |
metron_agent.deployment | Name of deployment (added as tag on all outgoing metrics) | |
metron_agent.etcd_query_interval_milliseconds | Interval for querying ETCD for trafficcontroller heartbeats | 5000 |
metron_agent.logrotate.freq_min | The frequency, in minutes, at which logrotate rotates VM logs | 5 |
metron_agent.logrotate.rotate | The number of files that logrotate will keep around on the VM | 7 |
metron_agent.logrotate.size | The size at which logrotate will decide to rotate the log file | 50M |
loggregator.dropsonde_incoming_port | Port where loggregator listens for dropsonde log messages | 3457 |
loggregator_endpoint.shared_secret | Shared secret used to verify cryptographically signed loggregator messages | |
loggregator.etcd.machines | IPs pointing to the ETCD cluster | |
loggregator.etcd.maxconcurrentrequests | Number of concurrent requests to ETCD | 10 |
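The fallback_addresses property describes an ordered failover: the primary server first, then each fallback. The Python sketch below models that selection under stated assumptions. is_up is a hypothetical health-check callable, the first reachable server wins, and the tcp-or-relp restriction and required keys from the table are enforced before any server is tried. It is not how the syslog forwarding daemon is implemented, and the addresses are placeholders.

```python
FALLBACK_TRANSPORTS = {"tcp", "relp"}  # the table says only these are supported
REQUIRED_KEYS = {"address", "transport", "port"}


def pick_syslog_target(config, is_up):
    """Return (address, port, transport) of the first reachable server, or None.

    `config` mirrors syslog_daemon_config; `is_up` is a hypothetical callable
    taking (address, port, transport) and returning a bool.
    """
    candidates = [(config["address"], config["port"], config["transport"])]
    for entry in config.get("fallback_addresses", []):
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            raise ValueError(f"fallback entry missing keys: {sorted(missing)}")
        if entry["transport"] not in FALLBACK_TRANSPORTS:
            raise ValueError(f"unsupported fallback transport {entry['transport']!r}")
        candidates.append((entry["address"], entry["port"], entry["transport"]))
    for target in candidates:
        if is_up(*target):
            return target
    return None


config = {
    "address": "10.0.16.5", "port": 514, "transport": "tcp",
    "fallback_addresses": [
        {"address": "10.0.17.5", "transport": "relp", "port": 2514},
    ],
}


def primary_down(address, port, transport):
    """Hypothetical health check in which only the primary is unreachable."""
    return address != "10.0.16.5"


print(pick_syslog_target(config, primary_down))  # ('10.0.17.5', 2514, 'relp')
```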
Syslog Drain Binder
See Using Log Management Services.
Property Name | Description | Default |
---|---|---|
metron_endpoint.host | The host used to emit messages to the Metron agent | 127.0.0.1 |
metron_endpoint.dropsonde_port | The port used to emit dropsonde messages to the Metron agent | 3457 |
loggregator.etcd.machines | IPs pointing to the ETCD cluster | |
loggregator.etcd.maxconcurrentrequests | Number of concurrent requests to ETCD | 10 |
system_domain | Domain reserved for CF operator, base URL where the login, uaa, and other non-user apps listen | |
syslog_drain_binder.drain_url_ttl_seconds | Time to live for drain URLs, in seconds | 60 |
syslog_drain_binder.update_interval_seconds | Interval, in seconds, at which to poll Cloud Controller | 15 |
syslog_drain_binder.polling_batch_size | Batch size for the poll from cloud controller | 1000 |
syslog_drain_binder.debug | boolean value to turn on verbose logging for syslog_drain_binder | false |
cc.bulk_api_password | Password for the Cloud Controller bulk API | |
cc.srv_api_uri | API URI of cloud controller | |
ssl.skip_cert_verify | when connecting over https, ignore bad ssl certificates | false |
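The syslog_drain_binder settings fit together as a polling loop: every update_interval_seconds the binder polls Cloud Controller in batches of polling_batch_size, and the drain URLs it records carry a time to live of drain_url_ttl_seconds. The Python sketch below models only the batching step. fetch_batch and its (results, next_id) return value are hypothetical stand-ins for a bulk API request, not the real Cloud Controller interface, and the drain URLs are placeholders.

```python
def poll_drain_urls(fetch_batch, batch_size=1000):
    """Collect {app_guid: [drain_url, ...]} by reading batch_size entries per request.

    `fetch_batch(batch_size, next_id)` is a hypothetical stand-in for one
    bulk API call; it returns (results, next_id), with next_id None after
    the final batch.
    """
    drains = {}
    next_id = 0
    while next_id is not None:
        results, next_id = fetch_batch(batch_size, next_id)
        drains.update(results)
    return drains


def make_fake_fetch(all_drains, calls):
    """Build an in-memory fetch_batch that pages through all_drains."""
    items = sorted(all_drains.items())

    def fetch_batch(batch_size, next_id):
        calls.append(next_id)  # record each request for inspection
        end = next_id + batch_size
        return dict(items[next_id:end]), (end if end < len(items) else None)

    return fetch_batch


apps = {f"app-{i:04d}": [f"syslog://logs.example.com:{1000 + i}"] for i in range(2500)}
calls = []
result = poll_drain_urls(make_fake_fetch(apps, calls), batch_size=1000)
print(len(result), calls)  # 2500 [0, 1000, 2000]
```

With the default batch size of 1000, this model needs three requests per poll for 2,500 bindings; a smaller batch size trades more requests for smaller responses.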