Название: IT Cloud
Автор: Eugeny Shtoltc
Издательство: ЛитРес: Самиздат
Жанр: Зарубежная компьютерная литература
isbn:
isbn:
Zabbix was developed in 1998 and released to OpenSource under the GPL in 2001. At that time, the traditional interface:
without any design, with a lot of tabs, selectors and the like. Since it was developed for
own needs, it contains specific solutions. He is oriented
monitoring devices and their components such as disks, networks, printers, routers and the like. For
interactions can be used:
Agents – installed on servers, collecting many metrics and poisoning Zabbix server
HTTP – Zabbix makes requests over http, for example printers
SNMP – a network protocol for communicating with network devices
IPMI is a protocol for communicating with server devices such as routers
In 2019, Gratner presented a rating of monitoring systems in its square:
** Dynatrace;
** Cisco (AppDynamics);
** New Relic;
** Broadcom (CA Technologies);
** Riverbed and Microsoft;
** IBM;
** Oracle;
** SolarWinds;
** Micro Focus;
** ManageEngine and Tingyun.
Not included in the square:
** Correlsense;
** Datadog;
** Elastic;
** Honeycomb;
** Instant;
** Jennifer Soft;
** Light Step;
** Nastel Technologies;
** SignalFx;
** Splunk;
** Sysdig.
When we run an application in a Docker container, all the standard output (what is displayed in the console) of the running program (process) is buffered. We can view this buffer with the docker logs name_container program . If we follow the Docker ideology – "one process, one container" – we can view the logs of an individual program. It is convenient to use the less and tail commands to view logs. The first program allows you to conveniently scroll through the logs with the keyboard arrows, search for the desired one based on matches and using a regular expression pattern, like the text editor vi . The second program displays the amount we need
An important criterion for ensuring smooth operation is the control of free space. So, if there is no space left, then the database will not be able to write data, with other components the situation can be more dire than the loss of new data. Docker has limit settings not only for individual containers, at least 10%. During imaging or container startup, an error may be thrown that the specified limits have been exceeded. To change the default settings, you need to tell the Dockerd server the settings, after stopping it with service docker stop (all containers will be stopped) and after resuming it with service docker start (the containers will be resumed). Settings can be set as options / bin / dockerd –storange-opt dm.basesize = 50G –stirange-opt
In Container, we have authorization, control over our containers, with the ability to create them for testing and see graphs on the processor and memory. More will require a monitoring system. There are quite a few monitoring systems, for example, Zabbix, Graphite, Prometheus, Nagios, InfluxData, OkMeter, DataDog, Bosum, Sensu and others, of which Zabbix and Prometheus are the most popular. The first is traditionally used, since it is the leading deployment tool, which admins love for its ease of use (all you need to do is to have SSH access to the server), low-level, which allows you to work not only with servers, but also with other hardware, such as routers. The second is the opposite of the first: it is focused exclusively on collecting metrics and monitoring, focused as a ready-made solution, and not a framework and fell in love with programmers, set it according to the principle, chose metrics and received graphs. The key feature between Zabbix and Prometheus is not in the preferences of some to customize in detail for themselves and others to spend much less time, but in the scope. Zabbix is focused on setting up work with a specific hardware, which can be anything, and often very exotic in a corporate environment, and for this entity, a manual collection of metrics is written, a schedule is manually configured. For a dynamically changing environment of cloud solutions, even if it is just a Docker container, and even more so if it is Kubernetes, in which a huge number of entities are constantly created, and the entities themselves, apart from the general environment, are not of particular interest, it is not suitable for this in Prometheus Service Discovery is built-in and navigation is supported for Kubernetes through the namespace, the balancer (service) and the group of containers (POD), which can be configured in Grafana in the form of tables. In Kubernetes, according to The News Stack 2017, Kubernetes User and Experience is used in 63% of cases, in the rest there are more rare cloud monitoring tools.
Metrics can be system (for example, CRU, RAM, ROM) and application (service and application metrics). System metrics are core metrics that are used by Kubernetes for scaling and the like and non-core metrics that are not used by Kubernetes. Here is an example of bundles for collecting metrics:
* cAdvisor + Heapster + InfluxDB
* cAdvisor + collectd + Heapster
* cAdvisor + Prometheus
* snapd + Heapster
* snapd + SNAP cluster-level agent
* Sysdig
There are many monitoring systems and services on the market. We will consider exactly OpenSource, which can be installed in your cluster. They can be divided according to the model of obtaining metrics: into those who collect logs by polling, and those who expect that metrics will be poisoned in them. The latter are simpler both in structure and in use on a small scale. An example would be InfluxDB, which is a database that you can write to. The downside of this solution is the difficulty of scaling both in terms of support and load. If all services write at the same time, then they can overload the monitoring system, especially since it is difficult to scale, since the endpoint is registered in each service. The first group to practice a pull model of interaction is Prometheus. It is also a database with a daemon that polls services based on their registrations in the configuration file and pulls labels in a specific format, for example:
cpu_usage: 2
cpu_usage {app: myapp}: 2
Prometheus is a mature product, it was developed in 2012, and in 2016 it was included in the CNCF (Cloud Native Computing Foundation) consortium. Prometheus consists of:
* TSDB (Time Series Satabase) database, which looks more like a storage queue for metrics, with a specified accumulation period, for example, a week, allowing hundreds of thousands of metrics to be processed per second. This base is local to Prometheus, does not support horizontal scaling, in the case of Prometheus it is achieved by raising several of its instances and sharding them. Prometheus supports data aggregation, which is useful for reducing the amount of accumulated data, as well as archiving the database from memory to disk.
* СКАЧАТЬ