Proper monitoring is a foundation for mindful configuration management. It’s a prerequisite for any optimization effort. Monitoring is useless without the ability to act on the insight.
What most monitoring solutions lack is an easy way to take instant, relevant action on the insights you gather. This post is about a bot-based notification and management layer that we’ve created to explore ways of dealing with that exact problem. For now, it is available in beta mode to selected customers.
We in the RC team aim to provide three simple things: the ability to come up with your own insights (via continuous monitoring), our insights to support your decision making (through anomaly and bottleneck detection), and the means to act instantly.
We wanted to give our customers a means to communicate with their infrastructure. So the obvious idea was: why not integrate this workflow with Telegram (or any other messenger), a tool that is always with you and whose goal is exactly that, instant communication?
Below are some other features of our bot MVP:
- Integration with AWS
- Monitoring charts in chats
- Notification settings
- Reasonable security measures
- Start with the most useful cases (e.g. hibernate if something’s terribly off)
Integration with AWS
1.1 Metrics collection
1.2 Telegram authentication
1.3 Kapacitor’s events detection
RocketCompute uses a Telegraf-based metric collection agent. It can be installed via the command line or an AWS launch template.
Telegram bot authentication
According to the Telegram API, you may use links like
t.me/your_bot?start=XXXX that open your bot with a parameter (could be more than one). We use this feature to authenticate a user in the bot by passing a token in that parameter.
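To make the token flow concrete, here is a minimal sketch of how such a link could be issued and later redeemed. It is not our production code: the in-memory `pending_tokens` store and the function names are hypothetical, and `your_bot` is the same placeholder as in the link above. It relies on the fact that opening a deep link makes Telegram send the bot a message of the form `/start <parameter>`.

```python
import secrets
from typing import Optional

BOT_USERNAME = "your_bot"  # placeholder, as in the link above

# Hypothetical in-memory store: one-time token -> account id.
pending_tokens = {}


def make_login_link(account_id: str) -> str:
    """Create a one-time token and embed it in a t.me deep link."""
    # token_urlsafe(16) yields 22 characters from A-Z a-z 0-9 _ -,
    # which fits Telegram's limits for the start parameter.
    token = secrets.token_urlsafe(16)
    pending_tokens[token] = account_id
    return f"https://t.me/{BOT_USERNAME}?start={token}"


def handle_start(message_text: str) -> Optional[str]:
    """Return the account id for a '/start <token>' message, else None."""
    parts = message_text.split(maxsplit=1)
    if len(parts) != 2 or parts[0] != "/start":
        return None
    # pop() makes the token single-use: a replayed link resolves to None.
    return pending_tokens.pop(parts[1].strip(), None)
```

Making the token single-use and short-lived limits the damage if a link leaks. A real deployment would presumably keep the tokens in a database with an expiry time rather than in process memory.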
Charts with anomaly events
Initially, we didn’t send any pictures via the bot. There were just links to the Grafana-based monitoring portal. But after some feedback (basically, it’s too much trouble to decide if you’d like to click without at least a preview) we decided to try adding charts right to the bot message.
What can I say, we dig the result.
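For readers who want to try the same thing, the sketch below prepares a Telegram Bot API `sendPhoto` request that carries a chart image and a caption. It only builds the request and does not send it. The function name, the chart URL and the caption are assumptions for illustration. It also assumes the chart is reachable by Telegram over HTTP, which a Grafana instance behind a login would not be without extra work.

```python
import json
import urllib.request

API_ROOT = "https://api.telegram.org"


def build_chart_message(bot_token, chat_id, chart_url, caption):
    """Prepare a Bot API sendPhoto request; sending is left to the caller."""
    payload = {
        "chat_id": chat_id,
        "photo": chart_url,  # Telegram downloads the image from this URL
        "caption": caption,  # short text shown under the chart
    }
    return urllib.request.Request(
        url=f"{API_ROOT}/bot{bot_token}/sendPhoto",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call. Keeping the link to the full dashboard in the caption preserves the original "click through for details" behaviour while still giving a preview.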
Notification settings
First of all, every engineer prefers to have her own customizable list of instances to watch over.
Fortunately, Telegram has a checkbox feature, which is exactly what is needed: users get a list of active instances and tick the ones they want to follow.
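One way to get checkbox-like behaviour in a bot is an inline keyboard with one toggle button per instance. The sketch below assumes that approach, which may differ from what our bot actually does. The function names and the `toggle:<id>` callback format are hypothetical. The returned dictionary has the shape the Bot API expects for `reply_markup`.

```python
def build_watchlist_keyboard(instances, watched):
    """Return a reply_markup dict with one toggle button per instance."""
    rows = []
    for instance_id in instances:
        mark = "☑" if instance_id in watched else "☐"
        rows.append([{
            "text": f"{mark} {instance_id}",
            # callback_data comes back to the bot when the button is pressed
            "callback_data": f"toggle:{instance_id}",
        }])
    return {"inline_keyboard": rows}


def toggle(watched, callback_data):
    """Apply a button press and return the updated set of watched ids."""
    action, _, instance_id = callback_data.partition(":")
    if action != "toggle" or not instance_id:
        return set(watched)  # unknown payload: leave the selection as is
    # Symmetric difference adds the id if absent and removes it if present.
    return set(watched) ^ {instance_id}
```

After each press the bot would rebuild the keyboard from the updated set and edit the message in place, so the list always shows the current selection.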
Reasonable security measures
- What if someone obtains a Telegram token and, through it, gains access to the AWS Management Console?
- What if someone steals an engineer’s Telegram account and somehow gets AWS credentials through it?
- What if our engineer makes critical changes from the Telegram bot (clients want to hibernate instances straight from the bot)?
In the initial version we’ve done the following:
- Account admin approves every Telegram user authentication
- All users must have two-factor authentication enabled on their Telegram account
- Only the hibernate command is available from the Telegram bot, and it works only for instances in an idle state (when the corresponding notification appears)
- No SSH credentials in the Telegram bot, only a link to the AWS console with relevant parameters
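The first and third rules above can be expressed as a small guard in front of the only command the bot exposes. This is a sketch under assumptions: the `BotUser` and `Instance` types and the `try_hibernate` function are hypothetical names, and the actual AWS call is injected as a callable so the example stays self-contained.

```python
from dataclasses import dataclass


@dataclass
class BotUser:
    telegram_id: int
    approved_by_admin: bool  # set once the account admin approves the user


@dataclass
class Instance:
    instance_id: str
    is_idle: bool  # set when the idle notification has fired


def try_hibernate(user, instance, stop_instance):
    """Hibernate only for approved users and only for idle instances."""
    if not user.approved_by_admin:
        return "denied: user not approved by account admin"
    if not instance.is_idle:
        return "denied: instance is not idle"
    # With boto3, this callable could wrap
    # ec2.stop_instances(InstanceIds=[instance_id], Hibernate=True).
    stop_instance(instance.instance_id)
    return "hibernating"
```

Because the bot has no other write path, the worst a hijacked chat session can do under these rules is hibernate an instance that was already idle.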
Best of all (from what we can gather so far) is the Hibernate button. It’s not something that is used often but it saves money.
Well, that’s it.
It is a pretty simple setup, combined with a personalized notification and anomaly detection system. It saves money from the very start and gives you the comfort of never losing sight of your infrastructure, which is especially important when you run something time-consuming and infrastructure-heavy.
P.S.: If you think that this might be interesting for your applications (or if you’d like to have integrations with another messenger) — let us know, we’d be happy to give you a demo and discuss how we might be of help. We are keen to discover and learn about new cases to apply our tech.
P.P.S.: You can try our bot along with our monitoring and detection system via the RocketCompute listing on the AWS Marketplace.