Simplify your Monitoring

Your systems are on fire. You didn't notice it. Your customers did?

  • Your systems are not working. Does it feel like everyone is running around in circles, not knowing what to do?
  • Keep important, revenue-critical systems up and running despite hurricanes, outages, and configuration errors.
  • DevOps, SRE, all the complicated fuss. How can you start simple and get benefits immediately? Could everything be just a bit simpler?

Hi, I'm Jonas. I'm a Certified Kubernetes Administrator, and I love helping teams set up monitoring and guiding them onto the path of Site Reliability Engineering.

Simplify your life with an effective way to monitor your infrastructure systems and your applications.

When I started helping clients install and manage on-prem Kubernetes, I saw them struggling to monitor it. Typically they would ask me:

How do we integrate it with our existing systems, like Zabbix or Nagios?

We are so used to the good old way of writing Zabbix / Nagios bash checks. Prometheus, or any similar metrics-based monitoring system, just does not come naturally.

If you think about it, Prometheus actually is hard. Nobody gets the Prometheus query language (PromQL) right the first time: all the fussiness around instant vectors vs. range vectors, empty values, labels. What is going on?
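To make the instant vs. range vector distinction concrete, here is a minimal Python sketch, not Prometheus's actual implementation. It assumes a single counter series held in memory as hypothetical Sample records sorted by time: an instant selector returns the newest sample within a lookback window (Prometheus's default lookback delta is 5 minutes), a range selector returns every sample in the window, and a simplified rate() turns a range vector back into one number. Real PromQL also extrapolates to the window boundaries and handles staleness markers, which this sketch skips.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    ts: float     # Unix timestamp in seconds
    value: float  # counter value scraped at that time


def instant_vector(samples: List[Sample], eval_time: float,
                   lookback: float = 300.0) -> Optional[Sample]:
    """Roughly what `http_requests_total` returns: the newest sample within the lookback."""
    recent = [s for s in samples if eval_time - lookback < s.ts <= eval_time]
    return recent[-1] if recent else None  # None is the "empty result" case


def range_vector(samples: List[Sample], eval_time: float,
                 window: float) -> List[Sample]:
    """Roughly what `http_requests_total[1m]` returns: every sample in the window."""
    return [s for s in samples if eval_time - window < s.ts <= eval_time]


def simple_rate(window_samples: List[Sample]) -> Optional[float]:
    """Per-second increase over a range vector, tolerating counter resets.

    Simplified: no extrapolation to the window edges, unlike PromQL rate()."""
    if len(window_samples) < 2:
        return None  # a rate needs at least two samples
    increase = 0.0
    for prev, cur in zip(window_samples, window_samples[1:]):
        if cur.value >= prev.value:
            increase += cur.value - prev.value
        else:
            increase += cur.value  # counter reset (e.g. a restart): it started again from 0
    elapsed = window_samples[-1].ts - window_samples[0].ts
    return increase / elapsed


# A counter scraped every 15 s that resets between t=30 and t=45.
series = [Sample(0, 100), Sample(15, 115), Sample(30, 130), Sample(45, 10), Sample(60, 25)]
print(instant_vector(series, 60))    # the single newest sample
print(range_vector(series, 60, 60))  # the four samples after t=0
print(simple_rate(range_vector(series, 60, 60)))
```

The same counter gives one value as an instant vector and four values as a range vector; functions like rate() exist precisely to collapse the latter back into something you can graph or alert on.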

Interestingly, most of my clients are trying to monitor the same cloud-native open-source software. They spend their engineering hours writing bash scripts instead of doing work that moves the business forward. So engineers keep wasting hours and hours solving essentially the same problem.

But what if you could install a prepackaged set of monitoring rules for Prometheus? If we can build a monitoring solution that already has all the needed monitoring rules in place, that's a win for everybody! You get peace of mind that everything works, and you save your most valuable engineering time.

Many of my customers think it's enough to install an APM or monitoring tool like Sentry, New Relic, or Datadog and be done with it. Problem solved. Nothing more to do here! Right?

Most of them use it, and they like all the new observability they get from it, but only for the first two months. Then a real-world problem hits, they have a massive outage, and they start seeing how limited their monitoring solution is.


Let me tell you a true story about what happened to me.

Only real-world outages separate the wheat from the chaff.

It's 10 PM on a Friday. I get a call from my client: "Help, Kubernetes is not working!" As usual, I ask for more details. The issue is that their webshop is slow for customers. They check New Relic and see more traffic than their typical daily usage. It turns out marketing has run an outstanding promotion.

The web developers don't see anything out of the ordinary, except that their PHP application sometimes logs this weird error: file_put_contents(): Only 0 of 1705 bytes written, possibly out of free disk space.

So they blame Kubernetes and ask me to fix it. At this point, New Relic is pretty useless. It just shows more traffic, that's it.

I open Prometheus and find ten active alerts that they have been ignoring. The most important one is named WebCPUThrottlingHighCritical.

So I click on the alert, jump into the preconfigured Grafana dashboard, select their application, and see that it is CPU-throttled at 70%. A minute later, I bump their Pod CPU limits. The website works flawlessly once again.
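For readers wondering what "CPU-throttled at 70%" means: Kubernetes enforces CPU limits through the Linux CFS quota, which works in short periods (100 ms by default), and cAdvisor exposes two counters per container, container_cpu_cfs_periods_total and container_cpu_cfs_throttled_periods_total. A throttling percentage is usually the increase of the second divided by the increase of the first over a window. The Python sketch below does that arithmetic on two readings of each counter; the 50% critical threshold is a hypothetical value, not the one behind the alert in this story.

```python
def throttle_ratio(periods_start: float, periods_end: float,
                   throttled_start: float, throttled_end: float) -> float:
    """Fraction of CFS enforcement periods in which the container was throttled.

    Takes two readings (window start and end) of the cAdvisor counters
    container_cpu_cfs_periods_total and container_cpu_cfs_throttled_periods_total.
    Assumes neither counter reset inside the window.
    """
    periods = periods_end - periods_start
    throttled = throttled_end - throttled_start
    if periods <= 0:
        return 0.0  # no limit enforced (or no CPU activity) during the window
    return throttled / periods


# Hypothetical threshold for a critical alert; tune it for your workloads.
CRITICAL_THROTTLE_RATIO = 0.5


def is_critically_throttled(ratio: float,
                            threshold: float = CRITICAL_THROTTLE_RATIO) -> bool:
    return ratio > threshold


# 5 minutes at 100 ms per period = 3000 periods; 2100 of them were throttled.
ratio = throttle_ratio(periods_start=10_000, periods_end=13_000,
                       throttled_start=500, throttled_end=2_600)
print(f"throttled {ratio:.0%} of periods, critical={is_critically_throttled(ratio)}")
```

Note that a container can be heavily throttled while its average CPU usage looks moderate, because throttling happens inside each 100 ms period; that is one reason a plain CPU usage graph can hide the problem.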


The next day comes, and guess what? I get another call, this time around 6 AM. Same problem: customers report that the website is not responding. Except now they don't see any abnormal traffic in New Relic.

They have no idea what is going on, and they still don't believe me that CPU throttling caused the previous issue, because they expect they would have seen it in New Relic's CPU graphs.

I tell them it's all fluff and ask for logs. They say there are a few occurrences of the same PHP error: file_put_contents(): Only 0 of 1400 bytes written, possibly out of free disk space.

Looking at the Prometheus metrics, I immediately notice a different set of alerts. First, some of the microservices of their web app are 40% CPU-throttled. Second, their Horizontal Pod Autoscaler has maxed out. The Grafana dashboard shows that they are receiving a lot of traffic.

We bump the Horizontal Pod Autoscaler limit and CPU limits. The website comes back to life. 

It turns out their internal batch job was hitting their website API with heavy traffic. Now they believe me. We write a postmortem together, and they stop ignoring Prometheus alerts. I never got a call about CPU throttling again.

Don't be limited by your observability tool.

Why is Prometheus different? Prometheus provides you with a platform. The platform includes a time series database, a simple way to ingest data, and a time series query language, PromQL, the de facto standard for querying monitoring data. Most importantly, Prometheus does not lock your view into any particular graph or dashboard. You get the data in and query it however you want.

The biggest problem with Prometheus: it has a steep learning curve

I remember spending months learning PromQL. It's a challenging endeavor: sometimes queries would work straight away; other times they would fail miserably with weird syntax errors.

Basic concepts have non-intuitive names. Can somebody explain what instant vectors and range vectors are? Keep in mind all the rules around statistics, too. Do you remember how to compute quantiles correctly? What makes it more difficult is that results might be statistically invalid. For example, you can't aggregate quantiles. I mean, Prometheus won't stop you, but the data will be garbage.
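Here is a small Python illustration of why averaging quantiles produces garbage. The latency numbers are made up for the example, and it uses a simple nearest-rank quantile rather than Prometheus's own estimation: one instance serves only fast requests, another has a slow tail, and the average of their two p99 values lands far below the p99 of the combined traffic.

```python
import math
from typing import List


def quantile(values: List[float], q: float) -> float:
    """Nearest-rank quantile (0 < q <= 1) of raw observations."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


# Made-up request latencies in milliseconds for two instances of one service.
instance_a = [10.0] * 100                    # uniformly fast
instance_b = [20.0] * 90 + [2000.0] * 10     # 10% of requests are very slow

p99_a = quantile(instance_a, 0.99)
p99_b = quantile(instance_b, 0.99)
averaged_p99 = (p99_a + p99_b) / 2                  # the tempting, wrong aggregation
true_p99 = quantile(instance_a + instance_b, 0.99)  # p99 over all requests

print(f"average of per-instance p99s: {averaged_p99} ms")  # 1005.0 ms
print(f"p99 of all requests:          {true_p99} ms")      # 2000.0 ms
```

With Prometheus histograms, the usual fix is to sum the bucket counters across instances first (for example with sum by (le) over the bucket rates) and only then call histogram_quantile().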

Then come Grafana dashboards. Grafana is an excellent tool, and it usually catches on quickly. Everyone in your company starts creating dashboards. After a month or so, you have a huge mess of dashboards: no standards, all of them look different, all of them graph different metrics, and some even have broken graphs. Then there are the outright wrong dashboards. I've seen every statistical fallacy: aggregated quantiles, latency graphs that show fake data, axes that lie about units.

It's the worst. You are debugging latency issues and spend hours investigating. All the graphs look good, but it turns out one is entirely wrong. Then you lose trust: you start verifying every dashboard, or you stop using them altogether.

Things can be better - you can have good, simple, useful graphs and dashboards that help solve outages quickly.

We need to help teams get started: give lots of examples, build typical dashboards, have standard methods to monitor applications, and graph request rate, error rate, and latency metrics. Standardize on histograms and quantiles. Write runbooks. More advanced teams can graph service level availability, too.
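Standardizing on histograms pays off because bucket counters, unlike precomputed quantiles, can be added up across instances before a quantile is estimated. The Python sketch below imitates the idea behind PromQL's histogram_quantile(): find the cumulative bucket that contains the target rank and interpolate linearly inside it. It is a simplification under stated assumptions: identical bucket boundaries on every instance, positive boundaries, counts already converted to per-window increases, and 0 < q < 1. The bucket numbers are invented.

```python
import math
from typing import List, Tuple

# (upper bound "le" in seconds, cumulative count); the last bucket must be +Inf.
Buckets = List[Tuple[float, float]]


def merge_histograms(*histograms: Buckets) -> Buckets:
    """Sum cumulative counts bucket by bucket (assumes identical boundaries)."""
    merged = []
    for bucket_group in zip(*histograms):
        bounds = {le for le, _ in bucket_group}
        assert len(bounds) == 1, "bucket boundaries must match"
        merged.append((bucket_group[0][0], sum(count for _, count in bucket_group)))
    return merged


def histogram_quantile(q: float, buckets: Buckets) -> float:
    """Estimate the q-quantile by linear interpolation within one bucket."""
    assert 0 < q < 1
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0  # first bucket is assumed to start at 0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return lower_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound  # only reached if the +Inf bucket is missing


# Invented 5-minute request counts from two instances of the same service.
instance_a = [(0.1, 40), (0.25, 60), (0.5, 70), (1.0, 70), (math.inf, 70)]
instance_b = [(0.1, 10), (0.25, 20), (0.5, 25), (1.0, 29), (math.inf, 30)]

service = merge_histograms(instance_a, instance_b)
print(service)                           # [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 99), (inf, 100)]
print(histogram_quantile(0.5, service))  # 0.1
print(histogram_quantile(0.9, service))  # ~0.4167
```

The estimate is only as precise as the bucket layout allows, which is why choosing buckets around your latency targets is part of standardizing on histograms.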

You probably run a lot of open-source software. Many of those tools expose Prometheus metrics directly; others can be covered with exporters. So you can get Prometheus metrics from almost all of them. But maybe you don't want to spend time building dashboards and alerting rules for them. Perhaps you wish to have them ready-made and customized for you, just the way you like.

You also need great runbooks. Did you know that recording best practices ahead of time in a runbook produces roughly a 3x improvement in Mean Time To Repair compared to the strategy of "winging it"? Runbooks are how Google runs its production systems. They are a foundation of Site Reliability Engineering and part of what makes an excellent DevOps team.

What about open source software?

Open-source systems are the core building blocks of any modern infrastructure, so you need to run them well. Otherwise, your customers will be the ones calling you when they break. Chances are you are missing monitoring for some critical indicators.

Streamline your operations

  • Prometheus Alerts. Best-in-class alerts that are fast to act upon, with proper links to Grafana dashboards and runbooks.
  • Dashboards. A standard dashboard layout showing Request rate, Error rate, and Duration. Complex OSS software should also cover internal job statuses and queue lengths.
  • Runbooks. Good runbooks for common issues, so that everyone knows what to do when things go south.
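What "proper links to Grafana dashboards and runbooks" can look like in a rule file: Prometheus alerting rules carry free-form annotations, and a runbook_url annotation is a common convention that Alertmanager notification templates can render as a link. The Python sketch below builds one rule group and prints it as JSON, which YAML parsers also accept. The PromQL expression, the 50% threshold, the dashboard_url annotation name, and both URLs are hypothetical placeholders, not PrometheusKube's actual rules.

```python
import json


def alert_rule(name, expr, duration, severity, summary, runbook_url, dashboard_url):
    """Build one Prometheus alerting rule as a plain dict.

    Annotations are free-form: Prometheus passes them through to
    Alertmanager, whose templates can turn the URLs into links."""
    return {
        "alert": name,
        "expr": expr,
        "for": duration,
        "labels": {"severity": severity},
        "annotations": {
            "summary": summary,
            "runbook_url": runbook_url,
            "dashboard_url": dashboard_url,  # hypothetical annotation name
        },
    }


rule = alert_rule(
    name="WebCPUThrottlingHighCritical",
    # Hypothetical expression: throttled share of CFS periods per pod over 5 minutes.
    expr=(
        "sum by (namespace, pod) (increase(container_cpu_cfs_throttled_periods_total[5m]))"
        " / sum by (namespace, pod) (increase(container_cpu_cfs_periods_total[5m])) > 0.5"
    ),
    duration="15m",
    severity="critical",
    summary="Pod {{ $labels.pod }} is CPU-throttled in more than 50% of periods.",
    runbook_url="https://runbooks.example.com/cpu-throttling",  # placeholder URL
    dashboard_url="https://grafana.example.com/d/pod-cpu",      # placeholder URL
)

# Prometheus rule files are YAML; JSON output is accepted by YAML parsers.
rule_file = {"groups": [{"name": "web-cpu", "rules": [rule]}]}
print(json.dumps(rule_file, indent=2))
```

Saved to a file listed under rule_files in prometheus.yml, a rule like this can be validated with promtool check rules before Prometheus loads it.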

A simple solution for your monitoring needs

You know your company best. You know where you lack expertise. You know that without excellent monitoring, you have no way to tell whether the service is even working; absent a thoughtfully designed monitoring infrastructure, you're flying blind.

Building and operating your services is a lot of work. Successfully managing a service entails a wide range of activities:

  • Developing monitoring systems.
  • Planning capacity.
  • Responding to incidents.
  • Addressing the root causes of outages.
  • Improving testing and release procedures, and so on.

There is not enough time to do EVERYTHING...

Stop worrying about your Monitoring practices

The solution is simple: let's package up Prometheus alerts, Grafana dashboards, and runbooks for common open-source projects. Just installed a shiny new project? Install a simple monitoring package along with it.

Monitoring packages simplify your operational process, save you time, and add tremendous value to your DevOps team. Stop worrying about your system availability with superb monitoring, and free up your time for more important things.

Monitoring is just one piece of the puzzle, and I believe you don't need to reinvent the wheel. Think about it: when you stop worrying about keeping your systems up because you have excellent monitoring, your team can deploy to production faster and more safely, iterate faster, and bring value to your customers.

This is why we created the product

Here it is: PrometheusKube

PrometheusKube gives you a prebuilt package of professionally made Prometheus Alerts, Grafana Dashboards and Runbooks for open source software you use daily.

PrometheusKube includes 19+ professionally built packages ready for you! We extensively test every alerting rule.

Open source is continuously changing. Therefore we regularly release updates, add new packages, and update existing ones. 

If we don't have a package for the open-source system you are using, don't worry, we will make it for you!

  1. Prometheus Alerts: Simple and efficient alerting rules. Well tested and based on best practices.
  2. Grafana Dashboards: Graph key indicators first.
  3. Runbooks: Simple to understand, step-by-step instructions to resolve incidents quickly.

Basic

Best for professional DevOps teams who are familiar with Prometheus 

$300/mo

  • Basic Support
  • Prometheus Alerts
  • Grafana Dashboards
  • Runbooks
  • Regular updates and new packages

Cloud

Integrate PrometheusKube into your product

Pricing depends on the product

  • Priority Support
  • Prometheus Alerts
  • Grafana Dashboards
  • Runbooks
  • Regular updates and new packages

Unlock the power of PrometheusKube now! 

Open Source packages

  • Infrastructure: Kubernetes, CoreDNS, etcd, Docker, containerd, ArgoCD, and more...
  • Monitoring: Prometheus, Thanos, Cortex, Jaeger, node-exporter, and more...
  • Databases & Storage: Redis, Memcached, CockroachDB, Ceph, Gluster, and more...

Jonas

Creator of PrometheusKube

About the Author

Hi! I'm Jonas, and I help companies grow operational practices, adopt a DevOps mindset, and go from weekly deploys to daily.

After years of managing Kubernetes, Prometheus, and other open-source products, I finally made a revolutionary discovery: You can simplify the operational process and run it yourself. You don't need me to manage Kubernetes for you—you can do it yourself!

That's why I started PrometheusKube.

PrometheusKube helps people run open-source software without the hassle, so they save time and improve their DevOps practices. That's why I'm automating myself out of a job!

Simplify your Monitoring with PrometheusKube

PrometheusKube delivers 19+ best-in-class monitoring packages to solve your operational problems and open your eyes to the world of observability.

Become one of my limited number of innovative customers! Let's become partners and stop worrying about your infrastructure systems going down without notice.

What is your biggest frustration with Monitoring?

Prometheus focuses on providing a platform "for building your monitoring." It provides you with the tools to collect metrics, store them, check them, and query them. We are all expected to go through this process.

However, most of our infrastructure is standardized. We run the same well-known operating systems, the same databases, the same web servers, etc.

Why can't we have installable packages that instantly provide feature-rich dashboards, alerts, and runbooks for everything we use? Is there any reason you would want to monitor your web server differently than I do?

What a waste of time and money! Hundreds of thousands of people do the same thing repeatedly: trying to understand what the metrics are, how to visualize them, how to configure alerts for them, and how to query them when issues arise.

What guarantees do I get?

We guarantee that we will provide you with updates and new packages every month.
If you are not satisfied within the first 30 days, we will refund your money. You keep the packages, but you won't receive new alerting packages or updates.

Can I buy PrometheusKube as a one-time product? 

No, we sell PrometheusKube as a monthly or yearly subscription service. A one-time product doesn't work, because the IT industry never stops changing, and monitoring has to adapt to it continuously. Alerting rules that work today won't necessarily work tomorrow!

What happens if I want to leave?

We will be super sad to see you go! You get to keep all the alerts, dashboards, and runbooks, and everything keeps working as before. However, you will no longer receive support, new OSS packages, tools, or updates.

Where can I contact you?

If you have any questions, please use this contact form, and we will get back to you as soon as possible.

Who is using PrometheusKube?

Currently, we install PrometheusKube for all our on-prem Kubernetes support services. Five of our clients use PrometheusKube daily and are extremely happy. We want to deliver 10x value for the cost and become your long-term partner.

How do I get support?

Typically, after we sign a simple contract, we provide support via Slack or email. If you have any questions about PrometheusKube or observability, monitoring, SLAs, or DevOps in general, we are here to help!

Advantages vs. Disadvantages

Our solution is unique and better than anything out there. We tried making it work with Grafana dashboards and open source tools from the internet, and it's messy. We believe you will get tremendous value out of your purchase.

PrometheusKube

  • Prometheus Alerts, Dashboards, and Runbooks.
  • Ease of use. We provide you with a simple way to install and manage your alerting configuration.
  • Quick bug fixes. We hate flaky & unactionable alerts.
  • Professional support & Consulting services - ask us anything, we are here to help!

Do it yourself

  • Alerts, dashboards, and runbooks taken from the internet: everything looks different and weird, and nothing connects.
  • You have to connect all the information and bolt on alerts manually.
  • Fix bugs or issues yourself, or ask overworked open-source maintainers for help.
  • No support or consulting.

What You Get

  • Prometheus Alerting Rules - Best practice alerts based on critical metrics.
  • Grafana dashboards - Graph essential metrics first.
  • Runbooks - Simple to understand, step-by-step instructions to resolve incidents quickly.
  • Professional support for your observability needs - Improve your technology stack.
  • Custom made dashboards and alerting rules - Designed for you and your technology stack.
  • Consulting services - Your questions are our top priority.

Frequently Asked Questions

Why PrometheusKube?

How does PrometheusKube work?

Contact us

Can I cancel any time?

What payment options are available?

Can I get PrometheusKube for free?


P.S.: We are releasing PrometheusKube to a select number of customers only so that we can focus on providing the best experience possible for you.


Don't miss out. Jump on this opportunity right now!

Copyright - PrometheusKube