BARQ Systems

Overview

We are seeking a skilled Monitoring Engineer to design, implement, and maintain our monitoring and observability systems across infrastructure, applications, and services. The ideal candidate will have hands-on experience with monitoring tools, automation, and performance optimization to ensure system reliability, early issue detection, and rapid incident response.

Key Responsibilities

Monitoring & Observability

Design, configure, and maintain monitoring solutions across servers, applications, cloud resources, and networks.
Develop and optimize dashboards, alerts, and metrics for proactive monitoring.
Implement observability practices, including logs, metrics, and traces, to improve system visibility.

Incident Detection & Response

Create effective alerting policies to identify and escalate performance degradation and system failures.
Collaborate with SRE/DevOps/IT teams to investigate incidents and provide root cause analysis (RCA).
Maintain incident and problem-tracking documentation.

Tooling & Automation

Administer monitoring tools such as Prometheus, Grafana, Zabbix, Nagios, Datadog, ELK/EFK, Splunk, New Relic, or similar.
Build automation scripts for monitoring agent deployment, configuration, and alert updates.
Integrate monitoring tools with ticketing systems (e.g., Jira, ServiceNow) and notification channels (e.g., Slack, Teams, email, SMS).

Performance & Reliability

Analyze system performance trends and provide recommendations for capacity planning.
Conduct health checks and performance tuning of applications and infrastructure.
Support CI/CD pipeline optimization with insights from monitoring data.

Documentation & Governance

Maintain monitoring standards, playbooks, and operational procedures.
Train engineering and operations teams on using monitoring dashboards and interpreting alerts.
Ensure compliance with internal reliability standards and SLA/SLI/SLO guidelines.

Required Qualifications

Bachelor’s degree in Computer Science, Information Systems, Engineering, or equivalent experience.
2–5+ years of experience in monitoring, observability, DevOps, or system administration.
Strong understanding of Linux/Windows environments, cloud platforms (AWS/Azure/GCP), and networking fundamentals.
Hands-on experience with monitoring tools like Prometheus, Grafana, Zabbix, ELK, Datadog, New Relic, Splunk, or equivalent.
Experience writing automation scripts (Python, Bash, PowerShell, etc.).
Understanding of logs, metrics, and distributed tracing concepts.

Career Opportunity

Monitoring Engineer