Job Title: Sr. Site Reliability Engineer
Location: Atlanta, GA
Salary Range: $130,000 - $160,000
Benefits: Healthcare, PTO, 401(k)
About the Position
You’re a software engineer who enjoys operating large-scale systems and wants to design intelligent automation for incident detection and response. Your work will focus on preventing outages, applying AI to operational workflows, and enabling teams to build more durable services.
You’re a motivated, self-directed engineer who likes partnering with development teams to evolve reliability practices. You learn quickly, enjoy solving complex production problems with code, and value seeing your solutions adopted across a large organization.
In this Senior SRE role, you will:
- Develop automation that minimizes manual operational effort and improves team effectiveness
- Design and maintain internal tooling that provides visibility into service health and reliability trends
- Rethink post-incident analysis by transforming learnings into proactive safeguards
- Evaluate and implement modern techniques for observability, telemetry, and alerting
- Apply deep engineering expertise to diagnose and resolve complex production issues
- Investigate how machine learning and AI can enhance signal detection, prioritization, and response workflows
- Collaborate with engineering partners to examine incidents and address systemic reliability gaps
- Lead technical discussions that influence reliability architecture and operational standards
- Translate recurring operational challenges into scalable engineering solutions
- Help define and evolve best practices for incident response and operational excellence
About You
- Experience working with compiled languages (such as Java, C#, or Go) and scripting or dynamic languages (such as Python, Ruby, or JavaScript), with a solid understanding of when to use each
- Strong background in distributed systems, including familiarity with common failure scenarios
- Hands-on experience creating internal platforms, automation frameworks, or developer-facing tools
- Proficiency with version control systems and continuous integration / deployment workflows
- Experience managing infrastructure through code and designing service APIs
- Demonstrated ability to reduce operational overhead through thoughtful automation
- Ownership of production systems, including participation in on-call rotations and incident response
- Systems-oriented thinker who considers interactions and dependencies at scale
- Comfortable investigating ambiguous problems and contributing solutions in collaborative settings
- Receptive to feedback and able to synthesize multiple perspectives into practical outcomes
- Clear technical communicator, capable of producing both detailed documentation and architectural diagrams
- Detail-oriented with strong analytical problem-solving skills
- Nice to have: chaos testing, lean or agile methodologies, open-source involvement, or public speaking experience
Why This Role Stands Out
- Product-Oriented Engineering: You’ll define direction, document designs, and deliver features, not just react to alerts
- Foundational Influence: Help establish reliability practices that support availability, performance, and scale across the organization
- Applied AI: Work hands-on with modern LLMs and automation techniques in real operational environments
- Growth & Leadership: Build technical depth, influence cross-functional teams, and contribute to long-term reliability strategy
Brilliant Staffing, LLC is an Equal Opportunity Employer and encourages applications from all individuals regardless of race, color, religion, gender, gender identity, sexual orientation, national origin, disability, or veteran status.