Added - 01/13/26

38228 | Software Engineer - Reliability Engineering

Technology

Grapevine, Texas | Direct Hire

Job Description

Job Title: Sr. Site Reliability Engineer

Location: Atlanta, GA

Salary Range: $130,000 - $160,000

Benefits: Healthcare, PTO, 401k

About the Position

You’re a software engineer who enjoys running large-scale systems and wants to design intelligent automation for incident detection and response. Your work will focus on preventing outages, applying AI to operational workflows, and enabling teams to build more durable services.

You’re a motivated, self-directed engineer who enjoys partnering with development teams to evolve reliability practices. You learn quickly, enjoy solving complex production problems with code, and value seeing your solutions adopted across a large organization.

In this Senior SRE role, you will:

Develop automation that minimizes manual operational effort and improves team effectiveness
Design and maintain internal tooling that provides visibility into service health and reliability trends
Rethink post-incident analysis by transforming learnings into proactive safeguards
Evaluate and implement modern techniques for observability, telemetry, and alerting
Apply deep engineering expertise to diagnose and resolve complex production issues
Investigate how machine learning and AI can enhance signal detection, prioritization, and response workflows
Collaborate with engineering partners to examine incidents and address systemic reliability gaps
Lead technical discussions that influence reliability architecture and operational standards
Translate recurring operational challenges into scalable engineering solutions
Help define and evolve best practices for incident response and operational excellence

About You

Experience working with compiled languages (such as Java, C#, or Go) and scripting or dynamic languages (such as Python, Ruby, or JavaScript), with a solid understanding of when to use each
Strong background in distributed systems, including familiarity with common failure scenarios
Hands-on experience creating internal platforms, automation frameworks, or developer-facing tools
Proficiency with version control systems and continuous integration and continuous deployment (CI/CD) workflows
Experience managing infrastructure as code and designing service APIs
Demonstrated ability to reduce operational overhead through thoughtful automation
Ownership of production systems, including participation in on-call rotations and incident response
Systems-oriented thinker who considers interactions and dependencies at scale
Comfortable investigating ambiguous problems and contributing solutions in collaborative settings
Receptive to feedback and able to synthesize multiple perspectives into practical outcomes
Clear technical communicator, capable of producing both detailed documentation and architectural diagrams
Detail-oriented with strong analytical problem-solving skills
Nice to have: chaos testing, lean or agile methodologies, open-source involvement, or public speaking experience

Why This Role Stands Out

Product-Oriented Engineering: You’ll define direction, document designs, and deliver features—not just react to alerts
Foundational Influence: Help establish reliability practices that support availability, performance, and scale across the organization
Applied AI: Work hands-on with modern LLMs and automation techniques in real operational environments
Growth & Leadership: Build technical depth, influence cross-functional teams, and contribute to long-term reliability strategy

Brilliant Staffing, LLC is an Equal Opportunity Employer and encourages applications from all individuals regardless of race, color, religion, gender, gender identity, sexual orientation, national origin, disability, or veteran status.

