Overview
At PNNL, our core capabilities are divided among major departments that we refer to as Directorates. Each Directorate focuses on a specific area of scientific research or other function and has its own leadership team and dedicated budget.
Our Science & Technology directorates include National Security, Earth and Biological Sciences, Physical and Computational Sciences, and Energy and Environment. In addition, we have an Environmental Molecular Sciences Laboratory, a Department of Energy, Office of Science user facility housed on the PNNL campus.
The National Security Directorate (NSD) drives science-based, mission-focused solutions to take on complex, real-world threats to our nation and the world.
The AI and Data Analytics Division, part of NSD, combines profound domain expertise and creative integration of advanced hardware and software to deliver computational solutions that address complex data and analytic challenges. Working in multidisciplinary teams, we connect foundational research to engineering to operations, providing the tools to innovate quickly and field results faster. Our strengths are integrated across the data analytics lifecycle, from data acquisition and management to analysis and decision support.
Responsibilities
We are seeking a Lead DevOps/Platform Engineer to join PNNL's advanced AI engineering initiatives, contributing to next-generation systems spanning agentic AI platforms, large-scale data orchestration, and real-time intelligence processing. In this role, you'll apply your expertise in scalable system design and AI/ML engineering to build mission-critical capabilities while developing your technical leadership and establishing yourself as a key contributor to our engineering community.
Who You Are
You're an accomplished engineer with strong foundations in DevOps, scalable system design, AI/ML development, and production software engineering. You're ready to take on increasing technical responsibility, leading components of complex systems while mentoring junior team members. You excel at translating technical requirements into working solutions, selecting appropriate approaches for challenging problems, and contributing meaningfully to technical direction and project success.
What You'll Build
AI-Native Systems & Platforms
- Design and deploy scalable agentic AI systems with dynamic reasoning and decision-making capabilities
- Architect LLM orchestration frameworks using LangChain, LlamaIndex, and emerging agent platforms
- Build MLOps platforms spanning experiment tracking, model versioning, deployment, and governance
- Develop developer-focused tooling, adapters, and interfaces for AI-native frameworks
- Integrate multi-modal data sources (text, vision, structured/sensor data) into cohesive reasoning pipelines
Scalable Infrastructure & Data Systems
- Design microservices architectures coordinating across multiple domains and security enclaves
- Lead distributed system design processing data from hundreds of sources simultaneously
- Architect real-time streaming platforms handling terabytes per hour with event-driven architectures
- Build robust data pipelines for petabyte-scale ETL, data lake/mesh architectures, and real-time analytics
- Design container orchestration (Kubernetes) and CI/CD pipelines for classified and edge environments
Mission-Critical Production Systems
- Deploy AI systems in highly secure environments with resilient agent-to-agent communications
- Create monitoring and observability systems (logging, metrics, tracing) across secure enclaves
- Ensure compliance with ethical AI standards and security-first DevOps practices
- Build geospatial processing, time-series, and intelligence data fusion capabilities
Technical Leadership
- Lead a team of engineers to deliver high-risk, high-impact work with ambiguous technical scope
- Drive technical strategy and architectural decisions across cross-functional teams
- Translate ambiguous requirements and cutting-edge research into actionable technical roadmaps
- Lead design discussions shaping team-wide engineering standards
- Mentor engineering teams and guide junior scientists/engineers
Technical Knowledge, Skills, and Abilities
Platform Architecture & Infrastructure Leadership
- Expert-level proficiency in Python and at least one additional language (Go, C#/.NET, C++) with proven ability to establish infrastructure automation standards, architect scalable tooling platforms, and guide teams in developing sophisticated automation frameworks
- Mastery of Infrastructure as Code principles with deep expertise in Terraform, CloudFormation, Pulumi, or ARM templates and demonstrated ability to design enterprise-wide IaC strategies, module libraries, and governance frameworks that enable consistent and secure infrastructure deployment
- Proven track record of architecting and leading implementation of enterprise-grade CI/CD platforms with ability to define build/release strategies, establish deployment patterns, and drive continuous delivery adoption while designing internal developer platforms that abstract complexity and accelerate team velocity
- Expert proficiency with GitOps methodologies (ArgoCD, Flux), infrastructure testing frameworks (Terratest, InSpec), and policy-as-code (OPA, Sentinel) with strategic application of AI assist tools to drive team productivity, accelerate automation development, and optimize operational efficiency
Cloud Architecture & Orchestration Expertise
- Demonstrated expertise architecting and leading multi-cloud infrastructure strategies across AWS, Azure, and GCP with deep expertise in containerization and Kubernetes ecosystem including production-grade container platforms, custom operators, CRDs, and multi-cluster strategies at organizational scale
- Expert ability to architect sophisticated event-driven systems using cloud-native services (EventBridge, Event Grid, Pub/Sub, SNS/SQS) with advanced knowledge of service mesh architectures (Istio, Linkerd, Consul) and API gateway patterns for zero-trust networking and complex microservice environments
- Mastery of cloud and container networking including CNI design, custom ingress implementations, advanced load balancing, service discovery patterns, and network security policies with ability to troubleshoot complex distributed system networking issues
- Experience architecting edge computing solutions, hybrid cloud strategies, and secure enclave deployments with understanding of data sovereignty, latency optimization, and security requirements for geographically distributed infrastructure
Reliability Engineering & Security Leadership
- Proven ability to architect comprehensive observability platforms integrating metrics (Prometheus, Thanos, Cortex), distributed tracing (Jaeger, Tempo), and logging systems (ELK, Loki, Splunk) with deep expertise in SRE principles including SLO/SLI frameworks, error budgets, and incident management
- Expert implementation of security-first infrastructure including secrets management (Vault, AWS Secrets Manager, Azure Key Vault), automated vulnerability scanning, DevSecOps toolchains, and security policy enforcement across all infrastructure layers
- Strategic capability to design enterprise disaster recovery and business continuity strategies including multi-region architectures, automated backup systems, RPO/RTO optimization, and regular DR testing with advanced chaos engineering practices to systematically improve system resilience
- Deep understanding of compliance frameworks (SOC 2, HIPAA, FedRAMP, PCI-DSS, GDPR) with proven ability to implement automated compliance controls, audit logging, and infrastructure hardening standards that meet regulatory requirements
MLOps & Data Platform Engineering
- Expertise in architecting end-to-end MLOps platforms with proven ability to design and implement model lifecycle management infrastructure including experiment tracking (MLflow, Weights & Biases), model versioning, model registries, feature stores (Feast, Tecton), and automated ML pipeline orchestration supporting continuous training and deployment
- Deep expertise in building infrastructure for ML model serving and deployment including real-time inference APIs, batch prediction systems, A/B testing frameworks, model monitoring for drift detection, and automated model retraining pipelines with canary deployments and rollback capabilities
- Advanced knowledge of distributed ML training infrastructure including multi-GPU and multi-node training orchestration, resource scheduling, and optimization for frameworks like PyTorch, TensorFlow, and JAX on Kubernetes-based platforms (Kubeflow, Ray, Spark ML) with deep understanding of compute resource management and cost optimization
- Proven ability to architect cloud-native data platforms with expertise in ETL/ELT orchestration frameworks (Airflow, Prefect, Dagster, AWS Step Functions), production data storage systems (S3, Redshift, Databricks Delta Lake, PostgreSQL, MongoDB, Snowflake), and distributed data processing frameworks (Spark/Databricks, Kafka, Flink, Ray) supporting petabyte-scale data systems and real-time ML feature pipelines
Technical Leadership & Strategic Impact
- Exceptional problem-solving and troubleshooting abilities with proven track record of resolving complex infrastructure incidents spanning ML pipelines, data platforms, and distributed systems while leading incident response and root cause analysis, combined with outstanding communication skills to translate technical complexity into business impact for executive leadership and stakeholders
- Demonstrated ability to establish infrastructure and MLOps documentation standards, create comprehensive runbooks for ML system operations and DR procedures, develop technical training programs, and build knowledge sharing practices while mentoring and developing platform engineering teams through technical guidance and architecture reviews
- Proven capacity to lead multiple concurrent infrastructure and MLOps initiatives while maintaining production reliability for both traditional applications and ML systems, managing competing priorities, balancing technical debt, and establishing on-call practices, incident response frameworks, and blameless post-mortem processes that drive systemic improvements
- Strategic ability to balance immediate operational needs with long-term infrastructure and MLOps vision, evaluate emerging ML infrastructure technologies, drive platform modernization initiatives including ML democratization, and establish technical roadmaps that enable organizational scaling, AI/ML innovation, and operational excellence
National Interest Project Examples
- Detect and prevent smuggling of drugs and contraband at ports of entry
- Develop large data pipelines to thwart funding for terrorists, nuclear proliferators, drug cartels, and rogue leaders
- Apply big data solutions to national security problems
- Apply image classification to nuclear forensics analysis
- Develop capabilities for scalable geospatial analytics
This position is based in Richland, WA or Seattle, WA and requires an onsite presence Monday through Thursday, with Friday as required by business needs.
Qualifications
Minimum Qualifications:
- PhD and 3 years of software engineering experience -OR-
- MS/MA or higher and 5 years of software engineering experience -OR-
- BS/BA and 7 years of software engineering experience -OR-
- AA and 16 years of software engineering experience in designing, architecting, programming, deploying, and automating software solutions in support of scientific research or consumer digital product development -OR-
- HS/GED and 18 years of software engineering experience in designing, architecting, programming, deploying, and automating software solutions in support of scientific research or consumer digital product development
Preferred Qualifications:
- Degree in computer science, software engineering, or related field
- Track record of architecting and operating large-scale infrastructure supporting significant user bases, high-volume transaction systems, petabyte-scale data platforms, or production ML systems serving millions of predictions
- Experience building and leading high-performing platform engineering, DevOps, or MLOps teams through hiring, mentoring, technical guidance, and career development
- Experience establishing infrastructure practices, platform strategies, MLOps frameworks, and DevOps transformation initiatives at organizational scale
- Background in mission-critical, regulated, or high-security environments (government, defense, financial services, healthcare) with understanding of compliance requirements for both traditional systems and ML/AI applications
- Demonstrated success leading complex, multi-team infrastructure and MLOps initiatives from architecture through production deployment, operational handoff, and continuous improvement
Hazardous Working Conditions/Environment
Not applicable.
Additional Information
This position requires the ability to obtain and maintain a federal security clearance.
A security clearance background investigation includes review of your employment, education, financial, and criminal history, as well as interviews with you and your personal references, neighbors, and co-workers to determine trustworthiness, reliability, and loyalty to the United States. The investigation also examines your foreign connections, drug and alcohol use, foreign influence, and overall conduct.
Requirements:
- U.S. Citizenship
- Background Investigation: Applicants selected will be subject to a Federal background investigation and must meet eligibility requirements for access to classified matter in accordance with 10 CFR 710, Appendix B.
- Drug Testing: All Security Clearance positions are Testing Designated Positions, which means that the applicant selected for hire is subject to pre-employment drug testing, and post-employment random drug testing. In addition, applicants must be able to demonstrate non-use of illegal drugs, including marijuana, for the 12 consecutive months preceding completion of the requisite Questionnaire for National Security Positions (QNSP).
Note: Applicants will be considered ineligible for security clearance processing by the U.S. Department of Energy if non-use of illegal drugs, including marijuana, for 12 months cannot be demonstrated.
Testing Designated Position
This position is a Testing Designated Position (TDP). The candidate selected for this position will be subject to pre-employment and random drug testing for illegal drugs, including marijuana, consistent with the Controlled Substances Act and the PNNL Workplace Substance Abuse Program.
About PNNL
Pacific Northwest National Laboratory (PNNL) is a world-class research institution powered by a highly educated, diverse workforce committed to the values of Integrity, Creativity, Collaboration, Impact, and Courage. Every year, scores of dynamic, driven people come to PNNL to work with renowned researchers on meaningful science, innovations and outcomes for the U.S. Department of Energy and other sponsors; here is your chance to be one of them!
At PNNL, you will find an exciting research environment and excellent benefits, including health insurance and flexible work schedules. PNNL is located in eastern Washington State, the dry side of Washington known for its stellar outdoor recreation and affordable cost of living. The Lab’s campus is only a 45-minute flight (or ~3-hour drive) from Seattle or Portland, and is served by the convenient PSC airport, connected to 8 major hubs.
Commitment to Excellence and Equal Employment Opportunity
Our laboratory is committed to fostering a work environment where all individuals are treated with fairness and respect while solving critical challenges in fundamental sciences, national security, and energy resiliency. We are an Equal Employment Opportunity employer.
Pacific Northwest National Laboratory (PNNL) is an Equal Opportunity Employer. PNNL considers all applicants for employment without regard to race, religion, color, sex, national origin, age, disability, genetic information (including family medical history), protected veteran status, and any other status or characteristic protected by federal, state, and/or local laws.
We are committed to providing reasonable accommodations for individuals with disabilities and disabled veterans in our job application procedures and in employment. If you need assistance or an accommodation due to a disability, contact us at careers@pnnl.gov.
Drug Free Workplace
PNNL is committed to a drug-free workplace supported by the Workplace Substance Abuse Program (WSAP) and complies with federal laws prohibiting the possession and use of illegal drugs.
If you are offered employment at PNNL, you must pass a drug test prior to commencing employment. PNNL complies with federal law regarding illegal drug use. Under federal law, marijuana remains an illegal drug. If you test positive for any illegal controlled substance, including marijuana, your offer of employment will be withdrawn.
Security, Credentialing, and Eligibility Requirements
As a national laboratory, PNNL is responsible for adhering to the Homeland Security Presidential Directive 12 (HSPD-12) and Department of Energy (DOE) Order 473.1A, which require new employees to obtain and maintain an HSPD-12 Personal Identity Verification (PIV) Credential. To obtain this credential, new employees must successfully complete the applicable tier of federal background investigation post hire and receive a favorable federal adjudication. The tier of federal background investigation will be determined by job duties and national security or public trust responsibilities associated with the job. All tiers of investigation include a declaration of illegal drug activities, including use, supply, possession, or manufacture within the last 1 to 7 years (depending on the applicable tier of investigation). Illegal drug activities include marijuana and cannabis derivatives, which are still considered illegal under federal law, regardless of state laws.
For foreign national candidates:
If you have not resided in the U.S. for three consecutive years, you are not eligible for the PIV credential and instead will need to obtain a favorable Local Site Specific Only (LSSO) Federal risk determination to maintain employment. Once you meet the three-year residency requirement, you will be required to obtain a PIV credential to maintain employment. The tier of federal background investigation required to obtain the PIV credential will be determined by job duties at the time you become eligible for the PIV credential.
Mandatory Requirements
Please be aware that the Department of Energy (DOE) prohibits DOE employees and contractors from having any affiliation with the foreign government of a country DOE has identified as a “country of risk” without explicit approval by DOE and Battelle. If you are offered a position at PNNL and currently have any affiliation with the government of one of these countries, you will be required to disclose this information and recuse yourself of that affiliation or receive approval from DOE and Battelle prior to your first day of employment.
Rockstar Rewards
Employees and their families are offered medical insurance, dental insurance, vision insurance, robust telehealth care options, several mental health benefits, free wellness coaching, health savings account, flexible spending accounts, basic life insurance, disability insurance*, employee assistance program, business travel insurance, tuition assistance, relocation, backup childcare, legal benefits, supplemental parental bonding leave, surrogacy and adoption assistance, and fertility support. Employees are automatically enrolled in our company-funded pension plan* and may enroll in our 401(k) savings plan with company match*. Employees may accrue up to 120 vacation hours per year and may receive ten paid holidays per year.
* Research Associates excluded.
**All benefits are dependent upon eligibility.
Notice to Applicants
PNNL lists the full pay range for the position in the job posting. Starting pay is calculated from the minimum of the pay range and actual placement in the range is determined based on an individual’s relevant job-related skills, qualifications, and experience. This approach is applicable to all positions, with the exception of positions governed by collective bargaining agreements and certain limited-term positions which have specific pay rules.
As part of our commitment to fair compensation practices, we do not ask for or consider current or past salaries in making compensation offers at hire. Instead, our compensation offers are determined by the specific requirements of the position, prevailing market trends, applicable collective bargaining agreements, pay equity for the position type, and individual qualifications and skills relevant to the performance of the position.
Minimum Salary: USD $161,300.00/Yr.
Maximum Salary: USD $255,000.00/Yr.