Rasa Rasiulytė
AI Evaluation Specialist · Software Engineering · Testing & Governance
Summary
Software engineer with 14 years at Microsoft, focused on quality, testing, and reliability across large systems. My background spans both development and test engineering, with deep experience in systematic testing, edge-case analysis, and failure-mode investigation.
More recently, I've been working on evaluating LLM-generated code and exploring how established quality engineering practices apply to non-deterministic AI systems. I'm particularly interested in how careful evaluation can help build trust in AI-powered products.
Experience
AI Evaluation (current)
Exploring practical approaches to evaluating AI systems, with an emphasis on code quality, safety, and reliability.
- Evaluating LLM-generated code at Outlier.ai, identifying correctness, safety, and quality issues
- Studying and applying AI evaluation approaches such as LLM-as-Judge, Process Reward Models, and multi-agent frameworks
- Developing familiarity with governance frameworks (NIST AI RMF, EU AI Act, ISO/IEC 42001) and thinking through how high-level requirements translate into testable criteria
- Investigating which traditional QA skills transfer well to evaluating non-deterministic systems, and where new approaches are needed
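One of the approaches named above, LLM-as-Judge, can be sketched minimally: a judge scores a candidate answer against a rubric and the per-criterion scores are aggregated into a single quality signal. In this sketch the judge is a stand-in heuristic so the example stays self-contained; in any real setup it would be a prompted call to an actual model, and the rubric names and 0–2 scale here are illustrative assumptions, not a specific tool's API.

```python
from dataclasses import dataclass

# Minimal sketch of the LLM-as-Judge pattern: score a candidate code
# snippet against a rubric, then aggregate. The judge below is a
# stand-in heuristic (a real judge would be an LLM call); the rubric
# criteria and the 0-2 scale are illustrative assumptions.

@dataclass
class Verdict:
    criterion: str
    score: int       # 0 = fail, 1 = partial, 2 = pass
    rationale: str

def judge(code: str) -> list[Verdict]:
    """Stand-in judge: replace with a prompted model call in practice."""
    verdicts = []
    # Criterion 1: does the code handle error paths at all?
    has_error_handling = "try" in code or "raise" in code
    verdicts.append(Verdict(
        "error handling",
        2 if has_error_handling else 0,
        "uses try/raise" if has_error_handling else "no error paths",
    ))
    # Criterion 2: is there any documentation?
    has_docs = '"""' in code or code.lstrip().startswith("#")
    verdicts.append(Verdict(
        "documentation",
        2 if has_docs else 0,
        "docstring/comment present" if has_docs else "undocumented",
    ))
    return verdicts

def overall(verdicts: list[Verdict]) -> float:
    """Aggregate per-criterion scores into a single 0-1 quality signal."""
    return sum(v.score for v in verdicts) / (2 * len(verdicts))

snippet = 'def f(x):\n    """Double x."""\n    return x * 2\n'
score = overall(judge(snippet))
```

The value of the pattern is less in any one judgment than in making the rubric explicit and the aggregation reproducible, which is where traditional test-engineering instincts carry over directly.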
Xbox Backward Compatibility
Worked on infrastructure to ensure legacy games continued to function correctly on new hardware.
- Built validation pipelines to detect compatibility issues before release
- Collaborated closely with hardware, platform, and testing teams to diagnose and prevent regressions
Office Security
Performed security testing and fuzzing for Microsoft Publisher.
- Designed fuzzing approaches that uncovered buffer overflows and security vulnerabilities
Windows Movie Maker / Expression Encoder
Conducted exploratory testing focused on encoding, graphics compatibility, and edge cases.
- Identified critical rendering and playback issues across a wide range of hardware configurations
Writing
Notes and short essays I've written while working through questions around AI evaluation and software quality:
- "What I Look For When Evaluating Code" — How I think about code quality and recurring patterns
- "From SDET to AI Evaluator: What Transfers?" — Which testing instincts still apply to AI evaluation, and which don't