Rasa Rasiulytė

AI Evaluation Specialist · Software Engineering · Testing & Governance

Seattle, WA · rasar@hotmail.com · LinkedIn · GitHub · Substack

Summary

Software engineer with 14 years at Microsoft, focused on quality, testing, and reliability across large systems. My background spans both development and test engineering, with deep experience in systematic testing, edge case analysis, and failure modes.

More recently, I've been working on evaluating LLM-generated code and exploring how established quality engineering practices apply to non-deterministic AI systems. I'm particularly interested in how careful evaluation can help build trust in AI-powered products.

Experience

AI Evaluation & Independent Research
2023 – Present

Exploring practical approaches to evaluating AI systems, with an emphasis on code quality, safety, and reliability.

  • Evaluating LLM-generated code at Outlier.ai, identifying correctness, safety, and quality issues
  • Studying and applying AI evaluation approaches such as LLM-as-Judge, Process Reward Models, and multi-agent frameworks
  • Developing familiarity with governance frameworks (NIST AI RMF, EU AI Act, ISO/IEC 42001) and thinking through how high-level requirements translate into testable criteria
  • Investigating which traditional QA skills transfer well to evaluating non-deterministic systems, and where new approaches are needed

Software Development Engineer (SDE)
Microsoft · 2016 – 2018

Xbox Backward Compatibility
Worked on infrastructure to ensure legacy games continued to function correctly on new hardware.

  • Built validation pipelines to detect compatibility issues before release
  • Collaborated closely with hardware, platform, and testing teams to diagnose and prevent regressions

Senior Software Development Engineer in Test (SDET)
Microsoft · 2004 – 2016

Office Security
Performed security testing and fuzzing for Microsoft Publisher.

  • Designed fuzzing approaches that uncovered buffer overflows and security vulnerabilities

Windows Movie Maker / Expression Encoder
Conducted exploratory testing focused on encoding, graphics compatibility, and edge cases.

  • Identified critical rendering and playback issues across a wide range of hardware configurations

Core Areas of Focus

  • AI Evaluation & LLM Testing
  • Quality Engineering & Test Strategy
  • Systematic and Exploratory Testing
  • Edge Case & Failure Mode Analysis
  • Governance-Aware Evaluation
  • Systems Thinking & Cross-Functional Collaboration
  • Fuzzing and Robustness Testing
  • Tools & Languages: Python, C++, C#, Jupyter

Writing

I write notes and short essays working through questions around AI evaluation and software quality; recent posts are on my Substack.

Education

Master of Science in Computer Science
Johns Hopkins University · 2023
Bachelor of Applied Science in Application Development
North Seattle College · 2020