Rasa Rasiulytė
AI Evaluation Specialist · Software Engineering · Testing & Governance
Summary
Software engineer with 14 years at Microsoft, focused on quality, testing, and reliability across large systems. My background spans both development and test engineering, with deep experience in systematic testing, edge-case analysis, and failure-mode investigation.
More recently, I've been working on evaluating LLM-generated code and exploring how established quality engineering practices apply to non-deterministic AI systems. I'm particularly interested in how careful evaluation can help build trust in AI-powered products.
Experience
AI Evaluation (current)
Exploring practical approaches to evaluating AI systems, with an emphasis on code quality, safety, and reliability.
- Evaluating LLM-generated code at Outlier.ai, identifying correctness, safety, and quality issues
- Studying and applying AI evaluation approaches such as LLM-as-Judge, Process Reward Models, and multi-agent frameworks
- Developing familiarity with governance frameworks (NIST AI RMF, EU AI Act, ISO/IEC 42001) and thinking through how high-level requirements translate into testable criteria
- Investigating which traditional QA skills transfer well to evaluating non-deterministic systems, and where new approaches are needed
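One of the approaches named above, LLM-as-Judge, can be sketched minimally: a judge scores a candidate answer against a rubric and the per-criterion scores are aggregated into a single quality signal. In this sketch the judge is a stand-in heuristic so the example stays self-contained; in any real setup it would be a prompted call to an actual model, and the rubric names and 0–2 scale here are illustrative assumptions, not a specific tool's API.

```python
from dataclasses import dataclass

# Minimal sketch of the LLM-as-Judge pattern: score a candidate code
# snippet against a rubric, then aggregate. The judge below is a
# stand-in heuristic (a real judge would be an LLM call); the rubric
# criteria and the 0-2 scale are illustrative assumptions.

@dataclass
class Verdict:
    criterion: str
    score: int       # 0 = fail, 1 = partial, 2 = pass
    rationale: str

def judge(code: str) -> list[Verdict]:
    """Stand-in judge: replace with a prompted model call in practice."""
    verdicts = []
    # Criterion 1: does the code handle error paths at all?
    has_error_handling = "try" in code or "raise" in code
    verdicts.append(Verdict(
        "error handling",
        2 if has_error_handling else 0,
        "uses try/raise" if has_error_handling else "no error paths",
    ))
    # Criterion 2: is there any documentation?
    has_docs = '"""' in code or code.lstrip().startswith("#")
    verdicts.append(Verdict(
        "documentation",
        2 if has_docs else 0,
        "docstring/comment present" if has_docs else "undocumented",
    ))
    return verdicts

def overall(verdicts: list[Verdict]) -> float:
    """Aggregate per-criterion scores into a single 0-1 quality signal."""
    return sum(v.score for v in verdicts) / (2 * len(verdicts))

snippet = 'def f(x):\n    """Double x."""\n    return x * 2\n'
score = overall(judge(snippet))
```

The value of the pattern is less in any one judgment than in making the rubric explicit and the aggregation reproducible, which is where traditional test-engineering instincts carry over directly.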
Xbox Backward Compatibility
Worked on infrastructure to ensure legacy games continued to function correctly on new hardware.
- Built validation pipelines to detect compatibility issues before release
- Collaborated closely with hardware, platform, and testing teams to diagnose and prevent regressions
Office Security
Performed security testing and fuzzing for Microsoft Publisher.
- Designed fuzzing approaches that uncovered buffer overflows and security vulnerabilities
Windows Movie Maker / Expression Encoder
Conducted exploratory testing focused on encoding, graphics compatibility, and edge cases.
- Identified critical rendering and playback issues across a wide range of hardware configurations
Writing
Notes and short essays I've written while working through questions around AI evaluation and software quality:
- "What I Look For When Evaluating Code" — How I think about code quality and recurring patterns
- "From SDET to AI Evaluator: What Transfers?" — Which testing instincts still apply to AI evaluation, and which don't