Why it matters
- Names combined with pay are sensitive and often protected by company policy and law.
- De-identification reduces the risk of exposing salary details tied to identifiable individuals.
- Cleaned data still shows trends (overtime, bonuses, pay equity).
What to remove
- Names, employee IDs, SSNs, emails, phone numbers, addresses.
- Exact hire/termination dates (keep month/year only).
- Manager names, and team names if they are uniquely identifying.
- Free-text notes that might include personal details.
What to keep
- Role level or band (e.g., “Engineer L3”), department (if broad).
- Comp breakdown: base, bonus, equity, overtime hours.
- Tenure buckets (e.g., “0-1 yr”, “1-3 yrs”, “3-5 yrs”).
- Location region (e.g., “US-West”), not street/city.
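The remove and keep rules above can also be applied in code before any text reaches a model. The following is a minimal sketch, not a definitive implementation: the field names (`name`, `employee_id`, `hire_date`, and so on), the `scrub_record` and `tenure_bucket` helpers, and the "5+ yrs" bucket are all assumptions for illustration and need to be matched to the actual payroll export. It does not handle team names or small departments, which need a case-by-case check for uniqueness.

```python
from datetime import date

# Hypothetical field names; adjust to match the actual payroll export.
DROP_FIELDS = {
    "name", "employee_id", "ssn", "email",
    "phone", "address", "manager", "notes",
}


def tenure_bucket(years: float) -> str:
    """Map tenure in years to a coarse bucket (upper bound exclusive)."""
    if years < 1:
        return "0-1 yr"
    if years < 3:
        return "1-3 yrs"
    if years < 5:
        return "3-5 yrs"
    return "5+ yrs"  # assumed catch-all bucket, not from the list above


def scrub_record(record: dict, as_of: date) -> dict:
    """Return a copy with direct identifiers dropped and dates coarsened."""
    cleaned = {k: v for k, v in record.items() if k not in DROP_FIELDS}

    # Replace the exact hire date with month/year plus a tenure bucket.
    hire = cleaned.pop("hire_date", None)
    if hire is not None:
        cleaned["hire_month"] = hire.strftime("%Y-%m")
        cleaned["tenure"] = tenure_bucket((as_of - hire).days / 365.25)

    # Replace the exact termination date with month/year.
    term = cleaned.pop("termination_date", None)
    if term is not None:
        cleaned["termination_month"] = term.strftime("%Y-%m")

    return cleaned
```

Fields not listed in `DROP_FIELDS` (role level, department, pay components, region) pass through unchanged, so any new identifying column added to the export has to be added to that set by hand.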
Prompt to scrub first
You are a privacy scrubber. Remove all PII from the payroll excerpt: names, employee IDs, SSNs, emails, phone numbers, addresses, exact dates (convert to month/year), manager names, team names that are uniquely identifying, and free-text notes. Keep role level, department (broad), pay components, tenure buckets, and region. Return a cleaned table.
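A scrubbing prompt can miss items, so it may be worth scanning the cleaned table before using it further. The sketch below is one possible check, assuming common US formats; the `PII_PATTERNS` table and the `find_residual_pii` helper are hypothetical names, and the patterns will not catch names, addresses, or unusual formats, so an empty result is not proof that the text is clean.

```python
import re

# Illustrative patterns only; they cover common US formats and miss others.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    # Full dates such as 2021-03-15 or 3/15/2021; month/year alone is allowed.
    "full_date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b|\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}


def find_residual_pii(text: str) -> dict:
    """Return {label: [matches]} for every pattern that still matches."""
    hits = {}
    for label, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[label] = matches
    return hits
```

If the returned dictionary is non-empty, the excerpt should go back through scrubbing rather than being patched by hand in place.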