Understanding Schemas
A schema tells Petey exactly what data to extract from your documents. It's the most important part of the process.
A schema is a list of fields. Each field has three parts:
Anatomy of a field
Name: becomes the column header in your results.
Type: tells the model what kind of data to expect.
Description: guides the model's interpretation. This is where you give instructions.
If you've used ChatGPT or Claude, you already know the basics — a schema is essentially a structured prompt. Instead of writing a paragraph asking an AI to "find the patient's name, age, and gender", you define each piece of data as a field. Petey turns your schema into a prompt behind the scenes.
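To make the "structured prompt" idea concrete, here is a minimal Python sketch of how a list of fields could be rendered into prompt text. It is an illustration only: the Field class, the render_prompt function, and the prompt wording are hypothetical and are not Petey's actual internals.

```python
from dataclasses import dataclass


@dataclass
class Field:
    """One schema field (hypothetical structure, not Petey's own)."""
    name: str         # becomes the column header
    type: str         # Text, Number, Category, or Date
    description: str  # per-field instruction for the model


def render_prompt(fields: list[Field]) -> str:
    """Render the fields as prompt text, one line per field."""
    lines = ["Extract the following fields from the document:"]
    for field in fields:
        lines.append(f"- {field.name} ({field.type}): {field.description}")
    return "\n".join(lines)


fields = [
    Field("name", "Text", "Patient name"),
    Field("age", "Number", "Patient age"),
]
print(render_prompt(fields))
```

Each field contributes one line, so adding a field to the schema adds one instruction to the prompt.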
We'll build a schema for a clinical note step by step. By the end, you'll understand when to use each field type and how descriptions shape your results.
Text fields: copying data
The simplest kind of extraction — find a value and copy it.
A Text field tells the model to extract a string value. For something like a patient's name, the model just finds it in the document and copies it directly.
Field 1
name (Text): Patient name
The description here is simple — "Patient name" — because there's no ambiguity. The model knows exactly what to look for. Not every field needs a complex description.
The result for this field will be something like "Margaret Ellison" — copied straight from the document.
Number fields
Use Number when you want a clean numeric value.
A Number field tells the model to return a numeric value. Even if the document says "34 years old", the model will extract just 34.
Fields so far
name (Text): Patient name
age (Number): Patient age
Number is the right choice here because age is numeric and we want a clean value for analysis. Even if a document writes ages as "thirty-four" or "3 months", the model still returns a number. If the unit can vary, state the one you expect in the description.
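To show what a "clean numeric value" looks like, here is a rough Python stand-in. The model does this step through language understanding; the first_number helper below is hypothetical, handles only digits, and is not how Petey works.

```python
import re
from typing import Optional


def first_number(text: str) -> Optional[float]:
    """Return the first number written as digits in text, or None."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


print(first_number("34 years old"))  # 34.0
print(first_number("thirty-four"))   # None: this helper only sees digits
```

The second call shows why a Number field is more useful than a hand-written rule: the model can read spelled-out numbers, while a simple pattern cannot.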
Category fields
Constrain the output to a fixed set of values.
A Category field gives the model a list of allowed values. Instead of returning free text, the model must pick from your list. This standardizes the output, so you avoid inconsistencies like "M" vs "Male" vs "male".
Fields so far
name (Text): Patient name
age (Number): Patient age
gender (Category): Infer from pronouns if not obvious
Allowed values: Male, Female, Non-binary
Notice the description: "Infer from pronouns if not obvious". This is where descriptions shine. The document might not say "Gender: Female" — but if it uses "she/her" pronouns, the model knows what to do. The description is an instruction, not just a label.
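The value of a fixed list shows up when you count or group results. The short Python sketch below compares free-text answers with categorized ones; both lists are made-up sample values, not output from Petey.

```python
from collections import Counter

# Made-up sample values for illustration.
free_text = ["M", "Male", "male", "Female", "F"]
categorized = ["Male", "Male", "Male", "Female", "Female"]

# Free text splits two genders across five different labels.
print(len(Counter(free_text)))  # 5

# A Category field keeps the counts meaningful.
print(Counter(categorized))  # Counter({'Male': 3, 'Female': 2})
```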
Date fields
Extract dates in a consistent format.
A Date field extracts date values. Documents express dates in all sorts of ways — "March 5, 2024", "3/5/24", "05-MAR-2024". The Date type standardizes them, and you can use the description to specify the format you want.
Fields so far
name (Text): Patient name
age (Number): Patient age
gender (Category): Infer from pronouns if not obvious
visit_date (Date): Date of the visit in YYYY-MM-DD format
The description "Date of the visit in YYYY-MM-DD format" does two things: it tells the model which date to extract (the visit, not the birth date or discharge date), and how to format it. Specifying the format in the description is one of the most common and useful patterns — it ensures every date in your results is consistent regardless of how it appeared in the original document.
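The Python sketch below shows what "consistent format" means in practice, using the three date styles mentioned above. It is not how Petey or the model handles dates. The to_iso helper is hypothetical, it assumes "3/5/24" is month-first (US style), and it assumes English month names in the current locale.

```python
from datetime import datetime
from typing import Optional

# Input styles to try, in order. Month-first is an assumption for "3/5/24".
FORMATS = ("%B %d, %Y", "%m/%d/%y", "%d-%b-%Y")


def to_iso(text: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next format
    return None


for raw in ("March 5, 2024", "3/5/24", "05-MAR-2024"):
    print(raw, "->", to_iso(raw))  # each prints 2024-03-05
```

A fixed list of formats breaks as soon as a document uses a style you did not anticipate, which is why asking for the format in the field description is the more robust pattern.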
Text interpretation
Text fields can also read, interpret, and summarize — not just copy.
We already used a Text field to copy the patient's name. But Text fields are more versatile than that. The clinical note doesn't have a line labeled "Visit Outcome:" — instead, the model needs to read the full narrative and summarize the result. That's still a Text field — the description is what changes the behavior.
Fields so far
name (Text): Patient name
age (Number): Patient age
gender (Category): Infer from pronouns if not obvious
visit_date (Date): Date of the visit in YYYY-MM-DD format
visit_outcome (Text): Outcome of the visit
Compare the two Text fields: "Patient name" means find and copy. "Outcome of the visit" means read, understand, and summarize. Same type, completely different behavior. Think of the description as a mini-prompt — it's your instruction to the AI for that specific field.
The result might be something like "discharged with salbutamol inhaler" — not a direct quote from the document, but a concise summary the model produced by reading the full note.
Additional instructions
Give the model extra context that applies to the whole extraction.
Field descriptions are per-field instructions. But sometimes you need to tell the model something that applies to every field — or to the document as a whole. That's what the Additional Instructions box is for.
Think of it like adding a note at the top of a prompt. For our clinical notes, we might write:
Additional Instructions
These are emergency department clinical notes. Use medical terminology where appropriate. If a field cannot be determined from the text, return null rather than guessing.
This does three things: it tells the model the type of document it's reading (so it knows what context to apply), sets a tone (medical terminology), and establishes a default behavior (null over guessing). These instructions are sent to the model alongside your schema and the document text — it's all part of the same prompt.
Good additional instructions answer questions like: What kind of documents are these? Should the model use specific terminology? What should it do when the answer isn't clear?
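As a rough picture of "all part of the same prompt", the Python sketch below joins the additional instructions, the field list, and the document text into one string. The build_prompt function, the section titles, and the ordering are assumptions made for illustration; Petey's real prompt layout is not documented here.

```python
def build_prompt(instructions: str, fields_text: str, document: str) -> str:
    """Join the three parts into one prompt, skipping any part that is empty."""
    sections = [
        ("Additional instructions", instructions),
        ("Fields", fields_text),
        ("Document", document),
    ]
    return "\n\n".join(
        f"{title}:\n{body.strip()}" for title, body in sections if body.strip()
    )


prompt = build_prompt(
    instructions=(
        "These are emergency department clinical notes. "
        "Use medical terminology where appropriate. "
        "If a field cannot be determined from the text, return null rather than guessing."
    ),
    fields_text="- name (Text): Patient name\n- age (Number): Patient age",
    document="<document text goes here>",
)
print(prompt)
```

Because the instructions sit alongside every field, they act as defaults, and a field description can still add something more specific.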
Your schema
Here's the complete schema we built, plus two extra fields.
ed_clinical_note
name (Text): Patient name
age (Number): Patient age
gender (Category): Infer from pronouns if not obvious
visit_date (Date): Date of the visit in YYYY-MM-DD format
presenting_complaint_1 (Text): Primary complaint
presenting_complaint_2 (Text): Secondary complaint (can be null)
visit_outcome (Text): Outcome of the visit
We added two fields for presenting complaints to round out the schema. Notice that the second complaint says "can be null" in the description — this tells the model it's OK to leave it blank when there's no secondary complaint. You can add as many fields as you need.
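If you want to reason about the schema in code, it can be written down as plain data. The Python sketch below is one possible representation. The key names (fields, allowed_values, and so on) are hypothetical, and the file you download from Petey may be structured differently.

```python
import json

# Hypothetical representation of the schema above; Petey's own format may differ.
schema = {
    "name": "ed_clinical_note",
    "fields": [
        {"name": "name", "type": "Text", "description": "Patient name"},
        {"name": "age", "type": "Number", "description": "Patient age"},
        {"name": "gender", "type": "Category",
         "description": "Infer from pronouns if not obvious",
         "allowed_values": ["Male", "Female", "Non-binary"]},
        {"name": "visit_date", "type": "Date",
         "description": "Date of the visit in YYYY-MM-DD format"},
        {"name": "presenting_complaint_1", "type": "Text",
         "description": "Primary complaint"},
        {"name": "presenting_complaint_2", "type": "Text",
         "description": "Secondary complaint (can be null)"},
        {"name": "visit_outcome", "type": "Text",
         "description": "Outcome of the visit"},
    ],
}


def blank_row(schema: dict) -> dict:
    """Return one result row with every column present and set to None."""
    return {field["name"]: None for field in schema["fields"]}


row = blank_row(schema)
print(list(row))                                  # column headers, in schema order
print(json.dumps(row["presenting_complaint_2"]))  # null
```

Each field name becomes one column, and a field the model cannot fill stays as null instead of holding a guess.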
Key takeaways:
• Text — free-form values: can copy directly or interpret and summarize
• Number — clean numeric values (ages, amounts, counts)
• Category — pick from a fixed list you define
• Date — date values, use the description to specify format
• Descriptions are mini-prompts — they're your per-field instructions to the AI
• Additional instructions set context for the whole extraction
• You can also use Suggest Fields to have Petey generate a schema automatically
The schema is ready to use. Download it and load it in the Extractor, or see it in action in the Getting Started demo.