
VCF Columns Explained: How to Read CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, and Sample Fields

A plain-English guide to VCF columns, using one fictional row to explain variant identity, call quality, INFO fields, and cautious ClinVar context.

Start with one fictional VCF row

You opened a VCF and found a chromosome label, a coordinate, an rs-like ID, REF and ALT bases, PASS, and a long INFO field. The common first mistakes are to search the ID or to decide that ALT must be the concerning allele. Start instead by separating three things: what identifies the variant, what describes the call, and what may add optional context. This walkthrough uses one row so each field stays anchored to a concrete example.

Teaching row only:

Header: #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample_A

Row: chrDemo1 1234567 rsExample001 G A 48.6 PASS DP=21;AF=0.50 GT:GQ:DP 0/1:42:21

Do not search this as a real variant or a real person. It is deliberately fictional.
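The row above is shown with spaces for readability. In an actual VCF, the columns of a data line are separated by tab characters. The short Python sketch below pairs each name from the #CHROM header line with its value. It assumes tab-separated input and uses only the fictional teaching row; the function name parse_row is hypothetical and comes from no particular library.

```python
# Minimal sketch: pair each column name from the #CHROM header line with its value.
# Assumes tab-separated columns, as VCF data lines use; the row is the fictional teaching row.
HEADER = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample_A"
ROW = "chrDemo1\t1234567\trsExample001\tG\tA\t48.6\tPASS\tDP=21;AF=0.50\tGT:GQ:DP\t0/1:42:21"

def parse_row(header_line: str, data_line: str) -> dict[str, str]:
    """Map column names from the header line to the values in one data line (hypothetical helper)."""
    names = header_line.lstrip("#").split("\t")   # "#CHROM" becomes "CHROM"
    values = data_line.rstrip("\n").split("\t")
    if len(names) != len(values):
        raise ValueError(f"expected {len(names)} columns, found {len(values)}")
    return dict(zip(names, values))

record = parse_row(HEADER, ROW)
for name, value in record.items():
    print(f"{name:>8}: {value}")
```

Every value stays a plain string here on purpose. Deciding what a value means belongs to the later steps and to the file's header.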

VCF (Variant Call Format) is a widely used text format for representing genetic variation data. The VCF specification defines meta-information lines, a header line, and fixed data columns, but a structured row is not a medical report. A genome browser or app may display a VCF row neatly; display and lookup are not the same as medical interpretation.

First split the row into three layers

The identity layer is the place to begin. CHROM, POS, REF, and ALT work together to say where the row is and what allele change it represents. ID can help with lookup, but it carries no meaning by itself.

The caller or pipeline layer includes QUAL and FILTER, and may include parts of INFO and FORMAT. These fields describe file output, quality, filter status, or sample-value structure. PASS means the row passed the filters recorded for that file; it does not say the variant is important.

The optional context layer may include annotations already present in INFO or added later by another viewer or resource. Keep that layer separate from the raw identity fields.
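The three layers can be sketched as a simple grouping of the teaching row's columns. The layer names are this article's teaching labels, not terms from the VCF specification. INFO, FORMAT, and the sample columns go into one "tagged" group because they can mix caller output with added annotations, and only the header tells you which is which.

```python
# Minimal sketch: group the fictional teaching row into the reading layers described above.
# Layer names are teaching labels for this article, not VCF specification terms.
record = {
    "CHROM": "chrDemo1", "POS": "1234567", "ID": "rsExample001",
    "REF": "G", "ALT": "A", "QUAL": "48.6", "FILTER": "PASS",
    "INFO": "DP=21;AF=0.50", "FORMAT": "GT:GQ:DP", "sample_A": "0/1:42:21",
}

IDENTITY = ("CHROM", "POS", "REF", "ALT")   # where the row is and what allele change it represents
LOOKUP = ("ID",)                            # a lookup aid, not meaning by itself
CALL_STATUS = ("QUAL", "FILTER")            # caller or pipeline output

def split_layers(rec: dict[str, str]) -> dict[str, dict[str, str]]:
    grouped = set(IDENTITY) | set(LOOKUP) | set(CALL_STATUS)
    return {
        "identity": {k: rec[k] for k in IDENTITY},
        "lookup": {k: rec[k] for k in LOOKUP},
        "call_status": {k: rec[k] for k in CALL_STATUS},
        # INFO, FORMAT, and sample columns: tags whose meaning the header must define
        "tagged": {k: v for k, v in rec.items() if k not in grouped},
    }

layers = split_layers(record)
print(layers["identity"])  # {'CHROM': 'chrDemo1', 'POS': '1234567', 'REF': 'G', 'ALT': 'A'}
```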

Annotated field-by-field walkthrough

Field | Example value | Plain-language reading
CHROM | chrDemo1 | The contig or chromosome label used in this file. Check the header and file conventions before comparing labels across files.
POS | 1234567 | The 1-based coordinate of the first base in REF. Position alone is not enough to identify the allele change.
ID | rsExample001 | An optional identifier. An rs-like ID can support lookup, but it is not proof of medical importance. A dot appears when no identifier is listed.
REF | G | The reference allele at this position, taken from the reference sequence used for the file. REF does not mean healthy, preferred, or personally typical.
ALT | A | The alternate allele relative to REF. When more than one alternate is reported, they appear as a comma-separated list. ALT does not mean "bad allele" and carries no built-in effect.
QUAL | 48.6 | A Phred-scaled quality score that the caller assigned to the call. It is caller output, not health meaning.
FILTER | PASS | The caller's filter result: PASS, the names of filters the row failed, or a dot if no filters were applied. PASS is file or pipeline status, not interpretation.
INFO | DP=21;AF=0.50 | Record-level annotations written as semicolon-separated tags. The header should define tags such as DP and AF for this file.
FORMAT | GT:GQ:DP | The key for the sample column that follows: the sample values appear in GT, then GQ, then DP order.
sample_A | 0/1:42:21 | Sample-level genotype-style values. In this row they line up with FORMAT as GT 0/1, GQ 42, and DP 21, using the file's definitions.
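QUAL and GQ both use the Phred scale. A score Q encodes a probability P through Q = -10 log10(P), so P = 10^(-Q/10). For QUAL, the VCF specification describes P as the probability that the ALT assertion is wrong. The arithmetic is shown below only to make the numbers less opaque. Each caller computes these scores with its own model, so the result is a model estimate, not health meaning. A file may also use a dot when QUAL is missing, and this sketch does not handle that case.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred-scaled score Q to the probability it encodes: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)

qual = 48.6   # QUAL from the fictional teaching row
gq = 42       # GQ from sample_A in the same row
print(f"QUAL {qual} -> {phred_to_error_probability(qual):.2e}")  # QUAL 48.6 -> 1.38e-05
print(f"GQ {gq} -> {phred_to_error_probability(gq):.2e}")        # GQ 42 -> 6.31e-05
```

Each step of 10 on the Phred scale is a tenfold change in the encoded probability. That is why a QUAL of 48.6 and a GQ of 42 look close but encode probabilities that differ by a factor of about 4.6.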

INFO is not the same as the sample column

INFO describes the record as a whole. FORMAT describes how sample-specific values are arranged, and each sample column holds those values. No command-line knowledge is needed: read FORMAT as the label row for the sample value string.
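INFO entries are separated by semicolons. Most are key=value pairs, while a bare key with no "=" is a flag. The sketch below splits INFO that way and leaves every value as text, because the header's Type and Number fields decide how to read it. The helper name parse_info is hypothetical.

```python
from __future__ import annotations  # allows "str | bool" annotations on older Python 3 versions

def parse_info(info: str) -> dict[str, str | bool]:
    """Split an INFO string into tags; key=value entries keep their text, bare keys become flags."""
    if info == ".":          # a dot marks missing INFO
        return {}
    tags: dict[str, str | bool] = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        tags[key] = value if sep else True   # no "=" means a flag-style tag
    return tags

print(parse_info("DP=21;AF=0.50"))  # {'DP': '21', 'AF': '0.50'}
```

The INFO DP parsed here is a different field from the DP inside sample_A, even though both read 21 in this fictional row.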

In the fictional row, GT:GQ:DP matches 0/1:42:21 in the same order. The first value belongs to GT, the second to GQ, and the third to DP. The same short tag can appear in more than one part of a VCF, so placement and header definitions matter.
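The FORMAT-to-sample pairing can be written as a positional zip. The second helper translates GT allele indices back into bases: 0 refers to REF, 1 to the first ALT, 2 to the second ALT, and "." to a missing allele. Alleles are separated by "/" when unphased and "|" when phased. This covers mechanics only; 0/1 lists one REF allele and one ALT allele in the call and says nothing about effect. Both helper names are hypothetical.

```python
import re

def parse_sample(format_field: str, sample_field: str) -> dict[str, str]:
    """Pair each FORMAT key with the sample value in the same position."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    if len(values) > len(keys):
        raise ValueError("sample column has more subfields than FORMAT lists")
    # VCF allows trailing sample subfields to be omitted; treat omitted ones as missing (".").
    values += ["."] * (len(keys) - len(values))
    return dict(zip(keys, values))

def gt_alleles(gt: str, ref: str, alt: str) -> list[str]:
    """Translate GT indices: 0 is REF, 1 is the first ALT, 2 the second, '.' stays missing."""
    alleles = [ref] + alt.split(",")
    # "/" (unphased) and "|" (phased) are both treated as separators here.
    return [alleles[int(i)] if i != "." else "." for i in re.split(r"[/|]", gt)]

sample = parse_sample("GT:GQ:DP", "0/1:42:21")
print(sample)                                      # {'GT': '0/1', 'GQ': '42', 'DP': '21'}
print(gt_alleles(sample["GT"], ref="G", alt="A"))  # ['G', 'A']
```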

Read the header before trusting abbreviations

VCF files start with meta-information lines, each beginning with ##, before the data rows. The #CHROM header line then names the fixed columns and any sample columns. Meta-information lines can define INFO, FILTER, and FORMAT tags. They may also record reference, pipeline, or annotation details needed to read the row carefully.

That matters because short labels such as DP, AF, GQ, or a filter name can look more universal than they are. Before using an abbreviation, look for the definition supplied with that file.
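Tag definitions can be collected from the meta-information lines before you interpret any abbreviation. The header lines below are illustrative stand-ins written in the ##INFO=<...> style the VCF specification uses. Their descriptions are placeholders, so rely on the definitions in your own file. Keying on (section, ID) keeps INFO DP and FORMAT DP apart, and the regular expression assumes the Description text contains no escaped quotes.

```python
import re

# Illustrative meta-information lines; the descriptions are placeholders, not from a real file.
HEADER_LINES = [
    "##fileformat=VCFv4.2",
    '##FILTER=<ID=PASS,Description="All filters passed">',
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Combined depth across samples">',
    '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency for each ALT allele">',
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    '##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Conditional genotype quality">',
    '##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth for this sample">',
]

# Capture the section, the tag ID, and the quoted Description.
DEFINITION = re.compile(r'^##(INFO|FORMAT|FILTER)=<ID=([^,>]+),.*?Description="([^"]*)"')

def tag_definitions(lines: list[str]) -> dict[tuple[str, str], str]:
    defs: dict[tuple[str, str], str] = {}
    for line in lines:
        match = DEFINITION.match(line)
        if match:
            section, tag_id, description = match.groups()
            defs[(section, tag_id)] = description
    return defs

defs = tag_definitions(HEADER_LINES)
print(defs[("INFO", "DP")])    # Combined depth across samples
print(defs[("FORMAT", "DP")])  # Read depth for this sample
```

If a tag used in a row has no matching definition, treat its meaning as unknown rather than borrowing a definition from another file.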

Where ClinVar-linked context can enter

ClinVar is a separate public archive of reports on the relationships between human variations and phenotypes, along with supporting evidence. A raw VCF row is not the same thing as a ClinVar record.

ClinVar-linked context can enter later. For example, a viewer may show an added annotation, or a separate lookup may compare the variant with public context. Begin with checked identity fields, especially CHROM, POS, REF, and ALT, rather than ID alone. Absence of a visible ClinVar-linked match means only that this specific context is not shown there. It does not show that the variant is harmless or irrelevant, and it says nothing about personal meaning.
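A lookup keyed on checked identity fields, with absence handled as "not shown", might look like the sketch below. LOCAL_CONTEXT is a hypothetical local table with a placeholder entry, not a ClinVar API or real ClinVar data. Real comparisons also depend on matching reference builds, chromosome label conventions, and consistently represented alleles; this sketch does not check any of that.

```python
from typing import NamedTuple, Optional

class VariantKey(NamedTuple):
    """Checked identity fields; ID is deliberately left out of the key."""
    chrom: str
    pos: int
    ref: str
    alt: str

# Hypothetical local table of previously added context; key and text are placeholders.
LOCAL_CONTEXT: dict[VariantKey, str] = {
    VariantKey("chrDemo1", 2000000, "C", "T"): "placeholder context note",
}

def lookup_context(key: VariantKey) -> str:
    note: Optional[str] = LOCAL_CONTEXT.get(key)
    if note is None:
        # Absence only means this table shows nothing for the key; it is not a verdict.
        return "no linked context shown"
    return note

query = VariantKey("chrDemo1", 1234567, "G", "A")
print(lookup_context(query))  # no linked context shown
```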

Which VCF column should I look at first?

Reader question | Look first | Caution
What chromosome or contig is this on? | CHROM | Label style depends on the file and reference context.
What coordinate is listed? | POS | Position alone does not identify the allele change.
What changed relative to the reference? | REF and ALT | Read them together; ALT is not automatically harmful.
Is there a lookup identifier? | ID | An identifier is not proof of medical importance.
Did the caller mark it as passing filters? | FILTER | PASS is pipeline status, not clinical meaning.
What quality score did the caller assign? | QUAL | QUAL describes the call in the file.
Where are row-level tags? | INFO | Use the header definitions for each tag.
What do the sample values mean? | FORMAT, then sample_A | FORMAT gives the order of sample subfields.
Where are abbreviations defined? | Header metadata | Do not assume tags such as DP, AF, or GQ mean the same thing without context.
Where could public context enter later? | Checked CHROM, POS, REF, and ALT, then separate resources such as ClinVar | A VCF row does not guarantee a public match or medical interpretation.

Common VCF-reading mistakes

The practical mistakes usually come from asking one column to do too much.

  • Reading ID or an rs-like value as proof of importance. ID is a lookup-friendly field, not interpretation.
  • Treating ALT as the harmful allele. ALT is the alternate allele relative to the reference sequence used for the file.
  • Treating PASS as a conclusion. FILTER tells you about recorded caller filters, not health meaning.
  • Assuming no visible ClinVar-linked match means harmless or irrelevant. It only means that specific linked context is not shown.
  • Ignoring the header. Header lines may define the INFO, FILTER, and FORMAT tags used later in the row.
  • Comparing two rows by position alone. A careful comparison checks CHROM, POS, REF, and ALT together.
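The last mistake in the list is easy to see with two rows that share a position. The second row below is hypothetical. The comparison uses exact string matching, so multi-allelic ALT lists or indels written in different ways would need to be split and normalized before this check means much.

```python
def same_position(a: dict[str, str], b: dict[str, str]) -> bool:
    return a["CHROM"] == b["CHROM"] and a["POS"] == b["POS"]

def same_allele_change(a: dict[str, str], b: dict[str, str]) -> bool:
    # Exact string comparison of the identity fields; no normalization is attempted.
    return same_position(a, b) and a["REF"] == b["REF"] and a["ALT"] == b["ALT"]

row_1 = {"CHROM": "chrDemo1", "POS": "1234567", "REF": "G", "ALT": "A"}  # teaching row
row_2 = {"CHROM": "chrDemo1", "POS": "1234567", "REF": "G", "ALT": "T"}  # hypothetical row

print(same_position(row_1, row_2))       # True
print(same_allele_change(row_1, row_2))  # False
```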

A cautious next step

After you understand the mechanics, you may want a product-specific workflow for reviewing supported files locally, for educational purposes. For that, see the BioDecode guide. Keep the same boundary in mind: column display, annotation, and lookup are context aids, not diagnosis, treatment guidance, or personal risk prediction.

Next step

See how BioDecode keeps genome analysis on your own machine.


This article is educational and is not medical advice.