
MIDI (Musical Instrument Digital Interface) is a standard for sending musical instructions between instruments and software. It carries no audio — only data describing which notes to play, when, and how hard. That one distinction is the key to understanding why MIDI is the backbone of modern music production.
MIDI is not sound. A MIDI file or track is a set of instructions — press C3 now, hold it for a beat, at this velocity — that an instrument reads and turns into sound. The same MIDI part can play a piano, a synth pad, or a string section; you just point it at a different instrument.
That's why a MIDI file is tiny compared to an audio file, and why it's so flexible. Recorded audio is fixed the moment you capture it. MIDI stays editable forever: wrong note, change it; want it slower, drag the tempo; wrong sound, swap the instrument. Nothing is re-recorded.
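To put rough numbers on that size difference, here is a back-of-the-envelope sketch. It assumes uncompressed CD-quality audio (44.1 kHz, 16-bit, stereo) and counts only the raw 3-byte Note On and Note Off messages, ignoring file headers and timing data. The note density is an illustrative assumption, not a measurement.

```python
# Rough size comparison: one minute of music as raw MIDI note messages
# versus uncompressed CD-quality audio. All figures are illustrative assumptions.

SAMPLE_RATE = 44_100      # samples per second (CD quality)
BYTES_PER_SAMPLE = 2      # 16-bit audio
CHANNELS = 2              # stereo

NOTES_PER_SECOND = 8      # assumed: a fairly busy part
BYTES_PER_NOTE = 3 + 3    # one 3-byte Note On plus one 3-byte Note Off


def audio_bytes(seconds: int) -> int:
    """Size of uncompressed audio for the given duration."""
    return seconds * SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS


def midi_bytes(seconds: int) -> int:
    """Size of the note messages alone for the given duration."""
    return seconds * NOTES_PER_SECOND * BYTES_PER_NOTE


audio = audio_bytes(60)
midi = midi_bytes(60)
print(f"audio: {audio:,} bytes")                 # audio: 10,584,000 bytes
print(f"midi:  {midi:,} bytes")                  # midi:  2,880 bytes
print(f"ratio: about {audio // midi:,} to 1")    # ratio: about 3,675 to 1
```

Real files add some overhead on the MIDI side and compression shrinks the audio side, but the gap stays several orders of magnitude.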
Every MIDI event is a short message. These are the ones you'll touch daily:
| Message | What it carries |
|---|---|
| Note On / Note Off | Which note starts or stops, and when |
| Velocity | How hard the note was struck (0–127), sent as part of each Note On message; usually mapped to loudness |
| Pitch Bend | Smooth pitch slides up or down |
| Control Change (CC) | Knobs and pedals: mod wheel, sustain, expression, filter, etc. |
| Program Change | Switches the instrument's preset/patch |
| Clock / Tempo | Sync information that keeps gear locked to the same tempo |
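
To make the table concrete, here is a minimal sketch of how those messages look as raw bytes, following the MIDI 1.0 layout of one status byte plus one or two 7-bit data bytes. The helper names are hypothetical and exist only for illustration; channels are numbered 0 to 15 here, as they are on the wire.

```python
# Building raw MIDI 1.0 channel messages by hand.
# Helper names are hypothetical; channels are 0-15 as encoded in the status byte.

def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Status 0x9n, then note number and velocity (7 bits each)."""
    return bytes([0x90 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])


def note_off(channel: int, note: int, velocity: int = 0) -> bytes:
    """Status 0x8n, then note number and release velocity."""
    return bytes([0x80 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])


def control_change(channel: int, controller: int, value: int) -> bytes:
    """Status 0xBn, then controller number and value."""
    return bytes([0xB0 | (channel & 0x0F), controller & 0x7F, value & 0x7F])


def program_change(channel: int, program: int) -> bytes:
    """Status 0xCn, then a single data byte selecting the preset."""
    return bytes([0xC0 | (channel & 0x0F), program & 0x7F])


def pitch_bend(channel: int, amount: int) -> bytes:
    """amount runs from -8192 to 8191 (0 = centre); sent as 14 bits, low byte first."""
    value = amount + 8192
    return bytes([0xE0 | (channel & 0x0F), value & 0x7F, (value >> 7) & 0x7F])


TIMING_CLOCK = bytes([0xF8])  # single-byte clock tick, 24 per quarter note

print(note_on(0, 60, 100).hex(" "))         # 90 3c 64  (note 60 is middle C)
print(note_off(0, 60).hex(" "))             # 80 3c 00
print(control_change(0, 64, 127).hex(" "))  # b0 40 7f  (CC 64 is the sustain pedal)
print(program_change(0, 5).hex(" "))        # c0 05
print(pitch_bend(0, 0).hex(" "))            # e0 00 40  (centred, no bend)
```
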
Velocity is the one beginners overlook. It's what makes programmed parts feel human — a drum pattern with every hit at velocity 127 sounds like a machine; vary it and it breathes.
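One simple way to get that variation in programmed parts is to nudge each velocity by a small random amount. The sketch below is an assumption about how such a "humanize" step might work, not how any particular DAW implements it. The function name and the `spread` parameter are hypothetical.

```python
import random


def humanize_velocities(velocities, spread=12, seed=None):
    """Return a copy of `velocities` with each value nudged by up to +/- spread.

    Results are clamped to 1-127: a Note On with velocity 0 is treated
    as a Note Off, so a sounding note should never drop to 0.
    """
    rng = random.Random(seed)  # pass a seed for repeatable results
    result = []
    for velocity in velocities:
        varied = velocity + rng.randint(-spread, spread)
        result.append(max(1, min(127, varied)))
    return result


# Eight hi-hat hits, all programmed at the same velocity.
flat_hats = [100] * 8
print(humanize_velocities(flat_hats, spread=12, seed=1))
```

Purely random variation is only a starting point. Accenting the beat and softening the off-beats by hand usually sounds more musical.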
| Aspect | MIDI | Audio |
|---|---|---|
| Contains | Instructions (notes, timing) | Actual recorded sound |
| Editable | Every note, freely | Only by editing the waveform |
| Change instrument | Yes, instantly | No — it's baked in |
| Change key / tempo | Non-destructive | Stretches or degrades the audio |
| File size | Tiny | Large |
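
The "non-destructive" row is easy to see in code. In this sketch a part is a list of notes positioned in ticks, the way a sequencer stores them. Changing key is an addition on each pitch, and changing tempo only changes how ticks convert to seconds. The `Note` type and the resolution of 480 ticks per quarter note are assumptions for illustration.

```python
from typing import NamedTuple


class Note(NamedTuple):
    start_tick: int    # position in ticks from the start of the part
    length_ticks: int  # duration in ticks
    pitch: int         # MIDI note number, 0-127
    velocity: int      # 1-127


PPQ = 480  # assumed resolution: ticks per quarter note


def transpose(notes, semitones):
    """Return a new list with every pitch shifted; timing is untouched."""
    shifted = [n._replace(pitch=n.pitch + semitones) for n in notes]
    if any(not 0 <= n.pitch <= 127 for n in shifted):
        raise ValueError("transposition leaves the MIDI note range 0-127")
    return shifted


def tick_to_seconds(tick, bpm, ppq=PPQ):
    """Convert a tick position to seconds at a constant tempo."""
    return tick * 60.0 / (bpm * ppq)


part = [Note(0, 480, 60, 100), Note(480, 480, 64, 90), Note(960, 960, 67, 95)]

print([n.pitch for n in transpose(part, 2)])  # [62, 66, 69]
print(tick_to_seconds(960, bpm=120))          # 1.0
print(tick_to_seconds(960, bpm=60))           # 2.0
```

The third note starts at tick 960 at either tempo. Only the playback clock changes, which is why slowing a MIDI part down costs nothing in quality.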
Most of a modern production is MIDI driving software instruments. You program drums by drawing or playing hits onto a grid; you write chords and melodies on the piano roll; you perform parts live from a MIDI keyboard or pad controller and then fix the timing and notes afterward. Because it's all data, you can audition a part through ten different instruments in seconds.
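Fixing the timing afterward usually means quantizing: snapping each note's start to the nearest grid line. A minimal sketch follows, assuming positions are stored in ticks at 480 ticks per quarter note, so a sixteenth-note grid is 120 ticks. Both values are assumptions, and real DAWs add options such as partial strength and swing.

```python
PPQ = 480              # assumed ticks per quarter note
SIXTEENTH = PPQ // 4   # 120 ticks per sixteenth note


def quantize(tick: int, grid: int = SIXTEENTH) -> int:
    """Snap a tick position to the nearest multiple of `grid` (ties round up)."""
    return ((tick + grid // 2) // grid) * grid


# Note starts from a slightly loose live take.
played = [5, 118, 251, 355, 490]
print([quantize(t) for t in played])  # [0, 120, 240, 360, 480]
```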

Writing chord parts is where MIDI shines — block in a progression, then voice it however you like. If you need a starting point, generate a progression and play it in.
Whether you're writing a MIDI part from scratch or matching one to an existing recording (a sample, an acapella), it has to be in the right key or it'll clash. Detect the key first, then keep your MIDI notes inside it.
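Once the key is known, checking a part against it is a small lookup. The sketch below does not detect a key. It assumes a major key whose tonic you already know, and it flags any MIDI note whose pitch class falls outside that scale. The function names are hypothetical.

```python
NOTE_NAMES = ("C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B")
MAJOR_SCALE = (0, 2, 4, 5, 7, 9, 11)  # semitone steps above the tonic


def scale_pitch_classes(tonic: str, intervals=MAJOR_SCALE) -> set:
    """Pitch classes (0-11) that belong to the scale built on `tonic`."""
    root = NOTE_NAMES.index(tonic)
    return {(root + step) % 12 for step in intervals}


def out_of_key(notes, tonic: str) -> list:
    """Return the MIDI note numbers that are not in the major key of `tonic`."""
    allowed = scale_pitch_classes(tonic)
    return [n for n in notes if n % 12 not in allowed]


# D, F#, A, A#, D: the A# does not belong to D major.
melody = [62, 66, 69, 70, 74]
print(out_of_key(melody, "D"))  # [70]
```

An out-of-key note is not always a mistake, since chromatic passing tones are common, so treat the result as a list of notes to review rather than errors to delete.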
From there it's the same flow every time: write the MIDI, pick the sounds, arrange, mix. When the track is finished, send it out as a delivery Room so collaborators can listen, comment, and grab the files in one place.
Hardware connects over the older 5-pin DIN MIDI ports or, far more commonly now, over USB. A single MIDI connection carries 16 independent channels, so one cable can address up to 16 instruments or parts at once. If a controller isn't making sound, the cause is usually a channel or routing mismatch rather than a broken cable: the controller only sends data, and an instrument downstream has to receive it on the matching channel and turn it into audio.
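The channel lives in the low four bits of each message's status byte, which is why a mismatch is silent: the data arrives, but nothing is listening for it. Here is a small sketch of reading the channel back out of a raw message, using the 1 to 16 numbering that controllers and DAWs display. The function names are hypothetical.

```python
# High nibble of the status byte -> message type (MIDI 1.0 channel messages).
MESSAGE_TYPES = {
    0x80: "note_off",
    0x90: "note_on",
    0xA0: "poly_aftertouch",
    0xB0: "control_change",
    0xC0: "program_change",
    0xD0: "channel_aftertouch",
    0xE0: "pitch_bend",
}


def describe_status(status: int):
    """Return (message type, channel 1-16) for a channel-message status byte."""
    if not 0x80 <= status <= 0xEF:
        raise ValueError("not a channel message status byte")
    return MESSAGE_TYPES[status & 0xF0], (status & 0x0F) + 1


def is_heard(message: bytes, listening_channel: int) -> bool:
    """True if an instrument listening on `listening_channel` would respond."""
    _, channel = describe_status(message[0])
    return channel == listening_channel


msg = bytes([0x92, 60, 100])    # Note On, sent by the controller on channel 3
print(describe_status(msg[0]))  # ('note_on', 3)
print(is_heard(msg, 1))         # False: the instrument listens on channel 1
print(is_heard(msg, 3))         # True
```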
MIDI stands for Musical Instrument Digital Interface. It's a standard, introduced in 1983, for sending musical performance data between electronic instruments, computers, and software.
MIDI does not contain sound. It is only instructions: which notes to play, when, and how hard. An instrument or software synth reads those instructions and produces the actual sound.
Audio is a recording of actual sound and is fixed once captured. MIDI is editable performance data, so you can change the notes, instrument, key, or tempo at any time without re-recording.
Velocity is how hard a note was played, on a scale of 0–127. It's usually mapped to loudness and is what makes programmed parts sound dynamic and human instead of robotic.
A MIDI port carries sixteen channels. Each channel can control a separate instrument or part, so a single connection can address up to 16 sounds at once.