Ordinarily, such a ROM swap would not be particularly difficult, but the SP12 uses 12-bit samples instead of the typical 8-bit or 16-bit. As such, the samples are split over multiple ROMs and encoded in a way that makes replacement less than straightforward. To that end, decoder/encoder utilities are necessary to convert to and from 16-bit wave files in order to edit the sounds correctly.
Note: It is possible to replace only the most-significant-byte (MS) ROMs, but this limits the bit depth to a mere 8 bits, and the substandard result is particularly noticeable on content like 808 bass drums, sine waves, and other sounds especially lacking in harmonic overtones. This is a lower-quality ‘hack’ and does not take advantage of the full dynamic range of the SP12.
The SP12 contains six 23256-type sound ROMs:
MS1 (IC09) – Electric Snare, Rimshot, Cowbell, Tom
MS2 (IC27) – Bass/Kick Drum, Electric Tom, Hi-Hat, Clap
MS3 (IC10) – Ride, Snare
MS4 (IC28) – Crash
LS12 (IC58) – corresponds to MS1 and MS2
LS34 (IC59) – corresponds to MS3 and MS4
Samples are unsigned, mono, 12-bit linear PCM at 27,500 Hz. The Zilog Z80 microprocessor is little-endian, so it would make sense for it to read the LSB ROM first and the MSB ROM second, yet the SP12 oddly seems to handle sound in big-endian format. This is purely academic, however, since the byte order can be swapped at will when manipulating the ROMs by other means.
The LS12 and LS34 ROMs contain two 4-bit nibbles per byte. Each nibble (high and low) belongs to a different byte in the MS ROMs. Each 8-bit MSB is paired with a 4-bit LSB nibble, which gives the SP12 its 12-bit PCM sound data, so there is a 2:1 ratio between the MSB and LSB ROMs: for every two bytes read from an MS ROM, only one byte is read from the matching LS ROM.
First, the MSB is read from the selected MS ROM and the low nibble is read from the corresponding byte in the LS ROM. Second, the next MSB is read from the MS ROM and the high nibble is read from that same LS ROM byte (again, a 2:1 ratio). The result is two 12-bit words assembled from three 8-bit bytes: two from the MS ROM and one from the LS ROM.
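The read order described above can be sketched in a few lines of Python. This is only an illustration, not the unreleased decoder utility: the function name decode_12bit is made up here, and it assumes the caller has already lined up a span of MS ROM bytes with its matching span of LS ROM bytes (how each LS ROM is divided between its two MS ROMs is not covered).

```python
def decode_12bit(ms: bytes, ls: bytes) -> list[int]:
    """Combine MS ROM bytes with LS ROM nibbles into unsigned 12-bit words.

    Hypothetical helper: 'ms' and 'ls' are assumed to be matching spans,
    with exactly two MS bytes for every LS byte.
    """
    if len(ms) != 2 * len(ls):
        raise ValueError("expected two MS bytes for every LS byte")
    words = []
    for i, msb in enumerate(ms):
        packed = ls[i // 2]  # one LS byte serves two consecutive samples
        # First sample of each pair takes the low nibble, second the high nibble.
        nibble = (packed & 0x0F) if i % 2 == 0 else (packed >> 4)
        # The MS byte forms the upper 8 bits, the nibble the lower 4 bits.
        words.append((msb << 4) | nibble)
    return words


if __name__ == "__main__":
    # Two MS bytes plus one LS byte yield two 12-bit words.
    print([hex(w) for w in decode_12bit(bytes([0xAB, 0xCD]), bytes([0x21]))])
```

Every value returned is in the range 0 to 4095, still unsigned at this stage.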
Since wave editors and digital audio file formats customarily use sample sizes that are whole multiples of 8 bits, it is necessary to pad out the remaining 4-bit difference to create the target 16-bit PCM format. Additionally, 16-bit audio is (by convention) signed, yet the SP12 uses unsigned PCM, so this also needs to be taken into consideration.
Once the SP12 sound data has been converted to 16-bit PCM files, it can be edited in a wave editor such as Audacity. Replacement sounds must be 16-bit and 27,500 Hz, so 44.1 kHz material (or otherwise) needs to be sample rate converted prior to use; SoX or Audacity are both suitable for this. The start and end of each sample should be faded to a zero-crossing to prevent clicks and pops. It’s also worth noting that some of the sounds are actually one sound transposed to generate variations (such as the toms) and are not unique in and of themselves, so they can only be replaced as a full group.
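As a rough sketch of that conversion, the Python below widens one unsigned 12-bit word to a signed 16-bit sample and writes the result with the standard wave module. Two things here are assumptions rather than facts from the ROMs: the four padding bits are placed at the bottom of the word as zeros, and the unsigned midpoint (0x800) is treated as silence. The names u12_to_s16 and write_wav are hypothetical.

```python
import struct
import wave


def u12_to_s16(word: int) -> int:
    """Convert an unsigned 12-bit word to a signed 16-bit sample.

    Assumption: pad with four zero bits on the right, then shift the
    range so that the unsigned midpoint 0x800 lands on zero.
    """
    if not 0 <= word <= 0xFFF:
        raise ValueError("word is outside the 12-bit range")
    return (word << 4) - 0x8000


def write_wav(path: str, samples: list[int], rate: int = 27500) -> None:
    """Write signed 16-bit mono samples to a wave file (hypothetical helper)."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 2 bytes = 16 bits per sample
        w.setframerate(rate)   # 27,500 Hz, as stated for the SP12
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))
```

With this padding scheme the largest 12-bit value maps to 32752 rather than 32767, since the low four bits are always zero.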
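For anyone scripting the fades rather than drawing them in a wave editor, a minimal Python sketch follows. It applies a linear fade-in and fade-out to a list of signed 16-bit samples so that the first and last sample points land on zero. The function name apply_fades and the default fade length of 32 sample points are arbitrary choices for illustration, not values taken from the SP12.

```python
def apply_fades(samples: list[int], fade_len: int = 32) -> list[int]:
    """Return a copy of 'samples' with linear fades at both ends.

    Hypothetical helper: the ramp starts at a gain of 0, so the first and
    last sample points become exactly zero. The length is unchanged.
    """
    out = list(samples)
    n = min(fade_len, len(out) // 2)  # never let the two fades overlap
    for i in range(n):
        gain = i / n
        out[i] = int(out[i] * gain)            # fade in from the start
        out[-1 - i] = int(out[-1 - i] * gain)  # fade out toward the end
    return out
```

A longer fade is gentler but eats into the attack of percussive sounds, so short values are probably the safer starting point for drums.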
No metadata within the ROM may be overwritten or altered, and each sample must be exactly the same length as the original sound (as measured in terms of sample points) and must occupy the exact same offsets (position) within the full ROM waveform.
Note: This is not strictly true, but changes to the metadata require very special consideration (see ‘METADATA’ section below).
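The length and offset constraint can be expressed as a small splice routine. The Python below is a sketch only: splice_sample is a hypothetical name, the offset and length would have to come from the decoded original ROM waveform, and padding a short replacement with zeros (silence in signed 16-bit terms) is an assumption about what is acceptable.

```python
def splice_sample(full: list[int], replacement: list[int],
                  offset: int, length: int) -> list[int]:
    """Place 'replacement' into 'full' at 'offset', occupying exactly 'length' points.

    Hypothetical helper: a longer replacement is trimmed, a shorter one is
    padded with zeros, and nothing outside the slot is touched.
    """
    if offset < 0 or length < 0 or offset + length > len(full):
        raise ValueError("slot falls outside the ROM waveform")
    fitted = list(replacement[:length])        # trim if too long
    fitted += [0] * (length - len(fitted))     # pad if too short
    out = list(full)
    out[offset:offset + length] = fitted       # same slot, same size
    return out
```

Because the slot is replaced with a list of identical size, the overall waveform length never changes and everything around the slot stays where it was.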
After the desired alterations are made, the custom wave files are then run through an encoding utility to convert back to 12-bit and split the data into the respective ROM binaries. The generated binaries are then burned to blank 23256 EPROMs and substituted for the original sound ROMs.
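The encoding step is essentially the decode in reverse. The following Python sketch is not the unreleased encoder utility; it only illustrates the bit packing. It assumes signed 16-bit input, reduces to 12 bits by simply discarding the low four bits (no rounding or dithering), and packs the first sample of each pair into the low nibble of the LS byte and the second into the high nibble. The names s16_to_u12 and encode_12bit are hypothetical.

```python
def s16_to_u12(sample: int) -> int:
    """Convert a signed 16-bit sample to an unsigned 12-bit word.

    Assumption: shift to unsigned, then drop the low four bits.
    """
    if not -32768 <= sample <= 32767:
        raise ValueError("sample is outside the signed 16-bit range")
    return (sample + 0x8000) >> 4


def encode_12bit(words: list[int]) -> tuple[bytes, bytes]:
    """Split unsigned 12-bit words into MS ROM bytes and packed LS ROM bytes."""
    if len(words) % 2:
        raise ValueError("need an even number of samples to fill the LS bytes")
    ms, ls = bytearray(), bytearray()
    for first, second in zip(words[0::2], words[1::2]):
        ms.append(first >> 4)    # upper 8 bits of each word go to the MS ROM
        ms.append(second >> 4)
        # Low nibble holds the first sample's low 4 bits, high nibble the second's.
        ls.append(((second & 0x0F) << 4) | (first & 0x0F))
    return bytes(ms), bytes(ls)
```

The two byte strings would still need to be written into the correct ROM images at the correct offsets before burning.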
While the above section outlines the formal methodology of sound ROM manipulation, the (as yet unreleased) encoder utility automates most or all of these tedious format and editing considerations. In fact, the encoder utility can be supplied with individual wave files, and each will be automatically converted into the proper format, trimmed to the correct length, and placed at the correct offset without user intervention.
In practice, creating a custom SP12 kit requires little more than selecting samples of roughly the same duration and style as the original sounds being replaced. In other words, it’s best not to choose a 3-second-long ride cymbal as a replacement for the rimshot, as the ride is much too long and will be tied to the filtered outputs instead of the unfiltered outputs.
Note: This data is the result of reverse engineering and is not definitive.
Beyond not taking advantage of the entire SP12 bit-depth, the “8-bit only” technique also neglects the potential of metadata manipulation.
Pitch, decay, sample start, sample length, loop size, instrument name, and output channel can all be altered if done properly.
Full details to be published shortly.
The SP12 sound ROMs have been independently verified to have the following 16-bit checksums: MS1 [E918], MS2 [E4F6], MS3 [91FD], MS4 , LS12 [D48C], LS34 [CCC7]
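The method used to compute these checksums is not stated here. A common convention among EPROM programmers is to add up every byte in the image and keep the low 16 bits of the total, so the Python sketch below does exactly that. Treat it strictly as an assumption: if a dumped ROM does not match the values listed above, the algorithm may simply be a different one. The name checksum16 is hypothetical.

```python
def checksum16(data: bytes) -> int:
    """Additive checksum: sum of all bytes, truncated to 16 bits.

    Assumption: this is the convention behind the listed values; the
    original write-up does not say which algorithm was used.
    """
    return sum(data) & 0xFFFF


def checksum16_hex(data: bytes) -> str:
    """Format the checksum as four uppercase hex digits, e.g. 'E918'."""
    return f"{checksum16(data):04X}"
```

To check a dump, the ROM binary would be read in binary mode and passed to checksum16_hex, then compared against the value listed for that ROM.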
SP12 Sound ROM to Wave Decoder Utility v0.1 (w/ SP12 sound ROM binaries) — complete, but not yet released
SP12 Sound ROM to Wave Encoder Utility v0.1 alpha — complete, but not yet released; still in alpha test stage
The encoder has been tested to output valid ROMs in most cases, but is not yet foolproof. Both the decoder and encoder utilities, along with a working set of 808 sound ROMs, should be released in the near future.