Example Input vector
Example of an input vector for scikit-learn
>seq1
MRQA
HECH
—
Description:
y : labels (for example 0 = H = HELIX ; 1 = E = BETA ; 2 = C = COIL)
X : features (one-hot encoding sliding window)
One-hot encoding:
A -> [1, 0, 0, 0, 0, ...]
M -> [0, 1, 0, 0, 0, ...]
R -> [0, 0, 1, 0, 0, ...]
Q -> [0, 0, 0, 1, 0, ...]
L -> [0, 0, 0, 0, 1, ...]
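The mapping above can be sketched in a few lines of Python. This is only an illustration under the toy assumption of a 5-letter alphabet; the names ALPHABET, LABELS and one_hot are hypothetical helpers, not part of scikit-learn.

```python
# Toy alphabet from the description; a real one would list all 20 amino acids.
ALPHABET = "AMRQL"                  # the index of a letter decides where the 1 goes
LABELS = {"H": 0, "E": 1, "C": 2}   # HELIX, BETA, COIL


def one_hot(residue, alphabet=ALPHABET):
    """Return a list of zeros with a single 1 at the residue's index."""
    vec = [0] * len(alphabet)
    vec[alphabet.index(residue)] = 1
    return vec


print(one_hot("M"))                  # [0, 1, 0, 0, 0]
print([LABELS[s] for s in "HECH"])   # [0, 1, 2, 0]
```

The order of the letters in ALPHABET is arbitrary, but it must stay the same for training and prediction.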
Example:
Window of 3, centered on the second position (R). Window sequence: MRQ
X = [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0]
y = [1]
Second and third positions (windows MRQ and RQA):
X = np.array([
    [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0]])
y = [1, 2]
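One possible way to build these arrays automatically is sketched below. It assumes the same toy 5-letter alphabet, and it simply skips the first and last residues because they have no complete window; the text does not say how the ends should be handled, so padding is left out. The function name window_features is hypothetical.

```python
import numpy as np

ALPHABET = "AMRQL"                  # toy 5-letter alphabet from the example
LABELS = {"H": 0, "E": 1, "C": 2}   # HELIX, BETA, COIL


def window_features(seq, struct, window=3):
    """Build X (one row per full window) and y (label of the window's centre)."""
    half = window // 2
    eye = np.eye(len(ALPHABET), dtype=int)  # row i is the one-hot vector of ALPHABET[i]
    X, y = [], []
    # Assumption: only positions with a complete window are used (no padding).
    for centre in range(half, len(seq) - half):
        idx = [ALPHABET.index(r) for r in seq[centre - half:centre + half + 1]]
        X.append(eye[idx].ravel())          # concatenate the per-residue vectors
        y.append(LABELS[struct[centre]])
    return np.array(X), np.array(y)


X, y = window_features("MRQA", "HECH")
print(X.shape)  # (2, 15)
print(y)        # [1 2]
```

For seq1 this gives the two rows shown above (MRQ and RQA) with labels 1 and 2.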
Note: this example assumes there are only 5 amino acids. Your real vectors will be longer: with the 20 standard amino acids, each residue takes 20 values, so a window of 3 gives 60 features.
Further reading: https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/