Hand coded binary executable programs!
This article shows you how to hand code 'executable' files (*.COM files) at the DOS prompt WITHOUT ANY 3rd party programming environments, debug tools or hex utilities. All that is required is a PC with an MSDOS prompt, the ECHO command and a standard 104-key keyboard (with keypad). With this technique you can create very small programs, e.g. viruses or Copy/Dir utilities, just by typing in the correct opcodes and values in a programmatic way!
An understanding/appreciation of x86 assembly language is required since the input is directly related to x86 programming!
These examples were tested in DOS 6.22, DosBox (DOS 5.00), Vista and Win7 command boxes.
About
A few years ago I found out by accident that if Num Lock was activated and a number between 0 and 257 was typed on the keypad whilst holding the Left Alt key, I could view the ASCII character associated with that number at the MSDOS prompt. This trick is well known for accessing special characters such as the © copyright character, which isn't available directly on the keyboard. Not so well known is its ability to create machine instructions and assembly style programming!
Printing ASCII characters to the screen using ALT
*NUMLOCK must be activated, use LALT, and type all numbers on the keypad
To understand what I mean, try the following at the MSDOS prompt while holding the Left Alt (LALT) key down, and release LALT when finished!
The following LALT number shows the single character 'H' at the DOS prompt.
c:\>[LALT-72]
Produces: H
The following LALT numbers show the word 'Hello' at the DOS prompt.
c:\>[LALT-72][LALT-101][LALT-108][LALT-108][LALT-111]
Produces: Hello
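The mapping from keypad numbers to characters can be cross-checked with a short Python sketch. Python is not part of the DOS technique and is used here only to check the numbers. The sketch assumes the prompt displays characters using code page 437, and the variable names are made up for this example.

```python
# Decimal keypad codes typed with Left Alt held down, as in the example above.
alt_codes = [72, 101, 108, 108, 111]

# Each code becomes one byte. Decoding with code page 437 (an assumption
# about the DOS display character set) gives the characters shown on screen.
text = bytes(alt_codes).decode("cp437")
print(text)
```

For codes below 128 the result is the same in any ASCII-compatible code page, so the assumption only matters for the box-drawing and accented characters that appear later.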
Certain ASCII characters, for example CR (LALT-13) or BELL (LALT-7), will cause an action to happen, e.g. a newline or a beep from the speaker, instead of showing a character equivalent. It is this capability to access all 256 character codes, including these special ones, that intrigued me about the possibility of producing binary files by hand!
ASCII as binary
Further tests showed that when I redirect a list of LALT ASCII codes through ECHO into an output file, they are written to the file as raw binary bytes.
This example creates a binary data file named c:\hello.com containing "Hello" text.
c:\>ECHO [LALT-72][LALT-101][LALT-108][LALT-108][LALT-111] > hello.com
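What is happening is that each number typed on the keypad (using LALT) is stored in the file as a single byte with that value. Hexadecimal (HEX) is simply the usual notation for writing those byte values, for example 72 in decimal is 48 in HEX. This is exactly what we need because opcode tables list machine code in HEX, and machine code is the language used to talk directly to the PC!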
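To see that decimal and HEX are just two ways of writing the same byte values, here is a small Python sketch. Python is an assumption of this example and is not needed for the DOS technique. It models only the five character bytes typed after ECHO, not anything else ECHO may add to the file.

```python
# The five keypad codes typed after ECHO in the example above.
alt_codes = [72, 101, 108, 108, 111]
data = bytes(alt_codes)

# The same bytes written in two notations.
decimal_view = " ".join(str(b) for b in data)
hex_view = " ".join(f"{b:02X}" for b in data)

print(decimal_view)
print(hex_view)
print(data.decode("ascii"))
```

Nothing is converted when the file is written. A hex viewer shows 48 where the keypad was given 72 because both are names for the same byte.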
COM files, x86 & opcodes
When I tried running the hello.com file created above, DOS crashed with opcode errors, which is understandable since the file only contained simple text values. The 'hello.com' file didn't contain any instructions or MSDOS functions to interact with the user or system. Unless we add these to the file in a meaningful way we won't be able to do anything useful like printing strings to the screen, displaying graphics or writing / opening files using disk functions.
To produce a binary file that is executable and does something useful we are going to need a little more understanding of x86 assembler programming, registers and opcodes. Unfortunately this involves a lot of research into the assembly language itself, CPU registers and operation codes. These areas are way too vast to fully cover in this text, but with a lot of perseverance, exploring the references at the foot of the page and playing with the examples should give you a good starting point for this exciting and unique way of programming!
Your first fully functional hand coded executable program
Type the following key commands at the DOS prompt remembering to hold Left ALT.
c:\>Echo LALT-178 LALT-36 LALT-180 LALT-2 LALT-205 LALT-33 LALT-205 LALT-32 > $.com
Your prompt should look similar to the line below when finished. Press Enter to build!
c:\>Echo ▓$┤☻═!═ > $.com
Run the file '$.com' and you will see a single dollar ($) character displayed on the screen.
c:\>$.com
$
c:\>
Congratulations! You just created your first hand coded executable file.
$.com - x86 assembly language equivalent
The program you typed above is equivalent to writing the following program in MSDOS debug.
-n $.com
-a
mov dl,24 ; place $ character into register dl
mov ah,2 ; select MSDOS function 2 (Display Output), which prints the character in register dl
int 21 ; execute the MSDOS function selected by register ah
int 20 ; MSDOS interrupt to terminate the program and return to the prompt
-rcx 8 ; file length in bytes (8 bytes for the 4 instructions)
-w
-q
This program is identical and when run also displays a single '$' character on the screen!
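The match between the typed keypad codes and the debug listing can be checked with a few lines of Python. This is a sketch for checking the numbers only, and the variable names are made up for this example.

```python
# Decimal codes typed for $.com, in the order they were entered.
alt_codes = [178, 36, 180, 2, 205, 33, 205, 32]
program = bytes(alt_codes)

# Show the same bytes in HEX, grouped as the assembler would emit them:
# B2 24 = mov dl,24   B4 02 = mov ah,2   CD 21 = int 21   CD 20 = int 20
hex_view = " ".join(f"{b:02X}" for b in program)
print(hex_view)
print(len(program), "bytes")
```

The eight bytes line up with the four two-byte instructions in the debug listing, which is why debug is given a length of 8.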
$.com breakdown
We use the MSDOS interrupt INT21 to display a character onscreen. Its associated sub function number is 2, the 'Display Output' function, which prints a single character to the screen.
The INT21 interrupt is an MSDOS function executor. When it is called, the value in register ah is used as the INT21 sub function number. When the two are used together in this way a specific MSDOS action is executed, in this case the Display Output function. This function also expects a value, a single character, to be present in register dl before it is executed. The character is given as its ASCII code written in HEX (e.g. ASCII 36 decimal = 24 HEX = '$'). See the ASCII table in the reference section for all the code conversions.
Required programming steps
1) The register dl must first contain a valid single ASCII character.
2) The number 2 is placed into the ah register. This is the MSDOS INT 21 sub function that displays a single character on the screen.
3) Int 21 executes sub function 2, which reads the ASCII value saved in register dl and displays that character onscreen.
4) Int 20 terminates the program after the character is shown (it doesn't require any extra sub functions) and returns input to the DOS prompt!
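The four steps above can be read back out of the bytes with a tiny decoder. This is a minimal Python sketch, not a real disassembler: the OPCODES table and the decode function are hypothetical names for this example, and they cover only the three opcodes used in $.com, each of which is followed by exactly one operand byte.

```python
# Hypothetical lookup table covering only the opcodes used in $.com.
# Each of these opcodes is followed by exactly one operand byte.
OPCODES = {
    0xB2: "mov dl,",
    0xB4: "mov ah,",
    0xCD: "int ",
}

def decode(program):
    """Turn opcode/operand byte pairs into debug-style text lines."""
    lines = []
    i = 0
    while i < len(program):
        opcode, operand = program[i], program[i + 1]
        lines.append(f"{OPCODES[opcode]}{operand:02X}")
        i += 2  # every instruction handled here is two bytes long
    return lines

listing = decode(bytes([178, 36, 180, 2, 205, 33, 205, 32]))
for line in listing:
    print(line)
```

The operands are printed in HEX, as debug would show them, so the first line reads mov dl,24 rather than mov dl,36.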
80x86 Instructions
Intel 80x86 instruction set opcodes are the machine language codes that specify the operations to be performed. A single opcode byte can hold 256 (100 in HEX) different values, and each of the basic instructions used in this article is identified by one of these values. Several common instructions are shown below:
Data movement
mov, lea, les, push, etc.
Arithmetic
add, sub, dec, cmp, etc.
Program flow
jmp, call, ret, etc.
A comprehensive x86 opcode table can be found in the references section below.
Opcodes - HEX vs Decimal
The opcode table link is a good source for finding the correct hexadecimal opcode number for each instruction.
The '$.com' program you typed above uses the mov dl instruction. This instruction can be cross referenced in the opcode table as B2. B2 is the opcode byte for mov dl followed by a one-byte value, written in hexadecimal format.
We need to convert this HEX number to decimal to use these instructions with ECHO. This will enable us to hand code the opcode at the DOS prompt in native machine language.
Converting hexadecimal numbers to decimal is quite easy. Many sites around the internet give access to conversion tools, or you can simply use the Windows Calculator.
A good online converter
http://www.mathsisfun.com/binary-decimal-hexadecimal-converter.html
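If Python happens to be available, the same conversion takes one built-in call. This is a small sketch, and hex_to_alt is a made-up helper name for this example.

```python
def hex_to_alt(hex_byte):
    """Convert a two-digit HEX byte such as 'B2' to the decimal keypad code."""
    value = int(hex_byte, 16)
    if not 0 <= value <= 255:
        raise ValueError(f"{hex_byte} does not fit in one byte")
    return value

# The HEX bytes of $.com, converted to the numbers typed on the keypad.
dollar_hex = ["B2", "24", "B4", "02", "CD", "21", "CD", "20"]
dollar_alt = [hex_to_alt(h) for h in dollar_hex]
print(dollar_alt)
```

The result matches the LALT numbers typed for $.com earlier in the article.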
Detailed example
STEP 1 - plan your program if you don't already know assembly opcode values!
This program prints 'hello' to the screen and is written in MSDOS Debug
-a ;assemble code
0100 jmp 108 ;jump past string location
0102 db "hello$" ;string storage
0108 mov dx,102 ;where in segment the string is located
010B mov ah,09 ;MSDOS output string function
010D int 21 ;execute MSDOS output string above
010F int 20 ;Terminate program
-n hello.com ;create file
-rcx:
20 ;rough estimate of program length
-w ;write binary
STEP 2 - the listing below shows all the associated opcode values in HEX. I used the opcode table in the reference section to find these.
EB 06 ;jmp 108 (short jump, 6 bytes forward over the string)
68 ;h
65 ;e
6c ;l
6c ;l
6f ;o
24 ;$ - string termination
BA 02 01 ;mov dx, 102
B4 09 ;mov ah, 09
CD 21 ;int 21
CD 20 ;int 20
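Two values in the HEX listing are not simple table lookups: the 06 after EB and the 02 01 after BA. The Python sketch below shows where they come from, assuming the usual COM file load address of 0100 used in the debug listing. The variable names are made up for this example.

```python
# Addresses as shown in the debug listing (HEX).
jmp_address = 0x0100
jmp_length = 2                    # opcode EB plus one offset byte
string_bytes = b"hello$"          # stored directly after the jump

# A short jump counts its offset from the instruction that follows it.
next_instruction = jmp_address + jmp_length         # where the string starts
target = next_instruction + len(string_bytes)       # where mov dx,102 starts
offset = target - next_instruction

# mov dx,102 stores the 16-bit address low byte first (little-endian).
dx_operand = next_instruction.to_bytes(2, "little")

print(f"jump offset: {offset:02X}")
print(f"jump target: {target:04X}")
print("mov dx operand:", " ".join(f"{b:02X}" for b in dx_operand))
```

This explains why the address 0102 appears in the file as 02 01, and why the jump operand is 06 rather than the target address 108.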
STEP 3 - convert all these HEX values to decimal, ready for inputting into ECHO or to the CON device (see the NULL section below)
*Don't include the comments when inputting!
235 ;jmp opcode
6
104 ;h
101 ;e
108 ;l
108 ;l
111 ;o
36 ;$
186 ;mov dx,
2
1
180 ;mov ah,
9
205 ;int
33
205 ;int
32
Your hand coded input with NUM LOCK activated and LALT will look like:
c:\>echo Ù♠hello$║☻☺┤○═!═ > hello.com
c:\>
When hello.com is executed the output will be:
c:\>hello.com
hello
c:\>
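The HEX to decimal conversion in STEP 3 can be double-checked for the whole program with a short Python sketch. It works only from the HEX listing in STEP 2, and the variable names are made up for this example.

```python
# The HEX bytes of hello.com from STEP 2, in order.
hex_listing = "EB 06 68 65 6C 6C 6F 24 BA 02 01 B4 09 CD 21 CD 20"
program = bytes.fromhex(hex_listing)

# The decimal keypad codes to type with Left Alt held down.
alt_codes = list(program)
print(alt_codes)

# The string stored inside the program, between the jump and mov dx.
print(program[2:8].decode("ascii"))
print(len(program), "bytes")
```

The program is 17 bytes long, which is 11 in HEX, so the rough length of 20 given to debug in STEP 1 (debug reads it as HEX) is more than enough.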
The NULL problem!
As you delve further into learning this technique one glaring problem will eventually arise: how to input NULL characters using ECHO. No matter how hard I tried I couldn't get ECHO to produce the NULL code using values from 0-255. After much research I eventually found two other key combinations that create NULLs. CTRL-@ and LALT-256 both create NULLs!
If for some reason you can't recreate these key combinations with ECHO, e.g. because of laptop keypad issues, you can circumvent this problem by copying the input from the CON device to a file instead of using ECHO.
c:\> copy CON test.com
CTRL @ ;NULL
LALT 256 ;NULL
LALT 67 ;C
LALT 111 ;o
CTRL Z ;save the file and exit the copy command
OUTPUT
The small example above will create a non-executable binary file containing 2 NULLs followed by the letters 'Co'.
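The expected contents of test.com can be written out in a short Python sketch. One plausible explanation for LALT-256 producing a NULL, which is an assumption of this sketch and is not confirmed in this article, is that only the low 8 bits of the keypad number are kept, so 256 wraps around to 0. The helper name alt_to_byte is made up for this example.

```python
def alt_to_byte(code):
    # Assumption: the keypad number is reduced to a single byte,
    # so values above 255 wrap around (256 becomes 0).
    return code % 256

# CTRL-@ is modelled directly as byte 0; the rest are keypad codes.
keys = [0, alt_to_byte(256), alt_to_byte(67), alt_to_byte(111)]
contents = bytes(keys)

print(" ".join(f"{b:02X}" for b in contents))
```

If the assumption holds, the file should show 00 00 43 6F in a hex viewer, which matches the two NULLs and 'Co' described above.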
Conclusion
I confess that I'm not an assembler programmer or guru by any means. This article is just for information and interest about a passing project idea. I had to research x86 assembly, MSDOS debug, online references, ASCII codes, opcodes, HEX etc. before I was capable of even writing the few small programs above to demonstrate my ideas.
If nothing else I hope you find this article enjoyable and ultimately useful. As an interesting way to create binary code without any tools it is pretty cool, but I can't really imagine how this technique would be useful in any real situation. I do understand that many system admins still use MSDOS for invaluable scripting requirements, so if you manage to find a clever use for any of these hints or tips please drop a comment!
References
ASCII Table
http://www.asciitable.com/
ASCII Table extended
http://www.commfront.com/images/Extended-ASCII-Table.jpg
MSDOS, BIOS, EMS and Mouse Interrupt lists
http://www.htl-steyr.ac.at/~morg/pcinfo/hardware/interrupts/inte1at0.htm
Intel x86 Assembler Instruction Set Opcode Table
http://sparksandflames.com/files/x86InstructionChart.html
Example assembler program
http://csclab.murraystate.edu/bob.pilgrim/405/x86_assembly.html
the number is between 0 and 255
I understand that the ASCII number would normally be written as 'between 0-255' but to create the binary NULL I had to use the extended ASCII number of 256. Also because ECHO can't process the 0 (zero) character either, I chose to write 'between 0-257' (or alternatively 1-256) for this application.