RE-Google is a plugin for the Interactive DisAssembler (IDA) Pro that
queries Google Code for information about the functions contained in a
disassembled binary. The top results are then displayed as comments on
the function and can be opened by just clicking on them.
The top results will often tell you what the function is
actually doing or what you will find inside it.
There is just one script (REgoogle.py), which you have to execute from
within IDA. The standard configuration will enumerate all functions
and add the search results to each one as a comment.
RE-Google can be configured in the top section of the script. Options
include, for example, querying only the current function or defining
blacklists of items that are to be excluded from the results.
Some people say "Reverse Engineering is an art". Well, this might be
true if you consider stuff like mathematics to be art. It is more an
application of standard methods that evolve constantly. Actually,
anybody can learn these methods and start to RE executables. With
this plugin, even your granny can start reversing :)
Reverse engineering is like solving a jigsaw puzzle. In order to see
the whole picture you need to find the corner pieces, then the frame,
and then work your way forward from there. The corner pieces for
reversing are strings, constants and function names. The function
names that people normally start with are the ones imported from
shared libraries (e.g. DLLs). Strings contain human-readable hints
about the functionality. Specific constants add more clues to solve
the puzzle or can sometimes even be used to identify certain (types
of) algorithms. The imported functions tell you which actions a
function performs.
The major problem is that a lot of experience is needed to recognize
strings and constants, or to know what a combination of imported
functions adds up to. But why don't we use the combined knowledge
of many people to get this expertise? Google lets us search
for it.
Google Code Search is very valuable when trying to find algorithms or
code excerpts that contain this information. Often the few results you
see on one page can already tell you what the function might be doing.
This plugin enumerates all functions and extracts strings, constants
(also called immediate values), and the names of imported
functions. If there is sufficient data, a Google Code search is
performed and the result is added to the IDA database as a function
comment. Reviewing these comments sometimes makes the analysis of the
function in question unnecessary and saves time.
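The "sufficient data" decision described above could be sketched
roughly as follows. This is not the plugin's actual code: the function
name build_query, the MIN_TERMS threshold, the BLACKLIST entries and
the filtering rules are all assumptions made for illustration, and the
sketch takes already extracted values as input instead of calling the
IDA API.

```python
# Hypothetical values; the real script's thresholds and blacklists may differ.
MIN_TERMS = 2
BLACKLIST = {"GetProcAddress", "LoadLibraryA"}


def build_query(strings, constants, imports,
                blacklist=BLACKLIST, min_terms=MIN_TERMS):
    """Return a search query for one function, or None if data is too sparse."""
    terms = []
    for s in strings:
        s = s.strip()
        if len(s) >= 4:  # very short strings are rarely distinctive
            terms.append('"%s"' % s.replace('"', ""))
    for c in constants:
        if c > 0xFFFF:  # small immediates (0, 1, 0x10, ...) match almost anything
            terms.append("0x%08x" % c)
    for name in imports:
        if name not in blacklist:
            terms.append(name)
    # Drop duplicates but keep the original order.
    unique = list(dict.fromkeys(terms))
    if len(unique) < min_terms:
        return None
    return " ".join(unique)


# Sample input: the first two SHA-512 initial hash values as constants.
print(build_query([], [0x6a09e667f3bcc908, 0xbb67ae8584caa73b], []))
# prints: 0x6a09e667f3bcc908 0xbb67ae8584caa73b
```

A function that yields only a blacklisted import and a tiny constant
returns None here, so no query would be sent for it.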
Example A:
Based on the results shown above, it seems very likely that the
function in question is SHA-512. And it is :)
Example B:
UPX0:0040D7A5 sub_40D7A5 ; src/iexplorer/greta/regexpr2.cpp
UPX0:0040EA6D sub_40EA6D ; src/iexplorer/greta/regexpr2.cpp
UPX0:004102B7 sub_4102B7 ; src/iexplorer/greta/regexpr2.cpp
...
UPX0:0041E163 sub_41E163 ; src/iexplorer/greta/regexpr2.cpp
UPX0:0042183F sub_42183F ; src/iexplorer/greta/regexpr2.cpp
UPX0:0040EE2F sub_40EE2F ; trunk/shareaza/RegExp/regexpr2.cpp
UPX0:0041E945 sub_41E945 ; trunk/shareaza/RegExp/regexpr2.cpp
These functions seem to be part of a library related to regular
expression parsing. That saves some time, because they no longer have
to be investigated by hand.
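Spotting such a cluster can also be done mechanically. The following
is a minimal sketch, assuming the function comments have already been
collected into a dictionary (the names comments and tally_by_filename
are hypothetical, and only the entries listed above are used):

```python
from collections import Counter
import posixpath

# Address -> result comment, copied from the listing above.
comments = {
    0x40D7A5: "src/iexplorer/greta/regexpr2.cpp",
    0x40EA6D: "src/iexplorer/greta/regexpr2.cpp",
    0x4102B7: "src/iexplorer/greta/regexpr2.cpp",
    0x41E163: "src/iexplorer/greta/regexpr2.cpp",
    0x42183F: "src/iexplorer/greta/regexpr2.cpp",
    0x40EE2F: "trunk/shareaza/RegExp/regexpr2.cpp",
    0x41E945: "trunk/shareaza/RegExp/regexpr2.cpp",
}


def tally_by_filename(comments):
    """Count how many functions point at each source file name."""
    return Counter(posixpath.basename(path) for path in comments.values())


print(tally_by_filename(comments).most_common(1))
# prints: [('regexpr2.cpp', 7)]
```

Different projects often ship the same library file under different
paths, so counting by file name rather than by full path groups both
the greta and the shareaza hits together.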
Example C:
Some functions like the following only get a single result:
; openssl-0.9.8e/crypto/x509v3/v3_alt.c
Wow, perfect hit. And the result points right to the source
code. This will help when investigating related functions.
Example D:
Enough examples... Try it out yourself :)
I get the error message "Too many Google queries in too little time."
Well, you are putting a lot of pressure on the Google services by sending too many queries too quickly. Raise
AFTER_QUERY_WAIT and get yourself two or three cups of coffee while waiting for the script to finish.
You should also consider querying only for the functions you are really interested in, by going to that function and using the configuration setting
SEARCH_ALL_FUNCTIONS = False
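What a wait between queries amounts to can be sketched as follows.
This is an illustration under assumptions, not the script's own code:
run_queries and its search parameter are hypothetical, AFTER_QUERY_WAIT
is treated as a number of seconds, and the value 5 is only an example.

```python
import time

AFTER_QUERY_WAIT = 5  # assumed to be seconds; illustrative value only


def run_queries(queries, search, wait=AFTER_QUERY_WAIT, sleep=time.sleep):
    """Call search(query) for each query, pausing `wait` seconds in between."""
    results = {}
    for i, query in enumerate(queries):
        if i > 0:
            sleep(wait)  # pause before every query except the first
        results[query] = search(query)
    return results
```

With 500 functions and a wait of 5 seconds, the pauses alone add up to
roughly 40 minutes, which is where the coffee comes in.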
Where do I find the configuration options?
It is all in one single file, REgoogle.py (ugh, bad coding - I know). Look at the top part labeled Configuration.
Thanks to Thomas Barabosch, Paul Mueller and Oliver Schmitt for
contributing many good ideas. Special thanks to the Giraffe chapter of
the Honeynet Project for the breeding ground.