Abstract
Fuzzing is a testing method that feeds random or malformed input samples to a software application to find bugs and vulnerabilities, whereas most software testing is limited to unit tests and static and dynamic code analysis. In recent years, after the Heartbleed bug was found in a widely used open-source library, it became evident that security testing is paramount for any software application, wherever it is used. As people become increasingly reliant on the latest technologies and accustomed to working more efficiently than ever, validating software in scenarios that neither the developers nor the validation team anticipated has become a necessity rather than an option. Fuzzing is rapidly gaining popularity as a way to protect systems against future attacks and against vulnerabilities that stem from a lack of proper testing and secure code reviews.
Keywords: Secure Software, Fuzzing, Software Testing, Secure Coding, Bug & Vulnerability Detection, System Crash
1. Introduction
The world depends on software now more than ever. Day by day, the complexity of software systems and applications keeps growing rapidly. The growing use of software increases the threat and puts the systems running it in harm's way in unforeseen ways, and many critical systems run these software applications. Fuzzing is a technique that finds vulnerabilities through various methods, such as feeding malformed or out-of-bounds input values to a program. It adds significant value to any software application used in mission-critical systems. Fuzzing generally relies on unpredicted, unexpected inputs that make a software application behave abnormally and either crash or hang the system, which exposes the application's weak entry points. The most common vulnerabilities in software applications are memory-related, and they can be exploited to gain unauthorized access to a system and its confidential information.
Borzacchiello et al. [1] propose Fuzzy-Sat, a novel approximate solver for concolic and hybrid fuzzing engines. Traditional concolic and hybrid fuzzing engines rely on Satisfiability Modulo Theories (SMT) solvers to solve the symbolic expressions generated during execution. However, SMT solvers can be expensive and time-consuming, especially for complex expressions. Fuzzy-Sat addresses this challenge by providing an approximate solution that is faster than SMT solvers while still finding bugs effectively.
Pham [2] (2023) describes AFLSmart++ as an extension of the AFLSmart greybox fuzzer that improves its effectiveness and usability. AFLSmart is a structure-aware greybox fuzzer designed for testing programs that take highly structured inputs, such as PNG, PDF, and WAV files. AFLSmart++ improves upon AFLSmart in two main ways. First, it introduces new structure-aware mutation operators that are more effective at generating valid and interesting test inputs. Second, it provides a composite input model that allows AFLSmart++ to handle more complex file formats.
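To illustrate what a structure-aware, chunk-level mutation means in practice, here is a minimal sketch assuming a simplified PNG-style chunk layout (4-byte length, 4-byte type, payload, 4-byte CRC-32). The helpers parse_chunks, build_chunks, and mutate_chunk are hypothetical and are not part of AFLSmart or AFLSmart++. The point is only that the mutation changes one chunk's payload and then repairs the length and checksum fields, so the input still passes the parser's integrity checks.

```python
import random
import struct
import zlib


def parse_chunks(data: bytes) -> list[tuple[bytes, bytes]]:
    """Split a PNG-style chunk stream into (type, payload) pairs.

    Assumed layout per chunk: 4-byte big-endian length, 4-byte type,
    payload, then a 4-byte CRC-32 over type + payload.
    """
    chunks = []
    pos = 0
    while pos + 8 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        ctype = data[pos + 4:pos + 8]
        chunks.append((ctype, data[pos + 8:pos + 8 + length]))
        pos += 12 + length  # skip length, type, payload and CRC
    return chunks


def build_chunks(chunks: list[tuple[bytes, bytes]]) -> bytes:
    """Serialize chunks, recomputing each length field and CRC."""
    out = bytearray()
    for ctype, payload in chunks:
        out += struct.pack(">I", len(payload))
        out += ctype + payload
        out += struct.pack(">I", zlib.crc32(ctype + payload))
    return bytes(out)


def mutate_chunk(data: bytes, rng: random.Random) -> bytes:
    """Flip one bit inside one chunk's payload, then rebuild the stream
    so the length and CRC fields stay consistent with the new payload."""
    chunks = parse_chunks(data)
    candidates = [i for i, (_, payload) in enumerate(chunks) if payload]
    if not candidates:
        return data
    i = rng.choice(candidates)
    ctype, payload = chunks[i]
    buf = bytearray(payload)
    buf[rng.randrange(len(buf))] ^= 1 << rng.randrange(8)
    chunks[i] = (ctype, bytes(buf))
    return build_chunks(chunks)
```

A blind byte-level mutation of the same input would usually invalidate the CRC, so a parser that verifies checksums would reject it before the interesting code is reached.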
This paper outlines the need for fuzzing methods and argues that the software development process should incorporate them to mitigate potential vulnerabilities in software applications. In particular, we detail the various fuzzing methods, including which methods are more effective than others at achieving a higher degree of confidence in a software application.
2. Background
This section briefly explains why secure software development is required for maximum defense against future threats. Many techniques currently in use are inadequate for discovering these threats beforehand; hence, more advanced techniques such as fuzzing are required. The most commonly used techniques are static analysis, dynamic analysis, symbolic execution, and advanced fuzzing based on various methods and algorithms.
Static analysis is one of the fastest methods: it scans the source code, and sometimes the object code, to detect vulnerabilities. Although it is the quickest method, it can produce false positives; its advantage is that issues can be found without running the code, which saves detection time. In contrast, dynamic analysis runs the program and monitors it in real time to detect issues. It is slower, but more accurate and with fewer false positives. It often involves human interaction, which requires strong technical skills to debug and fix the issues found, and because of this human intervention it does not scale easily to larger systems.
Symbolic execution maintains a set of constraints on the program inputs for each execution path. When the application interacts with components outside its environment, such as system calls, APIs, or unreliable signals, false positives increase sharply as the application grows, so this type of threat detection is tricky and time-consuming to scale. The last technique, fuzzing, is the most advanced of all and has its own merits: it helps secure the application or system against current and future threats before they can be exploited.
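To make the idea of per-path constraints concrete, the following toy sketch writes each execution path of a small hypothetical function as a list of branch conditions, then searches for an input that satisfies all conditions on a path. The function, the constraint lists, and the search domain are illustrative assumptions. A real symbolic executor derives the constraints automatically and hands them to an SMT solver; here, exhaustive search over a tiny integer range stands in for the solver.

```python
from itertools import product
from typing import Callable, Optional

Constraint = Callable[[int, int], bool]

# Path constraints of a hypothetical toy program:
#
#     def toy(x, y):
#         if x > 10:
#             if x + y == 42:
#                 crash()      # path "crash"
#             return "A"       # path "A"
#         return "B"           # path "B"
PATHS: dict[str, list[Constraint]] = {
    "crash": [lambda x, y: x > 10, lambda x, y: x + y == 42],
    "A": [lambda x, y: x > 10, lambda x, y: x + y != 42],
    "B": [lambda x, y: x <= 10],
}


def solve(constraints: list[Constraint],
          domain: range = range(-64, 64)) -> Optional[tuple[int, int]]:
    """Return the first (x, y) in the domain that satisfies every constraint.

    Exhaustive search stands in for an SMT solver; it only works because
    the toy domain is tiny.
    """
    for x, y in product(domain, repeat=2):
        if all(c(x, y) for c in constraints):
            return x, y
    return None


for name, constraints in PATHS.items():
    print(name, solve(constraints))
```

Each solved path yields one concrete input that drives execution down that path. The cost of solving grows quickly as programs gain more branches and more complex expressions, which is the scaling problem described above.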
3. Fuzzing
Several types of fuzzers are used, depending on their purpose and need. Based on how they produce inputs, fuzzers are classified as generative or mutation fuzzers. Generative fuzzers build inputs from scratch, typically from a model of the expected input format, whereas mutation fuzzers take valid inputs and mutate them to trigger crashes.
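A minimal sketch of the two input strategies follows. It assumes a simple text "key=value" line as the input model. The function names and the list of awkward values are illustrative choices and are not taken from any specific fuzzer.

```python
import random


def generate_input(rng: random.Random, max_len: int = 32) -> bytes:
    """Generation: build a fresh input from a (hypothetical) model of the
    format, here a single 'key=value' line with awkward values."""
    key = "".join(rng.choice("abcxyz") for _ in range(rng.randint(1, 8)))
    value = rng.choice(["0", "-1", "99999999999999999999", "", "%s%n", "A" * max_len])
    return f"{key}={value}\n".encode()


def mutate_input(seed: bytes, rng: random.Random, n_edits: int = 3) -> bytes:
    """Mutation: start from a valid seed and apply random byte-level edits."""
    data = bytearray(seed)
    for _ in range(n_edits):
        op = rng.choice(("flip", "insert", "delete"))
        if op == "flip" and data:
            i = rng.randrange(len(data))
            data[i] ^= 1 << rng.randrange(8)  # flip one random bit
        elif op == "insert":
            data.insert(rng.randrange(len(data) + 1), rng.randrange(256))
        elif op == "delete" and data:
            del data[rng.randrange(len(data))]
    return bytes(data)
```

The generator needs no seed inputs but must know the format. The mutator needs no format knowledge but depends on having good seed inputs.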
Fuzzers can also be classified by how much the developers or validators know about the software's internals, which gives three types of fuzzing. In black-box fuzzing, the software is viewed as a black box: testers have no idea of its internal workings, and inputs are generated randomly to make it crash or hang. In white-box fuzzing, the tester or validator knows the functionality and inner workings of the software and uses that knowledge to generate inputs that make the software more reliable. Grey-box fuzzing is a hybrid of the two: the tester has some knowledge of the inner workings, but not all of it, such as code-coverage feedback obtained through lightweight instrumentation.
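The grey-box approach can be sketched as a coverage-guided loop. The fuzzer does not reason about the program's logic, but feedback about which branches an input reached decides whether that input is kept for further mutation. The toy target, its branch labels, and the crash condition are hypothetical. Real grey-box fuzzers such as AFL obtain coverage through compile-time instrumentation rather than explicit calls like these.

```python
import random
from typing import Optional


def toy_target(data: bytes, coverage: set) -> None:
    """Hypothetical program under test. It records the branches it takes
    (standing in for instrumentation) and crashes on the prefix b"FUZZ"."""
    if len(data) >= 1 and data[0] == ord("F"):
        coverage.add("b1")
        if len(data) >= 2 and data[1] == ord("U"):
            coverage.add("b2")
            if len(data) >= 3 and data[2] == ord("Z"):
                coverage.add("b3")
                if len(data) >= 4 and data[3] == ord("Z"):
                    raise RuntimeError("crash")


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one random byte, and sometimes append another."""
    buf = bytearray(data or b"\x00")
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    if rng.random() < 0.3:
        buf.append(rng.randrange(256))
    return bytes(buf)


def greybox_fuzz(seed: bytes, iterations: int,
                 rng: random.Random) -> Optional[bytes]:
    corpus = [seed]
    seen: set = set()
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        cov: set = set()
        try:
            toy_target(candidate, cov)
        except RuntimeError:
            return candidate  # crashing input found
        if not cov <= seen:  # reached a new branch: keep this input
            seen |= cov
            corpus.append(candidate)
    return None
```

A pure black-box fuzzer would be the same loop without the coverage check. It would have to guess all four bytes at once, whereas the coverage feedback lets the grey-box loop solve them one at a time.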
Fuzzing is a technique in which inputs are crafted so that the software malfunctions, hangs, or crashes in specific scenarios. The analogy is throwing whatever input we can at the software's entry points and seeing what makes it behave abnormally or, better still, crash. The inputs can be unexpected data or symbols, out-of-range values, or anything else the software does not expect, and they are sent to the software in large volumes to make it malfunction or crash. Fundamentally, fuzzing mutates and twists inputs into peculiar combinations. It is the most unconventional way to stress-test software, without rules or predefined values.
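In practice, out-of-range and edge-case inputs are often drawn from a small set of values around each field's limits. The following minimal sketch handles integer fields and assumes the field width and signedness are known. The helper names and the candidate set are illustrative choices, not a fixed standard.

```python
def field_range(bits: int, signed: bool) -> tuple[int, int]:
    """Smallest and largest value an integer field of this width can hold."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def boundary_values(bits: int, signed: bool = True) -> list[int]:
    """Edge-case values inside the field's range: the extremes, their
    neighbours, and the values around zero."""
    lo, hi = field_range(bits, signed)
    candidates = {lo, lo + 1, -1, 0, 1, hi - 1, hi}
    return sorted(v for v in candidates if lo <= v <= hi)


def out_of_range_values(bits: int, signed: bool = True) -> list[int]:
    """Values just outside the range, useful when the input format lets a
    wider number reach code that assumes the narrower field."""
    lo, hi = field_range(bits, signed)
    return [lo - 1, hi + 1]
```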
The aim of this kind of stress testing is to expose issues, bugs, crashes, vulnerabilities, security holes, and other problems that would otherwise remain unknown. It resembles a simulation of real-world attacks before the software is released, which makes it a precautionary form of testing for building secure software. While it sounds like a do-it-all tool, it has limitations. It cannot guarantee finding 100% of the bugs in the software. It is time-consuming, and the time required can grow exponentially with the complexity of the software. Last but not least, interpreting the results and fixing the issues found requires a secure coding background or expertise.
4. Literature Survey
D. Kuts [3] presented a method for handling symbolic addresses in dynamic symbolic execution. Symbolic addresses are not known at compile time but are determined at runtime. This makes it difficult for dynamic symbolic execution tools to reason about a program's control flow, because they need to know which memory locations are valid. The method combines SMT solving and symbolic execution: SMT solving determines the bounds of symbolic addresses, and symbolic execution explores their possible values. This allows the tool to discover new execution paths that would otherwise be missed. It is a promising approach that can improve the tool's accuracy and find new bugs in programs.
Pham [2] presented AFLSmart++, an extension of the structure-aware greybox fuzzer AFLSmart. AFLSmart++ improves AFLSmart in two main aspects: structure-aware low-level mutation and a composite input model. Low-level mutations operate on data chunks, the fundamental building blocks of input data in any software application, while the composite input model provides a structured description of the application's inputs.
The authors of [4] present Driller, a hybrid vulnerability excavation tool that combines fuzzing and selective symbolic execution to find deeper bugs in software. Traditional fuzzing is good at exploring large portions of a program's state space, but it can struggle to find bugs that require satisfying complex path conditions. Symbolic execution, on the other hand, can be effective at finding such bugs, but it can be expensive and time-consuming to run on large programs. Driller addresses this challenge by using fuzzing to identify interesting paths in the program's state space and then using symbolic execution to generate inputs that satisfy the complex conditions on those paths. This allows Driller to find bugs that fuzzing or symbolic execution alone would miss. Overall, Driller is a powerful tool for excavating bugs rooted deep within software applications.
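The division of labour in Driller can be illustrated with a deliberately small toy. Random fuzzing handles cheap exploration, and a constraint-solving step is invoked only for a branch that fuzzing fails to pass. Everything here is a hypothetical stand-in. In particular, the "symbolic" step simply solves a linear equation that we assume has already been extracted from the branch condition, whereas Driller relies on selective symbolic execution, as described above.

```python
import random
from typing import Optional


def program(x: int) -> str:
    """Hypothetical target: the bug sits behind a 'magic value' check
    that random 32-bit inputs almost never satisfy."""
    if 7 * x + 3 == 700_003:
        raise RuntimeError("deep bug reached")
    return "ok"


def fuzz(iterations: int, rng: random.Random) -> Optional[int]:
    """Fuzzing phase: try random 32-bit integers."""
    for _ in range(iterations):
        x = rng.randrange(-2**31, 2**31)
        try:
            program(x)
        except RuntimeError:
            return x
    return None


def solve_linear_eq(a: int, b: int, c: int) -> Optional[int]:
    """Stand-in for the symbolic step: solve a*x + b == c over the integers
    (assumes a > 0 and that the constraint was already extracted)."""
    if (c - b) % a != 0:
        return None
    return (c - b) // a


def hybrid(iterations: int, rng: random.Random) -> Optional[int]:
    found = fuzz(iterations, rng)
    if found is not None:
        return found
    # Fuzzing is stuck on the guarded branch: solve its path constraint.
    x = solve_linear_eq(7, 3, 700_003)
    if x is not None:
        try:
            program(x)
        except RuntimeError:
            return x  # solver-generated input reaches the deep bug
    return None
```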
C. Zhang et al. [5] presented a method that extracts the features of a specific software application and uses them to recommend fuzzing methods for that target. The critical point is that before extracting features from an application, the developer needs to determine which features are required. Based on the requirements gathered from the developer or security researcher, features are selected and then used to choose the appropriate fuzzer for the application.
The authors of [6] proposed a malicious code image feature extraction method based on entropy filtering. The entropy filter helps identify hidden patterns introduced by specific packers and encryptors, so this method performs better than previous methods. With the rise of deep learning and its wide application in vulnerability detection, more and more feature extraction approaches use deep neural networks. The authors of [7] proposed DL4MD, which introduced the stacked autoencoder (SAE) model into malicious code analysis for the first time to achieve unsupervised feature extraction of malicious code. Popov [8] used convolutional neural networks (CNNs) to extract code features. Automatic encoding based on feature learning and classifier-based recurrent neural networks have also been used to extract features of malicious code. In addition, there are automated tools that extract the characteristics of the target program, most of which use artificial intelligence techniques.
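The entropy filtering mentioned for [6] builds on a simple measure. The following minimal sketch computes block-wise Shannon entropy over a binary and flags near-random blocks. The block size and threshold are assumptions, and the sketch does not reproduce the image-based pipeline of [6]; it only shows the quantity such a filter is built on.

```python
import math
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy in bits per byte, between 0.0 and 8.0."""
    if not block:
        return 0.0
    n = len(block)
    return -sum((c / n) * math.log2(c / n) for c in Counter(block).values())


def high_entropy_blocks(data: bytes, block_size: int = 256,
                        threshold: float = 7.0) -> list[tuple[int, float]]:
    """Return (offset, entropy) for blocks whose entropy exceeds the threshold.

    Packed or encrypted regions tend to look close to random (near 8 bits
    per byte), while code and plain data usually score lower.
    """
    result = []
    for off in range(0, len(data), block_size):
        h = shannon_entropy(data[off:off + block_size])
        if h > threshold:
            result.append((off, h))
    return result
```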
J. Shao et al. [9] proposed a reinforcement learning (RL)-based fuzzing approach that effectively improves the inputs and achieves higher code coverage. It divides the fuzzing process into two stages: in the first stage, the bytes causing a crash are identified; in the second stage, the RL-based approach applies mutation operators to these bytes to maximize the detection rate.
The authors of [10] studied BUGOSS, a real-world benchmark of regression bugs found by OSS-Fuzz, designed for evaluating regression fuzzing techniques. Regression fuzzing techniques are fuzzing techniques specifically designed to find regressions, that is, bugs introduced in a new version of a program. The BUGOSS benchmark can evaluate the effectiveness of regression fuzzing techniques by comparing their ability to find the bugs represented by the study artifacts.
The authors of [11] use reinforcement fuzzing, a newer approach that learns from past inputs which inputs are likely to find bugs in the software. Reinforcement fuzzing is effective at finding bugs in a variety of software programs. However, it faces several challenges: designing good rewards can be difficult, and learning policies can be computationally expensive. Its benefit is that it can be more efficient and faster than conventional fuzzers.
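A common simplification of reinforcement fuzzing treats the choice of mutation operator as a multi-armed bandit, so that operators that recently produced new coverage are chosen more often. The sketch below uses epsilon-greedy selection, with a reward of 1 for new coverage and 0 otherwise. The target, its coverage function, and the three operators are hypothetical. A bandit is a deliberately reduced form of reinforcement learning and is not the approach of any surveyed paper. Even in this toy, the sparse 0/1 reward hints at the reward-design difficulty noted above.

```python
import random

MAGIC = b"HELLO"  # hypothetical prefix the toy target checks byte by byte


def coverage(data: bytes) -> frozenset:
    """Hypothetical instrumentation: one 'branch' per matched prefix byte."""
    hit = set()
    for i, expected in enumerate(MAGIC):
        if i >= len(data) or data[i] != expected:
            break
        hit.add(i)
    return frozenset(hit)


def op_random_byte(buf: bytearray, rng: random.Random) -> None:
    buf[rng.randrange(len(buf))] = rng.randrange(256)


def op_flip_bit(buf: bytearray, rng: random.Random) -> None:
    buf[rng.randrange(len(buf))] ^= 1 << rng.randrange(8)


def op_printable(buf: bytearray, rng: random.Random) -> None:
    buf[rng.randrange(len(buf))] = rng.randrange(0x20, 0x7F)


OPERATORS = [op_random_byte, op_flip_bit, op_printable]


def bandit_fuzz(seed: bytes, iterations: int, rng: random.Random,
                epsilon: float = 0.1):
    """Epsilon-greedy operator selection; seed must be non-empty."""
    counts = [0] * len(OPERATORS)
    values = [0.0] * len(OPERATORS)  # running mean reward per operator
    corpus = [seed]
    seen = set(coverage(seed))
    for _ in range(iterations):
        if rng.random() < epsilon:
            a = rng.randrange(len(OPERATORS))  # explore
        else:
            a = max(range(len(OPERATORS)), key=values.__getitem__)  # exploit
        buf = bytearray(rng.choice(corpus))
        OPERATORS[a](buf, rng)
        cov = coverage(bytes(buf))
        reward = 0.0 if cov <= seen else 1.0  # reward: new coverage reached
        if reward:
            seen |= cov
            corpus.append(bytes(buf))
        counts[a] += 1
        values[a] += (reward - values[a]) / counts[a]  # incremental mean
    return corpus, values
```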
5. Conclusion
In conclusion, many methods exist to identify and fix vulnerabilities in code. Fuzzing is one of the most advanced techniques; it has its pitfalls, but it can still achieve more coverage than other methods. It has also proven able to find issues missed during code reviews or unit testing of the code base. It also helps ensure that software withstands attack scenarios that would otherwise be catastrophic. It would be even more effective in a continuous integration environment, where all checked-in code is scanned for potential issues. Along these lines, some highly advanced fuzzers use reinforcement learning to predict inputs that are likely to make the program crash, which lets them find bugs faster. In final remarks, fuzzing is the way forward, and the novel approaches surveyed in this paper suggest that fuzzing will eventually become an essential part of the development process.
6. References