Avoiding Injection Attacks
Overview
All injection vulnerabilities stem from the same basic problem: mixing data with logic. Cross-site scripting (XSS) occurs when untrusted data is echoed back onto a page and interpreted as script. SQL injection occurs when untrusted data is used to construct a SQL query in such a way that the input can change the intent of the query. Because these attacks share the same nature, the mitigations all revolve around keeping data in its place so that it cannot be interpreted as anything else. Encoding accomplishes this for XSS, and parameterized queries accomplish this for SQL injection.
Terms Defined
Untrusted Sources - This is any entity or system not directly controlled by your organization.
Tainted Data - This refers to any data that could include input from untrusted sources. Effectively, this is any data that your application didn’t generate.
Application Data - This is data generated within the application as a result of its internal functioning and does not include any Tainted Data itself.
Stack-Supplied Automatic Encoding - This refers to any mechanism in a stack that supplies you with automatic encoding or sanitization of Tainted Data. Examples include bind parameters in a SQL library and the automatic HTML escaping that ReactJS applies to values rendered in JSX.
Tainted Data vs Application Data
In JSON, the Application Data frequently appears as the “keys” in the object, whereas the Tainted Data is typically the values.
{ "this-is-application-data": "this-is-Tainted Data", "application-data": [ "Tainted Data1", "Tainted Data2" ] }
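The key/value split above can be sketched in code. This is a minimal illustration in Python, not a prescribed implementation: the function names, the "display-name" and "is-admin" keys, and the hostile input are all hypothetical. It compares a standard JSON serializer with string concatenation when the value is Tainted Data.

```python
import json

def build_profile_json(display_name: str, tags: list[str]) -> str:
    """Serialize Tainted Data as JSON values; the keys are Application Data."""
    # json.dumps escapes quotes, backslashes, and control characters in the
    # values, so tainted text cannot close its string and add new keys.
    return json.dumps({"display-name": display_name, "tags": tags})

def build_profile_json_unsafe(display_name: str) -> str:
    """Anti-pattern: concatenation lets the input rewrite the structure."""
    return '{"is-admin": false, "display-name": "' + display_name + '"}'

# Hypothetical hostile input that tries to smuggle in an extra key.
hostile = 'x", "is-admin": true, "ignored": "'

safe_doc = json.loads(build_profile_json(hostile, ["a"]))
unsafe_doc = json.loads(build_profile_json_unsafe(hostile))

print(sorted(safe_doc.keys()))   # only the keys the application defined
print(unsafe_doc["is-admin"])    # the injected key overrides the original
```

With the serializer, the hostile text round-trips as an ordinary string value. With concatenation, the input becomes part of the document structure.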
From an application perspective, in HTML, Vue, Angular, or ReactJS templates, interpolated variables are the most likely place to find Tainted Data:
<h1>Welcome to the app, {props.user.firstname}</h1>
Tainted Data can be anywhere in a URL. It is frequently found in the parameters or the path.
https://some.site.com/tainted-data-path?tainted=1&tainted=2#tainted
Remember that any user or client-controlled data should be considered tainted until it has been validated. This includes anything from URL parameters to HTTP headers.
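To make the URL case concrete, the sketch below splits the example URL into its components using Python's standard library. The helper name tainted_parts is an assumption made for illustration. Every value it returns is client-controlled and should be treated as Tainted Data.

```python
from urllib.parse import urlsplit, parse_qs

def tainted_parts(url: str) -> dict:
    """Return the client-controlled components of a URL."""
    parts = urlsplit(url)
    return {
        "path": parts.path,              # may carry traversal attempts
        "query": parse_qs(parts.query),  # each parameter maps to a list of values
        "fragment": parts.fragment,      # normally read by client-side code
    }

found = tainted_parts(
    "https://some.site.com/tainted-data-path?tainted=1&tainted=2#tainted"
)
print(found)
```

Note that a repeated parameter name yields several values, so code that expects a single value has to decide which one it accepts.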
What Are These Attacks?
Injection attacks
An injection attack is one in which an attacker uses crafted input to modify a call that the application makes to a system behind it, such as a database or a directory server. This includes:
SQL injection
NoSQL injection
Arbitrary code injection
LDAP injection
The vectors for this type of attack are:
Unsanitized user input from HTML forms
Unsanitized user input from automated scripts
Bulk data files
The goal of this attack is to:
Harvest unauthorized data from a database
Cause data loss
Inject data which would allow for other attacks, such as XSS
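As an illustration of SQL injection and its mitigation, here is a minimal sketch that uses Python's built-in sqlite3 module with an in-memory database. The table, column names, and sample rows are hypothetical. It contrasts a query built by string concatenation with the same query using a bind parameter.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with two sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn

def find_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Anti-pattern: the input becomes part of the SQL text itself.
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]

def find_safe(conn: sqlite3.Connection, name: str) -> list:
    # The "?" placeholder keeps the input as data; it is never parsed as SQL.
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]

conn = make_db()
hostile = "nobody' OR '1'='1"

leaked = find_unsafe(conn, hostile)   # the WHERE clause is now always true
blocked = find_safe(conn, hostile)    # no user has this literal name
print(len(leaked), len(blocked))
```

The concatenated query returns every row because the input rewrote the WHERE clause. The parameterized query compares the whole input against the name column and finds nothing.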
XSS attacks
An XSS (cross-site scripting) attack is one in which an attacker can inject HTML, JavaScript, or other content that the browser will interpret into a client-side application.
We commonly see the following vectors for this type of attack:
Unsanitized user input from HTML forms
Unsanitized user input from automated scripts
Unsanitized URL parameters
Unsanitized values in error messages or stack traces
Unsanitized values in localStorage or cookies
The goal of this attack is to inject HTML or JavaScript to exploit the presentation layer in the browser. A successful attack could:
Execute malware on a user’s computer
Harvest cookies and/or session data to allow for other attacks
Manipulate the UI to present malicious front-end code that could fool the user into providing sensitive data
Harvest user data from the front-end layer
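The sketch below shows output encoding as a mitigation for XSS, using html.escape from Python's standard library. The render_welcome function and the hostile input are hypothetical. The approach assumes the value is placed in HTML element content; other contexts, such as script blocks or URL attributes, need their own encoding.

```python
import html

def render_welcome(first_name: str) -> str:
    """Build a heading with the tainted name encoded as HTML text."""
    # html.escape replaces &, <, > and, by default, both quote characters
    # with HTML entities, so the browser shows the value instead of parsing it.
    return "<h1>Welcome to the app, " + html.escape(first_name) + "</h1>"

hostile = '<script>alert("xss")</script>'
page = render_welcome(hostile)
print(page)
# <h1>Welcome to the app, &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;</h1>
```

The user still sees the text they typed, but the page no longer contains a script element.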
The Core of the Problem
The core problem in most injection and XSS attacks is that we sometimes treat all string data as benign. If we can adjust our thinking a little, it will help us to better anticipate these vulnerabilities.
JSON is not just string data; it is a packed field whose structure can drive application decision-making.
HTML is not just string data; it is a rich-text document that contains executable code.
A URL is not just a string; it is a packed field containing host names, port numbers, protocol specifications, paths, filenames, and application input parameters.
URLs, JSON, and HTML all have standardized forms of encoding to protect Tainted Data from being mistaken for Application Data, and most vectors for these attacks exploit poor implementations of this encoding.
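For URLs, the standardized encoding is percent-encoding. The following sketch, with a hypothetical build_search_url helper and a hypothetical "q" parameter, uses Python's urllib.parse to place Tainted Data into a path segment and a query parameter without letting it alter the URL's structure.

```python
from urllib.parse import quote, urlencode, urlsplit, parse_qs

def build_search_url(base: str, folder: str, term: str) -> str:
    """Place tainted values into a URL as data, not as structure."""
    # safe="" percent-encodes "/" as well, so the value stays a single
    # path segment instead of introducing new ones.
    path = "/" + quote(folder, safe="")
    # urlencode percent-encodes "&", "=", and "#" inside the value.
    query = urlencode({"q": term})
    return base + path + "?" + query

url = build_search_url("https://some.site.com", "../admin", "a&role=admin#x")
print(url)
# https://some.site.com/..%2Fadmin?q=a%26role%3Dadmin%23x

# Parsing the result recovers exactly one parameter and no fragment.
parsed = urlsplit(url)
print(parse_qs(parsed.query))
```

Without the encoding, the same input would have added a second query parameter and a fragment to the URL.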
The solution to the problem is to sanitize and verify your Tainted Data before you use or persist it.
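One common way to verify Tainted Data is allow-list validation: accept only input that matches a known-good shape and reject everything else. The sketch below is an assumption-laden example, not a universal rule. The username format (3 to 20 letters, digits, or underscores) and the function name are hypothetical.

```python
import re

# Hypothetical rule: 3 to 20 characters; letters, digits, or underscore only.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,20}")

def validate_username(raw: str) -> str:
    """Return the value unchanged if it matches the allow-list, else raise."""
    # fullmatch requires the entire string to match, so a valid prefix
    # followed by hostile text is still rejected.
    if not isinstance(raw, str) or USERNAME_PATTERN.fullmatch(raw) is None:
        raise ValueError("invalid username")
    return raw

print(validate_username("alice_01"))

try:
    validate_username("alice'; DROP TABLE users;--")
except ValueError as err:
    print("rejected:", err)
```

Validation of this kind complements encoding and bind parameters rather than replacing them, since many legitimate values (names with apostrophes, free text) cannot be restricted to a narrow character set.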