Saturday, June 21, 2014

Exposing the Circuit Breaker Pattern as a JMX MXBean

Circuit Breaker Pattern

I first read about the Circuit Breaker pattern in Release It! as a way to prevent cascading failures. Martin Fowler has also written about it in CircuitBreaker. The basic idea is that you protect every operation that is likely to fail or time out (typically a remote operation) with a circuit breaker that monitors the operation. The breaker starts in a closed state, but once a failure threshold is reached it opens, or trips. While the breaker is open, no calls are made to the remote resource. This prevents too many callers from waiting on a remote resource that is not going to respond. Callers that wait and hog resources could otherwise cause the failure to cascade to their part of the system.

After a period of time the breaker can try the operation again and, if it succeeds, close (reset). It would also be nice to be able to view the breaker's state and to open or close it manually. This is where JMX comes into the picture.

JMX

Java Management Extensions (JMX) is a Java technology for managing and monitoring resources represented by MBeans. MXBeans are a special type of MBean that references only a predefined set of data types, which makes them usable by any client. Normally JMX is used to monitor an application server and the JVM it is running in, but applications are free to provide their own MBeans and register them with the application server's MBean server.

Circuit Breaker MXBean

I felt that Release It! glossed over how to monitor a circuit breaker. In the Java world, however, JMX seems like an obvious choice, and since I had never written my own JMX MBean I wanted to see how it would work.

First you need an interface containing the things that will be exposed through JMX. The getters and setters map to attributes and other methods map to operations.
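As a rough illustration (not the original listing), an interface along these lines could look like the sketch below. Every name in it (CircuitBreakerMXBean, State, FailureThreshold, open, close) is an assumption made for the example. The describe helper simply asks JMX how it classifies each method, which makes the attribute/operation mapping visible.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanInfo;
import javax.management.MBeanOperationInfo;
import javax.management.StandardMBean;

// Wrapper class only so the sketch fits in a single file.
class BreakerInterfaceSketch {

    // Hypothetical management interface; every name here is an assumption.
    // JMX requires it to be public, and the "MXBean" suffix marks it as an MXBean.
    public interface CircuitBreakerMXBean {
        String getState();                        // getter only: read-only attribute "State"
        int getFailureThreshold();                // getter + setter: read-write
        void setFailureThreshold(int threshold);  //   attribute "FailureThreshold"
        void open();                              // any other method: operation
        void close();                             // any other method: operation
    }

    // Do-nothing stub so that JMX has an object to introspect.
    static class Stub implements CircuitBreakerMXBean {
        public String getState() { return "CLOSED"; }
        public int getFailureThreshold() { return 3; }
        public void setFailureThreshold(int threshold) { }
        public void open() { }
        public void close() { }
    }

    // Ask JMX how it interprets the interface (true = treat it as an MXBean).
    static MBeanInfo describe() {
        try {
            return new StandardMBean(new Stub(), CircuitBreakerMXBean.class, true).getMBeanInfo();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Attribute names derived from the getters and setters, sorted.
    static List<String> attributeNames() {
        List<String> names = new ArrayList<>();
        for (MBeanAttributeInfo attribute : describe().getAttributes()) {
            names.add(attribute.getName());
        }
        Collections.sort(names);
        return names;
    }

    // Operation names derived from the remaining methods, sorted.
    static List<String> operationNames() {
        List<String> names = new ArrayList<>();
        for (MBeanOperationInfo operation : describe().getOperations()) {
            names.add(operation.getName());
        }
        Collections.sort(names);
        return names;
    }
}
```

Here attributeNames() should report FailureThreshold and State, and operationNames() should report close and open.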



My implementation of the interface is pretty basic and not well tested, but I think it captures the basic functionality you would want. Notice that it contains an execute method that is not in the interface; I did not think it made sense to expose execute through JMX.
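A minimal sketch of such an implementation follows. It is not the original code: the class name, the constructor parameters, and the failure-counting and reset-timeout policy are all assumptions, and the management interface is repeated in compact form only so the block compiles on its own.

```java
import java.util.concurrent.Callable;

class BreakerImplSketch {

    // Management view, repeated here so this block is self-contained.
    public interface CircuitBreakerMXBean {
        String getState();
        int getFailureThreshold();
        void setFailureThreshold(int threshold);
        void open();
        void close();
    }

    // Hypothetical breaker: opens after N consecutive failures and allows
    // a trial call once the reset timeout has elapsed.
    static class CircuitBreaker implements CircuitBreakerMXBean {
        private final long resetTimeoutMillis;
        private int failureThreshold;
        private int failures;
        private boolean open;
        private long openedAt;

        CircuitBreaker(int failureThreshold, long resetTimeoutMillis) {
            this.failureThreshold = failureThreshold;
            this.resetTimeoutMillis = resetTimeoutMillis;
        }

        // Not declared in the MXBean interface, so JMX clients cannot invoke it.
        // Synchronized for brevity; a real breaker would not hold a lock
        // for the duration of a remote call.
        synchronized <T> T execute(Callable<T> task) throws Exception {
            if (open && System.currentTimeMillis() - openedAt < resetTimeoutMillis) {
                throw new IllegalStateException("circuit breaker is open");
            }
            try {
                T result = task.call();  // closed, or a trial call after the timeout
                close();                 // success resets the breaker
                return result;
            } catch (Exception e) {
                // A failed trial call re-opens; otherwise count toward the threshold.
                if (open || ++failures >= failureThreshold) {
                    open();
                }
                throw e;
            }
        }

        public synchronized String getState() { return open ? "OPEN" : "CLOSED"; }
        public synchronized int getFailureThreshold() { return failureThreshold; }
        public synchronized void setFailureThreshold(int threshold) { failureThreshold = threshold; }
        public synchronized void open() { open = true; openedAt = System.currentTimeMillis(); }
        public synchronized void close() { open = false; failures = 0; }
    }
}
```

Because open and close are part of the management interface, an operator can trip or reset the breaker by hand, while execute stays reachable only from application code.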



Lastly I created a servlet that registers the MXBean and invokes the execute method with fake tasks.
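The registration step can be sketched without a servlet container by talking to the platform MBean server directly. This is an illustration rather than the original servlet: the BreakerStateMXBean interface, the Breaker class, and the object name com.example:type=CircuitBreaker,name=demo are all made up for the example. In a servlet, the register call would most likely sit in init().

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class BreakerRegistrationSketch {

    // Deliberately tiny, hypothetical MXBean: one attribute and one operation.
    public interface BreakerStateMXBean {
        String getState();
        void open();
    }

    static class Breaker implements BreakerStateMXBean {
        private volatile String state = "CLOSED";
        public String getState() { return state; }
        public void open() { state = "OPEN"; }
    }

    // Hypothetical object name; the domain and keys are assumptions.
    static final String NAME = "com.example:type=CircuitBreaker,name=demo";

    // Registers the breaker with the JVM's platform MBean server.
    static ObjectName register(Breaker breaker) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName(NAME);
            if (server.isRegistered(name)) {
                server.unregisterMBean(name);  // allow re-registration on redeploy
            }
            server.registerMBean(breaker, name);
            return name;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads the "State" attribute the way a JMX client would.
    static String readStateViaJmx(ObjectName name) {
        try {
            return (String) ManagementFactory.getPlatformMBeanServer().getAttribute(name, "State");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Invokes the "open" operation the way a JMX client would.
    static void openViaJmx(ObjectName name) {
        try {
            ManagementFactory.getPlatformMBeanServer()
                    .invoke(name, "open", new Object[0], new String[0]);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```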



Controlling the MXBean With JConsole

JConsole is a graphical monitoring tool for JMX resources. You can view and modify the MXBean's attributes and trigger the operations that are exposed. These changes are made without needing to restart the application or application server.


Update May 1, 2015

The e-book Migrating to Cloud-Native Application Architectures also covers circuit breakers and has code examples showing how to add them to a Spring Boot application by using the Netflix OSS Hystrix library. Interestingly, Hystrix also uses another fault-tolerance pattern from Release It! called bulkheads by operating each circuit breaker in its own thread pool. This library collects metrics like traffic volume, request rate, latency, etc. and emits them as an event stream which can be aggregated and visualized by other Netflix projects. That's obviously better for a cloud application than what I did above.

Partly Cloudy Distributed Transactions

General Background

Distributed transactions are atomic transactions involving two or more resources, usually residing on separate machines. Each resource is transactional and there is also a transaction manager, like an application server, that manages the global transaction. The resources could be multiple relational databases, JMS queues, JCA resource adapters, or some combination of these.

The four ACID properties still apply to distributed transactions:

1. atomicity - a transaction is all or nothing
2. consistency - any transaction brings the database from a valid state to a valid state
3. isolation - concurrent transactions result in the same system state as transactions executed serially
4. durability - once a transaction has been committed, it will remain so

The terms "2PC" and "XA" seem to sometimes be used interchangeably with "distributed transaction" and I want to note the distinctions here. Two-phase commit (2PC) is a common algorithm for coordinating the participants of a distributed transaction. It ensures that even if part of the system crashes, the distributed transaction can still be committed or rolled back. The XA specification describes the interface between the global transaction manager and the local resource manager. All XA transactions are distributed transactions, but XA supports single-phase commit and two-phase commit.

To enable recovery from crashes and hardware failures, a transaction log containing the transaction history is written by the global transaction manager. When restarting it can use the log to replay in-doubt transactions and bring the system back to a consistent state. An example of an in-doubt transaction would be where the application server crashes after the first phase of the 2PC protocol (prepare) has completed, but before the second phase (commit) has completed. When the application server restarts the transaction can be completed based on the transaction log.
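To make the two phases and the role of the log concrete, here is a toy, in-memory sketch. It is not how any real transaction manager is implemented: the Participant interface, the log record format, and the needsRecovery check are assumptions made for illustration, and the list stands in for a durable transaction log.

```java
import java.util.ArrayList;
import java.util.List;

class TwoPhaseCommitSketch {

    // What the coordinator needs from each transactional resource.
    interface Participant {
        boolean prepare();  // phase 1: vote yes (true) or no (false)
        void commit();      // phase 2 when every vote was yes
        void rollback();    // phase 2 otherwise
    }

    // Trivial resource that records its state and votes as configured.
    static class Resource implements Participant {
        final boolean vote;
        String state = "ACTIVE";
        Resource(boolean vote) { this.vote = vote; }
        public boolean prepare() { if (vote) { state = "PREPARED"; } return vote; }
        public void commit() { state = "COMMITTED"; }
        public void rollback() { state = "ROLLED_BACK"; }
    }

    static class Coordinator {
        // Stands in for the durable transaction log.
        final List<String> log = new ArrayList<>();

        boolean run(String txId, List<Participant> participants) {
            log.add(txId + " BEGIN");
            boolean allPrepared = true;
            for (Participant p : participants) {     // phase 1: collect votes
                if (!p.prepare()) { allPrepared = false; break; }
            }
            // The decision is logged before phase 2 starts, so a restart
            // can finish what a crash interrupted.
            log.add(txId + (allPrepared ? " COMMIT" : " ROLLBACK"));
            for (Participant p : participants) {     // phase 2: apply decision
                if (allPrepared) { p.commit(); } else { p.rollback(); }
            }
            log.add(txId + " END");
            return allPrepared;
        }
    }

    // A transaction with a logged decision but no END record was interrupted
    // during phase 2 and has to be completed from the log after a restart.
    static boolean needsRecovery(List<String> log, String txId) {
        boolean decided = log.contains(txId + " COMMIT") || log.contains(txId + " ROLLBACK");
        return decided && !log.contains(txId + " END");
    }
}
```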

In The Cloud

Distributed transactions do not work well in the cloud for several reasons, some described here. My list is an attempt at summarizing that post with a mix of my own cloud experiences:

1. A node that was part of a transaction might be removed during down-scaling and never reappear.
2. Failures everywhere in the cloud are expected, so in-doubt transactions might be common because of network failures, network latency, or even the transactional resources being unavailable.
3. The application's file system is probably ephemeral so the transaction log needs to be stored in a database. As application instances are updated, if they are completely recreated, it will be difficult to associate a server with its transaction log.

An alternative to distributed transactions could be to make all operations idempotent. Then they could be retried at the application level without any problems. There are also other distributed transaction algorithms besides 2PC that potentially scale better.
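One common way to make an operation idempotent is to attach a unique request ID to it and remember which IDs have already been applied. The sketch below assumes that approach; the Account class and the requestId parameter are hypothetical, and a real system would keep the applied-request table in durable storage rather than in memory.

```java
import java.util.HashMap;
import java.util.Map;

class IdempotentDepositSketch {

    static class Account {
        private long balance;
        // requestId -> balance right after that request was first applied
        private final Map<String, Long> applied = new HashMap<>();

        // Applying the same requestId twice has the same effect as applying it once.
        synchronized long deposit(String requestId, long amount) {
            Long previous = applied.get(requestId);
            if (previous != null) {
                return previous;  // a retry: answer as before, change nothing
            }
            balance += amount;
            applied.put(requestId, balance);
            return balance;
        }

        synchronized long balance() { return balance; }
    }
}
```

With this in place, a caller that never received a response can simply resend the same request without risking a double deposit.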

Testing Transaction Recovery

It is not trivial to reliably create in-doubt distributed transactions because the transaction API UserTransaction only allows you to trigger the beginning of the transaction, the commit (2PC), and a rollback. What you really need is to stop the server partway through the commit, which means the hooks to do that will be vendor-specific.

I found an easy-to-use tool called Byteman that runs as a -javaagent and can instrument (insert extra code into) a Java program at run time. If you know a class name and method name, say for the prepare method, you can have Byteman kill the server when the method exits.
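A Byteman rule for this might look like the following sketch. The class name com.example.tx.ExampleXAResource is a placeholder, since the real class depends on the transaction manager or resource adapter in use, and the file paths in the launch command are assumptions as well.

```
# kill-after-prepare.btm
# Placeholder class name; substitute the vendor class that implements prepare.
RULE kill the JVM after prepare returns
CLASS com.example.tx.ExampleXAResource
METHOD prepare
AT EXIT
IF true
DO killJVM()
ENDRULE
```

The script would then be loaded when starting the server, for example with -javaagent:/path/to/byteman.jar=script:/path/to/kill-after-prepare.btm added to the JVM options.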

Saturday, June 14, 2014

Ethical Hacking

At work I have been participating in a cyber security war games event where each week new challenges are posted. They focus on OWASP Top 10 web application security flaws. It has been fun because we get to actually exploit the flaws in an application hosted for the war games, instead of just reading about them or watching a presentation.

I think these types of vulnerabilities are something any web developer should be aware of, even if you do not deal with security on a daily basis, so when you are coding something that could be vulnerable you at least recognize it and can do more research, more testing, ask an expert, etc.

I have needed two tools to complete the challenges. First, I used Firebug to inspect HTML, JavaScript, and cookies. Second, I used the proxy server that is part of Burp Suite. The idea is that your browser sends requests to the proxy and the proxy forwards them on to the real destination, but before the proxy forwards them you have a chance to view/change the request.

Below are my notes plus more research I have done, mostly focusing on how to prevent these flaws in the first place. After all, I am supposed to be learning how to write more secure code, not how to be a hacker.

SQL Injection

A SQL Injection attack can occur when a web application accepts user input and uses it as part of a SQL query. If the input is not properly filtered, then the attacker can provide partial SQL instead of valid input and obtain information they do not have access to.

A common example is: ' OR '1'='1

The best way to prevent this type of attack is to use prepared statements combined with the principle of least privilege and white-list input validation.
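The difference between concatenating input into a query and binding it as a parameter can be sketched with plain JDBC. The users table and its columns are made up for the example, and the Connection is assumed to be supplied by the caller.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class SqlInjectionSketch {

    // Vulnerable: the input becomes part of the SQL text itself.
    static String unsafeQuery(String name) {
        return "SELECT id FROM users WHERE name = '" + name + "'";
    }

    // Safer: the SQL text is fixed and the input is bound as a parameter,
    // so the database always treats it as data, never as SQL.
    static boolean userExists(Connection conn, String name) throws SQLException {
        String sql = "SELECT id FROM users WHERE name = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, name);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next();
            }
        }
    }
}
```

Passing the example input ' OR '1'='1 to unsafeQuery yields SELECT id FROM users WHERE name = '' OR '1'='1', a condition that is true for every row. With the prepared statement, the same input is just an unusual user name that matches nothing.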

Broken Authentication And Session Management

Broken authentication and session management can be exploited when, for example, user credentials are not encrypted in an HTTP request (encoding is not the same as encryption), passwords are not properly hashed in a database, or sessions do not time out. Any of these could let an attacker access someone else's account.

There is not a single best practice here as there are many attack vectors, but OWASP provides detailed cheat sheets for authentication and session management that should be read in their entirety.

Cross-site Scripting (XSS)

An XSS attack can occur when a web application accepts user input and includes it in a page that will be seen by other users. If the input is not properly filtered, then the attacker can insert code that will send user credentials to them (like a form submitted to another website).

A common, demonstrative example is: <INPUT TYPE="BUTTON" ONCLICK="alert('XSS')"/>

The best way to prevent this type of attack is to escape untrusted input for the context in which it is output: HTML, JavaScript, JSON, CSS, or URL.
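For the HTML case, escaping comes down to replacing the handful of characters that have special meaning in markup. The helper below is a minimal hand-rolled sketch for HTML element content and quoted attribute values only. It does not cover the JavaScript, CSS, or URL contexts, and in practice a maintained encoding library is a better choice.

```java
class HtmlEscapeSketch {

    // Replaces the five characters that are significant in HTML text
    // and in quoted attribute values with character references.
    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

After escaping, injected markup is displayed as literal text instead of being interpreted by the browser.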

Insecure Direct Object References

Insecure direct object references can be exploited by an attacker who changes a parameter value that refers to a system object, to a value for a different system object they do not have access to. This might mean changing a parameter in a URL or using the proxy to change data in the HTTP request.

The best way to prevent this type of attack is to use indirect object references and have the application map from the indirect object reference to the actual key. If you must expose direct object references, then verify, for each object reference, that the user is authorized for that object.
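An indirect reference map can be as simple as a per-session table from random tokens to real keys. The sketch below assumes one ReferenceMap instance per user session; the class and method names are hypothetical.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class IndirectReferenceSketch {

    // Intended to be stored in the user's session, one instance per session.
    static class ReferenceMap {
        private final SecureRandom random = new SecureRandom();
        private final Map<String, Long> tokenToKey = new HashMap<>();

        // Returns an opaque token to put in the page instead of the real key.
        String expose(long databaseKey) {
            byte[] bytes = new byte[16];
            random.nextBytes(bytes);
            String token = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
            tokenToKey.put(token, databaseKey);
            return token;
        }

        // Only tokens handed out in this session resolve to a key;
        // guessed or tampered values come back empty.
        Optional<Long> resolve(String token) {
            return Optional.ofNullable(tokenToKey.get(token));
        }
    }
}
```

Since the map only ever contains objects the application chose to show this user, changing the parameter to another user's key finds nothing to resolve.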

Cross-site Request Forgery (CSRF)

A CSRF attack can occur when a user, who is authenticated with a third site, visits an attacker's site (or a site which was the victim of an XSS attack) and a hidden, forged HTTP request is submitted to the third site. Because the user is authenticated with the third site already, their browser will automatically send their session cookie along with the forged request and it will be indistinguishable from a legitimate request.

CSRF seems to be particularly effective when the user falls victim to the attack just by loading a page. The XSS code might not even be visible to the user. An image source from a URL would do this, or for an HTTP POST a hidden form will achieve the same thing:

<img src="http://example.com/app/transferFunds?amount=1500&destinationAccount=attackersAcct#" width="0" height="0" />

<form name="csrf" action="http://example.com/app/transferFunds" method="post">
<input type='hidden' name='amount' value='1500'>
<input type='hidden' name='destinationAccount' value='attackersAcct#'>
</form>
<script>document.csrf.submit();</script>

Preventing CSRF requires including an unpredictable token with each HTTP request. It must, at least, be unique to a session. It is usually included in a hidden field so that it travels in the body of the HTTP request, where it is less exposed than in the URL itself. A CAPTCHA can also be helpful in determining if a request came from a real user.
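Generating and checking such a token can be sketched as follows. Where the token is stored (here assumed to be the server-side session) and how it reaches the form are left out, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;

class CsrfTokenSketch {

    private static final SecureRandom RANDOM = new SecureRandom();

    // Creates an unpredictable token: 32 random bytes, URL-safe Base64.
    // Store it in the session and emit it as a hidden form field.
    static String newToken() {
        byte[] bytes = new byte[32];
        RANDOM.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // Compares the token kept in the session with the one submitted in the
    // request. MessageDigest.isEqual is used instead of String.equals because
    // it is intended to take the same time wherever the first mismatch is.
    static boolean isValid(String sessionToken, String submittedToken) {
        if (sessionToken == null || submittedToken == null) {
            return false;
        }
        return MessageDigest.isEqual(
                sessionToken.getBytes(StandardCharsets.UTF_8),
                submittedToken.getBytes(StandardCharsets.UTF_8));
    }
}
```

A forged request from another site carries the session cookie but cannot know the token, so isValid rejects it.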

Insecure Cryptographic Storage

As with broken authentication and session management, insecure cryptographic storage is a broad category of application mistakes that can be exploited. Attackers can, for example, find encryption keys, gain access to a database that automatically decrypts data, or break a weak encryption algorithm.

To protect against this kind of vulnerability, there are several steps to take:

1. Sensitive data should be encrypted if it is stored long-term, and the key should be stored separately
2. Strong, standard encryption algorithms should be used along with proper key management
3. Passwords should be hashed and a salt should be used
4. Never store unnecessary data
5. Infrastructure credentials like passwords in a config file should be protected with file system permissions
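For the password item in the list above, a salted hash can be sketched with the JDK's built-in PBKDF2 support. The iteration count and key length are assumptions chosen for illustration and should be tuned against current guidance; the class and method names are hypothetical.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

class PasswordHashSketch {

    static final int ITERATIONS = 100_000;  // assumption, not a recommendation
    static final int KEY_BITS = 256;

    // A fresh random salt per password; stored next to the hash.
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    // Derives a 32-byte hash from the password and salt using PBKDF2.
    static byte[] hash(char[] password, byte[] salt) {
        try {
            PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec)
                    .getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Re-derives the hash from the candidate password and compares it
    // with the stored one.
    static boolean verify(char[] candidate, byte[] salt, byte[] expectedHash) {
        return MessageDigest.isEqual(hash(candidate, salt), expectedHash);
    }
}
```

Only the salt and the derived hash are stored; the password itself never is.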

Failure To Restrict URL Access

An application that fails to restrict URL access gives anonymous users access to pages that should be protected. The attacker might guess an admin URL or even change the CSS in their browser to show something that was hidden.

To prevent these types of attacks, the application needs to always verify that the user is authorized and not just hide links and buttons that they are unauthorized for. Authentication and authorization policies can be role-based to minimize the effort of maintaining them. They should be easily configurable and not hard-coded.
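A server-side, role-based check might be sketched like this. The policy format (URL prefix mapped to a required role, first match wins, everything else denied) is an assumption made for the example. In a real application the policy would be loaded from configuration and the check would run in a filter in front of every request.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class UrlAccessSketch {

    // URL prefix -> role required, in priority order (first match wins).
    private final Map<String, String> requiredRoleByPrefix;

    UrlAccessSketch(Map<String, String> requiredRoleByPrefix) {
        this.requiredRoleByPrefix = new LinkedHashMap<>(requiredRoleByPrefix);
    }

    // Called for every request, whether or not a link to the page was shown.
    boolean isAllowed(String path, Set<String> userRoles) {
        for (Map.Entry<String, String> rule : requiredRoleByPrefix.entrySet()) {
            if (path.startsWith(rule.getKey())) {
                return userRoles.contains(rule.getValue());
            }
        }
        return false;  // no rule matched: deny by default
    }
}
```

For example, with the rules /admin/ requires ADMIN followed by / requires USER, a user who only has the USER role is refused /admin/users even when they type the URL directly.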

Unvalidated Redirects And Forwards

Unvalidated redirects and forwards can be exploited when an attacker links to a site's redirect page and specifies a malicious site, one that hosts malware or does phishing, as the redirect destination.

If you cannot avoid using redirects and forwards, then check that the supplied destination parameter is valid and authorized for the user. Destination parameters can also be a value that is mapped to a URL on the server.
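Mapping a destination value to a URL on the server could look like the sketch below. The keys, the URLs, and the fallback page are all made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

class RedirectSketch {

    // The only destinations the application will ever redirect to.
    private static final Map<String, String> DESTINATIONS = new HashMap<>();
    static {
        DESTINATIONS.put("home", "/home");
        DESTINATIONS.put("orders", "/account/orders");
    }

    private static final String DEFAULT_DESTINATION = "/home";

    // The request carries a short key rather than a URL; anything that is
    // not a known key falls back to a safe default.
    static String resolve(String destinationKey) {
        String url = DESTINATIONS.get(destinationKey);
        return url != null ? url : DEFAULT_DESTINATION;
    }
}
```

Because the attacker can only choose among keys the server already knows, a link such as redirect?dest=http://evil.example/ ends up at the default page instead of the malicious site.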