rdb - repairable database systems
A Portable Implementation Framework for Intrusion-Resilient Database Management Systems
Alexey Smirnov and Tzi-cker Chiueh
Department of Computer Science
SUNY at Stony Brook
DSN 2004
Outline
  Motivation
  System Architecture
  Transaction Dependency Tracking
  Database Repair Process
  Performance Evaluation
  Summary
Motivation
Suppose you are a DBA and have just noticed that your database was compromised 24 hours ago. How would you repair it?
Currently, the only way to do this is to restore a database backup and manually recommit the benign transactions.
Difficulties: (1) telling benign transactions from malicious ones; (2) the amount of data can be huge, and the repair process is very error-prone.
Motivation
Ideally, an intrusion-resilient DBMS should be able to:
  Track inter-transaction dependencies;
  Perform a selective transaction rollback.
We propose an implementation framework called RDB that can render an off-the-shelf DBMS intrusion resilient without modifying its internals. RDB has two components: a tracking subsystem, which operates at run time, and a recovery subsystem, which runs offline.
Definition of Transaction Dependency
The read set of an SQL statement S is the set of rows fetched by this statement.
We say that statement S2 depends on statement S1 if at least one row in the read set of S2 was modified by S1.
We say that transaction T2 depends on transaction T1 if at least one statement of T2 depends on a statement of T1.
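
The definition above can be sketched in a few lines of Python. This is only an illustration, not RDB's implementation: the `Statement` class and the row identifiers are hypothetical, read and write sets are supplied by hand rather than derived from SQL, and the rules that only earlier statements count and that a transaction does not depend on itself are assumptions the slides do not spell out.

```python
from dataclasses import dataclass, field

@dataclass
class Statement:
    """Hypothetical record of one SQL statement's effect (not an RDB type)."""
    tx: int                                          # ID of the enclosing transaction
    read_rows: set = field(default_factory=set)      # rows fetched (the read set)
    written_rows: set = field(default_factory=set)   # rows modified

def statement_depends(s2, s1):
    """S2 depends on S1 if S1 modified at least one row in S2's read set."""
    return bool(s2.read_rows & s1.written_rows)

def transaction_dependencies(statements):
    """Return a set of (t2, t1) pairs meaning 'transaction t2 depends on t1'.

    `statements` is in execution order; only earlier statements are considered
    (an assumption of this sketch).
    """
    deps = set()
    for i, s2 in enumerate(statements):
        for s1 in statements[:i]:
            if s1.tx != s2.tx and statement_depends(s2, s1):
                deps.add((s2.tx, s1.tx))
    return deps

history = [
    Statement(tx=1, written_rows={"r1", "r2"}),
    Statement(tx=2, read_rows={"r2", "r3"}),   # reads r2, which T1 modified
    Statement(tx=3, read_rows={"r3"}),         # reads nothing T1 or T2 wrote
]
deps = transaction_dependencies(history)
print(deps)  # {(2, 1)}
```

Here only transaction 2 is reported as depending on transaction 1, because it is the only one whose read set overlaps the rows written earlier.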
Limitations of Transaction Dependency Model
This definition is prone to both false positives and false negatives. Example of a false positive dependency:
A1   A2   A3
100  3    5
200  2    6
300  1    7
After T1 runs, A2 is 5 in the first two rows:
A1   A2   A3
100  5    5
200  5    6
300  1    7
T1: SET A2=5 WHERE A1<250
T2: SELECT A3 WHERE A3>3
T2 reads rows that T1 modified, so it is recorded as depending on T1, even though T2 reads only column A3 and T1 changed only column A2.
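
The false positive in the example above can be reproduced with a small Python sketch. The dictionary layout and the variable names are hypothetical, and the cell-level comparison is shown only for contrast with the row-level one.

```python
# Table from the example, keyed by A1 (initial state).
rows = {
    100: {"A2": 3, "A3": 5},
    200: {"A2": 2, "A3": 6},
    300: {"A2": 1, "A3": 7},
}

# T1: SET A2=5 WHERE A1<250 -- record which (row, column) cells were written.
t1_written = set()
for a1, row in rows.items():
    if a1 < 250:
        row["A2"] = 5
        t1_written.add((a1, "A2"))

# T2: SELECT A3 WHERE A3>3 -- record which (row, column) cells were read.
t2_read = {(a1, "A3") for a1, row in rows.items() if row["A3"] > 3}

# Row-level tracking compares row keys only; cell-level also compares columns.
row_level = {a1 for a1, _ in t2_read} & {a1 for a1, _ in t1_written}
cell_level = t2_read & t1_written

print(sorted(row_level))   # [100, 200]
print(sorted(cell_level))  # []
```

At row granularity, T2 appears to depend on T1 through rows 100 and 200. At cell granularity the overlap is empty, which is why the dependency is a false positive.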
Limitations of Transaction Dependency Model
Another limitation is that, in general, it is impossible to determine all transaction dependencies by looking only at the traffic between a client and the DB server, because part of the logic may reside in the application itself.
How to Track Transaction Dependencies?
Such a tracking mechanism should be able to intercept both read and write actions performed in the database. Possible approaches are:
  Database triggers: miss SELECT statements;
  Database log analysis: read actions are not logged;
  Tracking proxy: intercepts SQL statements sent by the client to the server.
Transaction Dependency Tracking
RDB inserts a proxy JDBC driver between the client and the DB server. The proxy transparently intercepts all queries and results, and can be placed either on the client side or on the server side.
Transaction Dependency Tracking
The following changes are made to the database at the time of its creation:
  Table trans_dep(tr_id:INTEGER, dep_tr_ids:VARCHAR) stores the IDs of transactions that depend on transaction tr_id;
  Table annot(tr_id:INTEGER, descr:VARCHAR) stores an annotation for transaction tr_id;
  A new field tr_id is added to each table. It contains the ID of the last transaction that modified each row.
The proxy uses its own transaction IDs because there is no standard way to access the DBMS's internal transaction ID.
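
As a rough illustration of these schema changes, the sketch below applies them to an in-memory SQLite database using Python's standard `sqlite3` module. SQLite is a stand-in chosen for convenience (it is not one of the servers studied), the `accounts` table is a hypothetical application table, and adding the column with ALTER TABLE after the fact is a simplification of making the change at database creation time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical application table that exists before tracking is set up.
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")

# Bookkeeping tables with the columns listed on the slide.
conn.execute("CREATE TABLE trans_dep (tr_id INTEGER, dep_tr_ids VARCHAR)")
conn.execute("CREATE TABLE annot (tr_id INTEGER, descr VARCHAR)")

# Add a tr_id column to every table except the two bookkeeping tables.
user_tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' "
    "AND name NOT IN ('trans_dep', 'annot')")]
for table in user_tables:
    conn.execute(f'ALTER TABLE "{table}" ADD COLUMN tr_id INTEGER')

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(accounts)")]
print(columns)  # ['id', 'balance', 'tr_id']
```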
Transaction Dependency Tracking
The JDBC proxy needs to update field tr_id when data is modified and to select it when data is fetched. To do so, the proxy rewrites SQL statements coming from the client:
Original:  SELECT a FROM t WHERE c
Rewritten: SELECT a, t.tr_id FROM t WHERE c
Original:  UPDATE t SET a=v WHERE c
Rewritten: UPDATE t SET a=v, tr_id=curTrID WHERE c
Original:  INSERT INTO t(a) VALUES(v)
Rewritten: INSERT INTO t(a, tr_id) VALUES(v, curTrID)
Original:  COMMIT
Rewritten: INSERT INTO trans_dep(curTrID,…)
           COMMIT
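
The SELECT, UPDATE, and INSERT rewrites can be approximated with a short Python sketch. The function name `rewrite` and the regular expressions are assumptions made for illustration; a real proxy would need a proper SQL parser, and the sketch covers only simple single-table statements in the shapes shown above. The COMMIT rewrite is left out because the slides do not give the full trans_dep insert.

```python
import re

FLAGS = re.IGNORECASE | re.DOTALL

def rewrite(sql, cur_tr_id):
    """Add tr_id handling to a simple single-table SELECT, UPDATE, or INSERT.

    Statements in any other shape are returned unchanged (apart from
    stripping surrounding whitespace).
    """
    s = sql.strip()

    # SELECT a FROM t WHERE c  ->  SELECT a, t.tr_id FROM t WHERE c
    m = re.fullmatch(r"SELECT\s+(.+?)\s+FROM\s+(\w+)(\s+WHERE\s+.+)?", s, FLAGS)
    if m:
        cols, table, where = m.group(1), m.group(2), m.group(3) or ""
        return f"SELECT {cols}, {table}.tr_id FROM {table}{where}"

    # UPDATE t SET a=v WHERE c  ->  UPDATE t SET a=v, tr_id=<id> WHERE c
    m = re.fullmatch(r"UPDATE\s+(\w+)\s+SET\s+(.+?)(\s+WHERE\s+.+)?", s, FLAGS)
    if m:
        table, assigns, where = m.group(1), m.group(2), m.group(3) or ""
        return f"UPDATE {table} SET {assigns}, tr_id={cur_tr_id}{where}"

    # INSERT INTO t(a) VALUES(v)  ->  INSERT INTO t(a, tr_id) VALUES(v, <id>)
    m = re.fullmatch(
        r"INSERT\s+INTO\s+(\w+)\s*\((.+?)\)\s*VALUES\s*\((.+)\)", s, FLAGS)
    if m:
        table, cols, vals = m.groups()
        return f"INSERT INTO {table}({cols}, tr_id) VALUES({vals}, {cur_tr_id})"

    return s

print(rewrite("SELECT a FROM t WHERE c", 42))
print(rewrite("UPDATE t SET a=v WHERE c", 42))
print(rewrite("INSERT INTO t(a) VALUES(v)", 42))
```

With transaction ID 42, the three calls produce the rewritten forms listed on the slide, with 42 in place of curTrID.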
Summary of the Tracking Subsystem
Transaction dependency tracking is implemented as a JDBC proxy driver and is therefore highly portable across different DBMSs.
The proxy uses a lightweight approach aimed at tracking all read actions in a database.
Database Repair Process
The database is repaired by committing compensating transactions. When using RDB, the repair process consists of:
  Database log analysis (to reconstruct complete dependency information and generate compensating transactions);
  Dependency graph visualization;
  Repairing the database by committing compensating transactions.
Database Log Analysis
At repair time, RDB analyses the database transaction log to build a complete dependency graph and to generate compensating transactions. Different DBMSs provide different facilities for log analysis. We have studied three DB servers:
  PostgreSQL 7.2.2
  Oracle 9.2.0
  Sybase ASE 12.5
Database Log Analysis
Oracle: LogMiner translates the binary log into a database view that can be queried. The view contains the transaction ID, the original SQL statement, and a compensating SQL statement.
PostgreSQL: no end-user programs or APIs for log analysis. We have implemented a plugin that provides LogMiner-like functionality.
Sybase: can provide a dump of its binary transaction log. The format of this dump is partially described in Sybase manuals. We have developed a tool that parses this dump and generates compensating statements.
Dependency Graph Visualization
We used GraphViz (AT&T).
The application allows the user to select an initial set of malicious transactions and computes its transitive closure. The user can then refine the result to build the final set of transactions to be undone.
We are working on a more powerful tool that can discard certain types of dependencies.
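
The transitive-closure step can be sketched as a breadth-first traversal of the dependency graph. The function name `affected_transactions` and the dictionary representation are hypothetical; the mapping mirrors the trans_dep table in that each transaction ID maps to the IDs of the transactions that depend on it directly.

```python
from collections import deque

def affected_transactions(dependents, malicious):
    """Return the malicious transactions plus everything that depends on them.

    `dependents[t]` lists the transactions that depend directly on t.
    """
    seen = set(malicious)
    queue = deque(malicious)
    while queue:
        t = queue.popleft()
        for d in dependents.get(t, ()):
            if d not in seen:      # visit each transaction once, even in cycles
                seen.add(d)
                queue.append(d)
    return seen

# Hypothetical graph: 2 and 3 depend on 1, 4 depends on 2, 6 depends on 5.
dependents = {1: [2, 3], 2: [4], 5: [6]}
undo_set = affected_transactions(dependents, {1})
print(sorted(undo_set))  # [1, 2, 3, 4]
```

Transactions 5 and 6 stay out of the undo set because neither is reachable from the initially selected transaction 1. The user would then refine this set by hand, as described above.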
Performance Evaluation
We used the TPC-C benchmark to evaluate the run-time overhead of the JDBC proxy. The test database size was about 4 GB. We varied the following parameters:
  Transaction mix (read-intensive and read/write-intensive);
  Connection type (local or over a network);
  Total footprint size (effect of database cache).
Performance Evaluation
Overhead is between 6% and 13%.
Performance Evaluation
Our interpretation of these results: the overhead comes mostly from additional writes to the database and the transaction log.
Why the overhead for read-intensive transactions is lower than for read/write-intensive ones: when there are few dependencies, the number of additional writes is also small.
Why the overhead increases when the footprint decreases: fewer disk accesses are performed on behalf of the client, so the additional writes make up a larger share of the work.
Summary
We developed RDB, a portable framework that can render an off-the-shelf DBMS intrusion resilient without having access to its internals.
The prototype has some limitations:
  The tracking mechanism is row-based rather than column-based. This can lead to false dependencies.
  No support for stored procedures.
  Many DBMS vendors provide custom extensions to SQL. Currently, only part of SQL-92 is supported.