Friday, January 25, 2013

Experian for All or Encrypt Your Archives

I have just signed up with Experian to have my credit activity monitored. It was free; free that is to all who submitted a South Carolina state tax return in the last fifteen years. If you are attuned to corporate computer security, you probably heard about how a hacker made off with 3.8 million tax returns from the South Carolina Department of Revenue. Anyway, to make good on their after-the-hack responsibility, SCDOR offered free credit monitoring for a year from Experian, and credit resolution for life. Free to individuals, but we South Carolina taxpayers will still be footing the bill; and what is worse, we may be faced with identity theft and credit harm due to this data loss.

When I was at my dentist recently, he asked if the book I wrote dealt with this, “Expert Oracle and Java Security”. I answered “yes, and it’s too bad Governor Haley didn’t read it before this happened.” Actually the folks who work in SCDOR should have been up on security, not necessarily the Governor; but the Governor has been the spokeswoman for the official response. Unfortunately, she said these things: “The industry standard is most Social Security numbers are not encrypted. A lot of banks don’t encrypt… It’s very complicated. It’s very cumbersome. There’s a lot of numbers involved with it.” So I associate her with the problem.

The only difficulty with encryption is analysis. I believe that even on a system the size of the South Carolina Revenue system, the analysis should only take a few weeks. To you readers, I want to say that you also have no excuse to delay analyzing your system and then encrypt the sensitive data.

After discussing encryption in my book -- encryption over the wire, encryption on disk and hardening the encryption process -- I wrote a supplement that deals specifically with encryption of archived data. I proposed a scenario where a corporate policy required deletion of unneeded sensitive data from archive records, and encryption of any remaining sensitive data. I want to share a simplified version with you now. We will use the sample Sales History (SH) schema (the archive of the sample Order Entry (OE) schema) that comes with Oracle 11g. I will be skipping a lot of analysis and further technical information in this article (like single sign-on, 2-factor authentication, key storage and PL/SQL Wrapping) that you can find in my book. Hopefully, you have already mitigated SQL Injection and applied solid database security, also covered in the book.

To comply with our imaginary corporate data security policy, we will remove the marital status and income level data from our sales history customers archive table. We will retain the customer year of birth data, but we must encrypt it. That was a simple analysis result – your analysis may also be relatively simple. So the difficult part is done! Now the easy part – data expunging and encryption. You can apply these same steps to whatever archive data you are protecting.

First let’s get the ball rolling by enabling the sample schema and account. We also grant the SH user access to the DBMS_CRYPTO PL/SQL package that comes with Oracle 11g.

ALTER USER sh ACCOUNT UNLOCK;
ALTER USER sh IDENTIFIED BY password;
GRANT CREATE SESSION TO sh;
GRANT EXECUTE ON sys.dbms_crypto TO sh;

We will start out with a simple procedure which does Passphrase-Based Encryption (PBE); specifically, using the Advanced Encryption Standard (AES) algorithm. In my book, I also cover Public-Key Encryption (PKE) using the RSA algorithm. These are very easy to implement using Oracle 11g.

As SYS or a DBA user, create the following function in the SH sample schema. This function takes a string and encrypts it with a passphrase which is also passed in. Compare this to the F_MASK() function from “Expert Oracle and Java Security”. There is no rocket science here. The bulk of the code has to do with converting VARCHAR strings to RAW (byte arrays) and handling RAWs, which are needed for encryption. The final call to SYS.DBMS_CRYPTO.ENCRYPT() applies the AES 256-bit algorithm with the specified chaining and padding to the clear data (in RAW form). We use a passphrase and initialization vector (which I’ve assembled as appropriate-length RAWs). The return value is a RAW data value representing the encrypted data.

CREATE OR REPLACE FUNCTION sh.encrypt_data(
    clear_string VARCHAR2,
    pass_phrase  VARCHAR2 )
RETURN RAW
AS
    clear_raw RAW(32767);
    iv        RAW(16);
    pass_key  RAW(32);
    crypt_raw RAW(32767) := NULL;
BEGIN
    -- Cast to RAW
    clear_raw := SYS.UTL_RAW.CAST_TO_RAW(clear_string);
    -- Concatenate VARCHAR, cast to RAW, get substring (16)
    iv := SYS.UTL_RAW.SUBSTR(
        SYS.UTL_RAW.CAST_TO_RAW(pass_phrase||'SufficientLength'), 0, 16 );
    -- Concatenate VARCHAR, get hash (16), concatenate RAW (32)
    pass_key := SYS.UTL_RAW.CAST_TO_RAW(pass_phrase||'SufficientLength');
    pass_key := SYS.DBMS_CRYPTO.HASH( pass_key, SYS.DBMS_CRYPTO.HASH_MD5 );
    pass_key := SYS.UTL_RAW.CONCAT( pass_key, pass_key );
    crypt_raw := SYS.DBMS_CRYPTO.ENCRYPT( clear_raw,
        SYS.DBMS_CRYPTO.ENCRYPT_AES256 + SYS.DBMS_CRYPTO.CHAIN_CBC +
        SYS.DBMS_CRYPTO.PAD_PKCS5, pass_key, iv );
    RETURN crypt_raw;
END encrypt_data;

By the way, until the actual step of encryption (the call to SYS.DBMS_CRYPTO.ENCRYPT()), hardly any computing resources are required. Casting VARCHARs as RAWs and concatenating them or getting substrings are hardly calculations at all. There is one minor calculation of the MD5 hash, but even with that, the compute cost of this function is almost solely the cost of the final encryption effort – no extra overhead. Test the function by logging in as the SH user and select from the function. This query selects the encrypted form of “this” using the passphrase “that”.

SELECT sh.encrypt_data('this','that') FROM DUAL;

I’m not going to spend any more words here describing the encryption. In fact, you don’t really need to know how it works, if you know how to use it and what the results are. Contrary to what Governor Haley said, you can encrypt data without a lot of numbers and complexity.

We are going to clean up the archived customers table, sh.customers. Removing data is the easiest part, and we have already setup the encryption. What we still need is to store the encrypted data; and in encrypted form, the data won’t fit back into the original column. The customer year of birth column is defined as NUMBER(4), and the encrypted form will be RAW(2000). Let’s add a column for the encrypted data:

ALTER TABLE sh.customers ADD cust_yob_encr RAW(2000);

Now we’re set! There are two steps to the clean-up process: 1) encrypt records when they are inserted or updated, and 2) encrypt preexisting records in the archive. We can automatically handle step (1) with an “on insert” trigger. Execute the following as SYS or a DBA user.

CREATE OR REPLACE TRIGGER sh.t_customers_biur BEFORE INSERT OR UPDATE
    ON sh.customers FOR EACH ROW
BEGIN
    :new.cust_marital_status := '';
    :new.cust_income_level := '';
    :new.cust_yob_encr := :old.cust_yob_encr;
    IF( :new.cust_year_of_birth > 0 )
    THEN
        :new.cust_yob_encr := 
            sh.encrypt_data(TO_CHAR(:new.cust_year_of_birth), 
                TO_CHAR(:new.cust_id) || 'PPhrase' );
    END IF;
    :new.cust_year_of_birth := 0;
END;
/

ALTER TRIGGER sh.t_customers_biur ENABLE;

As you can see, whenever a row is inserted or updated, we simply set the marital status and income level to blank; then we don’t allow insert of the encrypted year of birth, so we overwrite what’s submitted with what already exists (:new.cust_yob_encr := :old.cust_yob_encr ); then, if a positive unencrypted year of birth is submitted, we encrypt it to store in our new cust_yob_encr column; and finally we set the unencrypted year of birth to zero. Those secure values will be stored in the database instead of the values that were submitted.

There are two things to note. One is the pass phrase, which we build by concatenating the alphabetic (TO_CHAR()) representation of the customer ID and the string “PPhrase”. The pass phrase can be whatever you want, but once you set it you must not change it; otherwise, you won’t be able to decrypt data that was encrypted with the previous pass phrase. We prepend the pass phrase with the customer ID so that no 2 people with the same birth year will have that same encrypted birth year. Otherwise, while we may not be able to decrypt them, we would still know when people share the same birth year. Why not use a different column to be the pass phrase, like the customer last name? Well, that is not a primary key column (in the oe.customers source table), is not unique, and may change. Again, if the pass phrase changes after we encrypt a value, we will not be able to decrypt it.

The second thing to note is that this trigger is only protected by belonging to the SH schema. It must not be visible or granted to other users or roles. In this way, only the SH user can see the pass phrase, and can encrypt and decrypt the customers data. Yes, there is an analogous decrypt function, shown here:

CREATE OR REPLACE FUNCTION sh.decrypt_data(
    crypt_raw   RAW,
    pass_phrase VARCHAR2 )
RETURN VARCHAR2
AS
    iv           RAW(16);
    pass_key     RAW(32);
    clear_raw    RAW(32767);
    clear_string VARCHAR2(2000) := NULL;
BEGIN
    iv := SYS.UTL_RAW.SUBSTR(
        SYS.UTL_RAW.CAST_TO_RAW(pass_phrase||'SufficientLength'), 0, 16 );
    pass_key := SYS.UTL_RAW.CAST_TO_RAW(pass_phrase||'SufficientLength');
    pass_key := SYS.DBMS_CRYPTO.HASH( pass_key, SYS.DBMS_CRYPTO.HASH_MD5 );
    pass_key := SYS.UTL_RAW.CONCAT( pass_key, pass_key );
    clear_raw := SYS.DBMS_CRYPTO.DECRYPT( crypt_raw,
        SYS.DBMS_CRYPTO.ENCRYPT_AES256 + SYS.DBMS_CRYPTO.CHAIN_CBC +
        SYS.DBMS_CRYPTO.PAD_PKCS5, pass_key, iv );
    clear_string := SYS.UTL_RAW.CAST_TO_VARCHAR2( clear_raw );
    RETURN clear_string;
END decrypt_data;

Nothing new here, except that we call SYS.DBMS_CRYPTO.DECRYPT() instead of ENCRYPT(). You can test both encryption and decryption as the SH user with the following select query. We use the same passphrase (“that”) for both encryption and decryption, so the result should be the same clear text that we started with (“this”).

SELECT sh.decrypt_data(sh.encrypt_data('this','that'),'that') FROM DUAL;

Alright, now that we’ve got everything defined, let’s go ahead and expunge and encrypt the existing sensitive data in our sh.customers archive table. We can do that by applying the trigger to each row by causing an update to each row. Execute the following as the SH user.

UPDATE sh.customers target 
SET target.cust_year_of_birth = (
    SELECT source.cust_year_of_birth
    FROM sh.customers source
    WHERE source.cust_id = target.cust_id    
)
WHERE target.cust_year_of_birth > 0;

COMMIT;

You should only execute this once. It may take a while (about eight seconds on the 55,500 rows in the sample data on my workstation). Also be aware that this, and every transaction on the sh.customers table, places sensitive data into the redo logs, which must be cleaned up separately.

Now you can select records from the sh.customers table and verify that the marital status and income level columns are blank, the year of birth column is zero, and our new cust_yob_encr column contains an encrypted value.

SELECT * FROM sh.customers;

Presumably, since we have retained the customer year of birth, there must be some instance or circumstance in which we need to view it. We can’t simply select it any longer from the sh.customers table, and only the SH user can decrypt the data, so how do we make this data accessible as needed? Well, first we need to do it securely! Whatever method we provide has to be restricted.

I recommend creating a view of the data that includes the decrypted values. Select on this view should ONLY be granted to those who need it, only when they need it, and only for as long as they need it. Access should be revalidated on a periodic basis. Here is a view that you can define to provide future access to the decrypted data:

CREATE OR REPLACE FORCE VIEW sh.v_customers_sensitive
AS SELECT
    cust_id, cust_first_name, cust_last_name, cust_gender,
    TO_NUMBER( sh.decrypt_data( cust_yob_encr, TO_CHAR(cust_id)|| 'PPhrase' )
        ) cust_year_of_birth,
    cust_marital_status, cust_street_address, cust_postal_code, cust_city,
    cust_city_id, cust_state_province, cust_state_province_id, country_id,
    cust_main_phone_number, cust_income_level, cust_credit_limit, cust_email,
    cust_total, cust_total_id, cust_src_id, cust_eff_from, cust_eff_to, cust_valid
FROM sh.customers;

You can select from this view to see the decrypted birth year.

SELECT * FROM sh.v_customers_sensitive;

Note that the pass phrase, “TO_CHAR(cust_id)|| 'PPhrase'” that we pass to decrypt_data() in this view is the same as what we passed to encrypt_data() in the trigger. One thing that I would like to point out with this approach is that the passphrase is kept separate from the data. Presumably, a backup of the data or even a hacker dump of the table, will not carry with it the trigger and view definition. In that case, the stolen data is encrypted and cannot be read – the thief does not have the pass phrase.

So we learned a simple method to encrypt data using inherent capabilities of Oracle 11g. We learned how to remove sensitive data from our archives and how to encrypt any sensitive data that we retain. We also learned how to view the decrypted sensitive data securely, as needed. If we have also learned about the need for data protection from the SCDOR hack, and done the analysis, and taken corrective actions, then we may have actually learned our lesson.