Thoughts on an audit/compliance tool for Apache Sling or CQ5

In many cases we’d like to audit a system such as a content management server. Storing the audit data in the system itself has its advantages, but where the data needs to be kept for a long time and remain searchable or retrievable by a forensics team far into the future, a more decoupled approach might be appropriate. So here’s what I’ve been thinking:

1. Use something like Apache ServiceMix (https://servicemix.apache.org/index.html) or some other ESB to listen for all change events in the server, using either JCR Observation listeners or remote event handlers (see http://r-osgi.sourceforge.net/userguide.html)
2. Capture all pertinent information about the event in a rendering-agnostic format (say JSON) and store it in a Write Once Read Many (WORM) device or database
3. Index the data so it is searchable by common criteria (full-text search, dates, users, etc.)
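
To make steps 2 and 3 a bit more concrete, here is a minimal Python sketch of the storage side only. Everything in it is an assumption for illustration: the field names, the `AppendOnlyAuditLog` class, and the idea of chaining each record to the previous one with a SHA-256 hash so that tampering can be detected. A real deployment would write to actual WORM storage and a real search index rather than in-memory structures, and the event type strings merely mirror the JCR observation constant names.

```python
import hashlib
import json
from collections import defaultdict
from datetime import datetime, timezone


def make_audit_record(path, event_type, user, timestamp, prev_hash):
    """Build a rendering-agnostic audit record (hypothetical field names)."""
    record = {
        "path": path,
        "type": event_type,
        "user": user,
        "timestamp": timestamp.isoformat(),
        "prev_hash": prev_hash,
    }
    # Hash the canonical JSON form so the record can be verified later.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["hash"] = hashlib.sha256(payload).hexdigest()
    return record


class AppendOnlyAuditLog:
    """In-memory stand-in for a WORM store plus a very simple index."""

    GENESIS = "0" * 64  # prev_hash used for the very first record

    def __init__(self):
        self._lines = []                   # one serialized JSON record per event
        self._by_user = defaultdict(list)  # user -> positions in _lines
        self._by_date = defaultdict(list)  # YYYY-MM-DD -> positions in _lines

    def append(self, path, event_type, user, timestamp):
        prev_hash = json.loads(self._lines[-1])["hash"] if self._lines else self.GENESIS
        record = make_audit_record(path, event_type, user, timestamp, prev_hash)
        position = len(self._lines)
        self._lines.append(json.dumps(record, sort_keys=True))
        self._by_user[user].append(position)
        self._by_date[timestamp.date().isoformat()].append(position)
        return record

    def find_by_user(self, user):
        return [json.loads(self._lines[i]) for i in self._by_user.get(user, [])]

    def find_by_date(self, iso_date):
        return [json.loads(self._lines[i]) for i in self._by_date.get(iso_date, [])]

    def verify(self):
        """Recompute every hash and check the chain; False means tampering."""
        prev = self.GENESIS
        for line in self._lines:
            record = json.loads(line)
            claimed = record.pop("hash")
            payload = json.dumps(record, sort_keys=True).encode("utf-8")
            if record["prev_hash"] != prev or hashlib.sha256(payload).hexdigest() != claimed:
                return False
            prev = claimed
        return True


if __name__ == "__main__":
    log = AppendOnlyAuditLog()
    log.append("/content/site/en", "NODE_ADDED", "alice",
               datetime(2015, 3, 1, 12, 0, tzinfo=timezone.utc))
    print(log.find_by_user("alice"))
    print("chain intact:", log.verify())
```

Because each record is plain JSON, it stays readable without any Sling or CQ5 code, which is the point of the exercise.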

Additions to this could include crawling the website (if it is indeed a website) and storing an offline mirror or snapshot of it somewhere. This could take many forms, but a simple case would be a zip of the mirrored site, which could be retrieved on demand by date and unzipped into an embedded web server for each viewing. Alternatively, a crawler could render the entire site to PDF or PDF/A and store the result on the WORM device.
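
As a rough illustration of the zip-per-date idea, the Python sketch below stores a mirrored directory as a dated archive, finds the newest snapshot taken on or before a given date, and unpacks it for viewing. The `site-YYYY-MM-DD.zip` naming scheme and the function names are my own assumptions, the crawl that produces the mirror is out of scope here, and a plain directory stands in for the WORM device.

```python
import zipfile
from datetime import date
from pathlib import Path

PREFIX = "site-"  # hypothetical naming scheme: site-YYYY-MM-DD.zip


def store_snapshot(mirror_dir, archive_dir, snapshot_date):
    """Zip the mirrored site into archive_dir; refuse to overwrite."""
    mirror_dir, archive_dir = Path(mirror_dir), Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / f"{PREFIX}{snapshot_date.isoformat()}.zip"
    if target.exists():
        raise FileExistsError(f"{target} already exists; snapshots are write-once")
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for file in sorted(mirror_dir.rglob("*")):
            if file.is_file():
                # Store paths relative to the site root.
                archive.write(file, file.relative_to(mirror_dir).as_posix())
    return target


def find_snapshot(archive_dir, wanted):
    """Return the newest snapshot taken on or before `wanted`, or None."""
    candidates = []
    for archive in Path(archive_dir).glob(f"{PREFIX}*.zip"):
        taken = date.fromisoformat(archive.stem[len(PREFIX):])
        if taken <= wanted:
            candidates.append((taken, archive))
    return max(candidates)[1] if candidates else None


def restore_snapshot(archive, serve_dir):
    """Unpack a snapshot into a directory a web server can serve from."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(serve_dir)
    return Path(serve_dir)
```

The restored directory could then be served by any static web server, for example Python's built-in `python -m http.server --directory <serve_dir>`.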

The idea behind this is not to depend on the codebase that renders the content, because that code will definitely change over time. Even the system itself could be replaced. How many times have we decommissioned vendor products and lost the audit data?

I know we say we want to take nightly or weekly tape backups of CQ5 or Sling instances and keep them for years and years, but 9 years after the event being investigated, the developers or system admins who knew that specific vendor product might be long gone from the company.

Anybody else have any ideas on this? Would love to hear of real world implementations of this type of thing.

– Sarwar Bhuiyan

2 thoughts on “Thoughts on an audit/compliance tool for Apache Sling or CQ5”

  1. Gadi Eichhorn says:

    Good (old) post Sarwar,
    I just found it by mistake looking for OSGi related posts.

    The problem can be solved by externalising the data from the application and storing it in a platform built for the search and the long storage time requirements.

    Several benefits are:
    * separation of concerns: the main app doesn’t need to know where and how the data is audited
    * technology: the choice of technology is not limited by the app under audit
    * retention: keep the data as long as you need to, build a BigData solution for your spec

    I would love to hear how you solved this challenge.

  2. Sarwar Bhuiyan says:

    Hello Gadi,
    So, interestingly enough, I work for Elastic now (the people behind Elasticsearch). If I were to do this now, I’d write everything in parallel to a Kafka cluster (via event handlers in Sling/AEM) and have different consumers read from it and store to Elasticsearch/Hadoop/S3 for the various use cases. That way, if the events need to be read and processed for any other purpose, it can still be done even when the rendering code in Sling or CQ5 no longer exists. As long as we have an immutable data model with time-based indices, that’s the way to do it.
