In many cases we’d like to audit a system like a content management server. Storing the audit data in the system itself has its advantages, but when data needs to be captured for the long term and remain searchable and retrievable by a forensics team far into the future, a more decoupled approach might be appropriate. So here’s what I’ve been thinking:
1. Use something like Apache ServiceMix (https://servicemix.apache.org/index.html) or another ESB to listen for all change events in the server, using either JCR observation listeners or remote event handlers (see http://r-osgi.sourceforge.net/userguide.html)
2. Capture all pertinent information about the event in a rendering-agnostic format (say, JSON) and store it in a Write Once Read Many (WORM) device or database
3. Index the data so it is searchable by common criteria (full-text search, dates, users, etc.)
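Steps 1 and 2 might look something like the following sketch. The `AuditRecord` class is hypothetical; its fields mirror what a JCR observation event typically exposes (path, user ID, timestamp, event type), and in a real listener they would be populated from `javax.jcr.observation.Event`.

```java
// Hypothetical sketch: a rendering-agnostic audit record built from the
// fields a JCR observation event typically exposes. In a real listener
// these values would come from javax.jcr.observation.Event.
public class AuditRecord {
    private final String path;
    private final String userId;
    private final long timestampMillis;
    private final String eventType; // e.g. "NODE_ADDED", "PROPERTY_CHANGED"

    public AuditRecord(String path, String userId, long timestampMillis, String eventType) {
        this.path = path;
        this.userId = userId;
        this.timestampMillis = timestampMillis;
        this.eventType = eventType;
    }

    // Serialize to a minimal JSON object. A real implementation would use a
    // proper JSON library and escape the values.
    public String toJson() {
        return String.format(
            "{\"path\":\"%s\",\"userId\":\"%s\",\"timestamp\":%d,\"eventType\":\"%s\"}",
            path, userId, timestampMillis, eventType);
    }
}
```

The resulting JSON strings are what would be appended to the WORM store, one record per event.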
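For step 3 you would realistically point a search engine such as Lucene or Solr at the stored records; purely to illustrate the idea, here is a toy in-memory inverted index (all names are my own, not from any library):

```java
import java.util.*;

// Toy inverted index illustrating step 3: map each term to the ids of the
// audit records containing it. In practice, use Lucene/Solr over the JSON.
public class AuditIndex {
    private final Map<String, Set<String>> termToRecordIds = new HashMap<>();

    // Tokenize the record's free text and index each term under its id.
    public void index(String recordId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (term.isEmpty()) continue;
            termToRecordIds.computeIfAbsent(term, t -> new TreeSet<>()).add(recordId);
        }
    }

    // Return the ids of all records containing the given term.
    public Set<String> search(String term) {
        return termToRecordIds.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }
}
```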
Additions to this could include crawling the website (if it is indeed a website) and storing an offline mirror or snapshot of it somewhere. This could take many forms, but a simple approach would be a zip of the mirrored site, retrieved on demand by date and unzipped into an embedded web server for viewing. Alternatively, a crawler could render the entire site to PDF or PDF/A and store that on the WORM device.
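The zip-snapshot idea could be sketched like this (the `SiteSnapshot` class and its layout are assumptions of mine; the mirror directory itself would come from a crawler such as wget):

```java
import java.io.*;
import java.nio.file.*;
import java.time.LocalDate;
import java.util.zip.*;

// Hypothetical sketch: zip a crawled site mirror into a date-stamped archive
// so a snapshot can later be retrieved by date and unzipped into an embedded
// web server for viewing.
public class SiteSnapshot {
    // Walks mirrorDir and writes every file into snapshot-<date>.zip under outDir.
    public static Path archive(Path mirrorDir, Path outDir) throws IOException {
        Path zipPath = outDir.resolve("snapshot-" + LocalDate.now() + ".zip");
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zipPath));
             var paths = Files.walk(mirrorDir)) {
            for (Path p : (Iterable<Path>) paths.filter(Files::isRegularFile)::iterator) {
                // Entry names are relative to the mirror root, preserving the site layout.
                zos.putNextEntry(new ZipEntry(mirrorDir.relativize(p).toString()));
                Files.copy(p, zos);
                zos.closeEntry();
            }
        }
        return zipPath;
    }
}
```

The date-stamped filename is what makes retrieval by date trivial later on.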
The idea behind this is not to be dependent on the codebase that renders the content, because that code will certainly change over time. Moreover, the system itself could be replaced entirely. How many times have we decommissioned vendor products and lost the audit data?
I know we say we want to keep nightly or weekly tape backups of CQ5 or Sling instances for years and years, but nine years after the event being investigated, the developers or system administrators who knew that particular vendor product might be long gone from the company.
Anybody else have any ideas on this? I’d love to hear about real-world implementations of this kind of thing.
– Sarwar Bhuiyan