In our previous blog post, I presented some detail on the process we went through to get OMERO up and running. Of course, that is only a part of the job: after it's up and running, we need to make sure it's working properly, that it can "talk" to existing data, that it's being updated and that people are actually using it. This is still very much a work in progress, but we are slowly getting there. In this post, I will go through some points that are important for a live OMERO install.
- Integration with existing data
Most of the PIs around here are (at least to some degree) committed to adopting OMERO moving forward. Storage for our install is using our brand new, petabyte-scale server. The future is more or less taken care of; but how about the past? People have data in other servers. They have their ways of doing things, and changing those means a bit of extra work. Inertia is a hell of a thing. The big question becomes: how do we deal with that and get people on board?
There are a couple of avenues we have been exploring. The first one is using the in-place import feature of OMERO. This allows for existing files to be imported into OMERO without needing to physically upload them to our storage server. In practical terms, it means we can import the data from other storage servers without creating duplicates. People can see and use their older data in OMERO without needing to do any transferring.
There are downsides, of course. The main one is that it either requires "freezing" the imported data in their actual locations (making the files read-only, for example) or regularly checking for changes in the file locations and then deleting and reimporting the changed data. The latter option is not great (deleting and reimporting means we would get rid of annotations, attachments and so on), and the former option limits the data we can actually import to "dead" datasets, i.e. datasets where data will not change or be moved.
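To make the two options concrete, here is a minimal sketch of what "freezing" and "checking for changes" could look like on the file system side. It is not part of OMERO: the directory, file names and manifest are hypothetical, and a throwaway temp directory stands in for real storage.

```shell
set -euo pipefail

# Hypothetical dataset: a temp dir stands in for a folder on the old storage.
DATASET_DIR="$(mktemp -d)"
MANIFEST="$(mktemp)"
printf 'pixels-1' > "$DATASET_DIR/image1.tif"
printf 'pixels-2' > "$DATASET_DIR/image2.tif"

# At import time, record a checksum for every file in the dataset.
(cd "$DATASET_DIR" && find . -type f -print0 | sort -z | xargs -0 sha256sum) > "$MANIFEST"

# Option 1: "freeze" the dataset by removing write permission for everyone.
chmod -R a-w "$DATASET_DIR"

# Option 2: later on, verify that nothing has changed since the import.
if (cd "$DATASET_DIR" && sha256sum --check --quiet "$MANIFEST"); then
    FROZEN_STATUS="unchanged"
else
    FROZEN_STATUS="changed"
fi
echo "Dataset is $FROZEN_STATUS since import"
```

A check like the second one only tells you that something changed; deciding what to do about the annotations attached to the affected images is still the hard part.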
Currently, we are not doing any of that. It turns out that our storage server does not want to play nice with OMERO. In-place importing relies on symlinks to point to the existing data, and somewhere in that process we are hitting a snag. We're still working together with IT services to get that one sorted. In the future, our plan is to do such imports on a user-by-user basis, explaining clearly that the imported data will become read-only and that they won't be able to move it any longer.
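When debugging this kind of snag, it can help to test the symlink behaviour on its own, away from OMERO. The sketch below is an assumption about how one might do that: two temp directories stand in for the old storage and the OMERO-managed repository, and the checks should be run as the account that runs the server. As far as I understand the OMERO documentation, the symlink flavour of in-place import is requested with the importer's --transfer=ln_s option.

```shell
set -euo pipefail

# Two throwaway directories stand in for the old storage server and the
# OMERO-managed repository (both paths are hypothetical).
OLD_STORAGE="$(mktemp -d)"
MANAGED_REPO="$(mktemp -d)"
printf 'raw pixel data' > "$OLD_STORAGE/stack.tif"

# An in-place import leaves the file where it is and only creates a link.
ln -s "$OLD_STORAGE/stack.tif" "$MANAGED_REPO/stack.tif"

# 1. Does the link resolve to the original file?
LINK_TARGET="$(readlink -f "$MANAGED_REPO/stack.tif")"

# 2. Is the target readable through the link?
if [ -r "$MANAGED_REPO/stack.tif" ]; then
    LINK_STATUS="readable"
else
    LINK_STATUS="broken or unreadable"
fi
echo "$MANAGED_REPO/stack.tif -> $LINK_TARGET ($LINK_STATUS)"
```

If the same two checks fail when the link crosses from one real storage mount to another, the problem is in the storage setup rather than in OMERO.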
- Maintaining an OMERO server
This will be a very short section. Other than troubleshooting the occasional issue, maintenance consists basically of updating OMERO itself and the systems on the server machines. The latter is managed by our good friends in IT Services; the former is a fairly painless process on our end. The instructions on the OMERO website are very good, if very comprehensive. For minor updates, it consists mostly of replacing the binaries and moving over custom scripts.
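As a rough illustration of "replacing the binaries and moving over custom scripts", the sketch below swaps a symlink from an old release folder to a new one. Everything here is an assumption for demonstration purposes: the release folder names are made up, the "custom" subfolder under lib/scripts is hypothetical, and a temp directory stands in for the real install location. It is not a substitute for the official upgrade instructions.

```shell
set -euo pipefail

# A temp dir stands in for the home directory of the account running OMERO.
BASE="$(mktemp -d)"
# Hypothetical folder names, as if two releases had been unzipped side by side.
OLD_RELEASE="$BASE/OMERO.server-old"
NEW_RELEASE="$BASE/OMERO.server-new"
mkdir -p "$OLD_RELEASE/lib/scripts/custom" "$NEW_RELEASE/lib/scripts"
echo "# custom analysis script" > "$OLD_RELEASE/lib/scripts/custom/my_script.py"
ln -s "$OLD_RELEASE" "$BASE/OMERO.server"

# 1. (Stop the server here, e.g. with systemctl, before touching anything.)
# 2. Carry the custom scripts over to the new release.
cp -r "$OLD_RELEASE/lib/scripts/custom" "$NEW_RELEASE/lib/scripts/"
# 3. Repoint the symlink; -n replaces the link itself instead of following it.
ln -sfn "$NEW_RELEASE" "$BASE/OMERO.server"
# 4. (Start the server again.)

echo "OMERO.server now points to $(readlink "$BASE/OMERO.server")"
```

Because services and paths refer to the symlink rather than to a versioned folder, rolling back is just a matter of pointing the link at the old release again.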
- There are still issues...
We are still sorting things out. Our main issue for the moment is adoption: our shiny server is not being very heavily used. As I previously mentioned, there are all sorts of explanations for that: inertia is probably the big one. To deal with that, we're now testing a solution to import data from microscope computers automatically into OMERO (blog post on that soon!). Not only will that mean people don't need to book time on the microscopes just to transfer their data to a storage server, but it will also mean that, since their data is already in OMERO, the entry barrier is much lower.
Another issue is that some of our older microscope computers just cannot do OMERO at all. The OMERO client requires a Java version that just cannot be installed on those computers (I've recently learned what happens when you try to force a Linux system to use a newer glibc version than it's supposed to...). This will most certainly be mitigated by auto importing data into OMERO, so it's not a huge deal.
In general, running an OMERO server has been a pretty smooth process. Other than a few snags (certain file formats tend to crash the server...), our main obstacle is just getting people to use it. It will take time and it might take some work, but I'm convinced we'll get there!
Two of my main tasks as soon as I started this position in September 2017 (and, effectively, CAMDU started existing) were to establish an Electronic Lab Notebook system that could be used by multiple groups and to finally implement OMERO for the Division of Biomedical Sciences. Task 1 was relatively straightforward: two groups were already using Wordpress for their ELNs and there was plenty of expertise around. Task 2 was a completely different beast.
People spoke of OMERO in hushed tones. Multiple people mentioned trying to run an OMERO install and failing at it. No one knew much about it. We didn't have any infrastructure to run it at scale and offer it to the whole division.
We eventually got there. It took a long time, a lot of effort and some significant support from IT Services, but we have a working OMERO installation available to everyone who decides to use it. Even if it took a lot of work, each individual step was not daunting at all! In this blog post, I will try to walk through the whole process, step by step, detailing and explaining our decisions.
- Provisioning Infrastructure
As mentioned, we did not have any kind of infrastructure that would scale for the possible number of users we might have in the future. Our options were, then, either buying a new machine specifically for the task or talking to IT Services to see what they could offer. We are incredibly lucky to have a great Linux hosting team that provides free CPU, memory and limited storage. It's all based on virtual machines, which is great news when it comes to resiliency (multiple data centres around campus and so on). My experience dealing with them has been fantastic.
After a couple of meetings and a couple of weeks, we were handed four shiny new VMs running CentOS 7, already pre-installed with all the software prerequisites. We decided to separate the web-facing server from the backend, dedicating one machine to each; that would make it easier to adjust the necessary resources for each portion of the task at hand. We also established a test/training environment and a production environment. So two environments, each with two servers: that's how we used the 4 VMs we received.
Again, I cannot overstate how much the support from IT Services made our lives easier: not only are we using their equipment, but their support for any issues that have arisen over time has been incredible.
The observant reader will have noticed that I mentioned "limited storage" when talking about the resources we were able to obtain. That was absolutely fine for us: it turns out that the storage space for the actual data to be saved in our OMERO install was never an issue. A recent grant from the Wellcome Trust meant that we had just acquired a new petabyte storage array!
- The installation Process
Given that we already had servers with all prerequisites installed and ready to go, this went pretty smoothly. If you are reusing an old machine, I'd definitely recommend wiping it clean and starting from scratch. We had to slightly deviate from the official OMERO installation instructions since we were using two different machines for server and web client, but otherwise it was very much by the book. A short description follows:
1) For the OMERO server:
- Download OMERO server (just a simple wget) and decompress it (just a simple unzip)
- Create a symlink to the unzipped folder named "OMERO.server". This makes the server folder path nicer to look at and will help you a lot in the future when updating OMERO versions.
- Basic server configuration: set data directory, database name, user and password, generate basic database and start using it. This followed the install instructions almost exactly, so I won't bother you with the details.
That's it! Start the omero-server service (we're using systemctl) and the server should be running and accepting connections on port 4063.
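A quick way to confirm that something is actually listening is to try opening a TCP connection to that port. The sketch below is one possible check, not something from the OMERO tooling: it relies on bash's /dev/tcp pseudo-device and the coreutils timeout command, and the hostname default is an assumption.

```shell
# Return success if a TCP connection to host:port can be opened.
# Uses bash's /dev/tcp pseudo-device, so it needs bash rather than plain sh.
port_open() {
    local host="$1" port="$2"
    timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# The hostname is an assumption; 4063 is the port mentioned above.
OMERO_HOST="${OMERO_HOST:-localhost}"
OMERO_PORT="${OMERO_PORT:-4063}"

if port_open "$OMERO_HOST" "$OMERO_PORT"; then
    echo "OMERO server is accepting connections on $OMERO_HOST:$OMERO_PORT"
else
    echo "Nothing is reachable on $OMERO_HOST:$OMERO_PORT (not running, or a firewall is in the way)"
fi
```

Running the same check from the web client machine is a cheap way to find out whether the two VMs can see each other before configuring anything else.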
2) For the OMERO web client:
- Download OMERO server (just a simple wget) and decompress it (just a simple unzip)
- Create a symlink to the unzipped folder named "OMERO.py". This makes the folder path nicer to look at and will help you a lot in the future when updating OMERO versions. (We're using a different name for this symlink to make sure we're not mixing server and web client up.)
- Create a Python virtual environment and install the web server requirements into it: you can use something like
$ virtualenv /home/user/omerowebvenv --system-site-packages
$ /home/user/omerowebvenv/bin/pip install --upgrade -r /home/user/OMERO.py/share/web/requirements-py27.txt
- Activate virtualenv:
$ source /home/user/omerowebvenv/bin/activate
- Finally, there are just a few configuration steps for the server:
$ OMERO.py/bin/omero config set omero.web.application_server wsgi-tcp
$ OMERO.py/bin/omero web config nginx --http "443" > OMERO.server/nginx.conf.tmp
$ OMERO.py/bin/omero config set omero.web.server_list '[[""]]'
- Now you can start the omero-web service:
$ sudo systemctl start omero-web
- Extra tools
A basic OMERO install has lots of functionality right out of the box. However, there are plenty of interesting extensions and tools out there to complement and enhance what it can do. We installed some of them on our servers: they tend to be very straightforward to deploy and at least one of them is almost essential, in my view.
OMERO.figure is often described as the "killer app" when it comes to getting people to use OMERO. It is a fantastic web-based tool with an Illustrator-like interface for creating (you guessed it) figures. What makes it really shine, especially when compared to dedicated software like Adobe Illustrator, is the fact that it is, at all times, using raw pixel data in your figures. That means that changing LUTs, turning channels on and off, adjusting brightness and contrast and adding labels based on metadata are straightforward operations. It's a bit hard to convey exactly how incredible OMERO.figure is without showing it, so the demo recorded by the OME team is well worth watching.
Next up, a gentle bump on viewer quality: OMERO.iviewer is not radically different from the default image visualisation built into OMERO, but it has a few nice extra features that make it worth installing: multiple side-by-side viewers, ROI support, rendering settings and so on. Installation is incredibly straightforward and it has given me zero headaches.
Of course, that's great if you want to view a plane at a time, but if you want 3D rendering you're out of luck. That is, unless you also install FPBioImage. It is a Unity-based tool that renders Z-stacks as volumes and allows the user to navigate the space around them using keyboard and mouse. It works surprisingly well and it's pretty robust.
- LDAP integration
So this is where things get complicated, at least to my non-LDAP-knowing mind. Using University-level sign-in information sounded like a great idea, so we decided to go for it.
First things first: we had to talk to IT Services and ask for an LDAP service account, since they do not allow any anonymous queries on that system (with good reason!). They gave me credentials for a service account and I had absolutely no idea what to do with them. The LDAP documentation on the OMERO website is fairly comprehensive, or at least I imagine it is if you know what you're doing.
So I did what I always do if I don't know something: I started poking and prodding at the system, trying to figure out how things work. My best friend in this process was ldapsearch, which is very useful for querying LDAP systems and seeing the results. Enough time staring at incomprehensible strings and eventually I figured out how to configure our server the right way.
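For anyone in the same position, a typical ldapsearch query looks something like the sketch below. Every value in it is a placeholder (server URL, service account DN, search base, username and attribute names), so it illustrates the shape of the command rather than anything specific to our directory. The command is only printed unless RUN_LDAPSEARCH=1 is set, which makes it safe to try on a machine without LDAP access.

```shell
# Every value below is a placeholder: substitute your own server, service
# account DN, search base and a username you know exists.
LDAP_URL="ldaps://ldap.example.org:636"
BIND_DN="cn=omero-service,ou=services,dc=example,dc=org"
SEARCH_BASE="dc=example,dc=org"
FILTER="(uid=jdoe)"

# -x simple bind, -LLL plain LDIF output, -H server URL, -D bind DN,
# -W prompt for the password, -b search base.
# The trailing names restrict which attributes are returned.
LDAPSEARCH_CMD=(ldapsearch -x -LLL -H "$LDAP_URL" -D "$BIND_DN" -W
                -b "$SEARCH_BASE" "$FILTER" cn mail memberOf)

# Print the command for review; run it only when explicitly asked to.
printf '%q ' "${LDAPSEARCH_CMD[@]}"; echo
if [ "${RUN_LDAPSEARCH:-0}" = "1" ]; then
    "${LDAPSEARCH_CMD[@]}"
fi
```

The entries that come back show which attribute holds the username, the name and the group memberships, which is the information the OMERO LDAP settings ask for.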
It was a long and complicated process: first, we set up a truststore and a keystore for Java to use. Some sample commands that might help:
1) Creating truststore from certificate:
$ cat QuoVadisRootCA2.cer | keytool -import -alias ldap -storepass <PASSWORDHERE> -keystore /home/user/.keystore -noprompt
2) Setting up OMERO to look for the truststore in the right place:
$ OMERO.server/bin/omero config set omero.security.trustStore /home/user/.keystore
$ OMERO.server/bin/omero config set omero.security.trustStorePassword <PASSWORDHERE>
3) Setting up keystore based on certificate:
$ keytool -importcert -file QuoVadisRootCA2.cer -keystore /home/user/.mystore
4) Pointing OMERO to the keystore:
$ OMERO.server/bin/omero config set omero.security.keyStore /home/user/.mystore
$ OMERO.server/bin/omero config set omero.security.keyStorePassword <PASSWORDHERE>
After all that, LDAP configuration was relatively straightforward following the documentation. One last tricky bit: by default, OMERO won't follow LDAP referrals, which might make it not work depending on the way your LDAP system is set up. We needed to run the following command to get it to work:
$ OMERO.server/bin/omero config set omero.ldap.referral 'follow'
- What next?
Well, that was all it took to get OMERO up and running. In a follow-up post, we will talk about integration, maintenance and the issues we have encountered once everything was operational.
This is the first post on the CAMDU blog! Our aim is to give everyone a small glimpse into our day-to-day work, explain how we did some of the things we did and (hopefully) help people out there who have similar issues.
As this is the first post, it might be worth explaining who we are: CAMDU (Computing and Advanced Microscopy Development Unit) is a small team of dedicated researchers at Warwick Medical School who support microscopy-based research, from acquisition to image analysis and storage. We're home to multiple commercial light microscopes and custom-built systems, alongside Wellcome-funded lattice light sheet microscopy and a visitor programme (coming soon); computational workstations, software development and a petabyte data storage array are also in place.
For our first entry, we have decided to talk about our solution for Electronic Lab Notebooks for multiple labs. It is Wordpress-based: the reasons for picking Wordpress have been detailed by Steve Royle in this blog post. In summary: it's easy to use, free, ubiquitous and takes care of issues like backups and versioning in a nice, transparent way. Also, Steve had already been using it as a solution for his own lab for more than 6 months when we started implementing our solution, so we already knew it worked!
- Our infrastructure
We happened to have a pretty decent server already running VMware ESXi in the building - it's inside our local network, which would make sense from a lab notebook point of view (you're not supposed to take them home anyway). It was super easy to spin up an Ubuntu 16.04.3 virtual machine and start playing with it. Nick, our resident expert in all things everything, handed me that VM and gave me a hand setting up a local IP and local domain for that machine.
Having the whole install encased in a virtual machine was a great idea for ease of transfer and backup; the physical server running the ELNs is (as I understand) quite old and might just give up the ghost at any time. Our IT Services Linux hosting team also works based on VMs, so our contingency plan has always been telling them "here's the virtual machine backup, can you get us some resources to run this?". Our server is still holding up, though!
So why not go straight to an IT Services-hosted virtual machine from the start? Well, we like having control over our machines, it turns out. Also, having the server in our local network means we have control over who can see what, and what kind of firewall exceptions make or don't make sense, without having to deal with a third party that, as good as it is (and the Linux hosting team at Warwick is fantastic!), would always introduce a bit of delay and extra complications to the process.
- Installing Wordpress
This is the easy part! There are plenty of tutorials out there (I basically followed the one in the Wordpress Codex). If you have some familiarity with terminal commands you can probably do it without any issues. I am not particularly competent when it comes to anything web, and I had a server running in about 10 minutes.
If you don't feel particularly confident just going for it, I strongly recommend running a local install before you try it on your server. I followed this tutorial for a local install to make sure I actually knew what I was doing before putting my hands on the actual server!
- Making Wordpress Multisite
A basic Wordpress install supports a single site. That's fine if you are establishing an ELN for a single lab, but if you want multiple labs with independent feeds, and you want to keep each individual lab's information contained, then you will need to set up Wordpress to work as a multisite. In this mode, each lab can have its own site. Each site can then have its own admin structure and plugins can be activated granularly, so you get a lot of flexibility to operate multiple streams of information in parallel.
Again, the Wordpress Codex has an excellent guide to migrating your Wordpress install to a multisite. Dave Mason also has an excellent guide on this process on his blog - it helped me a lot when setting this up! My experience is that making a fresh WP install into multisite is very straightforward, and that I only ran into issues when trying to convert an install that was already being used into a multisite install.
- Site structure and Permissions
So we have a multisite Wordpress installed and ready to go. We have multiple labs who want their own ELN. The obvious choice is the right choice: we just added one site per lab. Each lab member can see everyone's posts on their lab site, but cannot see anything else. Site admins (i.e. PIs or the "tech person" in the lab) have some degree of autonomy over their own site (activating plugins, changing themes, etc).
The biggest debate we've had was regarding super admins. They have permission to do anything on the whole network, and importantly they are the only ones who can add new users to the multisite. The big question was: should PIs be super admins? If the answer is yes, that gives us a lot of flexibility, with different PIs being able to just add new users/researchers to their groups, install new plugins they might require and so on. Of course, the downside is that every PI can do everything, which means that it takes a single super admin to download a malicious plugin and the whole network is infected. We decided to trust people to do the right thing and gave our PIs super admin permissions.
- Customising things
Luckily, a lot of the customisation work had already been done by Steve on his group's Wordpress install. We are reusing lots of his choices there.
- Theme: we're using Steve's fork of the gitsta theme. It looks super nice and clean!
- Plugins: My Private Site (by David Gewirtz) is absolutely essential if you want to make sites non-public. We are using TinyMCE Advanced (by Andrew Ozz) and WP-Markdown (by Stephen Harris) for extra features when editing posts. To make sure all kinds of data look good, we have Code Prettify (by Kaspars Dambis), Mathjax-LaTeX (by Phillip Lord, Simon Cockell, Paul Schreiber) and TablePress (by Tobias Bäthge). Finally, we have added PDF Embedder (by Dan Lester) and Mammoth .docx converter (by Michael Williamson) for importing data from elsewhere.
- Contingency planning: for the moment, we have an automated weekly backup that is encrypted (just in case) and pushed to our IT Services-managed storage server, where it's further backed up according to their policies. This is a "manual" backup: we're not using any plugins for that. Both the Wordpress folder and a dump of the MySQL database are included. We've tested restoring an install from these backups and it's a very straightforward, 5-minute process.
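For readers who want a starting point, a plugin-free backup along these lines could be sketched as below. This is not our actual script: the WP_DB_NAME and BACKUP_PASSPHRASE_FILE variables are hypothetical, database credentials are assumed to come from a client config file such as ~/.my.cnf, and temp directories stand in for the real Wordpress folder and backup destination. The database dump and the encryption step are skipped when their inputs are not available.

```shell
set -euo pipefail

# Temp dirs stand in for the real Wordpress folder and backup destination.
WP_DIR="$(mktemp -d)"
BACKUP_DIR="$(mktemp -d)"
STAGE_DIR="$(mktemp -d)"
echo "<?php // placeholder config" > "$WP_DIR/wp-config.php"
ARCHIVE="$BACKUP_DIR/eln-backup-$(date +%Y-%m-%d).tar.gz"

# The archive always contains the Wordpress folder.
MEMBERS=(-C "$(dirname "$WP_DIR")" "$(basename "$WP_DIR")")

# 1. Dump the database outside the web folder and add it to the archive.
if [ -n "${WP_DB_NAME:-}" ] && command -v mysqldump >/dev/null; then
    mysqldump --single-transaction "$WP_DB_NAME" > "$STAGE_DIR/db-dump.sql"
    MEMBERS+=(-C "$STAGE_DIR" db-dump.sql)
fi

# 2. Create the compressed archive.
tar -czf "$ARCHIVE" "${MEMBERS[@]}"

# 3. Encrypt it symmetrically, then remove the unencrypted copy.
if [ -n "${BACKUP_PASSPHRASE_FILE:-}" ] && command -v gpg >/dev/null; then
    gpg --batch --yes --pinentry-mode loopback \
        --passphrase-file "$BACKUP_PASSPHRASE_FILE" \
        --symmetric --cipher-algo AES256 --output "$ARCHIVE.gpg" "$ARCHIVE"
    rm "$ARCHIVE"
fi

ls -l "$BACKUP_DIR"
```

Scheduling something like this weekly is a one-line cron entry, and pushing the result to a managed storage server can be a plain copy to a mounted share.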
- Issues we still want to deal with
- I still don't like having all PIs as super admins. Not that I don't trust them (I do!), but people make mistakes, and limiting permissions to what's necessary and nothing beyond that is always a good idea.
- The virtual machine image is not being backed up. It's not a huge deal since we can restore our install from the backups we currently take, but I'd like to have the extra redundancy there!
- Adoption: this is the hardest challenge we face. Currently, there are only 3 or 4 groups using the ELN solution heavily, while everyone else still relies on their paper notebooks. Even for the groups where adoption is widespread, there's still a lot of resistance to what's seen as "duplicated effort".