Problem diagnosis procedures

The Diagnosis Procedures are the first steps that a System Administrator will take to locate the source of an error in Orion. Once the nature of the error is identified with these tests, the system admin will often have to resort to more concrete and specific testing to pinpoint the exact point of failure and a possible solution. Such specific testing is out of the scope of this section.

Please report any bug or problem with Orion Context Broker by opening and issue in github.com.

Sanity check procedures

The Sanity Check Procedures are the steps that a System Administrator will take to verify that an installation is ready to be tested. This is therefore a preliminary set of tests to ensure that obvious or basic malfunctioning is fixed before proceeding to unit tests, integration tests and user validation.

Checking Orion is up and running

curl localhost:1026/version
<orion>
  <version>0.22.0-next</version>
  <uptime>0 d, 1 h, 34 m, 25 s</uptime>
  <git_hash>6e2aca5ebe287083efa306fc84d213fa24309a63</git_hash>
  <compile_time>Fri Jun 19 10:34:41 CEST 2015</compile_time>
  <compiled_by>fermin</compiled_by>
  <compiled_in>debvm</compiled_in>
</orion>

List of Running Processes

A process named "contextBroker" should be up and running, e.g.:

$ ps ax | grep contextBroker
 8517 ?        Ssl    8:58 /usr/bin/contextBroker -port 1026 -logDir /var/log/contextBroker -pidpath /var/log/contextBroker/contextBroker.pid -dbhost localhost -db orion

Network interfaces Up & Open

Orion Context Broker uses TCP 1026 as default port, although it can be changed using the -port command line option.

Database server

The Orion Context Broker uses a MongoDB database, whose parameters are provided using the command line options dbhost, -dbuser, -dbpwd and -db. Note that -dbuser and -dbpwd are only used if MongoDB runs using authentication, i.e. with --auth.

You can check that the database is working using the mongo console:

mongo <dbhost>/<db>

You can check the different collections used by the broker using the following commands in the mongo console. However, note that the broker creates a collection the first time a document is to be inserted in it, so if it is the first time you run the broker (or if database was cleaned) and the broker hasn't received any request yet no collection exists. Use show collections to get the actual collections list in any given moment.

> db.registrations.count()
> db.entities.count()
> db.csubs.count()
> db.casubs.count()

Diagnose database connection problems

The symptoms of a database connection problem are the following ones:

X@08:04:45 main[313]: MongoDB error

      ...
      <errorCode>
        <code>500</code>
        <reasonPhrase>Database Error</reasonPhrase>
        <details>collection: ... - exception: Null cursor</details>
      </errorCode>
      ...

      ...
      <errorCode>
        <code>500</code>
        <reasonPhrase>Database Error</reasonPhrase>
        <details>collection: ... - exception: socket exception [CONNECT_ERROR] for localhost:27017</details>
      </errorCode>
      ...

      ...
      <errorCode>
        <code>500</code>
        <reasonPhrase>Database Error</reasonPhrase>
        <details>collection: ... - exception: socket exception [FAILED_STATE] for localhost:27017</details
      </errorCode>
      ...

      ...
      <errorCode>
        <code>500</code>
        <reasonPhrase>Database Error</reasonPhrase>
        <details>collection: ... - exception: DBClientBase::findN: transport error: localhost:27017 ns: orion.$cmd query: { .. }</details>
      </errorCode>
      ...

In both cases, check that the connection to MonogDB is correctly configured (in particular, the BROKER_DATABASE_HOST if you are running Orion Context Broker as a service or the "-dbhost" option if you are running it from the command line) and that the mongod/mongos process (depending if you are using sharding or not) is up and running.

If the problem is that MongoDB is down, note that Orion Context Broker is able to reconnect to the database once it gets ready again. In other words, you don't need to restart the broker in order to re-connect to the database.

Diagnose spontaneous binary corruption problems**

The symptoms of this problem are:

The cause of this problem is prelink, a program that modifies binaries to make them starting faster (which is not very useful for binary implementing long running services as contextBroker is) but that is known to be incompatible with some libraries (in particular, it seems to be incompatible with some of the libraries used by Context Broker).

The solution fo this problems is:

-b /usr/bin/contextBroker
yum remove contextBroker
yum install contextBroker