Project description

Below is the description of my Summer of Code project. Enjoy 😉

Support for generating COM wrappers

Google Summer of Code project

Author: Jan Jeżabek

Abstract

The goal of this project is to enhance SWIG to be able to generate COM wrappers for C/C++ code. COM is a component technology used universally in MS Windows. It is designed to be language neutral and can therefore be used as an interface between different languages – for example a properly implemented COM object can be used from C/C++, Delphi, Visual Basic, all .NET languages, and also – if the IDispatch interface is implemented – from scripting languages like VBScript and JScript. Objects implementing the IDispatch interface are called Automation objects – they can be used for executing automated tasks from MS Office, OpenOffice.org and other programs.

To wrap C/C++ objects and functions three steps are needed:

1. Creation of COM wrapper classes and interfaces. This means wrapping C++ objects inside of objects implementing the IUnknown interface, creating ‘getters’ and ‘setters’ for public member variables, etc. This part also includes the wrapping of global functions and variables; they will be accessible from an object representing the wrapped module – similar to the way globals are accessible in bindings generated for Java (the moduleJNI class) and C# (modulePINVOKE). One notable difference is the lack of static methods in COM – but this can be easily worked around by letting the object representing the module be a singleton (and even that is not necessary).

2. Creation of an interface definition in MIDL (Microsoft’s version of the Interface Definition Language). This definition contains details about the wrapped classes and interfaces – like the number and names of methods, their parameters, etc. It is compiled and linked into the component and can afterwards be used to automatically create object definitions for other languages and to generate proxy classes.

3. Implementation of the IDispatch interface. This interface can be used to inspect an object at runtime (somewhat similar to reflection in Java) – to list the supported methods and their parameters, and to execute them. It is used mostly by scripting languages such as VBScript, JScript and VBA.
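To make the idea of runtime lookup concrete, here is a minimal, portable sketch of name-based dispatch. It is not the real IDispatch interface (which works with DISPIDs and VARIANT arguments); the `Example` class, `DispatchTable` and `makeTableFor` are hypothetical names invented for illustration, and arguments are limited to `int`.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical class to be exposed to a scripting language.
class Example {
public:
  int twice(int a) { return 2 * a; }
  int add(int a, int b) { return a + b; }
};

// Simplified stand-in for IDispatch: look up a method id by name,
// then invoke the method through that id.
class DispatchTable {
public:
  using Method = std::function<int(const std::vector<int> &)>;

  // Registers a method and returns its id (ids start at 1).
  int add(const std::string &name, Method method) {
    methods_.push_back(std::move(method));
    int id = static_cast<int>(methods_.size());
    ids_[name] = id;
    return id;
  }

  // Returns the id for a name, or -1 if the name is unknown.
  int idOf(const std::string &name) const {
    auto it = ids_.find(name);
    return it == ids_.end() ? -1 : it->second;
  }

  // Calls the method with the given id; returns false for an unknown id.
  bool invoke(int id, const std::vector<int> &args, int &result) const {
    if (id < 1 || id > static_cast<int>(methods_.size())) return false;
    result = methods_[id - 1](args);
    return true;
  }

private:
  std::map<std::string, int> ids_;
  std::vector<Method> methods_;
};

// Builds the table for one object; the table must not outlive `e`.
DispatchTable makeTableFor(Example &e) {
  DispatchTable t;
  t.add("twice", [&e](const std::vector<int> &a) { return e.twice(a.at(0)); });
  t.add("add", [&e](const std::vector<int> &a) { return e.add(a.at(0), a.at(1)); });
  return t;
}
```

A script host would first ask for the id of a name such as "twice" and then call `invoke` with that id, which is roughly the two-step pattern a generated IDispatch implementation would have to support.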

In addition some ‘boilerplate’ code needs to be created to deal with object creation (a special factory object) and component registration.

The goal is to create all the code that is needed; the resulting code will need to be compiled, linked and registered using regsvr32 and will be ready to use by applications.

Detailed Description

My plan is to start with the Java binding and support only single inheritance in the first iteration. While supporting multiple inheritance is possible, there are some major obstacles which will be described later in this document.

Object wrapping

Each COM object needs to implement the IUnknown interface consisting of 3 methods – QueryInterface (basically responsible for type casting), AddRef and Release (used for reference counting). The wrapping is best explained using an example – consider the following three classes:

/* Abstract class */
class A {
public:
  virtual int methodX(int) = 0;
};

/* Concrete class, derived from A */
class B : public A {
public:
  virtual int methodX(int a) { return 2 * a; }
  virtual int methodY(int a) { return 3 * a; }
};

/* Concrete class, derived from B */
class C : public B {
public:
  virtual int methodX(int a) { return 4 * a; }
  virtual int methodY(int a) { return 6 * a; }
  virtual int methodZ(int a) { return 8 * a; }
};

Wrapping these classes will result in the creation of 3 interfaces, let’s call them IA, IB and IC (although I don’t plan to add a prefix, at least not by default). Also two concrete wrapper classes will be created (class A is abstract, and therefore will not be wrapped). The class wrapping B will support interfaces IUnknown, IA and IB (and also IDispatch), and C will additionally support IC. An interface definition for IA might look like this (in C++):

class IA {
public:
  virtual HRESULT QueryInterface(...) = 0;
  virtual ULONG AddRef(void) = 0;
  virtual ULONG Release(void) = 0;
  virtual int methodX(int) = 0;
};
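As a rough illustration of what a generated wrapper for B could look like, here is a portable sketch. It does not use the Windows headers: `HRESULT`, `ULONG`, `S_OK`, `E_NOINTERFACE` and `IUnknown` are local stand-ins, interface IDs are plain strings instead of GUIDs, and the reference count is not thread-safe. The name `BWrapper` is made up for this example.

```cpp
#include <cstdint>
#include <cstring>

// Local stand-ins for the Windows definitions (assumption: no <windows.h>).
using HRESULT = std::int32_t;
using ULONG = std::uint32_t;
constexpr HRESULT S_OK = 0;
constexpr HRESULT E_NOINTERFACE = static_cast<HRESULT>(0x80004002u);

struct IUnknown {
  // Real COM identifies interfaces by GUID; strings keep the sketch short.
  virtual HRESULT QueryInterface(const char *iid, void **out) = 0;
  virtual ULONG AddRef() = 0;
  virtual ULONG Release() = 0;
protected:
  virtual ~IUnknown() = default;
};

struct IA : IUnknown { virtual int methodX(int) = 0; };
struct IB : IA { virtual int methodY(int) = 0; };

// The wrapped C++ class (simplified: no base class here).
class B {
public:
  int methodX(int a) { return 2 * a; }
  int methodY(int a) { return 3 * a; }
};

class BWrapper : public IB {
public:
  BWrapper() : impl_(new B), refs_(1) {}

  // "Type casting": hand out a pointer to a supported interface.
  HRESULT QueryInterface(const char *iid, void **out) override {
    if (std::strcmp(iid, "IB") == 0) *out = static_cast<IB *>(this);
    else if (std::strcmp(iid, "IA") == 0) *out = static_cast<IA *>(this);
    else if (std::strcmp(iid, "IUnknown") == 0) *out = static_cast<IUnknown *>(this);
    else { *out = nullptr; return E_NOINTERFACE; }
    AddRef();  // every pointer handed out counts as a reference
    return S_OK;
  }

  ULONG AddRef() override { return ++refs_; }

  // Destroys the wrapper when the last reference is released.
  ULONG Release() override {
    ULONG left = --refs_;
    if (left == 0) delete this;
    return left;
  }

  int methodX(int a) override { return impl_->methodX(a); }
  int methodY(int a) override { return impl_->methodY(a); }

private:
  ~BWrapper() { delete impl_; }  // only reachable through Release()
  B *impl_;
  ULONG refs_;
};
```

Because IB derives from IA, which derives from IUnknown, all three casts in `QueryInterface` yield the same address here, which is the property that makes the single inheritance case simple.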

With only single inheritance supported, casting to a superclass is trivial – the object’s address will be exactly the same. This would be much harder to do using multiple inheritance. In addition there would be some problems with non-virtual inheritance – e.g. consider this example:

  • A – base class
  • B, C – non-virtual subclasses of A
  • D – subclass of both B and C

In this case casting to IA is ambiguous.
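The ambiguity can be reproduced in plain C++. In the sketch below (the helper names `asAViaB` and `asAViaC` are made up), D contains two separate A subobjects, so a direct cast from D to A is rejected by the compiler and the two indirect casts give different addresses.

```cpp
struct A { int a = 0; virtual ~A() = default; };
struct B : A { int b = 0; };  // non-virtual inheritance
struct C : A { int c = 0; };  // non-virtual inheritance
struct D : B, C { int d = 0; };

// static_cast<A *>(&d) would not compile: "A is an ambiguous base of D".
// Going through B or C picks one of the two A subobjects explicitly.
inline A *asAViaB(D &d) { return static_cast<B *>(&d); }
inline A *asAViaC(D &d) { return static_cast<C *>(&d); }
```

A wrapper for D would have to decide which of the two pointers a request for IA should return, which is the problem described above.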

Method overloading

Unfortunately AFAIK method overloading is not possible in COM. Therefore the user will either need to %ignore some methods, or they will need to be mangled in some way (e.g. by appending numbers to them). In the latter case it might be possible to add a directive to specify a single method whose name should not be mangled, e.g. a method which is the most generic.
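One possible mangling scheme, sketched under the assumption that overloads are numbered in declaration order with a `_N` suffix (the function name `mangleOverloads` is hypothetical and not part of SWIG):

```cpp
#include <map>
#include <string>
#include <vector>

// Takes method names in declaration order. Names that occur once are kept;
// overloaded names get "_1", "_2", ... appended.
// Note: this sketch does not check for clashes with existing names such as
// a method that is already called "draw_1".
std::vector<std::string> mangleOverloads(const std::vector<std::string> &names) {
  std::map<std::string, int> total;
  for (const auto &n : names) ++total[n];

  std::map<std::string, int> seen;
  std::vector<std::string> out;
  for (const auto &n : names) {
    if (total[n] == 1) out.push_back(n);
    else out.push_back(n + "_" + std::to_string(++seen[n]));
  }
  return out;
}
```

For the input draw, draw, size, draw this yields draw_1, draw_2, size, draw_3. A directive that keeps one overload unmangled could be layered on top by skipping the suffix for the chosen declaration.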

Constructors

The easy case is when there is only one public constructor taking no parameters – in this case it will be called by the factory when creating an object. In other cases – multiple constructors or a constructor taking parameters – a workaround will be needed. My idea is to employ a two step construction process:

  • the factory will create only a wrapper object with no underlying object,
  • the wrapper will support one or multiple Construct methods – one for each public constructor. Their purpose is to create the underlying object using the appropriate constructor.

For example this class:

class A {
public:
  A();
  A(int x);
};

could be wrapped as

class IA {
public:
  virtual HRESULT QueryInterface(...) = 0;
  virtual ULONG AddRef(void) = 0;
  virtual ULONG Release(void) = 0;

  virtual bool Construct_1() = 0;
  virtual bool Construct_2(int) = 0;
};

One bad consequence of this approach is that subclasses will expose the constructors of their parents; this situation is the intended use of the return value – if a Construct method from a parent is called the method will return false, otherwise it will return true.
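The two-step construction could be sketched as follows. This is a simplified model without the IUnknown methods; the subclass B, the method names `ConstructB_1` and `IsConstructed`, and the rule that a second construction attempt fails are assumptions made for this example.

```cpp
#include <memory>

class A {
public:
  A() : x_(0) {}
  explicit A(int x) : x_(x) {}
  virtual ~A() = default;
private:
  int x_;
};

class B : public A {  // hypothetical subclass with its own constructor
public:
  explicit B(int x) : A(2 * x) {}
};

struct IA {
  virtual ~IA() = default;
  virtual bool Construct_1() = 0;           // wraps A()
  virtual bool Construct_2(int x) = 0;      // wraps A(int)
  virtual bool IsConstructed() const = 0;   // hypothetical helper
};

struct IB : IA {
  virtual bool ConstructB_1(int x) = 0;     // wraps B(int); name is made up
};

// The factory would create this wrapper with no underlying object.
class AWrapper : public IA {
public:
  bool Construct_1() override { return set(new A()); }
  bool Construct_2(int x) override { return set(new A(x)); }
  bool IsConstructed() const override { return impl_ != nullptr; }
private:
  bool set(A *fresh) {
    std::unique_ptr<A> owner(fresh);
    if (impl_) return false;  // assumption: construct only once
    impl_ = std::move(owner);
    return true;
  }
  std::unique_ptr<A> impl_;
};

class BWrapper : public IB {
public:
  // Inherited from IA, but they would build an A, not a B: refuse.
  bool Construct_1() override { return false; }
  bool Construct_2(int) override { return false; }
  bool ConstructB_1(int x) override {
    if (impl_) return false;
    impl_.reset(new B(x));
    return true;
  }
  bool IsConstructed() const override { return impl_ != nullptr; }
private:
  std::unique_ptr<B> impl_;
};
```

A client would create the wrapper through the factory and then call exactly one Construct method, checking the returned flag before using the object.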

Constants and variables

Public constants and variables will need to be wrapped by methods. There will be an object representing all the globals in a module (methods and variables). The object factory will make sure that this module is a singleton.
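A sketch of such a module object is shown below. The globals `counter`, `kMaxItems` and `twice`, and the class name `ExampleModule`, are hypothetical; the static `instance()` function stands in for the object factory that would enforce the singleton in a real component.

```cpp
// Hypothetical contents of the wrapped module.
int counter = 0;
const int kMaxItems = 16;
int twice(int v) { return 2 * v; }

// One object exposing all globals of the module through methods.
class ExampleModule {
public:
  // Stand-in for the factory: always hands out the same object.
  static ExampleModule &instance() {
    static ExampleModule module;
    return module;
  }

  // Global variable: getter and setter.
  int get_counter() const { return counter; }
  void set_counter(int v) { counter = v; }

  // Global constant: getter only.
  int get_kMaxItems() const { return kMaxItems; }

  // Global function: forwarded call.
  int twice(int v) const { return ::twice(v); }

  ExampleModule(const ExampleModule &) = delete;
  ExampleModule &operator=(const ExampleModule &) = delete;

private:
  ExampleModule() = default;
};
```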

CLSIDs and IIDs

Each class and interface will need its own GUID. The correct approach would be to let the user generate them and make them available using a directive. However we cannot rely on this and need a sensible strategy in the case that the user does not supply the GUIDs. My proposal is to:

  • concatenate the name of the module, the namespace and the wrapped class (if any) and compute its hash (e.g. MD5),
  • use the resulting bits for the recommended procedure for generating GUIDs.

This will make sure that the results of wrapping are deterministic and that software will not break because of a regeneration/recompilation.
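The sketch below shows the shape of this procedure. To stay dependency-free it uses a toy 128-bit digest built from two FNV-1a passes instead of MD5, so the actual GUID values are not what an MD5-based implementation would produce. The function names and the "::" separator are assumptions; the version and variant bits are set the way RFC 4122 describes for name-based, MD5-derived identifiers.

```cpp
#include <array>
#include <cstdint>
#include <string>

using Digest = std::array<std::uint8_t, 16>;

// Toy stand-in for MD5 (NOT MD5): two 64-bit FNV-1a passes, the second
// with a perturbed offset basis, written out big-endian.
Digest toyDigest(const std::string &text) {
  Digest d{};
  for (int half = 0; half < 2; ++half) {
    std::uint64_t h = 14695981039346656037ULL + static_cast<std::uint64_t>(half);
    for (unsigned char c : text) {
      h ^= c;
      h *= 1099511628211ULL;  // FNV-1a 64-bit prime
    }
    for (int i = 0; i < 8; ++i)
      d[half * 8 + i] = static_cast<std::uint8_t>(h >> (56 - 8 * i));
  }
  return d;
}

// Turns 16 digest bytes into GUID text, fixing the version and variant bits.
std::string guidFromDigest(Digest d) {
  d[6] = static_cast<std::uint8_t>((d[6] & 0x0F) | 0x30);  // version 3
  d[8] = static_cast<std::uint8_t>((d[8] & 0x3F) | 0x80);  // RFC 4122 variant
  static const char hex[] = "0123456789abcdef";
  std::string s;
  for (int i = 0; i < 16; ++i) {
    if (i == 4 || i == 6 || i == 8 || i == 10) s += '-';
    s += hex[d[i] >> 4];
    s += hex[d[i] & 0x0F];
  }
  return s;
}

// Same inputs always give the same GUID, so regeneration is stable.
std::string guidFor(const std::string &module, const std::string &ns,
                    const std::string &cls) {
  return guidFromDigest(toyDigest(module + "::" + ns + "::" + cls));
}
```

Swapping `toyDigest` for a real MD5 implementation would leave `guidFromDigest` unchanged, since it only depends on having 16 input bytes.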

Generated files

The wrapping procedure will create two files – a C++ file containing the wrapping code and an IDL file describing the interfaces and classes. The wrapping code will also include ‘boilerplate’ code to deal with object registration and with creating an object factory (unless suppressed by the user). The wrapping classes will also support the IDispatch interface. My goal is for the user to only need to compile their objects, the wrapper code and the IDL, link them together, and call regsvr32 to be able to use the component.

Possible future work

COM has been copied almost as often as it has been bashed. For example COM-lookalikes are used in Mozilla (XPCOM), OpenOffice (UNO), KDE (KParts) and to some degree GNOME (Bonobo). The differences are mainly in the way the objects and interfaces are identified. With some work these component systems could be supported by SWIG, making extension of the above applications much simpler.

My background

I have almost 10 years of experience using C++. I have both created and used COM controls – for example I have created a DirectShow filter for a video codec that was part of my MSc thesis. I cannot say that I am an expert wrt. COM, but I understand its design and concepts.

Timeline

This timeline is a little bit coarse, as the task is not easily partitioned into small pieces. Below is what I expect to be working after each major stage of development:

  • ~2-3 days – a separate COM target based on the Java backend. The generated .java file will be a base for the IDL file (as they both basically describe the programming interface), and the .cxx file will contain the COM wrapper
  • 2 weeks – basic IDL generation – C++ classes are transformed into COM classes and interfaces, method arguments and return values are mapped to their COM counterparts
  • 3 weeks – some tricky details are ironed out – globals, overloading, etc. Work on generating the C code starts
  • 5 weeks – C wrapper generation – all the classes and interfaces are declared as C structs, the class wrappers are functional – methods are mapped to corresponding methods of the underlying objects
  • 6 weeks (mid-term) – classes and globals are properly wrapped, IDL code is being generated for them
  • 8 weeks – IDispatch is implemented for each class
  • 10 weeks – the C wrapper now includes code to deal with object creation (the factory class) and component registration. GUIDs are created automatically if the user does not specify their own.
  • 11 weeks – the code is tested with different freely available C compilers (MinGW, Digital Mars, OpenWatcom, Borland 5.5, MSVC). The resulting components are tested in VB, Delphi, VBScript, JScript, Python
  • 12 weeks – the code is cleaned up and ready for inclusion in SWIG