MDDL - Mapping the Market Data Genome with the Market Data Definition Language
The aura of eXtensible Markup Language (XML) has lured members of
the Software & Information Industry Association (SIIA), in conjunction with SIIA's Financial
Information Services Division (FISD), to foster an organization
dedicated to drafting a new data interchange specification
specifically for financial market data. Will this volunteer group
successfully decipher the complex composition of our global markets?
Article by MDDL Technical Committee Co-Chair, James E. Hartley, Reuters, for publication in the October issue of Upgrade
While, perhaps, not as monumental as the Human Genome Project (HGP),
mapping all market data is a ponderous enough task - entailing the
appropriate sequencing of data, vendors, and users - that many would
not consider it possible. Such an attempt is underway. The goal of
the Market Data Definition Language (MDDL) is to map all market data
into a common language and syntax to facilitate the interchange and
processing of this complex and voluminous data set. Unlike the
fifteen-year planned lifetime of the HGP, however, fewer than
fifteen months have been allotted to define and implement MDDL
within the financial industry.
And so much progress has been made that the MDDL team is poised to
present its "MDDL 1.0" specification at the fifth World
Financial Information Conference (WFIC) in London, England on 02
November 2001. The architects of the standard believe that MDDL will
be sufficiently mature by the WFIC that vendors and users will be
able to begin to implement this data specification within their
products and processing systems.
How has MDDL happened?
Jeremy Sanders, formerly of Merrill Lynch, spearheaded the concept
of a common eXtensible Markup Language (XML) specification for
market data in a convincing presentation during the FISD Meeting of
members on 06 December 2000 in New York City. On 17
January 2001, FISD held an organizational
meeting and accepted the requests of its members to guide the
creation of a new XML specification. Under the capable
direction of Michael Atkin, Vice President of SIIA and Director of
FISD, the disparate members of the financial markets community have
banded together as a volunteer team to decode market data into a
common interchange format.
FISD defined an organizational structure for the development of the
specification involving a Technical Committee and Vocabulary
Committee reporting to a Steering Committee so individuals and
responsibilities could be matched consistently with skills and
interests. The Steering Committee, the overall guiding force and
authority for MDDL, is composed of the co-chairs of the other
committees and several key members active in SIIA and FISD, chosen to
balance technical and business-related firms as well as the
market data vendor and user communities.
As the specification became more palpable, other XML initiatives in
the financial industry began to take notice. Accordingly, the
Steering Committee recognized the need for formal representation of
the specification and for interaction with other XML specifications
and created the Liaison Committee. This group is building
relationships among both "standards" bodies and the other
XML groups.
The specification has received much attention and support from many
members and is a wonderful example of cooperation, even between
competitors, in forging an alliance to reach a common goal. This
effort will culminate on 02 November 2001 with the release of
"MDDL 1.0" at the World
Financial Information Conference in London. This milestone will
mark several industry "firsts," a resounding
affirmation that the financial market industry has entered the 21st
century.
What is XML anyway?
The eXtensible Markup Language (XML) is a generic mechanism for
representing structured data in textual format. Programs that
produce and use data often store this information in disk files with
a proprietary or binary data format but only programs that
specifically know how to read and process that data format can make
use of the data. XML offers a way to "tag" the data in a
tree structure, using a clearly defined set of rules, allowing any
XML-aware computer program to process the data unambiguously and
preserve the relationships within the data. Further, documents
generated using the XML standard can be extended to include new
groups of data without affecting existing applications.
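As a rough illustration of that last point, the following Python sketch reads the same two fields from an original document and from an extended one. The element names (quote, symbol, last, dividend) are invented for this example and are not taken from any specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names, chosen only for illustration.
ORIGINAL = """<quote>
  <symbol>XYZ</symbol>
  <last>101.25</last>
</quote>"""

# The same document extended with a new group of data.
EXTENDED = """<quote>
  <symbol>XYZ</symbol>
  <last>101.25</last>
  <dividend>
    <amount>0.40</amount>
  </dividend>
</quote>"""


def read_last(xml_text):
    """Return (symbol, last price); elements the reader does not know are ignored."""
    root = ET.fromstring(xml_text)
    return root.findtext("symbol"), float(root.findtext("last"))


print(read_last(ORIGINAL))
print(read_last(EXTENDED))
```

The reader looks up only the elements it knows by name, so the added dividend group passes through without any change to the existing code.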
XML documents are in textual format and humans can view them without
any special computer programs. However, the data is often very
complex and the additional XML markup is generally of little use to a
human reader. As such, several supporting technologies have been developed
to assist computers in processing and to aid humans in viewing XML
structured data. For example, the Extensible Stylesheet Language (XSL) allows a
user to define a stylesheet that, when processed by a
general XSL transform engine, turns the XML into something more
readily viewable (like HTML).
XML is already accepted as a web standard for information exchange
and, unlike earlier such efforts, individual organizations have not
created their own derivatives. Thus XML is evolving as a true standard.
XML is not new, but activity around it has grown recently. XML is derived from
Standard Generalized Markup Language (SGML) - an International
Organization for Standardization (ISO) standard since 1986. XML has been a
World Wide Web Consortium (W3C) standard since February 1998 and, in
May 2001, the W3C unleashed the true flexibility of the standard by
approving its newest way of defining XML data specifications,
called "schemas." The introduction of schemas into XML, as
well as a variety of related XML processing technologies, has enabled XML to gain
even wider acceptance.
What will MDDL provide?
The principal purpose of MDDL is the exchange of information on
financial instruments. Effectively passing this information between
providers and users requires a common way of defining and
representing the data without ambiguity. As such, MDDL provides the
following basic benefits:
A common understanding of market data through the generation of
standard definitions of terms as well as normalized relationships of
various data elements to one another. This "glossary"
(managed by the MDDL Vocabulary Committee) provides a common data
taxonomy for all market data.
A common format for data interchange so that it may be used by any
processing system from the genesis of the market data through to the
end user thus facilitating the exchange of the data amongst all
interested parties. The MDDL Technical Committee is managing the
specification.
What did Vocabulary do?
The Vocabulary Committee was charged with the Herculean task of
creating a data dictionary - a glossary of terms - to precisely
define the variety of market data concepts. The intent of the
glossary is to clarify market data for the providers and the users
so that specific data elements can be interchanged unambiguously,
with precision, transparency and consistency.
The Vocabulary Committee is accomplishing this by first defining
market data products by the instruments they
represent. For example, bonds and common equities are different
products. The categories of products defined by the committee
encompass a wide range of exchange and non-exchange traded
instruments. Each category is broken into groups called domains,
further differentiated by class and subclass. Each product has
different terms associated with it, as reflected in the
specification.
The various terms associated with market data are called descriptors,
and each descriptor is given a
unique definition. In most cases, the descriptors require other
modifiers to clarify the content. For example, the price-related
descriptors (like "high") require a modifier
"currency" to clearly denote the currency units of the
price. In some cases, a single descriptor may have multiple
definitions and the meaning can be clarified with a controlled
vocabulary in much the same way a dictionary provides enumerated
meanings for ambiguous words.
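A minimal sketch of the descriptor and modifier relationship follows, in Python. The markup is hypothetical: the names "high" and "currency" come from the example above, while the equity and value elements and the nesting are assumptions made only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: "high" is a descriptor, "currency" is its modifier.
DOC = """<equity>
  <high>
    <value>42.50</value>
    <currency>USD</currency>
  </high>
</equity>"""


def describe(descriptor):
    """Return (descriptor name, numeric value, currency modifier or None)."""
    value = float(descriptor.findtext("value"))
    currency = descriptor.findtext("currency")
    return descriptor.tag, value, currency


root = ET.fromstring(DOC)
print(describe(root.find("high")))
```

Without the modifier, a consumer would see only the number 42.50 and would have to guess at the units.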
What did Technical do?
The Technical Committee independently set about determining how XML
would be used to represent the concepts of the Vocabulary Committee
and the basic groups of products (called constructs) initially
targeted: snapshot and end-of-day. The specification has the
flexibility to add other constructs, products, and properties in
future revisions, as well as vendor-specific additions to products
and properties.
The descriptors and modifiers defined by the Vocabulary Committee
are collectively called properties, and each holds a value of a single data
type (number, date, string, etc.). The properties are arranged in a
simple hierarchy so it is clear which properties belong to which
product and which properties are used to modify other properties.
The Technical Committee chose to support both the newer XML schema
and the older Document Type Definition (DTD) formats for defining
the XML language used within MDDL. The specification is actually
written in XML schema, and XSL scripts are then used to convert the
schema to a DTD. In this way, organizations familiar with
validating XML with a DTD can adopt MDDL readily, and MDDL can still
use the advanced features defined in XML schema in the future.
What makes MDDL viable within data processing systems?
Once an organization understands how to process XML documents, MDDL
is a straightforward extension of that same processing. The
specification includes mechanisms to streamline handling and support
proprietary extensions not covered in the initial specification. In
addition, once MDDL documents are processed from one source this
same processing should apply to MDDL documents from all sources.
Common material need not be included in each instance document: To
keep large documents manageable, or simply to make documents smaller, it is possible to
identify groups of common data in a special section of the MDDL
document called "reference". These groups (such as the
formal name and address of a corporation) can be referenced within
the same MDDL document or from another MDDL document.
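The idea can be sketched in Python as follows. Only the section name "reference" comes from the text above; the id and ref attributes, the other element names, and the lookup scheme are assumptions made for illustration and may not match the actual MDDL mechanism.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: issuer details are stated once and pointed to twice.
DOC = """<mddl>
  <reference>
    <issuer id="acme">
      <name>Acme Corporation</name>
      <address>1 Main Street, Springfield</address>
    </issuer>
  </reference>
  <equity><symbol>ACME</symbol><issuer ref="acme"/></equity>
  <bond><symbol>ACME05</symbol><issuer ref="acme"/></bond>
</mddl>"""


def issuer_names(xml_text):
    """Map each product symbol to the issuer name found via its reference."""
    root = ET.fromstring(xml_text)
    # Index every shared group in the reference section by its id.
    shared = {el.get("id"): el for el in root.find("reference")}
    names = {}
    for product in root:
        if product.tag == "reference":
            continue
        ref = product.find("issuer").get("ref")
        names[product.findtext("symbol")] = shared[ref].findtext("name")
    return names


print(issuer_names(DOC))
```

The corporate name and address appear once, however many products refer to them.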
Compression techniques are available to minimize bandwidth and
storage requirements: A variety of general text-based compression
techniques are widely available that can compress XML documents
efficiently. If desired, firms can implement MDDL-specific methods
for greater compression.
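As a small illustration of general-purpose compression, the Python sketch below compresses a repetitive XML document with the standard zlib module and restores it. The document content is made up for the example, and the saving achieved on real data will vary.

```python
import zlib

# Build a repetitive, hypothetical XML document; real element names may differ.
rows = "".join(
    f"<quote><symbol>SYM{i}</symbol><last>{100 + i}</last></quote>"
    for i in range(1000)
)
document = f"<quotes>{rows}</quotes>".encode("utf-8")

compressed = zlib.compress(document, level=9)  # 9 = strongest compression
restored = zlib.decompress(compressed)

# Compare sizes in bytes before and after compression.
print(len(document), len(compressed))
```

Repeated tag names are exactly the kind of redundancy that text compressors remove well, which is why verbose markup need not translate into proportionally larger transmissions.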
Proprietary products and properties can be added: MDDL is extensible
by the very nature of XML, but specific elements have been added
to identify proprietary products and
properties, thus facilitating generalized processing. The addition of
proprietary constructs is expected in a later release.
Properties apply to the entire data tree: The properties associated
with MDDL items are inherited throughout the hierarchy of products
and properties defined in the specification. It is not necessary to
explicitly define all common properties with every term they modify.
For example, the currency can be defined at the highest level in the
document and the implication is that all other prices within that
hierarchy are quoted in that currency unless specifically stated
otherwise.
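One way a consumer might resolve such inherited properties is sketched below in Python. The markup is hypothetical, and the currency is written as an attribute purely to keep the example short; the specification may represent it differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: currency is set once at the top and overridden once.
DOC = """<mddl currency="USD">
  <equity>
    <high>42.50</high>
    <low>41.10</low>
  </equity>
  <equity currency="GBP">
    <high>7.25</high>
  </equity>
</mddl>"""


def prices_with_currency(element, inherited=None):
    """Yield (tag, price, currency), passing the currency down the tree."""
    # A currency stated on this element overrides the inherited one.
    currency = element.get("currency", inherited)
    if len(element) == 0 and element.text and element.text.strip():
        yield element.tag, float(element.text), currency
    for child in element:
        yield from prices_with_currency(child, currency)


result = list(prices_with_currency(ET.fromstring(DOC)))
print(result)
```

The first two prices pick up USD from the top of the document, while the price in the second equity uses the GBP stated closer to it.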
What other specifications relate to MDDL?
Other existing XML specifications are related to market data
although none specifically addresses all the data represented in
MDDL. In fact, most other XML activities in the financial space will
use a portion of MDDL data or would like to have a markup for that
data. Of these, the following are most prevalent:
XBRL - Extensible Business Reporting Language - for transporting
corporate statements and other financial reports.
NewsML & NITF - defining news industry standards for relaying
stories and supporting elements.
RIXML - Research Information Exchange Markup Language - defining
investment and financial research.
FpML - Financial products Markup Language - focusing on over-the-counter
derivatives like interest rate swaps and forward rate agreements.
SWIFT / ISO 15022 - assisting in the definition of electronic messages
pertaining to securities and related financial instruments.
It must be noted that a particular area of interest to each of these
various specifications, as well as MDDL, is the unique
identification of all financial instruments. This is an open
industry-wide concern that affects MDDL directly, though it is not
within the purview of MDDL to solve (however, there is another
activity within FISD working on symbology and unique security
identification). MDDL is written with the anticipation that such a
unique identifier mechanism will be made available.
What does all this mean?
MDDL, Market Data Definition Language, has been created to describe
financial market data in a way that facilitates the understanding
and the interchange of this very industry specific data. Financial
industry and technical experts are working together in an open forum
to ensure functionality and content consistent with established and
evolving practices. Developed in a short time frame, the initial
specification focuses on a limited number of instruments but is
poised to encompass the full range of market data products.
From the user firm perspective, the primary goal is to make it
simpler and more cost-effective to integrate and normalize data
delivered from multiple sources -- vendors, third parties and directly
from exchanges -- and to ease the development of
web-based applications. From the vendor and data provider
perspectives, MDDL promotes data linkages by allowing the creation
of relationships between internal and external data. MDDL also
allows for data transparency at a very granular level. As such, it
is a powerful technology for product enhancement and a useful tool
for mass customization of data.
Note that MDDL will likely spawn new processing strategies and
business opportunities for many corporations. As existing vendors
and users incorporate MDDL into their existing systems, many groups
may find it possible to transfer assets from traditional collection
and operations information technology roles to data quality and
assurance positions. Easier transfer of data will, it is hoped,
lead to a concentration on quality and thus an overall more
satisfying user experience.
James Hartley is Catalyst, Office of the Chief Scientist, for Reuters and serves as co-chair of the MDDL Technical and Liaison
Committees.