SOAP versus REST

Copyright 2019. Graham Berrisford. One of about 300 papers at http://avancier.website. Last updated 20/01/2019 11:46

 

This paper was composed from notes given to me by students on our architecture classes.

It serves as footnotes to two other popular papers:

·         Microservices (downloaded > 10,000 times a year).

·         SOA and the Bezos mandate.

 

RPC evolved, first using Object Request Brokers, and then using SOAP and REST principles.

This paper compares SOAP and REST.

Contents

Preface and recap

A conceptual difference

SOAP

Defining an interface using the Web Services Description Language

What is SOAP for?

Related standards

Service registries and catalogues

REST (Representational State Transfer)

Uniform interface

Link resources using URIs

Use only the operations in a standard web protocol

Flexibility of representation (data/media type)

Communicate statelessly

Architectural properties of REST

SOAP or REST?

Idempotence in REST

Compensating transactions

Questions

How to convert from SOAP to REST?

If the SOAP / REST choice doesn’t matter, then what does?

What is "pure REST"?

Links and frameworks

Links

Frameworks

 

Preface and recap

"The beginning of wisdom for a computer programmer is to recognise the difference between getting a program to work and getting it right" M.A. Jackson (1975).

 

What makes a software architecture good or right?

Traditionally, it means meeting requirements more elegantly and economically than alternative designs.

It also means enabling change, which is sometimes contrary to the above.

 

We can easily agree a software component should be encapsulated.

It should be defined primarily by its input/output interface, by the discrete events it can process and services it can offer.

But that doesn’t get us very far, because we have to make a series of modular design decisions.

·         What is the right size and scope of a component?

·         How to avoid or minimise duplication between components?

·         How to separate or distribute components?

·         How to integrate components?

 

Since the 1970s, the IT industry has continually revisited modular design and integration concepts and principles.

Many architectural styles or patterns have been promoted; e.g. distributed objects (DO), service-oriented architecture (SOA) and REST.

Each architectural style is defined by some core ideas, presumptions and constraints.

Around the turn of the millennium, SOA was a reaction against the constraints of distributed objects using object request brokers like DCOM.

SOA evangelists advocated a more loosely-coupled kind of modular design and integration style, with opposing features.

 

| Early distributed objects presumptions | Later SOA design presumptions |
| --- | --- |
| Object identifiers | Internet domain names or URIs |
| One name space | Several name spaces |
| Stateful server objects | Stateless server components |
| Reuse by inheritance | Reuse by delegation |
| Intelligent domain objects | Intelligent process controllers |
| Request-reply invocations | Message/event queues |
| Blocking servers | Non-blocking servers |

 

REST adopts the features listed under SOA in the table above, and adds some more.

A conceptual difference

In SOA, a service is a component that can be called remotely, across a network, at an endpoint.

A component may act as a client/consumer and/or server/provider of data.

Components typically exchange request and response data in the form of self-describing documents.

Components make few if any assumptions about the technological features of other components.

 

SOAP is a technology standard for client components making remote procedure calls to server components.

It presumes that clients

·         send data in XML documents

·         invoke operations in WSDL-defined interfaces

·         use the protocol called SOAP over standard internet protocols.

                                                                                                                     

SOAP is used to invoke operations on large application components (using many verbs and a few nouns).

Those server-side components may call other components, and so on.

 

REST is better seen as a theory that happens to be associated with web technology standards.

It feels less like asking a servant to do something for you and more like asking a servant to act on a remote resource.

A resource is anything that can be named, that is, identified by one or more Uniform Resource Identifiers (URIs).

REST is primarily a set of principles for using web standards to invoke operations acting on remote resources. 

It also encourages you to rethink how you structure server-side components/resources.

SOAP

By the end of the 1990s, loose-coupling had become the mantra of software architects.

Microsoft deprecated the constraints of connecting distributed objects using object request brokers like DCOM.

Instead, they advocated service-oriented architecture (SOA) as a more loosely-coupled kind of modular design and integration style.

Their core ideas might be distilled as:

·         Clients send data in XML documents

·         Clients invoke operations they find in WSDL-defined interfaces

·         Clients use the protocol called SOAP over standard internet protocols, usually HTTP.

 

SOAP originally stood for Simple Object Access Protocol, but people observed that it was neither simple nor object-oriented.

The SOAP standard is now maintained by the W3C XML Protocol Working Group.

Since version 1.2, SOAP no longer stands for anything; it is merely a name.

Defining an interface using the Web Services Description Language

At the same time, Microsoft (with IBM and Ariba) introduced an interface definition language called WSDL.

A WSDL-defined interface has two parts.

Logical or abstract part:

·         The data types used in request and reply messages, defined in an XML schema

·         The signatures of operations (procedures or methods), each composed of name, request and reply messages

Physical or concrete part:

·         The end point addresses (URIs) where the operations can be found

·         The protocols used to access the operations at those addresses.

What is SOAP for?

SOAP defines how to code a remote procedure call using an XML message. 

It assumes client components access remote server components by sending XML messages using internet protocols.

This table shows the format of a remote procedure call in SOAP.

 

| Element | Description | Required |
| --- | --- | --- |
| Envelope | Identifies the XML document carried in the message | Yes |
| Header | Contains header information | No |
| Body | Contains call and response information | Yes |
| Fault | Provides information about errors that occurred while processing the message | No |

 

Typically, SOAP messages are carried over HTTP, but SMTP or JMS can be used instead.

A benefit of using HTTP is that it allows SOAP to tunnel through firewalls and proxies encapsulated in the HTTP traffic.
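The message format tabulated above can be sketched with Python's standard XML library.

This is a minimal illustration, not a full SOAP stack: the operation GetOrderStatus, its parameters and the namespace http://example.com/orders are hypothetical, and the SOAP 1.2 envelope namespace is assumed.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
APP_NS = "http://example.com/orders"  # hypothetical application namespace


def build_request(operation, params, header_fields=None):
    """Build a SOAP request: Envelope > optional Header > Body > operation."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    if header_fields:  # the Header element is optional
        header = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Header")
        for name, value in header_fields.items():
            ET.SubElement(header, f"{{{APP_NS}}}{name}").text = str(value)
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")  # the Body is required
    call = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.split("}", 1)[1]


def read_call(xml_text):
    """Return (operation name, parameters) from the Body of a request."""
    envelope = ET.fromstring(xml_text)
    call = envelope.find(f"{{{SOAP_ENV}}}Body")[0]
    return local_name(call.tag), {local_name(c.tag): c.text for c in call}


message = build_request("GetOrderStatus", {"orderId": 17}, {"requestId": "abc"})
```

Note that the operation name (the verb) travels inside the Body, which is why one endpoint can serve many operations.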

Related standards

SOAP is sometimes referred to as WS-SOAP, where WS = Web Services.

Related standards define extensions for security, service location and reliable messaging.

WS-* denotes SOAP used with one or more of these extensions, such as WS-Security or WS-ReliableMessaging.

Service registries and catalogues

Microsoft originally proposed listing services in a registry called UDDI, along with links to WSDL documents and other metadata.

The idea was that clients can find services in the registry and get more information about them from a service catalogue.

In practice UDDI is not widely used; people just use the web or an internal wiki to find what they want.

REST (Representational State Transfer)

Roy T. Fielding, in his Ph.D. thesis, formalised the ideas behind web protocols and invented the term REST.

Representational State Transfer (REST) means a server component represents its state in messages using internet-friendly text data formats, which can include hyperlinks.

 

REST supports the general principles of SOA with more specific guidance.

It is a set of principles for using web standards to invoke operations acting on remote resources. 

It suggests ways to modularise and integrate application components using web standards.

It encourages you to rethink how you structure server-side components/resources.

It takes advantage of hyperlinks and the web protocols used over ubiquitous TCP/IP networks.

 

REST feels less like asking a servant to do something for you and more like asking a servant to act on a remote resource.

A resource is anything that can be named, that is, identified by one or more Uniform Resource Identifiers (URIs), for example:

·         a document or image,

·         a temporal service (e.g. "today's weather in Los Angeles")

·         a collection of other resources

·         a chunk of related information, such as a user profile

·         a collection of updates (activities)

·         a globally unique identifier (GUID)

·         a non-virtual object (e.g. a person), and so on.

 

Designers commonly implement REST using HTTP and URIs.

But REST is an abstract style that can be implemented using other technologies and in various ways.

 

A so-called RESTful architecture contains RESTful client components.

Every resource of a software application (Web Service, web site, HTML page, XML document, printer, other physical device, etc.) is named as a distinct web resource.

A client component can only call a server component/resource using the operations available in a standard protocol - usually HTTP.

This decouples distributed components; it means a client component needs minimal information about the server resource.

 

A so-called REST-compliant architecture contains REST-compliant server components/resources.

A REST-compliant server component/resource can offer only the services named in an internet protocol.

Given there are fewer operations (verbs) per component, there must be more components (nouns).

One may have to divide a large data resource into many smallish elements (many nouns).

Clients must orchestrate or integrate those smallish resources.

 

This section continues with a brief discussion of REST principles, and a link to a further discussion.

Uniform interface

Identification of resources

·         Requests identify resources, usually using URIs

·         A server represents its state using languages like HTML, XML or JSON (none of which are the server's internal representation)

 

Manipulation of resources through these representations

·         When a client holds a representation of a resource, including any metadata attached, it has enough information to modify or delete the resource.

 

Self-descriptive messages

·         Each message includes enough information to describe how to process the message.

·         Responses also explicitly indicate whether they are cacheable or not.

 

Hypermedia as the engine of application state (HATEOAS).

·         Client applications can retrieve a resource representation containing hyperlinks, and use those links to access related resources.

·         Clients make state transitions only through actions that are dynamically identified within hypermedia by the server (e.g. by hyperlinks within hypertext).

·         A client does not assume that any particular action is available for any particular resource beyond those described in representations previously received from the server.
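The HATEOAS idea can be sketched as a client that knows one starting URI and discovers the rest from links in the representations it receives.

This is only an illustration: the in-memory RESOURCES dictionary stands in for a server, and the URIs and link names (payment, customer and so on) are hypothetical.

```python
# Hypothetical in-memory "server": each URI maps to a JSON-like representation.
RESOURCES = {
    "/orders/17": {
        "status": "unpaid",
        "links": {"self": "/orders/17", "payment": "/orders/17/payment",
                  "customer": "/customers/4"},
    },
    "/orders/17/payment": {"amount": 25, "links": {"order": "/orders/17"}},
    "/customers/4": {"name": "Ann", "links": {"orders": "/orders?customer=4"}},
}


def get(uri):
    """Stand-in for an HTTP GET; returns the representation of a resource."""
    return RESOURCES[uri]


def follow(representation, rel):
    """Follow a link by its relation name; fail if the server did not offer it."""
    links = representation.get("links", {})
    if rel not in links:
        raise LookupError(f"server offered no '{rel}' link")
    return get(links[rel])


order = get("/orders/17")           # the only URI the client knows in advance
payment = follow(order, "payment")  # discovered from the representation, not hard-coded
```

The client assumes no action is available unless the server has offered a link for it, so the server can change its URI structure without breaking the client.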

Link resources using URIs

We humans navigate around web pages by following hyperlinks, without having to remember the URIs.

So, RESTful software navigates around web resources by following hyperlinks.

A client component doesn’t have to know anything about the overall structure of the system it is a component in. 

It invokes operations on remote resources without knowing what machines they are hosted on.

You could say the name space for an application designed to REST principles is the whole internet.

Use only the operations in a standard web protocol

 

RESTful clients call services using only internet protocol operations.

A client can access any and every resource using that same general interface.

It uses a resource URI, and one of the operation names defined in a standard internet protocol.

 

REST-compliant server components

RESTful clients can parameterise a request so that one operation name invokes several different operations on a server component.

But REST-compliant servers perform a limited range of operations, each corresponding to an internet protocol operation.

 

Interface definition?

REST simplifies programming by limiting an API to the operation names of a standard protocol.

But what do those operation names mean?

The trouble is that Roy Fielding did not prescribe the meanings of internet protocol operations.

This creates the problem that different programmers use them differently.

 

Some say a WADL (the equivalent of SOAP's WSDL) should be used to describe a REST web service.

Some say a WADL is helpful for development and for testing, such as loading all the service resources into SoapUI (which supports REST testing).

Some say WADL is vapourware.

Some say WADL is not needed if the system is fully RESTful and REST-compliant.

They equate PUT, GET, POST and DELETE operations in HTTP to CRUD operations on data records, but that is an oversimplification.

 

Surely, a development team, if not an enterprise, needs a standard?

Two different standards are tabulated below.

 

| HTTP interface operation | Uniform meaning |
| --- | --- |
| GET | Read/retrieve a representation of the identified resource |
| PUT | Create a newly identified resource OR update an existing identified resource (replace the previous representation) |
| POST | Add a new resource subordinate to an identified other/parent resource |
| DELETE | Delete the identified resource |
| HEAD | Get metadata about the identified resource |

 

What if the behaviour of a server component depends on whether the data resource is a collection or an item?

And notice that the PUT and POST operations have different meanings in the second table below.

 

| HTTP operation | The resource is a collection, e.g. http://example.com/resourcelist | The resource is an item or element, e.g. http://example.com/resources/item17 |
| --- | --- | --- |
| GET (read) | List the URIs and perhaps other details of the collection's members. | Retrieve a representation of the addressed member of the collection, expressed in an appropriate Internet media type. |
| PUT (update/create) | Replace the entire collection with another collection. | Replace the addressed member of the collection, or if it doesn't exist, create it. |
| POST (create) | Create a new entry in the collection. The new entry's URI is assigned automatically and is usually returned by the operation. | Treat the addressed member as a collection in its own right and create a new entry in it. Not generally used, because it is not idempotent. |
| DELETE (delete) | Delete the entire collection. | Delete the addressed member of the collection. |
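The collection/item semantics in the second table can be sketched as a small in-memory store.

This is one possible reading of the table, not a standard: the class CollectionStore and its id scheme (item1, item2, ...) are hypothetical, and the rarely used case of POST to an item is left out.

```python
import itertools


class CollectionStore:
    """In-memory sketch of the collection/item semantics tabulated above."""

    def __init__(self):
        self.items = {}                  # item id -> representation
        self._ids = itertools.count(1)   # server-assigned ids for POST

    def get(self, item_id=None):
        if item_id is None:              # collection: list the member ids
            return sorted(self.items)
        return self.items[item_id]       # item: return its representation

    def put(self, value, item_id=None):
        if item_id is None:              # collection: replace everything
            self.items = dict(value)
        else:                            # item: replace, or create if absent
            self.items[item_id] = value

    def post(self, value):
        new_id = f"item{next(self._ids)}"  # the server chooses the new URI
        self.items[new_id] = value
        return new_id

    def delete(self, item_id=None):
        if item_id is None:              # collection: delete all members
            self.items.clear()
        else:                            # item: delete one member
            self.items.pop(item_id, None)


store = CollectionStore()
first = store.post({"name": "a"})        # server assigns the id
store.put({"name": "b"}, "item17")       # client names the id
```

The difference between PUT and POST shows in who names the new resource: the client with PUT, the server with POST.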

 

Idempotency

An important aspect of REST is the concept that some operations (verbs) are idempotent. 

Adding 0 to a number is idempotent - the result is always the same regardless of the number of times you do it. 

The PUT method in REST is idempotent, which means the request can be executed an arbitrary number of times, but the result will always be the same as if it had been executed only once.
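The difference between an idempotent and a non-idempotent operation can be shown with two small functions.

They are illustrative stand-ins for PUT and POST acting on a dictionary, not real HTTP calls; the key item17 and the id scheme are assumptions.

```python
def put(store, key, value):
    """PUT-like: set the identified resource to the given representation."""
    store[key] = value
    return store


def post(store, value):
    """POST-like: add a new subordinate resource every time it is called."""
    store[f"item{len(store) + 1}"] = value
    return store


# Repeating the PUT leaves the same state as doing it once.
once = put({}, "item17", "blue")
thrice = put(put(put({}, "item17", "blue"), "item17", "blue"), "item17", "blue")

# Repeating the POST creates a second resource.
posted_once = post({}, "blue")
posted_twice = post(post({}, "blue"), "blue")
```

Here once and thrice are equal, while posted_twice holds one more resource than posted_once.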

Flexibility of representation (data/media type)

Unlike SOAP, there are no formally defined data exchange standards for REST.

XML is used, but JSON is more popular as it is easily human readable and it parses very quickly.

 

Content negotiation

This means that client and service agree on the data format for message content (JSON, XML, etc.).

For example, in Ruby on Rails 3.0 you’re able to use MIME types to request a representation of a resource in different formats.
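Content negotiation over HTTP is usually driven by the Accept request header.

The sketch below assumes a server that can render JSON and a very simple XML, and that falls back to JSON for the wildcard type; the function names and the XML layout are hypothetical.

```python
import json


def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    choices = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        q = 1.0                                   # default quality value
        for field in fields[1:]:
            if field.startswith("q="):
                q = float(field[2:])
        choices.append((fields[0], q))
    return sorted(choices, key=lambda c: c[1], reverse=True)


def to_xml(resource):
    """Render a flat dictionary as a very simple XML document."""
    inner = "".join(f"<{k}>{v}</{k}>" for k, v in resource.items())
    return f"<resource>{inner}</resource>"


RENDERERS = {"application/json": json.dumps, "application/xml": to_xml}


def negotiate(accept_header, resource):
    """Return (media_type, body) for the best acceptable type we can render."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue                              # q=0 means "not acceptable"
        if media_type == "*/*":
            media_type = "application/json"       # assumed server default
        if media_type in RENDERERS:
            return media_type, RENDERERS[media_type](resource)
    return None, None  # a real server would answer 406 Not Acceptable
```

The same resource is returned in whichever supported format the client ranks highest.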

 

OData

REST was extended by the Open Data Protocol (OData), initiated by Microsoft in 2007 and standardised by OASIS in 2014.

This standard helps clients access any remote data server wrapped up behind an OData interface.

The remote data server exposes its data structure in the form of an XML schema.

OData defines best practices for clients to obtain that schema and use it to make RESTful invocations.

Communicate statelessly

REST mandates that:

·         A server holds no client context between requests.

·         Session state is held in the client, or in a database.

·         The client sends a request when it is ready to transition to a new state.

·         While requests are outstanding, the client is considered to be in transition.

·         The representation returned contains links the client may use to initiate a new state-transition.

 

A server component should not have to hold state for any of the clients it communicates with - beyond a single request.

The reasons are:

·         scalability: it is difficult to scale out server components that maintain client state.

·         loose-coupling: the client is not dependent on talking to the same server component in two consecutive requests
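Stateless communication can be sketched as a handler that receives everything it needs in each request.

In this illustration the client pages through a list; the item names, page size and request fields (offset, size) are assumptions.

```python
ITEMS = [f"item{n}" for n in range(1, 8)]   # shared data, e.g. held in a database


def handle_page_request(request):
    """Stateless handler: everything it needs arrives in the request itself."""
    offset = request.get("offset", 0)
    size = request.get("size", 3)
    response = {"items": ITEMS[offset:offset + size]}
    if offset + size < len(ITEMS):
        # The link carries the client's next state; the server remembers nothing.
        response["next"] = {"offset": offset + size, "size": size}
    return response


# Two "server instances" are two references to the same function,
# so consecutive requests can be routed to either of them.
server_a = server_b = handle_page_request

first = server_a({"offset": 0})
second = server_b(first["next"])
```

Because the position in the list travels with the request, the second request can go to a different server instance than the first.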

 

Read the paper "Further discussion of REST principles" for more about the five principles above.

Architectural properties of REST

 

Client–server separation of concerns

Servers and clients can be developed and replaced independently, as long as the interface between them is not altered.

 

Stateless servers

See above.

 

Cacheable responses

Responses must define themselves as cacheable, or not, to prevent clients from reusing stale or inappropriate data.

 

Layered system

A client cannot tell whether it is connected directly to the end server, or to an intermediary along the way.

Intermediary servers may improve system scalability by balancing load and providing shared caches, and may enforce security policies.

 

Code on demand (optional)

Servers can temporarily extend or customize the functionality of a client by the transfer of executable code (e.g. Java applets and client-side scripts such as JavaScript.)

SOAP or REST?

Is the PlayStation better than the Xbox?

SOAP and REST co-exist. 

Mandating either one or the other can create difficulties.

 

Brief side-by-side comparison

| REST | SOAP |
| --- | --- |
| REST exposes data resources (and operations on them) | SOAP exposes operations (procedures) |
| REST works with many data formats | SOAP encodes everything in XML |
| REST uses several HTTP verbs (GET, PUT, POST etc.) | SOAP uses only POST |
| REST assumes peer-to-peer interactions (though often client-server in practice) | SOAP assumes RPC-style client-server interactions |
| REST recommends stateless operations | SOAP supports both stateless and stateful operations |
| REST has many endpoints (nouns) with a few standard operations (verbs) | SOAP has few endpoints (nouns) with many operations (verbs) |
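The nouns-and-verbs contrast can be sketched by expressing the same five operations in both styles.

The order-handling operations, the endpoint /OrderService and the URI templates are hypothetical, chosen only to show the shape of each style.

```python
# Hypothetical order-handling interface, used only to illustrate the contrast.
SOAP_ENDPOINT = "/OrderService"

# Each SOAP-style operation name mapped to a REST-style (verb, URI template).
OPERATION_TO_RESOURCE = {
    "CreateOrder": ("POST", "/orders"),
    "GetOrder": ("GET", "/orders/{order_id}"),
    "CancelOrder": ("DELETE", "/orders/{order_id}"),
    "GetPayment": ("GET", "/orders/{order_id}/payment"),
    "MakePayment": ("PUT", "/orders/{order_id}/payment"),
}


def soap_style(operation, **params):
    """One endpoint, always POST; the operation name travels in the message."""
    return "POST", SOAP_ENDPOINT, {"operation": operation, **params}


def rest_style(operation, **params):
    """Many URIs, few verbs; the noun is the URI and the verb is HTTP's own."""
    verb, template = OPERATION_TO_RESOURCE[operation]
    return verb, template.format(**params)


soap_calls = [soap_style(op, order_id=17) for op in OPERATION_TO_RESOURCE]
rest_calls = [rest_style(op, order_id=17) for op in OPERATION_TO_RESOURCE]
```

The SOAP-style calls all POST to one endpoint, while the REST-style calls spread over three URIs using four standard verbs.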

 

How to choose between REST or SOAP?

It depends on the use case, which clients prefer and what skills you have.

A rule of thumb: unless you have a clear reason to use SOAP, use REST.

It is best to keep architectural options available, unless there's a compelling reason not to.

(In 2006 Google deprecated SOAP in favour of REST.)

 

 

| | REST | SOAP |
| --- | --- | --- |
| | REST is more web friendly. You can do most of what SOAP can, by hand, without established standards. | The WS-* standards stack is getting complex and has a steep learning curve. But there is strong tooling/IDE support. It supports strong typing. It may be preferred for security, ACID transactions and reliable messaging. |
| | REST is not prescriptive, so developers can fling stuff together. | SOAP is very prescriptive; if you don’t do it right, it won't work. |
| Speed | Typically similar to SOAP but it depends on the use case. A poorly coded REST service on world class infrastructure could be slower than a highly efficient SOAP service on a laptop. | Depends on code design and implementation, as well as servers and infrastructure. |
| Scalability | Good for scalability. Statelessness can enable simpler scaling of platforms. | Depends on code design and implementation, as well as servers and infrastructure. |
| Caching | Reads in REST can be cached. | In SOAP they cannot be cached. |
| Overhead | Minimal overhead on top of HTTP. | SOAP messages are encapsulated in SOAP headers and other WS-* things which must be parsed. |
| Network traffic | More chatty, more requests back and forward. | |
| Security | You can use HTTPS. But you can rarely guarantee an SSL tunnel from the client to the app server. SSL secures the message on the network, but it is usually decrypted to plain text at the web server. So, can you always trust the server-side environment / infrastructure? | WS-Security ensures security of the message through the outbound firewall to the process on the server handling the inbound SOAP requests. Arguably more secure, or at least more "enterprise level" security focused. Message-level security is in effect until the moment a message has to be in cleartext. That means a SOAP message can be routed around a network securely until it reaches its final destination; that's generally not possible with HTTPS. |
| ACID transactions | No call back, which can make transactions tricky. REST solves this using a “Transaction resource” concept. The server sends the “Transaction resource” back to the client after certain requests. The server component will continue processing. The client can poll for an update on their request using the “Transaction resource” ID. | WS-AtomicTransaction enables a two-phase commit. This often doesn’t make sense over the internet, but may in some enterprise scenarios. Generally, compensating transactions work well enough. E.g. payments from eBay to PayPal use compensating transactions instead of two-phase commit. |
| Reliable messaging | Use idempotency. Either use GET until the subsequent processing succeeds as defined by the client, or use PUT followed by a verification GET until both work. | WS-ReliableMessaging is not end-to-end reliable. It covers the transport element but problems might occur during application processing. Read <http://www.infoq.com/articles/no-reliable-messaging> for more information. |
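The “Transaction resource” idea in the table above can be sketched as a server that replies at once with a resource the client can poll.

The class OrderService, its method names and the status values are hypothetical; background processing is simulated by an explicit call.

```python
import itertools


class OrderService:
    """Hypothetical server that answers at once and finishes the work later."""

    def __init__(self):
        self.transactions = {}            # transaction id -> status
        self._ids = itertools.count(1)

    def submit_order(self, order):
        """Accept the request and return a transaction resource to poll."""
        tx_id = f"tx{next(self._ids)}"
        self.transactions[tx_id] = "pending"
        return {"id": tx_id, "status": "pending"}

    def process_pending(self):
        """Stand-in for the processing that continues after the reply."""
        for tx_id, status in self.transactions.items():
            if status == "pending":
                self.transactions[tx_id] = "completed"

    def get_transaction(self, tx_id):
        """GET the current representation of a transaction resource."""
        return {"id": tx_id, "status": self.transactions[tx_id]}


def poll(service, tx_id, attempts=5):
    """Client side: poll the transaction resource until it leaves 'pending'."""
    for _ in range(attempts):
        status = service.get_transaction(tx_id)["status"]
        if status != "pending":
            return status
        service.process_pending()   # simulated; a real server works on its own
    return "pending"
```

The client needs only the transaction ID to learn the outcome, so the server never has to call the client back.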

Idempotence in REST

Source <http://stackoverflow.com/questions/1077412/what-is-an-idempotent-operation>.

"Idempotence plays an important role in REST.

If you GET a representation of a REST resource (eg, GET a jpeg image from Flickr), and the operation fails, you can just repeat the GET again and again until the operation succeeds.

To the web service, it doesn't matter how many times the image is 'gotten'.

Likewise, if you use a RESTful web service to update your Twitter account information, you can PUT the new information as many times as it takes in order to get confirmation from the web service.

PUT-ing it a thousand times is the same as PUT-ing it once.

Similarly DELETE-ing a REST resource a thousand times is the same as deleting it once.

Idempotence thus makes it a lot easier to construct a web service that's resilient to communication errors."
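The retry behaviour described in the quotation can be sketched as follows.

The class FlakyServer is a hypothetical stand-in for a web service whose confirmations are sometimes lost on the network, even though each update is applied.

```python
class FlakyServer:
    """Hypothetical server whose replies are lost a fixed number of times."""

    def __init__(self, lost_replies):
        self.lost_replies = lost_replies
        self.profile = {}
        self.requests_seen = 0

    def put_profile(self, representation):
        self.requests_seen += 1
        self.profile = dict(representation)   # the update is applied every time
        if self.lost_replies > 0:
            self.lost_replies -= 1
            raise ConnectionError("reply lost on the network")
        return "200 OK"


def put_with_retry(server, representation, max_attempts=5):
    """Repeat an idempotent PUT until a confirmation arrives."""
    for _ in range(max_attempts):
        try:
            return server.put_profile(representation)
        except ConnectionError:
            continue   # safe to resend: repeating a PUT does not change the result
    raise ConnectionError("no confirmation after retries")


server = FlakyServer(lost_replies=2)
result = put_with_retry(server, {"name": "Ann"})
```

The server processes the request three times, yet its state is the same as if it had processed it once.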

Compensating transactions

"The steps in a compensating transaction must undo the effects of the steps in the original operation.

A compensating transaction might not be able to simply replace the current state with the state the system was in at the start of the operation because this approach could overwrite changes made by other concurrent instances of an application.

Rather, it must be an intelligent process that takes into account any work done by concurrent instances.

This process will usually be application-specific, driven by the nature of the work performed by the original operation.

 

A common approach to implementing an eventually consistent operation that requires compensation is to use a workflow.

As the original operation proceeds, the system records information about each step and how the work performed by that step can be undone.

If the operation fails at any point, the workflow rewinds back through the steps it has completed and performs the work that reverses each step.

Note that a compensating transaction might not have to undo the work in the exact mirror-opposite order of the original operation, and it may be possible to perform some of the undo steps in parallel."
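The workflow approach in the quotation can be sketched as a list of steps, each paired with the step that undoes it.

The order steps (reserve stock, take payment, ship) and the state fields are hypothetical; this sketch undoes completed steps in reverse order, which is one option among those the quotation allows.

```python
def run_workflow(steps, state):
    """Run (action, compensation) pairs; on failure, undo the completed steps."""
    completed = []
    try:
        for action, compensation in steps:
            action(state)
            completed.append(compensation)   # record how to undo this step
    except Exception:
        for compensation in reversed(completed):
            compensation(state)              # reverse the work; do not restore a snapshot
        return False
    return True


def reserve_stock(state):
    state["stock"] -= 1


def release_stock(state):
    state["stock"] += 1


def take_payment(state):
    state["balance"] -= 10


def refund_payment(state):
    state["balance"] += 10


def ship(state):
    raise RuntimeError("courier unavailable")   # simulated failure


state = {"stock": 5, "balance": 100}
steps = [(reserve_stock, release_stock),
         (take_payment, refund_payment),
         (ship, lambda s: None)]
ok = run_workflow(steps, state)
```

Each compensation adjusts the current values rather than overwriting them with a saved copy, so changes made by concurrent work are not lost.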

Questions

How to convert from SOAP to REST?

First, why do you want to do it?

OK, experienced developers don’t like XML, too much syntax, too much overhead.   

But that is a weak reason to replace SOAP and XML messages by REST and JSON messages.

It is unwise to convert from SOAP to REST without rethinking the architecture.

Does the client depend on SOAP extensions for security, transaction management or reliable messaging?

Do you need to restructure the system to achieve the same using RESTful clients?

Should you restructure the server-side into REST-compliant servers?

It is easy for developers to cobble a system together without following good practices and standards.

If you want a good REST system architecture, seek advice from a guru/teacher/expert.

If the SOAP / REST choice doesn’t matter, then what does?

Think carefully about the server component.

1.        How will the client access the component?

2.        What is the component’s interface - the contracted behaviours?

3.        How will the component recognise failures, handle them and communicate them back to the client?

4.        Will the service quality – notably availability and speed – be good enough?

5.        Can the component be developed quickly? Is there a sandbox/dev environment for it? Is the contract intuitive, so we don’t need to read a heap of documentation?

6.        Can the component complete a business process without requiring the user to do more?

 

E.g. Don’t just submit an order leaving the user to phone up, or get email notifications of failures, or chase up to check the status of the order.

What is "pure REST"?

That is debatable, but a purely RESTful API is saying to any client:

“To work with the data please send the proper HTTP verb to the URIs in our online docs and specify whether you want JSON or XML”.

A purely RESTful API will:

 

·         Honour HTTP verb semantics - HTTP GET, PUT, POST and DELETE.

·         Support HATEOAS - to prevent tight coupling between the client and the service, truly RESTful APIs provide a discovery-based API. Most of today’s APIs do not honour this.

·         Utilize HTTP status codes - there are over 70 HTTP status codes; how many does your service handle?

·         Have self-descriptive messages - where are custom or specific formats declared? (Hint: in the HTTP header.)

·         Use hypermedia-aware media types - HTML, XHTML, Atom, SVG (usually ignored; most APIs use plain JSON / XML).

·         Have no version number - in pure REST, version numbers shouldn’t be needed. Most APIs use them though, e.g. /v1/payment.

·         Not use static URIs - most current APIs document precise URIs and return types.

Links and frameworks

Links

Drawing on http://docs.oracle.com/javaee/6/tutorial/doc/giqsx.html

Resource oriented architecture on Wikipedia - http://en.wikipedia.org/wiki/Resource-oriented_architecture

Common REST Design Pattern - http://architects.dzone.com/news/common-rest-design-pattern

PayPal's HATEOAS-compliant REST API - https://developer.paypal.com/docs/integration/direct/paypal-rest-payment-hateoas-links/

SOAP message structure - http://kb.roguewave.com/kb/?View=entry&EntryID=1410&Msg=

WSDL - http://www.practicingsafetechs.com/TechsV1/WSDL/

HATEOAS - http://timelessrepo.com/haters-gonna-hateoas

SOAP vs REST - http://seanmehan.globat.com/blog/2011/06/17/soap-vs-rest/

A declarative, data-retrieval and aggregation gateway for quickly consuming HTTP APIs - http://ql.io

Frameworks

RESTful Web Services discusses many software frameworks which provide some or many features of the ROA.

These include: /db <http://www.slashdb.com/>  - constructs resource oriented architecture from relational databases

 

·         Django <http://en.wikipedia.org/wiki/Django_(web_framework)> 

·         TurboGears <http://en.wikipedia.org/wiki/TurboGears> 

·         Flask <http://flask.pocoo.org/> 

·         EverRest <http://code.google.com/p/everrest> 

·         JBoss RESTEasy <http://www.jboss.org/resteasy> 

·         JBoss Seam <http://en.wikipedia.org/wiki/JBoss_Seam> 

·         Apache Wink <http://incubator.apache.org/wink> 

·         Jersey <http://en.wikipedia.org/wiki/Project_Jersey> 

·         NetKernel <http://en.wikipedia.org/wiki/NetKernel>

·         Recess <http://www.recessframework.org/> 

·         Restlet <http://en.wikipedia.org/wiki/Restlet> 

·         Ruby on Rails <http://en.wikipedia.org/wiki/Ruby_on_Rails> 

·         Symfony <http://en.wikipedia.org/wiki/Symfony> 

·         Yii2 <http://www.yiiframework.com/>