You're building a SOAP request, everything works in testing, and then a customer submits data containing an & or a < and the entire integration breaks:
System.Xml.XmlException: An error occurred while parsing EntityName. Line 5, position 23.
Or in Java:
org.xml.sax.SAXParseException: The entity name must immediately follow the '&' in the entity reference.
This is one of the most common SOAP errors in production. It happens because XML has five characters that have special meaning in the markup syntax, and if user-supplied data contains any of them, the XML becomes malformed. This guide explains exactly which characters cause problems, how to escape them correctly in every major language, and when to use CDATA sections or Base64 encoding instead.
The Five Special Characters
XML reserves five characters for its syntax. When these appear in text content or attribute values, the XML parser interprets them as markup instead of data:
| Character | Name | XML Entity | Problem |
|---|---|---|---|
| & | Ampersand | &amp; | Parser thinks it's the start of an entity reference |
| < | Less than | &lt; | Parser thinks it's the start of a tag |
| > | Greater than | &gt; | Can break tag parsing in some contexts |
| " | Double quote | &quot; | Breaks attribute values delimited by double quotes |
| ' | Single quote (apostrophe) | &apos; | Breaks attribute values delimited by single quotes |
Here's what a broken SOAP request looks like:
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<CreateCompany xmlns="http://example.com/api">
<!-- This is BROKEN — the & is not escaped -->
<CompanyName>Johnson & Sons</CompanyName>
<!-- This is BROKEN — the < is not escaped -->
<Description>Revenue < 1M</Description>
</CreateCompany>
</soap:Body>
</soap:Envelope>
And the corrected version:
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<CreateCompany xmlns="http://example.com/api">
<CompanyName>Johnson &amp; Sons</CompanyName>
<Description>Revenue &lt; 1M</Description>
</CreateCompany>
</soap:Body>
</soap:Envelope>
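A quick way to see the difference between the two versions is to run both forms through a parser. The sketch below uses Python's standard-library xml.etree.ElementTree rather than any SOAP toolkit. The helper name is_well_formed is made up for illustration.

```python
import xml.etree.ElementTree as ET

broken = "<CompanyName>Johnson & Sons</CompanyName>"
fixed = "<CompanyName>Johnson &amp; Sons</CompanyName>"

def is_well_formed(xml_text):
    """Return True if the standard-library parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(broken))      # False
print(is_well_formed(fixed))       # True
print(ET.fromstring(fixed).text)   # Johnson & Sons
```

The last line shows that the entity exists only on the wire: the receiving side gets the original ampersand back after parsing.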
Solution 1: XML Entity Escaping by Language
The correct fix is to escape special characters using XML entities. Here's how to do it in every major language:
Java
import javax.xml.stream.XMLStreamWriter;
import javax.xml.stream.XMLOutputFactory;
import java.io.StringWriter;
// Option 1: XMLStreamWriter (handles escaping automatically)
StringWriter sw = new StringWriter();
XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(sw);
writer.writeStartElement("CompanyName");
writer.writeCharacters("Johnson & Sons"); // Automatically escaped to &amp;
writer.writeEndElement();
writer.flush();
System.out.println(sw.toString());
// Output: <CompanyName>Johnson &amp; Sons</CompanyName>
// Option 2: Apache Commons Text
import org.apache.commons.text.StringEscapeUtils;
String escaped = StringEscapeUtils.escapeXml11("Johnson & Sons");
// Result: "Johnson &amp; Sons"
// Option 3: Manual replacement (not recommended but common)
public static String escapeXml(String input) {
    if (input == null) return null;
    return input
        .replace("&", "&amp;") // Must be first!
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace("\"", "&quot;")
        .replace("'", "&apos;");
}
Important: If you do manual replacement, & must be replaced first. Otherwise the & inside an entity you have already produced gets escaped again, and < ends up as &amp;lt; instead of &lt;.
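The ordering rule is independent of language. As a minimal sketch in Python, with two illustrative helper names that are not part of any library, here is what happens when the ampersand is handled last instead of first:

```python
def escape_wrong_order(text):
    # Bug: '&' is handled last, so the '&' in the freshly produced "&lt;" is escaped again.
    return text.replace("<", "&lt;").replace("&", "&amp;")

def escape_right_order(text):
    # '&' is handled first, so entities created by later steps are left alone.
    return text.replace("&", "&amp;").replace("<", "&lt;")

sample = "a < b"
print(escape_wrong_order(sample))  # a &amp;lt; b
print(escape_right_order(sample))  # a &lt; b
```

A receiver that parses the first output would see the literal text "a &lt; b" rather than "a < b", which is the classic symptom of double escaping.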
C# / .NET
using System;
using System.IO;
using System.Security;
using System.Xml;
// Option 1: SecurityElement.Escape (simplest)
string escaped = SecurityElement.Escape("Johnson & Sons");
// Result: "Johnson &amp; Sons"
// Option 2: XmlWriter (handles escaping automatically)
using var sw = new StringWriter();
using var writer = XmlWriter.Create(sw, new XmlWriterSettings { OmitXmlDeclaration = true });
writer.WriteStartElement("CompanyName");
writer.WriteString("Johnson & Sons"); // Auto-escaped
writer.WriteEndElement();
writer.Flush();
Console.WriteLine(sw.ToString());
// Output: <CompanyName>Johnson &amp; Sons</CompanyName>
// Option 3: XmlConvert.EncodeName is for element and attribute NAMES, not text content.
// It rewrites characters that are invalid in a name as _xHHHH_ instead of using entities.
string encoded = XmlConvert.EncodeName("value<with>special&chars");
// Result: "value_x003C_with_x003E_special_x0026_chars"
Python
import xml.sax.saxutils as saxutils
from lxml import etree
# Option 1: xml.sax.saxutils.escape
escaped = saxutils.escape("Johnson & Sons")
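# Result: "Johnson &amp; Sons"
# For attribute values (adds the surrounding quotes and handles embedded quotes)
escaped_attr = saxutils.quoteattr('He said "hello"')
# Result: 'He said "hello"' (wrapped in single quotes, so the inner double quotes stay as-is;
# &quot; is only used when the value contains both quote types)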
# Option 2: lxml (handles escaping automatically)
root = etree.Element("CompanyName")
root.text = "Johnson & Sons" # Auto-escaped when serialized
print(etree.tostring(root, encoding="unicode"))
# Output: <CompanyName>Johnson &amp; Sons</CompanyName>
PHP
<?php
// Option 1: htmlspecialchars with ENT_XML1
$escaped = htmlspecialchars("Johnson & Sons", ENT_XML1 | ENT_QUOTES, 'UTF-8');
// Result: "Johnson &amp; Sons"
// Option 2: DOMDocument (handles escaping automatically)
$doc = new DOMDocument('1.0', 'UTF-8');
$element = $doc->createElement('CompanyName');
$text = $doc->createTextNode('Johnson & Sons'); // Auto-escaped
$element->appendChild($text);
$doc->appendChild($element);
echo $doc->saveXML($element);
// Output: <CompanyName>Johnson &amp; Sons</CompanyName>
// WARNING: Do NOT use htmlentities() — it creates HTML entities
// that are invalid in XML (e.g., &nbsp; instead of &#160;)
JavaScript / Node.js
// Option 1: Manual escape function
function escapeXml(str) {
  return str
    .replace(/&/g, '&amp;') // Must be first
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}
console.log(escapeXml("Johnson & Sons"));
// Result: "Johnson &amp; Sons"
// Option 2: xmlbuilder2 (handles escaping automatically)
const { create } = require('xmlbuilder2');
const doc = create().ele('CompanyName').txt('Johnson & Sons').up();
console.log(doc.end({ headless: true }));
// Output: <CompanyName>Johnson &amp; Sons</CompanyName>
Solution 2: CDATA Sections
CDATA sections tell the XML parser to treat everything inside as literal text — no escaping needed. This is useful when the data contains many special characters or when you're embedding structured content like HTML.
<Description><![CDATA[Revenue < 1M & growing. Use <b>bold</b> for emphasis.]]></Description>
The parser reads the content between <![CDATA[ and ]]> as plain text. No entity escaping is applied.
When to use CDATA:
- The data contains many special characters (HTML content, code snippets, mathematical expressions).
- You want the raw text to be human-readable in the XML without entity clutter.
- The SOAP service's WSDL explicitly defines the element as accepting CDATA.
When NOT to use CDATA:
- The data might contain the literal string ]]> (which terminates the CDATA section).
- The SOAP service doesn't support CDATA in that element (some services strip it).
- You're building XML programmatically — use the language's XML library instead.
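If CDATA is still the right choice and the data might contain ]]>, a common workaround is to close the section in the middle of the terminator and immediately open a new one. The sketch below assumes you are assembling the XML text by hand. The helper name to_cdata is hypothetical, and the standard-library parser is used only to check the round trip.

```python
import xml.etree.ElementTree as ET

def to_cdata(text):
    """Wrap text in CDATA, splitting any ']]>' across two adjacent sections."""
    # "]]>" becomes "]]" + end of section + start of new section + ">"
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

payload = "if (a[b[0]]> 1 && x < y) { ... }"
xml_text = "<Description>" + to_cdata(payload) + "</Description>"

# The parser joins the adjacent sections back into the original text.
print(ET.fromstring(xml_text).text == payload)  # True
```

This only helps if the receiving service accepts CDATA in that element at all, so the earlier caveats still apply.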
Java CDATA example
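import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CDATASection;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.newDocument();
Element description = doc.createElement("Description");
CDATASection cdata = doc.createCDATASection("Revenue < 1M & growing");
description.appendChild(cdata);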
C# CDATA example
using var sw = new StringWriter();
using var writer = XmlWriter.Create(sw);
writer.WriteStartElement("Description");
writer.WriteCData("Revenue < 1M & growing");
writer.WriteEndElement();
Solution 3: Base64 Encoding for Binary or Unpredictable Data
When the data is binary (files, images) or contains truly unpredictable content (user input from global markets with various encodings), Base64 encoding sidesteps XML escaping entirely: its output uses only letters, digits, +, / and =, none of which are special in XML.
<FileContent>Sm9obnNvbiAmIFNvbnM=</FileContent>
import base64
raw_data = "Johnson & Sons <Special> \"Quoted\" Data"
encoded = base64.b64encode(raw_data.encode('utf-8')).decode('ascii')
# Result: "Sm9obnNvbiAmIFNvbnMgPFNwZWNpYWw+ICJRdW90ZWQiIERhdGE="
The receiving service must be designed to accept Base64-encoded data in that field. This approach is standard for SOAP services that accept file attachments via xsd:base64Binary.
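For completeness, here is a sketch of the receiving side, assuming the field carries UTF-8 text that was Base64-encoded as in the example above. It also checks that the encoded value contains none of the five reserved characters.

```python
import base64

encoded = "Sm9obnNvbiAmIFNvbnM="  # the <FileContent> value from the example above

# Decode back to bytes, then to text (assumes the sender encoded UTF-8).
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)  # Johnson & Sons

# The encoded form never needs XML escaping.
needs_escaping = any(ch in "&<>\"'" for ch in encoded)
print(needs_escaping)  # False
```

The trade-off is that the payload grows by roughly a third and is no longer readable in logs without decoding.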
Unicode and Encoding Issues
Beyond the five special characters, encoding mismatches cause a separate class of XmlException errors.
XmlException: Invalid character in the given encoding. Line 1, position 1.
Common causes:
- BOM (Byte Order Mark): The XML file has a UTF-8 BOM (EF BB BF) but the parser doesn't expect it.
- Encoding declaration mismatch: The XML declares encoding="UTF-8" but the actual bytes are ISO-8859-1 or Windows-1252.
- Control characters: Characters like \x00-\x08, \x0B, \x0C, \x0E-\x1F are invalid in XML 1.0.
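The first two causes can often be handled when the bytes are decoded. The following is a minimal sketch, not a general-purpose detector: the helper name decode_xml_bytes is hypothetical, and the Windows-1252 fallback is an assumption about where the legacy data comes from.

```python
import codecs

def decode_xml_bytes(raw, fallback="windows-1252"):
    """Decode bytes to str, dropping a UTF-8 BOM and falling back on a mismatch."""
    try:
        # 'utf-8-sig' behaves like UTF-8 but strips a leading EF BB BF if present.
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Assumption: bytes that are not valid UTF-8 are Windows-1252.
        return raw.decode(fallback)

with_bom = codecs.BOM_UTF8 + "<Name>Zoë</Name>".encode("utf-8")
legacy = "<Name>Zoë</Name>".encode("windows-1252")

print(decode_xml_bytes(with_bom))  # <Name>Zoë</Name>
print(decode_xml_bytes(legacy))    # <Name>Zoë</Name>
```

If the original text carries an encoding declaration, remember to make it match whatever encoding you use when you serialize the cleaned text again.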
Fix: Strip invalid XML characters
import re
def strip_invalid_xml_chars(text):
    """Remove control characters that are invalid in XML 1.0, plus DEL (0x7F), which is legal but discouraged."""
    return re.sub(
        r'[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]',
        '',
        text
    )
clean_data = strip_invalid_xml_chars(user_input)
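public static String stripInvalidXmlChars(String input) {
    if (input == null) return null;
    StringBuilder sb = new StringBuilder(input.length());
    // Iterate by code point so that characters above U+FFFF (e.g. emoji) are kept
    for (int i = 0; i < input.length(); ) {
        int cp = input.codePointAt(i);
        i += Character.charCount(cp);
        if (cp == 0x9 || cp == 0xA || cp == 0xD ||
            (cp >= 0x20 && cp <= 0xD7FF) ||
            (cp >= 0xE000 && cp <= 0xFFFD) ||
            (cp >= 0x10000 && cp <= 0x10FFFF)) {
            sb.appendCodePoint(cp);
        }
    }
    return sb.toString();
}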
Debugging Tips
When you hit an XmlException in production, these steps will identify the problem quickly:
1. Log the raw SOAP request before sending
# Use curl to send the exact XML and see the raw response
# (--data-binary sends the file byte-for-byte; -d would strip line breaks)
curl -v -X POST \
-H "Content-Type: text/xml; charset=utf-8" \
-H "SOAPAction: http://example.com/CreateCompany" \
--data-binary @request.xml \
https://example.com/service
2. Validate the XML before sending
from lxml import etree
xml_string = build_soap_request(user_data)
try:
    etree.fromstring(xml_string.encode('utf-8'))
    print("XML is well-formed")
except etree.XMLSyntaxError as e:
    print(f"Malformed XML at line {e.lineno}, column {e.offset}: {e.msg}")
3. Find the exact offending character
def find_problem_chars(text):
    """Find characters that will break XML."""
    problems = []
    for i, char in enumerate(text):
        if char in '&<>"\'':
            problems.append((i, char, "XML special character: use entity escape"))
        elif ord(char) < 0x20 and char not in '\t\n\r':
            problems.append((i, repr(char), "Invalid XML 1.0 control character"))
    return problems
issues = find_problem_chars(user_input)
for pos, char, reason in issues:
    print(f"Position {pos}: {char} — {reason}")
How SOAPless Helps
XML escaping errors are a category of bug that shouldn't exist if you're working with JSON. When you use SOAPless, you send JSON request bodies and receive JSON responses. SOAPless handles the JSON-to-XML conversion server-side, which means it automatically applies correct XML escaping to every value you send. Special characters in company names, descriptions, addresses, or any other field are properly escaped before they reach the SOAP service.
You never construct XML strings, never worry about CDATA vs entity escaping, and never debug XmlException stack traces. Your request is {"CompanyName": "Johnson & Sons"} and SOAPless takes care of the rest, including proper namespace handling, SOAP envelope construction, and encoding declaration. The dashboard lets you test with special characters before going live, so you can verify the behavior without writing any code.