XML Introduction, Laurea Magistrale in Informatica (Master's Degree in Computer Science)
Chapter 01
Course module
Technologies for Innovation
Agenda
What is ……
Ten points for XML
History and Evolution
Technologies for adding functionality
XML Family
XML Application Areas
Electronic Data Interchange
XML: what it is
The Extensible Markup Language (XML) is a general-purpose specification for creating custom markup languages.
A markup language is an artificial language that uses a set of annotations to text; the annotations give instructions about how the text is to be structured or displayed.
A well-known example of a markup language in computing is the HyperText Markup Language (HTML).
XML is classified as an extensible language because it allows its users to define their own elements.
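A minimal sketch of what "defining your own elements" can look like in practice, using Python's standard `xml.etree.ElementTree` module. The vocabulary (`order`, `item`, `name`, `price`) and the values are invented for this illustration; XML itself predefines none of them.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: none of these element names is predefined by XML.
order = ET.Element("order", attrib={"id": "A-17"})
item = ET.SubElement(order, "item")
ET.SubElement(item, "name").text = "Espresso cup"
ET.SubElement(item, "price", attrib={"currency": "EUR"}).text = "4.50"

# Serialize the tree to a text string.
xml_text = ET.tostring(order, encoding="unicode")
print(xml_text)
```

The element names carry no built-in meaning; any program that reads this document has to agree with its author on what `order` and `price` stand for.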
XML: what it is
XML is a metalanguage that makes it possible to define markup languages syntactically
it defines a set of (meta)syntactic rules through which a markup language, called an XML application, can be formally described
every XML application inherits a set of common syntactic features from XML
every XML application in turn defines its own specific formal syntax
XML makes the structure (or structures) of a document explicit in a formal way, by means of markers (markup) embedded in the text (character data)
Markup represents the logical structure of the document
Markup is distinguished from the rest of the text because it is enclosed between delimiters; informally:
&yyyy;
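As a small illustration of markup versus character data, the sketch below parses a tiny document with Python's standard `xml.etree.ElementTree`. It assumes the two delimiter forms in question are tags between `<` and `>` and entity references between `&` and `;`; the `note` document is invented for the example.

```python
import xml.etree.ElementTree as ET

# Invented sample document: tags and one entity reference around character data.
document = "<note><to>Ada</to><body>Tea &amp; biscuits at 5</body></note>"
root = ET.fromstring(document)

# Tags delimited by '<' and '>' become elements (the logical structure).
structure = [child.tag for child in root]

# The entity reference '&amp;' is markup too; the parser resolves it to '&'.
body_text = root.find("body").text

print(structure)
print(body_text)
```

Everything outside the delimiters reaches the application as character data, while the delimited parts are consumed by the parser to build the structure.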
XML in 10 Points http://www.w3.org/XML/1999/XML-in-10-points.html
XML is for structuring data
XML documents reflect the structure of the data that they contain. For example, if the document were a book, it might contain chapter elements, which would in turn contain section or paragraph elements, and so on.
XML is a set of rules (you may also think of them as guidelines or conventions) for designing text formats that let you structure your data.
XML makes it easy for a computer to generate data, read data, and ensure that the data structure is unambiguous.
XML avoids common pitfalls in language design: it is extensible, platform-independent, and it supports internationalization and localization. XML is also fully Unicode-compliant.
XML looks a bit like HTML
Like HTML, XML makes use of tags (words bracketed by '<' and '>') and attributes (of the form name="value").
While HTML specifies what each tag and attribute means, and often how the text between them will look in a browser, XML uses the tags only to delimit pieces of data, and leaves the interpretation of the data completely to the application that reads it.
In other words, if you see "<p>" in an XML file, do not assume it is a paragraph. Depending on the context, it may be a price, a parameter, a person, a p... (and who says it has to be a word with a "p"?).
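To make the point concrete, here is a sketch of a hypothetical application that chooses to read `p` elements as prices. The `catalog` document and the interpretation are assumptions made for this example; nothing in XML assigns that meaning.

```python
import xml.etree.ElementTree as ET

# Invented document: XML only says that <p> delimits a piece of data.
catalog = ET.fromstring("<catalog><p>19.90</p><p>5.10</p></catalog>")

# This hypothetical application decides that <p> means "price".
total = sum(float(p.text) for p in catalog.findall("p"))

print(f"{total:.2f}")  # prints 25.00
```

A different application could read the same tag as a paragraph or a person; the parser delivers the same tree in every case and leaves the interpretation to the code that consumes it.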
XML is text, but isn't meant to be read
Although XML is verbose and is plain text (Unicode rather than only ASCII), it is still designed primarily to be used by automated systems, not necessarily read by humans.
Like HTML, XML files are text files that people shouldn't have to read, but may when the need arises.
Compared to HTML, the rules for XML files allow fewer variations. A forgotten tag, or an attribute without quotes makes an XML file unusable, while in HTML such practice is often explicitly allowed.
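The strictness described above can be observed with any conforming parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` and the sample strings are invented for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Invented samples: one correct document and the two mistakes named above.
samples = {
    "ok": '<p class="intro">Hello</p>',
    "missing end tag": '<p class="intro">Hello',
    "unquoted attribute": "<p class=intro>Hello</p>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, accepted in results.items():
    print(f"{label}: {accepted}")
```

Only the first sample is accepted; the parser rejects the other two outright rather than trying to guess what the author meant, which is the behaviour that differs from lenient HTML processing.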
XML is verbose by design
Since XML is a text format and it uses tags to delimit the data, XML files are nearly always larger than comparable binary formats.
That was a conscious decision by the designers of XML. The advantages of a text format are evident, and the disadvantages can usually be compensated at a different level.
Disk space is less expensive than it used to be, and compression programs like zip and gzip can compress files very well and very fast.
In addition, communication protocols such as modem protocols and HTTP/1.1, the core protocol of the Web, can compress data on the fly, saving bandwidth as effectively as a binary format.
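A rough sketch of how well repetitive markup compresses, using Python's standard `gzip` module. The `readings` document is generated purely for this illustration, and the exact sizes depend on the data, so no specific ratio is claimed.

```python
import gzip

# Build an invented, repetitive XML document with 500 similar records.
rows = "".join(
    f"<reading><sensor>s{i % 4}</sensor><value>{i}</value></reading>"
    for i in range(500)
)
xml_bytes = f"<readings>{rows}</readings>".encode("utf-8")

# Compress and decompress to show that the round trip is lossless.
compressed = gzip.compress(xml_bytes)
restored = gzip.decompress(compressed)

print(len(xml_bytes), len(compressed))
```

Because the same tag names recur in every record, the compressor removes most of the redundancy that makes the text form verbose.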
XML is a family of technologies
The core of XML is the XML 1.0 recommendation. Beyond XML 1.0, "the XML family" is a growing set of modules that offer useful services to accomplish important and frequently demanded tasks.
XLink describes a standard way to add hyperlinks to an XML file.
XPointer is a syntax for pointing to parts of an XML document. An XPointer is a bit like a URL, but instead of pointing to documents on the Web, it points to pieces of data inside an XML file.
CSS, the style sheet language, is applicable to XML as it is to HTML.
XSL is the advanced language for expressing style sheets. It is based on XSLT, a transformation language used for rearranging, adding and deleting tags and attributes.
The DOM is a standard set of function calls for manipulating XML (and HTML) files from a programming language.
XML Schema Part 1 (Structures) and Part 2 (Datatypes) help developers to precisely define the structures of their own XML-based formats.
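Of the family members listed above, the DOM is the easiest to show in a few lines. The sketch below uses `xml.dom.minidom` from the Python standard library, which implements a subset of the DOM interface; the `library` document and the book titles are invented for the example.

```python
from xml.dom import minidom

# Parse an invented document into a DOM tree.
dom = minidom.parseString("<library><book>XML Basics</book></library>")
library = dom.documentElement

# createElement, createTextNode, setAttribute and appendChild are DOM calls.
book = dom.createElement("book")
book.setAttribute("lang", "en")
book.appendChild(dom.createTextNode("Learning XSLT"))
library.appendChild(book)

# Read the tree back through the same interface.
titles = [b.firstChild.data for b in dom.getElementsByTagName("book")]
serialized = library.toxml()

print(titles)
print(serialized)
```

The same method names exist in DOM implementations for other languages, which is the sense in which the DOM is a standard set of function calls.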
XML is new, but not that new
Development of XML started in 1996 and it has been a W3C Recommendation since February 1998, which may make you suspect that this is rather immature technology.
In fact, the technology isn't very new. Before XML there was SGML, developed in the early '80s, an ISO standard since 1986, and widely used for large documentation projects.
The designers of XML simply took the best parts of SGML, guided by the experience with HTML, and produced something that is no less powerful than SGML, and vastly more regular and simple to use.
XML leads HTML to XHTML
There is an important XML application that is a document format: W3C's XHTML, the successor to HTML. XHTML has many of the same elements as HTML.
The syntax has been changed slightly to conform to the rules of XML. A format that is "XML-based" inherits the syntax from XML and restricts it in certain ways (e.g., XHTML allows "<p>", but not "<r>"); it also adds meaning to that syntax (XHTML says that "<p>" stands for "paragraph", and not for "price", "person", or anything else).
XML is modular
Using XML, you can define vocabularies that are designed to be reused.
By creating DTDs or XML Schemas, you can create sets of documents that are all based on common vocabularies.
Similarly, using XML Namespaces, you can publish and share those vocabularies without conflicts.
Since two formats developed independently may have elements or attributes with the same name, care must be taken when combining those formats (does "<p>" mean "paragraph" from this format or "person" from that one?).
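A sketch of how XML Namespaces keep two `p` elements apart, using Python's standard `xml.etree.ElementTree`. Both namespace URIs, the prefixes `doc` and `hr`, and the `report` document are made up for this example.

```python
import xml.etree.ElementTree as ET

# Both namespace URIs are hypothetical and exist only for this example.
doc = """
<report xmlns:doc="http://example.org/ns/document"
        xmlns:hr="http://example.org/ns/staff">
  <doc:p>Quarterly summary.</doc:p>
  <hr:p>Grace Hopper</hr:p>
</report>
"""
root = ET.fromstring(doc.strip())

# Map prefixes to URIs so that searches can name the intended vocabulary.
ns = {
    "doc": "http://example.org/ns/document",
    "hr": "http://example.org/ns/staff",
}
paragraphs = [e.text for e in root.findall("doc:p", ns)]
people = [e.text for e in root.findall("hr:p", ns)]

print(paragraphs)
print(people)
```

The two elements share the local name `p`, but the namespace URI makes each one unambiguous, so a program can ask for paragraphs or for persons without mixing them up.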
XML is the basis for RDF and the Semantic Web
RDF, or the Resource Description Framework, and the Semantic Web are both initiatives of the W3C to help refine the way information is organized on the Web.
XML is the basis of these technologies, and will help organize the information on the Web, making it easier for users to find and access the information they need.
XML is license-free, platform-independent and well-supported
XML is not owned by any corporation, nor is it controlled by a corporation.
It is a publication of the W3C, and as such, it can be used freely by anyone.
Although some may have issues with the W3C process, or with what ends up in the final Recommendations, the bottom line is that it makes XML a fairly open standard. (An open standard is a standard that is publicly available and has various rights of use associated with it.)
References in Italian
XML in 10 punti (XML in 10 points)
This 10-point summary tries to gather some basic concepts that let the newcomer see a little light through the fog. By Andrea Benassi, 26 November 2003
http://www.indire.it/content/index.php?action=read&id=313
XML and W3C
XML is recommended by the World Wide Web Consortium (W3C).
The recommendation specifies both the lexical grammar and the requirements for parsing.
Lexical grammar: the rules governing how a character sequence is divided into subsequences of characters, each of which represents an individual token.
Parsing, or more formally syntactic analysis, is the process of analyzing a sequence of tokens to determine their grammatical structure with respect to a given (more or less) formal grammar.
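The two steps can be separated in a toy sketch. The code below is not the XML grammar: it assumes a tiny subset (tags without attributes, plain text) and uses an invented regular expression and helper names, only to show tokens being produced first and their structure being checked afterwards.

```python
import re

# Toy lexical rule: an end tag, a start tag, or a run of non-'<' characters.
TOKEN = re.compile(r"</(\w+)>|<(\w+)>|([^<]+)")

def tokenize(text):
    """Lexical step: split the characters into (kind, value) tokens."""
    tokens = []
    for match in TOKEN.finditer(text):
        close, open_, chars = match.groups()
        if close:
            tokens.append(("END", close))
        elif open_:
            tokens.append(("START", open_))
        else:
            tokens.append(("TEXT", chars))
    return tokens

def is_properly_nested(tokens):
    """Parsing step: check the token sequence against the nesting rule."""
    stack = []
    for kind, value in tokens:
        if kind == "START":
            stack.append(value)
        elif kind == "END":
            if not stack or stack.pop() != value:
                return False
    return not stack

tokens = tokenize("<a><b>hi</b></a>")
print(tokens)
print(is_properly_nested(tokens))
print(is_properly_nested(tokenize("<a><b>hi</a></b>")))
```

The tokenizer knows nothing about nesting, and the nesting check never looks at individual characters; a real XML processor has many more token kinds and grammar rules, but it keeps the same division of labour.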
History
XML started as a simplified subset of the Standard Generalized Markup Language (SGML).
The versatility of SGML for dynamic information display was understood by early digital media publishers in the late 1980s prior to the rise of the Internet.
By the mid-1990s some practitioners of SGML had gained experience with the World Wide Web, and believed that SGML offered solutions to some of the problems the Web was likely to face as it grew.
Dan Connolly added SGML to the list of W3C's activities when he joined the staff in 1995; work began in mid-1996 when Sun Microsystems engineer Jon Bosak developed a charter and recruited collaborators.
Evolution
XML was compiled by a working group of eleven members, supported by an (approximately) 150-member Interest Group. Technical debate took place on the Interest Group mailing list and issues were resolved by consensus or, when that failed, majority vote of the Working Group.
The XML Working Group never met face-to-face; the design was accomplished using a combination of email and weekly teleconferences. The major design decisions were reached in twenty weeks of intense work between July and November 1996, when the first Working Draft of an XML specification was published.
Further design work continued through 1997, and XML 1.0 became a W3C Recommendation on February 10, 1998.
Working Group's goals
Internet usability, general-purpose usability
SGML compatibility
Facilitation of easy development of processing software and minimization of optional features
Legibility, formality, conciseness, and ease of authoring.
Like its antecedent SGML, XML allows for some redundant syntactic constructs and includes repetition of element identifiers.
In these respects, terseness was not considered essential in its structure.
The name "XML" and the other names considered (curiosity)
"MAGMA" (Minimal Architecture for Generalized Markup Applications)
"SLIM" (Structured Language for Internet Markup)
"MGML" (Minimal Generalized Markup Language).
Comments