schemas.v1.2.0.external.maec_4.1.metadataSharing.xsd
The Java bindings for STIX v.1.2.0.2
A schema for sharing data associated with malicious software.
Utility type for integers between 0 and 100. Used in field data for commonality and importance.
Utility type for a string not including a question mark (?) for uri objects.
Utility type for ip ranges, for example 111.112.113.0-111.112.113.100.
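The three utility types above can be approximated with simple checks. This is a minimal sketch, not the schema's own restriction patterns: the function names are hypothetical, and requiring both ends of a range to share one IP version is an assumption rather than something the text states.

```python
import ipaddress


def is_percent_int(value):
    """True for integers in the closed range 0..100 (commonality/importance)."""
    return isinstance(value, int) and not isinstance(value, bool) and 0 <= value <= 100


def is_uri_without_query(value):
    """True if the string contains no question mark, as required for uri objects."""
    return "?" not in value


def parse_ip_range(value):
    """Split 'start-end' into two address objects, e.g. 111.112.113.0-111.112.113.100."""
    start_text, sep, end_text = value.partition("-")
    if not sep:
        raise ValueError("expected 'startAddress-endAddress'")
    start = ipaddress.ip_address(start_text)
    end = ipaddress.ip_address(end_text)
    # Assumption: both ends must be the same IP version.
    if start.version != end.version:
        raise ValueError("start and end must both be ipv4 or both be ipv6")
    return start, end
```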
All the different types of relationship that are possible.
relatedTo - generic relationship.
isClassifiedAs - Used to match an object to a classification to provide a "name" for the object.
hosts - Used when a web site hosts a file.
installed - Used to relate files that install one another.
isParentOf - Used to relate a file to another file that it creates.
causesToInstall - As in web site causes file to be installed.
downloads - As in file retrieves data from a url, or file downloads file.
runs - File that a url with an exploit runs. Or a file (parent) that runs another file (child).
usesCNC - As in classification/object uses command and control url/domain/ip.
isNameServerOf - Maps name server ip to domain.
resolvesTo - Maps domain/url to ip address, could also use for ip address and asn.
verifiedBy - Mapping from object information (e.g. url) to entity, with date indicating verified time.
isServerOfService - Map from domain/ip to service object.
hasAssociatedConfiguration - Map from file to associated registry information.
operatedByEntity - Map from object to operating entity.
downloadedFrom - Map from file to url.
contactedBy - Map from file to url.
Top level types of classification. This is a high level type, not to be confused with the detailed category.
clean - the object can be regarded as not malicious.
dirty - the object can be regarded as malicious.
unknown - the object classification type is unknown.
unwanted - the object can be regarded as potentially unwanted. This is intended to cover the well known case of "potentially unwanted programs".
neutral - the object can be regarded as neutral, neither malicious nor legitimate.
A list of the various ways that geographical location can be represented.
The elements correspond to various levels of granularity of geographical data.
A list of the various units allowed to be used in volume tag in fieldDataEntry.
numberUsersAffected - The count of users (humans) affected by the items referenced by the field data entry.
numberMachinesAffected - The count of computers affected by the items referenced by the field data entry.
numberSeenInSpam - The count of spam messages containing the items referenced by the field data entry. Most commonly used for uris.
numberSeenInMalwareSamples - The count of malware samples containing the items referenced by the field data entry. Most commonly used for uris.
numberOfWebsitesHosting - The count of web sites hosting the items referenced by the field data entry. Most commonly used for uris.
numberOfWebsitesRedirecting - The count of web sites redirecting the items referenced by the field data entry. Most commonly used for uris.
Types of IP addresses
A list of regions, currently not used, but encouraged as values for 'region' when describing location.
A list of origins, used in fieldData, to show where objects originated.
user - Data originated from a user, normally meaning manual submissions from a user.
desktop - Data originated from a computer, normally meaning automated submissions from a product running on a user's computer.
network - Data originated from a local network.
gateway - Data originated from measurements at a gateway.
isp - Data originated from measurements at an ISP.
honeypot - Data originated from internally gathered data using a honeypot or other collection device.
collection - Data from a malware collection.
spam - Data originated from spam (e.g. spam Email had a link to malware or the malware itself).
web - Data originated from the Internet.
internal - Internally generated object (e.g. replicants of a polymorphic malware).
partner - Data originated from a partner.
unknown - unknown.
Property types allowed in an objectProperty.
filename - names of files, normally associated with file objects.
filepath - directory path of files, normally associated with file objects.
locationUrl - a url at which the file sample can be retrieved, associated with file objects.
isKernel - true/false if the malware has a kernel component. This can be applied either to a classification
or to a file object.
isParasitic - true/false if the malware infects other files by attaching to them (if it also replicates then it is a parasitic virus). This can be applied
either to a classification or to a file object.
isStealth - true/false if the malware uses rootkit style techniques to hide from users or security software. This can be applied
either to a classification or to a file object.
isPolymorphic - true/false if the malware is polymorphic, changing its appearance either through replication or server-side techniques.
This can be applied either to a classification or to a file object.
isVirus - true/false if the malware is a virus (replicates and propagates recursively). This can be applied either to a classification or to a file object.
isNonReplicating - true/false if the malware is non replicating. This can be applied either to a classification or to a file object.
isDamaged - true/false if the malware sample is damaged. This can be applied to a file object.
registryValueData - data from the registry from Microsoft operating systems. This is normally applied to a registry object.
It could also be applied to a relationship between a malware sample (file object) and a registry object, to indicate the data
that was written by the malware.
urlParameterString - parameter string information associated with a GET http request. This is normally applied to a uri object.
It could also be applied to a relationship between a malware sample (file object) and a uri object, indicating the parameters
associated with the communication.
postData - parameter information associated with a POST http request. This is normally applied to a relationship between a
malware sample (file object) and a uri object, indicating the data sent with the communication.
registrant - the registrant of a domain name, used for domain objects.
registrationDate - the registration date of a domain name, used for domain objects.
ownerAddress - the address associated with the owner of a domain name, used for domain objects.
adminContact - the administrative contact address associated with a domain name, used for domain objects.
technicalContact - the technical contact address associated with a domain name, used for domain objects.
nameServer - the name server associated with a domain name, used for domain objects.
countryCodeISO3166-2 - the ISO3166-2 code for country, usually associated with an ip address object,
e.g. the country where that IP address is hosted.
countryCodeISO3166-3 - the ISO3166-3 code for country, usually associated with an ip address object,
e.g. the country where that IP address is hosted.
countryCodeFIPS - the FIPS code for country, usually associated with an ip address object,
e.g. the country where that IP address is hosted.
city - the name of a city, usually associated with an ip address object, e.g. the city in which that IP address is hosted.
region - the name of a region, usually associated with an ip address object, e.g. the region in which that IP address is hosted.
isp - the name of an Internet Service Provider, usually associated with an ip address object,
e.g. the isp that hosts the IP address.
httpMethod - the http method (e.g. GET/POST/etc.) associated with an http request. This is usually associated with a
relationship between malware (file object), and a uri (uri object), to indicate the type of http request made.
referrer - the referrer uri, used when accessing a uri, associated with a uri object, or applied to a relationship between an
entity and a uri, for the referrer used when that entity visited that uri.
operatingSystem - environmental information of the operating system used. Normally used as a property of a relationship
between malware (file object) and some other object.
userAgent - User agent used when accessing a uri, associated with a uri object, or applied to a relationship between an
entity and a uri, for the user agent used when that entity visited that uri.
browser - browser used when accessing a uri, associated with a uri object, or applied to a relationship between an
entity and a uri, for the browser used when that entity visited that uri.
comment - a human readable comment that can be applied to any object or relationship.
This is the top level element for the xml document. Required attribute is version.
Open issues:
2. Right way to express commonality in field data so that it can be combined properly
3. How to handle unicode in urls
Change list
08/26/2011
Clean-file attribute based changes
1. added digitalSignature to objects
2. added softwarePackage to objects
3. added taggant to objects
4. added numerous elements to fileObject
11/12/2009
1. adding documentation across the schema
2. added partner to OriginTypeEnum
3. made sha1 in fileObject optional
4. added isDamaged as a propertyType
5. changed property name isNon-replicating to isNonReplicating
6/11/2009
1. incremented version
2. Rename parents/children in relationship to source/target
3. Add generic relationship, ‘relatedTo’
4. Make commonality element in fieldDataEntry optional
5. Add unknown element to origintypeenum
6. Remove ipv4 and ipv6 from locationenum
7. Make id on ip object startaddress-endaddress even if startaddress == endaddress. Added IPRange type
8. Add optional firstSeenDate to fieldDataEntry, for first time entity providing data saw the object
6/4/2009
1. File - id should be a xs:hexBinary
2. File - extraHash should be a xs:string
3. Uri – add optional ipProtocol field, with enumeration of values tcp/udp/icmp etc.
4. Uri – add documentation that protocol in uri needs to be either from well known list (from iana.org) or ‘unknown’
5. Domain - need to fix documentation for domain – example is wrong
6. registry – remove valuedata – it is in a property
7. ip object – rename to ip, and give it a start address and end address. Share a single address by making start and end the same. Id will be address or startaddress-endaddress
8. service – delete – subsumed by uri with extra data elements in it
9. classification – remove modifiers (attributes) on category and put in properties
10. classification – add documentation that category is companyname:category
11. objectProperty – move timestamp to be top level instead of on each property and make it required
12. relationship – make timestamp required
13. relationship – add doc on runs. removed 'exploits' - it refers to environment object that no longer exists
14. added comment field to propertyenum
15. made timeStamp -> timestamp for consistency
16. incremented version
5/31/2009
1. incremented version
2. changed url to uri
3. removed environment object and related enumerations
4. added restriction on uri to not allow a question mark (?)
5/15/2009
1. incremented version
2. Added neutral classification type
3. Added numberOfWebsitesHosting and numberOfWebsitesRedirecting to volume units enumeration
4. added referrer, operatingSystem, userAgent and browser to properties
5. made classification type attribute required
5/8/2009
1. added new object type for asn
2. moved domain information to properties, so that domains info can be timestamped
3. added properties for geolocation of an ip address
4. added property for location url for a file
5. added VolumeUnitsEnum and volume tag in fieldData. This is to allow sharing of actual prevalence numbers,
with various units.
6. Added ipProtocol (tcp/udp) to service object. Also changed names of expectedProtocol and actualProtocol to be
expectedApplicationProtocol and actualApplicationProtocol
7. added 'references' surrounding tag to ref tag in fieldDataEntry and objectProperty, so that can assign multiple references if required
8. made id on file back to hexBinary. Use length to figure out what hash it is.
9. incremented version
10. added properties for httpMethod and postData
11. added relationship types 'contactedBy' and 'downloadedFrom'
4/17/2009
1. Incremented version
2. Added unwanted to ClassificationTypeEnum
3. Added text about ids for files to documentation
4. Removed filename from file object definition
5. Relaxed requirement on id of file to be an xs:hexString to be an xs:string to allow e.g. md5:aaaaabbbbccc as an id. Not enormously happy about that…
6. Made sha256 optional and sha1 required in files
7. Added “open issues” section in documentation for top level element
8. Category is now an xs:string; deleted CategoryTypeEnum
9. Added comment to doc on fieldDataEntry about using standard time periods, but kept start date and end date
10. Added objectProperties element, and example illustratingProperties.xml. Currently allowed properties are filename, filepath, registryValueData and urlParameterString. There is an optional timestamp on each property. I allowed objectProperty to have an id, so that it can be referenced elsewhere, although we might want to re-think that.
11. Added some better documentation to relationships
12. Added more documentation throughout
The company name for the entity generating the xml document, for example "AVG Technologies".
The author of the document, for example "Matt Williamson" or "Igor Muttik".
A human readable comment.
The time that the document was created.
Objects are globally unique files, urls, domain, registry, ipAddress etc. The data within the object is supporting data for the globally unique object.
For example, files have an id (by convention the hash, sha256 if available, else weaker ones), and the data for the file is the hashes, sizes etc.
Urls have an id (the url itself), and data which is simply the url parts broken out.
There are no dates, etc in the objects. These are first class, global objects.
Files or samples
URI (Uniform Resource Identifier) objects.
Domain names as administered by ICANN.
Configuration information from the registry on Microsoft Windows operating systems.
Internet Protocol (IP) addresses, both ipv4 and ipv6.
Autonomous System (AS).
A corporation or other entity.
Labels or names, for example detection names associated with malware samples.
Software packages, typically used for associating with the files that they install or create.
Digital signatures, for use in associating with more than one binary that may share the same signature. To profile only one binary with a signature, use the digitalSignature element inside the fileObject.
Taggants, for use in associating with more than one binary that may share the same taggant. To profile only one binary with a taggant, use the taggant element inside the fileObject.
Properties of objects that do not make sense as relationships. e.g. file names, url parameter strings, registry value data.
Relationships between objects.
Prevalence data.
The version of the schema. This is currently fixed to be 1.1.
A required identifier for the document.
Object definition for files. The required attribute is the id, which needs to be globally unique.
By convention, the value used is a hash, the stronger the better.
The choice should be: use sha256 if you have it, if not use sha1, if not use md5.
Other hashes and file sizes are recorded in the elements.
File names are put in as properties.
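The hash preference described above (sha256, then sha1, then md5) can be sketched as follows. The function names and the dictionary layout are hypothetical and chosen only for illustration; the sketch assumes hashes are carried as lowercase hex strings.

```python
import hashlib

# Strongest first, following the convention stated above.
HASH_PREFERENCE = ("sha256", "sha1", "md5")


def choose_file_id(hashes):
    """Return the strongest available hash from a {name: hex_digest} mapping."""
    for name in HASH_PREFERENCE:
        digest = hashes.get(name)
        if digest:
            return digest
    raise ValueError("no sha256, sha1 or md5 hash available")


def file_id_from_bytes(data):
    """Compute the preferred id (sha256 hex digest) directly from file content."""
    return hashlib.sha256(data).hexdigest()
```

When the file content is at hand, computing sha256 directly avoids falling back to a weaker hash; `choose_file_id` covers the case where only previously recorded hashes are available.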
The file size in bytes.
String describing the type of file, for example executable, script etc.
Element for inserting fuzzy hashes for example pehash, ssdeep. These are put in with this element, with a required attribute 'type' used
to hold the type of hash.
The normalized native path of the file, using standardized system path variables (for Windows see http://en.wikipedia.org/wiki/Environment_variable#System_path_variables) with prepended and appended percentage characters. E.g. %ProgramFiles%/Microsoft Visual Studio.
The name of the file within an installer or archive.
The folder the file resides in within an installer or archive.
The name of the vendor, if extractable from the file.
The internal name(s) of the file, if applicable.
The language(s) the file is in.
The name of the product the file belongs to, if applicable.
The version of the product the file belongs to, if applicable.
The development environment used to build the file, if applicable.
The checksum of the file, if applicable.
The processor architecture of the file, if applicable.
The build timestamp of the file, if applicable.
The version of the compiler used to compile the file, if applicable.
The version of the linker used to link the file, if applicable.
The minimum operating system version needed to run the file, specified as a CPE name. The Common Platform Enumeration, or CPE, name of the package if one exists. CPE is a structured naming scheme for IT systems, software, and packages. For more information on CPE see http://cpe.mitre.org. For the official CPE dictionary see http://nvd.nist.gov/cpe.cfm.
The number of sections in the file, if applicable.
The minimum privilege required to run the file, e.g. Administrator, if applicable.
Information on the digital signature of the file, if applicable.
Information on the taggant used to tag the file, if applicable.
Registry object. The required attribute is 'id', which is taken to be key\\valueName.
Keys end in a \, value names start with a \, so you have e.g.
key = hklm\software\microsoft\currentversion\windows\run\
value = \foo
making the id hklm\software\microsoft\currentversion\windows\run\\foo
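The id construction above can be sketched as a small helper. The name `registry_id` is hypothetical, and adding the trailing or leading backslash when it is missing is an assumption made for convenience; the text only states that keys end in a backslash and value names start with one.

```python
def registry_id(key, value_name):
    """Join a registry key and value name into an id of the form key\\valueName."""
    # Keys end in a backslash; add one if the caller left it off (assumption).
    if not key.endswith("\\"):
        key += "\\"
    # Value names start with a backslash; add one if missing (assumption).
    if not value_name.startswith("\\"):
        value_name = "\\" + value_name
    return key + value_name
```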
Entity Object. This is used to record groups, companies etc., and departments within organizations.
The globally unique id (attribute) should be constructed from the company and department name,
e.g. "Company name:Department name", "Mcafee:AVERT labs", or "Russian Business Network".
Uri object. Only required element is uri string itself. There are elements for each of the broken out elements.
The protocol should be taken from the list at http://www.iana.org/assignments/port-numbers, or if not in that list have the value 'unknown'.
The ipProtocol should be taken from the list http://www.iana.org/assignments/protocol-numbers/.
The elements correspond to the usual breakdown of a uri into its component domain, hostname, path, port etc, as
described at http://en.wikipedia.org/wiki/Uniform_Resource_Locator.
Protocol, for example http, ftp. value must match an element in the list hosted at http://www.iana.org/assignments/port-numbers.
IP protocol, for example tcp, udp. value must match an element in the list hosted at http://www.iana.org/assignments/protocol-numbers/.
IP object. Used to hold ipv4, ipv6 ip addresses and address ranges. The globally unique id is 'startAddress-endAddress'.
There are two required elements, startAddress and endAddress; make these the same if you are
specifying a single address.
Thus for an ip range, the id would be e.g. 213.23.45.7-213.23.45.19
For a single ip, id would be e.g. 12.34.56.1-12.34.56.1
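A minimal sketch of building the 'startAddress-endAddress' id, using Python's standard `ipaddress` module to validate both ends. The function name is hypothetical, and rejecting mixed ipv4/ipv6 ranges is an assumption not stated in the text.

```python
import ipaddress


def ip_object_id(start, end=None):
    """Build the id for an ip object; a single address repeats itself as the end."""
    end = start if end is None else end
    first = ipaddress.ip_address(start)
    last = ipaddress.ip_address(end)
    # Assumption: a range cannot mix ipv4 and ipv6 addresses.
    if first.version != last.version:
        raise ValueError("start and end must use the same IP version")
    return f"{first}-{last}"
```

Note that `ipaddress` normalizes its output, so an ipv6 address may appear in compressed form in the resulting id.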
ip address - string for the actual address and attribute either ipv4, ipv6.
Domain object, used to hold internet domains, e.g. yahoo.com. The globally unique identifier (id attribute) is the domain itself.
whois information on domain is recorded using object properties.
Object used to hold information on Autonomous System Numbers. An autonomous system (AS) is a collection of connected
Internet Protocol (IP) routing prefixes under the control of one or more network operators that presents a common,
clearly defined routing policy to the Internet.
The id is the number, written as an integer for both 16 and 32 bit numbers.
Classification object, used to hold names or classifications of objects. The most common use case for this is detection
names for files from av scanners. However, this object could be used for general classification. The globally unique id (attribute)
should be created from "Company name:internal classification name", e.g. "Mcafee:Generic.DX". The other required attribute is the
type of classification, e.g. clean, dirty, unknown.
There are elements to capture the category of the classification. The category should be entered in the same way as the
classification name, e.g. company name:category name, e.g. Mcafee:Trojan.
Category is "companyname:category".
Details of the classification, giving product details, particularly useful for anti-virus scanner detections.
Data structure to hold prevalence information. The data includes a reference to another object (which is an xpath
expression pointing to an object inside the 'ref' element), together with a time period (startDate -> endDate),
an origin - where the object came from, and various location tags. This allows rich information on prevalence to be recorded.
By convention, time periods should be wherever possible standard time periods, e.g. minute, hour, 24 hours, week, month, quarter, year. This
will facilitate combination of data from multiple sources.
To represent a single entry, make startDate == endDate.
Commonality is calculated from the sightings of malware objects (and so such calculation is easier to automate).
Importance is reserved for cases when “commonality” is not available or if there is a need to communicate the
importance when commonality is low.
We define the commonality on a scale 0 to 100 (0 means “never found in the field” and 100 means “found very frequently”). Scaling commonality to 0..100 range instead of using actual sample counts is to avoid the effect of the user base size on the commonality. We derive commonality from the number of affected computers – not from the number of samples (for example, a hundred parasitic infections of the same virus on a single computer are to be counted as one).
To calculate the commonality we use two-stage approach and logarithmic scale:
- If the number of affected users exceeds 0.1% of your user base (more frequent than 1 in a 1000) set commonality to “100”
- Otherwise, calculate the ratio of infected computers amongst your user base by dividing the real number of affected computers ‘n’ by the total number ‘N’
- Apply the following formula to get the commonality: log2(1 + n*1000/N) * 100
- Round to the closest integer
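The steps above can be sketched as a short function. The name `commonality` and its argument names are hypothetical; `affected` corresponds to n and `user_base` to N in the formula. The threshold is tested with integer arithmetic to avoid floating-point edge cases at exactly 0.1%.

```python
import math


def commonality(affected, user_base):
    """Map a count of affected computers to the 0..100 commonality scale."""
    if user_base <= 0 or affected < 0:
        raise ValueError("user_base must be positive and affected non-negative")
    # Stage 1: more than 0.1% of the user base (1 in 1000) saturates at 100.
    if affected * 1000 > user_base:
        return 100
    # Stage 2: logarithmic scale, rounded to the closest integer.
    return round(math.log2(1 + affected * 1000 / user_base) * 100)
```

At exactly 0.1% the formula itself yields log2(2) * 100 = 100, so the two stages meet without a jump.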
Obviously, the calculation above can only be applied to counting of malware sightings on desktops.
If telemetry is collected from a fraction of such desktops then an appropriate correction should be used.
For all other cases (e.g. sighting on gateways, in some network security appliance, on an ISP level, etc.)
please exercise your best judgment and apply provided desktop guideline as an example to make sure
the commonality factor is as comparable as possible.
For a URL object the commonality could reflect, for example, how widely it was spammed.
“Importance” should not be used together with “commonality” (unless commonality=“0”) to avoid possible confusion. High “importance”, for example, can be assigned to samples that are over-hyped by media when their commonality is still “0”.
Use the following guidelines for “importance” which is also defined on a scale 0..100:
100 – you’d expect your CEO and/or media to call you any second about this object
80 – you might get a call from your CEO and/or media
60 – you’d expect your boss to call you any second
40 – you might get a call from your boss
20 – someone is very likely to contact you about this object
10 – you might get contacted about this object
0 – you’d be surprised if anyone would ever contact you about this object
The objects the prevalence information pertains to.
The start date for this field data entry - the start date of the period over which the prevalence (commonality) and importance is measured.
The end date for this field data entry - the end date of the period over which the prevalence (commonality) and importance is measured.
The date that the object was first seen by the reporting entity.
An enumeration of common sources or origins of data associated with the field data.
Qualitative measurements of prevalence.
Quantitative measurements of prevalence.
Qualitative measurement of risk associated with the object.
Geolocation information for prevalence.
Reference element used to hold xpath expressions to objects, for example file[@id="12345"].
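As an illustration of how such a reference might be resolved, the sketch below evaluates the example expression against a tiny in-memory document using Python's `xml.etree.ElementTree`, which supports a limited XPath subset including attribute predicates. The `objects` container, the `size` child and the absence of a namespace are assumptions made for the example, not the schema's actual layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: two file objects identified by their id attribute.
objects = ET.fromstring(
    "<objects>"
    '<file id="12345"><size>1024</size></file>'
    '<file id="67890"><size>2048</size></file>'
    "</objects>"
)


def resolve_ref(container, ref):
    """Return the elements that a ref expression points to within container."""
    return container.findall(ref)


matches = resolve_ref(objects, 'file[@id="12345"]')
```

A real document would typically be namespaced, in which case the expression and the lookup would need the namespace prefix mapped accordingly.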
A property.
Property; a reference to the object, a timestamp and an unbounded set of properties.
This is used to describe extra information about an object. For example, to show the url parameter strings
associated with a particular URI object. Or to show file names associated with a particular file.
Properties can also be applied to relationships, by referencing the relationship by id. This allows use such as
e.g. recording the post data sent in an http request between a malware (file object) and a uri (uri object).
The objects the properties pertain to.
Relationships are used to express relationships between objects, and dates. Relationships have
a type (an attribute with a defined list of allowed relationships), source (a set of xpath references to the parent end of
the relationship), target (xpath references to the other end of the relationship) and an optional date.
The linking of objects with types is a powerful way of describing data. The dates can be used to provide context.
For example, a classification can be assigned to an object with an "isClassifiedAs" relationship, with the date meaning
the date on which that classification was assigned.
To show urls and the last visited date, this can be expressed as a "verifiedBy" relationship between the urls and the entity doing the
verification, with the date interpreted as the verification date.
References to objects at the parent end of the relationship.
References to objects at the child end of the relationship.
Software package object, used to store information about a software package, such as the vendor and version. Intended primarily for the clean-file metadata sharing use case.
The product group that the product belongs to, e.g. Microsoft Office.
The Common Platform Enumeration, or CPE, name of the package if one exists. CPE is a structured naming scheme for IT systems, software, and packages. For more information on CPE see http://cpe.mitre.org. For the official CPE dictionary see http://nvd.nist.gov/cpe.cfm.
The version of CPE that is used for the name in the CPEname element. As of 10/04/2011 this is 2.2.
Digital signature object, used to hold information about digitally signed binaries with regards to the certificate used and its validity.
Taggant object, for use in characterizing the software taggant that may be associated with a file or multiple files. For more information on the taggant system or the IEEE Malware Working Group that created it, please see http://standards.ieee.org/develop/indconn/icsg/malware.html.