
Brain Language Metrics on Company Filings
Overview
The Brain Language Metrics on Company Filings (BLMCF) dataset monitors several language metrics on 10-K and 10-Q company reports for more than 6,000 US stocks.
Recent literature points to inefficiencies in the market response to the information in company filings, attributed to the increased complexity and length of such reports; see for example "Lazy Prices", Cohen et al. 2018, or "The Positive Similarity of Company Filings and the Cross-Section of Stock Returns", M. Padysak 2020.
Over the last 20 years, the length of the average 10-K has in fact increased dramatically.
Our dataset is made of two parts; the first one includes the language metrics of the most recent 10-K or 10-Q report for each firm, namely:
- Financial sentiment
- Percentage of words belonging to the financial domain, classified by language type: constraining, litigious, uncertainty and interesting language
- Readability score
- Lexical metrics such as lexical density and richness
- Text statistics such as the report length and the average sentence length
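The page does not say how these metrics are computed, so the following is only a minimal sketch of how a few of them could be approximated. It assumes a naive regex tokenizer and sentence splitter, uses the type-token ratio as a stand-in for lexical richness, and relies on a tiny hypothetical word list (`UNCERTAINTY_WORDS`) for the language-type percentage. None of these names or choices come from Brain.

```python
import re

# Hypothetical, tiny word list used only for illustration;
# a real financial-domain list would be far larger.
UNCERTAINTY_WORDS = {"may", "might", "could", "uncertain", "approximately"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def split_sentences(text):
    """Naive split on '.', '!' or '?' followed by whitespace or end of text."""
    parts = re.split(r"[.!?]+(?:\s+|$)", text)
    return [p for p in parts if p.strip()]


def text_metrics(text):
    """Return simple text statistics for one report (or report section)."""
    words = tokenize(text)
    sentences = split_sentences(text)
    n_words = len(words)
    n_uncertain = sum(w in UNCERTAINTY_WORDS for w in words)
    return {
        "n_words": n_words,
        "n_sentences": len(sentences),
        "avg_sentence_length": n_words / len(sentences) if sentences else 0.0,
        # Share of distinct words: a simple proxy for lexical richness.
        "type_token_ratio": len(set(words)) / n_words if n_words else 0.0,
        # Percentage of words found in the (hypothetical) uncertainty list.
        "pct_uncertainty": 100.0 * n_uncertain / n_words if n_words else 0.0,
    }
```

Calling `text_metrics` on the text of a Risk Factors section would give section-level values in the same way as for the whole report. The sentence splitter is deliberately crude (abbreviations such as "U.S." will be split), which is acceptable for a sketch but not for production use.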
The second part includes the differences between the two most recent 10-K or 10-Q reports of the same period for each company, namely:
- Differences in the various language metrics (e.g. delta sentiment, delta readability score, delta percentage of a specific language type, etc.)
- Similarity metrics between documents, also with respect to a specific language type (for example similarity with respect to “litigious” language or “uncertainty” language)
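As an illustration of what a document similarity metric can look like, the sketch below computes the cosine similarity between the word-count vectors of two filings, optionally restricted to a vocabulary for one language type. This is an assumed, generic technique and not necessarily the one used in the dataset; the function names and the `vocabulary` parameter are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text, vocabulary=None):
    """Count word occurrences, optionally keeping only words in `vocabulary`."""
    words = re.findall(r"[a-z]+", text.lower())
    if vocabulary is not None:
        words = [w for w in words if w in vocabulary]
    return Counter(words)


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filing_similarity(previous, current, vocabulary=None):
    """Similarity between the previous and the current report of the same period."""
    return cosine_similarity(
        term_counts(previous, vocabulary), term_counts(current, vocabulary)
    )
```

Passing a word list for one language type, for example a hypothetical set of litigious terms, as `vocabulary` gives a similarity restricted to that type. A delta metric is then simply the current value of a metric minus its value in the previous report of the same period.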
Our dataset includes the metrics and the related differences both for the whole report and for specific sections (Risk Factors and Management Discussion and Analysis).
Feed Details
The dataset is updated daily, since new 10-K and 10-Q reports are released every day for some of the companies in the universe. The largest updates occur around February, April, August and November, when the largest number of reports is released. The historical dataset is available from 2010.
Historical Trial
The dataset contains historical data from January 2010 that can be accessed free of charge for 2 months. For a live feed, please contact us at support@braincompany.co and we will make a customized version of the product available on AWS Data Exchange according to client requirements.
Disclaimer
The content of this dataset is not intended as investment advice. The material is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory or other services by Brain. Brain makes no guarantees regarding the accuracy and completeness of the information expressed in the dataset.
Pricing
Vendor refund policy
No refunds are offered for this product. For more information, please contact support@braincompany.co.
Delivery details
AWS Data Exchange (ADX)
AWS Data Exchange is a service that helps AWS customers easily share and manage data entitlements from other organizations at scale.
Additional details
You will receive access to the following data sets.
Data set name | Type | Historical revisions | Future revisions | Sensitive information | Data dictionaries | Data samples |
|---|---|---|---|---|---|---|
Brain Language Metrics on Company Filings - Extended Version (Historical Trial) | All historical revisions | All future revisions | Not included | Not included |

