  • Distributed representation of semantics in the human brain: Evidence from natural language processing techniques

    Subjects: Psychology >> Social Psychology · Submitted: 2023-03-28 · Cooperative journal: 《心理科学进展》 (Advances in Psychological Science)

    Abstract: How semantics are represented in the human brain is a central issue in cognitive neuroscience. Previous studies typically detect semantic information by manipulating the properties of stimuli or task demands, or by asking a group of participants to judge the stimuli along several given dimensions or features. Although these approaches have brought valuable insights into the neurobiology of language, they have some limitations. First, the experimental approach may provide only a coarse depiction of semantic properties, while human judgment is time-consuming and the results may vary substantially across subjects. Second, the conventional approach has difficulty quantifying the effect of context on word meaning. Third, the conventional approach cannot extract the topic information of discourses, the semantic relations between the different parts of a discourse, or the semantic distance between discourses.

    Recently developed natural language processing (NLP) techniques provide a useful tool that may overcome these limitations. Grounded in the distributional hypothesis of semantics, NLP models represent the meanings of words, sentences, or documents as computable vectors, which can be derived from word-word or word-document co-occurrence relationships, or from neural networks trained on language tasks.

    Recent studies have applied NLP techniques to model the semantics of stimuli and mapped the semantic vectors onto brain activity through representational similarity analyses or linear regression. Those studies have mainly examined how the brain (i) represents word semantics; (ii) integrates context information and represents sentence-level meanings; and (iii) represents the topic information and the semantic structure of discourses. In addition, a few studies have applied NLP to untangle the syntactic and semantic information of sentences and looked for their respective neural representations. A consistent finding across those studies is that the representation of the semantic information of words, sentences, and discourses, as well as of syntactic information, seems to recruit a widely distributed network covering the frontal, temporal, parietal, and occipital cortices. This observation contrasts with the results of conventional imaging studies and lesion studies, which typically report localized neural correlates for language processing. One possible explanation for this discrepancy is that NLP language models trained on large-scale text corpora may capture multiple aspects of semantic information, whereas the conventional experimental approach may selectively activate one (or a few) specific aspects of semantics, so that only a small part of the brain is detected.

    Although NLP techniques provide a powerful tool for quantifying semantic information, they still face some limitations when applied to the study of semantic representations in the brain. First, embeddings from NLP models (especially those from deep neural networks) are hard to interpret. Second, models differ from each other in training material, network architecture, number of parameters, training tasks, and so on, which may lead to discrepancies among research results. Finally, model training procedures differ from how humans learn language and semantics, and the internal computational and processing mechanisms may also be fundamentally different between NLP models and the human brain.
    Therefore, researchers need to select an appropriate model based on the research question, test the validity of models with experimental designs, and interpret results carefully. In the future, it is promising to (i) adopt more informative semantic representation methods such as knowledge graphs and multimodal models; (ii) apply NLP models to assess the language ability of patients; and (iii) improve the interpretability and performance of models by taking advantage of cognitive-neuroscience findings about how humans process language.
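
As a concrete illustration of the distributional hypothesis the abstract invokes, the following is a minimal sketch of deriving word vectors purely from word-word co-occurrence counts, re-weighted with positive pointwise mutual information (PPMI) and reduced with SVD. The toy corpus, the sentence-level co-occurrence window, and the two-dimensional embeddings are illustrative assumptions, not details from the paper.

```python
# Minimal sketch: word vectors from co-occurrence statistics alone
# (distributional hypothesis). Toy corpus and window are assumptions.
import numpy as np
from itertools import combinations

corpus = [
    "the doctor examined the patient",
    "the nurse helped the patient",
    "the dog chased the cat",
    "the cat watched the dog",
]

# Build vocabulary and symmetric co-occurrence counts within each sentence.
tokens = [s.split() for s in corpus]
vocab = sorted({w for sent in tokens for w in sent})
idx = {w: i for i, w in enumerate(vocab)}
C = np.zeros((len(vocab), len(vocab)))
for sent in tokens:
    for w1, w2 in combinations(sent, 2):
        C[idx[w1], idx[w2]] += 1
        C[idx[w2], idx[w1]] += 1

# Positive pointwise mutual information (PPMI) re-weighting.
total = C.sum()
pw = C.sum(axis=1, keepdims=True) / total          # marginal word probabilities
pmi = np.log((C / total + 1e-12) / (pw * pw.T))
ppmi = np.maximum(pmi, 0)

# Truncated SVD yields dense, low-dimensional word vectors.
U, S, _ = np.linalg.svd(ppmi)
vectors = U[:, :2] * S[:2]                          # 2-d embeddings for the toy vocabulary

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

# Words sharing contexts ("doctor"/"nurse" both co-occur with "patient")
# should come out closer than unrelated pairs.
print(cosine(vectors[idx["doctor"]], vectors[idx["nurse"]]))
print(cosine(vectors[idx["doctor"]], vectors[idx["cat"]]))
```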

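The abstract also names representational similarity analysis (RSA) as one way to map semantic vectors onto brain activity. Below is a minimal sketch of that comparison on simulated data; the voxel patterns, their dimensions, and the cosine/correlation distance choices are assumptions for illustration, not the paper's procedure.

```python
# Minimal RSA sketch: compare a model RDM from NLP-style word vectors
# with a neural RDM from (simulated) voxel patterns.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_words, n_dims, n_voxels = 20, 50, 100

word_vectors = rng.standard_normal((n_words, n_dims))   # stand-in for NLP embeddings
# Simulated brain patterns that partially reflect the embeddings, plus noise.
projection = rng.standard_normal((n_dims, n_voxels))
brain_patterns = word_vectors @ projection + 5.0 * rng.standard_normal((n_words, n_voxels))

# Representational dissimilarity matrices as condensed pairwise distances.
model_rdm = pdist(word_vectors, metric="cosine")
neural_rdm = pdist(brain_patterns, metric="correlation")

# Rank correlation between the two RDMs is the standard RSA statistic.
rho, p = spearmanr(model_rdm, neural_rdm)
print(f"RSA correlation: rho = {rho:.3f}, p = {p:.3g}")
```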

  • Distributed representation of semantics in the human brain: Evidence from studies using natural language processing techniques

    Subjects: Psychology >> Cognitive Psychology · Submitted: 2023-01-18

    Abstract: How semantics are represented in the human brain is a central issue in cognitive neuroscience. Previous studies typically address this issue by artificially manipulating the properties of stimuli or task demands. Although it has brought valuable insights into the neurobiology of language, this experimental approach may fail to characterize semantic information at high resolution, and it has difficulty quantifying contextual information and high-level concepts. Recently developed natural language processing (NLP) techniques provide tools to represent discrete semantics as vectors, enabling the automatic extraction of word semantics and even of contextual and syntactic information. Recent studies have applied NLP techniques to model the semantics of stimuli and mapped the semantic vectors onto brain activity through representational similarity analyses or linear regression. A consistent finding is that semantic information is represented by a widely distributed network across the frontal, temporal, and occipital cortices. Future studies may adopt multimodal neural networks and knowledge graphs to extract richer semantic information, apply NLP models to automatically assess the language ability of special populations, and improve the interpretability of deep neural network models with neurocognitive findings.
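
Linear regression, the other mapping tool named in these abstracts, is typically implemented as a voxel-wise encoding model. The sketch below fits a ridge-regularized linear map from semantic vectors to simulated voxel responses and scores it by held-out prediction accuracy; the ridge penalty, the data shapes, and the train/test split are illustrative assumptions rather than the studies' exact pipeline.

```python
# Minimal encoding-model sketch: ridge regression from semantic features
# to each voxel's response, evaluated by held-out correlation. Simulated data.
import numpy as np

rng = np.random.default_rng(1)
n_stim, n_feat, n_vox = 200, 50, 300

X = rng.standard_normal((n_stim, n_feat))                    # semantic vector per stimulus
true_w = rng.standard_normal((n_feat, n_vox))
Y = X @ true_w + 2.0 * rng.standard_normal((n_stim, n_vox))  # noisy voxel responses

# Train/test split.
X_tr, X_te = X[:150], X[150:]
Y_tr, Y_te = Y[:150], Y[150:]

# Closed-form ridge solution, W = (X'X + aI)^-1 X'Y, fit jointly for all voxels.
alpha = 10.0
W = np.linalg.solve(X_tr.T @ X_tr + alpha * np.eye(n_feat), X_tr.T @ Y_tr)

# Per-voxel accuracy: correlation between predicted and observed responses.
Y_hat = X_te @ W
Y_hat_c = Y_hat - Y_hat.mean(0)
Y_te_c = Y_te - Y_te.mean(0)
r = (Y_hat_c * Y_te_c).sum(0) / (
    np.linalg.norm(Y_hat_c, axis=0) * np.linalg.norm(Y_te_c, axis=0) + 1e-12
)
print(f"mean held-out voxel correlation: {r.mean():.3f}")
```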