Commit d83af46

code to read data
1 parent b8022c1 commit d83af46

3 files changed: +181035 -3 lines changed

Binary file not shown.

code/Readme.md

+39 -3
@@ -35,9 +35,45 @@ To read the dataset only use the loader_so.py file from `DataReader` folder as b

 ```
 import loader_so
->>> path_to_file = "../../resources/annotated_ner_data/StackOverflow/train.txt"
->>> all_sentneces = loader_so.loader_so_text(path_to_file)
+path_to_file = "../../resources/annotated_ner_data/StackOverflow/train.txt"
+all_sentences = loader_so.loader_so_text(path_to_file)

 ```

-By default the
+By default the `loader_so_text` function merges the following 8 entity tags into 4, as below:
+
+```
+"Library_Function" -> "Function"
+"Function_Name" -> "Function"
+
+"Class_Name" -> "Class"
+"Library_Class" -> "Class"
+
+"Library_Variable" -> "Variable"
+"Variable_Name" -> "Variable"
+
+"Website" -> "Website"
+"Organization" -> "Website"
+
+```
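
As a rough illustration of the merge table above, the sketch below applies the same mapping to BIO-style labels. The names `MERGE_MAP` and `merge_label` are hypothetical and are not taken from `loader_so.py`. The `B-`/`I-` prefix convention and the `Other_Tag` label are assumptions made only for this example.

```python
# Hypothetical sketch, not the code in loader_so.py.
# The mapping mirrors the merge table listed in the README text.
MERGE_MAP = {
    "Library_Function": "Function",
    "Function_Name": "Function",
    "Class_Name": "Class",
    "Library_Class": "Class",
    "Library_Variable": "Variable",
    "Variable_Name": "Variable",
    "Website": "Website",
    "Organization": "Website",
}


def merge_label(label: str) -> str:
    """Map a label such as 'B-Library_Function' to 'B-Function'.

    Assumes BIO-style labels ('B-X', 'I-X', 'O'). Labels whose entity
    type is not in MERGE_MAP are returned unchanged.
    """
    if label == "O" or "-" not in label:
        return label
    prefix, entity = label.split("-", 1)
    return f"{prefix}-{MERGE_MAP.get(entity, entity)}"


# 'Other_Tag' is a placeholder for any tag outside the merge table.
labels = ["B-Library_Function", "I-Library_Function", "O",
          "B-Organization", "B-Other_Tag"]
merged = [merge_label(label) for label in labels]
print(merged)
```

Keeping the mapping in a single dictionary makes it easy to see that every source tag has exactly one target and that unlisted tags pass through untouched.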
+
+To skip this merging, set `merge_tag=False` as below:
+
+```
+import loader_so
+path_to_file = "../../resources/annotated_ner_data/StackOverflow/train.txt"
+all_sentences = loader_so.loader_so_text(path_to_file, merge_tag=False)
+
+```
+
+
+By default the `loader_so_text` function converts the 5 low-frequency entity tags to "O". To skip this conversion, set `replace_low_freq_tags=False` as below:
+
+
+
+```
+import loader_so
+path_to_file = "../../resources/annotated_ner_data/StackOverflow/train.txt"
+all_sentences = loader_so.loader_so_text(path_to_file, replace_low_freq_tags=False)
+
+```
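
The low-frequency conversion can be pictured as follows. This is a minimal sketch under assumptions: the README text does not list the 5 tags, so `Rare_Type` is a placeholder, and `replace_low_freq` is a hypothetical helper rather than the loader's actual code. BIO-style labels are also assumed.

```python
from typing import Iterable, List, Set


def replace_low_freq(labels: Iterable[str], low_freq_entities: Set[str]) -> List[str]:
    """Return labels with every low-frequency entity type replaced by 'O'.

    Hypothetical helper: strips an optional 'B-'/'I-' prefix, then checks
    the remaining entity type against the given set.
    """
    result = []
    for label in labels:
        entity = label.split("-", 1)[1] if "-" in label else label
        result.append("O" if entity in low_freq_entities else label)
    return result


# 'Rare_Type' is a placeholder; the real 5 tags are not named in the README.
low_freq = {"Rare_Type"}
labels = ["B-Rare_Type", "I-Rare_Type", "B-Class", "O"]
converted = replace_low_freq(labels, low_freq)
print(converted)
```

Passing the set of tags as an argument keeps the sketch independent of which tags the loader actually treats as low frequency.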
